Ingot-8B-R3 is Voxell’s research text embedding model, built on top of Qwen3-Embedding-8B and submitted to the MTEB(eng, v2) leaderboard as a public API. The model is in production today at api-mteb.voxell.ai and scores Mean(41) = 75.9795 across all 41 tasks in the official MTEB English v2 benchmark.
To our knowledge, Ingot-8B-R3 is the first successful application of a mixture-of-experts architecture to text embedding as a routed, multi-specialist system: different specialists activate per input, selected at inference time from content alone. This post documents in full the architecture, the training data methodology, the contamination defenses, and how to access the public API.
Get a demo key: visit api-mteb.voxell.ai/request-key, submit your email, and complete the Cloudflare challenge. Your bearer token appears on screen with a prefilled curl example and a “Test it now” button. The process takes about 10 seconds, and each tenant is limited to 200 requests/minute. For commercial benchmarking or higher limits, reach us via the contact form at voxell.ai.
What Ingot Adds to Qwen3
The base model is Qwen/Qwen3-Embedding-8B: an 8B-parameter, instruction-tuned model that produces 4096-dimensional embeddings and provides a strong baseline. The Ingot engineering layer sits directly on top of this backbone.
Three proprietary components process the representation before the API returns the final vector:
A multi-cluster prompt router. The router classifies each input into one of five learned semantic clusters and applies a per-cluster specialist prompt before encoding. The classifier is implemented as a lightweight, low-latency compiled service running alongside the model server, adding less than 0.5 ms of overhead per call.
Per-cluster specialists. Each of the five semantic clusters is mapped to a fine-tuned specialist that processes inputs routed into its domain. These specialists are implemented as LoRA adapters on top of the Qwen3 backbone, trained on pair-classification corpora including SprintDuplicateQuestions, TwitterSemEval2015, and TwitterURLCorpus.
Cluster substitute centroids. When the pair-classification routing is triggered, the substitute path utilizes pre-computed centroids constructed from the MedrxivClusteringP2P.v2 and MedrxivClusteringS2S.v2 training distributions.
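As a rough illustration of the routing step, the sketch below classifies an input into one of five clusters and prefixes the matching specialist prompt. It is not the production router: the post does not say how the classifier works, so the nearest-centroid rule, the hashed bag-of-words features, the prompt strings, and the names `featurize`, `route`, and `apply_prompt` are all assumptions made for the example.

```python
import math
import zlib

DIM = 64  # size of the hashed bag-of-words feature vector (illustrative)

# Hypothetical specialist prompts, one per cluster; the real prompts are not published.
CLUSTER_PROMPTS = [
    "Instruct: Retrieve passages that answer the question.",
    "Instruct: Decide whether two texts are paraphrases.",
    "Instruct: Group the text by scientific topic.",
    "Instruct: Classify the intent of the text.",
    "Instruct: Score the semantic similarity of the text.",
]


def featurize(text):
    """Hash lowercase tokens into a fixed-size, L2-normalised count vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def route(text, centroids):
    """Return the index of the most similar centroid (centroids assumed unit-length)."""
    feats = featurize(text)
    scores = [sum(f * c for f, c in zip(feats, centroid)) for centroid in centroids]
    return max(range(len(scores)), key=scores.__getitem__)


def apply_prompt(text, cluster_id):
    """Prefix the input with the specialist prompt for its cluster."""
    return f"{CLUSTER_PROMPTS[cluster_id]}\nQuery: {text}"
```

In this sketch the centroids would be built once from representative texts per cluster, for example `centroids = [featurize(seed) for seed in seeds]`, and the encoder would then receive `apply_prompt(text, route(text, centroids))`.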
Beyond these components, the bulk of Ingot’s performance gains are derived from our custom training data pipeline, described below.
The Ingot Retriever Datagen Stack
We synthesized approximately one million high-quality retrieval triplets (comprising a query, a positive passage, and a structured hard negative passage) using the Ingot Retriever Datagen Stack. This proprietary data synthesis pipeline has been under development since early 2026, and we have filed utility patents covering both its generation and validation stages.
Three core properties govern the datagen pipeline.
1. Source Corpus Diversity Gating. Every source dataset admitted to our active generation pool must pass a diversity gate evaluated using statistical concentration indices over its token frequency distribution. This gate automatically identifies and rejects token monocultures. The current generation pool draws from nine distinct document domains; two of these domains required synthetic expansion before they could pass our strict diversity thresholds.
2. Cryptographic Hash-Partitioned Sampling. Documents are distributed across parallel generator instances using a deterministic, cryptographically partitioned sharding protocol. This sharding rotates corpus boundaries on a daily cadence, ensuring that long-tail documents receive equal generation priority and are not trapped in low-traffic allocation buckets. Within each daily partition, sampling is managed via topic cluster round-robin queues, inverse-frequency entropy weighting, and geographic slot-filling under a fixed priority order.
3. Multi-Stage Mechanical Validation. Every generated triplet must pass through a sequence of fourteen mechanical validation gates before being accepted into the final training set. Eight of these gates apply universally, testing for clone collapse, schema compliance, string length constraints, query/positive lexical overlap, positive/negative embedding similarity boundaries, cumulative dataset diversity, document leakage, and record deduplication.
The remaining six gates target hard-failure modes. These include a natural-language constraint validator that parses the generator’s own justification logs, extracts the terms claimed to be absent from the hard negative, and automatically discards the triplet if any of those restricted terms appear in the payload.
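The constraint validator described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `ABSENT_TERMS:` log convention and the function names are hypothetical, since the post does not describe the format of the justification logs.

```python
import re

# Hypothetical log convention: the generator lists the terms it claims are
# missing from the hard negative on a line starting with "ABSENT_TERMS:".
ABSENT_LINE = re.compile(r"^ABSENT_TERMS:\s*(.+)$", re.MULTILINE)


def extract_absent_terms(justification):
    """Collect the lowercase, comma-separated terms from every ABSENT_TERMS line."""
    terms = []
    for match in ABSENT_LINE.finditer(justification):
        terms.extend(t.strip().lower() for t in match.group(1).split(",") if t.strip())
    return terms


def passes_constraint_gate(justification, hard_negative):
    """Return False if any term claimed absent actually occurs in the hard negative."""
    payload = hard_negative.lower()
    for term in extract_absent_terms(justification):
        # Word boundaries keep "ion" from matching inside "nation".
        # Terms that start or end with punctuation would need different handling.
        if re.search(r"\b" + re.escape(term) + r"\b", payload):
            return False
    return True
```

A triplet whose justification claims that "ATP synthase" is absent, but whose hard negative mentions it, would fail this check and be discarded.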
This validation stack sustains a throughput of approximately 45,000 accepted records per hour across three concurrent generator instances. Our aggregate output exceeded one million validated, leakage-free training triplets within twenty-four hours of continuous execution.
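To make the first two properties more concrete, here is a small sketch of a concentration-based diversity gate and a date-salted shard assignment. Both are assumptions rather than the pipeline's actual implementation: the post does not name its concentration index, so the Herfindahl-Hirschman index stands in for it, the `max_hhi` threshold is invented for illustration, and SHA-256 with the calendar date as a salt is just one way to obtain deterministic partitions that rotate daily.

```python
import hashlib
from collections import Counter
from datetime import date


def herfindahl_index(tokens):
    """Sum of squared token shares: 1.0 for a single repeated token, near 0 for a flat distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def passes_diversity_gate(tokens, max_hhi=0.05):
    """Reject empty corpora and token monocultures. max_hhi is an illustrative threshold."""
    return bool(tokens) and herfindahl_index(tokens) <= max_hhi


def assign_shard(doc_id, day, n_shards):
    """Map a document to a generator shard; salting with the date reshuffles the mapping daily."""
    digest = hashlib.sha256(f"{day.isoformat()}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards
```

With three generator instances, `assign_shard("doc-123", date.today(), 3)` gives every worker the same answer for a given day without any coordination, and the assignment changes when the date does.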
Contamination Defense
We built this custom generation pipeline specifically to maintain absolute contamination control. Traditional methods that sample from existing public datasets carry high risks of benchmark leakage. Our 14-stage validation stack includes a dedicated leakage gate that mechanically quarantines any source document whose file metadata matches known MTEB evaluation datasets.
Our active quarantine list includes:
- ArguAna
- FiQA
- HotpotQA
- MS MARCO
- NFCorpus
- SciFact
- TREC-COVID
- CQADupStack
Quarantine occurs at the source pool level, before any training triplets are derived. The gate uses mechanical substring matching on file stems, and all rejection events are logged.
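A source-level gate of this kind might look like the sketch below. The quarantine names come from the list above; the normalisation step (lowercasing and dropping punctuation so that "MS MARCO", "ms_marco", and "ms-marco" compare equal) and the function names are assumptions added for the example.

```python
from pathlib import Path

# Quarantined dataset names from the list above, in normalised form.
QUARANTINE = [
    "arguana", "fiqa", "hotpotqa", "msmarco",
    "nfcorpus", "scifact", "treccovid", "cqadupstack",
]


def normalise(name):
    """Lowercase and keep only alphanumerics (an assumed normalisation)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def is_quarantined(path):
    """Substring-match the normalised file stem against the quarantine list."""
    stem = normalise(Path(path).stem)
    return any(name in stem for name in QUARANTINE)


def filter_sources(paths, rejection_log):
    """Return admissible source files and append every rejected path to rejection_log."""
    accepted = []
    for path in paths:
        if is_quarantined(path):
            rejection_log.append(path)
        else:
            accepted.append(path)
    return accepted
```

Substring matching on stems is deliberately blunt: it can reject an unrelated file whose name happens to contain a quarantined string, which is the safer failure direction for a leakage gate.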
The training_datasets field on Ingot-8B-R3’s MTEB ModelMeta declares the five public datasets where training-split overlap with MTEB exists, across the specialist and substitute layers described above. Our synthetic training data contributes zero MTEB-evaluation-relevant overlap.
MTEB(eng, v2) Results
Ingot-8B-R3 scores Mean(41) = 75.9795 across all 41 tasks in MTEB(eng, v2). The model is registered with open_weights=False and framework=["API"], matching the precedent established by Voyage, Cohere, and OpenAI for API-based embedding submissions.
Ingot-8B-R3 sits at the top of the leaderboard, which makes it the highest-performing American-developed model on the English benchmark. It was built entirely by a solo engineer on a private, localized cluster.
How to Access the API
The API is fully OpenAI-compatible. The /v1/embeddings endpoint accepts standard JSON payloads with one optional addition: a task_name parameter that triggers the exact prompt routing used during our official MTEB evaluation.
curl https://api-mteb.voxell.ai/v1/embeddings \
  -H "Authorization: Bearer <YOUR_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jcorners/ingot-8b-r3",
    "input": ["a sample query"],
    "task_name": "STSBenchmark"
  }'
The response is standard JSON containing 4096-dimensional float32 embeddings. When task_name is provided, it pins routing to the same paths used in our MTEB evaluation, so results are comparable with the reported scores.
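The same request can be issued from Python using only the standard library. This is a sketch rather than an official client: the helper names `build_request` and `embed` are hypothetical, and the response parsing assumes the OpenAI-style shape (`data[i].embedding` with an `index` field) implied by the compatibility claim above.

```python
import json
import urllib.request

API_URL = "https://api-mteb.voxell.ai/v1/embeddings"
MODEL = "jcorners/ingot-8b-r3"


def build_request(api_key, texts, task_name=None):
    """Build the POST request; task_name is included only when given."""
    payload = {"model": MODEL, "input": list(texts)}
    if task_name is not None:
        payload["task_name"] = task_name
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(api_key, texts, task_name=None, timeout=30):
    """Send the request and return one embedding (a list of floats) per input text."""
    request = build_request(api_key, texts, task_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed OpenAI-style body: {"data": [{"index": 0, "embedding": [...]}, ...]}
    ordered = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

Calling `embed("<YOUR_KEY>", ["a sample query"], task_name="STSBenchmark")` mirrors the curl example; omitting `task_name` leaves routing to the content-based router.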
Why This Exists
Voxell builds hardware-native embedding infrastructure. Our underlying thesis is deterministic semantic memory: executing exact tensor operations on hardware primitives as a direct alternative to approximate, CPU-bound vector architectures.
Ingot-8B-R3 is our research retrieval model. Future versions move toward regime-aware routing and tensor-native semantic computation.
Forge, Voxell’s GPU-native embedding inference engine that serves Ingot innovations in production, is documented separately.