The world's only zero-trust embedding engine.
87ms. Three quality tiers. No API key to leak.
Every model runs on the same proprietary CUDA engine. OpenAI-compatible API. Choose per request — no provisioning, no cold starts.
- 1024 dimensions: best for fast, precise RAG
- 2560 dimensions: best for Postgres PGVector
- 4096 dimensions: best for maximum fidelity
I can drop a document into ChatGPT and it works fine. Why do I need this?
For a one-off question about a single document, yes — ChatGPT works. You paste it in, ask a question, get an answer in 5–10 seconds. That's fine for personal use if you don't mind the wait.
It breaks down the moment you need to do this at scale. A support team with 50,000 articles. A legal team searching across 10 years of contracts. A product that needs to answer user questions from a live knowledge base. You can't paste 50,000 documents into a chat window.
This is why vector embeddings are booming. They let you pre-process your entire corpus into searchable meaning — once — and then retrieve the right passages in milliseconds when a question comes in. The LLM only sees the most relevant context, not everything. It's faster, cheaper, more accurate, and it actually scales.
The catch: Stanford researchers have identified what they call "Semantic Collapse" — the point where retrieval-augmented generation breaks down at scale. At 1,000 documents, RAG systems hit ~85% accuracy. At 10,000, accuracy drops to ~45%. At 50,000, it's ~22%. The system doesn't gradually degrade — it collapses. And the root cause isn't the LLM. It's the retrieval layer. The embedding model surfaces the wrong passages, and the LLM confidently answers from the wrong source.
That's what Forge is for. Not the one-off question — the system that needs to answer thousands of questions correctly, every day, from a corpus that keeps growing. Better embeddings are the difference between a RAG system that works at demo scale and one that works in production.
What is a vector embedding?
A vector embedding is a list of numbers that represents the meaning of a piece of text. The word "king" might become [0.21, -0.87, 0.44, ...] — a point in high-dimensional space.
The key insight: texts with similar meaning land near each other. "How do I reset my password?" and "I forgot my login credentials" produce vectors that are close together, even though they share almost no words. This is what makes semantic search, RAG pipelines, and recommendation systems work — you search by meaning, not keywords.
Dimensions (like 1024 or 4096) are how many numbers are in each vector. More dimensions capture finer distinctions in meaning, but cost more to store and compare. Forge lets you choose the right tradeoff per workload.
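"Close together" has a precise meaning: cosine similarity between vectors. A toy sketch with hand-written 4-dimensional vectors (real embeddings have 1024+ dimensions and come from the model, not by hand):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors, invented for illustration.
reset_pw = [0.8, 0.1, 0.5, 0.2]      # "How do I reset my password?"
forgot_login = [0.7, 0.2, 0.6, 0.1]  # "I forgot my login credentials"
pizza = [0.1, 0.9, 0.0, 0.8]         # "Best pizza toppings"

sim_related = cosine(reset_pw, forgot_login)  # high: similar meaning
sim_unrelated = cosine(reset_pw, pizza)       # low: unrelated topics
```

The two password questions score near 1.0 despite sharing almost no words; the unrelated sentence scores much lower. That gap is what semantic search exploits.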
Why do embeddings matter for RAG?
In retrieval-augmented generation (RAG), an LLM answers questions using documents you provide. But it can only use the documents you retrieve — and retrieval quality depends entirely on your embeddings.
If your embedding model misses a relevant paragraph, the LLM never sees it. If it retrieves the wrong paragraph, the LLM confidently answers from the wrong source. Embedding quality is the ceiling on your RAG system's accuracy. Better embeddings mean the LLM gets the right context more often, which means better answers with fewer hallucinations.
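The retrieval step above can be sketched as a top-k search over precomputed vectors. This is a simplified illustration with toy 2-D vectors; production systems use a vector index (HNSW, IVF) rather than a full sort:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy corpus of (text, embedding) pairs; in a real pipeline the vectors
# come from the embedding model, computed once at ingest time.
corpus = [
    ("How to reset your password", [0.9, 0.1]),
    ("Understanding your invoice", [0.1, 0.9]),
    ("Troubleshooting login errors", [0.8, 0.3]),
]

def retrieve(query_vec, corpus, k=2):
    # Rank every document by similarity to the query, keep the top k.
    # Only these k passages are handed to the LLM as context.
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

top = retrieve([0.85, 0.2], corpus)  # query vector for a login question
```

If the embedding model places the query vector near the wrong documents, `retrieve` returns the wrong passages and everything downstream inherits the error.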
What is MTEB and why does it matter?
MTEB (Massive Text Embedding Benchmark) is the standard benchmark for evaluating embedding models. It tests across dozens of real-world tasks — retrieval, classification, clustering, semantic similarity — and produces a single aggregate score.
It matters because it's the closest thing to an objective answer to "how good is this embedding model?" A model that scores 75 on MTEB retrieves relevant documents more accurately than one scoring 70 — and in a RAG pipeline, that difference directly affects the quality of your LLM's output.
MTEB uses metrics like nDCG (normalized discounted cumulative gain) under the hood to measure retrieval quality. We'll publish a deep dive on nDCG and how Forge's models perform across MTEB subtasks soon.
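As a rough sketch of what nDCG measures: each retrieved result's relevance is discounted by its rank, then the sum is normalized against the best possible ordering. The relevance labels below are invented; in MTEB they come from the benchmark's human judgments:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance at rank i, discounted by log2(i + 1).
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalize against the ideal (best-first) ordering of the same labels.
    ideal = sorted(relevances, reverse=True)
    return dcg(relevances) / dcg(ideal)

perfect = ndcg([3, 2, 1, 0])   # relevant docs ranked best-first: exactly 1.0
flawed = ndcg([1, 3, 0, 2])    # most relevant doc buried at rank 2: below 1.0
```

A model that consistently pushes the right passages toward rank 1 earns a higher nDCG, which is why the metric tracks real RAG quality.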
Don't take our word for it
20% off annual plans for the first 100 customers.
Lock in founding pricing before general availability.
See plans
Upgrade your embeddings in under 60 seconds.
No signup required. Paste text, get vectors, see the difference.
or try the playground →