Reranking
What is reranking?
Vector search uses a bi-encoder: the query and each document are embedded separately into a fixed vector, then ranked by cosine similarity. It scales to billions of documents but loses fine-grained relevance because the encoders never see the query and document together.
A reranker uses a cross-encoder: it scores each (query, document) pair jointly. This is far more accurate but ~100x more expensive, so you run it only on the top candidates from the bi-encoder.
The standard two-stage retrieval pipeline:
                ┌──────────────────────┐
user query ───▶ │ Vector / BM25 search │ ─▶ top 30 candidates
                └──────────────────────┘
                           │
                           ▼
                ┌──────────────────────┐
                │       Reranker       │ ─▶ top 5 final results
                └──────────────────────┘
Published results from Cohere, Voyage, Jina, and the ColBERTv2/PLAID papers report 10–30% improvements in nDCG@10 over vector-only retrieval.
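For readers unfamiliar with the metric, nDCG@10 compares the discounted gain of the ranking you actually returned with the gain of the ideal ordering of the same results. The following is a minimal sketch, assuming you have a graded relevance label for each returned document and that all relevant documents appear in the list; `dcgAtK` and `ndcgAtK` are illustrative names, not part of @agentium/core.

```typescript
/** Discounted cumulative gain over the first k relevance labels (linear gain). */
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

/** nDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking. */
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}

// Relevance labels (1 = relevant) in the order each pipeline returned the documents.
const vectorOnlyLabels = [0, 1, 0, 1, 0];
const rerankedLabels = [1, 1, 0, 0, 0];

console.log(ndcgAtK(vectorOnlyLabels, 10).toFixed(3)); // 0.651
console.log(ndcgAtK(rerankedLabels, 10).toFixed(3)); // 1.000
```

Moving the two relevant documents to the top is exactly the kind of change a reranker is meant to produce, and it is what raises the score here.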
The Reranker interface
All four built-in providers implement the same interface so you can swap them freely.
interface Reranker {
  /** Provider identifier for logs/telemetry (e.g. "cohere", "voyage"). */
  readonly providerId: string;

  /**
   * Reorder documents by their relevance to `query`.
   * Returns results sorted by score descending.
   */
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}
RerankDocument
Accepts either a plain string OR an object with optional id + metadata:
type RerankDocument =
  | string
  | {
      id?: string;                        // preserved through the reranker
      content: string;                    // the text to score
      metadata?: Record<string, unknown>; // arbitrary payload, returned unchanged
    };
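Because both shapes are accepted, code that consumes documents usually wants a single form. A small sketch of one way to normalize them; `NormalizedDocument` and `normalizeDocument` are hypothetical helpers for illustration and are not claimed to exist in @agentium/core.

```typescript
// Type copied from the definition above so the sketch is self-contained.
type RerankDocument =
  | string
  | { id?: string; content: string; metadata?: Record<string, unknown> };

// Hypothetical object-only form used after normalization.
type NormalizedDocument = {
  id?: string;
  content: string;
  metadata?: Record<string, unknown>;
};

/** Convert either accepted input shape into the object form. */
function normalizeDocument(doc: RerankDocument): NormalizedDocument {
  return typeof doc === "string" ? { content: doc } : doc;
}

const mixedInputs: RerankDocument[] = [
  "Tigers roam Asian forests.",
  { id: "2", content: "Lions live in African savannahs.", metadata: { source: "wiki" } },
];

console.log(mixedInputs.map(normalizeDocument).map((d) => d.content));
```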
RerankOptions
interface RerankOptions {
  /** Maximum results to return. If omitted, returns all reranked docs. */
  topK?: number;
  /** Drop any result with `score < minScore`. */
  minScore?: number;
}
RerankResult
interface RerankResult {
  index: number;                      // original index in the input array
  score: number;                      // relevance score, higher = more relevant
  content: string;                    // doc text as fed to the reranker
  id?: string;                        // copied from input if it had one
  metadata?: Record<string, unknown>; // copied from input
}
The index field is the most important detail — it lets you trace each result back to the original input array without comparing strings.
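To make the contract concrete, here is a toy provider that satisfies the interface and shows `index` being used to look results up in the input array. It is only a sketch: `KeywordOverlapReranker` is a hypothetical class that scores by keyword overlap, and the types are copied from the definitions above so the block runs without @agentium/core.

```typescript
type RerankDocument =
  | string
  | { id?: string; content: string; metadata?: Record<string, unknown> };

interface RerankOptions {
  topK?: number;
  minScore?: number;
}

interface RerankResult {
  index: number;
  score: number;
  content: string;
  id?: string;
  metadata?: Record<string, unknown>;
}

interface Reranker {
  readonly providerId: string;
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}

const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Hypothetical toy provider: score = fraction of query terms found in the document.
class KeywordOverlapReranker implements Reranker {
  readonly providerId = "keyword-overlap";

  async rerank(
    query: string,
    documents: RerankDocument[],
    options: RerankOptions = {},
  ): Promise<RerankResult[]> {
    const queryTerms = Array.from(new Set(tokenize(query)));
    const scored = documents.map((doc, index): RerankResult => {
      const normalized = typeof doc === "string" ? { content: doc } : doc;
      const docTerms = new Set(tokenize(normalized.content));
      const hits = queryTerms.filter((term) => docTerms.has(term)).length;
      const score = queryTerms.length === 0 ? 0 : hits / queryTerms.length;
      return { ...normalized, index, score };
    });
    const minScore = options.minScore ?? -Infinity;
    return scored
      .filter((r) => r.score >= minScore) // drop results below the threshold
      .sort((a, b) => b.score - a.score) // score descending
      .slice(0, options.topK ?? scored.length);
  }
}

const bigCatDocs = [
  "Lions live in African savannahs.",
  "Tigers roam Asian forests.",
  "Snow leopards inhabit Asian mountains.",
];

new KeywordOverlapReranker()
  .rerank("tigers asian forests", bigCatDocs, { topK: 2 })
  .then((results) => {
    for (const r of results) {
      // r.index points back into bigCatDocs; no string comparison is needed.
      console.log(r.score.toFixed(2), bigCatDocs[r.index]);
    }
  });
```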
Built-in providers
CohereReranker
import { CohereReranker } from "@agentium/core";

const reranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY, // defaults to COHERE_API_KEY env
  model: "rerank-v3.5",               // defaults to "rerank-v3.5"
});
Requires: npm install cohere-ai (optional peer dep).
Model options:
| Model | Languages | Notes |
|---|---|---|
| rerank-v3.5 | 100+ | Default. Best balance of quality + cost. |
| rerank-multilingual-v3.5 | 100+ | Optimized for non-English. |
| rerank-english-v3.5 | English only | Slightly faster on English. |
Retry behavior: automatic on HTTP 429 / 500 / 502 / 503 with exponential backoff (1s → 2s → fail), up to 2 retries.
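The retry policy described above can be sketched as a small wrapper. This is an illustration of the stated policy (two retries, 1s then 2s, only for 429/500/502/503), not the library's actual code; `withRetry`, `isRetryable`, and the `httpStatus` error property are assumptions made for the example.

```typescript
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503]);

/** True when the error carries one of the retryable HTTP statuses. */
function isRetryable(err: unknown): boolean {
  const httpStatus = (err as { httpStatus?: unknown } | null)?.httpStatus;
  return typeof httpStatus === "number" && RETRYABLE_STATUSES.has(httpStatus);
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/** Run `operation`, retrying retryable failures with exponential backoff. */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = realSleep, // injectable for tests
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Non-retryable errors and exhausted retries are rethrown unchanged.
      if (!isRetryable(err) || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 1000 ms, then 2000 ms
    }
  }
}

// Demo: fails once with 429, then succeeds (short base delay keeps it quick).
let demoCalls = 0;
withRetry(
  async () => {
    demoCalls++;
    if (demoCalls === 1) {
      throw Object.assign(new Error("HTTP 429"), { httpStatus: 429 });
    }
    return "ok";
  },
  2,
  10,
).then((value) => console.log(value, "after", demoCalls, "calls"));
```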
VoyageReranker
import { VoyageReranker } from "@agentium/core";

const reranker = new VoyageReranker({
  apiKey: process.env.VOYAGE_API_KEY,     // defaults to VOYAGE_API_KEY env
  model: "rerank-2",                      // defaults to "rerank-2"
  baseURL: "https://api.voyageai.com/v1", // override if you self-host
});
No SDK install required — uses the global fetch API directly. Throws Error("VoyageReranker: missing API key") if neither apiKey nor VOYAGE_API_KEY env is set.
Model options:
| Model | Notes |
|---|---|
| rerank-2 | Default. General purpose. |
| rerank-2-lite | ~5x cheaper, slightly lower recall. |
JinaReranker
import { JinaReranker } from "@agentium/core";

const reranker = new JinaReranker({
  apiKey: process.env.JINA_API_KEY,
  model: "jina-reranker-v2-base-multilingual", // default
  baseURL: "https://api.jina.ai/v1",
});
No SDK install required. Same fetch-based pattern as Voyage.
ColbertReranker (local, no API key)
import { ColbertReranker } from "@agentium/core";

const reranker = new ColbertReranker({
  model: "Xenova/ms-marco-MiniLM-L-6-v2", // default - small + fast
  prewarm: true,                          // load model on construction
});
Requires: npm install @xenova/transformers (optional peer dep).
Runs a Hugging Face cross-encoder model entirely in process via WASM/ONNX. With prewarm: true the model is loaded on construction; otherwise the first call after construction lazy-loads it (~50MB download for MiniLM-L-6-v2). Subsequent calls are local-only.
Important: the default MiniLM-L-6-v2 is a classic cross-encoder, not true ColBERT v2 late interaction. For production-grade ColBERT (~3x better quality, similar latency), point this at a dedicated endpoint such as JinaAI ColBERT or self-host ColBERTv2/PLAID. The class name “ColbertReranker” refers to the role (late-interaction reranker), not the model itself.
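To illustrate the distinction: a cross-encoder feeds the (query, document) pair through the model and gets one score back, whereas a ColBERT-style late-interaction model embeds query and document tokens separately and combines them with a MaxSim sum. The sketch below shows only that scoring step on toy vectors; `maxSimScore` is an illustrative name, the embeddings are made up, and this is not how ColbertReranker is implemented.

```typescript
type TokenEmbeddings = number[][]; // one vector per token

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);

/**
 * Late-interaction MaxSim: for each query token, take the similarity of its
 * best-matching document token, then sum over all query tokens.
 */
function maxSimScore(queryTokens: TokenEmbeddings, docTokens: TokenEmbeddings): number {
  if (docTokens.length === 0) return 0;
  return queryTokens.reduce(
    (total, q) => total + Math.max(...docTokens.map((d) => dot(q, d))),
    0,
  );
}

// Toy unit vectors standing in for per-token embeddings.
const queryEmbeddings: TokenEmbeddings = [[1, 0], [0, 1]];
const exactMatchDoc: TokenEmbeddings = [[1, 0], [0, 1]];
const partialMatchDoc: TokenEmbeddings = [[1, 0], [0.6, 0.8]];

console.log(maxSimScore(queryEmbeddings, exactMatchDoc)); // 2
console.log(maxSimScore(queryEmbeddings, partialMatchDoc).toFixed(1)); // 1.8
```

Because document token embeddings do not depend on the query, they can be computed ahead of time, which is where late interaction gets its latency advantage over a cross-encoder.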
Wiring into a vector store
Every VectorStore in @agentium/core accepts a rerank option:
import { CohereReranker, InMemoryVectorStore, OpenAIEmbedding } from "@agentium/core";

const embedder = new OpenAIEmbedding();
const store = new InMemoryVectorStore(embedder);
const reranker = new CohereReranker();

const results = await store.search("docs", "Tell me about cats.", {
  topK: 5,
  rerank: reranker,
  rerankMultiplier: 3, // fetch 5*3=15 candidates, rerank down to 5
});
How rerankMultiplier works
When a reranker is set:
- The vector backend fetches topK * rerankMultiplier candidates from the underlying ANN index.
- The reranker scores each one against the original query.
- The reranker returns the top topK by its own score.
rerankMultiplier defaults to 3. Larger values give the reranker more candidates to choose from (better recall) at the cost of latency + reranker tokens. topK=5, rerankMultiplier=10 is a sensible “high quality” setting.
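The over-fetch-then-rerank flow above can be sketched as follows. This is a simplified illustration with stand-in search and rerank functions, not the BaseVectorStore implementation; `searchWithRerank`, `VectorSearchFn`, and `RerankFn` are hypothetical names.

```typescript
interface Candidate {
  id: string;
  content: string;
  vectorScore: number;
}

interface Reranked {
  index: number; // position in the array handed to the reranker
  score: number;
}

type VectorSearchFn = (query: string, limit: number) => Promise<Candidate[]>;
type RerankFn = (query: string, docs: string[]) => Promise<Reranked[]>;

async function searchWithRerank(
  query: string,
  vectorSearch: VectorSearchFn,
  rerank: RerankFn,
  topK: number,
  rerankMultiplier = 3,
): Promise<(Candidate & { rerankScore: number })[]> {
  // 1. Over-fetch from the vector index.
  const fetchK = topK * rerankMultiplier;
  const candidates = await vectorSearch(query, fetchK);
  // 2. Score every candidate against the original query.
  const scored = await rerank(query, candidates.map((c) => c.content));
  // 3. Keep the topK by reranker score.
  return [...scored]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => ({ ...candidates[r.index], rerankScore: r.score }));
}

// Stand-in backends for the demo.
const demoSearch: VectorSearchFn = async (_query, limit) =>
  Array.from({ length: limit }, (_, i) => ({
    id: `doc-${i}`,
    content: `document number ${i}`,
    vectorScore: 1 - i / 100,
  }));

// A fake reranker that happens to prefer later candidates.
const demoRerank: RerankFn = async (_query, docs) =>
  docs.map((_, index) => ({ index, score: index / docs.length }));

searchWithRerank("cats", demoSearch, demoRerank, 5).then((hits) =>
  console.log(hits.map((h) => h.id)),
);
```

In the demo the reranker's favorites sit at the tail of the 15 fetched candidates, so none of them would have been returned with a plain topK of 5.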
Query types the reranker sees
The reranker requires a text query. The vector backend hands it whatever it can extract:
| Original query | What the reranker gets |
|---|---|
| string | the string verbatim |
| ContentPart[] with at least one text part | concatenated text parts joined with spaces |
| ContentPart[] with no text (e.g. image-only) | reranker is skipped, vector ranking is used |
| number[] (precomputed vector) | reranker is skipped, vector ranking is used |
This matters for multimodal indexes: if you want reranking on an image query, supply a text caption alongside the image part.
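The mapping in the table can be expressed as a small function that returns the text to rerank with, or `null` when the reranker should be skipped. This is a sketch only: the `ContentPart` shape shown here is an assumption for illustration, and `extractRerankQuery` is a hypothetical name.

```typescript
// Assumed shape; the real ContentPart type in @agentium/core may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; url: string };

type SearchQuery = string | ContentPart[] | number[];

/** Returns the text query for the reranker, or null to skip reranking. */
function extractRerankQuery(searchQuery: SearchQuery): string | null {
  if (typeof searchQuery === "string") return searchQuery;
  const texts: string[] = [];
  for (const part of searchQuery) {
    if (typeof part === "number") return null; // precomputed vector: no text
    if (part.type === "text") texts.push(part.text);
  }
  return texts.length > 0 ? texts.join(" ") : null;
}

console.log(extractRerankQuery("Tell me about cats."));
console.log(
  extractRerankQuery([
    { type: "image", url: "https://example.com/cat.png" },
    { type: "text", text: "a tabby cat on a sofa" },
  ]),
);
console.log(extractRerankQuery([0.12, -0.4, 0.88])); // null
```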
Backend-by-backend behavior
All four built-in backends call the same BaseVectorStore.applyRerank() chokepoint, so behavior is identical:
- InMemoryVectorStore: fetches topK * multiplier from the local cosine ranking.
- PgVectorStore: adjusts the SQL LIMIT to the larger fetch size; doesn’t apply minScore until after rerank.
- QdrantVectorStore: sets limit: fetchK and omits score_threshold when reranker is set (rerank handles thresholding).
- MongoDBVectorStore: applies to both the Atlas $vectorSearch path and the in-process brute-force fallback.
minScore interaction
When you pass minScore with rerank:
await store.search("docs", query, { topK: 5, minScore: 0.7, rerank });
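The threshold is applied by the reranker, not by the vector backend, because the two scores are not comparable: cosine similarity and reranker relevance scores follow different distributions, and the relevance scale varies by provider. Choose minScore against the reranker's scores, not against cosine similarity.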
Standalone usage
A reranker also works without a vector store, e.g. to reorder a BM25 candidate list or to score a set of LLM-generated options:
const ranked = await reranker.rerank(
  "Which big cat lives in Asia?",
  [
    { id: "1", content: "Tigers roam Asian forests." },
    { id: "2", content: "Lions live in African savannahs." },
    { id: "3", content: "Snow leopards inhabit the Himalayas." },
  ],
  { topK: 2 },
);
// ranked[0] => { index: 0, score: ~0.92, id: "1", content: "Tigers..." }
Composing rerankers
You can stack rerankers cheaply by calling them in sequence:
// Stage 2a: fast lite reranker narrows 100 -> 30
const stage2a = await new VoyageReranker({ model: "rerank-2-lite" })
  .rerank(query, candidates, { topK: 30 });

// Stage 2b: expensive top-tier reranker scores the final 30 -> 5
const stage2b = await new CohereReranker().rerank(
  query,
  stage2a.map((r) => ({ id: r.id, content: r.content })),
  { topK: 5 },
);
// Note: stage2b[i].index is a position in stage2a, not in `candidates`.
// Use id (or stage2a[stage2b[i].index].index) to get back to the original input.
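When stages are chained like this, each stage reports `index` relative to its own input, so the final indexes no longer point into the original candidate list. One way to handle that is a small cascade helper that translates indexes back after every stage. This is a sketch under assumptions: `cascadeRerank`, `StageFn`, and `byScore` are hypothetical, and the stages here are stand-in scoring functions in place of real rerankers.

```typescript
interface CascadeDoc {
  id?: string;
  content: string;
}

interface CascadeResult extends CascadeDoc {
  index: number;
  score: number;
}

type StageFn = (query: string, docs: CascadeDoc[], topK: number) => Promise<CascadeResult[]>;

async function cascadeRerank(
  query: string,
  candidates: CascadeDoc[],
  stages: { rerank: StageFn; topK: number }[],
): Promise<CascadeResult[]> {
  // Start from the input order so `index` always refers to `candidates`.
  let current: CascadeResult[] = candidates.map((doc, index) => ({ ...doc, index, score: 0 }));
  for (const stage of stages) {
    const survivors = current;
    const out = await stage.rerank(
      query,
      survivors.map(({ id, content }) => ({ id, content })),
      stage.topK,
    );
    // out[i].index is a position in `survivors`; map it back to the original array.
    current = out.map((r) => ({ ...r, index: survivors[r.index].index }));
  }
  return current;
}

// Stand-in stage: scores each document with `scoreOf` and keeps the topK.
const byScore = (scoreOf: (text: string) => number): StageFn =>
  async (_query, docs, topK) =>
    docs
      .map((d, index) => ({ ...d, index, score: scoreOf(d.content) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);

cascadeRerank(
  "demo",
  [{ content: "a" }, { content: "bbbb" }, { content: "cc" }, { content: "ddd" }],
  [
    { rerank: byScore((t) => t.length), topK: 3 }, // cheap stage: 4 -> 3
    { rerank: byScore((t) => -Math.abs(t.length - 3)), topK: 1 }, // final stage: 3 -> 1
  ],
).then((finalResults) => console.log(finalResults));
```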
| Provider | Latency (50 docs) | Cost per 1K reranks | Notes |
|---|---|---|---|
| Cohere rerank-v3.5 | ~200ms | $1.00 (Cohere pricing) | HTTPS round-trip + model |
| Voyage rerank-2 | ~250ms | $0.50 | Comparable quality |
| Voyage rerank-2-lite | ~150ms | $0.05 | Great for large batches |
| Jina jina-reranker-v2 | ~300ms | $0.10 | Multilingual focus |
| Local MiniLM-L-6-v2 | ~50ms per doc, batched | free | First call: 50MB model download |
(Numbers are rough; benchmark your own workload.)
Errors and edge cases
| Situation | Behavior |
|---|---|
| Empty documents array | Returns [] immediately without calling the API |
| apiKey missing and env var missing | Constructor succeeds; first rerank() call throws "missing API key" |
| cohere-ai not installed (Cohere provider) | Constructor throws "cohere-ai is required..." with install hint |
| HTTP 429 / 500 / 502 / 503 | Auto-retry up to 2 times with exponential backoff |
| HTTP 400 / 401 / 403 / 404 | Throws immediately (no retry) |
| Reranker returns more results than topK | Truncated to topK |