Reranking

What is reranking?

Vector search uses a bi-encoder: the query and each document are embedded separately into a fixed vector, then ranked by cosine similarity. It scales to billions of documents but loses fine-grained relevance because the encoders never see the query and document together. A reranker uses a cross-encoder: it scores each (query, document) pair jointly. This is far more accurate but ~100x more expensive, so you run it only on the top candidates from the bi-encoder. The standard two-stage retrieval pipeline:
                   ┌──────────────────────┐
   user query ───▶ │ Vector / BM25 search │ ─▶ top 30 candidates
                   └──────────────────────┘
                              │
                              ▼
                   ┌──────────────────────┐
                   │       Reranker       │ ─▶ top 5 final results
                   └──────────────────────┘
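
The two stages in the diagram can be sketched in a few lines. This is a toy illustration, not the @agentium/core implementation: the Doc type, firstStage, secondStage, and wordOverlap are hypothetical names, and wordOverlap is only a placeholder for a real cross-encoder.

```typescript
// Toy two-stage retrieval: cheap vector ranking first, then a costlier pairwise scorer.
type Doc = { id: string; text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: rank every document by cosine similarity and keep the best `n`.
function firstStage(queryVector: number[], docs: Doc[], n: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector))
    .slice(0, n);
}

// Stage 2: score each (query, document) pair jointly and keep the best `k`.
// `pairScore` stands in for a cross-encoder call.
function secondStage(
  query: string,
  candidates: Doc[],
  k: number,
  pairScore: (query: string, text: string) => number,
): Doc[] {
  return candidates
    .map((doc) => ({ doc, score: pairScore(query, doc.text) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Placeholder pair scorer: fraction of query words that occur in the text.
function wordOverlap(query: string, text: string): number {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  const haystack = new Set(text.toLowerCase().split(/\W+/));
  if (words.length === 0) return 0;
  return words.filter((w) => haystack.has(w)).length / words.length;
}
```

The expensive scorer only ever sees the shortlist from stage 1, which is what keeps the pipeline affordable on large corpora.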
Published results from Cohere, Voyage, Jina, and the ColBERTv2/PLAID papers report roughly 10–30% improvement in nDCG@10 over vector-only retrieval.
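
For readers who want to measure the gain on their own data, nDCG@10 can be computed as below. This is a minimal sketch using the linear-gain form of DCG. It assumes the ranked list already contains every relevant document, so the ideal ordering is derived from the same graded labels. The function names are illustrative and not part of @agentium/core.

```typescript
// DCG@k with linear gain: sum of rel_i / log2(i + 1) over the first k positions (i is 1-based).
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the actual ranking divided by DCG of the ideal (descending) ranking.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}
```

Pass the graded relevance labels in the order your system returned the documents, once for the vector-only ranking and once for the reranked one, and compare the two values.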

The Reranker interface

All four built-in providers implement the same interface so you can swap them freely.
interface Reranker {
  /** Provider identifier for logs/telemetry (e.g. "cohere", "voyage"). */
  readonly providerId: string;

  /**
   * Reorder documents by their relevance to `query`.
   * Returns results sorted by score descending.
   */
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}

RerankDocument

Accepts either a plain string or an object with optional id and metadata:
type RerankDocument =
  | string
  | {
      id?: string;                          // preserved through the reranker
      content: string;                      // the text to score
      metadata?: Record<string, unknown>;   // arbitrary payload, returned unchanged
    };

RerankOptions

interface RerankOptions {
  /** Maximum results to return. If omitted, returns all reranked docs. */
  topK?: number;

  /** Drop any result with `score < minScore`. */
  minScore?: number;
}

RerankResult

interface RerankResult {
  index: number;                          // original index in the input array
  score: number;                          // relevance score, higher = more relevant
  content: string;                        // doc text as fed to the reranker
  id?: string;                            // copied from input if it had one
  metadata?: Record<string, unknown>;     // copied from input
}
The index field is the most important detail: it lets you trace each result back to its position in the original input array without comparing strings.
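
Because the interface is small, a custom provider is easy to write. The sketch below is a hypothetical KeywordOverlapReranker, not one of the built-in providers. It redeclares the types locally so it compiles on its own, and it assumes minScore is applied before topK truncation, which the interface does not specify.

```typescript
// Local copies of the types shown above, so the sketch compiles on its own.
type RerankDocument =
  | string
  | { id?: string; content: string; metadata?: Record<string, unknown> };

interface RerankOptions {
  topK?: number;
  minScore?: number;
}

interface RerankResult {
  index: number;
  score: number;
  content: string;
  id?: string;
  metadata?: Record<string, unknown>;
}

interface Reranker {
  readonly providerId: string;
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}

// Hypothetical provider: scores by the fraction of query words found in the document.
class KeywordOverlapReranker implements Reranker {
  readonly providerId = "keyword-overlap";

  async rerank(
    query: string,
    documents: RerankDocument[],
    options: RerankOptions = {},
  ): Promise<RerankResult[]> {
    const words = query.toLowerCase().split(/\W+/).filter(Boolean);
    const results: RerankResult[] = documents.map((doc, index) => {
      const content = typeof doc === "string" ? doc : doc.content;
      const tokens = new Set(content.toLowerCase().split(/\W+/));
      const hits = words.filter((w) => tokens.has(w)).length;
      const score = words.length === 0 ? 0 : hits / words.length;
      // Keep `index` so callers can map each result back to the input array.
      return typeof doc === "string"
        ? { index, score, content }
        : { index, score, content, id: doc.id, metadata: doc.metadata };
    });
    results.sort((a, b) => b.score - a.score);
    // Assumed order: apply the threshold first, then truncate to topK.
    const minScore = options.minScore;
    const kept =
      minScore === undefined ? results : results.filter((r) => r.score >= minScore);
    return options.topK === undefined ? kept : kept.slice(0, options.topK);
  }
}
```

Anything with this shape can be passed wherever a Reranker is expected, which is also handy for deterministic unit tests.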

Built-in providers

CohereReranker

import { CohereReranker } from "@agentium/core";

const reranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY,  // defaults to COHERE_API_KEY env
  model: "rerank-v3.5",                // defaults to "rerank-v3.5"
});
Requires: npm install cohere-ai (optional peer dep). Model options:
| Model | Languages | Notes |
| --- | --- | --- |
| rerank-v3.5 | 100+ | Default. Best balance of quality + cost. |
| rerank-multilingual-v3.5 | 100+ | Optimized for non-English. |
| rerank-english-v3.5 | English only | Slightly faster on English. |
Retry behavior: automatic on HTTP 429 / 500 / 502 / 503 with exponential backoff (1s → 2s → fail), up to 2 retries.
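
The retry policy described above can be approximated as follows. This is a sketch of the general pattern, not the library's internal code: withRetry, statusOf, and the convention that errors carry a numeric status property are assumptions made for illustration. The sleep function is injectable so the schedule can be tested without waiting.

```typescript
const RETRYABLE_STATUS = new Set([429, 500, 502, 503]);

// Reads a numeric `status` property from a thrown value, if it has one (assumed convention).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && RETRYABLE_STATUS.has(status);
      // Non-retryable statuses (400, 401, ...) and exhausted retries surface immediately.
      if (!retryable || attempt >= maxRetries) throw err;
      await sleep(baseDelayMs * 2 ** attempt); // 1s, then 2s with the defaults
    }
  }
}
```

With the defaults, a call that keeps returning 503 is attempted three times in total (one initial try plus two retries) before the error is rethrown.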

VoyageReranker

import { VoyageReranker } from "@agentium/core";

const reranker = new VoyageReranker({
  apiKey: process.env.VOYAGE_API_KEY,   // defaults to VOYAGE_API_KEY env
  model: "rerank-2",                    // defaults to "rerank-2"
  baseURL: "https://api.voyageai.com/v1", // override if you self-host
});
No SDK install required — uses the global fetch API directly. Throws Error("VoyageReranker: missing API key") if neither apiKey nor VOYAGE_API_KEY env is set. Model options:
| Model | Notes |
| --- | --- |
| rerank-2 | Default. General purpose. |
| rerank-2-lite | ~5x cheaper, slightly lower recall. |

JinaReranker

import { JinaReranker } from "@agentium/core";

const reranker = new JinaReranker({
  apiKey: process.env.JINA_API_KEY,
  model: "jina-reranker-v2-base-multilingual",  // default
  baseURL: "https://api.jina.ai/v1",
});
No SDK install required. Same fetch-based pattern as Voyage.

ColbertReranker (local, no API key)

import { ColbertReranker } from "@agentium/core";

const reranker = new ColbertReranker({
  model: "Xenova/ms-marco-MiniLM-L-6-v2", // default - small + fast
  prewarm: true,                          // load model on construction
});
Requires: npm install @xenova/transformers (optional peer dep). Runs a HuggingFace cross-encoder model entirely in process via WASM/ONNX. With prewarm: true the model starts loading at construction; otherwise the first rerank() call lazy-loads it (~50MB download for MiniLM-L-6-v2). Subsequent calls are local-only.

Important: the default MiniLM-L-6-v2 is a classic cross-encoder, not true ColBERT v2 late interaction. For production-grade ColBERT (~3x better quality, similar latency), point this at a dedicated endpoint such as JinaAI ColBERT or self-host ColBERTv2/PLAID. The class name “ColbertReranker” describes the intended role (a late-interaction reranker), not the default model.
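
To make the distinction concrete: a cross-encoder feeds the query and document through the model together and emits one score, whereas ColBERT-style late interaction embeds every token separately and combines them with a MaxSim sum. The sketch below shows only that scoring step on hand-made token vectors. It is not what ColbertReranker runs by default, and maxSimScore is an illustrative name.

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Late interaction (MaxSim): for each query token, take the similarity of its
// best-matching document token, then sum those maxima over all query tokens.
function maxSimScore(queryTokens: number[][], docTokens: number[][]): number {
  if (docTokens.length === 0) return 0;
  return queryTokens.reduce((total, q) => {
    const best = docTokens.reduce((max, d) => Math.max(max, dot(q, d)), -Infinity);
    return total + best;
  }, 0);
}
```

Because document token embeddings do not depend on the query, they can be precomputed and indexed, which is why late interaction can approach cross-encoder quality at lower query-time cost.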

Wiring into a vector store

Every VectorStore in @agentium/core accepts a rerank option:
import { CohereReranker, InMemoryVectorStore, OpenAIEmbedding } from "@agentium/core";

const embedder = new OpenAIEmbedding();
const store = new InMemoryVectorStore(embedder);
const reranker = new CohereReranker();

const results = await store.search("docs", "Tell me about cats.", {
  topK: 5,
  rerank: reranker,
  rerankMultiplier: 3,        // fetch 5*3=15 candidates, rerank down to 5
});

How rerankMultiplier works

When a reranker is set:
  1. The vector backend fetches topK * rerankMultiplier candidates from the underlying ANN index.
  2. The reranker scores each one against the original query.
  3. The reranker returns the top topK by its own score.
rerankMultiplier defaults to 3. Larger values give the reranker more candidates to choose from (better recall) at the cost of latency + reranker tokens. topK=5, rerankMultiplier=10 is a sensible “high quality” setting.
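
The three steps amount to over-fetching and then re-sorting. A minimal sketch, assuming hypothetical fetchCandidates and scorePair callbacks in place of the real ANN index and reranker call:

```typescript
type Candidate = { id: string; content: string; vectorScore: number };
type Scored = Candidate & { rerankScore: number };

// Step 1 sizing: how many candidates to pull from the index.
function fetchSize(topK: number, rerankMultiplier = 3): number {
  return topK * rerankMultiplier;
}

async function searchWithRerank(
  query: string,
  topK: number,
  rerankMultiplier: number,
  fetchCandidates: (limit: number) => Promise<Candidate[]>, // hypothetical ANN lookup
  scorePair: (query: string, content: string) => Promise<number>, // hypothetical reranker call
): Promise<Scored[]> {
  // 1. Over-fetch from the vector index.
  const candidates = await fetchCandidates(fetchSize(topK, rerankMultiplier));
  // 2. Score every candidate against the original query.
  const scored = await Promise.all(
    candidates.map(async (c) => ({ ...c, rerankScore: await scorePair(query, c.content) })),
  );
  // 3. Keep the top `topK` by reranker score, ignoring the vector score.
  return scored.sort((a, b) => b.rerankScore - a.rerankScore).slice(0, topK);
}
```

Note that the number of reranker-scored documents, and therefore reranker cost, grows linearly with rerankMultiplier.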

Query types the reranker sees

The reranker requires a text query. The vector backend hands it whatever it can extract:
| Original query | What the reranker gets |
| --- | --- |
| string | the string verbatim |
| ContentPart[] with at least one text part | concatenated text parts joined with spaces |
| ContentPart[] with no text (e.g. image-only) | reranker is skipped, vector ranking is used |
| number[] (precomputed vector) | reranker is skipped, vector ranking is used |
This matters for multimodal indexes: if you want reranking on an image query, supply a text caption alongside the image part.
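
The extraction rules in the table could look roughly like this. The ContentPart shape below is a guess made for illustration (the real type in @agentium/core may differ), and extractRerankQuery is a hypothetical helper name.

```typescript
// Hypothetical ContentPart shape; the real type in @agentium/core may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; url: string };

type SearchQuery = string | ContentPart[] | number[];

// Returns the text handed to the reranker, or null when reranking is skipped.
function extractRerankQuery(query: SearchQuery): string | null {
  if (typeof query === "string") return query;
  const texts: string[] = [];
  for (const part of query) {
    if (typeof part === "number") return null; // precomputed vector: nothing to rerank with
    if (part.type === "text") texts.push(part.text);
  }
  // Image-only (or empty) input yields no text, so the caller falls back to vector ranking.
  return texts.length > 0 ? texts.join(" ") : null;
}
```

Under this sketch, adding a caption part next to an image part is enough to turn reranking back on for that query.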

Backend-by-backend behavior

All four built-in backends call the same BaseVectorStore.applyRerank() chokepoint, so behavior is identical:
  • InMemoryVectorStore — fetches topK * multiplier from the local cosine ranking.
  • PgVectorStore — adjusts the SQL LIMIT to the larger fetch size; doesn’t apply minScore until after rerank.
  • QdrantVectorStore — sets limit: fetchK and omits score_threshold when reranker is set (rerank handles thresholding).
  • MongoDBVectorStore — applies to both the Atlas $vectorSearch path and the in-process brute-force fallback.

minScore interaction

When you pass minScore with rerank:
await store.search("docs", query, { topK: 5, minScore: 0.7, rerank });
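The threshold is applied to the reranker's scores, not by the vector backend, because the two score distributions are not comparable: cosine similarity and reranker relevance scores come from different models and cluster in different ranges, so a cutoff tuned for one does not transfer to the other. (Cohere, for example, returns normalized relevance scores between 0 and 1.)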

Standalone usage

A reranker also works without a vector store, e.g. to reorder a BM25 candidate list or to score a set of LLM-generated options:
const ranked = await reranker.rerank(
  "Which big cat lives in Asia?",
  [
    { id: "1", content: "Tigers roam Asian forests." },
    { id: "2", content: "Lions live in African savannahs." },
    { id: "3", content: "Snow leopards inhabit the Himalayas." },
  ],
  { topK: 2 },
);
// ranked[0] => { index: 0, score: ~0.92, id: "1", content: "Tigers..." }

Composing rerankers

You can stack rerankers cheaply by calling them in sequence:
// Stage 2a: fast lite reranker narrows 100 -> 30
const stage2a = await new VoyageReranker({ model: "rerank-2-lite" })
  .rerank(query, candidates, { topK: 30 });

// Stage 2b: expensive top-tier reranker scores the final 30 -> 5
const stage2b = await new CohereReranker().rerank(
  query,
  stage2a.map((r) => ({ id: r.id, content: r.content })),
  { topK: 5 },
);
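
One caveat when stacking: each stage reports index relative to the array it was given, so stage2b's index values point into stage2a, not into the original candidates. A small helper (hypothetical, not part of the library) restores the original positions:

```typescript
type StageResult = { index: number; score: number; content: string; id?: string };

// Rewrites second-stage results so `index` refers to the original candidate array.
// `second[i].index` is a position in `first`, and `first[j].index` is a position
// in the original input, so the two lookups are chained.
function toOriginalIndices(first: StageResult[], second: StageResult[]): StageResult[] {
  return second.map((r) => ({ ...r, index: first[r.index].index }));
}
```

Calling toOriginalIndices(stage2a, stage2b) yields results whose index can be used directly against candidates. Passing id through each stage, as the example above does, is an alternative way to keep track.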

Performance characteristics

| Provider | Latency (50 docs) | Cost per 1K reranks | Notes |
| --- | --- | --- | --- |
| Cohere rerank-v3.5 | ~200ms | $1.00 (Cohere pricing) | HTTPS round-trip + model |
| Voyage rerank-2 | ~250ms | $0.50 | Comparable quality |
| Voyage rerank-2-lite | ~150ms | $0.05 | Great for large batches |
| Jina jina-reranker-v2 | ~300ms | $0.10 | Multilingual focus |
| Local MiniLM-L-6-v2 | ~50ms per doc, batched | free | First call: 50MB model download |
(Numbers are rough; benchmark your own workload.)

Errors and edge cases

| Situation | Behavior |
| --- | --- |
| Empty documents array | Returns [] immediately without calling the API |
| apiKey missing AND env var missing | Constructor succeeds; first rerank() call throws "missing API key" |
| cohere-ai not installed (Cohere provider) | Constructor throws "cohere-ai is required..." with install hint |
| HTTP 429 / 500 / 502 / 503 | Auto-retry up to 2 times with exponential backoff |
| HTTP 400 / 401 / 403 / 404 | Throws immediately (no retry) |
| Reranker returns more results than topK | Truncated to topK |
