Semantic Tool Selection

The problem

When an agent has many tools (common when several MCP servers or skill bundles are connected), passing every tool definition to the model on every turn causes three problems:
  1. Prompt bloat. 100 tools × ~80 tokens each = 8K tokens of pure tool definitions per turn, before the user’s message even gets to the model.
  2. Tool confusion. Models pick worse tools when more options are available (a phenomenon Cohere and Anthropic have both documented).
  3. Slower generation. More input tokens mean a longer time-to-first-token.
SemanticToolSelector solves this by embedding each tool’s name + description once on init, then picking the top-K most relevant tools per user turn.

How it works

                                  ┌──────────────────────────────────┐
   Tools at init  ────────────────│ embedder.embedBatch              │ ─▶ one vector per tool, cached
                                  └──────────────────────────────────┘

   User turn ────▶ embedder.embed(query) ─▶ cosine vs cached tool vectors ─▶ top-K (optionally reranked)
The cost per turn is one embed call (roughly 10 to 15 ms) instead of thousands of extra prompt tokens.
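The selection step in the diagram can be sketched as a plain cosine ranking over cached vectors. This is an illustration only, not the library's implementation: `ToolLike`, `IndexedTool`, `cosine`, and `topKByCosine` are hypothetical names, and the three-dimensional vectors are toy stand-ins for real embeddings.

```typescript
// Minimal local type; the real ToolDef in @agentium/core may carry more fields.
interface ToolLike {
  name: string;
  description: string;
}

// One cached entry: a tool plus the vector computed for it at index time.
interface IndexedTool {
  tool: ToolLike;
  vector: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every cached tool vector against the query vector and keep the k best.
function topKByCosine(queryVector: number[], index: IndexedTool[], k: number): ToolLike[] {
  return index
    .map((entry) => ({ tool: entry.tool, score: cosine(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.tool);
}

// Toy vectors standing in for embeddings of each tool's name and description.
const toyIndex: IndexedTool[] = [
  { tool: { name: "get_weather", description: "Fetch the temperature for a city" }, vector: [0.9, 0.1, 0.0] },
  { tool: { name: "send_email", description: "Send an email to a recipient" }, vector: [0.0, 0.2, 0.9] },
  { tool: { name: "get_forecast", description: "Fetch a multi-day forecast" }, vector: [0.8, 0.3, 0.1] },
];

// A query vector pointing along the first axis ranks get_weather first, then get_forecast.
const picked = topKByCosine([1, 0, 0], toyIndex, 2);
console.log(picked.map((t) => t.name));
```

Because the tool vectors are computed once and reused, the only per-turn work in this sketch is embedding the query and a linear scan over the cached vectors.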

Quick start

import { Agent, openai, OpenAIEmbedding, SemanticToolSelector } from "@agentium/core";

const selector = new SemanticToolSelector({
  embedder: new OpenAIEmbedding({ model: "text-embedding-3-small" }),
  topK: 5,
});

await selector.indexTools(allMyTools); // 100+ tools

// Per turn:
const shortlist = await selector.select(userInput);
const agent = new Agent({ name: "scoped", model: openai("gpt-4o"), tools: shortlist });

Configuration

interface SemanticToolSelectorConfig {
  embedder: EmbeddingProvider;    // required — see Embeddings doc for choices
  reranker?: Reranker;            // optional — for tighter selection
  topK?: number;                  // default 10
  rerankMultiplier?: number;      // default 3 — only used when reranker is set
}

embedder

Any EmbeddingProvider works. For tool selection you want:
  • Cheap and fast — picks happen every turn.
  • Decent on short text — tool names + 1-2 sentence descriptions.
text-embedding-3-small is the sweet spot. text-embedding-3-large is overkill; gemini-embedding-2 works too but is slower.

reranker

Optional second pass. Useful when the bi-encoder's top-K still contains too many irrelevant tools.
import { CohereReranker } from "@agentium/core";

const selector = new SemanticToolSelector({
  embedder: new OpenAIEmbedding(),
  reranker: new CohereReranker(),
  topK: 5,
  rerankMultiplier: 4,            // fetch top-20, rerank to top-5
});
When reranker is set:
  1. Bi-encoder scores all indexed tools by cosine similarity to the query.
  2. Top topK * rerankMultiplier tools are passed to the reranker.
  3. Reranker scores each (query, tool description) pair and returns the top topK.
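The three steps above amount to a retrieve-then-rerank pipeline. The sketch below illustrates the idea under stated assumptions: `Candidate`, `PairScorer`, `selectWithRerank`, and `wordOverlap` are hypothetical names, the scorer is a toy word-overlap function rather than a real reranker, and it is synchronous for brevity (a hosted reranker would be an async network call).

```typescript
interface Candidate {
  name: string;
  description: string;
  cosineScore: number; // bi-encoder score for the current query, computed earlier
}

// Hypothetical scorer: rates one (query, description) pair; higher is better.
type PairScorer = (query: string, description: string) => number;

function selectWithRerank(
  query: string,
  candidates: Candidate[],
  topK: number,
  rerankMultiplier: number,
  scorePair: PairScorer,
): Candidate[] {
  // Stage 1: keep the topK * rerankMultiplier best candidates by cosine score.
  const pool = [...candidates]
    .sort((a, b) => b.cosineScore - a.cosineScore)
    .slice(0, topK * rerankMultiplier);

  // Stage 2: rescore only the pool with the pair scorer, then keep topK.
  return pool
    .map((candidate) => ({ candidate, score: scorePair(query, candidate.description) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.candidate);
}

// Toy stand-in for a reranker: counts query words that appear in the description.
const wordOverlap: PairScorer = (query, description) => {
  const descriptionWords = new Set(description.toLowerCase().split(/\W+/));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0 && descriptionWords.has(word)).length;
};
```

The design point is that the more expensive pair scorer only ever sees the small pool from stage 1, never the full tool set, so a tool that the bi-encoder ranks outside the pool cannot be recovered by the reranker.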

topK and rerankMultiplier

| Setting | Result |
| --- | --- |
| topK: 3, no reranker | Smallest prompt; risk missing useful tools if descriptions are noisy. |
| topK: 8, no reranker | Safe default for ~50 tools. |
| topK: 5, reranker, rerankMultiplier: 4 | Highest quality; ~250ms extra latency. |
| topK: 20+ | Diminishing returns; just pass everything. |

API

indexTools(tools: ToolDef[])

Embeds each tool’s "name: description" string in parallel. The call is async; await it before the first select() call.
await selector.indexTools(allTools);
console.log(`Indexed ${selector.size} tools`); // 100
Call this:
  • At agent boot
  • Whenever the tool set changes (e.g. an MCP server connects)
  • Not on every turn

select(query: string, options?: { topK?: number }): Promise<ToolDef[]>

Returns a shortened ToolDef[] ready to drop into a new Agent.
const tools = await selector.select("What's the weather in Tokyo?", { topK: 3 });
options.topK overrides the constructor default for this call. Behavior:
  • Returns [] if indexTools hasn’t been called.
  • Empty query: still returns the closest topK (the embedding of "" is rarely useful, but it does not error).
  • Same tool indexed twice: both copies are returned independently, so dedupe the tool list before calling indexTools.

size

selector.size // number of indexed tools

Wire into a per-request agent

Pairs naturally with AgentFactory:
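// Assumes embedder, allTools, alwaysOn, factory (an AgentFactory), and an Express-style app are defined elsewhere.
const selector = new SemanticToolSelector({ embedder, topK: 5 });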
await selector.indexTools(allTools);

app.post("/chat", async (req, res) => {
  const shortlist = await selector.select(req.body.input);
  const agent = factory.create({
    tenantId: req.user.tenant,
    userId: req.user.id,
  });
  agent.setTools([...alwaysOn, ...shortlist]); // mix critical + selected
  const result = await agent.run(req.body.input);
  res.json(result);
});
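If the always-on tools are also part of the indexed set, spreading both arrays can hand the agent the same tool twice. One way to guard against that is a small merge helper. This is a sketch, not part of the library: `mergeTools` and `NamedTool` are hypothetical, and it assumes a tool's name uniquely identifies it.

```typescript
// Assumption: name is a unique identifier for a tool.
interface NamedTool {
  name: string;
}

// Merge always-on tools with the per-turn shortlist, keeping the first copy of each name.
// Always-on tools come first, so they win over a duplicate in the shortlist.
function mergeTools<T extends NamedTool>(alwaysOn: T[], shortlist: T[]): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const tool of [...alwaysOn, ...shortlist]) {
    if (seen.has(tool.name)) continue;
    seen.add(tool.name);
    merged.push(tool);
  }
  return merged;
}
```

In the handler above, this would replace the spread with agent.setTools(mergeTools(alwaysOn, shortlist)).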

Tips

  • Always include critical tools unconditionally. handoff, approval, pollResult, getArtifact should always be in the agent’s toolset regardless of semantic match.
  • Tool descriptions matter more than names. “fetch the temperature for a city” beats “weather_api_v3”.
  • Reindex after dynamic tool registration. MCP connections complete asynchronously; call indexTools again after a successful connect.
  • Cache the index. If your tool set is stable across processes, persist the embeddings to disk and re-hydrate on boot.
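For the caching tip, the main risk is re-hydrating vectors that no longer match the current tools. The source does not show how SemanticToolSelector exposes or accepts precomputed vectors, so the sketch below covers only the storage side: it serializes vectors together with a fingerprint of the tool list and embedding model, and refuses to restore them if either has changed. All names here (`IndexSnapshot`, `fingerprintTools`, `snapshotIndex`, `restoreIndex`, `modelId`) are hypothetical.

```typescript
interface ToolLike {
  name: string;
  description: string;
}

interface IndexSnapshot {
  fingerprint: string; // changes when the model or any tool name/description changes
  vectors: number[][]; // one embedding per tool, in the same order as the tool list
}

// 32-bit FNV-1a over UTF-16 code units: a cheap change detector, not a cryptographic hash.
function fingerprintTools(tools: ToolLike[], modelId: string): string {
  const text = [modelId, ...tools.map((t) => `${t.name}: ${t.description}`)].join("\n");
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Serialize embeddings together with the fingerprint of the inputs that produced them.
function snapshotIndex(tools: ToolLike[], vectors: number[][], modelId: string): string {
  const snapshot: IndexSnapshot = { fingerprint: fingerprintTools(tools, modelId), vectors };
  return JSON.stringify(snapshot);
}

// Return the cached vectors only if they were built from exactly this tool list and model.
function restoreIndex(json: string, tools: ToolLike[], modelId: string): number[][] | null {
  const snapshot = JSON.parse(json) as IndexSnapshot;
  const fresh =
    snapshot.fingerprint === fingerprintTools(tools, modelId) &&
    snapshot.vectors.length === tools.length;
  return fresh ? snapshot.vectors : null;
}
```

The string returned by snapshotIndex can be written to disk or any key-value store. A null from restoreIndex signals that the tools should be embedded again.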

Performance characteristics

| Scenario | Latency | Tokens saved |
| --- | --- | --- |
| 100 tools indexed, 5 selected, no reranker | ~15ms (one embed) | ~7,000 input tokens per turn |
| 100 tools indexed, 5 selected, with Cohere rerank | ~250ms (embed + rerank) | ~7,000 input tokens per turn |
| 20 tools, just pass them all | 0ms extra | (selector adds overhead, skip it) |
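The savings figures can be sanity-checked with back-of-the-envelope arithmetic. The helper below is a hypothetical sketch that uses the ~80 tokens per tool definition quoted earlier as its default assumption; real tool schemas vary widely in size, and always-on tools added outside the selector reduce the saving.

```typescript
// Rough estimate of prompt tokens saved per turn by sending only `selected` of `total` tools.
// tokensPerTool is an assumed average; measure your own tool definitions for a real number.
function tokensSavedPerTurn(total: number, selected: number, tokensPerTool = 80): number {
  const sent = Math.min(selected, total); // cannot send more tools than exist
  return (total - sent) * tokensPerTool;
}

console.log(tokensSavedPerTurn(100, 5)); // 7600
console.log(tokensSavedPerTurn(20, 20)); // 0
```

At 100 tools and topK 5 the estimate is 7,600 tokens, in line with the roughly 7,000 shown in the table. When every tool is sent, the saving is zero and the selector only adds latency.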

See also