Semantic Tool Selection
The problem
When an agent has many tools (common when multiple MCP servers are connected, or with skill bundles), passing every tool definition to the model on every turn causes three problems:
- Prompt bloat. 100 tools × ~80 tokens each = ~8K tokens of pure tool definitions per turn, before the user's message even reaches the model.
- Tool confusion. Models pick worse tools when more options are available (a phenomenon Cohere and Anthropic have both documented).
- Slower generation. More input tokens mean a longer time to first token.
SemanticToolSelector solves this by embedding each tool’s name + description once on init, then picking the top-K most relevant tools per user turn.
How it works
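The per-turn selection step described above comes down to cosine similarity plus a top-K cut. The sketch below shows only that step, assuming the tool and query embeddings have already been computed. The IndexedTool shape and the cosineSimilarity and selectTopK names are hypothetical illustrations, not SemanticToolSelector's actual internals.

```typescript
// Hypothetical shape for illustration; not the library's ToolDef type.
interface IndexedTool {
  name: string;
  description: string;
  vector: number[]; // embedding of the tool's name + description, computed once
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors so a degenerate embedding scores 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed tool against the query embedding and keep the best topK.
function selectTopK(queryVector: number[], index: IndexedTool[], topK: number): IndexedTool[] {
  return index
    .map((tool) => ({ tool, score: cosineSimilarity(queryVector, tool.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.tool);
}
```

Only the query is embedded per turn; the tool vectors are reused, which is why selection stays cheap.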
Quick start
Configuration
embedder
Any EmbeddingProvider works. For tool selection you want:
- Cheap and fast — picks happen every turn.
- Decent on short text — tool names + 1-2 sentence descriptions.
text-embedding-3-small is the sweet spot. text-embedding-3-large is overkill; gemini-embedding-2 works too but is slower.
reranker
Optional second pass. Useful when top-K from the bi-encoder still has too many irrelevant tools.
When reranker is set:
- The bi-encoder scores all indexed tools by cosine similarity to the query.
- The top topK * rerankMultiplier tools are passed to the reranker.
- The reranker scores each (query, tool description) pair and returns the top topK.
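The two-stage flow above can be sketched as follows. This is a minimal illustration that assumes the bi-encoder scores are already computed and models the reranker as an async function returning one score per description; the Candidate and Reranker types and the selectWithRerank name are hypothetical, not the library's reranker interface.

```typescript
interface Candidate {
  name: string;
  description: string;
  biEncoderScore: number; // cosine similarity from the first pass
}

// Hypothetical reranker signature: one relevance score per description, same order as input.
type Reranker = (query: string, descriptions: string[]) => Promise<number[]>;

async function selectWithRerank(
  query: string,
  candidates: Candidate[],
  topK: number,
  rerankMultiplier: number,
  rerank: Reranker,
): Promise<Candidate[]> {
  // Stage 1: keep the topK * rerankMultiplier best tools by bi-encoder score.
  const shortlist = [...candidates]
    .sort((a, b) => b.biEncoderScore - a.biEncoderScore)
    .slice(0, topK * rerankMultiplier);

  // Stage 2: score each (query, description) pair and keep the best topK.
  const scores = await rerank(query, shortlist.map((c) => c.description));
  return shortlist
    .map((c, i) => ({ c, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.c);
}
```

With topK: 5 and rerankMultiplier: 4, for example, the reranker sees 20 candidates and 5 survive; the reranker call is the source of the extra latency noted below.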
topK and rerankMultiplier
| Setting | Result |
|---|---|
| topK: 3, no reranker | Smallest prompt; may miss useful tools if descriptions are noisy. |
| topK: 8, no reranker | Safe default for ~50 tools. |
| topK: 5, reranker, rerankMultiplier: 4 | Highest quality; ~250ms extra latency. |
| topK: 20+ | Diminishing returns; just pass everything. |
API
indexTools(tools: ToolDef[])
Embeds each tool's "name: description" string in parallel. The call is async, so await it before the first select() call. Call it:
- At agent boot
- Whenever the tool set changes (e.g. an MCP server connects)
- Not on every turn
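A minimal sketch of the indexing step, assuming an embedding provider that exposes a single async embed(text) method returning a vector. The EmbeddingLike interface, the ToolLike and IndexEntry shapes, and the toIndexText and buildIndex names are hypothetical; they only mirror the behavior described above (one "name: description" string per tool, embedded in parallel).

```typescript
// Hypothetical shapes; the real ToolDef and EmbeddingProvider types may differ.
interface ToolLike { name: string; description: string; }
interface EmbeddingLike { embed(text: string): Promise<number[]>; }
interface IndexEntry { tool: ToolLike; vector: number[]; }

// The text that gets embedded for each tool.
function toIndexText(tool: ToolLike): string {
  return `${tool.name}: ${tool.description}`;
}

// Embed all tools concurrently; Promise.all resolves in input order,
// so vectors[i] belongs to tools[i].
async function buildIndex(tools: ToolLike[], embedder: EmbeddingLike): Promise<IndexEntry[]> {
  const vectors = await Promise.all(tools.map((t) => embedder.embed(toIndexText(t))));
  return tools.map((tool, i) => ({ tool, vector: vectors[i] }));
}
```

Rebuilding the index when the tool set changes (for example after an MCP server connects) keeps the vectors in step with the tools; on an ordinary turn only the query needs embedding.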
select(query: string, options?: { topK?: number }): Promise<ToolDef[]>
Returns a shortened ToolDef[] ready to drop into a new Agent.
options.topK overrides the constructor default for this call.
Behavior:
- Returns [] if indexTools hasn't been called.
- Empty query: still returns the closest topK (the embedding of "" is rarely useful, but it doesn't error).
- Same tool indexed twice: both copies are returned independently, so dedupe at construction time.
size
Wire into a per-request agent
Pairs naturally with AgentFactory.
Tips
- Always include critical tools unconditionally. handoff, approval, pollResult, and getArtifact should always be in the agent's toolset regardless of semantic match.
- Tool descriptions matter more than names. "fetch the temperature for a city" beats "weather_api_v3".
- Reindex after dynamic tool registration. MCP connections happen asynchronously; call indexTools again after a successful connect.
- Cache the index. If your tool set is stable across processes, persist the embeddings to disk and rehydrate them on boot.
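The first tip can be sketched as a merge step: pin the critical tools, then append semantic matches that are not already present. This assumes tools are identified by name; the ToolRef shape and buildToolset name are hypothetical, and the pinned names are the ones listed above.

```typescript
interface ToolRef { name: string; description: string; }

// Tools that should reach the agent regardless of semantic match.
const PINNED_TOOL_NAMES = new Set(["handoff", "approval", "pollResult", "getArtifact"]);

// Pinned tools come first; semantic picks follow, skipping any name already included.
function buildToolset(allTools: ToolRef[], semanticPicks: ToolRef[]): ToolRef[] {
  const result = allTools.filter((t) => PINNED_TOOL_NAMES.has(t.name));
  const included = new Set(result.map((t) => t.name));
  for (const tool of semanticPicks) {
    if (included.has(tool.name)) continue;
    included.add(tool.name);
    result.push(tool);
  }
  return result;
}
```

Pinned tools count toward the prompt on every turn, so keep that list short.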
Performance characteristics
| Scenario | Latency | Tokens saved |
|---|---|---|
| 100 tools indexed, 5 selected, no reranker | ~15ms (one embed) | ~7,000 input tokens per turn |
| 100 tools indexed, 5 selected, with Cohere rerank | ~250ms (embed + rerank) | ~7,000 input tokens per turn |
| 20 tools, just pass them all | 0ms extra | (selector adds overhead, skip it) |
See also
- Reranking: the optional second pass
- Tool Polish: pair with inputExamples for better selection accuracy
- AgentFactory: for per-request agents