Performance
Agentium is built for minimal overhead — fewer tokens, faster responses, lower cost. This page covers the key optimizations and benchmark results.
Benchmark Results
All benchmarks use gpt-4o-mini, identical prompts, and 5 runs per scenario. Agentium and LangChain run on Node.js; Agno runs on Python.
Simple Completion
| Metric | Agentium | LangChain | Agno |
|---|---|---|---|
| Startup (ms) | 171 | 301 | 2730 |
| Avg Response (ms) | 769 | 737 | 2077 |
| Avg Prompt Tokens | 28 | 28 | 28 |
| Avg Total Tokens | 35 | 35 | 35 |
| Avg Cost / Run | $0.000008 | $0.000008 | $0.000008 |
Tool Calling
| Metric | Agentium | LangChain | Agno |
|---|---|---|---|
| Avg Response (ms) | 1617 | 1678 | 3064 |
| Avg Prompt Tokens | 167 | 167 | 173 |
| Avg Total Tokens | 196 | 196 | 202 |
| Avg Cost / Run | $0.000042 | $0.000042 | $0.000043 |
Agentium and LangChain produce identical tool schemas (167 prompt tokens). Agentium strips verbose JSON Schema metadata ($schema, additionalProperties) to keep schemas compact.
Multi-turn Memory
| Metric | Agentium | LangChain | Agno |
|---|---|---|---|
| Avg Response (ms) | 2408 | 2324 | 5892 |
| Avg Prompt Tokens | 189 | 309 | 94 |
| Avg Completion Tokens | 30 | 57 | 66 |
| Avg Total Tokens | 219 | 366 | 160 |
| Avg Cost / Run | $0.000046 | $0.000081 | $0.000054 |
In multi-turn conversations, Agentium uses 39% fewer prompt tokens than LangChain (189 vs. 309) and costs 43% less per run ($0.000046 vs. $0.000081). LangChain injects heavier system prompts and adds history-formatting overhead.
Summary
| Scenario | Fastest | Fewest Tokens | Cheapest |
|---|---|---|---|
| Simple Completion | LangChain (737ms) | Tied (35) | Tied |
| Tool Calling | Agentium (1617ms) | Agentium, LangChain (196) | Agentium, LangChain ($0.000042) |
| Multi-turn Memory | LangChain (2324ms) | Agno (160) | Agentium ($0.000046) |
Agentium is the fastest for tool calling, the cheapest for multi-turn conversations, and matches LangChain on tool schema efficiency. For simple completions, Agentium and LangChain are within noise of each other (769 ms vs. 737 ms).
Optimizations
Tool definitions (Zod-to-JSON Schema conversion) are computed once at agent construction and cached. Verbose JSON Schema metadata ($schema, additionalProperties, description on the root object) is stripped automatically — reducing token overhead without losing semantic information.
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  tools: [weatherTool, searchTool],
});
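The stripping and caching described here can be sketched roughly as follows. This is an illustration under assumptions, not Agentium's internal code: `compactSchema`, `getToolSchema`, and `schemaCache` are hypothetical names, and the input is a hand-written JSON Schema object rather than the output of a real Zod conversion.

```typescript
// Hypothetical sketch; names and structure are assumptions, not Agentium internals.
type JsonSchema = Record<string, unknown>;

function isObject(value: unknown): value is JsonSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Drop metadata keys that cost prompt tokens. Only the root-level
// description is removed; descriptions on individual properties are kept.
function compactSchema(schema: JsonSchema, isRoot = true): JsonSchema {
  const out: JsonSchema = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === "$schema" || key === "additionalProperties") continue;
    if (isRoot && key === "description") continue;
    if (key === "properties" && isObject(value)) {
      // Property names are user data, so recurse into the values only.
      const props: JsonSchema = {};
      for (const [name, sub] of Object.entries(value)) {
        props[name] = isObject(sub) ? compactSchema(sub, false) : sub;
      }
      out[key] = props;
    } else if (key === "items" && isObject(value)) {
      out[key] = compactSchema(value, false);
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Compute each tool's compact schema once and reuse it afterwards.
const schemaCache = new Map<string, JsonSchema>();

function getToolSchema(toolName: string, rawSchema: JsonSchema): JsonSchema {
  let cached = schemaCache.get(toolName);
  if (!cached) {
    cached = compactSchema(rawSchema);
    schemaCache.set(toolName, cached);
  }
  return cached;
}

// Example input, shaped like a typical Zod-to-JSON Schema result.
const raw = {
  $schema: "http://json-schema.org/draft-07/schema#",
  type: "object",
  description: "Weather lookup parameters",
  properties: { city: { type: "string", description: "City name" } },
  required: ["city"],
  additionalProperties: false,
};
const compact = getToolSchema("getWeather", raw);
console.log(JSON.stringify(compact));
```

The per-property description survives because the model relies on it to fill in arguments; only the root-level metadata is treated as redundant in this sketch.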
For OpenAI models, tools can opt into strict mode for guaranteed valid JSON output:
const weatherTool = defineTool({
  name: "getWeather",
  description: "Get weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => `Sunny in ${city}`,
  strict: true, // enables OpenAI Structured Outputs on this tool
});
Automatic Retry
Transient LLM API failures are retried automatically with exponential backoff and jitter. Retryable errors include HTTP 429, HTTP 5xx, and network errors.
const agent = new Agent({
  name: "reliable-bot",
  model: openai("gpt-4o"),
  retry: {
    maxRetries: 5,
    initialDelayMs: 1000,
    maxDelayMs: 30000,
  },
});
Default: 3 retries, 500ms initial delay, 10s max delay.
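To make the retry schedule concrete, here is a minimal sketch of exponential backoff with jitter using the default values above. The exact formula Agentium uses is not documented here, so the "full jitter" variant (a random delay between zero and the exponential ceiling) is an assumption, as are the helper names `backoffCeiling`, `backoffDelay`, `isRetryable`, and `withRetry`.

```typescript
// Hypothetical sketch; the jitter strategy and helper names are assumptions.
interface RetryOptions {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
}

// Defaults as stated in the documentation.
const defaults: RetryOptions = { maxRetries: 3, initialDelayMs: 500, maxDelayMs: 10000 };

// Upper bound for the wait before retry number `attempt` (0-based):
// the initial delay doubles each attempt, capped at maxDelayMs.
function backoffCeiling(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.maxDelayMs, opts.initialDelayMs * 2 ** attempt);
}

// Full jitter: pick a random delay in [0, ceiling).
function backoffDelay(
  attempt: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * backoffCeiling(attempt, opts));
}

// An undefined status stands in for a network error with no HTTP response.
// Simplification: any error without a status is treated as a network error.
function isRetryable(status: number | undefined): boolean {
  return status === undefined || status === 429 || (status >= 500 && status <= 599);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = defaults): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number } | null | undefined)?.status;
      if (attempt >= opts.maxRetries || !isRetryable(status)) throw err;
      const delay = backoffDelay(attempt, opts);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With the defaults, the ceilings for the three retries are 500 ms, 1000 ms, and 2000 ms; the 10 s cap only comes into play when `maxRetries` is raised to 5 or more.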
Token-Based History Trimming
Set maxContextTokens to automatically trim conversation history (oldest messages first) to fit within a token budget:
const agent = new Agent({
  name: "bot",
  model: openai("gpt-4o"),
  maxContextTokens: 8000,
});
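Oldest-first trimming can be sketched as below. This is an illustration only: `trimHistory` and `estimateTokens` are hypothetical helpers, and the four-characters-per-token estimate is a rough assumption standing in for whatever tokenizer the framework actually uses.

```typescript
// Hypothetical sketch; helper names and the token estimate are assumptions.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude estimate: about 4 characters per token.
function estimateTokens(message: Message): number {
  return Math.ceil(message.content.length / 4);
}

// Keep the longest run of most-recent messages that fits the budget,
// which is equivalent to dropping the oldest messages first.
function trimHistory(history: Message[], maxContextTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest and stop at the first message that does not fit.
  for (const message of [...history].reverse()) {
    const cost = estimateTokens(message);
    if (used + cost > maxContextTokens) break;
    used += cost;
    kept.unshift(message);
  }
  return kept;
}

const history: Message[] = [
  { role: "user", content: "a".repeat(40) },      // ~10 tokens (oldest)
  { role: "assistant", content: "b".repeat(40) }, // ~10 tokens
  { role: "user", content: "c".repeat(20) },      // ~5 tokens (newest)
];

const trimmed = trimHistory(history, 15);
console.log(trimmed.length); // 2: the oldest message was dropped
```

A production version would likely pin the system prompt so it is never trimmed; that detail is left out here to keep the sketch short.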
Non-Blocking User Memory
When userMemory is configured, fact extraction runs asynchronously in the background after the response is returned. This removes roughly 500 to 1000 ms or more of latency from each request.
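The pattern behind this is a fire-and-forget call: the response is returned first, and extraction continues without being awaited. The sketch below assumes hypothetical `generate` and `extractFacts` callbacks and is not Agentium's actual code path.

```typescript
// Hypothetical sketch; `respond`, `generate`, and `extractFacts` are illustrative names.
async function respond(
  input: string,
  generate: (input: string) => Promise<string>,
  extractFacts: (input: string, output: string) => Promise<void>,
): Promise<string> {
  const output = await generate(input);
  // Deliberately not awaited: extraction keeps running after we return.
  // Errors are caught so a failed extraction cannot crash the process.
  void extractFacts(input, output).catch((err) => {
    console.error("fact extraction failed:", err);
  });
  return output;
}
```

The caller sees only the latency of `generate`. The trade-off is that facts from the current turn may not yet be stored when the next request arrives.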
Smart Context Deduplication
When userMemory.asTool() is registered in the agent’s tools, user facts are not also injected into the system prompt. The agent retrieves facts on demand via the tool, saving tokens.
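As a rough sketch of this deduplication rule: facts go into the system prompt only when no memory tool is registered. The tool name `user_memory` and the `buildSystemPrompt` helper are assumptions for illustration; the name that `userMemory.asTool()` actually registers is not stated here.

```typescript
// Hypothetical sketch; the tool name and helper are assumptions.
interface ToolLike {
  name: string;
}

const MEMORY_TOOL_NAME = "user_memory"; // assumed name, for illustration only

function buildSystemPrompt(base: string, facts: string[], tools: ToolLike[]): string {
  const hasMemoryTool = tools.some((tool) => tool.name === MEMORY_TOOL_NAME);
  // With the tool registered, the agent fetches facts on demand,
  // so injecting them here would spend tokens on duplicate context.
  if (hasMemoryTool || facts.length === 0) return base;
  const factLines = facts.map((fact) => `- ${fact}`).join("\n");
  return `${base}\n\nKnown user facts:\n${factLines}`;
}

const facts = ["Prefers metric units", "Lives in Lisbon"];
const withTool = buildSystemPrompt("You are a helpful bot.", facts, [{ name: "user_memory" }]);
const withoutTool = buildSystemPrompt("You are a helpful bot.", facts, [{ name: "getWeather" }]);
```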
Streaming Usage Tracking
Token usage (promptTokens, completionTokens, totalTokens, reasoningTokens) is accurately tracked in both run() and stream() modes. Stream usage is accumulated from provider finish chunks.
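Accumulating usage from finish chunks might look like the sketch below. The `StreamChunk` shape and the `accumulateUsage` helper are hypothetical; real provider chunks differ in structure, and only the four usage field names are taken from the text above.

```typescript
// Hypothetical sketch; the chunk shape is an assumption, not a provider format.
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  reasoningTokens: number;
}

interface StreamChunk {
  type: "text" | "finish";
  text?: string;
  usage?: Partial<Usage>;
}

// Sum usage over every finish chunk. A single stream() call can contain
// several of them, for example one per model round trip in a tool-calling loop.
function accumulateUsage(chunks: Iterable<StreamChunk>): Usage {
  const total: Usage = { promptTokens: 0, completionTokens: 0, totalTokens: 0, reasoningTokens: 0 };
  for (const chunk of chunks) {
    if (chunk.type !== "finish" || !chunk.usage) continue;
    total.promptTokens += chunk.usage.promptTokens ?? 0;
    total.completionTokens += chunk.usage.completionTokens ?? 0;
    total.totalTokens += chunk.usage.totalTokens ?? 0;
    total.reasoningTokens += chunk.usage.reasoningTokens ?? 0;
  }
  return total;
}

const usage = accumulateUsage([
  { type: "text", text: "Checking the weather" },
  { type: "finish", usage: { promptTokens: 80, completionTokens: 12, totalTokens: 92 } },
  { type: "text", text: "It is sunny." },
  { type: "finish", usage: { promptTokens: 87, completionTokens: 17, totalTokens: 104 } },
]);
```

Fields a provider omits (here `reasoningTokens`) default to zero instead of producing `NaN` in the totals.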
Methodology
- All benchmarks use gpt-4o-mini with identical prompts
- Each scenario runs 5 times; results are averaged
- Startup time measures framework import + agent initialization
- Cost uses gpt-4o-mini pricing: $0.15/1M input tokens, $0.60/1M output tokens
- Network latency to OpenAI is shared across all frameworks
- Full benchmark scripts are in benchmarks/ in the repository
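The cost figures in the tables can be reproduced from the token counts and the pricing listed above. The helpers `costPerRun` and `percentSaved` below are illustrative names, not part of the benchmark scripts; the inputs are the rounded averages from the Multi-turn Memory table.

```typescript
// Illustrative helpers; names are assumptions, prices come from the methodology list.
const PRICE_PER_MILLION_USD = { input: 0.15, output: 0.6 };

function costPerRun(promptTokens: number, completionTokens: number): number {
  const micros =
    promptTokens * PRICE_PER_MILLION_USD.input +
    completionTokens * PRICE_PER_MILLION_USD.output;
  return micros / 1e6;
}

// Relative saving of `ours` compared with `theirs`, in percent.
function percentSaved(ours: number, theirs: number): number {
  return ((theirs - ours) / theirs) * 100;
}

// Multi-turn Memory scenario (prompt tokens, completion tokens).
console.log(costPerRun(189, 30).toFixed(6)); // "0.000046" (Agentium)
console.log(costPerRun(309, 57).toFixed(6)); // "0.000081" (LangChain)
console.log(costPerRun(94, 66).toFixed(6));  // "0.000054" (Agno)

console.log(Math.round(percentSaved(189, 309)));           // 39 (prompt tokens)
console.log(Math.round(percentSaved(0.000046, 0.000081))); // 43 (cost, from rounded table values)
```

The Agno row shows why the fewest total tokens does not mean the lowest cost: output tokens are priced at four times the input rate, so its 66 completion tokens outweigh the savings on the prompt side.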