Overview
One user action can trigger dozens of LLM calls. Token-unaware rate limits block 41% of legitimate traffic. Agentium provides token-aware rate limiting with sliding windows, concurrency control, and graceful degradation.
Quick Start
import { Agent, openai } from "@agentium/core";

const agent = new Agent({
  name: "rate-limited-agent",
  model: openai("gpt-4o"),
  rateLimit: {
    maxTokensPerMinute: 100_000,
    maxRequestsPerMinute: 60,
    maxConcurrent: 5,
    perTenant: true,
    onLimitReached: "degrade",
    degradeStrategy: {
      useCheaperModel: openai("gpt-4o-mini"),
      reduceMaxTokens: 1000,
    },
  },
});
Token Rate Limiter
Sliding-window token counting with per-scope tracking:
import { TokenRateLimiter } from "@agentium/core";

const limiter = new TokenRateLimiter({
  maxTokensPerMinute: 100_000,
  maxTokensPerHour: 1_000_000,
  maxRequestsPerMinute: 60,
  perTenant: true,
  perUser: true,
});

// Check without consuming
const status = limiter.check({ tenantId: "acme", userId: "u1" });
// { allowed: true, remaining: 95000, resetMs: 45000 }

// Acquire tokens (pre-call estimate)
const estimatedTokens = 500;
const result = limiter.acquire(estimatedTokens, { tenantId: "acme" });

// Reconcile after actual usage
const actualTokens = 430; // placeholder: use the token count your provider reports for the call
limiter.record(actualTokens, estimatedTokens, { tenantId: "acme" });
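To make the acquire-then-reconcile flow concrete, here is a minimal, self-contained sketch of sliding-window token counting with one counter per scope. It is not Agentium's implementation. The names `SlidingWindowTokenCounter`, `Reservation`, `reconcile`, and `counterFor` are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```typescript
// Illustrative sketch only: every name here is hypothetical, not Agentium's API.
interface Reservation {
  at: number; // timestamp (ms) when the tokens were reserved
  tokens: number; // the estimate at first, the actual usage after reconcile
}

class SlidingWindowTokenCounter {
  private reservations: Reservation[] = [];
  private readonly maxTokens: number;
  private readonly windowMs: number;

  constructor(maxTokens: number, windowMs: number = 60_000) {
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
  }

  // Tokens counted inside the window (now - windowMs, now].
  used(now: number): number {
    const cutoff = now - this.windowMs;
    this.reservations = this.reservations.filter((r) => r.at > cutoff);
    return this.reservations.reduce((sum, r) => sum + r.tokens, 0);
  }

  // Reserve an estimate before the call; null means the budget would be exceeded.
  acquire(estimate: number, now: number): Reservation | null {
    if (this.used(now) + estimate > this.maxTokens) return null;
    const reservation: Reservation = { at: now, tokens: estimate };
    this.reservations.push(reservation);
    return reservation;
  }

  // Replace the estimate with the real usage once the call has finished.
  reconcile(reservation: Reservation, actualTokens: number): void {
    reservation.tokens = actualTokens;
  }
}

// One counter per scope key, for example "acme" or "acme:u1".
const counters = new Map<string, SlidingWindowTokenCounter>();

function counterFor(scopeKey: string, maxTokens: number): SlidingWindowTokenCounter {
  let counter = counters.get(scopeKey);
  if (!counter) {
    counter = new SlidingWindowTokenCounter(maxTokens);
    counters.set(scopeKey, counter);
  }
  return counter;
}
```

In this sketch, reconciling updates the original reservation in place, so an overestimate frees budget immediately and the correction expires together with the request it belongs to.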
Concurrency Limiter
Control maximum concurrent LLM calls:
import { ConcurrencyLimiter } from "@agentium/core";

const limiter = new ConcurrencyLimiter(5, 30_000); // max 5, 30s timeout

const release = await limiter.acquire();
try {
  await callLLM(); // placeholder for your own LLM call
} finally {
  release(); // always free the slot, even if the call throws
}

console.log(limiter.active); // current concurrent calls
console.log(limiter.pending); // queued requests
console.log(limiter.available); // remaining capacity
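For readers curious how a limiter with this shape might work internally, the following is a rough sketch of a counting semaphore with a wait timeout. It is an assumption about the general technique, not Agentium's source. `SimpleConcurrencyLimiter` is a hypothetical name, and the behavior on timeout (rejecting the waiting promise) is also assumed.

```typescript
// Illustrative sketch only: a counting semaphore with a queue and a wait timeout.
class SimpleConcurrencyLimiter {
  private activeCount = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;
  private readonly timeoutMs: number;

  constructor(max: number, timeoutMs: number) {
    this.max = max;
    this.timeoutMs = timeoutMs;
  }

  get active(): number {
    return this.activeCount;
  }

  get pending(): number {
    return this.waiters.length;
  }

  get available(): number {
    return this.max - this.activeCount;
  }

  // Resolves with a release function once a slot is free.
  acquire(): Promise<() => void> {
    if (this.activeCount < this.max) {
      this.activeCount++;
      return Promise.resolve(this.makeRelease());
    }
    return new Promise<() => void>((resolve, reject) => {
      const grant = () => {
        clearTimeout(timer);
        resolve(this.makeRelease());
      };
      const timer = setTimeout(() => {
        // Give up: leave the queue and reject the caller.
        const index = this.waiters.indexOf(grant);
        if (index >= 0) this.waiters.splice(index, 1);
        reject(new Error("Timed out waiting for a concurrency slot"));
      }, this.timeoutMs);
      this.waiters.push(grant);
    });
  }

  private makeRelease(): () => void {
    let released = false;
    return () => {
      if (released) return; // calling release twice is a no-op
      released = true;
      const next = this.waiters.shift();
      if (next) {
        next(); // hand the slot straight to the next waiter; the active count is unchanged
      } else {
        this.activeCount--;
      }
    };
  }
}
```

Handing a freed slot directly to the oldest waiter keeps the queue first-in, first-out and prevents a newly arriving caller from jumping ahead of requests that are already waiting.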
Limit Reached Strategies
| Strategy | Behavior |
|---|---|
| "queue" | Queue requests until capacity is available |
| "reject" | Immediately reject with error |
| "degrade" | Switch to cheaper model and reduce token limits |
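The three strategies can be pictured as a single decision made when a limit is hit. The sketch below is a hypothetical illustration, not Agentium's internals. The `LlmRequest`, `Decision`, and `decideOnLimit` names are invented, models are represented as plain strings, and `reduceMaxTokens` is assumed to act as an upper cap on the request's token limit.

```typescript
// Illustrative sketch only: these types and this function are hypothetical.
type LimitStrategy = "queue" | "reject" | "degrade";

interface LlmRequest {
  model: string;
  maxTokens: number;
}

interface DegradeOptions {
  useCheaperModel: string; // assumption: a model name, to keep the sketch self-contained
  reduceMaxTokens: number; // assumption: treated as a cap on maxTokens
}

type Decision =
  | { kind: "enqueue"; request: LlmRequest }
  | { kind: "error"; message: string }
  | { kind: "proceed"; request: LlmRequest };

function decideOnLimit(
  strategy: LimitStrategy,
  request: LlmRequest,
  degrade: DegradeOptions,
): Decision {
  switch (strategy) {
    case "queue":
      // Hold the request unchanged until capacity frees up.
      return { kind: "enqueue", request };
    case "reject":
      // Fail fast so the caller can retry or surface the error.
      return { kind: "error", message: "Rate limit reached" };
    case "degrade":
      // Keep serving, but on a cheaper model with a smaller output budget.
      return {
        kind: "proceed",
        request: {
          model: degrade.useCheaperModel,
          maxTokens: Math.min(request.maxTokens, degrade.reduceMaxTokens),
        },
      };
  }
}

const decision = decideOnLimit(
  "degrade",
  { model: "gpt-4o", maxTokens: 4000 },
  { useCheaperModel: "gpt-4o-mini", reduceMaxTokens: 1000 },
);
```

Here `decision` describes a request that proceeds on the cheaper model with a reduced token limit, which mirrors the `degradeStrategy` settings from the Quick Start.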
Events
| Event | Payload |
|---|---|
| rateLimit.throttled | { scope, limitType, resetMs } |
| rateLimit.degraded | { scope, originalModel, degradedModel } |
| rateLimit.rejected | { scope, reason } |
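The payloads in the table above could be modeled as a discriminated union so that handlers stay type-safe. This is only a sketch: the field types (strings, and a number for `resetMs`), the `type` discriminant, and the `describeEvent` helper are assumptions, and how events are actually subscribed to is not shown here.

```typescript
// Illustrative sketch only: field types and the `type` discriminant are assumptions.
type RateLimitEvent =
  | { type: "rateLimit.throttled"; scope: string; limitType: string; resetMs: number }
  | { type: "rateLimit.degraded"; scope: string; originalModel: string; degradedModel: string }
  | { type: "rateLimit.rejected"; scope: string; reason: string };

// Turn an event into a log line; the switch narrows the payload per event type.
function describeEvent(event: RateLimitEvent): string {
  switch (event.type) {
    case "rateLimit.throttled": {
      const seconds = Math.ceil(event.resetMs / 1000);
      return `${event.scope} throttled (${event.limitType}); window resets in ${seconds}s`;
    }
    case "rateLimit.degraded":
      return `${event.scope} degraded from ${event.originalModel} to ${event.degradedModel}`;
    case "rateLimit.rejected":
      return `${event.scope} rejected: ${event.reason}`;
  }
}

const logLine = describeEvent({
  type: "rateLimit.throttled",
  scope: "tenant:acme",
  limitType: "tokensPerMinute",
  resetMs: 45_000,
});
// logLine === "tenant:acme throttled (tokensPerMinute); window resets in 45s"
```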