System Architecture
This page describes the Agentium architecture: how packages are organized, how layers interact, and how data flows through the system.
Monorepo Structure
Agentium is organized as a monorepo with four primary packages. Each package has a focused responsibility and can be used independently or together.
Package Overview
| Package | Purpose |
|---|---|
| @agentium/core | Agents, models, tools, memory, storage, voice agents, vector stores, MCP client, A2A client |
| @agentium/transport | Express REST API, Socket.IO gateway, Voice gateway, Browser gateway, A2A server |
| @agentium/queue | BullMQ background job processing |
| @agentium/browser | Vision-based autonomous browser automation with Playwright |
Layered Architecture
Agentium is built in layers. Higher layers depend on lower ones and infrastructure is pluggable.
- SDK Layer — Agent, Team, Workflow, VoiceAgent, BrowserAgent. The primary API surface for defining behavior, orchestrating agents, and running workflows.
- Engine Layer — LLM Loop, Tool Executor, MemoryManager (sessions, summaries, user facts, user profile, entities, decisions, learnings), SkillManager. Core execution logic with automatic retry, tool caching, token-based history trimming, reasoning, and cross-session personalization.
- Safety Layer — Sandbox (isolated subprocess execution with timeout and memory limits), Approval Manager (human-in-the-loop gating before tool execution), Guardrails (input/output validation).
- Model Abstraction — ModelProvider interface and adapters for text models. RealtimeProvider interface for voice/streaming models. Factory functions: openai(), anthropic(), google(), ollama(), vertex(), openaiRealtime(), googleLive().
- Protocol Integration — MCP Client for consuming external tools, A2A Client for calling remote agents.
- Infrastructure — Storage (in-memory, SQLite, PostgreSQL, MongoDB), Vector Stores, and Embeddings. All pluggable.
- Registry & Auto-Discovery — Agents, Teams, and Workflows auto-register into a global Registry on construction. Transport layers read from the registry dynamically, so entities created at any time are immediately available over HTTP and WebSocket without restart or re-wiring.
- Transport (Optional) — Express REST, Socket.IO WebSocket, Voice Gateway (real-time audio streaming), Browser Gateway (live browser observation), and A2A Server. Uses the Registry for live auto-discovery of agents, teams, and workflows.
- Queue (Optional) — BullMQ workers for background job processing.
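The Model Abstraction layer keeps the engine independent of any single vendor API: engine code talks to a provider interface, and factory functions return adapters that implement it. The following is a minimal sketch of that adapter-plus-factory pattern. The `ModelProvider` shape, the `ChatMessage` type, and the `echoModel()` factory are hypothetical stand-ins, not Agentium's actual interface. The toy adapter echoes its input instead of calling a real API, and it is synchronous for brevity, whereas real providers are asynchronous.

```typescript
// Hypothetical message and response shapes; Agentium's real types may differ.
type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ModelResponse {
  text: string;
}

// Hypothetical provider contract: engine code depends only on this interface.
interface ModelProvider {
  readonly providerId: string;
  readonly modelId: string;
  generate(messages: ChatMessage[]): ModelResponse;
}

// Toy factory standing in for openai(), anthropic(), etc.
// It answers by echoing the last user message, so it runs offline.
function echoModel(modelId: string): ModelProvider {
  return {
    providerId: "echo",
    modelId,
    generate(messages: ChatMessage[]): ModelResponse {
      const lastUser = [...messages].reverse().find((m) => m.role === "user");
      return { text: `[${modelId}] ${lastUser?.content ?? ""}` };
    },
  };
}

// Engine-side helper: works with any ModelProvider implementation.
function ask(model: ModelProvider, prompt: string): string {
  return model.generate([
    { role: "system", content: "You are helpful." },
    { role: "user", content: prompt },
  ]).text;
}

console.log(ask(echoModel("toy-1"), "hello")); // [toy-1] hello
```

Swapping providers then means passing a different factory result to the same engine code, which is the pluggability the layer list describes.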
Data Flow — Text Agent
A typical text agent request flows through the system as follows:
```text
User Input
│
Agent.run() / Agent.stream()
│
buildMessages (history + system instructions + MemoryManager.buildContext() + skill instructions)
│
LLM Loop (with retry)
│
ModelProvider (OpenAI / Anthropic / Google / Ollama / Vertex)
│
Response (text / tool calls)
│
Tool Executor (if tool calls)
├── Approval check (if requiresApproval is set)
├── Sandbox execution (if sandbox is enabled)
├── Local tools (with optional caching)
├── MCP tools (external servers)
└── A2A tools (remote agents)
│
Loop until final response
│
MemoryManager.appendMessages() → auto-summarize overflow
│
MemoryManager.afterRun() → fire-and-forget extraction
(user facts, user profile, entities, learnings)
│
Output to caller
```
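The tool-calling loop in the diagram can be sketched in a few lines. This is a synchronous illustration under stated assumptions: a scripted stand-in model first requests a tool call and then answers, and the `ModelTurn` type, `runLoop` function, and `maxIterations` guard are hypothetical names rather than Agentium's API. Approval checks, sandboxing, caching, and MCP/A2A dispatch are omitted.

```typescript
// Hypothetical shapes: a model turn is either final text or a list of tool calls.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ModelTurn =
  | { kind: "text"; text: string }
  | { kind: "tool_calls"; calls: ToolCall[] };

type LoopMessage =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };

type Tool = (args: Record<string, unknown>) => string;

// Calls the model, runs any requested tools, appends their results to the
// history, and repeats until the model returns plain text.
function runLoop(
  model: (history: LoopMessage[]) => ModelTurn,
  tools: Record<string, Tool>,
  input: string,
  maxIterations = 5,
): { output: string; history: LoopMessage[] } {
  const history: LoopMessage[] = [{ role: "user", content: input }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = model(history);
    if (turn.kind === "text") {
      history.push({ role: "assistant", content: turn.text });
      return { output: turn.text, history };
    }
    // A real loop would also record the assistant's tool-call turn here.
    for (const call of turn.calls) {
      const tool = tools[call.name];
      const output = tool ? tool(call.args) : `Unknown tool: ${call.name}`;
      history.push({ role: "tool", toolCallId: call.id, content: output });
    }
  }
  throw new Error("Loop did not produce a final response");
}

// Scripted model: requests the weather tool once, then answers from its result.
const scriptedModel = (history: LoopMessage[]): ModelTurn => {
  const toolMsg = history.find((m) => m.role === "tool");
  if (!toolMsg) {
    return { kind: "tool_calls", calls: [{ id: "c1", name: "weather", args: { city: "Mumbai" } }] };
  }
  return { kind: "text", text: `Forecast: ${toolMsg.content}` };
};

const result = runLoop(scriptedModel, { weather: (a) => `sunny in ${String(a["city"])}` }, "Weather?");
console.log(result.output); // Forecast: sunny in Mumbai
```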
Detailed Flow
- User Input — A string or multi-modal content (text, images, files).
- Agent — Receives input, loads session history from MemoryManager, and injects memory context and skill instructions into the system prompt.
- buildMessages — Constructs the message array: system prompt (with summaries, user facts, user profile, entities, decisions, learnings, skill instructions), session history (auto-trimmed if maxTokens is set), current user message.
- LLM Loop — Sends messages to the model with automatic retry on transient failures (429, 5xx, network errors).
- ModelProvider — Translates the message array into the provider's API format.
- Response — Either text or tool calls.
- Tool Executor — If the response contains tool calls:
  - Checks for human approval if requiresApproval is set on the tool or agent.
  - Runs the tool in a sandboxed subprocess if sandbox is enabled.
  - Executes the tool, appends the results, and loops back to the model.
- MemoryManager.appendMessages — Persists the new turn to session storage and auto-summarizes overflow.
- MemoryManager.afterRun — Asynchronously extracts user facts, user profile, entities, and learnings from the conversation for future personalization.
- Output — Returns or streams the final response to the caller.
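The automatic retry on transient failures (429, 5xx, network errors) is commonly implemented as exponential backoff. Below is a minimal sketch that assumes a hypothetical `HttpError` class carrying an optional status code, a 500 ms base delay capped at 8 s, and an injectable `sleep` so the example can run instantly. Agentium's actual retry policy (attempt count, delays, jitter) is not specified on this page.

```typescript
// Hypothetical error type: statusCode is set for HTTP failures, undefined for network failures.
class HttpError extends Error {
  constructor(message: string, readonly statusCode?: number) {
    super(message);
    this.name = "HttpError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working on ES5 targets
  }
}

// Transient = rate limited (429), server error (5xx), or network failure (no status).
function isTransient(err: unknown): boolean {
  if (!(err instanceof HttpError)) return false;
  const code = err.statusCode;
  return code === undefined || code === 429 || (code >= 500 && code < 600);
}

// Delay doubles per attempt and is capped: 500, 1000, 2000, 4000, 8000, 8000, ...
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries fn on transient errors, up to maxRetries extra attempts.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isTransient(err)) throw err;
      await sleep(backoffDelay(attempt));
    }
  }
}

// Usage: withRetry(() => callModel(messages)), where callModel is a hypothetical provider call.
```

Production retry logic commonly adds random jitter to each delay so that many clients do not retry in lockstep.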
Data Flow — Voice Agent
```text
Audio Input (WebSocket / Socket.IO)
│
VoiceAgent.connect()
│
RealtimeProvider (OpenAI Realtime / Google Live)
│
Bidirectional audio stream
│
Tool calls (if any) → Tool Executor
│
MemoryManager.appendMessages() (session persistence)
│
MemoryManager.afterRun() (non-blocking extraction)
│
Audio Output → Client
```
The VoiceAgent manages:
- VoiceSession — wraps the realtime provider connection, routes tool calls, emits events.
- Session persistence — conversation history saved via MemoryManager, restored on reconnect.
- Memory extraction — user facts, profile, entities, and learnings extracted from voice transcripts (non-blocking).
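A VoiceSession-style wrapper mainly dispatches provider events: audio goes to the client, transcripts are kept for persistence, and tool calls are routed to local functions. The sketch below shows that dispatch under assumptions: the `RealtimeEvent` union, `VoiceSessionSketch` class, and `clock` tool are hypothetical, and real realtime APIs define their own event payloads and send tool results back over the connection.

```typescript
// Hypothetical provider events; real realtime APIs define their own payloads.
type RealtimeEvent =
  | { type: "audio"; chunk: Uint8Array }
  | { type: "transcript"; role: "user" | "assistant"; text: string }
  | { type: "tool_call"; callId: string; name: string; args: Record<string, unknown> };

type ToolFn = (args: Record<string, unknown>) => string;

// Minimal session wrapper: plays audio, records the transcript for later
// persistence, and routes tool calls to local functions.
class VoiceSessionSketch {
  readonly transcript: { role: string; text: string }[] = [];
  readonly toolResults: { callId: string; output: string }[] = [];

  constructor(
    private readonly tools: Record<string, ToolFn>,
    private readonly playAudio: (chunk: Uint8Array) => void,
  ) {}

  handle(event: RealtimeEvent): void {
    switch (event.type) {
      case "audio":
        this.playAudio(event.chunk);
        break;
      case "transcript":
        this.transcript.push({ role: event.role, text: event.text });
        break;
      case "tool_call": {
        const tool = this.tools[event.name];
        const output = tool ? tool(event.args) : `Unknown tool: ${event.name}`;
        // A real session would send this result back over the provider connection.
        this.toolResults.push({ callId: event.callId, output });
        break;
      }
    }
  }
}

let bytesPlayed = 0;
const voiceSession = new VoiceSessionSketch({ clock: () => "12:00" }, (chunk) => {
  bytesPlayed += chunk.length;
});
voiceSession.handle({ type: "transcript", role: "user", text: "What time is it?" });
voiceSession.handle({ type: "tool_call", callId: "t1", name: "clock", args: {} });
voiceSession.handle({ type: "audio", chunk: new Uint8Array(4) });
```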
Data Flow — Browser Agent
```text
Task (string)
│
BrowserAgent.run()
│
Launch Playwright (with stealth config + humanize settings)
│
Screenshot → ModelProvider (vision)
│
LLM decides action (click, type, scroll, navigate, done, fail)
│
BrowserProvider executes action
├── CredentialVault resolves {{placeholders}} for type actions
├── DOM extraction (optional, for hybrid vision+DOM approach)
└── Loop detection (maxRepeats threshold)
│
Screenshot → next iteration
│
Loop until "done" or "fail" or maxSteps reached
│
Close browser (with optional cookie/auth persistence)
│
Output result + action history
```
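Loop detection can be illustrated as a check on the most recent actions: if the last few actions are identical, the agent is likely stuck and the run should stop or change strategy. The sketch below assumes that "repeat" means identical consecutive actions and that `maxRepeats` is the length of that run; the `BrowserAction` shape and `isLooping` function are hypothetical, and Agentium's exact definition of the threshold may differ.

```typescript
// Hypothetical action shape; a real agent would carry selectors, timestamps, etc.
type BrowserAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; dy: number }
  | { type: "navigate"; url: string }
  | { type: "done"; result: string }
  | { type: "fail"; reason: string };

// Serialize so structurally identical actions compare equal
// (assumes actions are built with a consistent key order).
const actionKey = (a: BrowserAction): string => JSON.stringify(a);

// True when the last `maxRepeats` actions are all identical.
function isLooping(history: BrowserAction[], maxRepeats: number): boolean {
  if (maxRepeats < 2 || history.length < maxRepeats) return false;
  const recent = history.slice(-maxRepeats).map(actionKey);
  return recent.every((k) => k === recent[0]);
}

const clickSubmit: BrowserAction = { type: "click", x: 120, y: 340 };
console.log(isLooping([clickSubmit, clickSubmit, clickSubmit], 3)); // true
```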
The BrowserAgent supports:
- Stealth mode — patches navigator.webdriver, WebGL, and plugins to avoid bot detection.
- Humanize mode — random delays, mouse movement curves, typing variation.
- Credential vault — secrets never reach the LLM; only {{placeholders}} are used.
- Video recording — Playwright-native recording of browser sessions.
- Parallel browsing — multiple pages/tabs via BrowserProvider.
- Cookie persistence — save and restore storageState across runs.
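The credential-vault idea amounts to late substitution: the model only ever emits text containing {{placeholders}}, and the executor swaps in the real secret just before typing, so secrets never enter prompts or the action history. The following is a minimal sketch; the `resolvePlaceholders` function, the `Vault` type, and the placeholder names are hypothetical, not Agentium's CredentialVault API.

```typescript
// Hypothetical vault: placeholder name -> secret. In practice secrets would come
// from environment variables or a secrets manager, never from the model.
type Vault = ReadonlyMap<string, string>;

// Replace every {{name}} with its secret. Unknown names throw instead of
// typing the literal placeholder into a form.
function resolvePlaceholders(text: string, vault: Vault): string {
  return text.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (_match, key: string) => {
    const secret = vault.get(key);
    if (secret === undefined) throw new Error(`No credential for placeholder "${key}"`);
    return secret;
  });
}

const vault: Vault = new Map([
  ["github_user", "octocat"],
  ["github_password", "s3cret!"],
]);

// The model's output (safe to log) versus what actually gets typed.
const modelText = "{{github_password}}";
const typed = resolvePlaceholders(modelText, vault);
```

Using a replacer function rather than a replacement string matters here: the returned value is inserted literally, so characters such as `$` in a password are not interpreted as replacement patterns.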
Event System
All agents emit typed events via the EventBus. This enables logging, analytics, transport integration, and custom middleware without coupling.
| Event | Emitted by |
|---|---|
| run.start, run.complete, run.error | Agent |
| run.stream.chunk | Agent (streaming) |
| tool.call, tool.result, tool.error | Tool Executor |
| tool.approval.request, tool.approval.response | Approval Manager |
| voice.session.start, voice.session.end | VoiceAgent |
| voice.tool.call, voice.tool.result | VoiceSession |
| browser.step, browser.action, browser.done, browser.error | BrowserAgent |
| memory.extract, memory.stored, memory.error | MemoryManager |
| skill.loaded, skill.learned | SkillManager |
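A typed event bus can be built from a map of event names to payload types, so subscribers receive checked payloads and unknown event names fail to compile. The following is a sketch of that idea, not Agentium's EventBus: the `AgentEvents` payload fields are hypothetical, and only three event names from the table are included.

```typescript
// Hypothetical payloads for a few of the events listed above.
interface AgentEvents {
  "run.start": { runId: string; agentName: string };
  "run.complete": { runId: string; output: string };
  "tool.call": { runId: string; toolName: string; args: unknown };
}

type Handler<T> = (payload: T) => void;

class TypedEventBus<Events> {
  // Stored untyped internally; the public on/emit signatures enforce the types.
  private readonly handlers = new Map<keyof Events, Handler<any>[]>();

  // Subscribe to one event name; returns an unsubscribe function.
  on<K extends keyof Events>(name: K, handler: Handler<Events[K]>): () => void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
    return () => {
      const i = list.indexOf(handler);
      if (i >= 0) list.splice(i, 1);
    };
  }

  emit<K extends keyof Events>(name: K, payload: Events[K]): void {
    // Iterate over a copy so handlers that unsubscribe mid-emit do not skip others.
    for (const h of [...(this.handlers.get(name) ?? [])]) h(payload);
  }
}

const bus = new TypedEventBus<AgentEvents>();
const seen: string[] = [];
const off = bus.on("tool.call", (e) => {
  seen.push(e.toolName);
});
bus.emit("tool.call", { runId: "r1", toolName: "search", args: { q: "docs" } });
off(); // unsubscribe: the next emit reaches no handlers
bus.emit("tool.call", { runId: "r2", toolName: "search", args: {} });
// seen is now ["search"]
```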
Memory Architecture
Agentium provides a unified memory system through MemoryManager. A single memory config works identically across Agent, VoiceAgent, and BrowserAgent.
| Store | Scope | Default | Purpose |
|---|---|---|---|
| Sessions | Per-session | ON | Message history, auto-trimmed by maxMessages or maxTokens. |
| Summaries | Per-session | ON | LLM-generated summaries of overflow messages for long-term context. |
| User Facts | Per-user, cross-session | OFF | Extracted facts — “prefers dark mode”, “lives in Mumbai”. |
| User Profile | Per-user, cross-session | OFF | Structured data — name, role, company, timezone. |
| Entity Memory | Global / per-namespace | OFF | Companies, people, projects with facts, events, relationships. |
| Decision Log | Per-agent | OFF | Audit trail of agent decisions — what, why, outcome. |
| Learned Knowledge | Global (vector-backed) | OFF | Reusable insights discovered during conversations. |
All stores share a single StorageDriver (InMemory, SQLite, PostgreSQL, MongoDB). All extraction is non-blocking (fire-and-forget).
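Because every store shares one driver, moving from in-memory storage to SQLite, PostgreSQL, or MongoDB means supplying another implementation of the same contract. The sketch below assumes a hypothetical key-value `StorageDriver` interface with namespaced keys (the real driver interface is not shown on this page). It also shows the fire-and-forget pattern used for extraction, with a trivial string filter standing in for LLM-based fact extraction.

```typescript
// Hypothetical minimal driver contract; real drivers would also support queries.
interface StorageDriver {
  get<T>(namespace: string, key: string): Promise<T | undefined>;
  set<T>(namespace: string, key: string, value: T): Promise<void>;
}

class InMemoryDriver implements StorageDriver {
  private readonly data = new Map<string, unknown>();

  async get<T>(namespace: string, key: string): Promise<T | undefined> {
    return this.data.get(`${namespace}:${key}`) as T | undefined;
  }

  async set<T>(namespace: string, key: string, value: T): Promise<void> {
    this.data.set(`${namespace}:${key}`, value);
  }
}

// Different stores share one driver by writing to different namespaces.
async function saveTurn(driver: StorageDriver, sessionId: string, messages: string[]): Promise<void> {
  await driver.set("sessions", sessionId, messages);
}

// Fire-and-forget: start the write without awaiting it and log failures instead
// of propagating them, so the caller's response is never delayed or broken.
function extractFactsInBackground(driver: StorageDriver, userId: string, transcript: string[]): void {
  const facts = transcript.filter((line) => line.startsWith("I prefer")); // stand-in for LLM extraction
  void driver.set("userFacts", userId, facts).catch((err) => console.error("extraction failed", err));
}
```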
Skills Architecture
Skills are pre-packaged tool bundles loaded from local directories, npm packages, or remote URLs. The SkillManager orchestrates loading and provides lazy initialization (loaded on first run, not at construction).
| Feature | Description |
|---|---|
| Pre-packaged Skills | Local, npm, or remote tool bundles with manifests. |
| Learned Skills | Agent-saved multi-step tool call patterns for replay. |
| Lazy Loading | Skills loaded on first run(), not at construction. |
| Instruction Injection | Skill instructions auto-injected into system prompt. |
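Lazy loading moves the cost of resolving skills (reading a directory, importing an npm package, fetching a remote manifest) from the constructor to the first run. The following is a sketch of that pattern, assuming a hypothetical `SkillLoader` function and `Skill` shape. The load promise is cached so that concurrent first runs trigger a single load, and instruction injection is shown as simple prompt concatenation.

```typescript
// Hypothetical skill shape: instructions for the system prompt plus tool names.
interface Skill {
  name: string;
  instructions: string;
  tools: string[];
}

type SkillLoader = () => Promise<Skill[]>;

class LazySkills {
  private loading?: Promise<Skill[]>;
  loadCount = 0;

  // The constructor only stores the loader; nothing is fetched yet.
  constructor(private readonly loader: SkillLoader) {}

  // The first call starts the load; later or concurrent calls reuse the same promise.
  // Note: a failed load stays cached here; a real implementation might reset it to retry.
  get(): Promise<Skill[]> {
    if (!this.loading) {
      this.loadCount++;
      this.loading = this.loader();
    }
    return this.loading;
  }

  // Instruction injection: append each skill's instructions to the base prompt.
  async buildSystemPrompt(base: string): Promise<string> {
    const skills = await this.get();
    return [base, ...skills.map((s) => `## ${s.name}\n${s.instructions}`)].join("\n\n");
  }
}
```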
Registry & Auto-Discovery
Agentium includes a global Registry singleton. Every Agent, Team, and Workflow automatically registers itself on construction (unless register: false is set).
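```typescript
import { Agent, openai, registry } from "@agentium/core";

// Constructing an agent registers it in the global registry automatically.
new Agent({ name: "bot", model: openai("gpt-4o") });

console.log(registry.list());
// { agents: ["bot"], teams: [], workflows: [] }
```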
The Express router and Socket.IO gateway read from this registry at request time. Agents created after the transport layer starts become available immediately — no restart or re-wiring needed.
| Feature | Description |
|---|---|
| Auto-register | Instances register on construction. Opt out with register: false. |
| kind discriminant | Each class has a readonly kind ("agent", "team", "workflow") for reliable runtime type identification. |
| Dynamic routing | Transport routes resolve by name from the registry on each request. |
| List endpoints | GET /agents, GET /teams, GET /workflows return metadata. GET /registry returns all names. |
| Custom registries | Pass a custom Registry instance to createAgentRouter() or createAgentGateway() for isolated scoping. |
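The mechanics behind auto-registration and dynamic routing can be illustrated with a small registry keyed by kind and name: constructors register themselves unless opted out, and request handlers look entities up at call time, so entities created later are found without re-wiring. This is a hypothetical sketch; `RegistrySketch`, `ToyAgent`, and `handleRequest` are illustrative names, not Agentium's Registry implementation.

```typescript
type Kind = "agent" | "team" | "workflow";

interface Registrable {
  readonly kind: Kind;
  readonly name: string;
}

// Hypothetical registry: one name-keyed map per kind.
class RegistrySketch {
  private readonly entries: Record<Kind, Map<string, Registrable>> = {
    agent: new Map(),
    team: new Map(),
    workflow: new Map(),
  };

  register(item: Registrable): void {
    this.entries[item.kind].set(item.name, item);
  }

  get(kind: Kind, name: string): Registrable | undefined {
    return this.entries[kind].get(name);
  }

  list(): { agents: string[]; teams: string[]; workflows: string[] } {
    return {
      agents: Array.from(this.entries.agent.keys()),
      teams: Array.from(this.entries.team.keys()),
      workflows: Array.from(this.entries.workflow.keys()),
    };
  }
}

const globalRegistry = new RegistrySketch();

// Auto-registration in the constructor, with an opt-out flag.
class ToyAgent implements Registrable {
  readonly kind = "agent" as const;
  constructor(readonly name: string, opts: { register?: boolean } = {}) {
    if (opts.register !== false) globalRegistry.register(this);
  }
}

// Resolves by name on every call, so agents created after startup are routable.
function handleRequest(agentName: string): string {
  const entry = globalRegistry.get("agent", agentName);
  return entry ? `routed to ${entry.name}` : "404";
}

new ToyAgent("bot");
console.log(handleRequest("bot")); // routed to bot
```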
Optimizations
| Optimization | Impact |
|---|---|
| Tool schema caching | Tool definitions are converted to JSON Schema once at construction, not on every LLM roundtrip. |
| Minimal schema serialization | Strips verbose JSON Schema fields ($schema, additionalProperties) to reduce token overhead. |
| Strict mode | Optional strict: true on tools enables OpenAI Structured Outputs for guaranteed valid JSON. |
| Session read deduplication | Session data is loaded once per run/stream call and reused for both context and history. |
| Non-blocking memory extraction | All memory extraction (facts, profile, entities, learnings) runs in background without blocking. |
| Token-based history trimming | maxContextTokens auto-trims history (oldest first) to prevent context window overflow. |
| Automatic retry | Transient LLM API failures (429, 5xx, network errors) are retried with exponential backoff. |
| Streaming usage tracking | Token usage is accurately tracked in both run and stream modes. |
| Sandbox subprocess pooling | Sandboxed tools run in isolated child processes without affecting the main event loop. |
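Token-based trimming can be sketched as dropping the oldest messages until an estimated token count fits the budget. This sketch assumes a crude estimate of about four characters per token (real tokenizers vary by model) and always keeps the system message and the newest message; the `Msg` type and `trimHistory` function are illustrative, not Agentium's algorithm.

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (~4 characters per token); a real tokenizer is more accurate.
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

// Drops the oldest non-system messages until the estimate fits maxContextTokens.
// The system message and the most recent message are always kept, so the result
// can still exceed the budget if those two alone are too large.
function trimHistory(messages: Msg[], maxContextTokens: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  const total = (ms: Msg[]) => ms.reduce((sum, m) => sum + estimateTokens(m), 0);

  while (rest.length > 1 && total(system) + total(rest) > maxContextTokens) {
    rest.shift(); // oldest first
  }
  return [...system, ...rest];
}

const sampleHistory: Msg[] = [
  { role: "system", content: "x".repeat(40) }, // ~10 tokens
  { role: "user", content: "a".repeat(80) }, // ~20 tokens
  { role: "assistant", content: "b".repeat(80) }, // ~20 tokens
  { role: "user", content: "c".repeat(40) }, // ~10 tokens
];

// Budget 45: the 60-token history drops the oldest user message (down to 40).
console.log(trimHistory(sampleHistory, 45).map((m) => m.role)); // system, assistant, user
```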
Core Design Principles
- Zero Meta-Framework Dependency — No Next.js, Remix, or framework-specific runtime. Use Agentium with any Node.js server, or run it headless.
- Optional Peer Dependencies — Providers (openai, anthropic, etc.) are peer dependencies and are lazy-loaded, so you only bundle what you use.
- Event-Driven — EventBus emits lifecycle events. Subscribe for logging, analytics, or custom middleware.
- Pluggable Everything — Storage, models, vector stores, and transport are all swappable. Configure once, change later without rewriting logic.
- Safety by Default — Sandbox execution and human-in-the-loop approval are opt-in per tool or agent-wide. Guardrails validate input and output.
- Open Protocol Support — MCP for tool integration and A2A for agent interoperability. Connect to the broader AI ecosystem without vendor lock-in.
- Production Resilient — Automatic retry with exponential backoff, token-based context trimming, and non-blocking background operations ensure reliability at scale.