System Architecture

The big picture (in plain terms)

Agentium is built like a well-run company, where each department has one job and they all work together:

The AI model (GPT, Claude, …) is the smart new hire who can think and write.
The memory is the filing cabinet that remembers every customer.
The tools are the systems the hire can actually operate — your database, your APIs.
The safety layer is the manager who approves anything risky before it happens.
The transport layer is the front desk — how the outside world (a website, a phone call) reaches the team.

The key design idea: each part is independent and swappable. Don’t like one database? Swap it. Want a different AI model? One line. Need voice instead of chat? Same brain, different front desk. Nothing is welded together.

For non-engineers: the one thing worth remembering is modularity. Agentium isn’t one giant block — it’s separate pieces that snap together. That’s why a team can start with a simple chatbot and grow it into a voice product or an enterprise SaaS without rebuilding from scratch.

The rest of this page is the engineering detail — how those pieces are packaged and how data flows between them.

Monorepo Structure

Agentium is organized as a monorepo with four primary packages. Each package has a focused responsibility and can be used independently or together.

Package Overview

Package	Purpose
@agentium/core	Agents, models, tools, memory, storage, voice agents, vector stores, MCP client, A2A client
@agentium/transport	Express REST API, Socket.IO gateway, Voice gateway, Browser gateway, A2A server
@agentium/queue	BullMQ background job processing
@agentium/browser	Vision-based autonomous browser automation with Playwright

Layered Architecture

Agentium is built in layers. Higher layers depend on lower ones, and the infrastructure at the bottom is pluggable.

SDK Layer — Agent, Team, Workflow, VoiceAgent, BrowserAgent. The primary API surface for defining behavior, orchestrating agents, and running workflows.
Engine Layer — LLM Loop, Tool Executor, MemoryManager (sessions, summaries, user facts, user profile, entities, decisions, learnings), SkillManager. Core execution logic with automatic retry, tool caching, token-based history trimming, reasoning, and cross-session personalization.
Safety Layer — Sandbox (isolated subprocess execution with timeout and memory limits), Approval Manager (human-in-the-loop gating before tool execution), Guardrails (input/output validation).
Model Abstraction — ModelProvider interface and adapters for text models. RealtimeProvider interface for voice/streaming models. Factory functions: openai(), anthropic(), google(), ollama(), vertex(), openaiRealtime(), googleLive().
Protocol Integration — MCP Client for consuming external tools, A2A Client for calling remote agents.
Infrastructure — Storage (in-memory, SQLite, PostgreSQL, MongoDB), Vector Stores, and Embeddings. All pluggable.
Registry & Auto-Discovery — Agents, Teams, and Workflows auto-register into a global Registry on construction. Transport layers read from the registry dynamically, so entities created at any time are immediately available over HTTP and WebSocket without restart or re-wiring.
Transport (Optional) — Express REST, Socket.IO WebSocket, Voice Gateway (real-time audio streaming), Browser Gateway (live browser observation), and A2A Server. Uses the Registry for live auto-discovery of agents, teams, and workflows.
Queue (Optional) — BullMQ workers for background job processing.
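To make the Model Abstraction layer concrete, here is a minimal sketch of how a provider interface keeps agent logic independent of any one vendor. Everything in it (`ModelProviderSketch`, `echoProvider`, `upperProvider`, `runOnce`) is a hypothetical stand-in, and the real `ModelProvider` interface in `@agentium/core` may look different. The sketch is synchronous for brevity; a real adapter would call a vendor SDK asynchronously.

```typescript
// Hypothetical shapes for illustration only; not the actual Agentium types.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ModelProviderSketch {
  readonly name: string;
  generate(messages: ChatMessage[]): string;
}

// Stand-in adapter that returns the last message unchanged.
function echoProvider(): ModelProviderSketch {
  return {
    name: "echo",
    generate: (messages) => messages[messages.length - 1]?.content ?? "",
  };
}

// Stand-in adapter that returns the last message in upper case.
function upperProvider(): ModelProviderSketch {
  return {
    name: "upper",
    generate: (messages) =>
      (messages[messages.length - 1]?.content ?? "").toUpperCase(),
  };
}

// Agent logic depends only on the interface, so swapping providers
// changes one argument and nothing else.
function runOnce(model: ModelProviderSketch, input: string): string {
  return model.generate([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: input },
  ]);
}
```

Calling `runOnce(echoProvider(), "hi")` and `runOnce(upperProvider(), "hi")` exercises the same agent code against two different providers, which is the property the factory functions listed above are meant to give you.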

Data Flow — Text Agent

A typical text agent request flows through the system as follows:

User Input
    │
Agent.run() / Agent.stream()
    │
buildMessages (history + system instructions + MemoryManager.buildContext() + skill instructions)
    │
LLM Loop (with retry)
    │
ModelProvider (OpenAI / Anthropic / Google / Ollama / Vertex)
    │
Response (text / tool calls)
    │
Tool Executor (if tool calls)
  ├── Approval check (if requiresApproval is set)
  ├── Sandbox execution (if sandbox is enabled)
  ├── Local tools (with optional caching)
  ├── MCP tools (external servers)
  └── A2A tools (remote agents)
    │
Loop until final response
    │
MemoryManager.appendMessages() → auto-summarize overflow
    │
MemoryManager.afterRun() → fire-and-forget extraction
  (user facts, user profile, entities, learnings)
    │
Output to caller

Detailed Flow

User Input — A string or multi-modal content (text, images, files).
Agent — Receives input, loads session history from MemoryManager, injects memory context and skill instructions into the system prompt.
buildMessages — Constructs the message array: system prompt (with summaries, user facts, user profile, entities, decisions, learnings, skill instructions), session history (auto-trimmed if maxTokens is set), current user message.
LLM Loop — Sends messages to the model with automatic retry on transient failures (429, 5xx, network errors).
ModelProvider — Translates to the provider API format.
Response — Either text or tool calls.
Tool Executor — If tool calls:
- Checks human approval if requiresApproval is set on the tool or agent.
- Runs the tool in a sandboxed subprocess if sandbox is enabled.
- Executes the tool, appends results, and loops back to the model.
MemoryManager.appendMessages — Persists the new turn to session storage and auto-summarizes overflow.
MemoryManager.afterRun — Asynchronously extracts user facts, user profile, entities, and learnings from the conversation for future personalization.
Output — Returns or streams the final response to the caller.
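The loop described above (model reply, optional tool calls, approval check, loop back) can be sketched in a few lines. This is an illustration under stated assumptions, not Agentium's implementation: `runLoop`, `ToolSketch`, `ModelReply`, the `approve` callback, and `maxIterations` are all hypothetical names, and the sketch is synchronous where the real loop would await the model, the tools, and the approver.

```typescript
// All names here are hypothetical stand-ins, not Agentium's actual API.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelReply = { text: string } | { toolCalls: ToolCall[] };
type Turn = { role: "user" | "assistant" | "tool"; content: string };

interface ToolSketch {
  requiresApproval?: boolean;
  execute(args: Record<string, unknown>): string;
}

function runLoop(
  input: string,
  model: (history: Turn[]) => ModelReply,
  tools: Record<string, ToolSketch>,
  approve: (call: ToolCall) => boolean,
  maxIterations = 5,
): string {
  const history: Turn[] = [{ role: "user", content: input }];

  for (let i = 0; i < maxIterations; i++) {
    const reply = model(history);

    // A text reply ends the loop.
    if ("text" in reply) {
      history.push({ role: "assistant", content: reply.text });
      return reply.text;
    }

    // Otherwise run each requested tool and feed the result back to the model.
    for (const call of reply.toolCalls) {
      const tool = tools[call.tool];
      let result: string;
      if (!tool) {
        result = `error: unknown tool ${call.tool}`;
      } else if (tool.requiresApproval && !approve(call)) {
        // The approval gate sits before execution, so a rejected call never runs.
        result = "error: rejected by approver";
      } else {
        result = tool.execute(call.args);
      }
      history.push({ role: "tool", content: result });
    }
  }

  throw new Error("maxIterations reached without a final response");
}
```

Returning rejections and unknown tools to the model as tool results, rather than throwing, is a design choice made for this sketch: it lets the model explain the failure to the user instead of aborting the run.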

Data Flow — Voice Agent

Audio Input (WebSocket / Socket.IO)
    │
VoiceAgent.connect()
    │
RealtimeProvider (OpenAI Realtime / Google Live)
    │
Bidirectional audio stream
    │
Tool calls (if any) → Tool Executor
    │
MemoryManager.appendMessages() (session persistence)
    │
MemoryManager.afterRun() (non-blocking extraction)
    │
Audio Output → Client

The VoiceAgent manages:

VoiceSession — wraps the realtime provider connection, routes tool calls, emits events.
Session persistence — conversation history saved via MemoryManager, restored on reconnect.
Memory extraction — user facts, profile, entities, and learnings extracted from voice transcripts (non-blocking).

Data Flow — Browser Agent

Task (string)
    │
BrowserAgent.run()
    │
Launch Playwright (with stealth config + humanize settings)
    │
Screenshot → ModelProvider (vision)
    │
LLM decides action (click, type, scroll, navigate, done, fail)
    │
BrowserProvider executes action
  ├── CredentialVault resolves {{placeholders}} for type actions
  ├── DOM extraction (optional, for hybrid vision+DOM approach)
  └── Loop detection (maxRepeats threshold)
    │
Screenshot → next iteration
    │
Loop until "done" or "fail" or maxSteps reached
    │
Close browser (with optional cookie/auth persistence)
    │
Output result + action history

The BrowserAgent supports:

Stealth mode — patches navigator.webdriver, WebGL, plugins to avoid bot detection.
Humanize mode — random delays, mouse movement curves, typing variation.
Credential vault — secrets never reach the LLM; only {{placeholders}} are used.
Video recording — Playwright-native recording of browser sessions.
Parallel browsing — multiple pages/tabs via BrowserProvider.
Cookie persistence — save and restore storageState across runs.
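Two of the mechanisms above are easy to picture as small pure functions: placeholder resolution for the credential vault, and loop detection against a `maxRepeats` threshold. The sketch below is an assumption about how such helpers could work; `resolvePlaceholders` and `isStuck` are hypothetical names, and the rule "the last `maxRepeats` actions are identical" is one plausible reading of loop detection, not a description of Agentium's exact logic.

```typescript
// Hypothetical helper: swaps {{name}} placeholders for secrets at execution
// time, so the model only ever sees the placeholder text.
function resolvePlaceholders(
  text: string,
  vault: Record<string, string>,
): string {
  return text.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(vault, key)) {
      throw new Error(`No credential named "${key}"`);
    }
    return vault[key];
  });
}

// Hypothetical loop detector: reports true when the most recent
// `maxRepeats` actions are all the same.
function isStuck(actions: string[], maxRepeats: number): boolean {
  if (maxRepeats < 1 || actions.length < maxRepeats) return false;
  const tail = actions.slice(-maxRepeats);
  return tail.every((action) => action === tail[0]);
}

// The model plans a "type" action using a placeholder...
const plannedText = "{{password}}";
// ...and the executor resolves it just before typing into the page.
const typedText = resolvePlaceholders(plannedText, { password: "s3cret" });
```

Because resolution happens after the model has chosen its action, the secret appears only in `typedText`, never in the prompt or in the action history sent back to the model.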

Event System

All agents emit typed events via the EventBus. Logging, analytics, transport integration, and custom middleware can subscribe to these events without being coupled to agent code.

Event	Emitted by
`run.start`, `run.complete`, `run.error`	Agent
`run.stream.chunk`	Agent (streaming)
`tool.call`, `tool.result`, `tool.error`	Tool Executor
`tool.approval.request`, `tool.approval.response`	Approval Manager
`voice.session.start`, `voice.session.end`	VoiceAgent
`voice.tool.call`, `voice.tool.result`	VoiceSession
`browser.step`, `browser.action`, `browser.done`, `browser.error`	BrowserAgent
`memory.extract`, `memory.stored`, `memory.error`	MemoryManager
`skill.loaded`, `skill.learned`	SkillManager
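A typed event bus of the kind described here can be sketched as follows. The event names come from the table above, but the payload shapes, the `EventBusSketch` class, and its `on`/`emit` methods are assumptions made for illustration; the actual EventBus API may differ.

```typescript
// Payload shapes are assumptions; only the event names come from the table.
interface EventMap {
  "run.start": { runId: string };
  "run.complete": { runId: string; output: string };
  "tool.call": { tool: string };
}

class EventBusSketch {
  // Handlers are stored loosely; the public methods keep call sites typed.
  private handlers: Record<string, Array<(payload: unknown) => void>> = {};

  // Subscribes a handler and returns a function that unsubscribes it.
  on<K extends keyof EventMap>(
    event: K,
    handler: (payload: EventMap[K]) => void,
  ): () => void {
    const list = this.handlers[event] ?? (this.handlers[event] = []);
    const stored = handler as (payload: unknown) => void;
    list.push(stored);
    return () => {
      const index = list.indexOf(stored);
      if (index >= 0) list.splice(index, 1);
    };
  }

  emit<K extends keyof EventMap>(event: K, payload: EventMap[K]): void {
    // Iterate over a copy so a handler that unsubscribes mid-emit
    // does not cause a neighbour to be skipped.
    (this.handlers[event] ?? []).slice().forEach((handler) => handler(payload));
  }
}
```

With this shape, `bus.on("run.complete", (p) => console.log(p.output))` type-checks, while reading `p.tool` inside the same handler is a compile-time error, which is the practical benefit of typed events.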

Memory Architecture

Agentium provides a unified memory system through MemoryManager. A single memory config works identically across Agent, VoiceAgent, and BrowserAgent.

Store	Scope	Default	Purpose
Sessions	Per-session	ON	Message history, auto-trimmed by `maxMessages` or `maxTokens`.
Summaries	Per-session	ON	LLM-generated summaries of overflow messages for long-term context.
User Facts	Per-user, cross-session	OFF	Extracted facts — “prefers dark mode”, “lives in Mumbai”.
User Profile	Per-user, cross-session	OFF	Structured data — name, role, company, timezone.
Entity Memory	Global / per-namespace	OFF	Companies, people, projects with facts, events, relationships.
Decision Log	Per-agent	OFF	Audit trail of agent decisions — what, why, outcome.
Learned Knowledge	Global (vector-backed)	OFF	Reusable insights discovered during conversations.

All stores share a single StorageDriver (InMemory, SQLite, PostgreSQL, MongoDB). All extraction is non-blocking (fire-and-forget).
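The interaction between the Sessions and Summaries stores (trim history to a token budget, hand the overflow to a summarizer) can be sketched like this. It is a simplified illustration: `trimHistory` and `estimateTokens` are hypothetical names, and the four-characters-per-token estimate is a rough heuristic chosen for the example, not the tokenizer Agentium uses.

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic (about 4 characters per token). A real tokenizer
// would give more accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walks backwards from the newest message and keeps as many messages as fit
// in the budget. Everything older is returned as overflow, which is what a
// summarizer would condense.
function trimHistory(
  history: StoredMessage[],
  maxTokens: number,
): { kept: StoredMessage[]; overflow: StoredMessage[] } {
  let used = 0;
  let start = history.length;
  while (start > 0) {
    const cost = estimateTokens(history[start - 1].content);
    if (used + cost > maxTokens) break;
    used += cost;
    start--;
  }
  return { kept: history.slice(start), overflow: history.slice(0, start) };
}
```

Trimming from the oldest end keeps the most recent turns verbatim, while the overflow survives only in summarized form.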

Skills Architecture

Skills are pre-packaged tool bundles loaded from local directories, npm packages, or remote URLs. The SkillManager orchestrates loading and provides lazy initialization (loaded on first run, not at construction).

Feature	Description
Pre-packaged Skills	Local, npm, or remote tool bundles with manifests.
Learned Skills	Agent-saved multi-step tool call patterns for replay.
Lazy Loading	Skills loaded on first `run()`, not at construction.
Instruction Injection	Skill instructions auto-injected into system prompt.
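Lazy loading and instruction injection fit together naturally, as the sketch below tries to show. `LazySkillSet`, `SkillSketch`, and the prompt layout are hypothetical and exist only for this example; the real SkillManager presumably loads skills asynchronously, whereas this sketch is synchronous to stay short.

```typescript
// Hypothetical shapes; not the actual SkillManager API.
interface SkillSketch {
  name: string;
  instructions: string;
}

class LazySkillSet {
  private skills: SkillSketch[] | null = null;
  // Counts how often the loader actually ran (exposed for demonstration).
  loadCount = 0;

  // Construction only stores the loader; nothing is loaded yet.
  constructor(private readonly loader: () => SkillSketch[]) {}

  // Called at the start of each run. Only the first call pays the loading cost.
  ensureLoaded(): SkillSketch[] {
    if (this.skills === null) {
      this.skills = this.loader();
      this.loadCount++;
    }
    return this.skills;
  }

  // Appends each skill's instructions to the base system prompt.
  buildSystemPrompt(base: string): string {
    const sections = this.ensureLoaded().map(
      (skill) => `## ${skill.name}\n${skill.instructions}`,
    );
    return [base, ...sections].join("\n\n");
  }
}
```

Deferring the load keeps construction cheap and free of I/O, at the cost of a slower first run.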

Registry & Auto-Discovery

Agentium includes a global Registry singleton. Every Agent, Team, and Workflow automatically registers itself on construction (unless register: false is set).

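import { Agent, openai, registry } from "@agentium/core";

// Constructing an Agent registers it in the global registry under its name.
new Agent({ name: "bot", model: openai("gpt-4o") });

// List the names of everything registered so far.
registry.list();
// { agents: ["bot"], teams: [], workflows: [] }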

The Express router and Socket.IO gateway read from this registry at request time. Agents created after the transport layer starts become available immediately — no restart or re-wiring needed.

Feature	Description
Auto-register	Instances register on construction. Opt out with `register: false`.
`kind` discriminant	Each class has a `readonly kind` (`"agent"`, `"team"`, `"workflow"`) for reliable runtime type identification.
Dynamic routing	Transport routes resolve by name from the registry on each request.
List endpoints	`GET /agents`, `GET /teams`, `GET /workflows` return metadata. `GET /registry` returns all names.
Custom registries	Pass a custom `Registry` instance to `createAgentRouter()` or `createAgentGateway()` for isolated scoping.
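The mechanism behind auto-registration and dynamic routing can be sketched without any framework code. The classes below (`RegistrySketch`, `AgentSketch`) and the `handleRequest` function are hypothetical stand-ins, and the `registry` constructor option is an assumption added so the example can use an isolated registry; the real classes in `@agentium/core` and the real router are more involved.

```typescript
type Kind = "agent" | "team" | "workflow";

interface Registrable {
  readonly kind: Kind;
  readonly name: string;
}

class RegistrySketch {
  private entries: Record<Kind, Record<string, Registrable>> = {
    agent: {},
    team: {},
    workflow: {},
  };

  add(entity: Registrable): void {
    this.entries[entity.kind][entity.name] = entity;
  }

  get(kind: Kind, name: string): Registrable | undefined {
    const bucket = this.entries[kind];
    return Object.prototype.hasOwnProperty.call(bucket, name)
      ? bucket[name]
      : undefined;
  }

  list(): { agents: string[]; teams: string[]; workflows: string[] } {
    return {
      agents: Object.keys(this.entries.agent),
      teams: Object.keys(this.entries.team),
      workflows: Object.keys(this.entries.workflow),
    };
  }
}

const defaultRegistry = new RegistrySketch();

class AgentSketch implements Registrable {
  readonly kind = "agent";
  readonly name: string;

  constructor(options: {
    name: string;
    register?: boolean;
    registry?: RegistrySketch; // assumption: lets the example avoid global state
  }) {
    this.name = options.name;
    // Registration happens on construction unless explicitly disabled.
    if (options.register !== false) {
      (options.registry ?? defaultRegistry).add(this);
    }
  }
}

// A transport handler looks the name up on every request instead of
// capturing a fixed list at startup, so late-created agents are routable.
function handleRequest(registry: RegistrySketch, name: string): string {
  return registry.get("agent", name) ? `200 ${name}` : `404 ${name}`;
}
```

Because `handleRequest` consults the registry at call time, a request for `"bot"` fails before the agent is constructed and succeeds immediately afterwards, with no change to the handler.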

Performance Optimizations

Optimization	Impact
Tool schema caching	Tool definitions are converted to JSON Schema once at construction, not on every LLM roundtrip.
Minimal schema serialization	Strips verbose JSON Schema fields (`$schema`, `additionalProperties`) to reduce token overhead.
Strict mode	Optional `strict: true` on tools enables OpenAI Structured Outputs for guaranteed valid JSON.
Session read deduplication	Session data is loaded once per run/stream call and reused for both context and history.
Non-blocking memory extraction	All memory extraction (facts, profile, entities, learnings) runs in background without blocking.
Token-based history trimming	Setting `maxTokens` auto-trims history (oldest first) to prevent context window overflow.
Automatic retry	Transient LLM API failures (429, 5xx, network errors) are retried with exponential backoff.
Streaming usage tracking	Token usage is accurately tracked in both run and stream modes.
Sandbox subprocess pooling	Sandboxed tools run in isolated child processes without affecting the main event loop.
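The automatic-retry row can be illustrated with a small helper. This is a sketch under assumptions rather than Agentium's retry code: `withRetry`, `isTransient`, and `backoffDelay` are hypothetical names, the 250 ms base delay and 8 s cap are arbitrary example values, and treating any error without a numeric `status` as a network failure is a simplification.

```typescript
// Decides whether an error is worth retrying: 429, any 5xx, or (as a
// simplifying assumption) any error that carries no numeric HTTP status.
function isTransient(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  if (typeof status !== "number") return true;
  return status === 429 || (status >= 500 && status < 600);
}

// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  // Injectable so callers (and tests) can avoid real timers.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on permanent errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isTransient(error)) throw error;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

A production version would usually add random jitter to the delay and honor a `Retry-After` header when the provider sends one; both are left out here to keep the sketch focused on the backoff itself.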

Core Design Principles

Zero Meta-Framework Dependency — No Next.js, Remix, or framework-specific runtime. Use Agentium with any Node.js server or headless.
Optional Peer Dependencies — Providers (openai, anthropic, etc.) are peer dependencies. Lazy-loaded so you only bundle what you use.
Event-Driven — EventBus emits lifecycle events. Subscribe for logging, analytics, or custom middleware.
Pluggable Everything — Storage, models, vector stores, and transport are all swappable. Configure once, change later without rewriting logic.
Safety by Default — Sandbox execution and human-in-the-loop approval are opt-in per tool or agent-wide. Guardrails validate input and output.
Open Protocol Support — MCP for tool integration and A2A for agent interoperability. Connect to the broader AI ecosystem without vendor lock-in.
Production Resilient — Automatic retry with exponential backoff, token-based context trimming, and non-blocking background operations ensure reliability at scale.