Summaries
In plain terms
A summary is a short recap of the older parts of a conversation. When a chat gets long, keeping every word is expensive and eventually overflows the model’s limit, so the oldest messages get compressed into a sentence or two and the verbatim text is removed from the active thread.
The analogy: turning ten pages of meeting notes into a single sticky note. You lose the word-for-word detail but keep the gist.
Summaries are on by default, so you get them for free with any storage backend. They’re what lets a conversation continue for hours without the prompt ballooning.
When to use it
- Long-running conversations — support cases, ongoing projects, multi-day threads. The agent remembers the arc of the discussion without carrying every message.
- Cost control on chatty sessions — instead of re-sending 200 messages each turn, the agent sends a compact recap plus the recent messages.
When to turn it off
- Strictly short conversations that never exceed `maxMessages`. There’s nothing to summarize, so it’s harmless either way, but you can disable it to skip the (rare) summarization LLM call.
- When you need verbatim history for audit/legal reasons — but note: summaries don’t replace the raw transcript, they supplement it. The full session is still stored; summaries are an additional compact layer.
Configuration
| Property | Type | Default | What it controls |
|---|---|---|---|
| `maxCount` | number | 10 | How many recap snippets to keep per conversation. Oldest pruned first |
| `maxTokens` | number | 2000 | Token budget for the summary text injected into the prompt each run |
- `maxCount`: each time the session overflows `maxMessages`, one summary is created. More summaries means the agent remembers more of the conversation’s history, but old ones eventually drop. Raise it for threads that span days; lower it for short chats.
- `maxTokens`: directly affects cost, since these tokens are sent on every run. Lower it (1000) to save money; raise it (4000) when the conversation is rich and the agent needs more of its history to respond well.
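To make the two settings concrete, here is a minimal TypeScript sketch. It is not Agentium's implementation: the `SummaryConfig` type and the `pruneSummaries` and `worstCaseSummaryTokens` helpers are hypothetical names introduced for illustration. Only `maxCount`, `maxTokens`, and their defaults come from the table above.

```typescript
// Hypothetical shape mirroring the two documented properties (not the library's real type).
interface SummaryConfig {
  maxCount: number;  // how many summaries to keep per conversation
  maxTokens: number; // token budget for summary text on each run
}

// Defaults as listed in the configuration table.
const defaults: SummaryConfig = { maxCount: 10, maxTokens: 2000 };

// Keep only the newest `maxCount` summaries.
// Assumption: the input array is ordered oldest-first.
function pruneSummaries<T>(summaries: T[], maxCount: number): T[] {
  if (maxCount <= 0) return [];
  return summaries.slice(-maxCount);
}

// Upper bound on summary tokens sent across a session:
// at most `maxTokens` of summary text is injected on every run.
function worstCaseSummaryTokens(runs: number, maxTokens: number): number {
  return runs * maxTokens;
}

// Twelve summaries with the default maxCount of 10: the two oldest are pruned.
const twelve = Array.from({ length: 12 }, (_, i) => `s${i + 1}`);
const kept = pruneSummaries(twelve, defaults.maxCount);

// Fifty runs at the default budget versus the lower setting of 1000.
const atDefault = worstCaseSummaryTokens(50, defaults.maxTokens);
const atLower = worstCaseSummaryTokens(50, 1000);
```

The second helper shows why `maxTokens` is the main cost lever: the bound scales linearly with the number of runs, so halving the budget halves the worst-case summary overhead.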
How it works
- A conversation grows past `maxMessages` (see Sessions).
- The oldest messages are handed to a model (use a cheap one via `memory.model`), which writes a short recap.
- The recap is stored as a summary; the original messages are removed from the active thread.
- On future runs, summaries are injected newest-first within the `maxTokens` budget, so the most recent context always survives if space is tight.
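The steps above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the library's code: the `Message`, `Summary`, `Thread`, and `Summarizer` types and the `compact` and `selectSummaries` functions are hypothetical, the summarizer is a stub standing in for the model call, and the sketch assumes that exactly the messages beyond `maxMessages` are summarized and that selection stops at the first summary that does not fit the budget.

```typescript
interface Message { role: "user" | "assistant"; text: string; }
interface Summary { text: string; tokens: number; }
// Summaries are kept oldest-first in this sketch.
interface Thread { messages: Message[]; summaries: Summary[]; }

// Stand-in for the LLM call; a real system would ask a (cheap) model for the recap.
type Summarizer = (messages: Message[]) => Summary;

// Steps 1-3: if the thread exceeds maxMessages, summarize the oldest
// messages and remove them from the active thread.
function compact(thread: Thread, maxMessages: number, summarize: Summarizer): Thread {
  const overflow = thread.messages.length - maxMessages;
  if (overflow <= 0) return thread; // nothing to summarize
  const oldest = thread.messages.slice(0, overflow);
  return {
    messages: thread.messages.slice(overflow),
    summaries: [...thread.summaries, summarize(oldest)],
  };
}

// Step 4: walk summaries newest-first and keep those that fit in maxTokens.
// Assumption: stop at the first summary that would exceed the budget.
function selectSummaries(summaries: Summary[], maxTokens: number): Summary[] {
  const picked: Summary[] = [];
  let used = 0;
  for (const s of [...summaries].reverse()) {
    if (used + s.tokens > maxTokens) break;
    picked.push(s);
    used += s.tokens;
  }
  return picked;
}

// Stub summarizer with a made-up token count, for demonstration only.
const fakeSummarize: Summarizer = (msgs) => ({
  text: `Recap of ${msgs.length} messages`,
  tokens: 4,
});

const thread: Thread = {
  messages: ["a", "b", "c", "d", "e"].map((text) => ({ role: "user" as const, text })),
  summaries: [],
};
const compacted = compact(thread, 3, fakeSummarize);

// With a 2000-token budget, only the newer of these two summaries fits.
const injected = selectSummaries(
  [{ text: "old", tokens: 1500 }, { text: "new", tokens: 1000 }],
  2000,
);
```

Walking the list newest-first is what guarantees that recent context wins when the budget is tight: older summaries are only considered after newer ones have claimed their share.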
Cross-references
- Sessions & History — where overflow comes from
- Context Budget — how summaries compete with other memory for prompt space
- Memory Overview — the full memory lifecycle