Summaries

In plain terms

A summary is a short recap of the older parts of a conversation. When a chat gets long, resending every word is expensive and eventually overflows the model’s context limit, so the oldest messages are compressed into a short recap and their verbatim text is dropped from the prompt.

The analogy: turning ten pages of meeting notes into a single sticky note. You lose the word-for-word detail but keep the gist.

Summaries are on by default: they work with any storage backend and need no extra configuration. They’re what lets a conversation continue for hours without the prompt ballooning.

When to use it

Long-running conversations — support cases, ongoing projects, multi-day threads. The agent remembers the arc of the discussion without carrying every message.
Cost control on chatty sessions — instead of re-sending 200 messages each turn, the agent sends a compact recap plus the recent messages.

It’s on by default, so for most agents you simply leave it alone.

When to turn it off

Strictly short conversations that never exceed maxMessages: summarization only runs on overflow, so leaving it on is harmless. You can still disable it to guarantee that no summarization LLM call is ever made:

memory: { storage, summaries: false }

Needing verbatim history for audit or legal reasons is usually not a reason to turn summaries off. Summaries don’t replace the raw transcript, they supplement it: the full session is still stored, and summaries are an additional compact layer.

Configuration

Property	Type	Default	What it controls
`maxCount`	`number`	`10`	How many recap snippets to keep per conversation. Oldest pruned first
`maxTokens`	`number`	`2000`	Token budget for the summary text injected into the prompt each run

// Default (ON)
memory: { storage }

// Fewer, shorter summaries — leaner prompts, lower cost
memory: { storage, summaries: { maxCount: 5, maxTokens: 1000 } }

// Disable entirely
memory: { storage, summaries: false }
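
The three forms above can be read as one option with defaults. The following is a minimal sketch of that reading, not the library's own code: the `SummariesOption` type and the `resolveSummaries` helper are hypothetical names introduced for illustration, and only the default values (10 and 2000) come from the table.

```typescript
// Hypothetical shape of the `summaries` option, inferred from the examples above.
type SummariesOption = false | { maxCount?: number; maxTokens?: number };

interface ResolvedSummaries {
  maxCount: number;
  maxTokens: number;
}

// Defaults taken from the configuration table.
const SUMMARY_DEFAULTS: ResolvedSummaries = { maxCount: 10, maxTokens: 2000 };

// Returns null when summaries are disabled, otherwise the effective settings.
// (Illustrative helper; the real library may resolve options differently.)
function resolveSummaries(option?: SummariesOption): ResolvedSummaries | null {
  if (option === false) return null; // summaries: false
  if (option === undefined) return { ...SUMMARY_DEFAULTS }; // option omitted: on by default
  return {
    maxCount: option.maxCount ?? SUMMARY_DEFAULTS.maxCount,
    maxTokens: option.maxTokens ?? SUMMARY_DEFAULTS.maxTokens,
  };
}
```

Under this sketch, a partial object such as `{ maxCount: 5 }` keeps the default `maxTokens`.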

Tuning guidance:

maxCount: one summary is created each time the session overflows maxMessages. A higher value lets the agent remember more of the conversation’s history; once the limit is reached, the oldest summaries are dropped. Raise it for threads that span days; lower it for short chats.
maxTokens: directly affects cost, since up to this many tokens are sent on every run. Lower it (for example, to 1000) to save money; raise it (for example, to 4000) when the conversation is rich and the agent needs more of its history to respond well.
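
As a rough worked example of the maxTokens trade-off: the summary budget is an upper bound per run, so the worst-case overhead scales with the number of runs. The 50-run conversation and the `maxSummaryTokens` helper below are illustrative assumptions, not figures or functions from the library.

```typescript
// Upper bound on summary tokens sent over a conversation:
// at most `maxTokens` of summary text is injected on each run.
function maxSummaryTokens(maxTokens: number, runs: number): number {
  return maxTokens * runs;
}

const runs = 50; // assumed conversation length, for illustration only

const atDefault = maxSummaryTokens(2000, runs); // 100000 tokens at the default budget
const atLean = maxSummaryTokens(1000, runs); // 50000 tokens with maxTokens: 1000
const atRich = maxSummaryTokens(4000, runs); // 200000 tokens with maxTokens: 4000
```

Actual usage is lower whenever the stored summaries do not fill the budget.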

How it works

A conversation grows past maxMessages (see Sessions).
The oldest messages are handed to a model (use a cheap one via memory.model) which writes a short recap.
The recap is stored as a summary; the original messages are removed from the active thread.
On future runs, summaries are injected newest-first within the maxTokens budget, so when space is tight the most recent summaries are kept and the oldest are left out.
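
The steps above can be sketched as two small functions. This is a simplified model under stated assumptions, not the library's implementation: all type and function names are hypothetical, the `summarize` callback stands in for the LLM call, exactly the overflowing messages are summarized, and tokens are estimated at roughly four characters each.

```typescript
// Hypothetical types; not the library's actual API.
interface Message { role: "user" | "assistant"; text: string }
interface Summary { text: string; createdAt: number }

// Crude token estimate (about 4 characters per token); an assumption for illustration.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Overflow handling: summarize the oldest messages, store the recap,
// and remove the originals from the active thread.
function compact(
  messages: Message[],
  summaries: Summary[],
  maxMessages: number,
  maxCount: number,
  summarize: (old: Message[]) => string, // stands in for the model call
  now: number,
): { messages: Message[]; summaries: Summary[] } {
  if (messages.length <= maxMessages) return { messages, summaries };
  const cut = messages.length - maxMessages;
  const overflow = messages.slice(0, cut); // oldest messages
  const kept = messages.slice(cut); // active thread after pruning
  const next = [...summaries, { text: summarize(overflow), createdAt: now }];
  // Oldest summaries are pruned first once maxCount is exceeded.
  const pruned = next.length > maxCount ? next.slice(next.length - maxCount) : next;
  return { messages: kept, summaries: pruned };
}

// Prompt assembly: take summaries newest-first until the token budget is spent.
function selectForPrompt(summaries: Summary[], maxTokens: number): Summary[] {
  const picked: Summary[] = [];
  let used = 0;
  const newestFirst = [...summaries].sort((a, b) => b.createdAt - a.createdAt);
  for (const s of newestFirst) {
    const cost = estimateTokens(s.text);
    if (used + cost > maxTokens) break; // budget exhausted: older summaries are left out
    picked.push(s);
    used += cost;
  }
  return picked;
}
```

Stopping at the first summary that does not fit is one possible policy; a real implementation might instead truncate it or skip ahead to a smaller one.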

Cross-references

Sessions & History — where overflow comes from
Context Budget — how summaries compete with other memory for prompt space
Memory Overview — the full memory lifecycle
