Memory Pointer Pattern

The problem

When a tool returns 200KB of logs, or 5,000 database rows, or a full PDF, passing the raw output back into the LLM context causes three failures:
  1. Token blowup. A 200KB log dump is ~50,000 tokens. At GPT-4o’s $2.50 / 1M input rate, that’s $0.13 per turn just to feed the model context it doesn’t need.
  2. Silent truncation. Most providers cap the context window at 128K–200K tokens. A few big tool calls can push earlier messages out of the window or break the conversation entirely.
  3. Worse answers. Models get distracted by huge irrelevant blobs (the “needle in a haystack” problem).
IBM reported reducing 20,000,000 tokens to 1,234 tokens in production by adopting this pattern. The principle is simple: store the big value outside the LLM’s view, give the LLM a short pointer instead.
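
The cost figure in item 1 can be reproduced with a quick estimate. This is a rough sketch that assumes about 4 bytes per token (a common rule of thumb, not a tokenizer count) and the $2.50 per 1M input tokens rate mentioned above. `estimateTokens` and `costPerTurnUsd` are hypothetical helper names, not part of the library.

```typescript
// Rough estimate only: real token counts depend on the tokenizer and the content.
const BYTES_PER_TOKEN = 4; // assumption: rule of thumb for English-like text

function estimateTokens(bytes: number): number {
  return Math.ceil(bytes / BYTES_PER_TOKEN);
}

function costPerTurnUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

const logTokens = estimateTokens(200 * 1024);   // 51200 tokens for a 200KB dump
const logCost = costPerTurnUsd(logTokens, 2.5); // about 0.128 USD
console.log(`${logTokens} tokens, about $${logCost.toFixed(2)} per turn`);
```

Because the oversized output stays in the message history, this cost is paid again on every later turn of the conversation.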

How it works

                   ┌──────────────────────────────────┐
   tool returns ──▶│  ToolExecutor (size check)        │
                   └──────────────────────────────────┘

            < maxToolOutputBytes ─▼      ≥ maxToolOutputBytes ─▼
             ┌────────────────┐        ┌─────────────────────────────────┐
             │ Pass through   │        │ Store in ArtifactStore on        │
             │ unchanged      │        │ RunContext.sessionState          │
             └────────────────┘        │ Return { pointer, preview, ... } │
                                       └─────────────────────────────────┘
The agent sees only the short JSON pointer envelope. If it needs the full value later, it calls the auto-injected getArtifact(pointer) tool.
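
The flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `wrapIfLarge`, `utf8Bytes`, and the module-level `artifactStore` map are hypothetical stand-ins for the executor's size check and the ArtifactStore.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical envelope shape for illustration; the real library types may differ.
interface PointerEnvelope {
  pointer: string;
  preview: string;
  sizeBytes: number;
  note: string;
}

const artifactStore = new Map<string, string>(); // stand-in for the ArtifactStore

function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Returns the output unchanged when it is under the threshold,
// otherwise stores it and returns a JSON pointer envelope.
function wrapIfLarge(output: string, maxToolOutputBytes = 50 * 1024, previewChars = 200): string {
  const sizeBytes = utf8Bytes(output);
  if (sizeBytes < maxToolOutputBytes) return output;

  const pointer = `art:${randomUUID()}`;
  artifactStore.set(pointer, output);
  const envelope: PointerEnvelope = {
    pointer,
    preview: output.slice(0, previewChars),
    sizeBytes,
    note: "Output too large; full value stored as artifact. Call getArtifact(pointer) to read it.",
  };
  return JSON.stringify(envelope);
}

const small = wrapIfLarge("service is healthy");
const large = wrapIfLarge("error: something went wrong\n".repeat(4000));
console.log(small);                       // passes through unchanged
console.log(JSON.parse(large).sizeBytes); // 112000
```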

Enabling on an Agent

import { Agent, openai } from "@agentium/core";

const agent = new Agent({
  name: "log-investigator",
  model: openai("gpt-4o"),
  tools: [fetchLogs, summarize],
  artifacts: {
    enabled: true,                    // turns the pattern on
    maxToolOutputBytes: 50 * 1024,    // default: 50KB
    previewChars: 200,                // default: 200
  },
});

ArtifactsConfig fields

| Field | Type | Default | Meaning |
| --- | --- | --- | --- |
| enabled | boolean | false | Master switch. When true, the auto-conversion runs after every tool call. |
| maxToolOutputBytes | number | 51200 (50KB) | Threshold above which the result becomes an artifact. UTF-8 byte length of the stringified content. |
| previewChars | number | 200 | How many chars of the original content are kept in the visible preview field. |
When enabled: true, three tools are auto-injected into the agent’s tool list:
  • storeArtifact(name, value, contentType?)
  • getArtifact(pointerOrName)
  • listArtifacts()

The auto-converted result

When a tool exceeds the threshold, its result is replaced with a JSON string of this shape:
{
  "pointer": "art:550e8400-e29b-41d4-a716-446655440000",
  "preview": "[auth-service] error: something went wrong\nerror: something went wrong\n...(truncated, 98000 more chars)",
  "sizeBytes": 102400,
  "note": "Output too large; full value stored as artifact. Call getArtifact(pointer) to read it."
}
The LLM sees ~250 tokens instead of ~25,000. The preview field is critical — it lets the model decide whether the artifact is interesting before fetching the full value.
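
Host code that logs or post-processes tool results may want to tell an envelope apart from ordinary output. The sketch below assumes the envelope shape shown above; `parsePointerEnvelope` is a hypothetical helper, not a documented export.

```typescript
interface PointerEnvelope {
  pointer: string;
  preview: string;
  sizeBytes: number;
  note?: string;
}

// Returns the parsed envelope, or null when the tool result is ordinary output.
function parsePointerEnvelope(toolResult: string): PointerEnvelope | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(toolResult);
  } catch {
    return null; // not JSON, so it cannot be an envelope
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  const looksLikeEnvelope =
    typeof candidate.pointer === "string" &&
    candidate.pointer.startsWith("art:") &&
    typeof candidate.preview === "string" &&
    typeof candidate.sizeBytes === "number";
  return looksLikeEnvelope ? (candidate as unknown as PointerEnvelope) : null;
}

const wrapped = JSON.stringify({
  pointer: "art:550e8400-e29b-41d4-a716-446655440000",
  preview: "[auth-service] error: something went wrong",
  sizeBytes: 102400,
});
console.log(parsePointerEnvelope(wrapped)?.sizeBytes);  // 102400
console.log(parsePointerEnvelope("plain text result")); // null
```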

Manual artifact storage

Tools can opt into the pattern explicitly:
import { defineTool, storeArtifact } from "@agentium/core";
import { z } from "zod";

const fetchReport = defineTool({
  name: "fetchReport",
  description: "Download the quarterly report PDF as text.",
  parameters: z.object({ quarter: z.string() }),
  execute: async ({ quarter }, ctx) => {
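    // downloadReport is assumed to be defined elsewhere and to return the report as plain text.
    const huge = await downloadReport(quarter); // 10MB of text

    const ptr = storeArtifact(ctx, huge, {
      name: `report-${quarter}`,
      contentType: "text/plain",        // the stored value is extracted text, not JSON
      previewChars: 500,
    });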

    return JSON.stringify({
      pointer: ptr.pointer,
      preview: ptr.preview,
      sizeBytes: ptr.sizeBytes,
    });
  },
});
Now the LLM can refer to the artifact by name:
> Read report-2024-q4 and summarize the top 5 KPIs.
The agent calls getArtifact("report-2024-q4") (by name, not pointer) and gets the full text.

API reference

storeArtifact(ctx, value, opts?)

Stores a value and returns a pointer.
function storeArtifact(
  ctx: RunContext,
  value: unknown,
  opts?: {
    name?: string;          // optional human-readable name (also looked up via getArtifact)
    contentType?: string;   // MIME-style hint
    previewChars?: number;  // override the default preview length
  },
): ArtifactPointer;

interface ArtifactPointer {
  pointer: string;          // "art:<uuid>"
  preview: string;          // short preview of value
  sizeBytes: number;        // serialized size
  name?: string;            // echoed back if provided
}
value can be any JSON-serializable object or a string. Objects are JSON.stringify’d for preview and size computation; the raw value is preserved for retrieval.

getArtifact(ctx, pointerOrName)

Looks up an artifact by either its art: pointer or its name. Returns null if no artifact matches.
function getArtifact(ctx: RunContext, pointerOrName: string): StoredArtifact | null;

interface StoredArtifact {
  id: string;
  name?: string;
  value: unknown;
  preview: string;
  sizeBytes: number;
  storedAt: number;       // Date.now()
  contentType?: string;
}

listArtifacts(ctx)

Returns every artifact stored in the current RunContext, deduplicated (name aliases don’t double-count).
function listArtifacts(ctx: RunContext): StoredArtifact[];
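
One way to picture lookup by name and the deduplicated listing is a single map that holds the same object under two keys. This is an assumption about the mechanism, sketched with trimmed-down types; `store`, `get`, and `list` are hypothetical local names, not the library functions.

```typescript
import { randomUUID } from "node:crypto";

// Trimmed-down version of StoredArtifact, for illustration only.
interface StoredArtifact {
  id: string;
  name?: string;
  value: unknown;
  storedAt: number;
}

// Stand-in for the map kept on sessionState; keys are pointers and, optionally, names.
const artifacts = new Map<string, StoredArtifact>();

function store(value: unknown, name?: string): string {
  const id = `art:${randomUUID()}`;
  const artifact: StoredArtifact = { id, name, value, storedAt: Date.now() };
  artifacts.set(id, artifact);
  if (name) artifacts.set(name, artifact); // alias entry pointing at the same object
  return id;
}

function get(pointerOrName: string): StoredArtifact | null {
  return artifacts.get(pointerOrName) ?? null;
}

function list(): StoredArtifact[] {
  // A Set of object references collapses the pointer key and the name alias.
  return Array.from(new Set(artifacts.values()));
}

const pointer = store("full report text", "report-2024-q4");
store({ rows: 5000 });
console.log(get(pointer) === get("report-2024-q4")); // true
console.log(list().length);                          // 2
console.log(get("missing"));                         // null
```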

isPointer(value)

Helper for runtime checks:
function isPointer(value: unknown): value is string;
// returns true for any string starting with "art:"

approxByteSize(value)

Quick UTF-8 size estimate used by the executor:
function approxByteSize(value: unknown): number;
Falls back to 0 for circular objects.
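
The two helpers are small enough to sketch from their descriptions. The code below is a re-implementation under stated assumptions (strings measured directly, other values measured via JSON.stringify), not the library source.

```typescript
// Type guard matching the documented rule: any string starting with "art:".
function isPointer(value: unknown): value is string {
  return typeof value === "string" && value.startsWith("art:");
}

// Strings are measured directly; other values are measured via JSON.stringify.
// JSON.stringify throws a TypeError on circular references, which maps to 0 here.
function approxByteSize(value: unknown): number {
  try {
    const text = typeof value === "string" ? value : JSON.stringify(value);
    return text === undefined ? 0 : new TextEncoder().encode(text).length;
  } catch {
    return 0;
  }
}

const circular: Record<string, unknown> = {};
circular.self = circular;

console.log(isPointer("art:123"));              // true
console.log(isPointer({ pointer: "art:123" })); // false
console.log(approxByteSize("héllo"));           // 6 (é takes two bytes in UTF-8)
console.log(approxByteSize({ a: 1 }));          // 7, the length of {"a":1}
console.log(approxByteSize(circular));          // 0
```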

Auto-injected tools (when artifacts.enabled)

| Tool name | Parameters | Returns |
| --- | --- | --- |
| storeArtifact | { name: string, value: string, contentType?: string } | JSON { pointer, preview, sizeBytes, name } |
| getArtifact | { pointerOrName: string } | The raw value (or "[no artifact found for '...']") |
| listArtifacts | {} | JSON array of { pointer, name, sizeBytes, preview, contentType, storedAt } |
These tools intentionally bypass the size threshold themselves. Otherwise a large value returned by getArtifact would be wrapped into a new pointer again, and the agent could never read it.
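
The bypass can be pictured as a name check in front of the size check. This is a sketch under the assumption that the executor keeps a set of exempt tool names; `ARTIFACT_TOOLS` and `shouldAutoConvert` are hypothetical.

```typescript
// Assumption: the executor keeps a set of tool names exempt from auto-conversion.
const ARTIFACT_TOOLS = new Set(["storeArtifact", "getArtifact", "listArtifacts"]);

function shouldAutoConvert(toolName: string, outputBytes: number, maxToolOutputBytes: number): boolean {
  if (ARTIFACT_TOOLS.has(toolName)) return false; // never wrap the artifact tools themselves
  return outputBytes >= maxToolOutputBytes;
}

console.log(shouldAutoConvert("fetchLogs", 200_000, 51_200));   // true
console.log(shouldAutoConvert("getArtifact", 200_000, 51_200)); // false
console.log(shouldAutoConvert("fetchLogs", 1_000, 51_200));     // false
```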

Lifecycle and scope

Artifacts live on RunContext.sessionState["__artifacts"] as a Map<string, StoredArtifact>. That means:
  • Per-run by default: A new RunContext starts with an empty map.
  • Per-session if you persist sessionState: Pass the same sessionState between runs (e.g. via your session manager) and artifacts carry forward.
  • Not persisted by default: The default SessionManager does write sessionState to storage, but a Map serializes to {} under plain JSON.stringify, so the artifacts are lost unless you use a Map-aware codec. For durable artifacts, persist them explicitly to your own storage and re-hydrate.
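
The persistence caveat comes from how JSON handles Map: a plain JSON round trip drops the entries. The sketch below shows one way to persist and re-hydrate explicitly. `serializeArtifacts` and `hydrateArtifacts` are hypothetical helpers, and the trimmed StoredArtifact type is for illustration only.

```typescript
interface StoredArtifact {
  id: string;
  value: unknown;
  sizeBytes: number;
}

const artifacts = new Map<string, StoredArtifact>([
  ["art:1", { id: "art:1", value: "full log text", sizeBytes: 13 }],
]);

// Plain JSON.stringify ignores Map entries entirely.
const naive = JSON.stringify(artifacts); // "{}"

// Converting to an array of [key, value] pairs keeps the data.
function serializeArtifacts(map: Map<string, StoredArtifact>): string {
  return JSON.stringify(Array.from(map.entries()));
}

function hydrateArtifacts(json: string): Map<string, StoredArtifact> {
  return new Map(JSON.parse(json) as [string, StoredArtifact][]);
}

const restored = hydrateArtifacts(serializeArtifacts(artifacts));
console.log(naive);                        // {}
console.log(restored.get("art:1")?.value); // full log text
```

If the same artifact is stored under two keys (for example a pointer and a name), this round trip produces two separate copies, so a real codec would need to restore the aliasing as well.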

When to use

  • Database query tools that may return many rows
  • Web scraping / page fetch tools
  • Log search tools
  • File reading tools where files can be > a few hundred KB
  • Tool chains where output of step N is input to step N+1 but doesn’t need to pass through the LLM in between

When NOT to use

  • Short status checks (“is X online?”) — overhead isn’t worth it
  • Single-row lookups
  • Anything the LLM legitimately needs to reason over inline (e.g. a small JSON config)
  • Streaming output (the threshold check is one-shot on the final result)

See also

  • Tool Polish: toModelOutput is a more surgical way to shrink specific tool outputs.
  • Async HandleId Pattern: pair pointers with handles for long-running + large-output tools.