Self-Learning & Corrections Examples

Examples for the v2.4–v2.5 learning-layer features. The running scenario: an accounts-payable (AP) reconciliation agent that processes vendor invoices and learns from every human correction.

Correction Capture (Programmatic)

Record a human correction through agent.memory.recordCorrection(). The correction is embedded into the vector store and retrieved on future relevant runs — the agent stops repeating the mistake.

import { Agent, InMemoryStorage, InMemoryVectorStore, OpenAIEmbedding, openai } from "@agentium/core";

const storage = new InMemoryStorage();
const vectorStore = new InMemoryVectorStore(new OpenAIEmbedding());

const agent = new Agent({
  name: "ap-reconciler",
  model: openai("gpt-4o"),
  instructions: "You reconcile vendor invoices against quotes.",
  memory: {
    storage,
    corrections: { vectorStore, topK: 3 },
  },
});

// Run 1: the agent maps a charge incorrectly
const r1 = await agent.run("Reconcile invoice INV-99 from Vendor X: THC $250");
// → agent classifies the charge as "THC" (origin terminal handling)

// A reviewer fixes it:
await agent.memory!.recordCorrection({
  agentName: "ap-reconciler",
  runId: r1.runId,
  field: "chargeCode",
  originalValue: "THC",
  correctedValue: "DTHC",
  reason: "Vendor X labels destination THC as just THC on invoices",
  entityKey: "vendor-x",
  originalInput: "Reconcile invoice INV-99 from Vendor X: THC $250",
});

// Run 2 — next Vendor X invoice: the correction is injected into context:
//   <memory section="corrections">
//   Past corrections (avoid repeating these mistakes):
//   - chargeCode: "THC" was corrected to "DTHC" [vendor-x] — Vendor X labels...
//   </memory>
const r2 = await agent.run("Reconcile invoice INV-104 from Vendor X: THC $310");
// → agent now maps it to DTHC

Correction Capture (HTTP, from a review UI)

Every agent served via createAgentRouter() with corrections enabled exposes POST /agents/:name/corrections — wire your review UI’s “fix” button straight to it.

import { createAgentRouter } from "@agentium/transport";
import express from "express";

const app = express();
app.use(express.json());
app.use("/api", createAgentRouter()); // auto-discovers ap-reconciler

// From the review UI backend:
await fetch("http://localhost:3000/api/agents/ap-reconciler/corrections", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    field: "chargeCode",
    originalValue: "THC",
    correctedValue: "DTHC",
    reason: "Vendor X labels destination THC as just THC",
    entityKey: "vendor-x",
    runId: "run-abc123",
    originalInput: "Reconcile invoice INV-99 from Vendor X: THC $250",
  }),
});
// → 201 { id: "...", agentName: "ap-reconciler", ... }
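
A review UI will usually want to sanity-check the payload before posting it. The sketch below is a hypothetical client-side helper: CorrectionPayload, ValidationResult, and validateCorrection are illustrative names, not Agentium APIs. The field names mirror the request body above; which fields the endpoint actually requires is an assumption.

```typescript
// Hypothetical payload shape, mirroring the request body shown above.
// Which fields are mandatory server-side is an assumption.
interface CorrectionPayload {
  field: string;
  originalValue: string;
  correctedValue: string;
  reason?: string;
  entityKey?: string;
  runId?: string;
  originalInput?: string;
}

interface ValidationResult {
  errors: string[];   // problems that should block the POST
  warnings: string[]; // allowed, but worth surfacing to the reviewer
}

function validateCorrection(p: CorrectionPayload): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];

  if (p.field.trim() === "") errors.push("field is empty");
  if (p.correctedValue.trim() === "") errors.push("correctedValue is empty");
  if (p.originalValue === p.correctedValue) {
    errors.push("correctedValue is identical to originalValue");
  }
  // Without originalInput the correction cannot be replayed as a regression case.
  if (!p.originalInput) warnings.push("originalInput is missing");

  return { errors, warnings };
}

const check = validateCorrection({
  field: "chargeCode",
  originalValue: "THC",
  correctedValue: "DTHC",
  entityKey: "vendor-x",
});
console.log(check.errors.length, check.warnings.length); // 0 1
```

Treating a missing originalInput as a warning rather than an error keeps quick fixes possible while still nudging reviewers toward corrections that can be replayed later.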

Trust Tiers & Grounded Extraction

Learnings carry a provenance source. Context injection marks human-authored knowledge [verified] and AI-extracted knowledge [unverified], with a caveat telling the model not to assert unverified items as fact. Auto-extraction only saves learnings it can anchor to a verbatim quote.

const agent = new Agent({
  name: "ap-reconciler",
  model: openai("gpt-4o"),
  memory: {
    storage,
    learnings: {
      vectorStore,
      topK: 3,
      minScore: 0.4, // drop weak matches — they pollute the prompt
    },
  },
});

// Human-authored: saved with source "manual" → rendered [verified]
await agent.memory!.remember("Vendor X invoices always arrive on the 1st of the month");

// LLM auto-extraction after each run: saved with source "llm-extracted"
// → rendered [unverified], and REJECTED entirely unless the extractor
//   provides a verbatim supporting quote from the conversation.

// The agent's next prompt contains:
// Relevant learnings:
// - Vendor X invoice timing [verified]: Vendor X invoices always arrive on the 1st...
// - Vendor X currency guess [unverified]: Vendor X may bill in EUR (applies when: ...)
// Items marked [unverified] are AI-extracted hypotheses — treat them as hints to verify, not established facts.
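
The grounding rule can be pictured as a substring test: a candidate learning survives only if its supporting quote appears verbatim in the conversation. This is a minimal sketch of that idea, not Agentium's extractor. CandidateLearning, normalize, and isGrounded are hypothetical names, and the whitespace normalization is an assumption.

```typescript
// Hypothetical shape of a learning proposed by the extractor.
interface CandidateLearning {
  text: string;
  supportingQuote?: string; // verbatim quote the extractor offers as evidence
}

// Collapse whitespace so line wrapping does not defeat the match.
// (Whether the real extractor normalizes whitespace is an assumption.)
function normalize(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

// Accept a candidate only if its quote appears verbatim in the transcript.
function isGrounded(candidate: CandidateLearning, transcript: string): boolean {
  const quote = normalize(candidate.supportingQuote ?? "");
  if (quote === "") return false; // no quote, no learning
  return normalize(transcript).includes(quote);
}

const transcript =
  "User: Vendor X sent INV-99.\nAgent: Vendor X bills in EUR on this invoice.";

const grounded = isGrounded(
  { text: "Vendor X may bill in EUR", supportingQuote: "Vendor X bills in EUR" },
  transcript,
);
const ungrounded = isGrounded(
  { text: "Vendor X pays within 30 days", supportingQuote: "net 30 terms" },
  transcript,
);
console.log(grounded, ungrounded); // true false
```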

Self-Corrective Memory

Recording a correction automatically invalidates unverified AI-extracted learnings that contradict it. Human-authored learnings are never auto-retired.

const memory = agent.memory!;
const learnings = memory.getLearnedKnowledge()!;

// An AI-extracted (unverified) learning exists:
// "Vendor X uses THC for terminal handling" (source: "llm-extracted")

await memory.recordCorrection({
  agentName: "ap-reconciler",
  field: "chargeCode",
  originalValue: "THC",
  correctedValue: "DTHC",
  reason: "Vendor X convention",
});
// → the contradicted unverified learning is invalidated (≥ 0.85 similarity)
// → emits "memory.learning.invalidated" { learningIds, supersededBy, agentName }

// Tune or disable:
// corrections: { vectorStore, contradictionThreshold: 0.9 }
// corrections: { vectorStore, invalidateContradicted: false }

agent.eventBus.on("memory.learning.invalidated", (e) => {
  console.log(`Correction ${e.supersededBy} retired ${e.learningIds.length} stale learning(s)`);
});
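
One way to picture the invalidation pass: embed the correction, compare it against stored learnings, and retire only the AI-extracted ones above the threshold. The sketch below assumes cosine similarity over embeddings and uses similarity as a stand-in for "contradicts"; the real check may be more involved. StoredLearning and findLearningsToInvalidate are hypothetical names.

```typescript
type LearningSource = "manual" | "llm-extracted";

// Hypothetical stored-learning record; only the fields the sketch needs.
interface StoredLearning {
  id: string;
  source: LearningSource;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the ids of unverified learnings close enough to the correction.
// Human-authored ("manual") learnings are skipped unconditionally.
function findLearningsToInvalidate(
  correctionEmbedding: number[],
  learnings: StoredLearning[],
  threshold = 0.85,
): string[] {
  return learnings
    .filter((l) => l.source === "llm-extracted")
    .filter((l) => cosineSimilarity(l.embedding, correctionEmbedding) >= threshold)
    .map((l) => l.id);
}

const toInvalidate = findLearningsToInvalidate([1, 0], [
  { id: "ext-1", source: "llm-extracted", embedding: [0.9, 0.1] }, // near-duplicate topic
  { id: "man-1", source: "manual", embedding: [1, 0] },            // identical, but human-authored
  { id: "ext-2", source: "llm-extracted", embedding: [0, 1] },     // unrelated
]);
console.log(toInvalidate.join(",")); // ext-1
```

Raising the threshold makes invalidation more conservative: fewer learnings are retired, at the cost of leaving some stale hypotheses in place.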

Regression Evals from Corrections

Corrections recorded with originalInput become replayable test cases — run them in CI to prove previously-corrected mistakes no longer recur.

import { EvalSuite, contains } from "@agentium/eval";

const store = agent.memory!.getCorrectionStore()!;
const cases = await store.toEvalCases({ agentName: "ap-reconciler" });

const suite = new EvalSuite({
  name: "correction-regression",
  agent,
  cases: cases.map((c) => ({
    name: `correction-${c.correctionId}`,
    input: c.input,        // the input that originally produced the mistake
    expected: c.expected,  // the human-corrected value
  })),
  scorers: [contains()],
});

const result = await suite.run();
console.log(`${result.passed}/${result.total} corrected mistakes stay fixed`);
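
To make the pass/fail rule concrete, here is a standalone sketch of a substring scorer over replayed cases. It assumes contains() passes when the output includes the expected string, which is a guess at its behavior; containsExpected and countPassed are hypothetical helpers, and the outputs are hand-written stand-ins for agent responses.

```typescript
interface EvalCase {
  name: string;
  input: string;
  expected: string;
}

// Case-sensitive substring check (assumed behavior, not the library's code).
function containsExpected(output: string, expected: string): boolean {
  return output.includes(expected);
}

// Counts cases whose recorded output contains the corrected value.
function countPassed(cases: EvalCase[], outputs: Record<string, string>): number {
  return cases.filter((c) => containsExpected(outputs[c.name] ?? "", c.expected)).length;
}

const sampleCases: EvalCase[] = [
  {
    name: "correction-1",
    input: "Reconcile invoice INV-99 from Vendor X: THC $250",
    expected: "DTHC",
  },
];

console.log(countPassed(sampleCases, { "correction-1": "chargeCode: DTHC, amount: 250" })); // 1
console.log(countPassed(sampleCases, { "correction-1": "chargeCode: THC, amount: 250" }));  // 0

// Pitfall: "THC" is a substring of "DTHC", so the reverse check passes too.
console.log(containsExpected("chargeCode: DTHC", "THC")); // true
```

Because one charge code can be a substring of another, an exact-match comparison on the structured field is a stricter choice when the corrected value is short.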

Reflection & Confidence-Gated Escalation

With reflection.enabled, every run() gets an LLM-critic pass and bounded revision. The final critique lands on output.critique — use the score to route low-confidence outputs to human review.

const agent = new Agent({
  name: "ap-reconciler",
  model: openai("gpt-4o"),
  reflection: {
    enabled: true,
    maxReflections: 1,            // up to 1 critique→revise cycle
    critic: openai("gpt-4o-mini"), // cheaper model as the critic
    customCriteria: "Charge codes must match the vendor's known conventions.",
  },
});

const output = await agent.run("Reconcile invoice INV-120 from Vendor Y");

console.log(output.critique);
// → { pass: true, score: 0.92, feedback: "...", revisions: 1 }

// sendToReviewQueue() and postToERP() are application-defined helpers (not shown).
if (output.critique && output.critique.score < 0.6) {
  await sendToReviewQueue({ input: "INV-120", output }); // human decides
} else {
  await postToERP(output.structured);
}
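
The check above posts to the ERP whenever output.critique is absent. If a missing critique should instead fail closed, the routing decision can be pulled into a small pure function. This is a sketch: the Critique shape mirrors the logged object above, routeByConfidence is a hypothetical helper, and treating pass: false as "needs review" is a design assumption.

```typescript
// Mirrors the critique object logged above.
interface Critique {
  pass: boolean;
  score: number;
  feedback: string;
  revisions: number;
}

type Route = "human-review" | "auto-post";

// Fail closed: no critique, a failed critique, or a low score all go to a human.
function routeByConfidence(critique: Critique | undefined, minScore = 0.6): Route {
  if (!critique) return "human-review";
  if (!critique.pass || critique.score < minScore) return "human-review";
  return "auto-post";
}

console.log(routeByConfidence({ pass: true, score: 0.92, feedback: "ok", revisions: 1 })); // auto-post
console.log(routeByConfidence({ pass: true, score: 0.41, feedback: "unsure", revisions: 1 })); // human-review
console.log(routeByConfidence(undefined)); // human-review
```

Keeping the decision in a pure function makes the threshold easy to unit-test and to tune per agent.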

Accuracy Metrics

Corrections and critique scores flow into MetricsExporter automatically. correctionRate is the complement of first-pass accuracy: a rate of 0.12 means 88% of runs needed no correction.

import { MetricsExporter } from "@agentium/observability";

const metrics = new MetricsExporter();
metrics.attach(agent.eventBus);

// ... after some runs and corrections ...

const m = metrics.getMetrics("ap-reconciler");
console.log({
  runs: m.runs,
  correctionsTotal: m.correctionsTotal,
  correctionRate: m.correctionRate,      // e.g. 0.12 → 88% first-pass accuracy
  avgCritiqueScore: m.avgCritiqueScore,  // e.g. 0.86
});

// Prometheus: agentium_agent_correction_rate{agent="ap-reconciler"} 0.12
app.get("/metrics", (_req, res) => res.type("text/plain").send(metrics.toPrometheus()));

// Per-vendor accuracy trend:
const stats = await agent.memory!.getCorrectionStore()!.getStats({ agentName: "ap-reconciler" });
// → { total: 42, byEntityKey: { "vendor-x": 17, "vendor-y": 3 }, byField: { chargeCode: 23 } }
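
The arithmetic behind these numbers, as a worked sketch. It assumes correctionRate is corrections divided by runs with at most one correction per run; how MetricsExporter actually computes it is not shown here. The helper names are hypothetical.

```typescript
// Assumed definition: corrections per run (at most one correction per run).
function correctionRate(correctionsTotal: number, runs: number): number {
  return runs === 0 ? 0 : correctionsTotal / runs;
}

// First-pass accuracy is the complement of the correction rate.
function firstPassAccuracy(rate: number): number {
  return 1 - rate;
}

// Fraction of all corrections attributed to each key (e.g. each vendor).
function shareByKey(counts: Record<string, number>, total: number): Record<string, number> {
  const shares: Record<string, number> = {};
  for (const key of Object.keys(counts)) {
    shares[key] = total === 0 ? 0 : counts[key] / total;
  }
  return shares;
}

const rate = correctionRate(6, 50);
console.log(rate.toFixed(2));                    // 0.12
console.log(firstPassAccuracy(rate).toFixed(2)); // 0.88

// Using the getStats() figures above: 17 of 42 corrections involve vendor-x.
const shares = shareByKey({ "vendor-x": 17, "vendor-y": 3 }, 42);
console.log(shares["vendor-x"].toFixed(3)); // 0.405
```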

External Agents (Any Framework)

Serve a LangGraph graph — or any custom code — through Agentium’s runtime with defineExternalAgent().

import { defineExternalAgent } from "@agentium/core";
import { createAgentRouter } from "@agentium/transport";

const researcher = defineExternalAgent({
  name: "researcher",
  instructions: "Researches shipping regulations.",
  run: async (input) => {
    // langGraph is your own compiled LangGraph graph (defined elsewhere)
    const result = await langGraph.invoke({ messages: [String(input)] });
    return result.messages.at(-1).content; // plain string is fine
  },
});

// Full runtime for free: routes, registry, observability events
app.use("/api", createAgentRouter());
// POST /api/agents/researcher/run now serves the LangGraph agent

const out = await researcher.run("What are the IMO 2026 sulfur rules?");
console.log(out.text, out.runId, out.status);

Ephemeral Agents (Names Are Labels)

Since v2.3.2, registry names are last-write-wins — creating the same-named agent repeatedly never throws. Per-request and in-loop agents just work.

// Called for every inbound email — same name every call, no workarounds needed
async function classifyEmail(subject: string, body: string) {
  const agent = new Agent({
    name: "email-classifier",          // safe to repeat
    model: openai("gpt-4o-mini"),
    instructions: "Classify the email as arrival-notice, invoice, or other.",
    structuredOutput: classificationSchema,
    register: false,                    // optional: skip the registry for throwaways
  });
  return agent.run(`Subject: ${subject}\n\n${body}`);
}

const results = await Promise.all(emails.map((e) => classifyEmail(e.subject, e.body)));
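
Last-write-wins is easy to picture as a map keyed by name, where registering simply overwrites. The class below is an illustrative sketch of that semantics, not Agentium's registry; LabelRegistry and its methods are hypothetical.

```typescript
// Minimal last-write-wins registry: a name is a label, not a unique key to defend.
class LabelRegistry<T> {
  private readonly entries = new Map<string, T>();

  // Overwrites any existing entry with the same name; never throws.
  register(name: string, value: T): void {
    this.entries.set(name, value);
  }

  get(name: string): T | undefined {
    return this.entries.get(name);
  }

  count(): number {
    return this.entries.size;
  }
}

const registry = new LabelRegistry<{ createdFor: string }>();
registry.register("email-classifier", { createdFor: "email-1" });
registry.register("email-classifier", { createdFor: "email-2" });
registry.register("email-classifier", { createdFor: "email-3" });

console.log(registry.count());                             // 1
console.log(registry.get("email-classifier")?.createdFor); // email-3
```

Under this model, code that looks an agent up by name gets whichever instance registered last, which is why skipping registration is a reasonable choice for throwaway agents.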

Memory Maintenance (Reconcile)

Vector-backed stores dual-write to KV + vector index. Run curator.reconcile() on a schedule to repair drift after crashes.

import { CronJob } from "cron";

new CronJob("0 3 * * *", async () => {
  const { learnings, corrections } = await agent.memory!.curator.reconcile();
  console.log(`Re-indexed ${learnings} learnings, ${corrections} corrections`);
}).start();
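
Drift in a dual-write setup comes down to a set difference between the two stores. The sketch below assumes the KV store is the source of truth and the vector index is rebuilt from it; that assumption, and the names DriftReport and diffStores, are illustrative rather than taken from Agentium.

```typescript
interface DriftReport {
  missingFromIndex: string[]; // in KV but not indexed: candidates for re-indexing
  orphanedInIndex: string[];  // indexed but absent from KV: candidates for removal
}

// Compares record ids from the KV store with ids present in the vector index.
function diffStores(kvIds: string[], indexIds: string[]): DriftReport {
  const inKv = new Set(kvIds);
  const inIndex = new Set(indexIds);
  return {
    missingFromIndex: kvIds.filter((id) => !inIndex.has(id)),
    orphanedInIndex: indexIds.filter((id) => !inKv.has(id)),
  };
}

// A crash between the KV write and the index write leaves "c-3" unindexed.
const drift = diffStores(["c-1", "c-2", "c-3"], ["c-1", "c-2", "c-old"]);
console.log(drift.missingFromIndex.join(",")); // c-3
console.log(drift.orphanedInIndex.join(","));  // c-old
```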
