Reliability Evaluation

Overview

ReliabilityEval asserts that agents call expected tools, handle errors correctly, and produce non-empty responses.

Quick Start

import { ReliabilityEval } from "@agentium/eval";
import { Agent, openai } from "@agentium/core";

// searchTool and calcTool are assumed to be defined elsewhere.
const agent = new Agent({
  name: "tool-agent",
  model: openai("gpt-4o"),
  tools: [searchTool, calcTool],
});

// Each case states what the agent is expected to do for a given input.
// Note: `eval` cannot be used as a variable name in ES modules (strict mode).
const reliabilityEval = new ReliabilityEval({
  name: "tool-reliability",
  agent,
  cases: [
    { name: "uses-search", input: "Search for latest news", expectedTools: ["search"] },
    { name: "handles-error", input: "Divide by zero", shouldError: true },
  ],
});

const result = await reliabilityEval.run();

Case Options

| Field | Type | Description |
| --- | --- | --- |
| `expectedTools` | `string[]` | Tool names that should be called |
| `shouldError` | `boolean` | Whether the case should throw an error |
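
To make the two fields concrete, here is a minimal sketch of how a single case might be judged. It is not the library's implementation: `CaseSpec`, `CaseOutcome`, and `judgeCase` are hypothetical names, and the pass/fail rules (an error passes only when `shouldError` is set, and an empty response fails) are assumptions inferred from the Overview.

```typescript
// Hypothetical shapes for illustration only; not part of @agentium/eval.
interface CaseSpec {
  name: string;
  input: string;
  expectedTools?: string[];
  shouldError?: boolean;
}

interface CaseOutcome {
  calledTools: string[]; // names of the tools the agent invoked
  response: string;      // final text output ("" if none)
  threw: boolean;        // whether the run threw an error
}

function judgeCase(spec: CaseSpec, outcome: CaseOutcome): boolean {
  // Assumption: when an error is expected, throwing is the only way to pass.
  if (spec.shouldError) return outcome.threw;
  // Assumption: an unexpected error always fails the case.
  if (outcome.threw) return false;

  // Every expected tool must appear among the calls that were made.
  const called = new Set(outcome.calledTools);
  const toolsOk = (spec.expectedTools ?? []).every((t) => called.has(t));

  // The response must contain something other than whitespace.
  return toolsOk && outcome.response.trim().length > 0;
}
```

Under these assumptions, a case with `expectedTools: ["search"]` fails if the agent answers without calling `search`, even when the answer itself is non-empty.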

Tool Call Match Scorer

toolCallMatch can also be used as a standalone scorer in an EvalSuite. Pass it the names of the tools the agent is expected to call:

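import { EvalSuite, toolCallMatch } from "@agentium/eval";

// `agent` is an Agent instance, such as the one created in Quick Start.
const suite = new EvalSuite({
  name: "tools-test",
  agent,
  // Scores each case on whether the listed tools were called.
  scorers: [toolCallMatch(["search", "calculate"])],
  cases: [{ name: "test", input: "Search and calculate" }],
});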
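
The page does not say how toolCallMatch turns matches into a score. As a rough illustration, one plausible rule is the fraction of expected tools that were actually called. The `toolMatchScore` helper below is hypothetical, and the real scorer may behave differently, for example by requiring an exact match or by taking call order into account.

```typescript
// Hypothetical helper; not the @agentium/eval implementation.
function toolMatchScore(expected: string[], called: string[]): number {
  // Assumption: with no expected tools, there is nothing to miss.
  if (expected.length === 0) return 1;

  const calledSet = new Set(called);
  // Count the expected tool names that appear among the calls made.
  const hits = expected.filter((name) => calledSet.has(name)).length;
  return hits / expected.length;
}

// The agent called "search" but never "calculate".
console.log(toolMatchScore(["search", "calculate"], ["search"])); // 0.5
```

A fractional score like this separates an agent that called one of two tools from one that called none, which a plain pass/fail check would not show.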
