Overview
AgentJudgeEval evaluates agent responses against multiple custom criteria using an LLM judge. It supports two scoring modes: numeric (0.0–1.0) and binary (PASS/FAIL).
Quick Start
import { AgentJudgeEval } from "@agentium/eval";
import { Agent, openai } from "@agentium/core";

// The agent whose responses are evaluated.
const agent = new Agent({ name: "writer", model: openai("gpt-4o") });

const judgeEval = new AgentJudgeEval({
  name: "writing-quality",
  agent,
  // LLM judge that scores each response against the criteria.
  judge: openai("gpt-4o-mini"),
  criteria: [
    "Response is grammatically correct",
    "Response is concise (under 200 words)",
    "Response directly answers the question",
  ],
  scoringMode: "numeric",
  cases: [
    { name: "explain-recursion", input: "Explain recursion in simple terms" },
  ],
});

const result = await judgeEval.run();
Scoring Modes
numeric (default): Each criterion is scored from 0.0 to 1.0.
binary: Each criterion is scored PASS (1.0) or FAIL (0.0).
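To make the two modes concrete, here is a minimal standalone sketch of how per-criterion verdicts could map to numbers. It does not use `@agentium/eval`: the names `ScoringMode`, `Verdict`, `toScore`, and `meanScore` are hypothetical. Clamping out-of-range values and combining criteria with an unweighted mean are assumptions made for illustration, not documented behavior of AgentJudgeEval.

```typescript
// Hypothetical types for illustration; not part of the @agentium/eval API.
type ScoringMode = "numeric" | "binary";
type Verdict = number | "PASS" | "FAIL";

// Convert one criterion verdict to a number in [0, 1] for the given mode.
function toScore(verdict: Verdict, mode: ScoringMode): number {
  if (mode === "binary") {
    if (verdict !== "PASS" && verdict !== "FAIL") {
      throw new Error(`binary mode expects PASS or FAIL, got ${verdict}`);
    }
    // PASS maps to 1.0 and FAIL to 0.0, as described above.
    return verdict === "PASS" ? 1.0 : 0.0;
  }
  if (typeof verdict !== "number" || Number.isNaN(verdict)) {
    throw new Error(`numeric mode expects a number, got ${verdict}`);
  }
  // Assumption: clamp so an out-of-range judge reply stays within 0.0–1.0.
  return Math.min(1, Math.max(0, verdict));
}

// Assumption: combine criteria with an unweighted mean (empty input gives 0).
function meanScore(verdicts: Verdict[], mode: ScoringMode): number {
  if (verdicts.length === 0) return 0;
  const total = verdicts.reduce<number>((sum, v) => sum + toScore(v, mode), 0);
  return total / verdicts.length;
}

// Example: three criteria judged in each mode.
console.log(meanScore(["PASS", "FAIL", "PASS"], "binary"));
console.log(meanScore([0.5, 1.0, 0.75], "numeric"));
```

Binary mode suits criteria with a hard threshold, such as the 200-word limit in the Quick Start example. Numeric mode leaves room for partial credit on subjective criteria, such as how directly a response answers the question.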