Runs an evaluation suite against an agent.
Name for this evaluation run
Test cases to evaluate
Function that takes input and returns agent output
Optional context: string
Optional config: EvalConfig
Evaluation configuration
The completed evaluation run
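A minimal sketch of what a suite run along these lines might look like. The source gives only the parameter descriptions, so every name here (runEvaluation, TestCase, EvalRun, passThreshold) is a hypothetical stand-in, EvalConfig's fields are invented for illustration, and the optional context is assumed to be passed through to the agent function. The agent is synchronous to keep the sketch short; a real agent would more likely return a Promise.

```typescript
// Hypothetical shapes: the source names EvalConfig but not its fields.
interface TestCase { id: string; input: string; expected: string; context?: string }
interface EvalConfig { passThreshold?: number }
interface TestResult { caseId: string; output: string; score: number; passed: boolean }
interface EvalRun { name: string; results: TestResult[]; passRate: number }

// Assumed agent signature: input plus optional context, returns the output.
type AgentFn = (input: string, context?: string) => string;

function runEvaluation(
  name: string,
  cases: TestCase[],
  agent: AgentFn,
  config: EvalConfig = {},
): EvalRun {
  const threshold = config.passThreshold ?? 1;
  const results: TestResult[] = cases.map((tc) => {
    const output = agent(tc.input, tc.context);
    // Exact match after trimming; a stand-in for whatever scoring is really used.
    const score = output.trim() === tc.expected.trim() ? 1 : 0;
    return { caseId: tc.id, output, score, passed: score >= threshold };
  });
  const passedCount = results.filter((r) => r.passed).length;
  const passRate = cases.length === 0 ? 0 : passedCount / cases.length;
  return { name, results, passRate };
}

// Usage: an "agent" that upper-cases its input.
const demoRun = runEvaluation(
  "smoke",
  [
    { id: "upper-1", input: "abc", expected: "ABC" },
    { id: "upper-2", input: "xyz", expected: "xyz" },
  ],
  (input) => input.toUpperCase(),
);
console.log(demoRun.passRate); // 0.5
```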
Evaluates a single test case.
The test case
The agent's actual output
Optional config: EvalConfig
Evaluation configuration
Test result
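As an illustration of scoring one case against an output that has already been produced, the sketch below uses a simple token-overlap score. The function name evaluateCase, the type names, and the config fields caseSensitive and passThreshold are assumptions made for this example, not the documented API.

```typescript
// Hypothetical types for illustration only.
interface CaseSpec { id: string; input: string; expected: string }
interface CaseEvalConfig { caseSensitive?: boolean; passThreshold?: number }
interface CaseResult { caseId: string; score: number; passed: boolean }

function evaluateCase(
  testCase: CaseSpec,
  actualOutput: string,
  config: CaseEvalConfig = {},
): CaseResult {
  // Split into whitespace-separated tokens, lower-casing unless told otherwise.
  const tokenize = (s: string): string[] =>
    (config.caseSensitive ? s : s.toLowerCase()).split(/\s+/).filter(Boolean);

  const expectedTokens = tokenize(testCase.expected);
  const actualTokens = new Set(tokenize(actualOutput));

  // Score = fraction of expected tokens that appear in the actual output.
  const hits = expectedTokens.filter((t) => actualTokens.has(t)).length;
  const score = expectedTokens.length === 0 ? 0 : hits / expectedTokens.length;

  return { caseId: testCase.id, score, passed: score >= (config.passThreshold ?? 1) };
}

const capitalCase: CaseSpec = { id: "c1", input: "Capital of France?", expected: "Paris France" };
const strictResult = evaluateCase(capitalCase, "paris");
const lenientResult = evaluateCase(capitalCase, "paris", { passThreshold: 0.5 });
console.log(strictResult.score, strictResult.passed, lenientResult.passed); // 0.5 false true
```

Taking the output as an argument, rather than calling the agent, keeps this step usable on outputs recorded earlier.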
Registers a custom scorer.
Scorer name
Scoring function
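One way a scorer registry could work is a name-to-function map, sketched below. The names registerScorer and scoreWith, the Scorer signature (expected, actual) => number, the rejection of duplicate names, and the clamping to [0, 1] are all assumptions for this example; the source states only that a scorer has a name and a scoring function.

```typescript
// Assumed scorer signature: compares expected and actual text, returns a number.
type Scorer = (expected: string, actual: string) => number;

const scorerRegistry = new Map<string, Scorer>();

function registerScorer(name: string, scorer: Scorer): void {
  // Assumption: duplicate names are rejected rather than silently overwritten.
  if (scorerRegistry.has(name)) {
    throw new Error(`Scorer "${name}" is already registered`);
  }
  scorerRegistry.set(name, scorer);
}

function scoreWith(name: string, expected: string, actual: string): number {
  const scorer = scorerRegistry.get(name);
  if (!scorer) throw new Error(`Unknown scorer "${name}"`);
  // Clamp into [0, 1] so a misbehaving scorer cannot skew aggregates.
  return Math.min(1, Math.max(0, scorer(expected, actual)));
}

// Usage: full credit when the expected text appears anywhere in the output.
registerScorer("contains", (expected, actual) => (actual.includes(expected) ? 1 : 0));
console.log(scoreWith("contains", "42", "The answer is 42")); // 1
```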
Compares two evaluation runs.
First run ID
Second run ID
Comparison results
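A comparison of this kind might be sketched as a per-case score diff between two stored runs. The in-memory runStore, the StoredRun and RunComparison shapes, and the function name compareRuns are hypothetical; the source does not say how runs are persisted or what the comparison result contains.

```typescript
// Hypothetical stored-run shape: per-case scores keyed by case ID.
interface StoredRun { id: string; scores: Record<string, number> }
interface RunComparison {
  improved: string[];
  regressed: string[];
  unchanged: string[];
  meanDelta: number;
}

// Stand-in for wherever completed runs are actually kept.
const runStore = new Map<string, StoredRun>();

function compareRuns(firstRunId: string, secondRunId: string): RunComparison {
  const first = runStore.get(firstRunId);
  const second = runStore.get(secondRunId);
  if (!first || !second) throw new Error("Unknown run ID");

  const cmp: RunComparison = { improved: [], regressed: [], unchanged: [], meanDelta: 0 };
  let totalDelta = 0;
  let shared = 0;

  for (const [caseId, before] of Object.entries(first.scores)) {
    const after = second.scores[caseId];
    if (after === undefined) continue; // only cases present in both runs are compared
    const delta = after - before;
    totalDelta += delta;
    shared += 1;
    if (delta > 0) cmp.improved.push(caseId);
    else if (delta < 0) cmp.regressed.push(caseId);
    else cmp.unchanged.push(caseId);
  }

  cmp.meanDelta = shared === 0 ? 0 : totalDelta / shared;
  return cmp;
}

runStore.set("run-a", { id: "run-a", scores: { c1: 0.5, c2: 1, c3: 0 } });
runStore.set("run-b", { id: "run-b", scores: { c1: 1, c2: 0.5, c3: 0 } });
const diff = compareRuns("run-a", "run-b");
console.log(diff.improved, diff.regressed, diff.unchanged); // [ 'c1' ] [ 'c2' ] [ 'c3' ]
```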
Interface for the agent evaluator.
Example
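The original example is not preserved in this text. As a hedged stand-in, the sketch below shows how the four operations described above could sit behind one interface and be used together. The interface name AgentEvaluator, all method and type names, the scorer field on the config, and the run-N ID format are assumptions for illustration, and the in-memory class is a toy rather than the real implementation.

```typescript
// All names are hypothetical; the source lists only the descriptions.
interface EvalCase { id: string; input: string; expected: string; context?: string }
interface EvalOptions { scorer?: string }
interface EvalOutcome { caseId: string; score: number }
interface EvalSummary { id: string; name: string; results: EvalOutcome[] }
type ScoreFn = (expected: string, actual: string) => number;
type Agent = (input: string, context?: string) => string;

interface AgentEvaluator {
  runEvaluation(name: string, cases: EvalCase[], agent: Agent, config?: EvalOptions): EvalSummary;
  evaluateCase(testCase: EvalCase, actualOutput: string, config?: EvalOptions): EvalOutcome;
  registerScorer(name: string, scorer: ScoreFn): void;
  compareRuns(firstRunId: string, secondRunId: string): { meanScoreDelta: number };
}

class InMemoryEvaluator implements AgentEvaluator {
  private scorers = new Map<string, ScoreFn>();
  private runs = new Map<string, EvalSummary>();

  constructor() {
    // Built-in default scorer: exact string equality.
    this.scorers.set("exact", (e: string, a: string) => (e === a ? 1 : 0));
  }

  registerScorer(name: string, scorer: ScoreFn): void {
    this.scorers.set(name, scorer);
  }

  evaluateCase(testCase: EvalCase, actualOutput: string, config: EvalOptions = {}): EvalOutcome {
    const scorerName = config.scorer ?? "exact";
    const scorer = this.scorers.get(scorerName);
    if (!scorer) throw new Error(`Unknown scorer "${scorerName}"`);
    return { caseId: testCase.id, score: scorer(testCase.expected, actualOutput) };
  }

  runEvaluation(name: string, cases: EvalCase[], agent: Agent, config: EvalOptions = {}): EvalSummary {
    const results = cases.map((tc) => this.evaluateCase(tc, agent(tc.input, tc.context), config));
    const run: EvalSummary = { id: `run-${this.runs.size + 1}`, name, results };
    this.runs.set(run.id, run); // kept so compareRuns can look it up by ID
    return run;
  }

  compareRuns(firstRunId: string, secondRunId: string): { meanScoreDelta: number } {
    const mean = (id: string): number => {
      const run = this.runs.get(id);
      if (!run) throw new Error(`Unknown run "${id}"`);
      if (run.results.length === 0) return 0;
      return run.results.reduce((sum, r) => sum + r.score, 0) / run.results.length;
    };
    return { meanScoreDelta: mean(secondRunId) - mean(firstRunId) };
  }
}

// Usage: the same echo agent scored strictly, then with a custom scorer.
const evaluator: AgentEvaluator = new InMemoryEvaluator();
evaluator.registerScorer("ignore-case", (e, a) => (e.toLowerCase() === a.toLowerCase() ? 1 : 0));

const exampleCases: EvalCase[] = [
  { id: "greet", input: "hello", expected: "HELLO" },
  { id: "bye", input: "Bye", expected: "bye" },
];
const echoAgent: Agent = (input) => input;

const strictRun = evaluator.runEvaluation("strict", exampleCases, echoAgent);
const lenientRun = evaluator.runEvaluation("lenient", exampleCases, echoAgent, { scorer: "ignore-case" });
const comparison = evaluator.compareRuns(strictRun.id, lenientRun.id);
console.log(comparison.meanScoreDelta); // 1
```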