Class: Evaluator

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:206

Implements the agent evaluation framework: runs test suites against an agent, scores outputs, compares runs, and generates reports.

Implements

IEvaluator

Constructors

Constructor

new Evaluator(): Evaluator

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:210

Returns

Evaluator

Methods

compareRuns()

compareRuns(runId1, runId2): Promise<EvalComparison>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:392

Compares two evaluation runs.

Parameters

runId1

string

First run ID

runId2

string

Second run ID

Returns

Promise<EvalComparison>

Comparison results

Implementation of

IEvaluator.compareRuns


evaluateTestCase()

evaluateTestCase(testCase, actualOutput, config?): Promise<EvalTestResult>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:306

Evaluates a single test case.

Parameters

testCase

EvalTestCase

The test case

actualOutput

string

The agent's actual output

config?

EvalConfig

Evaluation configuration

Returns

Promise<EvalTestResult>

Test result

Implementation of

IEvaluator.evaluateTestCase


generateReport()

generateReport(runId, format): Promise<string>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:433

Generates a report for a run.

Parameters

runId

string

Run ID

format

"json" | "markdown" | "html"

Report format

Returns

Promise<string>

Report content

Implementation of

IEvaluator.generateReport
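
Because the report comes back as a string, the caller decides where it ends up. The following is a minimal sketch, assuming each format maps to a conventional file extension. `reportFileName`, `buildReportFile`, and `ReportSource` are hypothetical names that are not part of the library; `ReportFormat` only restates the union from the signature above.

```typescript
// Restates the `format` union from the signature above.
type ReportFormat = "json" | "markdown" | "html";

// Hypothetical helper (not part of the library): derive a file name for a saved report.
function reportFileName(runId: string, format: ReportFormat): string {
  const extensions: Record<ReportFormat, string> = {
    json: "json",
    markdown: "md",
    html: "html",
  };
  return `eval-report-${runId}.${extensions[format]}`;
}

// Structural type: only the method this sketch needs.
interface ReportSource {
  generateReport(runId: string, format: ReportFormat): Promise<string>;
}

// Fetch a report and pair it with a file name; writing to disk is left to the caller.
async function buildReportFile(
  evaluator: ReportSource,
  runId: string,
  format: ReportFormat,
): Promise<{ fileName: string; content: string }> {
  const content = await evaluator.generateReport(runId, format);
  return { fileName: reportFileName(runId, format), content };
}
```

An `Evaluator` instance should satisfy `ReportSource` structurally, since the interface copies the documented signature.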


getRun()

getRun(runId): Promise<EvalRun | undefined>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:382

Gets an evaluation run by ID.

Parameters

runId

string

Run ID

Returns

Promise<EvalRun | undefined>

The evaluation run, or undefined if no run has that ID

Implementation of

IEvaluator.getRun
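
Since the result may be `undefined`, callers have to handle a missing run before using it. One way to do that is sketched below; `requireRun` and `RunLookup` are hypothetical names, and the type parameter `TRun` stands in for `EvalRun`, whose fields are not shown on this page.

```typescript
// Structural type: only the method this sketch needs. TRun stands in for EvalRun.
interface RunLookup<TRun> {
  getRun(runId: string): Promise<TRun | undefined>;
}

// Hypothetical helper: turn a missing run into an explicit error
// instead of letting undefined leak into later code.
async function requireRun<TRun>(
  evaluator: RunLookup<TRun>,
  runId: string,
): Promise<TRun> {
  const run = await evaluator.getRun(runId);
  if (run === undefined) {
    throw new Error(`No evaluation run found for ID "${runId}"`);
  }
  return run;
}
```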


listRuns()

listRuns(limit?): Promise<EvalRun[]>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:386

Lists recent evaluation runs.

Parameters

limit?

number = 50

Maximum number of runs to return

Returns

Promise<EvalRun[]>

Array of runs

Implementation of

IEvaluator.listRuns


registerScorer()

registerScorer(name, fn): void

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:378

Registers a custom scorer.

Parameters

name

string

Scorer name

fn

ScorerFunction

Scoring function

Returns

void

Implementation of

IEvaluator.registerScorer
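
A custom scorer might look like the sketch below. The real `ScorerFunction` type is not shown on this page, so `AssumedScorerFunction` is an assumption that mirrors the arguments of `score()`; check the actual type before relying on it. `wordRecall`, `ScorerRegistry`, and `installWordRecall` are hypothetical names used for illustration only.

```typescript
// ASSUMPTION: the real ScorerFunction type is not shown on this page. This alias
// mirrors the arguments of score(): actual, expected?, references?.
type AssumedScorerFunction = (
  actual: string,
  expected?: string,
  references?: string[],
) => number | Promise<number>;

// Hypothetical scorer: fraction of expected words that also appear in the actual output.
const wordRecall: AssumedScorerFunction = (actual, expected) => {
  if (!expected) return 0;
  const tokenize = (text: string): string[] =>
    text.toLowerCase().split(/\s+/).filter(Boolean);
  const actualWords = new Set(tokenize(actual));
  const expectedWords = tokenize(expected);
  if (expectedWords.length === 0) return 0;
  const hits = expectedWords.filter((word) => actualWords.has(word)).length;
  return hits / expectedWords.length; // always between 0 and 1
};

// Structural type: only the method this sketch needs.
interface ScorerRegistry {
  registerScorer(name: string, fn: AssumedScorerFunction): void;
}

// Register the scorer under a name that can later be passed to score().
function installWordRecall(evaluator: ScorerRegistry): void {
  evaluator.registerScorer("word-recall", wordRecall);
}
```

The scorer keeps its result within 0 to 1 so that it matches the range documented for `score()`.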


runEvaluation()

runEvaluation(name, testCases, agentFn, config?): Promise<EvalRun>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:220

Runs an evaluation suite against an agent.

Parameters

name

string

Name for this evaluation run

testCases

EvalTestCase[]

Test cases to evaluate

agentFn

(input, context?) => Promise<string>

Function that takes input and returns agent output

config?

EvalConfig

Evaluation configuration

Returns

Promise<EvalRun>

The completed evaluation run

Implementation of

IEvaluator.runEvaluation
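
The `agentFn` argument is the only piece the caller has to write. The sketch below wraps an agent function with a timeout so that a stuck call surfaces as a rejection instead of hanging. It assumes `input` is a string and treats `context` as opaque, since the page shows neither type. `AgentFn`, `withTimeout`, and `echoAgent` are hypothetical names, and how `Evaluator` handles a rejected `agentFn` call is not documented here.

```typescript
// ASSUMPTION: the page shows agentFn only as (input, context?) => Promise<string>;
// input is taken to be a string and context is treated as opaque here.
type AgentFn = (input: string, context?: unknown) => Promise<string>;

// Hypothetical wrapper: reject if the agent takes longer than timeoutMs.
function withTimeout(agentFn: AgentFn, timeoutMs: number): AgentFn {
  return (input, context) =>
    new Promise<string>((resolve, reject) => {
      const timer = setTimeout(
        () => reject(new Error(`Agent timed out after ${timeoutMs} ms`)),
        timeoutMs,
      );
      agentFn(input, context).then(
        (output) => {
          clearTimeout(timer); // agent finished in time
          resolve(output);
        },
        (error) => {
          clearTimeout(timer); // agent failed on its own
          reject(error);
        },
      );
    });
}

// Stand-in agent so the sketch runs without a real model behind it.
const echoAgent: AgentFn = async (input) => `echo: ${input}`;
const guardedAgent = withTimeout(echoAgent, 5000);
```

`guardedAgent` has the shape documented for the third argument of `runEvaluation`, so it could be passed there in place of the unwrapped function.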


score()

score(scorer, actual, expected?, references?): Promise<number>

Defined in: packages/agentos/src/core/evaluation/Evaluator.ts:365

Scores output using a specific scorer.

Parameters

scorer

string

Scorer name

actual

string

Actual output

expected?

string

Expected output

references?

string[]

Reference outputs

Returns

Promise<number>

Score in the range 0 to 1

Implementation of

IEvaluator.score
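
To apply several scorers to the same output, `score()` can be called once per scorer name. The following is a rough sketch under that assumption; `scoreWithAll` and `OutputScorer` are hypothetical names, and the scorer names passed in must already be known to the evaluator (this page does not list the built-in ones).

```typescript
// Structural type: only the method this sketch needs.
interface OutputScorer {
  score(
    scorer: string,
    actual: string,
    expected?: string,
    references?: string[],
  ): Promise<number>;
}

// Hypothetical helper: run several named scorers over one output
// and collect the results keyed by scorer name.
async function scoreWithAll(
  evaluator: OutputScorer,
  scorerNames: string[],
  actual: string,
  expected?: string,
): Promise<Record<string, number>> {
  const entries = await Promise.all(
    scorerNames.map(async (scorerName): Promise<[string, number]> => [
      scorerName,
      await evaluator.score(scorerName, actual, expected),
    ]),
  );
  const scores: Record<string, number> = {};
  for (const [scorerName, value] of entries) {
    scores[scorerName] = value;
  }
  return scores;
}
```

The calls run concurrently through `Promise.all`, so a single failing scorer rejects the whole batch; a caller that prefers partial results would need to catch errors per scorer.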