Agent evaluation framework implementation.

Methods

  • Runs an evaluation suite against an agent.

    Parameters

    • name: string

      Name for this evaluation run

    • testCases: EvalTestCase[]

      Test cases to evaluate

    • agentFn: ((input, context?) => Promise<string>)

      Function that takes the test input (and optional context) and resolves to the agent's output

        • Parameters

          • input: string
          • Optional context: string

          Returns Promise<string>

    • Optional config: EvalConfig

      Evaluation configuration

    Returns Promise<EvalRun>

    The completed evaluation run
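
As a rough illustration of the agentFn contract described above, the sketch below defines a stand-in agent and a minimal loop that calls it once per test case. The names SketchTestCase, SketchRun, echoAgent, and runSuiteSketch are hypothetical, and the fields on the test case and run objects are assumptions; the real EvalTestCase, EvalConfig, and EvalRun shapes are not shown in this reference.

```typescript
// Hypothetical shapes: the real EvalTestCase and EvalRun fields are not documented here.
interface SketchTestCase {
  input: string;
  context?: string;
}

interface SketchRun {
  name: string;
  outputs: string[];
}

// Matches the documented agentFn signature: (input, context?) => Promise<string>.
type AgentFn = (input: string, context?: string) => Promise<string>;

// Stand-in agent that upper-cases its input and prefixes the context when one is given.
const echoAgent: AgentFn = async (input, context) =>
  context ? `${context}: ${input.toUpperCase()}` : input.toUpperCase();

// Minimal stand-in for a suite runner: awaits agentFn once per test case, in order.
async function runSuiteSketch(
  name: string,
  testCases: SketchTestCase[],
  agentFn: AgentFn
): Promise<SketchRun> {
  const outputs: string[] = [];
  for (const testCase of testCases) {
    outputs.push(await agentFn(testCase.input, testCase.context));
  }
  return { name, outputs };
}
```

Any function with the documented (input, context?) => Promise<string> signature could be passed in place of echoAgent, for example a wrapper around a model call.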

  • Scores output using a specific scorer.

    Parameters

    • scorer: string

      Scorer name

    • actual: string

      Actual output

    • Optional expected: string

      Expected output

    • Optional references: string[]

      Reference outputs

    Returns Promise<number>

    Score in the range 0 to 1
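
One way to picture name-based scoring is a lookup table from scorer names to functions, with the result clamped to the 0 to 1 range. The sketch below assumes that design; the scorer names exactMatch and referenceCoverage, the ScorerFn type, and scoreSketch are hypothetical and are not taken from the framework.

```typescript
// Hypothetical scorer signature; the reference only documents the parameters and a 0-1 result.
type ScorerFn = (actual: string, expected?: string, references?: string[]) => number;

// 1 when the trimmed output equals the trimmed expected string, otherwise 0.
const exactMatch: ScorerFn = (actual, expected) =>
  expected !== undefined && actual.trim() === expected.trim() ? 1 : 0;

// Fraction of reference strings that appear verbatim in the output.
const referenceCoverage: ScorerFn = (actual, _expected, references = []) => {
  if (references.length === 0) return 0;
  const hits = references.filter((ref) => actual.indexOf(ref) !== -1).length;
  return hits / references.length;
};

// Hypothetical registry keyed by scorer name.
const scorers = new Map<string, ScorerFn>([
  ["exactMatch", exactMatch],
  ["referenceCoverage", referenceCoverage],
]);

// Looks up the scorer by name and clamps its result to [0, 1].
async function scoreSketch(
  scorer: string,
  actual: string,
  expected?: string,
  references?: string[]
): Promise<number> {
  const fn = scorers.get(scorer);
  if (fn === undefined) throw new Error(`Unknown scorer: ${scorer}`);
  return Math.min(1, Math.max(0, fn(actual, expected, references)));
}
```

Throwing on an unknown scorer name is a choice made for this sketch; the reference does not say how the framework handles that case.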

  • Generates a report for a run.

    Parameters

    • runId: string

      Run ID

    • format: "json" | "markdown" | "html"

      Report format

    Returns Promise<string>

    Report content
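
The three report formats can be illustrated with a small renderer that switches on the format union. This is a sketch under stated assumptions: SketchSummary and its fields (runId, name, meanScore) and the function renderReportSketch are hypothetical, and the framework's actual report layout is not documented here.

```typescript
// The format union as documented for the report method.
type ReportFormat = "json" | "markdown" | "html";

// Hypothetical summary of a finished run; the real EvalRun fields are not shown in this reference.
interface SketchSummary {
  runId: string;
  name: string;
  meanScore: number;
}

// Renders the same summary in each of the three formats.
// Note: the HTML branch does not escape its inputs, which a real report would need to do.
function renderReportSketch(summary: SketchSummary, format: ReportFormat): string {
  const score = summary.meanScore.toFixed(2);
  switch (format) {
    case "json":
      return JSON.stringify(summary, null, 2);
    case "markdown":
      return `# ${summary.name}\n\n- Run ID: ${summary.runId}\n- Mean score: ${score}`;
    case "html":
      return `<h1>${summary.name}</h1><p>Run ${summary.runId}: mean score ${score}</p>`;
  }
}
```

Because format is a closed union, the TypeScript compiler can flag a missing branch if another format is added later.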