Unique test case ID
Test case name
Optional category: Category or tag
Input to the agent
Optional expected: Expected output (for comparison)
Optional reference: Reference outputs for similarity comparison
Optional context: Context or system prompt
Optional expected: Expected tool calls
Optional args?: Record<string, unknown>
Optional criteria: Evaluation criteria
Optional metadata: Metadata
A test case for evaluation.
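
The property list above can be read as a single object type. The following is a minimal TypeScript sketch of that reading, not the library's actual declaration. Only `category`, `context`, `criteria`, `metadata`, and `args?: Record<string, unknown>` are taken from the reference as written. The names `id`, `name`, `input`, `expectedOutput`, `referenceOutputs`, `expectedTools`, the `ExpectedToolCall` type, every field type other than that of `args`, and the helper `missingRequiredFields` are assumptions made for illustration.

```typescript
// Hypothetical shape of one expected tool call. Only `args` and its type
// appear in the reference; `name` is an assumed field.
interface ExpectedToolCall {
  name: string;
  args?: Record<string, unknown>;
}

// Hypothetical reading of the test case type. Fields listed as "Optional"
// in the reference are marked with `?`; all types here are assumptions.
interface TestCase {
  id: string;                         // unique test case ID (name assumed)
  name: string;                       // test case name (name assumed)
  category?: string;                  // category or tag
  input: string;                      // input to the agent (name assumed)
  expectedOutput?: string;            // expected output, for comparison (name assumed)
  referenceOutputs?: string[];        // reference outputs for similarity comparison (name assumed)
  context?: string;                   // context or system prompt
  expectedTools?: ExpectedToolCall[]; // expected tool calls (name assumed)
  criteria?: string[];                // evaluation criteria
  metadata?: Record<string, unknown>; // free-form metadata
}

// The three fields the reference does not mark as optional.
const REQUIRED_FIELDS = ["id", "name", "input"] as const;

// Illustrative helper: returns the required fields that are absent or empty.
function missingRequiredFields(candidate: Partial<TestCase>): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = candidate[field];
    return typeof value !== "string" || value.length === 0;
  });
}

// Example test case using a few of the optional fields.
const example: TestCase = {
  id: "tc-001",
  name: "Weather lookup",
  category: "tools",
  input: "What is the weather in Paris?",
  expectedTools: [{ name: "get_weather", args: { city: "Paris" } }],
  criteria: ["Mentions the city by name"],
};
```

Under these assumptions only three fields are required, so a test case can start as an ID, a name, and an input, and pick up expected outputs, tool calls, or criteria as the evaluation needs them.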