Aggregate metrics across a run.
Total test cases
Passed tests
Failed tests
Pass rate (0-1)
Average score (0-1)
Score standard deviation
Average latency (ms)
P50 latency
P95 latency
P99 latency
Total tokens used
Total estimated cost
Metrics by category (optional)
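
As a rough illustration of how the fields above could be derived from per-test results, here is a minimal Python sketch. Every name in it (CaseResult, AggregateMetrics, aggregate, percentile, and all field names) is hypothetical and not taken from the source. It also assumes a population standard deviation, nearest-rank percentiles, and percentile latencies in milliseconds; the actual implementation may differ on each of these points.

```python
from dataclasses import dataclass
from math import ceil
from statistics import fmean, pstdev
from typing import Dict, List, Optional


@dataclass
class CaseResult:
    """Hypothetical per-test-case record; the field names are assumptions."""
    passed: bool
    score: float          # expected to lie in [0, 1]
    latency_ms: float
    tokens: int
    cost: float
    category: Optional[str] = None


@dataclass
class AggregateMetrics:
    """Hypothetical container mirroring the fields listed above."""
    total: int
    passed: int
    failed: int
    pass_rate: float      # 0-1
    avg_score: float      # 0-1
    score_std: float
    avg_latency_ms: float
    p50_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    total_tokens: int
    total_cost: float
    by_category: Optional[Dict[str, "AggregateMetrics"]] = None


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty, ascending-sorted list."""
    rank = max(1, ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def aggregate(results: List[CaseResult], with_categories: bool = True) -> AggregateMetrics:
    """Summarise one run; raises ValueError if the run has no results."""
    if not results:
        raise ValueError("cannot aggregate an empty run")

    latencies = sorted(r.latency_ms for r in results)
    scores = [r.score for r in results]
    passed = sum(1 for r in results if r.passed)

    by_category = None
    if with_categories:
        # Group results that carry a category; uncategorised ones are skipped.
        groups: Dict[str, List[CaseResult]] = {}
        for r in results:
            if r.category is not None:
                groups.setdefault(r.category, []).append(r)
        if groups:
            by_category = {
                name: aggregate(rs, with_categories=False)
                for name, rs in groups.items()
            }

    return AggregateMetrics(
        total=len(results),
        passed=passed,
        failed=len(results) - passed,
        pass_rate=passed / len(results),
        avg_score=fmean(scores),
        score_std=pstdev(scores),  # population std dev (an assumption)
        avg_latency_ms=fmean(latencies),
        p50_latency_ms=percentile(latencies, 50),
        p95_latency_ms=percentile(latencies, 95),
        p99_latency_ms=percentile(latencies, 99),
        total_tokens=sum(r.tokens for r in results),
        total_cost=sum(r.cost for r in results),
        by_category=by_category,
    )
```

Calling aggregate(results) on a list of CaseResult objects returns one AggregateMetrics for the whole run. The per-category breakdown is left as None when no result carries a category, which matches its optional status in the list above.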