Interface: CreationVerdict
Defined in: packages/agentos/src/emergent/types.ts:244
Evaluation verdict produced by the LLM-as-judge after a tool is forged.
The judge runs the tool against its declared test cases and scores it across
five evaluation dimensions. A tool is only registered when approved is true.
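The judge is an LLM, so its verdict will often arrive as untrusted structured output that has to be checked before use. The sketch below is a local mirror of the fields documented on this page, plus a hypothetical type guard `isCreationVerdict` that enforces the documented types and the [0, 1] score ranges. The guard and the `SCORE_FIELDS` constant are illustrative assumptions, not part of the agentos API.

```typescript
// Local mirror of the documented CreationVerdict shape (field names and types
// taken from this reference page; the real declaration lives in types.ts).
interface CreationVerdict {
  approved: boolean;
  confidence: number;
  safety: number;
  correctness: number;
  determinism: number;
  bounded: number;
  reasoning: string;
}

// Numeric fields documented as lying in the closed range [0, 1].
const SCORE_FIELDS = [
  "confidence",
  "safety",
  "correctness",
  "determinism",
  "bounded",
] as const;

// Hypothetical helper: checks that an untrusted value (for example JSON parsed
// from the judge's response) matches the CreationVerdict shape and ranges.
function isCreationVerdict(value: unknown): value is CreationVerdict {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v["approved"] !== "boolean" || typeof v["reasoning"] !== "string") {
    return false;
  }
  // NaN fails both comparisons, so it is rejected along with out-of-range values.
  return SCORE_FIELDS.every((field) => {
    const score = v[field];
    return typeof score === "number" && score >= 0 && score <= 1;
  });
}
```

A caller would typically run `isCreationVerdict(JSON.parse(raw))` and treat a failed check the same way as a rejection, rather than registering a tool on a malformed verdict.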
Properties
approved
approved: boolean
Defined in: packages/agentos/src/emergent/types.ts:249
Whether the judge approves the tool for registration at its initial tier.
When false, the forge request is rejected and no tool is registered.
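The registration rule above amounts to a single gate on `approved`. The following is a minimal sketch of that gate, assuming a hypothetical in-memory `ToolRegistry` and a free-form tier string; the real registry, tier names and forge pipeline are not described on this page.

```typescript
// Minimal subset of CreationVerdict needed for the registration decision.
interface VerdictGate {
  approved: boolean;
  reasoning: string;
}

// Hypothetical in-memory registry standing in for the real tool registry.
class ToolRegistry {
  private readonly tools = new Map<string, string>();

  register(name: string, tier: string): void {
    this.tools.set(name, tier);
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }
}

type ForgeOutcome =
  | { registered: true; tier: string }
  | { registered: false; reason: string };

// Registers the tool at its initial tier only when the judge approved it;
// otherwise the forge request is rejected and the registry is left untouched.
function applyVerdict(
  registry: ToolRegistry,
  toolName: string,
  initialTier: string,
  verdict: VerdictGate,
): ForgeOutcome {
  if (!verdict.approved) {
    return { registered: false, reason: verdict.reasoning };
  }
  registry.register(toolName, initialTier);
  return { registered: true, tier: initialTier };
}
```

Surfacing `reasoning` on rejection keeps the judge's explanation available to whoever retries the forge request.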
bounded
bounded: boolean
Defined in: packages/agentos/src/emergent/types.ts:283
Bounded-execution score in the range [0, 1]. Indicates how reliably the tool completes within its declared resource limits (memory, time). Scores are derived from sandbox telemetry.
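One plausible way to turn sandbox telemetry into a [0, 1] score is the fraction of runs that finished within both limits. The sketch below assumes hypothetical `RunTelemetry` and `ResourceLimits` shapes; the actual telemetry format and scoring formula are not documented here.

```typescript
// Hypothetical per-run telemetry record; the real sandbox telemetry format is
// not documented on this page.
interface RunTelemetry {
  elapsedMs: number;
  peakMemoryBytes: number;
  completed: boolean;
}

// Hypothetical declared resource limits for the forged tool.
interface ResourceLimits {
  maxElapsedMs: number;
  maxMemoryBytes: number;
}

// Fraction of sandbox runs that completed within both the time and memory
// limits. Returns a value in [0, 1]; with no runs there is no evidence, so 0.
function boundedScore(runs: RunTelemetry[], limits: ResourceLimits): number {
  if (runs.length === 0) return 0;
  const withinLimits = runs.filter(
    (r) =>
      r.completed &&
      r.elapsedMs <= limits.maxElapsedMs &&
      r.peakMemoryBytes <= limits.maxMemoryBytes,
  ).length;
  return withinLimits / runs.length;
}
```

Treating an empty run list as 0 rather than 1 is a deliberately conservative choice: a tool that was never observed should not be reported as bounded.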
confidence
confidence: number
Defined in: packages/agentos/src/emergent/types.ts:255
Overall confidence the judge has in its verdict, in the range [0, 1]. Low confidence may trigger a second judge pass or human review.
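The escalation described above can be expressed as a simple threshold router. This is a sketch only: the route names and both thresholds are hypothetical, since the page does not say what counts as "low" confidence.

```typescript
type ReviewRoute = "accept-verdict" | "second-judge-pass" | "human-review";

// Hypothetical thresholds; the real escalation policy is not specified here.
const SECOND_PASS_BELOW = 0.7;
const HUMAN_REVIEW_BELOW = 0.4;

// Decides what to do with a verdict based on the judge's overall confidence.
// Very low confidence goes straight to a human; moderately low confidence
// gets another judge pass; otherwise the verdict (approve or reject) stands.
function routeByConfidence(confidence: number): ReviewRoute {
  if (confidence < HUMAN_REVIEW_BELOW) return "human-review";
  if (confidence < SECOND_PASS_BELOW) return "second-judge-pass";
  return "accept-verdict";
}
```

Routing on confidence is independent of `approved`: a low-confidence rejection deserves a second look just as much as a low-confidence approval.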
correctness
correctness: number
Defined in: packages/agentos/src/emergent/types.ts:269
Correctness score in the range [0, 1]. Measures how well the tool's outputs match the expected outputs in the declared test cases.
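The judge itself is an LLM, but a deterministic proxy for this score is the pass rate over the declared test cases. The sketch below assumes a hypothetical `TestCase` shape, a synchronous tool function, and JSON-serialised equality; none of these are confirmed by the source.

```typescript
// Hypothetical shape of a declared test case.
interface TestCase<I, O> {
  input: I;
  expectedOutput: O;
}

// Structural equality via JSON serialisation. Simple, but sensitive to object
// key order and lossy for values JSON cannot represent (functions, undefined).
function sameJson(a: unknown, b: unknown): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Fraction of declared test cases whose actual output matches the expected
// output. Assumes a synchronous tool for brevity; a thrown error counts as a
// failed case.
function correctnessScore<I, O>(
  tool: (input: I) => O,
  cases: TestCase<I, O>[],
): number {
  if (cases.length === 0) return 0;
  let passed = 0;
  for (const c of cases) {
    try {
      if (sameJson(tool(c.input), c.expectedOutput)) passed++;
    } catch {
      // A crash on this input is a failed test case.
    }
  }
  return passed / cases.length;
}
```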
determinism
determinism: number
Defined in: packages/agentos/src/emergent/types.ts:276
Determinism score in the range [0, 1]. Gauges whether repeated invocations with identical inputs produce consistent outputs. Lower scores flag non-deterministic behaviour.
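Repeated invocation is the natural way to estimate this score. The sketch below calls a tool several times per input and reports the fraction of inputs whose outputs never varied. The function name, the default repeat count and JSON-based comparison are all illustrative assumptions.

```typescript
// Runs the tool `repeats` times on each input and returns the fraction of
// inputs for which every run produced the same (JSON-serialised) output.
function determinismScore<I, O>(
  tool: (input: I) => O,
  inputs: I[],
  repeats = 3,
): number {
  if (repeats < 2) throw new RangeError("need at least two runs to compare");
  if (inputs.length === 0) return 0;
  let consistent = 0;
  for (const input of inputs) {
    const first = JSON.stringify(tool(input));
    let stable = true;
    for (let i = 1; i < repeats; i++) {
      if (JSON.stringify(tool(input)) !== first) {
        stable = false;
        break; // One differing run is enough to mark this input unstable.
      }
    }
    if (stable) consistent++;
  }
  return consistent / inputs.length;
}
```

A small repeat count only catches frequent non-determinism; rare variation (for example, time-dependent output) can still slip through.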
reasoning
reasoning: string
Defined in: packages/agentos/src/emergent/types.ts:289
Free-text explanation of the verdict, including any failure reasons, flagged patterns, or suggestions for improvement.
safety
safety: number
Defined in: packages/agentos/src/emergent/types.ts:262
Safety score in the range [0, 1]. Assesses whether the tool's implementation could cause unintended harm, data exfiltration, or resource exhaustion.