Eval
What it is
Capability and behavior baselines for what the AI emits. Captured cases are run through a subject, and its output is graded against expectations.
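The loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual harness: `EvalCase`, `EvalResult`, `runEval`, and `fakeAgent` are hypothetical names, and the subject here is a plain function standing in for a real agent call.

```typescript
// Hypothetical shapes: none of these names come from the project itself.
interface EvalCase<I, O> {
  name: string;
  input: I;
  // Returns a list of problems; an empty list means the output met expectations.
  expect: (output: O) => string[];
}

interface EvalResult {
  name: string;
  passed: boolean;
  problems: string[];
}

// Run every captured case through the subject and grade what comes back.
function runEval<I, O>(
  subject: (input: I) => O,
  cases: EvalCase<I, O>[],
): EvalResult[] {
  return cases.map((c) => {
    const problems = c.expect(subject(c.input));
    return { name: c.name, passed: problems.length === 0, problems };
  });
}

// Stand-in subject: a real one would call the agent under test.
const fakeAgent = (prompt: string): { title: string } => ({
  title: prompt.toUpperCase(),
});

const results = runEval(fakeAgent, [
  {
    name: "title is non-empty",
    input: "hello",
    expect: (out) => (out.title.length > 0 ? [] : ["title is empty"]),
  },
  {
    name: "title keeps original casing",
    input: "hello",
    expect: (out) => (out.title === "hello" ? [] : [`got "${out.title}"`]),
  },
]);

const failed = results.filter((r) => !r.passed);
console.log(`${results.length - failed.length}/${results.length} cases passed`);
```

With the stand-in subject, the first case passes and the second fails, because the fake agent upper-cases its input. Returning a list of problems rather than a boolean keeps the reason for each failure available for reporting.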
When to use it
Use this mode for questions like: 'Does the agent still emit a valid CardSpec for this input?'
Example gates
- agent-output-schema: validates every exemplar against CardZ.
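A schema gate of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the real CardZ schema and the fields of CardSpec are not shown on this page, so `title`, `body`, `validateCard`, `checkExemplars`, and the inline exemplars are all hypothetical. If CardZ is a Zod schema (the name suggests it, but this page does not say so), a call such as `CardZ.safeParse(output)` would play the role of the hand-written `validateCard` here.

```typescript
// Hypothetical stand-in for CardZ. The real schema's fields are not shown in
// this page, so `title` and `body` are invented for illustration.
function validateCard(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected an object"];
  }
  const record = value as Record<string, unknown>;
  const issues: string[] = [];
  if (typeof record["title"] !== "string") issues.push("title: expected string");
  if (typeof record["body"] !== "string") issues.push("body: expected string");
  return issues; // empty list means the value is a valid card
}

interface Exemplar {
  name: string;
  output: unknown; // captured agent output, not yet trusted
}

// Validate every exemplar and collect one message per failing exemplar.
function checkExemplars(items: Exemplar[]): string[] {
  const failures: string[] = [];
  for (const { name, output } of items) {
    const issues = validateCard(output);
    if (issues.length > 0) failures.push(`${name}: ${issues.join("; ")}`);
  }
  return failures;
}

// Inline exemplars for the sketch; a real gate would load captured fixtures.
const exemplars: Exemplar[] = [
  { name: "simple-card", output: { title: "Hi", body: "Welcome" } },
  { name: "missing-body", output: { title: "Hi" } },
];

const failures = checkExemplars(exemplars);
if (failures.length > 0) {
  console.log(`schema gate failed:\n${failures.join("\n")}`);
} else {
  console.log("schema gate passed");
}
```

Checking all exemplars before reporting, rather than stopping at the first failure, shows the full extent of a regression in one run.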
See also
- The pyramid — where this mode sits relative to the others
- Writing a gate — how to scaffold a new gate in this mode
- Decision tree — "I just wrote feature X — what tests do I owe?"