judge-calibration

Field	Value
ID	`judge-calibration`
Version	`1.0.0`
Mode	`llm-judge`
Layer	Anthropic
Category	`eval`
Severity	🟡 warning
SLA	30,000 ms
Depends on	none
Source	`packages/ai/__gates__/judge-calibration.gate.ts`

What it asserts

The LLM judge's agreement with human ratings must stay above a Cohen's κ of 0.85. If κ falls below the threshold, dependent judge gates are automatically demoted to shadow.
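
For readers unfamiliar with the metric, here is a minimal sketch of how Cohen's κ could be computed from paired judge and human labels and compared against the 0.85 threshold. It is an illustration only, not the code in `judge-calibration.gate.ts`: the names `RatingPair`, `cohensKappa`, `calibrationVerdict`, and the verdict strings are hypothetical, and treating a κ of exactly 0.85 as a failure is an assumption drawn from the phrase "stays above".

```typescript
// Hypothetical sketch: none of these names come from the gate's source file.
type RatingPair = { judge: string; human: string };

// Threshold taken from the gate description above.
const KAPPA_THRESHOLD = 0.85;

// Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
function cohensKappa(pairs: RatingPair[]): number {
  if (pairs.length === 0) {
    throw new Error("at least one rating pair is required");
  }
  const n = pairs.length;
  const judgeCounts = new Map<string, number>();
  const humanCounts = new Map<string, number>();
  let agreements = 0;

  for (const { judge, human } of pairs) {
    if (judge === human) agreements++;
    judgeCounts.set(judge, (judgeCounts.get(judge) ?? 0) + 1);
    humanCounts.set(human, (humanCounts.get(human) ?? 0) + 1);
  }

  // Observed agreement: fraction of items where both raters chose the same label.
  const observed = agreements / n;

  // Chance agreement: sum over labels of the product of each rater's marginal rate.
  let expected = 0;
  for (const [label, count] of judgeCounts) {
    expected += (count / n) * ((humanCounts.get(label) ?? 0) / n);
  }

  // Kappa is undefined when both raters used one identical label throughout;
  // this sketch assumes that case counts as perfect agreement.
  if (expected === 1) return 1;
  return (observed - expected) / (1 - expected);
}

// Assumption: the gate passes only when kappa is strictly above the threshold.
function calibrationVerdict(pairs: RatingPair[]): "pass" | "demote-to-shadow" {
  return cohensKappa(pairs) > KAPPA_THRESHOLD ? "pass" : "demote-to-shadow";
}
```

Because κ discounts agreement expected by chance, a judge that matches humans on 75% of items can still score far below 0.85 when the label distribution is skewed, which is why raw accuracy is not a substitute for this check.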

Run it locally

bun run gates --gate=judge-calibration

See also

llm-judge mode
Anthropic layer
Allowlists — how to bound a known finding with an expiration
Contributing — how to evolve this gate or write a new one

Generated by apps/design/scripts/generate-gate-pages.ts from the gate's source-of-truth metadata. Edit this page by editing the gate file's description / version / etc.
