Writing a Gate
The shape
import { defineGate, emit } from "@matter/testing";

export default defineGate({
  id: "my-gate", // kebab-case, globally unique
  version: "1.0.0", // semver; bump when logic changes
  category: "contract", // contract | budget | eval | security | capability | safety | cost | latency
  layer: "Stripe", // Stripe | Vercel | Anthropic
  mode: "static", // see /testing/pyramid for all 19 modes
  severity: "blocking", // blocking | warning | info | shadow
  description: "One sentence",
  slaMs: 5_000, // optional wall-clock ceiling
  dependsOn: [], // optional gate ids that must run first
  async run(ctx) {
    return [
      emit(this, {
        reason: "machine_readable_reason",
        message: "Human-readable one-line message",
        target: "/path/or/value",
        source: { file: "src/example.ts", line: 42 },
        fixHint: "How to fix this, in ≤140 chars",
      }),
    ];
  },
});

Drop it anywhere named *.gate.ts. The runner auto-discovers it on the next bun run gates.
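As a rough illustration of what a static gate's run() might delegate to, here is a minimal sketch of a pure helper that turns file contents into finding-shaped objects. The names FindingInput and findConsoleLogs are hypothetical and not part of @matter/testing; the field names simply mirror the emit() call above. How the gate obtains file contents from ctx is not covered here.

```typescript
// Hypothetical sketch: FindingInput and findConsoleLogs are illustration
// names, not framework exports. Fields mirror the emit() payload shown above.
interface FindingInput {
  reason: string;
  message: string;
  target: string;
  source: { file: string; line: number };
  fixHint: string;
}

// Returns one finding per line that contains a `console.log(` call.
function findConsoleLogs(file: string, contents: string): FindingInput[] {
  const findings: FindingInput[] = [];
  contents.split("\n").forEach((text, index) => {
    if (text.includes("console.log(")) {
      findings.push({
        reason: "console_log_present",
        message: `console.log call in ${file}`,
        target: file,
        source: { file, line: index + 1 }, // report 1-based line numbers
        fixHint: "Remove the call or route it through the project logger.",
      });
    }
  });
  return findings;
}
```

Keeping the check pure like this makes it easy to unit-test apart from the runner; inside run(ctx), each returned object would presumably be passed to emit(this, ...).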
Templates
Copy a starter from packages/testing/templates/:
- static.template.ts: file-level analysis (route coverage, lint-shaped invariants)
- property.template.ts: fast-check invariants over generated inputs
- eval.template.ts: AI capability + behavior cases
- red-team.template.ts: adversarial corpora against AI subjects
- prompt-regression.template.ts: captured-prompt replay
Once bun matter-gates new --mode=<mode> ships, it will scaffold a new gate from these templates into the right surface directory.
The shadow-mode promotion path
New gates often produce false positives in their first few runs while you refine the predicates. Launching as severity: "shadow" lets the gate run in CI without failing builds:
- Ship the gate as severity: "shadow". It runs on every PR, telemetry flows, and the scorecard tracks its findings, but the build never fails.
- Watch the scorecard for 14 days or 30 runs (whichever comes first). Iterate on false positives.
- Promote: bun run gates --promote=<id>. This edits the gate file's severity to blocking in a PR you can review.
The framework doesn't require this path — production-shaped gates can launch blocking immediately. Use shadow when you're not yet sure your predicate is right.
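The "14 days or 30 runs, whichever comes first" window can be expressed as a small predicate. This is only a sketch: ShadowStats and observationWindowElapsed are hypothetical names, and the scorecard's real data model is not described here.

```typescript
// Hypothetical helper; only the thresholds (14 days, 30 runs) come from the text.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface ShadowStats {
  firstRunAt: Date; // when the gate first ran in shadow mode
  runCount: number; // shadow runs recorded so far
}

// True once either threshold is reached, i.e. "whichever comes first".
function observationWindowElapsed(stats: ShadowStats, now: Date): boolean {
  const daysObserved = (now.getTime() - stats.firstRunAt.getTime()) / MS_PER_DAY;
  return daysObserved >= 14 || stats.runCount >= 30;
}
```

An elapsed window only means enough data has been collected; whether the remaining false positives are acceptable is still a judgment call before promoting.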
Choosing a mode
Use the decision matrix in the README, or pick by what you are checking:
- Static / drift: you're checking file contents or generator output parity. No execution.
- Unit / property / fuzz / mutation: you're checking a pure function. property if the assertion is "for all inputs in D"; fuzz if you want random/adversarial; mutation if you're verifying that your existing tests catch bugs.
- Integration / contract / visual: you're composing multiple units. contract for consumer-provider expectations (Pact-shaped); visual for pixel regressions.
- E2E / smoke / synthetic: real browser or production canary.
- Eval / safety / red-team / prompt-regression / hallucination / llm-judge / determinism: AI emissions. Pick the mode that matches the question you're answering.
Choosing a severity
- blocking: the right default. Findings should fail the build. Use this for any well-understood gate.
- warning: exit 0 but report prominently. Use sparingly: "this is bad but we're not blocking on it yet." Most warnings should graduate to blocking or be removed.
- info: exit 0, quiet report. Useful for advisory checks (e.g. "this file is over 500 LOC").
- shadow: for new gates pre-promotion. See above.
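Read together, the four severities imply a simple rule for the process exit status: only blocking findings fail the build. The sketch below assumes a failure exit code of 1 and uses hypothetical names (ReportedFinding, exitCodeFor); the runner's actual exit codes are not documented here.

```typescript
type Severity = "blocking" | "warning" | "info" | "shadow";

// Hypothetical shape; real findings carry more fields.
interface ReportedFinding {
  severity: Severity;
  message: string;
}

// Assumption: 1 signals a failed build. Per the list above, warning, info
// and shadow findings are reported but still exit 0.
function exitCodeFor(findings: ReportedFinding[]): 0 | 1 {
  return findings.some((f) => f.severity === "blocking") ? 1 : 0;
}
```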
Allowlists
If a gate produces a finding that you can't fix immediately (e.g. third-party vulnerability, intentional design exception with a tracking issue), add an allowlist entry:
// packages/<your-package>/<your-gate>.allowlist.json
{
  "entries": [
    {
      "key": "<finding-fingerprint>", // the finding's fingerprint (shown in the runner output)
      "reason": "Vendor patch not yet released; switching libs in Q3.",
      "tracking_issue": "MAT-1234",
      "expires_at": "2026-09-01", // ISO date
      "added_by": "alice",
      "added_at": "2026-05-15"
    }
  ]
}

Past-expiry entries become hard failures. Allowlists shrink over time; they never grow indefinitely.
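The expiry rule can be sketched as a lookup that separates suppressed findings from expired entries. The field names follow the JSON above; checkAllowlist and the Verdict type are hypothetical, and the sketch assumes an entry stays valid through its expires_at date inclusive, which the text does not specify.

```typescript
// Field names mirror the allowlist JSON above.
interface AllowlistEntry {
  key: string;
  reason: string;
  tracking_issue: string;
  expires_at: string; // ISO date, e.g. "2026-09-01"
  added_by: string;
  added_at: string;
}

type Verdict = "suppressed" | "expired" | "not_listed";

// Hypothetical helper. `today` is an ISO YYYY-MM-DD string; such strings
// compare in date order, so plain string comparison is enough.
function checkAllowlist(fingerprint: string, entries: AllowlistEntry[], today: string): Verdict {
  const entry = entries.find((e) => e.key === fingerprint);
  if (!entry) return "not_listed";
  // Assumption: the entry is still valid on the expires_at date itself.
  return today > entry.expires_at ? "expired" : "suppressed";
}
```

Under this reading, an "expired" verdict would be treated as a hard failure, while "not_listed" leaves the finding to the gate's normal severity.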
Quarantines
For flaky gates or flaky findings, use a quarantine instead. An entry has the same shape as an allowlist entry plus a flake_rate field, and is meant for temporary suppression while you stabilise the gate. See /testing/quarantines.