Matter Testing Framework
The framework lives in @matter/testing and ships a runner that auto-discovers *.gate.ts files anywhere in the monorepo. Three philosophies in one taxonomy:
- Stripe — contract sync between layers (OpenAPI ↔ MCP ↔ SDK ↔ docs ↔ dispatchers; route coverage; codegen drift).
- Vercel — measurable regression budgets (perf, a11y, bundle-size, cost, latency).
- Anthropic — AI behavior baselines (capability, safety, alignment, hallucination, determinism).
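The auto-discovery rule mentioned above (any file named *.gate.ts, anywhere in the monorepo) can be sketched as a filename filter. This is a minimal illustration, not the runner's actual implementation: isGateFile, discoverGates, the node_modules exclusion, and the example paths are all hypothetical.

```typescript
// Hypothetical sketch: the real runner's discovery logic is not shown in this document.
const GATE_SUFFIX = ".gate.ts";

/** True when the file's base name ends with ".gate.ts" and has a non-empty stem. */
function isGateFile(path: string): boolean {
  const base = path.split(/[\\/]/).pop() ?? "";
  return base.endsWith(GATE_SUFFIX) && base.length > GATE_SUFFIX.length;
}

/** Keep gate files only. Skipping node_modules is an assumption, not documented behavior. */
function discoverGates(paths: readonly string[]): string[] {
  return paths
    .filter((p) => !p.split(/[\\/]/).includes("node_modules"))
    .filter(isGateFile)
    .sort();
}

// Example paths are made up for illustration.
const found = discoverGates([
  "packages/cards/src/card-action-targets.gate.ts",
  "apps/design/src/index.ts",
  "node_modules/some-dep/x.gate.ts",
  "apps/api/routes/coverage.gate.ts",
]);
console.log(found);
```

Matching on the base name rather than the full path keeps the rule independent of where in the monorepo the file sits.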
Run gates locally
bun run gates # every discovered gate
bun run gates --list # show all gates
bun run gates --gate=card-action-targets
bun run gates --mode=eval # filter to AI eval gates
bun run gates --since=origin/main # diff mode — new findings only
bun run gates --explain # Claude-assisted fix suggestions
Anatomy of a gate
A gate is a single TypeScript file whose default export conforms to the Gate interface. Drop it anywhere named *.gate.ts and the runner auto-discovers it.
import { defineGate, emit } from "@matter/testing";
export default defineGate({
  id: "my-gate",
  version: "1.0.0",
  category: "contract", // contract | budget | eval | security | capability | safety | cost | latency
  layer: "Stripe", // Stripe | Vercel | Anthropic
  mode: "static", // 19 modes; see /testing/pyramid
  severity: "blocking", // blocking | warning | info | shadow
  description: "One sentence",
  slaMs: 5_000,
  async run(ctx) {
    return [
      emit(this, {
        reason: "example_finding",
        message: "Something is wrong",
        target: "/path/to/thing",
        source: { file: "src/example.ts", line: 42 },
        fixHint: "Try X or Y",
      }),
    ];
  },
});
Three audiences
Developers
- Writing a gate: /testing/contributing
- Per-mode templates: ship under packages/testing/templates/. Scaffold with bun matter-gates new --mode=<mode>.
- Local loop: bun run gates --watch reruns on source change.
Operators
- CI: .github/workflows/framework.yml runs every gate as a matrix job; results upload as artifacts; the PR comment summarises only NEW findings vs. base.
- Telemetry: every run emits OpenTelemetry spans (gate.run with gate.id, gate.version, gate.mode, findings.count, duration.ms, sla.exceeded). Pluggable OTLP sink; local fallback at .matter/telemetry/. Grafana dashboard template at packages/testing/grafana/.
- Scorecard: the public scorecard at /testing/scorecard renders allowlist size, fix velocity, runtime trends, and flake rates from the telemetry stream.
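To make the telemetry shape concrete, here is a rough sketch of how the attributes on a gate.run span might be assembled. The attribute names come from the list above; the value types, the GateMeta shape, the gateRunAttributes helper, and the rule that sla.exceeded means "duration strictly greater than slaMs" are assumptions for illustration only.

```typescript
// Attribute names are from the Telemetry bullet; the value types are assumed.
interface GateRunAttributes {
  "gate.id": string;
  "gate.version": string;
  "gate.mode": string;
  "findings.count": number;
  "duration.ms": number;
  "sla.exceeded": boolean;
}

// Hypothetical subset of a gate definition needed to build the attributes.
interface GateMeta {
  id: string;
  version: string;
  mode: string;
  slaMs: number;
}

/** Build the attribute record for one run (hypothetical helper). */
function gateRunAttributes(
  gate: GateMeta,
  findingsCount: number,
  durationMs: number,
): GateRunAttributes {
  return {
    "gate.id": gate.id,
    "gate.version": gate.version,
    "gate.mode": gate.mode,
    "findings.count": findingsCount,
    "duration.ms": durationMs,
    // Assumption: the SLA counts as exceeded only when the run takes longer than slaMs.
    "sla.exceeded": durationMs > gate.slaMs,
  };
}

const attrs = gateRunAttributes(
  { id: "my-gate", version: "1.0.0", mode: "static", slaMs: 5_000 },
  1,
  6_200,
);
console.log(attrs["sla.exceeded"]); // true: 6200 ms is over the 5000 ms budget
```

A flat record of primitive values like this can be attached to a span by whichever OpenTelemetry SDK the sink uses; the sketch deliberately leaves the SDK calls out.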
Agents
- Structured catalog: apps/design/public/testing-framework.json lists every gate's id, version, mode, layer, severity, slaMs, and description.
- Flat text: apps/design/public/llms-testing.txt carries the same content as a single text bundle for context windows.
- JSON Schema: apps/design/public/testing-framework.schema.json describes the catalog shape.
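An agent consuming the structured catalog might validate entries before relying on them. The sketch below assumes the catalog is a top-level JSON array and that every field except slaMs is a string; the published JSON Schema is the authority on the real shape. CatalogEntry, isCatalogEntry, blockingGateIds, and the sample data are hypothetical.

```typescript
// Field names mirror the catalog description; the concrete types are assumed.
interface CatalogEntry {
  id: string;
  version: string;
  mode: string;
  layer: string;
  severity: string;
  slaMs: number;
  description: string;
}

const STRING_FIELDS = ["id", "version", "mode", "layer", "severity", "description"];

/** Runtime type guard so malformed entries are skipped instead of trusted. */
function isCatalogEntry(value: unknown): value is CatalogEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    STRING_FIELDS.every((k) => typeof v[k] === "string") &&
    typeof v["slaMs"] === "number"
  );
}

/** Return the ids of all well-formed entries whose severity is "blocking". */
function blockingGateIds(catalog: unknown): string[] {
  if (!Array.isArray(catalog)) return [];
  return catalog
    .filter(isCatalogEntry)
    .filter((e) => e.severity === "blocking")
    .map((e) => e.id);
}

// Inline sample standing in for the contents of testing-framework.json.
const sample: unknown = JSON.parse(`[
  {"id":"my-gate","version":"1.0.0","mode":"static","layer":"Stripe",
   "severity":"blocking","slaMs":5000,"description":"One sentence"},
  {"id":"example-warning","version":"0.1.0","mode":"eval","layer":"Anthropic",
   "severity":"warning","slaMs":30000,"description":"Hypothetical entry"},
  {"id":"broken-entry"}
]`);
console.log(blockingGateIds(sample));
```

Guarding at the boundary means a catalog that drifts from the assumed shape yields fewer results rather than a runtime error deep in the agent's logic.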
The five AI failure modes the framework explicitly catches
Matter ships AI-emitted artifacts (CardSpecs, MCP tool outputs, AI-authored documents) into a high-trust legal context. The framework's eval layer targets the five failure modes that the deterministic-code testing pyramid was never built for. See /testing/ai-failure-modes for the full breakdown.