Testing Framework

The framework lives in @matter/testing and ships a runner that auto-discovers *.gate.ts files anywhere in the monorepo. Its taxonomy combines three philosophies:

Stripe — contract sync between layers (OpenAPI ↔ MCP ↔ SDK ↔ docs ↔ dispatchers; route coverage; codegen drift).
Vercel — measurable regression budgets (perf, a11y, bundle-size, cost, latency).
Anthropic — AI behavior baselines (capability, safety, alignment, hallucination, determinism).

Run gates locally

bun run gates              # every discovered gate
bun run gates --list       # show all gates
bun run gates --gate=card-action-targets
bun run gates --mode=eval  # filter to AI eval gates
bun run gates --since=origin/main   # diff mode — new findings only
bun run gates --explain    # Claude-assisted fix suggestions
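
The --since flag reports new findings only. A minimal sketch of how such a diff could be computed, assuming a finding is identified by its gate id, reason, and target. The Finding shape and the fingerprint and newFindings helpers are hypothetical and are not taken from the runner's source:

```typescript
// Hypothetical finding shape; the real type in @matter/testing may differ.
interface Finding {
  gateId: string;
  reason: string;
  target: string;
  message: string;
}

// Assumed identity of a finding: gate, reason, and target. The message is
// left out so that rewording it does not make an old finding look new.
function fingerprint(f: Finding): string {
  return JSON.stringify([f.gateId, f.reason, f.target]);
}

// Keep only the findings on the current checkout that the base ref lacks.
function newFindings(base: Finding[], head: Finding[]): Finding[] {
  const known = new Set(base.map(fingerprint));
  return head.filter((f) => !known.has(fingerprint(f)));
}

const baseFindings: Finding[] = [
  { gateId: "card-action-targets", reason: "missing_target", target: "/cards/a", message: "old wording" },
];
const headFindings: Finding[] = [
  { gateId: "card-action-targets", reason: "missing_target", target: "/cards/a", message: "new wording" },
  { gateId: "card-action-targets", reason: "missing_target", target: "/cards/b", message: "no target" },
];

// Only the /cards/b finding survives; /cards/a already existed on the base.
const fresh = newFindings(baseFindings, headFindings);
console.log(fresh.map((f) => f.target));
```

Keying on a stable fingerprint rather than the full finding is a design choice: it keeps the diff quiet when only the wording of a message changes.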

Anatomy of a gate

A gate is a single TypeScript file whose default export conforms to the Gate interface. Name the file *.gate.ts, place it anywhere in the monorepo, and the runner discovers it automatically.

import { defineGate, emit } from "@matter/testing";

export default defineGate({
  id: "my-gate",
  version: "1.0.0",
  category: "contract",      // contract | budget | eval | security | capability | safety | cost | latency
  layer: "Stripe",           // Stripe | Vercel | Anthropic
  mode: "static",            // 19 modes — see /testing/pyramid
  severity: "blocking",      // blocking | warning | info | shadow
  description: "One sentence",
  slaMs: 5_000,
  async run(ctx) {
    return [
      emit(this, {
        reason: "example_finding",
        message: "Something is wrong",
        target: "/path/to/thing",
        source: { file: "src/example.ts", line: 42 },
        fixHint: "Try X or Y",
      }),
    ];
  },
});
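
To make the role of emit(this, ...) concrete, here is a self-contained sketch with local stand-ins for defineGate and emit. These stand-ins, and the Finding and FindingInput types, are assumptions for illustration; the real exports of @matter/testing are not shown here and may differ:

```typescript
// Local stand-ins for illustration only; not the real @matter/testing API.
type Severity = "blocking" | "warning" | "info" | "shadow";

interface FindingInput {
  reason: string;
  message: string;
  target: string;
  source?: { file: string; line: number };
  fixHint?: string;
}

// Assumed output shape: the input plus the identity of the emitting gate.
interface Finding extends FindingInput {
  gateId: string;
  gateVersion: string;
  severity: Severity;
}

interface Gate {
  id: string;
  version: string;
  severity: Severity;
  slaMs: number;
  run(ctx: unknown): Promise<Finding[]>;
}

// Identity helper: its only job is to type-check the literal as a Gate.
function defineGate(gate: Gate): Gate {
  return gate;
}

// Stamps the gate's id, version, and severity onto a finding.
function emit(gate: Gate, input: FindingInput): Finding {
  return { ...input, gateId: gate.id, gateVersion: gate.version, severity: gate.severity };
}

const sketchGate = defineGate({
  id: "my-gate",
  version: "1.0.0",
  severity: "blocking",
  slaMs: 5_000,
  async run(_ctx) {
    // `this` is the gate object because run is invoked as a method.
    return [emit(this, { reason: "example_finding", message: "Something is wrong", target: "/path/to/thing" })];
  },
});

const sketchFindings = sketchGate.run({});
sketchFindings.then((findings) => console.log(findings.map((f) => f.gateId)));
```

Under these assumptions, run must be declared with method syntax rather than as an arrow function, since an arrow function would not bind this to the gate object.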

Three audiences

Developers

Writing a gate: /testing/contributing
Per-mode templates: shipped under packages/testing/templates/. Scaffold a new gate with bun matter-gates new --mode=<mode>.
Local loop: bun run gates --watch reruns on source change.

Operators

CI: .github/workflows/framework.yml runs every gate as a matrix job, uploads the results as artifacts, and posts a PR comment that summarises only the findings that are new relative to the base.
Telemetry: every run emits OpenTelemetry spans (gate.run with gate.id, gate.version, gate.mode, findings.count, duration.ms, sla.exceeded). Pluggable OTLP sink; local fallback at .matter/telemetry/. Grafana dashboard template at packages/testing/grafana/.
Scorecard: the public scorecard at /testing/scorecard renders allowlist size, fix velocity, runtime trends, and flake rates from the telemetry stream.
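
The gate.run span attributes listed under Telemetry can be pictured as a flat key-value record. The sketch below builds one without any OpenTelemetry dependency. The GateRun type and gateRunAttributes helper are hypothetical, and the rule that sla.exceeded is true when the duration is greater than slaMs is an assumption:

```typescript
// Hypothetical summary of one gate execution.
interface GateRun {
  id: string;
  version: string;
  mode: string;
  slaMs: number;
  findingsCount: number;
  durationMs: number;
}

type AttributeValue = string | number | boolean;

// Maps a run onto the attribute names listed for the gate.run span.
function gateRunAttributes(run: GateRun): Record<string, AttributeValue> {
  return {
    "gate.id": run.id,
    "gate.version": run.version,
    "gate.mode": run.mode,
    "findings.count": run.findingsCount,
    "duration.ms": run.durationMs,
    // Assumption: the SLA counts as exceeded when the run took longer than slaMs.
    "sla.exceeded": run.durationMs > run.slaMs,
  };
}

const slowRun = gateRunAttributes({
  id: "my-gate",
  version: "1.0.0",
  mode: "static",
  slaMs: 5_000,
  findingsCount: 1,
  durationMs: 6_200,
});
console.log(slowRun);
```

With the OpenTelemetry JavaScript API, a record like this could be attached to a span through setAttributes; how the framework actually wires its OTLP sink is not shown here.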

Agents

Structured catalog: apps/design/public/testing-framework.json — every gate's id, version, mode, layer, severity, slaMs, description.
Flat text: apps/design/public/llms-testing.txt — the same content as a single text bundle for context windows.
JSON Schema: apps/design/public/testing-framework.schema.json describes the catalog shape.
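
An agent consuming the structured catalog might validate entries before trusting them. The sketch below uses the seven fields named above. The value types, the assumption that the file is a flat array of entries, and the isCatalogEntry and blockingGates helpers are all hypothetical; testing-framework.schema.json is the authority on the actual shape:

```typescript
// Field names come from the catalog description; value types are assumed.
interface CatalogEntry {
  id: string;
  version: string;
  mode: string;
  layer: string;
  severity: string;
  slaMs: number;
  description: string;
}

const STRING_FIELDS = ["id", "version", "mode", "layer", "severity", "description"] as const;

// Runtime check that an unknown value has every expected field.
function isCatalogEntry(value: unknown): value is CatalogEntry {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    STRING_FIELDS.every((key) => typeof record[key] === "string") &&
    typeof record["slaMs"] === "number"
  );
}

// Parses catalog text and keeps only well-formed blocking gates.
// Assumption: the catalog is a flat JSON array of entries.
function blockingGates(catalogJson: string): CatalogEntry[] {
  const parsed: unknown = JSON.parse(catalogJson);
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(isCatalogEntry).filter((gate) => gate.severity === "blocking");
}

const sampleCatalog = JSON.stringify([
  { id: "my-gate", version: "1.0.0", mode: "static", layer: "Stripe", severity: "blocking", slaMs: 5000, description: "One sentence" },
  { id: "incomplete" },
]);
const blocking = blockingGates(sampleCatalog);
console.log(blocking.map((gate) => gate.id));
```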

The five AI failure modes the framework explicitly catches

Matter ships AI-emitted artifacts (CardSpecs, MCP tool outputs, AI-authored documents) into a high-trust legal context. The framework's eval layer targets the five failure modes that the deterministic-code testing pyramid was never built for. See /testing/ai-failure-modes for the full breakdown.
