The Pyramid
Matter's framework uses a trophy with an AI cap: the classic Kent C. Dodds Testing Trophy (static → unit → integration → e2e) with an explicit fifth layer above e2e for AI-emitted artifacts:
┌──────────────────────┐
│ AI / Evals │ capability · safety · red-team · prompt-regression
│ │ hallucination · llm-judge · determinism
├──────────────────────┤
│ E2E │ e2e · smoke · synthetic
├──────────────────────┤
│ Integration │ integration · contract · visual
├──────────────────────┤
│ Unit │ unit · property · fuzz · mutation
├──────────────────────┤
│ Static │ static · drift
└──────────────────────┘
Investment ratio
The target is 4 : 4 : 1 : 0.5 : 0.5 — static : unit : integration : e2e : evals.
Heavy at the base (fast, deterministic, cheap), thin at the top (slow, brittle, expensive). Evals sit above e2e as a deliberate quality investment, not a volume one.
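The ratio above can double as a budget check. A minimal sketch, assuming a hypothetical `overweightLayers` helper (the function and its shape are illustrative, not Matter's actual API):

```typescript
// Hypothetical sketch: flag layers whose share of the suite exceeds
// their share of the 4 : 4 : 1 : 0.5 : 0.5 target by more than `tolerance`.
type Layer = "static" | "unit" | "integration" | "e2e" | "evals";

const TARGET: Record<Layer, number> = {
  static: 4,
  unit: 4,
  integration: 1,
  e2e: 0.5,
  evals: 0.5,
};

// Returns layers whose actual share of gates exceeds the target share,
// scaled by `tolerance` (1.5 = 50% over budget before flagging).
function overweightLayers(
  counts: Record<Layer, number>,
  tolerance = 1.5,
): Layer[] {
  const totalTarget = Object.values(TARGET).reduce((a, b) => a + b, 0);
  const totalActual = Object.values(counts).reduce((a, b) => a + b, 0);
  return (Object.keys(TARGET) as Layer[]).filter((layer) => {
    const targetShare = TARGET[layer] / totalTarget;
    const actualShare = counts[layer] / totalActual;
    return actualShare > targetShare * tolerance;
  });
}
```

A suite with ten e2e gates against ten unit gates would trip the `e2e` budget, since e2e's target share is only 5%.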
What lives where
| Layer | Modes | When to use |
|---|---|---|
| Static | static, drift | File-level analysis. Route coverage. Codegen-source ↔ generated output. Schema-runtime parity. No execution required. |
| Unit | unit, property, fuzz, mutation | Pure functions, parsers, validators, encoders. property and fuzz catch what example-based unit tests miss. mutation proves the tests catch bugs at all. |
| Integration | integration, contract, visual | Multiple units composed: API + DB, MCP + spec, dispatcher + router. visual catches pixel regressions Chromatic-style. |
| E2E | e2e, smoke, synthetic | A real user flow in a real browser; a single-page health check; a production canary. |
| AI / Evals | eval, safety, red-team, prompt-regression, hallucination, llm-judge, determinism | AI-emitted artifacts. Capability + safety + alignment baselines, adversarial corpora, captured-prompt replay, ground-truth checks, LLM-as-judge, stability assertions. |
How a mode and a layer relate
Two orthogonal axes. Every gate declares both:
- `mode` says which layer of the pyramid the gate lives on: what kind of test it is.
- `layer` says which industry philosophy the gate serves: what question it answers (Stripe = sync, Vercel = budget, Anthropic = behavior, Security = vulnerability).
A `mode: "property"` gate at the Unit layer can still declare `layer: "Stripe"`, `category: "contract"`: for example, property-fuzzing the route matcher.
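The two-axis declaration can be sketched as a type. A minimal illustration, assuming hypothetical type and field names (not Matter's actual gate API):

```typescript
// Hypothetical gate declaration: every gate names both axes explicitly.
type Mode =
  | "static" | "drift"
  | "unit" | "property" | "fuzz" | "mutation"
  | "integration" | "contract" | "visual"
  | "e2e" | "smoke" | "synthetic"
  | "eval" | "safety" | "red-team" | "prompt-regression"
  | "hallucination" | "llm-judge" | "determinism";

type Layer = "Stripe" | "Vercel" | "Anthropic" | "Security";

interface Gate {
  name: string;
  mode: Mode;       // pyramid position: what kind of test it is
  layer: Layer;     // philosophy served: what question it answers
  category: string; // same axis as layer semantically; the runner's filter knob
  run: () => Promise<boolean>;
}

// The example from the text: a property-mode gate still serving the Stripe layer.
const routeMatcherProperty: Gate = {
  name: "route-matcher-property",
  mode: "property",
  layer: "Stripe",
  category: "contract",
  run: async () => true, // placeholder body
};
```

Because the axes are independent fields, the runner can filter on either one without the other constraining it.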
A worked example
Take Matter's first real gate, `card-action-targets` (when it lands in Part 2). It's:
- `mode: "static"`: file-level analysis, no execution
- `layer: "Stripe"`: contract sync between exemplar CardSpecs and the apps/app route manifest
- `category: "contract"`: same axis as `layer` semantically; the `category` enum is the runner's filter knob
A sibling gate at the Unit layer, `card-action-targets-property` (`mode: "property"`, same `layer` and `category`), could fuzz the matcher.
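The pair of sibling gates above might be declared like this. A sketch only; the field names are assumptions carried over from the worked example, not Matter's real definitions:

```typescript
// Hypothetical declarations of the two sibling gates from the worked example.
const cardActionTargets = {
  name: "card-action-targets",
  mode: "static" as const,   // file-level analysis, no execution
  layer: "Stripe" as const,  // contract sync: CardSpecs vs. route manifest
  category: "contract" as const,
};

// Same layer and category, one level down the pyramid: fuzz the matcher.
const cardActionTargetsProperty = {
  ...cardActionTargets,
  name: "card-action-targets-property",
  mode: "property" as const,
};
```

Spreading the static gate and overriding only `name` and `mode` keeps the shared `layer`/`category` pair in one place.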
Testing Framework
Matter's testing framework — hybrid Stripe + Vercel + Anthropic taxonomy in one package. Every gate auto-discovered, every severity bounded by allowlists and quarantines, every run telemetered into a public scorecard.
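The severity bounding described above could look roughly like this. A sketch under stated assumptions: the `quarantined` set, `failAllowlist`, and `ciVerdict` names are all hypothetical, standing in for whatever the real runner does:

```typescript
// Hypothetical runner core: a quarantine list keeps known-flaky gates from
// failing the build, and an allowlist bounds which severities may fail CI.
type Severity = "blocker" | "warn" | "info";

interface GateResult {
  name: string;
  passed: boolean;
  severity: Severity;
}

// Known-flaky gates: failures are recorded but never fatal.
const quarantined = new Set<string>(["flaky-visual-diff"]);

// Severities allowed to fail without breaking the build.
const failAllowlist = new Set<Severity>(["warn", "info"]);

function ciVerdict(results: GateResult[]): boolean {
  return results.every((r) => {
    if (r.passed) return true;
    if (quarantined.has(r.name)) return true; // quarantined: non-fatal
    return failAllowlist.has(r.severity);     // otherwise, only allowlisted severities may fail
  });
}
```

The point of the allowlist is that a failing `blocker` gate can never be waved through silently; it must be either fixed or explicitly quarantined.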
Five AI failure modes
The five categories of failure unique to AI-emitting systems, and which gates catch each.