
The Pyramid

Matter's framework uses a trophy with an AI cap. It takes the classic Kent C. Dodds Testing Trophy (static → unit → integration → e2e) and adds an explicit fifth layer above e2e for AI-emitted artifacts:

              ┌──────────────────────┐
              │  AI / Evals          │  capability · safety · red-team · prompt-regression
              │                      │  hallucination · llm-judge · determinism
              ├──────────────────────┤
              │  E2E                 │  e2e · smoke · synthetic
              ├──────────────────────┤
              │  Integration         │  integration · contract · visual
              ├──────────────────────┤
              │  Unit                │  unit · property · fuzz · mutation
              ├──────────────────────┤
              │  Static              │  static · drift
              └──────────────────────┘

Investment ratio

The target is 4 : 4 : 1 : 0.5 : 0.5 (static : unit : integration : e2e : evals), which works out to 40%, 40%, 10%, 5%, and 5% of the total.

The investment is heavy at the base, where tests are fast, deterministic, and cheap, and thin at the top, where they are slow, brittle, and expensive. Evals sit above e2e as a deliberate investment in quality rather than volume.
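
The arithmetic behind the ratio can be sketched as follows. This is only an illustration: the `ratio` object and the `toShares` helper are hypothetical names, not part of Matter's framework.

```typescript
// Target investment ratio from the text: static : unit : integration : e2e : evals.
// The object and helper below are illustrative names, not Matter APIs.
const ratio = { static: 4, unit: 4, integration: 1, e2e: 0.5, evals: 0.5 };

type RatioKey = keyof typeof ratio;

// Convert ratio weights into percentage shares of the total test investment.
function toShares(weights: Record<RatioKey, number>): Record<RatioKey, number> {
  const keys = Object.keys(weights) as RatioKey[];
  let total = 0;
  for (const key of keys) {
    total += weights[key];
  }
  const shares = {} as Record<RatioKey, number>;
  for (const key of keys) {
    // Multiply before dividing so the results stay exact for these weights.
    shares[key] = (weights[key] * 100) / total;
  }
  return shares;
}

const shares = toShares(ratio);
console.log(shares);
```

Read this way, the ratio is a budget: for every ten units of testing effort, eight go to the static and unit layers combined.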

What lives where

| Layer | Modes | When to use |
| --- | --- | --- |
| Static | static, drift | File-level analysis. Route coverage. Codegen-source ↔ generated output. Schema-runtime parity. No execution required. |
| Unit | unit, property, fuzz, mutation | Pure functions, parsers, validators, encoders. property and fuzz catch what example-based unit tests miss. mutation proves the tests catch bugs at all. |
| Integration | integration, contract, visual | Multiple units composed: API + DB, MCP + spec, dispatcher + router. visual catches pixel regressions Chromatic-style. |
| E2E | e2e, smoke, synthetic | A real user flow in a real browser; a single-page health check; a production canary. |
| AI / Evals | eval, safety, red-team, prompt-regression, hallucination, llm-judge, determinism | AI-emitted artifacts. Capability + safety + alignment baselines, adversarial corpora, captured-prompt replay, ground-truth checks, LLM-as-judge, stability assertions. |
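
Because each mode belongs to exactly one pyramid layer, the table can be read as a lookup. The sketch below encodes it that way; `modesByLayer` and `pyramidLayerOf` are hypothetical names, and the mode lists are copied from the table (which lists `eval` where the diagram shows `capability`).

```typescript
// Pyramid layers, bottom to top, as named in the table.
type PyramidLayer = "Static" | "Unit" | "Integration" | "E2E" | "AI / Evals";

// Mode lists copied from the table. The names here are illustrative,
// not an actual Matter data structure.
const modesByLayer: Record<PyramidLayer, readonly string[]> = {
  Static: ["static", "drift"],
  Unit: ["unit", "property", "fuzz", "mutation"],
  Integration: ["integration", "contract", "visual"],
  E2E: ["e2e", "smoke", "synthetic"],
  "AI / Evals": [
    "eval",
    "safety",
    "red-team",
    "prompt-regression",
    "hallucination",
    "llm-judge",
    "determinism",
  ],
};

// Return the pyramid layer a mode lives on, or undefined for an unknown mode.
function pyramidLayerOf(mode: string): PyramidLayer | undefined {
  for (const layer of Object.keys(modesByLayer) as PyramidLayer[]) {
    if (modesByLayer[layer].indexOf(mode) !== -1) {
      return layer;
    }
  }
  return undefined;
}

console.log(pyramidLayerOf("property")); // "Unit"
console.log(pyramidLayerOf("llm-judge")); // "AI / Evals"
```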

How a mode and a layer relate

Mode and layer are two orthogonal axes, and every gate declares both:

  • mode says which layer of the pyramid the gate lives on: what kind of test it is.
  • layer says which industry philosophy the gate serves: what question it answers (Stripe = sync, Vercel = budget, Anthropic = behavior, Security = vulnerability). Despite its name, the layer field is not the pyramid layer; mode determines that.

A gate with mode: "property" sits at the Unit layer of the pyramid and can still declare layer: "Stripe" and category: "contract", for example a gate that property-fuzzes the route matcher.
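
One way to picture the two axes is as independent fields on a gate declaration. The sketch below is an assumption about shape only: the field names `mode`, `layer`, and `category` come from the text, while `GateDeclaration`, `selectGates`, the gate names, and the `"behavior"` category value are hypothetical.

```typescript
// The four philosophies named in the text.
type Layer = "Stripe" | "Vercel" | "Anthropic" | "Security";

// Hypothetical declaration shape; only the field names come from the text.
interface GateDeclaration {
  name: string;
  mode: string; // position on the pyramid (what kind of test it is)
  layer: Layer; // philosophy served (what question it answers)
  category: string; // only "contract" is named in the text
}

type GateFilter = Partial<Pick<GateDeclaration, "mode" | "layer" | "category">>;

// Keep the gates that match every field present in the filter.
function selectGates(all: GateDeclaration[], filter: GateFilter): GateDeclaration[] {
  return all.filter(
    (g) =>
      (filter.mode === undefined || g.mode === filter.mode) &&
      (filter.layer === undefined || g.layer === filter.layer) &&
      (filter.category === undefined || g.category === filter.category),
  );
}

// Illustrative gates; the names and the "behavior" category are assumptions.
const gates: GateDeclaration[] = [
  { name: "route-matcher-property", mode: "property", layer: "Stripe", category: "contract" },
  { name: "route-manifest-static", mode: "static", layer: "Stripe", category: "contract" },
  { name: "prompt-replay", mode: "prompt-regression", layer: "Anthropic", category: "behavior" },
];

// Filtering on one axis leaves the other free to vary.
const stripeGates = selectGates(gates, { layer: "Stripe" });
console.log(stripeGates.map((g) => g.mode)); // ["property", "static"]
```

Selecting on `layer` returns gates from two different pyramid layers, which is what orthogonality means here.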

A worked example

Take Matter's first real gate, card-action-targets (when it lands in Part 2). It's:

  • mode: "static" — file-level analysis, no execution
  • layer: "Stripe" — contract sync between exemplar CardSpecs and the apps/app route manifest
  • category: "contract" — semantically the same axis as layer; the category enum is what the runner filters on

A sibling gate at the Unit layer could be card-action-targets-property, with mode: "property" and the same layer and category, which fuzzes the matcher.
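
To make the difference between the two sibling gates concrete, here is a minimal sketch under stated assumptions. The real gate has not landed yet, so everything here is hypothetical: the `CardSpec` shape, the `:param` route syntax, and every function name. The static check compares declared targets against a route list without running the app; the property check generates random paths from each route and asserts that the matcher accepts them.

```typescript
// Hypothetical shape: a card spec that lists the paths its actions point at.
interface CardSpec {
  id: string;
  actionTargets: string[];
}

// Assumed route syntax: "/cards/:id", where ":name" matches any non-empty segment.
function matchesRoute(pattern: string, path: string): boolean {
  const p = pattern.split("/");
  const q = path.split("/");
  if (p.length !== q.length) return false;
  for (let i = 0; i < p.length; i++) {
    const isParam = p[i].charAt(0) === ":";
    if (isParam ? q[i].length === 0 : p[i] !== q[i]) return false;
  }
  return true;
}

// Static-style check: every declared target must match some route.
function findDanglingTargets(specs: CardSpec[], routes: string[]): string[] {
  const dangling: string[] = [];
  for (const spec of specs) {
    for (const target of spec.actionTargets) {
      if (!routes.some((r) => matchesRoute(r, target))) {
        dangling.push(`${spec.id} -> ${target}`);
      }
    }
  }
  return dangling;
}

// Small seeded generator so the property check is reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

function randomToken(rng: () => number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  const length = 1 + Math.floor(rng() * 8);
  let token = "";
  for (let i = 0; i < length; i++) {
    token += alphabet.charAt(Math.floor(rng() * alphabet.length));
  }
  return token;
}

// Property-style check: a path built from a route must match that route,
// and the same path with an extra segment must not.
function checkMatcherProperty(routes: string[], runs: number, seed: number): string[] {
  const rng = makeRng(seed);
  const failures: string[] = [];
  for (const pattern of routes) {
    for (let i = 0; i < runs; i++) {
      const path = pattern
        .split("/")
        .map((seg) => (seg.charAt(0) === ":" ? randomToken(rng) : seg))
        .join("/");
      if (!matchesRoute(pattern, path)) failures.push(`${pattern} rejected ${path}`);
      if (matchesRoute(pattern, path + "/extra")) failures.push(`${pattern} accepted ${path}/extra`);
    }
  }
  return failures;
}

const routes = ["/cards/:id", "/settings"];
const specs: CardSpec[] = [{ id: "welcome", actionTargets: ["/cards/42", "/billing"] }];
console.log(findDanglingTargets(specs, routes)); // ["welcome -> /billing"]
console.log(checkMatcherProperty(routes, 100, 1)); // []
```

Both checks serve the same contract question (do card targets and routes stay in sync?), which is why they would share a layer and category while differing in mode.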
