# Scorecard
The scorecard is rendered at build time from OpenTelemetry artifacts that CI uploads to Vercel Blob. Once the data pipeline lands (Slice 0.5), the trends below will render live; until then, this page documents the shape of what's coming.
## Metrics tracked per gate
| Metric | Source | Interpretation |
|---|---|---|
| Status | latest CI run | pass / warn / fail / shadow / quarantined |
| Allowlist size | gate's `*.allowlist.json` | Number of suppressed findings. Should shrink over time. |
| Quarantine size | gate's `*.quarantine.json` | Number of flaky entries. Should hover near zero. |
| Findings (28d) | OTel `findings.count`, aggregated | Total findings produced in the last 28 days. |
| Fix velocity (28d) | diff of findings vs. fixes per PR | How fast issues are being resolved. |
| Runtime p50 / p95 | OTel `duration.ms` | Gate latency over the same window. An upward trend means the gate is slowly getting slower. |
| SLA breaches (28d) | OTel `sla.exceeded` | How often the gate exceeded its declared `slaMs`. |
| Flake rate | run-to-run finding diff variance | Inconsistent findings flagged as candidates for quarantine. |
| Pass rate | runs where exit code = 0 | Should be near 100% for healthy gates. |
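The exact schema of `testing-scorecard.json` isn't published yet, so as a rough sketch, one gate's entry might look like the following; every field name here is an assumption that simply mirrors the table above, not the real schema:

```ts
// Hypothetical shape of one gate's entry in testing-scorecard.json.
// Field names mirror the metrics table; the actual schema may differ.
type GateScorecard = {
  gate: string;                       // gate identifier
  status: "pass" | "warn" | "fail" | "shadow" | "quarantined";
  allowlistSize: number;              // suppressed findings; should shrink over time
  quarantineSize: number;             // flaky entries; should hover near zero
  findings28d: number;                // total findings over the 28-day window
  fixVelocity28d: number;             // fixes relative to new findings, per PR
  runtimeMs: { p50: number; p95: number }; // gate latency over the window
  slaBreaches28d: number;             // runs exceeding the declared slaMs
  flakeRate: number;                  // 0..1, run-to-run finding diff variance
  passRate: number;                   // 0..1, share of runs exiting 0
};
```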
## AI-specific metrics
For gates in the AI / Evals layer (`category: "eval" | "safety" | "capability" | "cost" | "latency"`):
| Metric | Source | Interpretation |
|---|---|---|
| Eval pass rate | per-case grader outputs | Capability + safety baseline health. |
| Cost per run | `cost-meter` totals | Token + USD spend. |
| Per-model spend | `cost-meter.events` grouped by model | Which model is dominating cost? |
| p50 / p95 latency | `latency-budget` distributions | Per-call latency, distinct from gate runtime. |
| Red-team pass rate | red-team findings / corpus size | Adversarial robustness. |
| Determinism rate | determinism findings / runs | How often does the subject diverge across repeats? |
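To illustrate the per-model spend rollup, here is a minimal sketch; the `CostEvent` shape is an assumption for illustration, not the cost-meter's actual event schema:

```ts
// Hypothetical cost-meter event; the real schema is an assumption.
type CostEvent = { model: string; usd: number; tokens: number };

// Group cost-meter events by model to see which model dominates spend.
function spendByModel(events: CostEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    totals[e.model] = (totals[e.model] ?? 0) + e.usd;
  }
  return totals;
}
```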
## What the scorecard is NOT
- It is not a coverage report. There are no coverage thresholds (per Matter's testing posture).
- It is not a per-test report. The unit is the gate, not the test.
- It is not a vendor service. The data pipeline is in-tree (`apps/design/scripts/generate-scorecard.ts`), reading from a Vercel Blob bucket Matter owns.
## Data pipeline
```
[CI run]
 └─ bun run gates --json > findings.json
 └─ flushTelemetry() → .matter/telemetry/<run-id>.json
 └─ actions/upload-artifact → Vercel Blob bucket

[apps/design build]
 └─ generate-scorecard.ts reads recent runs
 └─ emits apps/design/public/testing-scorecard.json
 └─ this page renders the JSON
```

The Grafana template at `packages/testing/grafana/framework-overview.json` is a parallel, live-view consumer of the same OTel stream, for teams that want a real-time dashboard. The scorecard is the canonical public surface.
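For a sense of the `generate-scorecard.ts` step in the diagram above, here is a minimal sketch. It assumes telemetry runs are stored as JSON blobs under a `telemetry/` prefix and that each run records per-gate `duration.ms` metrics; the prefix, payload shape, and aggregation are assumptions, not the actual implementation.

```ts
// Sketch of the generate-scorecard.ts step. The "telemetry/" prefix and
// the run payload shape are assumptions; only the @vercel/blob list() call
// and the output path come from the pipeline described above.
import { list } from "@vercel/blob";
import { writeFile } from "node:fs/promises";

async function generateScorecard() {
  // List recent telemetry uploads (requires a blob read token in the env).
  const { blobs } = await list({ prefix: "telemetry/", limit: 200 });

  // Keep only runs inside the 28-day window the scorecard reports on.
  const cutoff = Date.now() - 28 * 24 * 60 * 60 * 1000;
  const recent = blobs.filter((b) => b.uploadedAt.getTime() >= cutoff);

  // Fetch each run's payload and collect per-gate durations.
  const durationsByGate = new Map<string, number[]>();
  for (const blob of recent) {
    const run = await (await fetch(blob.url)).json();
    for (const m of run.metrics ?? []) {
      if (m.name === "duration.ms") {
        const arr = durationsByGate.get(m.gate) ?? [];
        arr.push(m.value);
        durationsByGate.set(m.gate, arr);
      }
    }
  }

  // Nearest-rank percentile over a sorted sample.
  const pct = (xs: number[], p: number) =>
    [...xs].sort((a, b) => a - b)[Math.floor((xs.length - 1) * p)];

  const gates = [...durationsByGate].map(([gate, ms]) => ({
    gate,
    runtimeMs: { p50: pct(ms, 0.5), p95: pct(ms, 0.95) },
  }));

  await writeFile(
    "apps/design/public/testing-scorecard.json",
    JSON.stringify({ generatedAt: new Date().toISOString(), gates }, null, 2),
  );
}

void generateScorecard();
```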