# Scorecard
The scorecard is rendered at build time from OpenTelemetry artifacts that CI uploads to Vercel Blob. Once the data pipeline lands (Slice 0.5), the trends below will render live; until then, this page documents the shape of what's coming.
## Metrics tracked per gate
| Metric | Source | Interpretation |
|---|---|---|
| Status | latest CI run | pass / warn / fail / shadow / quarantined |
| Allowlist size | gate's `*.allowlist.json` | Number of suppressed findings. Should shrink over time. |
| Quarantine size | gate's `*.quarantine.json` | Number of flaky entries. Should hover near zero. |
| Findings (28d) | OTel `findings.count`, aggregated | Total findings produced in the last 28 days. |
| Fix velocity (28d) | diff of findings vs. fixes per PR | How fast issues are being resolved. |
| Runtime p50 / p95 | OTel `duration.ms` | Gate latency over the same window. An upward trend means the gate is slowly getting slower. |
| SLA breaches (28d) | OTel `sla.exceeded` | How often the gate exceeded its declared `slaMs`. |
| Flake rate | run-to-run finding diff variance | Inconsistent findings flagged as candidates for quarantine. |
| Pass rate | runs where exit code = 0 | Should be near 100% for healthy gates. |
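The exact schema of `testing-scorecard.json` isn't published yet, so as a rough sketch, one gate's entry might look like the following; every field name here is an assumption that simply mirrors the table above, not the real schema:

```ts
// Hypothetical shape of one gate's entry in testing-scorecard.json.
// Field names mirror the metrics table; the actual schema may differ.
type GateScorecard = {
  gate: string;                       // gate identifier
  status: "pass" | "warn" | "fail" | "shadow" | "quarantined";
  allowlistSize: number;              // suppressed findings; should shrink over time
  quarantineSize: number;             // flaky entries; should hover near zero
  findings28d: number;                // total findings over the 28-day window
  fixVelocity28d: number;             // fixes relative to new findings, per PR
  runtimeMs: { p50: number; p95: number }; // gate latency over the window
  slaBreaches28d: number;             // runs exceeding the declared slaMs
  flakeRate: number;                  // 0..1, run-to-run finding diff variance
  passRate: number;                   // 0..1, share of runs exiting 0
};
```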
## AI-specific metrics
For gates in the AI / Evals layer (`category: "eval" | "safety" | "capability" | "cost" | "latency"`):
| Metric | Source | Interpretation |
|---|---|---|
| Eval pass rate | per-case grader outputs | Capability + safety baseline health. |
| Cost per run | `cost-meter` totals | Token + USD spend. |
| Per-model spend | `cost-meter.events` grouped by model | Which model is dominating cost? |
| p50 / p95 latency | `latency-budget` distributions | Per-call latency, distinct from gate runtime. |
| Red-team pass rate | red-team findings / corpus size | Adversarial robustness. |
| Determinism rate | determinism findings / runs | How often does the subject diverge across repeats? |
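To illustrate the per-model spend rollup, here is a minimal sketch; the `CostEvent` shape is an assumption for illustration, not the cost-meter's actual event schema:

```ts
// Hypothetical cost-meter event; the real schema is an assumption.
type CostEvent = { model: string; usd: number; tokens: number };

// Group cost-meter events by model to see which model dominates spend.
function spendByModel(events: CostEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    totals[e.model] = (totals[e.model] ?? 0) + e.usd;
  }
  return totals;
}
```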
## What the scorecard is NOT
- It is not a coverage report. There are no coverage thresholds (per Matter's testing posture).
- It is not a per-test report. The unit is the gate, not the test.
- It is not a vendor service. The data pipeline is in-tree (`apps/design/scripts/generate-scorecard.ts`), reading from a Vercel Blob bucket Matter owns.
## Data pipeline
```
[CI run]
 └─ bun run gates --json > findings.json
 └─ flushTelemetry() → .matter/telemetry/<run-id>.json
 └─ actions/upload-artifact → Vercel Blob bucket

[apps/design build]
 └─ generate-scorecard.ts reads recent runs
 └─ emits apps/design/public/testing-scorecard.json
 └─ this page renders the JSON
```

The Grafana template at `packages/testing/grafana/framework-overview.json` is a parallel, live-view consumer of the same OTel stream, for teams that want a real-time dashboard. The scorecard is the canonical public surface.
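For a sense of the `generate-scorecard.ts` step in the diagram above, here is a minimal sketch. It assumes telemetry runs are stored as JSON blobs under a `telemetry/` prefix and that each run records per-gate `duration.ms` metrics; the prefix, payload shape, and aggregation are assumptions, not the actual implementation.

```ts
// Sketch of the generate-scorecard.ts step. The "telemetry/" prefix and
// the run payload shape are assumptions; only the @vercel/blob list() call
// and the output path come from the pipeline described above.
import { list } from "@vercel/blob";
import { writeFile } from "node:fs/promises";

async function generateScorecard() {
  // List recent telemetry uploads (requires a blob read token in the env).
  const { blobs } = await list({ prefix: "telemetry/", limit: 200 });

  // Keep only runs inside the 28-day window the scorecard reports on.
  const cutoff = Date.now() - 28 * 24 * 60 * 60 * 1000;
  const recent = blobs.filter((b) => b.uploadedAt.getTime() >= cutoff);

  // Fetch each run's payload and collect per-gate durations.
  const durationsByGate = new Map<string, number[]>();
  for (const blob of recent) {
    const run = await (await fetch(blob.url)).json();
    for (const m of run.metrics ?? []) {
      if (m.name === "duration.ms") {
        const arr = durationsByGate.get(m.gate) ?? [];
        arr.push(m.value);
        durationsByGate.set(m.gate, arr);
      }
    }
  }

  // Nearest-rank percentile over a sorted sample.
  const pct = (xs: number[], p: number) =>
    [...xs].sort((a, b) => a - b)[Math.floor((xs.length - 1) * p)];

  const gates = [...durationsByGate].map(([gate, ms]) => ({
    gate,
    runtimeMs: { p50: pct(ms, 0.5), p95: pct(ms, 0.95) },
  }));

  await writeFile(
    "apps/design/public/testing-scorecard.json",
    JSON.stringify({ generatedAt: new Date().toISOString(), gates }, null, 2),
  );
}

void generateScorecard();
```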