
Scorecard

The scorecard is rendered at build time from OpenTelemetry artifacts that CI uploads to Vercel Blob. Once the data pipeline lands (Slice 0.5), the trends below will render live; until then, this page documents the shape of what's coming.
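To make that shape concrete ahead of Slice 0.5, here is a hypothetical TypeScript sketch of one per-gate scorecard entry. The field names mirror the metrics tabulated below but are assumptions, not the final schema.

```ts
// Hypothetical shape of one scorecard entry; field names are
// illustrative, not the final Slice 0.5 schema.
interface GateScorecardEntry {
  gateId: string;
  status: "pass" | "warn" | "fail" | "shadow" | "quarantined";
  allowlistSize: number;   // suppressed findings; should shrink
  quarantineSize: number;  // flaky entries; should hover near zero
  findings28d: number;
  fixVelocity28d: number;
  runtimeMs: { p50: number; p95: number };
  slaBreaches28d: number;
  flakeRate: number; // 0..1
  passRate: number;  // 0..1
}
```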

Metrics tracked per gate

| Metric | Source | Interpretation |
| --- | --- | --- |
| Status | latest CI run | `pass` / `warn` / `fail` / `shadow` / `quarantined` |
| Allowlist size | gate's `*.allowlist.json` | Number of suppressed findings. Should shrink over time. |
| Quarantine size | gate's `*.quarantine.json` | Number of flaky entries. Should hover near zero. |
| Findings (28d) | OTel `findings.count`, aggregated | Total findings produced in the last 28 days. |
| Fix velocity (28d) | diff of findings vs. fixes per PR | How fast issues are being resolved. |
| Runtime p50 / p95 | OTel `duration.ms` | Gate latency over the same window. Trend up = slow drift. |
| SLA breaches (28d) | OTel `sla.exceeded` | How often the gate exceeded its declared `slaMs`. |
| Flake rate | run-to-run finding diff variance | Inconsistent findings flagged as candidates for quarantine. |
| Pass rate | runs where exit code = 0 | Should be near 100% for healthy gates. |
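Two of the rates above lend themselves to a short sketch. Assuming a hypothetical `GateRun` record derived from the OTel artifacts (the real records come from the pipeline described below), pass rate and flake rate might be computed like this:

```ts
// `GateRun` is a hypothetical shape, not the actual run record.
interface GateRun {
  exitCode: number;
  findingIds: string[]; // stable IDs of findings in this run
}

function passRate(runs: GateRun[]): number {
  if (runs.length === 0) return 1;
  return runs.filter((r) => r.exitCode === 0).length / runs.length;
}

// Flake rate: fraction of consecutive run pairs whose finding sets
// differ, a proxy for "run-to-run finding diff variance".
function flakeRate(runs: GateRun[]): number {
  if (runs.length < 2) return 0;
  let diffs = 0;
  for (let i = 1; i < runs.length; i++) {
    const prev = new Set(runs[i - 1].findingIds);
    const curr = new Set(runs[i].findingIds);
    const same =
      prev.size === curr.size && [...curr].every((id) => prev.has(id));
    if (!same) diffs++;
  }
  return diffs / (runs.length - 1);
}
```

Because `flakeRate` only compares adjacent pairs, the runs should be sorted chronologically before it is applied.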

AI-specific metrics

For gates in the AI / Evals layer (`category: "eval" | "safety" | "capability" | "cost" | "latency"`):

| Metric | Source | Interpretation |
| --- | --- | --- |
| Eval pass rate | per-case grader outputs | Capability + safety baseline health. |
| Cost per run | cost-meter totals | Token + USD spend. |
| Per-model spend | `cost-meter.events` grouped by model | Which model is dominating cost? |
| p50 / p95 latency | latency-budget distributions | Per-call latency, distinct from gate runtime. |
| Red-team pass rate | red-team findings / corpus size | Adversarial robustness. |
| Determinism rate | determinism findings / runs | How often does the subject diverge across repeats? |
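As an illustration of the cost rollup, a per-model spend aggregation over `cost-meter.events` records could look like the sketch below. The `CostEvent` field names are assumptions, not the actual event schema.

```ts
// Hypothetical cost-meter event; `model` and `usd` are assumed fields.
interface CostEvent {
  model: string;
  usd: number;
  tokens: number;
}

// Group USD spend by model to answer "which model dominates cost?"
function spendByModel(events: CostEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.model, (totals.get(e.model) ?? 0) + e.usd);
  }
  return totals;
}
```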

What the scorecard is NOT

- It is not a coverage report. There are no coverage thresholds (per Matter's testing posture).
- It is not a per-test report. The unit is the gate, not the test.
- It is not a vendor service. The data pipeline is in-tree (`apps/design/scripts/generate-scorecard.ts`) reading from a Vercel Blob bucket Matter owns.

Data pipeline

```
[CI run]
  └─ bun run gates --json > findings.json
  └─ flushTelemetry() → .matter/telemetry/<run-id>.json
  └─ actions/upload-artifact → Vercel Blob bucket

[apps/design build]
  └─ generate-scorecard.ts reads recent runs
  └─ emits apps/design/public/testing-scorecard.json
  └─ this page renders the JSON
```
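For a concrete sense of the second half of that pipeline, here is a minimal sketch of what `generate-scorecard.ts` could do, assuming telemetry blobs live under a hypothetical `runs/` prefix and that `@vercel/blob`'s `list()` is used to enumerate them (it reads `BLOB_READ_WRITE_TOKEN` from the environment). The prefix, window, and aggregation are illustrative, not the actual implementation.

```ts
// Sketch only: prefix, window, and output shape are assumptions.
import { list } from "@vercel/blob";
import { writeFile } from "node:fs/promises";

const WINDOW_MS = 28 * 24 * 60 * 60 * 1000; // the 28-day window

async function main() {
  const since = Date.now() - WINDOW_MS;

  // Enumerate uploaded telemetry blobs and keep the recent window.
  const { blobs } = await list({ prefix: "runs/" });
  const recent = blobs.filter((b) => b.uploadedAt.getTime() >= since);

  // Download each run's telemetry JSON and pool it for aggregation.
  const runs = await Promise.all(
    recent.map(async (b) => (await fetch(b.url)).json()),
  );

  // The real step would compute the per-gate metrics tabulated above;
  // this placeholder only records the run count in the window.
  const scorecard = {
    generatedAt: new Date().toISOString(),
    runCount: runs.length,
  };

  await writeFile(
    "apps/design/public/testing-scorecard.json",
    JSON.stringify(scorecard, null, 2),
  );
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```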

The Grafana template at packages/testing/grafana/framework-overview.json is a parallel consumer of the same OTel stream, offering a live view for teams that want a real-time dashboard. The scorecard remains the canonical public surface.
