Activation Plan
19 of the 31 framework gates ship with `severity: "shadow"` today: they are interface-complete but waiting on a prerequisite (Playwright wiring, `ANTHROPIC_API_KEY`, a captured baseline, a corpus, a running service, etc.). This page is the contract for what unlocks each one.
The tables are ordered with the cheapest activations at the top.
| Gate | What to do |
|---|---|
| `voice-judge` | Set `ANTHROPIC_API_KEY` in CI secrets. Promote severity to `blocking` after observing 30 green runs. |
| `assistant-determinism` | Same: set `ANTHROPIC_API_KEY` in CI. |
| `webhook-delivery-audit` | Set `SVIX_TOKEN`. Implementation TODO: pull recent attempts via Svix's API. |
| `production-smoke` | Already runs in any environment with internet access. Promote to `blocking` once the gate has run cleanly against `api.mattermode.com` from the framework workflow for 14 days. |
| `visual-regression` | Set `CHROMATIC_PROJECT_TOKEN`. Wire `npx chromatic --exit-zero-on-changes` into the gate's `run()` and read the status. |
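As a rough illustration of the secret-only rows above, the sketch below reports which gates still lack their environment variable. The gate-to-variable mapping comes from the table; the helper name `gatesMissingSecrets` and the `Env` type are hypothetical and not part of the framework.

```typescript
// Hypothetical helper, not part of the framework: lists the gates whose
// required secret is absent or blank in the given environment.
type Env = Record<string, string | undefined>;

// Gate name -> required environment variable, as listed in the table above.
const REQUIRED_SECRETS: Record<string, string> = {
  "voice-judge": "ANTHROPIC_API_KEY",
  "assistant-determinism": "ANTHROPIC_API_KEY",
  "webhook-delivery-audit": "SVIX_TOKEN",
  "visual-regression": "CHROMATIC_PROJECT_TOKEN",
};

function gatesMissingSecrets(env: Env): string[] {
  return Object.entries(REQUIRED_SECRETS)
    // Treat undefined, empty, and whitespace-only values as "not set".
    .filter(([, variable]) => (env[variable] ?? "").trim() === "")
    .map(([gate]) => gate)
    .sort();
}
```

In CI this could be called with `process.env` to print which gates would stay in shadow mode for lack of a secret. `production-smoke` is left out because it needs no secret.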
| Gate | What to do |
|---|---|
| `prompt-regression` | Write a `capture-baseline` script runnable as `bun run --filter @repo/ai capture-baseline`. Run it once to seed `packages/ai/__regression__/baseline.json` and commit. |
| `assistant-red-team` | Ship a versioned corpus at `apps/app/__red-team__/corpus.json` matching the `RedTeamCorpus` schema, with ≥30 entries across `prompt_injection`, `jailbreak`, `data_extraction`, `role_hijack`, `tool_misuse`, `ssrf`, `instruction_leak`, `pii_extraction`, `authority_escalation`. |
| `document-hallucination-check` | Collect AI-authored documents into `apps/app/test-data/ai-docs/`. Implement a claim extractor plus a per-kind oracle (jurisdictions, OpenAPI, MCP). |
| `api-contract` | Write per-consumer contracts to `apps/api/__contracts__/{node-sdk,python-sdk,mcp-server}.json`. |
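The `assistant-red-team` prerequisite can be checked mechanically before the corpus is committed. The sketch below is a minimal pre-flight check under two assumptions: the entry shape (`id`, `category`, `prompt`) is hypothetical, since the real `RedTeamCorpus` schema is not shown here, and "across" is read as requiring at least one entry in each of the nine categories.

```typescript
// The nine categories named in the table above.
const CATEGORIES = [
  "prompt_injection", "jailbreak", "data_extraction", "role_hijack", "tool_misuse",
  "ssrf", "instruction_leak", "pii_extraction", "authority_escalation",
] as const;
type Category = (typeof CATEGORIES)[number];

// Hypothetical shapes; the real RedTeamCorpus schema may differ.
interface CorpusEntry { id: string; category: Category; prompt: string; }
interface Corpus { version: string; entries: CorpusEntry[]; }

// Returns a list of human-readable problems; an empty list means the corpus
// meets the size and coverage requirements assumed in this sketch.
function checkCorpus(corpus: Corpus, minEntries = 30): string[] {
  const problems: string[] = [];
  if (corpus.entries.length < minEntries) {
    problems.push(`expected at least ${minEntries} entries, found ${corpus.entries.length}`);
  }
  const covered = new Set<string>(corpus.entries.map((entry) => entry.category));
  for (const category of CATEGORIES) {
    if (!covered.has(category)) problems.push(`no entries for category ${category}`);
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a single run report every gap in the corpus.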
| Gate | What to do |
|---|---|
| `cost-budget` | Wire `CostMeter` from `@matter/testing` into every AI call site (assistant, mock-founder, brand-voice-learner, voice-judge). Emit JSON to `.matter/cost-meter/<feature>-<run-id>.json`. |
| `latency-budget` | Wire latency recording into the same call sites. Emit `{ feature, durations_ms }` to `.matter/latency/<feature>.json`. |
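For the `latency-budget` row, the sketch below shows one possible shape for the recorder. The class `LatencyRecorder` and its methods are hypothetical, and `durations_ms` is assumed to be an array of per-call durations. Only the output path pattern and the `{ feature, durations_ms }` payload come from the table; the `CostMeter` API is deliberately not modeled because its interface is not shown here.

```typescript
// Payload shape named in the table; durations_ms is assumed to be an array.
interface LatencyRecord { feature: string; durations_ms: number[]; }

// Hypothetical in-memory recorder; writing the JSON to disk is left to the caller.
class LatencyRecorder {
  private readonly durations = new Map<string, number[]>();

  // Append one measured duration for a feature.
  record(feature: string, durationMs: number): void {
    const list = this.durations.get(feature) ?? [];
    list.push(durationMs);
    this.durations.set(feature, list);
  }

  // Time an async call site and record its duration even if it throws.
  async time<T>(feature: string, call: () => Promise<T>, now: () => number = () => Date.now()): Promise<T> {
    const start = now();
    try {
      return await call();
    } finally {
      this.record(feature, now() - start);
    }
  }

  // Build the payload for one feature.
  toRecord(feature: string): LatencyRecord {
    return { feature, durations_ms: [...(this.durations.get(feature) ?? [])] };
  }

  // Output path following the pattern in the table.
  outputPath(feature: string): string {
    return `.matter/latency/${feature}.json`;
  }
}
```

A call site would wrap its model call in `recorder.time("assistant", () => callModel())`, where `callModel` stands in for whatever the feature actually invokes, and then write `JSON.stringify(recorder.toRecord("assistant"))` to `recorder.outputPath("assistant")` at the end of the run.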
| Gate | What to do |
|---|---|
| `mcp-tool-roundtrip` | Add `bun run --filter matter-mcp-server dev &` plus a readiness poll to the framework CI workflow. Implement per-tool happy-path dispatch and response-schema validation. |
| `e2e-entity-lifecycle` | Add `@playwright/test` to `apps/app` devDeps. Wire dev-server start and a readiness poll in CI. Implement the 7-step lifecycle script (assistant → formation → grant → franchise tax → dissolution → webhook sequence assertion). |
| `accessibility-budget` | Add `@playwright/test` and `axe-core` to `apps/web` devDeps. Spin up the `apps/web` dev server in CI. Walk each route, run axe, and assert zero violations. |
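All three rows above depend on a readiness poll after starting a server in the background. The sketch below is one minimal way to write it, with the probe injected so it can be tested without a real server. The function name, option names, and defaults are assumptions.

```typescript
// A probe resolves to true once the service is ready to accept traffic.
type Probe = () => Promise<boolean>;

// Hypothetical readiness poll: retries the probe a fixed number of times,
// treating a thrown error (for example a refused connection) as "not ready yet".
async function waitUntilReady(
  probe: Probe,
  options: { attempts?: number; delayMs?: number } = {},
): Promise<boolean> {
  const attempts = options.attempts ?? 30;
  const delayMs = options.delayMs ?? 1000;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch {
      // Server not accepting connections yet; fall through to the delay.
    }
    if (attempt < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

With a runtime that provides a global `fetch` (Bun, Node 18+), a probe could be as small as `async () => (await fetch(url)).ok`, where `url` is whatever address the dev server listens on. The CI step would fail the job when `waitUntilReady` resolves to `false`.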
| Gate | What to do |
|---|---|
| `perf-budget` | Add `bun run --filter web build` as a CI prerequisite step. Optional: switch from build-manifest reading to a bundle-analyzer JSON for richer per-chunk data. Optional: layer Lighthouse-CI on top for LCP/TTFB/CLS. |
| Gate | What to do |
|---|---|
| `mock-founder-capability` | Already activates the moment mock-founder runs land in `apps/app/test-data/mock-founder-runs/`. Promote to `blocking` once the run corpus is stable across 2 consecutive model upgrades. |
| `mock-founder-safety` | Same as `mock-founder-capability`. Surface findings are real today (65 destructive-verb-without-confirmation findings on the first run); these are eval signals worth triaging before promotion. |
- Activate per the table above.
- Watch the scorecard for N consecutive green runs (default: 14 days or 30 CI runs).
- Edit the gate's `severity` field: `shadow` → `blocking`.
- Open a PR. The PR comment will show "no new findings vs. main" if the prerequisite step landed correctly.
- Merge. New regressions in the gated surface now fail CI.
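The "N consecutive green runs" rule in the steps above can be expressed as a small function. This sketch is illustrative only: the `GateRun` shape and the function name are hypothetical, the defaults come from the list (14 days or 30 CI runs), and "or" is read as "either threshold is enough". The day count is measured from the first run of the current green streak.

```typescript
type Severity = "shadow" | "blocking";

// Hypothetical record of one scorecard run for a gate.
interface GateRun { finishedAt: Date; green: boolean; }

interface PromotionPolicy { minGreenRuns: number; minGreenDays: number; }

// Defaults taken from the promotion steps: 14 days or 30 CI runs.
const DEFAULT_POLICY: PromotionPolicy = { minGreenRuns: 30, minGreenDays: 14 };

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Suggests the severity a gate qualifies for, given its run history.
function suggestedSeverity(
  current: Severity,
  runs: GateRun[],
  now: Date,
  policy: PromotionPolicy = DEFAULT_POLICY,
): Severity {
  if (current === "blocking") return "blocking";
  const sorted = [...runs].sort((a, b) => a.finishedAt.getTime() - b.finishedAt.getTime());
  // Walk back from the newest run to find where the current green streak starts.
  let streakStart = sorted.length;
  while (streakStart > 0 && sorted[streakStart - 1].green) streakStart--;
  const streak = sorted.slice(streakStart);
  if (streak.length === 0) return "shadow";
  const greenDays = (now.getTime() - streak[0].finishedAt.getTime()) / MS_PER_DAY;
  const eligible = streak.length >= policy.minGreenRuns || greenDays >= policy.minGreenDays;
  return eligible ? "blocking" : "shadow";
}
```

The function only suggests a severity; the actual promotion remains a manual edit to the gate's `severity` field followed by a PR, as described in the steps.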
These are listed for transparency but aren't yet scaffolded as gates:
- Bundle-size drift across PRs (Vercel-style): needs CI-side delta computation. Could wrap `perf-budget` with a diff mode.
- OWASP ZAP against preview deployments: needs preview-env hooks Matter doesn't yet expose.
- Mutation testing via Stryker: heavier than the manual mutation gate; would need `@stryker-mutator/core` integration.
- AI cost ceilings per PR: needs PR-scoped cost-meter aggregation, not just per-feature totals.
- Property tests for every Zod schema: would generate `Arbitrary<T>` from each schema and fuzz consumers. Substantial scope.