
Activation Plan

Today, 19 of the 31 framework gates ship with severity: "shadow". They are interface-complete, but each is waiting on a prerequisite (Playwright wiring, ANTHROPIC_API_KEY, a captured baseline, a corpus, a running service, etc.). This page is the contract for what unlocks each one.

Order: cheapest activation at the top.

Tier A — single config change

| Gate | What to do |
| --- | --- |
| voice-judge | Set ANTHROPIC_API_KEY in CI secrets. Promote severity to blocking after observing 30 green runs. |
| assistant-determinism | Same: ANTHROPIC_API_KEY in CI. |
| webhook-delivery-audit | Set SVIX_TOKEN. Implementation TODO: pull recent attempts via Svix's API. |
| production-smoke | Already runs in any environment with internet. Promote to blocking when the gate has run cleanly against api.mattermode.com from the framework workflow for 14 days. |
| visual-regression | Set CHROMATIC_PROJECT_TOKEN. Wire npx chromatic --exit-zero-on-changes into the gate's run() and read the status. |

Tier B — capture a baseline / corpus

| Gate | What to do |
| --- | --- |
| prompt-regression | Write a capture-baseline script, invoked as bun run --filter @repo/ai capture-baseline. Run it once to seed packages/ai/__regression__/baseline.json and commit the result. |
| assistant-red-team | Ship a versioned corpus at apps/app/__red-team__/corpus.json matching the RedTeamCorpus schema. ≥30 entries across prompt_injection, jailbreak, data_extraction, role_hijack, tool_misuse, ssrf, instruction_leak, pii_extraction, authority_escalation. |
| document-hallucination-check | Collect AI-authored documents into apps/app/test-data/ai-docs/. Implement a claim extractor and a per-kind oracle (jurisdictions, OpenAPI, MCP). |
| api-contract | Write per-consumer contracts to apps/api/__contracts__/{node-sdk,python-sdk,mcp-server}.json. |
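
The assistant-red-team prerequisite (at least 30 entries across nine categories) can be checked mechanically before the corpus is committed. The following is a minimal sketch, not the framework's own validator. The RedTeamCorpus schema is not shown on this page, so the `CorpusEntry` shape and the `checkCorpus` helper are hypothetical, and the sketch assumes "across" means every category has at least one entry.

```typescript
// Categories listed in the Tier B table for assistant-red-team.
const CATEGORIES = [
  "prompt_injection", "jailbreak", "data_extraction", "role_hijack",
  "tool_misuse", "ssrf", "instruction_leak", "pii_extraction",
  "authority_escalation",
] as const;

type Category = (typeof CATEGORIES)[number];

// Hypothetical entry shape; the real RedTeamCorpus schema may differ.
interface CorpusEntry {
  id: string;
  category: Category;
  prompt: string;
}

interface CorpusCheck {
  ok: boolean;
  total: number;
  missingCategories: Category[];
}

// Assumption: every category needs at least one entry, and the corpus
// as a whole needs at least `minEntries` entries.
function checkCorpus(entries: CorpusEntry[], minEntries = 30): CorpusCheck {
  const seen = new Set<Category>(entries.map((e) => e.category));
  const missingCategories = CATEGORIES.filter((c) => !seen.has(c));
  return {
    ok: entries.length >= minEntries && missingCategories.length === 0,
    total: entries.length,
    missingCategories,
  };
}
```

Reporting `missingCategories` alongside the boolean makes a failed check actionable: the output names the categories that still need entries.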

Tier C — wire a runtime instrument

| Gate | What to do |
| --- | --- |
| cost-budget | Wire CostMeter from @matter/testing into every AI call site (assistant, mock-founder, brand-voice-learner, voice-judge). Emit JSON to .matter/cost-meter/<feature>-<run-id>.json. |
| latency-budget | Wire latency recording into the same call sites. Emit { feature, durations_ms } to .matter/latency/<feature>.json. |
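
To illustrate the latency-budget row, here is a rough sketch of a recorder that wraps an AI call and accumulates the `{ feature, durations_ms }` payload. `LatencyRecorder` and its methods are hypothetical names; the actual helper in @matter/testing, if one exists, may look different. Writing the file is left out so that the sketch stays independent of any filesystem API.

```typescript
// Payload shape taken from the latency-budget row: { feature, durations_ms }.
interface LatencyReport {
  feature: string;
  durations_ms: number[];
}

// Hypothetical recorder, one instance per feature (e.g. "assistant").
class LatencyRecorder {
  private readonly feature: string;
  private readonly durations: number[] = [];

  constructor(feature: string) {
    this.feature = feature;
  }

  // Records one already-measured duration in milliseconds.
  record(durationMs: number): void {
    this.durations.push(durationMs);
  }

  // Times one async call and records its duration even if the call throws.
  async time<T>(call: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      return await call();
    } finally {
      this.record(Date.now() - start);
    }
  }

  // Snapshot of the payload the gate is expected to read.
  report(): LatencyReport {
    return { feature: this.feature, durations_ms: [...this.durations] };
  }

  // Destination following the .matter/latency/<feature>.json convention.
  outputPath(): string {
    return `.matter/latency/${this.feature}.json`;
  }
}
```

At a call site this would read roughly `await recorder.time(() => callModel(input))`, where `callModel` stands in for whatever the call site already does. At the end of the run, `JSON.stringify(recorder.report())` would be written to `recorder.outputPath()`.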

Tier D — spin up a service in CI

| Gate | What to do |
| --- | --- |
| mcp-tool-roundtrip | Start bun run --filter matter-mcp-server dev in the background (trailing &) and add a readiness poll to the framework CI workflow. Implement per-tool happy-path dispatch + response-schema validation. |
| e2e-entity-lifecycle | Add @playwright/test to apps/app devDeps. Wire dev-server-start + readiness-poll in CI. Implement the 7-step lifecycle script (assistant → formation → grant → franchise tax → dissolution → webhook sequence assertion). |
| accessibility-budget | Add @playwright/test + axe-core to apps/web devDeps. Spin up the apps/web dev server in CI. Walk each route, run axe, assert violations = 0. |
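
All three Tier D gates depend on a readiness poll: start the server in the background, then wait until it answers before running the gate. One possible shape for that poll is sketched below. `waitUntilReady` and `httpProbe` are hypothetical helpers, and the page does not say which port or path the dev servers expose, so any URL passed in is a placeholder.

```typescript
// Hypothetical helper: polls a probe until it reports ready or attempts run out.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  opts: { attempts?: number; delayMs?: number } = {},
): Promise<boolean> {
  const attempts = opts.attempts ?? 30;
  const delayMs = opts.delayMs ?? 1000;
  for (let i = 0; i < attempts; i++) {
    // A probe that rejects (e.g. connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) return true;
    // Sleep between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Example probe: any 2xx response from the given URL counts as ready.
const httpProbe = (url: string) => async (): Promise<boolean> =>
  (await fetch(url)).ok;
```

Passing the probe in as a function keeps the loop reusable: the MCP server, apps/app, and apps/web can each supply their own check, and a CI step can fail fast when `waitUntilReady` returns `false`.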

Tier E — heavier build pipeline

| Gate | What to do |
| --- | --- |
| perf-budget | Add bun run --filter web build as a CI prerequisite step. Optional: switch from build-manifest reading to a bundle-analyzer JSON for richer per-chunk data. Optional: layer Lighthouse-CI on top for LCP/TTFB/CLS. |

Tier F — eval corpus that requires AI runs

| Gate | What to do |
| --- | --- |
| mock-founder-capability | Already activates the moment mock-founder runs land in apps/app/test-data/mock-founder-runs/. Promote to blocking once the run corpus is stable across 2 consecutive model upgrades. |
| mock-founder-safety | Same. Surface findings are real today (65 destructive-verb-without-confirmation findings on first run); these are eval signals worth triaging before promotion. |

Promotion process

  1. Activate per the table above.
  2. Watch the scorecard for N consecutive green runs (default: 14 days or 30 CI runs).
  3. Edit the gate's severity field: shadow → blocking.
  4. Open a PR. The PR comment will show "no new findings vs. main" if the prerequisite step landed correctly.
  5. Merge. New regressions in the gated surface now fail CI.
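
The wait in step 2 can be expressed as a small check over the scorecard history. This is only a sketch. The `GateRun` shape is hypothetical, and it assumes the default means either threshold is sufficient: 30 consecutive green runs, or an unbroken green streak spanning at least 14 days, both counted back from the most recent run.

```typescript
// Hypothetical scorecard record; field names are illustrative.
interface GateRun {
  finishedAt: Date;
  green: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: either threshold suffices, measured on the trailing green streak.
function readyToPromote(runs: GateRun[], minRuns = 30, minDays = 14): boolean {
  // Newest first, without mutating the caller's array.
  const sorted = [...runs].sort(
    (a, b) => b.finishedAt.getTime() - a.finishedAt.getTime(),
  );

  // Collect the unbroken run of green results ending at the latest run.
  const streak: GateRun[] = [];
  for (const run of sorted) {
    if (!run.green) break;
    streak.push(run);
  }

  if (streak.length === 0) return false;
  if (streak.length >= minRuns) return true;

  const newest = streak[0].finishedAt.getTime();
  const oldest = streak[streak.length - 1].finishedAt.getTime();
  return newest - oldest >= minDays * DAY_MS;
}
```

A single red run resets the streak, which matches the "consecutive" wording in step 2. A stricter variant could also require a minimum number of runs inside the 14-day window, so that two runs two weeks apart do not qualify.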

Out-of-scope follow-ups

These are listed for transparency but aren't yet scaffolded as gates:

  • Bundle-size drift across PRs (Vercel-style) — needs CI-side delta computation. Could wrap perf-budget with a diff mode.
  • OWASP ZAP against preview deployments — needs preview-env hooks Matter doesn't yet expose.
  • Mutation testing via Stryker — heavier than the manual mutation gate; would need @stryker-mutator/core integration.
  • AI cost ceilings per PR — needs PR-scoped cost-meter aggregation, not just per-feature totals.
  • Property tests for every Zod schema — would generate Arbitrary<T> from each schema, fuzz consumers. Substantial scope.
