Skip to content
TestingModesModesAI / EvalsSafety

Safety

What it is

Alignment / refusal / constitutional checks. Asserts the assistant refuses things it should refuse.

When to use it

PII extraction attempts, authority-escalation attempts, signing-without-consent.

Example gates

Future slice — needs a captured corpus of refusal-expected prompts.

See also

On this page