# Mesmer Documentation
Learn Mesmer's composable Python framework for authorized LLM red-team experiments, jailbreak research, and safety benchmarking.
Mesmer is a small Python framework for turning jailbreak and red-team ideas into reproducible experiments.
It gives you reusable primitives for objectives, techniques, operators, target interaction, evaluation, constraints, feedback, stopping, telemetry, replay artifacts, and benchmark reports. The goal is to move from "I have a prompt idea" to "I can compare techniques across targets" without rebuilding a harness every time.
Use Mesmer for authorized LLM safety testing, defensive evaluation, paper workflow reproduction, prototype attack loops, and measurements that preserve enough state to inspect what actually happened.
## Why Mesmer Exists
- Build attacks as named `Technique` recipes instead of one-off scripts.
- Extend behavior with typed `Operator` transitions and strategy objects.
- Run real targets through LiteLLM, HTTP JSON, SSE, WebSocket, or Python callables.
- Keep experiments inspectable with JSONL logs, state transitions, token usage, costs, and replay artifacts.
- Compare probe, best-of-N, frontier-search, conversation-agent, population-fuzzing, and paper-style techniques with shared benchmark metrics.
- Stay Python-first: normal objects and functions come before registries and saved specs.
## Core Shape
```text
Runtime kernel      -> State + Operator + Transition + Workflow
Attack recipe       -> Technique
Objectives + target -> Run
Many runs           -> Benchmark
Runner              -> logs, transitions, replay artifacts, metrics, reports
```

That split lets you reuse the same technique against different objective sets, target adapters, evaluators, and budgets.
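The split above can be sketched in plain Python. This is a hedged illustration, not Mesmer's actual API: the `Technique`, `Run`, and `Benchmark` classes and their fields here are hypothetical stand-ins that mirror the mapping, with a Python-callable stub standing in for a target adapter.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Technique:
    """A named, reusable attack recipe (hypothetical stand-in)."""
    name: str
    build_prompt: Callable[[str], str]  # objective -> prompt

@dataclass
class Run:
    """One technique applied to one objective against one target."""
    technique: Technique
    objective: str
    target: Callable[[str], str]  # e.g. a Python-callable target adapter

    def execute(self) -> dict:
        # Keep row-level evidence, not just a final score.
        prompt = self.technique.build_prompt(self.objective)
        response = self.target(prompt)
        return {"technique": self.technique.name,
                "objective": self.objective,
                "prompt": prompt,
                "response": response}

@dataclass
class Benchmark:
    """Many runs, aggregated into comparable rows."""
    runs: list = field(default_factory=list)

    def report(self) -> list:
        return [run.execute() for run in self.runs]

# Usage: one Technique reused across two objectives against a stub target.
def echo_target(prompt: str) -> str:
    return f"[target saw] {prompt}"

probe = Technique("probe", lambda obj: f"Please describe: {obj}")
bench = Benchmark([Run(probe, o, echo_target) for o in ("obj-A", "obj-B")])
rows = bench.report()
```

The point of the sketch is the reuse boundary: `probe` never changes while the objective set and target adapter swap underneath it.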
The important detail is that Mesmer keeps mechanics visible: a transform runs through `ops.ApplyTransforms`, a constraint check writes `state.Constraints`, a target call goes through `ops.QueryTarget`, and benchmark reports can keep row-level evidence instead of only a final score.
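One way to picture "mechanics stay visible" is to route every mutation through a named operator that appends to a transition log. The sketch below is an assumption-laden analogue, not Mesmer's real `ops` module: `apply_transforms` and `check_constraint` are hypothetical chokepoints modeled on the `ops.ApplyTransforms` and `state.Constraints` behavior described above.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Minimal experiment state with an explicit transition log."""
    prompt: str
    constraints: dict = field(default_factory=dict)
    transitions: list = field(default_factory=list)

def apply_transforms(state: State, transforms) -> State:
    # Analogue of ops.ApplyTransforms: each transform passes through this
    # single chokepoint, so the log records exactly what ran and when.
    for t in transforms:
        before = state.prompt
        state.prompt = t(state.prompt)
        state.transitions.append({"op": "ApplyTransforms",
                                  "transform": t.__name__,
                                  "before": before,
                                  "after": state.prompt})
    return state

def check_constraint(state: State, name: str, predicate) -> State:
    # Analogue of a constraint check writing state.Constraints.
    state.constraints[name] = bool(predicate(state.prompt))
    state.transitions.append({"op": "CheckConstraint", "name": name,
                              "result": state.constraints[name]})
    return state

def uppercase(p: str) -> str:
    return p.upper()

s = State(prompt="hello target")
s = apply_transforms(s, [uppercase])
s = check_constraint(s, "max_len", lambda p: len(p) <= 64)
# s.transitions now holds row-level evidence of every mutation
```

Because nothing mutates `State` outside these operators, replaying or auditing a run reduces to reading `s.transitions` in order.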