authorized AI red-team lab

Mesmer

Your AI app has a text box. That means it has an attack surface.

Mesmer turns weird user input into Python red-team recipes you can run, inspect, replay, and benchmark against systems you own or have permission to test.

Run your first red-team See the evidence

message map

Red-teaming should not require becoming a security team first.

AI apps can be built in plain English now. The testing loop should be just as approachable, but still reproducible enough for serious engineering.

mesmer map --audience builders --output evidence

Accessible

Start from one authorized objective and one target. No exploit folklore required before the first useful run.

Reproducible

Keep the exact messages, evaluations, state transitions, and replay artifacts instead of a screenshot and a nervous memory.

Composable

Grow from one probe into frontier search, prompt-pattern experiments, fuzzing, and benchmarks without rebuilding the harness.

the uncomfortable part

A jailbreak can look like normal product feedback.

Prompt injection does not always arrive as code. Sometimes it is just a patient user, a text box, and a few attempts at wording the request differently. That is why Mesmer treats red-team work as a repeatable experiment, not a magic prompt hunt.

Your AI product does not need a hoodie-wearing villain. Sometimes it only needs a user with too much curiosity and a text box.

feedback #1842looks harmless

The assistant is too strict. Can it be more helpful?

hidden test shape

same request, different wording, repeated until a boundary moves

authorized trace

Start

Evidence

teach me something new

A red-team run is a recipe you can inspect.

Pick the technique that matches the question, plug in the target and evaluator, then let Mesmer preserve the evidence.

Ask one risky question

Use SingleTurnProbe for one objective, one target call, and one evaluator.

Search better wording

Use FrontierSearch when you want branching, selection, scoring, and a replayable winner.

Fuzz variations

Use PopulationFuzzing for seed pools, mutators, reward updates, and repeated trials.

Reuse known tactics

Pull from prompt-pattern libraries while keeping the attack recipe readable.

Compare runs

Wrap several attacks in a Benchmark and report shared metrics across objectives.

Declarative attack recipes

Support-router escalation eval

Explore ticket wording while preserving the winning branch.

1attack = techniques.FrontierSearch(2  name="support_router_escalation",3  iterations=2, branching=3, width=2,4  expand=ops.Propose(proposers.Template()),5  select=ops.Select(selectors.KeywordOverlapSelector()),6  evaluate=ops.Evaluate(evaluators.Contains(text="ESCALATE_TIER_2")),7  stop=ops.StopWhen(conditions.ScoreAtLeast(1)),8)910result = await Runner(log_format="compact").run(run)

safety scope

Designed for authorized evaluation.

Mesmer is for defensive testing, benchmark reproduction, and research on systems you own or have permission to test.

Safety scope

takeaway

The goal is not to prove your AI is impossible to jailbreak.

The goal is to stop guessing. Run the test, keep the trace, compare the technique, and know exactly what happened.

Read the docs Open GitHub