
Mesmer blog
Research notes from the safety harness
Introducing Mesmer's Operator Runtime
Why Mesmer models LLM red-team experiments as techniques, typed state transitions, and replayable artifacts.
Mesmer starts from a practical frustration: prompt ideas are easy to sketch, but reproducible LLM safety experiments need more than a clever string.
An experiment needs objectives, target interaction, generation, filtering, selection, evaluation, feedback, stopping, logging, cost accounting, and replay artifacts. When those pieces live inside a single script, the result is hard to compare, hard to debug, and hard to turn into a benchmark.
Mesmer treats the attack workflow as a typed transition system. The stable kernel is small:
State + Operator + Transition + WorkflowThe user-facing layer is a technique. A technique like FrontierSearch or PopulationFuzzing assembles operators into a workflow. Operators do one state transition: propose candidates, query a target, evaluate responses, assign rewards, add feedback, or stop.
The order still matters. Selection before a target query changes which messages are sent. Feedback after evaluation changes the next proposal. A stopping operator consumes recorded evidence instead of pretending to be a judge.
The payoff is inspectability. A successful run should not end with only true or false; it should emit the target replay messages, target metadata, judgement details, and operator trace needed to understand and reproduce the result.
That is the shape Mesmer optimizes for: composable techniques, real target boundaries, typed state, operator transitions, and evidence you can inspect.