Introducing Petri 3.0 – Meridian Labs

Today we’re releasing Petri 3.0. It’s the biggest update to the alignment auditing agent yet, and the first developed here at Meridian — Petri’s new home.

In the seven months since its release, Petri has become the basis for a range of alignment research. The UK AI Security Institute built its alignment evaluation pipeline on Petri to test frontier models for research-sabotage propensity, and used a prototype of 3.0 in its pre-deployment evaluations of Claude Mythos and Opus 4.7. Researchers from Constellation and the Anthropic Fellows Program used it for an independent safety evaluation of Kimi K2.5. Others have used it to study whistleblowing under controlled ablations, to measure honesty, corrigibility, and scheming, and to systematically audit how well frontier models follow their constitutions.

We took on Petri’s development because this is exactly the kind of work we exist to support. Petri now sits alongside Inspect AI, Inspect Scout, and Inspect Flow in our open-source AI research and evaluation stack. Our goal as stewards is simple: keep Petri useful for the projects already building on it, and make it more hackable for the ones that haven’t started yet. Petri 3.0 is shaped around that goal. Anthropic will continue to support Petri and use it in its own alignment assessments.

The headline change in 3.0 is architectural. In earlier versions, the auditor and target were tightly coupled. The auditor manipulated the target’s message history directly — constructing system prompts, simulating tool outputs, and managing conversation state. That was easy to implement, but it made customization painful: researchers wanting to modify either side had to untangle interleaved code that wasn’t designed to come apart. Petri 3.0 splits the auditor and target into independent components that communicate through a well-defined interface.

With the target as a separate component, you can build custom targets without touching the auditor. Dish, now in research preview, runs audits inside Claude Code, Codex, Gemini CLI, and other real deployment scaffolds. With the auditor as a separate component, you can extend it without touching the target loop. UK AISI used an early prototype of Petri 3.0 to give the auditor access to real codebases in its Mythos and Opus 4.7 evaluations. Bloom, which generates targeted evaluation suites around a single behavior, has moved to Meridian alongside Petri; it now uses a custom Petri auditor as its backbone.

Architecture and Customization

In earlier versions of Petri, the auditor managed the target’s conversation state directly, which made it hard to modify either one in isolation. Petri 3.0 splits the auditor and target into independent components with a well-defined interface between them.

This makes the system far more hackable. Want to point the auditor at Claude Code instead of a bare API? Increase its test-time compute? Build a prompted model organism as the target? Execute some target tools in real environments? Previously this meant fighting with interleaved auditor and target code. Now you just modify the piece you care about.

Auditor and target agents trading control

Concretely, both sides are Inspect agents running asynchronously in their own coroutines. The auditor drives the target through a small command protocol (set a system prompt, stage a tool, return a tool result, roll back to an earlier turn, etc.) and the target is anything that can respond to those commands. It’s often very powerful to let the auditor try an approach, observe the result, and try another from the same starting point. To support rollback for arbitrary targets, Petri records every event in the target’s trajectory and, on rollback, replays them up to the chosen branch point — putting the target back into an identical state as before.

The Inspect transcript viewer now natively supports Petri transcripts. A Petri run produces a lot of transcripts, and the viewer lets you easily sort and filter them by judge dimension to find the ones worth reviewing manually. Within a selected transcript, you can view and navigate between the different branches of the target trajectory.

Dish: Auditing Real Agent Scaffolds

A persistent challenge for Petri, and for alignment evaluations more broadly, has been realism. As models become more capable, they are increasingly able to detect when they are being evaluated and may behave differently than they would in real deployment. If a model can tell it’s being tested, the evaluation may not reflect its actual behavior. In Petri 1.0, many transcripts contained clues that could signal to a model that it was being tested. Petri 2.0’s realism classifier and seed instruction revisions made meaningful progress, reducing verbalized eval-awareness substantially. But the simulated tools, system prompts, and scaffolds in Petri’s environments still had tells that models could pick up on.

Recent work by Kissane et al. studied this problem directly in the context of agentic coding audits. Using realism win rate (Souly et al.), how often an LLM judge shown a real and a simulated transcript fails to identify which is which, they found that giving the auditor access to real deployment resources (example real world system prompts, tool definitions, and codebases) substantially improved realism.

Dish tackles this by improving scaffold realism: the system prompts, tool definitions, and system reminders that real scaffolds use. Rather than having the auditor invent synthetic versions of these, Dish runs the target in real agent scaffolds directly. When auditing Claude Code, for example, the target sees Claude Code’s real system prompt and tool definitions rather than approximations the auditor made up. The auditor still simulates tool responses, and there are still sometimes small inconsistencies in those, but the overall environment is much closer to what the model would see in deployment.

Many realism challenges remain. Scenario plausibility, conversation structure, and task-driven cues are still open problems—a model given unusual authority or access to sensitive information may still recognize the situation as contrived, regardless of how realistic the tools look. But Dish removes a significant class of environment-driven tells that previously made audits easy to distinguish from real deployment.

The UK AI Security Institute used a prototype of Petri 3.0 for some of their evaluations of Claude Mythos Preview, combining Dish with the auditor-side codebase grounding described above to improve realism across multiple axes simultaneously.

Dish is in research preview and can be found here.

Bloom Integration

In December 2025, Anthropic released Bloom, an open-source framework for generating targeted behavioral evaluations. Where Petri explores broadly, probing a target model across many scenarios and scoring along many dimensions, Bloom goes deep on a single behavior, automatically generating evaluation suites that quantify how often and how severely it occurs.

Researchers have already demonstrated the value of combining the two tools. Petrova and Burden adapted Bloom for scenario generation and Petri for execution in their evaluation of frontier models, producing graded behavioral assessments rather than binary correctness judgments.

Petri 3.0 makes this composition a first-class feature: Bloom now uses Petri as its backbone for executing evaluations, including against real agent scaffolds via Dish. Bloom is now at Meridian alongside Petri, and is available at https://meridianlabs-ai.github.io/petri_bloom/.

Closing Thoughts

There’s a lot left to do. As models advance, alignment evaluation needs to advance with them, and we want to help more groups build the evals that make that possible.

Petri 3.0 is available now at https://meridianlabs-ai.github.io/inspect_petri/. We welcome issues, pull requests, and new seed instructions. If you’re using Petri in your research, we’d love to hear about it.

Acknowledgements

Petri was created at Anthropic as part of MATS and the Anthropic Fellows Program, and we’re grateful to Anthropic for entrusting its development to us. We would like to thank Samuel R. Bowman, Trenton Bricken, David Lindner, and Sydney Von Arx for useful discussion. We are grateful to the UK AI Security Institute for their continued collaboration on this work.