Projects
Inspect AI
PyPI
GitHub
Framework for systematic evaluation of large language models, built by the UK AISI in collaboration with Meridian. Inspect can be used for a broad range of evaluations that measure coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding.
Inspect Scout
PyPI
GitHub
In-depth analysis of AI agent transcripts with rich visualization of results and high-performance parallel scanning. Detect issues like misconfigured environments, refusals, and evaluation awareness using LLM-based or pattern-based scanners.
Inspect Petri
PyPI
GitHub
Alignment auditing agent for probing language model behavior, built in collaboration with UK AISI. Rapidly test concrete alignment hypotheses end-to-end by generating realistic audit scenarios and orchestrating multi-turn audits.
Petri Bloom
PyPI
GitHub
Framework for generating behavioral evaluations of frontier AI models. Given a “seed” configuration describing the target behavior and evaluation parameters, Bloom produces diverse test scenarios, runs conversations with the target model, and scores the results.
Petri Dish
PyPI
GitHub
Run Petri alignment audits against real agent scaffolds (Claude Code, Codex CLI, Gemini CLI), providing a highly realistic environment for evaluations.
Inspect Flow
PyPI
GitHub
Workflow orchestration for Inspect AI that enables running evaluations at scale with repeatability and maintainability. Inspect Flow is designed for researchers and engineers running systematic AI evaluations who need to scale beyond ad-hoc scripts.
Inspect SWE
PyPI
GitHub
Software engineering agents for Inspect AI. Use popular coding agents like Claude Code, Codex CLI, Gemini CLI, and Mini SWE Agent as standard Inspect agents in any evaluation.
Inspect Harbor
PyPI
GitHub
Interface for running Harbor tasks with Inspect AI. Provides access to over 80 agentic datasets, including SWE-bench Pro, Terminal Bench 2.0, ReplicationBench, Finance Agent, and more.
Inspect Viz
PyPI
GitHub
Data visualization library for creating high-quality interactive visualizations from Inspect AI evaluation results. Use built-in views for comparing models, tasks, and evaluation factors or create custom interactive plots.
Inspect Sandboxes
PyPI
GitHub
Daytona and Modal sandbox providers for Inspect AI. Use scalable cloud infrastructure to run hundreds of evaluation tasks in parallel.
Inspect VSCode
Marketplace
GitHub
Visual Studio Code extension for productive use of Inspect AI and Inspect Scout. Features integrated log and scan viewing, plus debugging tools for tasks and scanners.