Projects

Inspect AI PyPI GitHub

Framework for systematic evaluation of large language models, built by the UK AISI in collaboration with Meridian. Inspect can be used for a broad range of evaluations that measure coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding.

Inspect Scout PyPI GitHub

In-depth analysis of AI agent transcripts with rich visualization of results and high-performance parallel scanning. Detect issues like misconfigured environments, refusals, and evaluation awareness using LLM-based or pattern-based scanners.

Inspect Petri PyPI GitHub

Alignment auditing agent for probing language model behavior, built in collaboration with UK AISI. Rapidly test concrete alignment hypotheses end-to-end by generating realistic audit scenarios and orchestrating multi-turn audits.

Petri Bloom PyPI GitHub

Framework for generating behavioral evaluations of frontier AI models. Given a “seed” configuration describing the target behavior and evaluation parameters, Bloom produces diverse test scenarios, runs conversations with the target model, and scores the results.

Inspect Flow PyPI GitHub

Workflow orchestration for Inspect AI that enables running evaluations at scale with repeatability and maintainability. Inspect Flow is designed for researchers and engineers running systematic AI evaluations who need to scale beyond ad-hoc scripts.

Inspect SWE PyPI GitHub

Software engineering agents for Inspect AI. Use popular coding agents like Claude Code, Codex CLI, Gemini CLI, and Mini SWE Agent as standard Inspect agents in any evaluation.

Inspect Harbor PyPI GitHub

Interface for running Harbor tasks with Inspect AI. Provides over 80 agentic datasets, including SWE-bench Pro, Terminal Bench 2.0, ReplicationBench, Finance Agent, and more.

Inspect Sandboxes PyPI GitHub

Daytona and Modal sandbox providers for Inspect AI. Use scalable cloud infrastructure to run hundreds of evaluation tasks in parallel.

Inspect Viz PyPI GitHub

Data visualization library for creating high-quality interactive visualizations from Inspect AI evaluation results. Use built-in views for comparing models, tasks, and evaluation factors or create custom interactive plots.

Inspect VSCode Marketplace GitHub

Visual Studio Code extension for productive use of Inspect AI and Inspect Scout. Features integrated log and scan viewing, plus debugging tools for tasks and scanners.
