Inspect Flow is the workflow layer for Inspect that makes it easier to run evals at scale.
- Scaling experiments: Run many tasks × models × params without writing orchestration scripts.
- Avoid re-running work: Reuse logs from the Flow Store and only run what’s missing.
- Clean configs: Define eval workflows declaratively instead of ad-hoc Python scripts.
- Systematic sweeps: Built-in matrix patterns for exploring tasks, models, and params.
Define a simple workflow with a list of tasks:

    from inspect_flow import FlowSpec, FlowTask

    FlowSpec(
        log_dir="logs",
        tasks=[
            FlowTask(
                name="inspect_evals/gpqa_diamond",
                model="openai/gpt-4o",
            ),
            FlowTask(
                name="inspect_evals/mmlu_0_shot",
                model="openai/gpt-4o",
            ),
        ],
    )

Then run:

    flow run config.py

For more complex experiments, use matrix patterns to systematically sweep across tasks, models, and parameters:

    # Assumption: the matrix helpers are exported from inspect_flow alongside FlowSpec.
    from inspect_flow import FlowSpec, configs_matrix, models_matrix, tasks_matrix

    FlowSpec(
        log_dir="logs",
        tasks=tasks_matrix(
            task=[
                "inspect_evals/gpqa_diamond",
                "inspect_evals/mmlu_pro",
            ],
            model=models_matrix(
                model=[
                    "openai/gpt-5",
                    "openai/gpt-5-mini",
                ],
                config=configs_matrix(
                    reasoning_effort=["low", "medium", "high"],
                ),
            ),
        ),
    )
    # → produces 12 evaluations
    # 2 tasks × 2 models × 3 reasoning levels

Flow expands the task/model/config matrix, reuses logs from the Flow Store, and only runs what’s missing.
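
To make the arithmetic concrete, here is a minimal plain-Python sketch of what a matrix expansion followed by a "run only what's missing" filter could look like. It does not use Inspect Flow and is not how Flow is implemented. The `already_logged` set is a hypothetical stand-in for whatever the Flow Store records, and the tuple layout is an assumption made for illustration.

```python
from itertools import product

# Same axes as the matrix example above.
tasks = ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_pro"]
models = ["openai/gpt-5", "openai/gpt-5-mini"]
reasoning_efforts = ["low", "medium", "high"]

# Cartesian product: one (task, model, reasoning_effort) tuple per evaluation.
combinations = list(product(tasks, models, reasoning_efforts))

# Hypothetical record of combinations that already have logs.
already_logged = {
    ("inspect_evals/gpqa_diamond", "openai/gpt-5", "low"),
    ("inspect_evals/mmlu_pro", "openai/gpt-5-mini", "high"),
}

# Only combinations without an existing log still need to run.
to_run = [combo for combo in combinations if combo not in already_logged]

print(len(combinations))  # 12 = 2 tasks × 2 models × 3 reasoning levels
print(len(to_run))        # 10 = 12 total minus 2 already logged
```

Adding a value to any axis multiplies the total, so a fourth reasoning level would give 2 × 2 × 4 = 16 combinations, which is why skipping completed work matters as sweeps grow.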
Get started with the Inspect Flow documentation.
