Introducing Inspect Flow

A workflow layer for Inspect that makes it easier to run evals at scale with declarative configs, matrix sweeps, and automatic log reuse.
Author: Alexandra Abbas
Published: March 4, 2026

Inspect Flow is the workflow layer for Inspect that makes it easier to run evals at scale.

Define a simple workflow as a list of tasks in a Python config file (here, config.py):

from inspect_flow import FlowSpec, FlowTask

FlowSpec(
    log_dir="logs",
    tasks=[
        FlowTask(
            name="inspect_evals/gpqa_diamond",
            model="openai/gpt-4o",
        ),
        FlowTask(
            name="inspect_evals/mmlu_0_shot",
            model="openai/gpt-4o",
        ),
    ],
)

Then run:

flow run config.py

For more complex experiments, use matrix patterns to systematically sweep across tasks, models, and parameters:

from inspect_flow import FlowSpec, configs_matrix, models_matrix, tasks_matrix

FlowSpec(
    log_dir="logs",
    tasks=tasks_matrix(
        task=[
            "inspect_evals/gpqa_diamond",
            "inspect_evals/mmlu_pro",
        ],
        model=models_matrix(
            model=[
                "openai/gpt-5",
                "openai/gpt-5-mini",
            ],
            config=configs_matrix(
                reasoning_effort=["low", "medium", "high"],
            ),
        ),
    ),
)
# → produces 12 evaluations
#   2 tasks × 2 models × 3 reasoning levels

Flow expands the task/model/config matrix into individual evaluations, reuses matching logs from the Flow Store, and runs only the evaluations that are missing.
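
To make the arithmetic concrete, here is a minimal sketch of the idea in plain Python. It is not Flow's implementation and uses no Flow APIs: the matrix is expanded with `itertools.product`, and the `completed` set is a hypothetical stand-in for evaluations that already have a reusable log.

```python
from itertools import product

tasks = ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_pro"]
models = ["openai/gpt-5", "openai/gpt-5-mini"]
reasoning_efforts = ["low", "medium", "high"]

# Expand the matrix: every (task, model, reasoning_effort) combination.
matrix = list(product(tasks, models, reasoning_efforts))
print(len(matrix))  # 12

# Hypothetical: combinations that already have a log from an earlier run.
completed = {
    ("inspect_evals/gpqa_diamond", "openai/gpt-5", "low"),
    ("inspect_evals/mmlu_pro", "openai/gpt-5-mini", "high"),
}

# Only the combinations without an existing log still need to run.
to_run = [combo for combo in matrix if combo not in completed]
print(len(to_run))  # 10
```

Adding a third model to the list would grow the matrix to 18 combinations, but only the 6 new ones (plus any still missing) would need to run.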

Get started with the Inspect Flow documentation.