Cloud Sandboxes for Inspect

A new package adding managed cloud sandbox providers to Inspect, so you can run evals at scale without provisioning your own infrastructure.

Author: Alexandra Abbas
Published: April 30, 2026

Over the last year, a wave of cloud sandbox providers has emerged to run autonomous coding agents, evaluations, and RL training at scale. Today we are excited to announce that we are bringing two of these sandboxes (Daytona and Modal) to Inspect with the Inspect Sandboxes package.

Inspect already has sandbox providers for Docker, Kubernetes, EC2, and Proxmox, but Docker relies on local computing resources and the others require you to provision and maintain your own infrastructure. With these new sandboxes, you don’t need a Docker daemon on your machine or a cluster of your own.

Install the Inspect Sandboxes package from PyPI with:

pip install inspect-sandboxes

Use the "daytona" or "modal" sandbox as you would any other:

from inspect_ai import Task, eval
from inspect_ai.agent import react
from inspect_ai.tool import bash, python

task = Task(
    dataset=[...],
    solver=react(tools=[bash(), python()]),
    # Select the cloud provider by name: "daytona" or "modal"
    sandbox="daytona",
)

eval(task)

If your samples already define a Dockerfile or compose.yaml, the cloud sandbox provider uses it automatically. You can also substitute a cloud sandbox for "docker" at the CLI:

inspect eval inspect_harbor/terminal_bench_2_0 --sandbox daytona
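
To compare providers on the same task, it can help to generate one command per sandbox. The sketch below only builds and prints the command lines; it does not run them. The helper name `inspect_eval_command` is hypothetical, and actually running the printed commands assumes the Inspect CLI is installed and provider credentials are configured.

```python
import shlex

# Task name taken from the CLI example above.
TASK = "inspect_harbor/terminal_bench_2_0"


def inspect_eval_command(task: str, sandbox: str) -> list[str]:
    """Build the argv for `inspect eval <task> --sandbox <sandbox>` (hypothetical helper)."""
    return ["inspect", "eval", task, "--sandbox", sandbox]


# Print one command per provider; nothing is executed here.
for provider in ("docker", "daytona", "modal"):
    print(shlex.join(inspect_eval_command(TASK, provider)))
```

Keeping the command as a list of arguments rather than a single string makes it straightforward to pass to `subprocess.run` later without shell quoting issues.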

Multiple Containers

Some agent benchmarks need more than a single container: a database, a victim service to attack, a verifier endpoint, or a tool runtime alongside the agent. On Daytona, a compose file with two or more services runs in Docker-in-Docker, with each service exposed as a separate SandboxEnvironment. However, multi-service compose is not yet supported on Modal (we plan on enabling this as soon as it is supported natively by Modal).
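
One way to act on this limitation is to choose the provider from the number of services in the compose file. This is a minimal sketch under stated assumptions: `choose_sandbox` is a hypothetical helper (not part of Inspect or the Inspect Sandboxes package), and the compose file is represented as an already-parsed dictionary.

```python
def choose_sandbox(compose: dict, preferred: str = "modal") -> str:
    """Return a sandbox name that can run the given compose definition.

    Hypothetical helper: falls back to "daytona" when a multi-service
    compose file is requested on "modal", which does not support it yet.
    """
    services = compose.get("services") or {}
    if preferred == "modal" and len(services) >= 2:
        return "daytona"
    return preferred


# Illustrative compose definitions (service names and images are made up).
single = {"services": {"default": {"image": "python:3.12"}}}
multi = {
    "services": {
        "default": {"image": "python:3.12"},
        "db": {"image": "postgres:16"},
    }
}

print(choose_sandbox(single))  # modal
print(choose_sandbox(multi))   # daytona
```

The returned name could then be passed as the `sandbox` argument of a `Task`, so single-container evals stay on the preferred provider and only multi-service ones are redirected.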

Learning More

Consult the provider-specific documentation for Daytona and Modal to learn more about providing credentials, network policies, resource limits, GPUs, and other options.

Cloud sandboxes are a great fit for a wide variety of agentic evaluations, but their applicability will vary with your specific needs. To learn about and compare all of the available sandboxes, see the Sandbox Extensions listing on the Inspect website.