
What Is an Agent Execution Sandbox?

May 3, 2026
Molisha Shah

An agent execution sandbox is a production isolation boundary for AI-generated code: it restricts filesystem access, network egress, and host interaction independently of the agent's decisions.

TL;DR

AI agents generate and execute code at runtime from inputs that may be attacker-controlled. Standard containers share the host kernel, so a single runtime CVE can compromise the host. Production-safe agent execution requires hardware-level isolation (microVMs or userspace kernels), default-deny filesystem and network policies, and layered escape prevention. This guide covers each layer with working configurations.

Why Agents Need Sandboxes: The Risk Model for Autonomous Code Execution

Traditional application code has a fixed, auditable instruction set determined at compile or deploy time. Autonomous AI agents generate and execute novel code at runtime from natural language inputs, creating a risk surface that conventional security controls were not designed to address.

Three AI-specific factors make agent execution sandbox design structurally different from traditional application sandboxing:

  1. Runtime code generation from untrusted inputs: LLM-generated code is typically treated as trusted even though it is shaped by prompts and context that may include attacker-controlled sources, so injected instructions flow directly into execution
  2. Unpredictable runtime decisions: Autonomous agents make decisions about API calls and resource usage that static policies cannot anticipate
  3. Stateful memory systems: Coding agents maintain persistent memory vulnerable to manipulation across sessions

Palo Alto Networks Unit 42 research, as documented in Unit 42 reporting and corroborated by subsequent comparative studies (arXiv:2512.14860), demonstrated that ChatGPT-4o deployed as an autonomous agent successfully executed SQL injection, SSRF, and unauthorized data exfiltration, attacks that its chat-only counterpart consistently refused. Safety mechanisms designed for conversational AI do not translate cleanly to agentic contexts.

OWASP AIVSS assigns a CVSS v4.0 Base Score of 9.4 to the interpreter tool attack scenario where an LLM-based agent is manipulated into executing attacker-provided arbitrary code.

| Incident | Attack Vector | Impact |
| --- | --- | --- |
| CVE-2025-58372 (Roo Code) | Prompt injection → workspace file write → arbitrary code execution (RCE) | CNA-assigned CVSS of 8.1; third-party aggregators report a 9.8 critical score (NVD assessment pending at time of writing) |
| CVE-2025-53773 (GitHub Copilot) | Command injection (officially); third-party reports describe malicious instructions in repository files | Local code execution |
| CVE-2025-59528 (Flowise) | Unsafe config handling → JavaScript injection | ~15K exposed instances |
| Replit production DB deletion | Coding agent with live DB access confused by empty inputs | Production database deleted |
| Postmark MCP BCC injection | Production MCP server injected BCC field into email tool calls | All outgoing email silently exfiltrated |

Prompt injection currently has no foolproof, deterministic prevention. Because natural language input can encode both benign and malicious instructions, mitigations rely on probabilistic, layered controls such as classifiers, hardened system prompts, and strict input/output validation. An agent execution sandbox cannot prevent prompt injection, but it can contain the impact and keep compromised agent operations isolated, in line with the focus on constraining and monitoring agent access in NIST CAISI's hijacking evaluations.

This containment requirement scales fast in multi-agent workflows, where several coordinated agents each generate code in parallel. Intent addresses this by organizing work into isolated workspaces, each backed by its own git worktree, so every workspace is a safe place to explore a change, run agents, and review results without affecting other work. A Coordinator can fan tasks out to specialist Implementor agents without giving any single agent reach into the broader system.

See how Intent's isolated workspaces keep parallel agents contained from the first commit.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

Isolation Strategies: Containers, VMs, and Language-Level Sandboxes

Sandboxed AI agent execution requires selecting an isolation primitive that matches the threat model. The three primary categories (containers, VMs, and language-level sandboxes) differ in their security boundaries, performance characteristics, and escape complexity.

Container-Based Isolation (Docker + Security Profiles)

Standard containers share the host kernel. As gVisor's documentation states directly: "with standard containers, the workload is only one system call away from host compromise." Security profiles such as capabilities, seccomp, and MAC can reduce privilege-escalation risk in container environments, though kernel-level escape paths may still remain.

bash
# Hardened Docker configuration for AI agent workloads
docker run \
--security-opt seccomp=/etc/docker/seccomp-strict.json \
--security-opt apparmor=docker-ai-agent \
--security-opt no-new-privileges \
--cap-drop ALL \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=100m \
--memory 512m \
--cpus 1.0 \
--network none \
myimage

Standard Docker with hardened profiles is a reasonable starting point for development environments, but it is explicitly insufficient for production agent execution sandboxes handling untrusted code. Two documented runc CVEs, CVE-2019-5736 and CVE-2024-21626, exploit the container runtime itself to enable container escape. They do not universally bypass seccomp, AppArmor, and SELinux; in some configurations (for example, with SELinux enforcing), exploitation can be prevented.

gVisor: User-Space Kernel Interception

gVisor reimplements Linux syscalls in a user-space application kernel (the Sentry), written in Go. Applications running under gVisor generally do not issue system calls directly to the host kernel; their syscalls are intercepted and handled by gVisor, though gVisor itself may allow a limited set of host syscalls. Escape requires a bug in the Sentry's reimplementation AND a bug in the host kernel's handling of the Sentry's permitted syscalls, meaning attackers must defeat two independent codebases.

gVisor supports GPU workloads through nvproxy, which passes through ioctl(2) calls to NVIDIA devices with negligible overhead for GPU-bound workloads. Modal Labs runs its multi-tenant sandbox infrastructure on gVisor, and Modal reports that during a single weekend event, Lovable users created 250,000 applications on Modal Sandboxes, with over 1 million sandbox invocations and up to 20,000 concurrent at peak.
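Outside Kubernetes, gVisor's documented Docker integration registers runsc as an alternative runtime in `/etc/docker/daemon.json`. A minimal sketch, assuming runsc is installed at `/usr/local/bin/runsc`:

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

After restarting the Docker daemon, `docker run --runtime=runsc ...` launches the container under the Sentry instead of directly against the host kernel.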

yaml
# Kubernetes RuntimeClass for gVisor
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
spec:
  runtimeClassName: gvisor
  containers:
  - name: agent-executor
    image: myimage

Firecracker MicroVMs: Hardware-Level Isolation

Firecracker is a virtual machine monitor (VMM) written in Rust that uses Linux KVM and exposes only six emulated devices (virtio-net, virtio-block, virtio-vsock, virtio-balloon, a serial console, and a minimal keyboard controller). QEMU's many virtual devices, by contrast, can increase the exploitable attack surface.

The performance characteristics make Firecracker attractive for high-throughput agent execution where each task needs its own VM:

| Metric | Firecracker Specification |
| --- | --- |
| Boot time (to /sbin/init) | ≤ 125 ms (serial console disabled) |
| Guest CPU performance | > 95% of bare-metal |
| Memory overhead per microVM | ≤ 5 MiB |
| Creation rate | Up to 150 microVMs/second/host |
| Codebase size | ~50K lines Rust (vs. QEMU's ~2M lines C) |

These numbers come from the official Firecracker specification and the original AWS announcement.

E2B builds its sandbox cloud on Firecracker microVMs, with each code execution running in its own microVM. E2B reports sandbox initialization in approximately 150ms using pre-warmed snapshot pools.

Hard constraint: Firecracker does not currently provide built-in, officially supported GPU passthrough, and its initial PCIe support excludes VFIO-based device passthrough and PCI hot-plugging. Experimental PCI/VFIO work has demonstrated attaching GPUs (and thus running GPU-accelerated inference workloads) inside Firecracker microVMs, but nothing GPU-related ships upstream.

Language-Level Sandboxes (WebAssembly, V8 Isolates, Deno)

WebAssembly provides memory isolation through bounds-checked linear memory, with WASI's capability model denying filesystem, network, and OS access by default. Wasmtime is in the process of implementing control-flow-integrity (CFI) mechanisms that use hardware state to keep Wasm inside its sandbox and reduce the impact of potential Cranelift compiler bugs.

Cloudflare has noted that V8 has relatively more bugs reported against it than virtual machines, so isolate-based sandboxes require additional layers of defense in depth, as discussed in Cloudflare's writeup on dynamic worker isolation. Cloudflare is also building a container platform for running containers across its network, with support for multiple runtimes including gVisor.

Python sandboxing is notoriously difficult because Python's dynamic introspection features provide multiple paths to dangerous capabilities even when surface-level imports are blocked. The correct isolation layer for Python is OS-level: tools like nsjail constrain what the entire Python process can do at the kernel interface using namespaces, resource limits, and seccomp-bpf filters.
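The introspection problem is easy to demonstrate. The sketch below uses a well-known CPython gadget chain (nothing agent-specific is assumed) to reach `os.system` from a plain tuple without a single import statement, which is why import denylists alone are not a security boundary:

```python
def reach_os_system():
    """Reach os.system via object introspection, with no import statement."""
    # Every object links back to the loaded class graph via its type hierarchy.
    for cls in ().__class__.__base__.__subclasses__():
        # os._wrap_close is defined inside the os module, which CPython loads
        # at startup, so its function globals expose the os namespace.
        if cls.__name__ == "_wrap_close":
            return cls.__init__.__globals__["system"]
    return None

handle = reach_os_system()
print(handle)  # a direct reference to os.system, despite no `import os`
```

Because paths like this are plentiful and version-dependent, the kernel-interface controls described above (namespaces, seccomp-bpf, resource limits) are the layer that actually holds.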

| Approach | Security Boundary | GPU Support | Boot Time | Escape Complexity |
| --- | --- | --- | --- | --- |
| Docker + seccomp/AppArmor | Namespaces + cgroups | ✅ Native | Varies | Single CVE or filter bypass |
| gVisor (runsc) | User-space kernel | ✅ nvproxy | Milliseconds (no VM boot) | Sentry bug AND host kernel bug |
| Firecracker microVM | Hardware VM boundary | ❌ No GPU | ≤125ms | KVM or minimal virtio device bug |
| Kata Containers | VM per pod | Limited | ~150-300ms+ (varies by VMM) | Guest, hypervisor, or interaction vulnerability |
| Wasmtime (standalone) | Wasm linear memory + WASI | — | Sub-ms | Cranelift compiler or related runtime bug |
| V8 Isolates | Process-internal JS VM | — | ~few ms | JIT compiler OOB/UAF/type confusion |

Boot times for gVisor and Kata Containers are drawn from public benchmarks and are highly configuration-dependent; treat them as orders of magnitude rather than exact numbers.

Filesystem and Network Restrictions: What to Lock Down

Filesystem and network restrictions form the second layer of an agent execution sandbox. Even with strong isolation primitives, misconfigured access policies create paths for data exfiltration and lateral movement.

Filesystem Lockdown

A read-only root filesystem with scoped writable tmpfs prevents agents from modifying binaries, writing backdoors, or poisoning persistent state:

yaml
# Kubernetes securityContext for agent pods
securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL

Writable space should be limited to tmpfs mounts with size constraints and noexec flags. The noexec flag prevents binary execution from the mount, and nosuid prevents setuid/setgid bits from being honored:

bash
docker run \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,nodev,size=128m,mode=1777 \
--tmpfs /var/tmp:rw,noexec,nosuid,nodev,size=32m \
my-agent-image

Network Egress: Default-Deny with Allowlist

Network egress controls are among the most critical for cloud-hosted agent workloads. Agents that can reach the IMDS (169.254.169.254) can acquire host instance credentials, so the default policy should deny everything and explicitly allow only required endpoints.

yaml
# Kubernetes NetworkPolicy: deny-all egress with allowlist
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-restricted
  namespace: agent-sandbox
spec:
  podSelector:
    matchLabels:
      role: ai-agent
  policyTypes:
  - Egress
  - Ingress
  ingress: []
  egress:
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24 # LLM provider CIDR
    ports:
    - protocol: TCP
      port: 443

Block the metadata endpoint explicitly via iptables for defense in depth, following AWS IMDS guidance:

bash
# Block all forwarded container traffic to cloud metadata endpoint
sudo iptables -I FORWARD -d 169.254.169.254/32 -j DROP

Resource Limits via cgroups v2

Limits must be enforced at the cgroup level because application-level limits can be bypassed by agent-generated code. OWASP Top 10 for LLMs 2025 classifies unbounded resource consumption as LLM10:2025.

bash
# CPU: limit to 50% of one core
echo "50000 100000" | sudo tee /sys/fs/cgroup/ai-agent/cpu.max
# Memory hard limit: 512 MB
echo "536870912" | sudo tee /sys/fs/cgroup/ai-agent/memory.max
# PID limit (prevents fork bombs)
echo "256" | sudo tee /sys/fs/cgroup/ai-agent/pids.max

Determinism Guarantees: Making Agent Runs Reproducible

Deterministic agent execution requires controlling non-determinism at three layers: LLM inference, external I/O, and runtime environment. Setting temperature=0 is insufficient because IEEE 754 makes floating-point addition non-associative, so any change in operation ordering produces different outputs.
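The floating-point claim is easy to verify in any IEEE 754 environment; this is standard double-precision behavior, not specific to any inference stack:

```python
# IEEE 754 doubles: the same three addends, two different sums
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

Batched GPU reductions can reorder such additions from run to run, which is why temperature=0 alone cannot guarantee bit-identical logits.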

The VCR Pattern: Record and Replay External Calls

Docker's cagent captures the full request/response cycle during recording and serves from the cassette during replay, with zero network calls and millisecond replay latency:

bash
# Record an agent session: all LLM API calls captured
cagent --record cassette.yaml -- python my_agent.py
# Replay from cassette: millisecond execution, zero API calls
cagent --fake cassette.yaml -- python my_agent.py

Tool call IDs are normalized before cassette matching to keep replay behavior stable.
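The VCR pattern itself is framework-agnostic. The minimal sketch below (class and field names are illustrative, not cagent's implementation) keys a cassette on a normalized request, so volatile fields like tool call IDs do not break replay matching:

```python
import json

class Cassette:
    """Record LLM calls on first run; replay them deterministically after."""
    def __init__(self):
        self.tape = {}  # normalized request -> recorded response

    def _key(self, request: dict) -> str:
        # Normalize volatile fields (e.g. tool call IDs) before matching
        scrubbed = {k: v for k, v in request.items() if k != "tool_call_id"}
        return json.dumps(scrubbed, sort_keys=True)

    def fetch(self, request: dict, live_call):
        key = self._key(request)
        if key not in self.tape:       # record mode: hit the network once
            self.tape[key] = live_call(request)
        return self.tape[key]          # replay mode: zero network calls

cassette = Cassette()
resp = cassette.fetch({"prompt": "hi", "tool_call_id": "abc"}, lambda r: "hello!")
# A different tool_call_id still matches the recorded entry; the live call
# below would raise if it were ever invoked.
replayed = cassette.fetch({"prompt": "hi", "tool_call_id": "xyz"},
                          lambda r: (_ for _ in ()).throw(RuntimeError("network")))
print(replayed)  # hello!
```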

Checkpoint/Restore for State Capture

CRIU (Checkpoint/Restore In Userspace) freezes a running container and checkpoints state to disk, capturing file descriptor information, memory maps, process credentials, and memory page contents. Firecracker provides VM-level snapshots that capture the entire guest OS state.

E2B uses pre-warmed microVM pools and VM snapshots to achieve roughly 150ms restoration/provisioning times: boot microVMs to a fully initialized state, take a full snapshot, then restore incoming requests from that snapshot.

Runtime Environment Pinning

dockerfile
# Update tag and digest to the latest supported release before use.
# Alpine 3.21 remains in security support, but 3.23 is the current stable branch as of May 2026.
FROM alpine:3.21.3@sha256:a8560b36e8b8210634f77d9f7f9efd7ffa463e380b75e2e74aff4511df3ef88c
# Pin sources of runtime nondeterminism: hash seed, timezone, locale.
# (Dockerfile comments must start the line; trailing "#" text would be parsed as part of the instruction.)
ENV PYTHONHASHSEED=0
ENV TZ=UTC
ENV LC_ALL=C
python
# Seed all PRNG sources explicitly
import os
import random

import numpy as np

random.seed(42)
np.random.seed(42)
# Directory listings: filesystem order is NOT guaranteed
files = sorted(os.listdir(directory))

Temporal provides deterministic execution recovery through event history replay. After a crash, the workflow replays deterministically up to the crash point using the recorded event history, then resumes live execution.
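The replay idea can be sketched in a few lines: workflow state is a deterministic function of its event history, so recovery means re-deriving state from recorded events before accepting new ones. This is an illustrative sketch of the technique, not Temporal's API:

```python
def run_workflow(history, live_events):
    """Replay recorded events deterministically, then continue live."""
    state = 0
    log = list(history)        # events recorded before the crash
    for event in log:          # replay phase: pure state reconstruction,
        state += event         # no external side effects re-executed
    for event in live_events:  # live phase: new events are appended to the log
        log.append(event)
        state += event
    return state, log

# A crash after processing [1, 2] recovers to the same state before resuming:
state, log = run_workflow(history=[1, 2], live_events=[3])
print(state)  # 6
```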

Reproducibility gets harder when several agents collaborate on the same change. In Intent, work starts from a spec that evolves as agents make progress: agents handle execution while the developer focuses on what should be built, and as code changes the agents read from and update the spec so every human and agent stays aligned. Reruns and recoveries replay against the same plan rather than a stale prompt.

Explore how Intent's living specs keep parallel agents reproducible across coordinated runs.



Escape Prevention: Common Sandbox Breakout Patterns and Mitigations

Agent sandbox security depends on understanding documented escape vectors and applying layered mitigations. Frontier models' success on apprentice-level cybersecurity tasks rose from under 10% in late 2023/early 2024 to roughly 50% in 2025, with the first expert-level task completed by a model during 2025. Sandbox designs calibrated to 2023 model capability may be insufficient for 2026+ models.

Container Runtime Escapes

Container runtime CVEs target the runtime itself rather than the syscalls that seccomp, AppArmor, or SELinux normally filter. The two most consequential examples illustrate why runtime patching matters as much as kernel hardening.

CVE-2024-21626 (Leaky Vessels): In runc ≤1.1.11, a crafted Dockerfile sets WORKDIR /proc/self/fd/[ID], where the file descriptor refers to a directory on the host filesystem. Because runc mishandles the container's working directory, it can resolve to the host filesystem, after which the attacker can traverse the host directory tree. CVSS: 8.6. Fix: runc 1.1.12.

CVE-2019-5736: runc ≤1.0-rc6 allows attackers to overwrite the host runc binary via /proc/self/exe manipulation, obtaining host root privileges. SELinux enforcement blocks this specific CVE through AVC denial, preventing container_t from writing to container_runtime_exec_t.

Both CVEs exploit the runc runtime, so seccomp, AppArmor, and SELinux profiles offer only partial protection against runtime-level vulnerabilities; AppArmor and SELinux may mitigate some attack vectors, but runtime patching remains the primary defense.

AI-Agent-Specific Escape: Configuration-Based Sandbox Escape (CBSE)

Security researchers have documented configuration-persistence and poisoning risks in some AI coding and agent sandboxes, where attackers alter trusted agent configuration files to change future behavior. The poisoned configuration persists across sessions, and it can be delivered via prompt injection through normal agent operations such as reading malicious workspace content.


A key security concern is whether agents can modify local workspace settings or other writable configuration files to extend their reach. The mitigation is treating sandbox config as immutable code: no agent should have write access to its own approval policy or sandbox mode configuration.
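One way to enforce that immutability in practice is to treat the policy file as content-addressed: pin its hash in the control plane at deploy time and refuse to launch any sandbox whose on-disk config no longer matches. A minimal sketch (file paths and function names are illustrative):

```python
import hashlib

def policy_digest(path: str) -> str:
    """SHA-256 of the sandbox policy file's exact bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_policy(path: str, pinned_digest: str) -> None:
    """Refuse to launch if the policy file was modified after pinning.

    The pinned digest lives in the control plane, never inside the sandbox,
    so an agent with workspace write access cannot update it.
    """
    actual = policy_digest(path)
    if actual != pinned_digest:
        raise RuntimeError(f"sandbox policy tampered: {actual} != {pinned_digest}")
```

Called before every sandbox launch, this turns a silent configuration-poisoning attack into a hard launch failure with an audit trail.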

Mitigation Matrix

The mitigations vary by escape category, but the layered pattern is consistent: a primary control plus at least one defense-in-depth measure that assumes the primary control fails.

| Escape Category | Primary Mitigation | Defense-in-Depth |
| --- | --- | --- |
| Container runtime (runc CVEs) | Upgrade runc ≥1.1.12; enforce SELinux/AppArmor | Read-only rootfs; no --privileged; restrict docker exec |
| Kernel exploits from containers | Seccomp allowlist; disable unprivileged user namespaces | gVisor/Kata; patched kernel |
| VM escape (QEMU/KVM) | Minimize emulated device surface; use Firecracker | Seccomp-filtered QEMU process; disable 3D GPU, floppy, legacy NICs |
| Filesystem/symlink/TOCTOU | openat2() with RESOLVE_BENEATH; getcwd() validation | Read-only mounts; noexec; filtered /proc |
| Resource exhaustion | cgroup v2 hard limits; hypervisor-level quotas | RLIMIT_NPROC; network egress rate limits |
| AI agent prompt injection | Pre/post-execution semantic gates; tool least-privilege | Human approval for high-impact actions |
| AI agent CBSE | Immutable sandbox config; no agent self-modification | Audit all config write paths |

Research has found that several successful attacks were achieved without sandbox escape, by exploiting the agent's planning logic to produce unsafe code within the sandbox's constraints. A layered defense for AI agents can include input sanitization, static code validation, pre-execution checks, isolated execution, runtime monitoring, and post-execution review.
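A pre-execution static gate is one inexpensive layer in that stack. The sketch below flags obviously dangerous constructs in agent-generated source with Python's ast module; as the research above suggests, this is probabilistic screening that a motivated attacker can evade, so it must sit inside a real isolation primitive rather than replace one:

```python
import ast

DENYLIST = {"eval", "exec", "compile", "__import__"}

def flag_suspicious(source: str) -> list[str]:
    """Return names of denylisted constructs found in agent-generated source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls to dynamic-execution builtins
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DENYLIST:
                findings.append(node.func.id)
        # Route every import statement to review rather than guessing intent
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            findings.append("import")
    return findings

print(flag_suspicious("exec(payload)"))     # ['exec']
print(flag_suspicious("import os\nx = 1"))  # ['import']
```

An empty result means "nothing obvious", not "safe": the introspection escapes discussed earlier never trigger any of these patterns.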

Production Sandbox Checklist: Minimum Requirements for Safe Agent Execution

The minimum acceptable isolation for a production agent execution sandbox is typically a Firecracker or Kata microVM; gVisor serves in some environments as a lighter-weight alternative, depending on the threat model. Standard Docker/runc shares the host kernel and is explicitly insufficient for untrusted agent code execution. This finding is consistent with public architecture documentation for production platforms including E2B, Modal, and AWS Lambda, as well as security incident documentation. Augment Cosmos, currently in research preview for MAX users, takes a similar agent-runtime approach at the platform layer, exposing isolation and scheduling as primitives that agents can plug into across laptops, Dev-VMs, and cloud environments.

Isolation Boundary

The isolation boundary is the highest-leverage decision in the sandbox stack because every other control assumes it holds. Get this wrong and the rest of the layers reduce to defense-in-depth against an attacker who already escaped.

  • Select isolation primitive: Firecracker microVM (max isolation, no GPU) OR gVisor (strong isolation with GPU) OR Kata Containers (OCI-compatible VM)
  • Provision one sandbox per execution session rather than sharing across users or tenants
  • Drop all Linux capabilities: capabilities: drop: ALL
  • Apply a seccomp profile aligned with NIST SP 800-190 (the 2017 container security guide remains the primary NIST reference, though it predates the agentic-AI threat model and should be supplemented with current AI-specific guidance)
  • Keep the control plane (policy enforcement) separate from the execution plane (sandbox pool)

Filesystem

Filesystem rules close the path most agent attacks reach for first: writing executables, poisoning configuration, or persisting state across sessions.

  • Ephemeral root filesystem destroyed on sandbox termination
  • No host filesystem bind mounts into the sandbox
  • Writable layers scoped to /tmp with noexec, nosuid, and size limits
  • Validate agent-specified file paths against allowed directories

Network

Network policy keeps a compromised agent from reaching credentials, internal services, or unapproved external endpoints. Default-deny is the only sustainable starting point.

  • Default-deny egress; allowlist-only for required endpoints
  • Block 169.254.169.254 (cloud metadata) via network policy
  • Block RFC 1918 internal ranges to prevent lateral movement
  • Validate resolved IPs at connection time in addition to DNS resolution (prevents DNS rebinding)
  • Log all outbound connections with sandbox ID, destination, port, bytes transferred
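The metadata and RFC 1918 checks above can be enforced at connection time with the standard library's ipaddress module. A sketch of the validation an egress proxy might run against each resolved IP (function name is illustrative):

```python
import ipaddress

def egress_allowed(resolved_ip: str) -> bool:
    """Reject link-local (incl. 169.254.169.254), RFC 1918, and loopback IPs.

    Run against the IP actually used for the connection, not the DNS answer,
    so a rebinding attack cannot swap in an internal address after the check.
    """
    ip = ipaddress.ip_address(resolved_ip)
    return not (ip.is_private or ip.is_link_local or ip.is_loopback
                or ip.is_reserved or ip.is_multicast)

print(egress_allowed("169.254.169.254"))  # False: cloud metadata endpoint
print(egress_allowed("10.0.0.5"))         # False: RFC 1918 internal range
print(egress_allowed("8.8.8.8"))          # True: still subject to the allowlist
```

This check complements, rather than replaces, the NetworkPolicy and iptables rules shown earlier: it catches destinations those layers would miss when DNS answers change between resolution and connection.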

Resource Limits

Resource limits stop fork bombs, runaway loops, and memory exhaustion from spreading beyond a single sandbox.

  • CPU: cpu.max via cgroup v2
  • Memory: hard memory.max limit; disable swap or limit memory.memsw
  • PIDs: pids.max via cgroup to prevent fork bombs
  • Wall-clock timeout enforced at the orchestration layer above the sandbox

Secrets

Secrets handling determines whether a compromised sandbox leaks anything beyond its own ephemeral state.

  • No host credentials in sandbox filesystem or environment variables
  • No Docker socket mount
  • No Kubernetes service account tokens unless explicitly required with minimum permissions
  • Short-lived, scoped tokens via credential proxy; rotate after sandbox termination

Monitoring

Monitoring closes the loop: even with strong isolation, anomalous behavior should produce signals before damage spreads.

  • Alert on syscalls outside expected profile
  • Alert on outbound connections to non-allowlisted destinations
  • Alert on writes to unexpected paths or mount attempts
  • Alert on sandboxes approaching time limits
  • Immutable audit log written to external sink before sandbox terminates

Replit's governing principle applies to every production agent sandbox: "Every layer of the infrastructure where customer code runs is designed with defense in depth. No single control is the last line of defense; every layer assumes the one above it might fail."

Choose the Isolation Boundary Before Your Next Agent Deployment

The highest-priority decision in agent infrastructure is the isolation boundary. Standard Docker containers leave teams exposed to the shared-kernel risk described throughout this guide, while microVMs and userspace kernels add an independent control layer. Start with Firecracker for CPU-only workloads requiring maximum isolation, or gVisor for workloads that need GPU access. Then apply filesystem restrictions, network egress controls, and resource limits using the configurations above. Treat sandbox configuration as immutable code, and never let an agent modify its own approval policy.

The orchestration layer matters just as much. Intent runs each agent in an isolated git-worktree workspace, coordinates parallel agents through a Coordinator, Implementor, and Verifier model, and keeps a living spec as the canonical record of what every sandboxed step is supposed to do. That combination gives you per-agent isolation, pre and post-execution checks, and a stable contract the sandbox can enforce against. Teams looking to extend that pattern across an entire SDLC can also evaluate Augment Cosmos, the agentic-development platform now in research preview, which carries the same runtime, context, and human-in-the-loop primitives across laptops, Dev-VMs, and cloud.

See how Intent's coordinated multi-agent workspaces operate above the sandbox boundary you choose.



Written by

Molisha Shah


Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.

