An agent execution sandbox is a production isolation boundary for AI-generated code because it restricts filesystem access, network egress, and host interaction independently of the agent's decisions.
TL;DR
AI agents generate and execute code at runtime from inputs that may be attacker-controlled. Standard containers share the host kernel, so a single runtime CVE can compromise the host. Production-safe agent execution requires hardware-level isolation (microVMs or userspace kernels), default-deny filesystem and network policies, and layered escape prevention. This guide covers each layer with working configurations.
Why Agents Need Sandboxes: The Risk Model for Autonomous Code Execution
Traditional application code has a fixed, auditable instruction set determined at compile or deploy time. Autonomous AI agents generate and execute novel code at runtime from natural language inputs, creating a risk surface that conventional security controls were not designed to address.
Three AI-specific factors make agent execution sandbox design structurally different from traditional application sandboxing:
- Runtime code generation from untrusted inputs: LLM-generated code is often executed as if it were trusted, even though each agent works from its own prompt and partial context, so instructions from attacker-controlled sources can flow directly into execution
- Unpredictable runtime decisions: Autonomous agents make decisions about API calls and resource usage that static policies cannot anticipate
- Stateful memory systems: Coding agents maintain persistent memory vulnerable to manipulation across sessions
Palo Alto Networks Unit 42 research, as documented in Unit 42 reporting and corroborated by subsequent comparative studies (arXiv:2512.14860), demonstrated that ChatGPT-4o deployed as an autonomous agent successfully executed SQL injection, SSRF, and unauthorized data exfiltration, attacks that its chat-only counterpart consistently refused. Safety mechanisms designed for conversational AI do not translate cleanly to agentic contexts.
OWASP AIVSS assigns a CVSS v4.0 Base Score of 9.4 to the interpreter tool attack scenario where an LLM-based agent is manipulated into executing attacker-provided arbitrary code.
| Incident | Attack Vector | Impact |
|---|---|---|
| CVE-2025-58372 (Roo Code) | Prompt injection → workspace file write → arbitrary code execution (RCE) | CNA-assigned CVSS of 8.1; third-party aggregators report a 9.8 critical score (NVD assessment pending at time of writing) |
| CVE-2025-53773 (GitHub Copilot) | Command injection (officially); third-party reports describe malicious instructions in repository files | Local code execution |
| CVE-2025-59528 (Flowise) | Unsafe config handling → JavaScript injection | ~15K exposed instances |
| Replit production DB deletion | Coding agent with live DB access confused by empty inputs | Production database deleted |
| Postmark MCP BCC injection | Production MCP server injected BCC field into email tool calls | All outgoing email silently exfiltrated |
Prompt injection currently has no foolproof, deterministic prevention. Mitigations rely on layered, probabilistic technical controls such as classifiers, hardened system prompts, and strict input/output validation, because natural language input can overlap with both benign and malicious instructions. An agent execution sandbox cannot prevent prompt injection, but it can contain the impact and keep compromised agent operations isolated, which aligns with NIST CAISI's focus on constraining and monitoring agent access in its hijacking evaluations.
This containment requirement scales fast in multi-agent workflows, where several coordinated agents each generate code in parallel. Intent addresses this by organizing work into isolated workspaces, each backed by its own git worktree, so every workspace is a safe place to explore a change, run agents, and review results without affecting other work. A Coordinator can fan tasks out to specialist Implementor agents without giving any single agent reach into the broader system.
See how Intent's isolated workspaces keep parallel agents contained from the first commit.
Free tier available · VS Code extension · Takes 2 minutes
Isolation Strategies: Containers, VMs, and Language-Level Sandboxes
Sandboxed AI agent execution requires selecting an isolation primitive that matches the threat model. The three primary categories (containers, VMs, and language-level sandboxes) differ in their security boundaries, performance characteristics, and escape complexity.
Container-Based Isolation (Docker + Security Profiles)
Standard containers share the host kernel. As gVisor's documentation states directly: "with standard containers, the workload is only one system call away from host compromise." Security profiles such as capabilities, seccomp, and MAC can reduce privilege-escalation risk in container environments, though kernel-level escape paths may still remain.
Standard Docker with hardened profiles is a reasonable starting point for development environments. For production agent execution sandbox deployments handling untrusted code, this is explicitly insufficient. Two documented runc CVEs, CVE-2019-5736 and CVE-2024-21626, exploit the container runtime itself to enable container escape, though they are not described as universally bypassing seccomp, AppArmor, and SELinux; in some configurations (for example, with SELinux enforcing) exploitation can be prevented.
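As a sketch of that development-tier baseline, a hardened invocation might look like the following. The image name, seccomp profile filename, and exact resource values are illustrative assumptions, not a vetted production profile:

```shell
# Development-tier hardening. This still shares the host kernel, so it is
# a baseline for low-trust development workloads, not a production
# isolation boundary for untrusted agent code.
docker run --rm \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=seccomp-agent.json \
  --pids-limit 128 \
  --memory 512m --cpus 0.5 \
  --network none \
  agent-runner:latest
```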
gVisor: User-Space Kernel Interception
gVisor reimplements Linux syscalls in a user-space application kernel (the Sentry), written in Go. Applications running under gVisor generally do not issue system calls directly to the host kernel; their syscalls are intercepted and handled by gVisor, though gVisor itself may allow a limited set of host syscalls. Escape requires a bug in the Sentry's reimplementation AND a bug in the host kernel's handling of the Sentry's permitted syscalls, meaning attackers must defeat two independent codebases.
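In practice, moving an existing container workload onto gVisor is a one-flag change once `runsc` is registered with the Docker daemon (a minimal sketch; assumes `runsc` is installed and configured in `/etc/docker/daemon.json`):

```shell
# Run the same image under gVisor's user-space kernel by selecting the
# runsc runtime. Syscalls from the container are now handled by the
# Sentry rather than passed directly to the host kernel.
docker run --rm --runtime=runsc alpine uname -a
# uname reports the version string of gVisor's application kernel,
# not the host's kernel.
```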
gVisor supports GPU workloads through nvproxy, which passes through ioctl(2) calls to NVIDIA devices with negligible overhead for GPU-bound workloads. Modal Labs runs its multi-tenant sandbox infrastructure on gVisor, and Modal reports that during a single weekend event, Lovable users created 250,000 applications on Modal Sandboxes, with over 1 million sandbox invocations and up to 20,000 concurrent at peak.
Firecracker MicroVMs: Hardware-Level Isolation
Firecracker is a VMM that uses Linux KVM, written in Rust, exposing 6 emulated devices (virtio-net, virtio-block, virtio-vsock, virtio-balloon, serial console, keyboard controller). QEMU's many virtual devices, by contrast, can increase the exploitable attack surface.
The performance characteristics make Firecracker attractive for high-throughput agent execution where each task needs its own VM:
| Metric | Firecracker Specification |
|---|---|
| Boot time (to /sbin/init) | ≤ 125 ms (serial console disabled) |
| Guest CPU performance | > 95% of bare-metal |
| Memory overhead per microVM | ≤ 5 MiB |
| Creation rate | Up to 150 microVMs/second/host |
| Codebase size | ~50K lines Rust (vs. QEMU's ~2M lines C) |
These numbers come from the official Firecracker specification and the original AWS announcement.
E2B builds its sandbox cloud on Firecracker microVMs, with each code execution running in its own microVM. E2B reports sandbox initialization in approximately 150ms using pre-warmed snapshot pools.
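For a concrete sense of that minimal API surface, a microVM can be configured and booted with a handful of PUT requests to Firecracker's Unix-socket API. Endpoint names follow the Firecracker API specification; the socket path, kernel image, and rootfs filenames below are placeholders:

```shell
# Configure and boot a Firecracker microVM over its HTTP-over-Unix-socket API.
SOCK=/tmp/firecracker.sock

curl --unix-socket $SOCK -X PUT "http://localhost/machine-config" \
  -d '{"vcpu_count": 1, "mem_size_mib": 128}'

curl --unix-socket $SOCK -X PUT "http://localhost/boot-source" \
  -d '{"kernel_image_path": "vmlinux", "boot_args": "console=off reboot=k panic=1"}'

curl --unix-socket $SOCK -X PUT "http://localhost/drives/rootfs" \
  -d '{"drive_id": "rootfs", "path_on_host": "rootfs.ext4",
       "is_root_device": true, "is_read_only": true}'

curl --unix-socket $SOCK -X PUT "http://localhost/actions" \
  -d '{"action_type": "InstanceStart"}'
```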
Hard constraint: Firecracker does not currently provide built-in, officially supported GPU passthrough, and its initial PCIe support excludes VFIO-based device passthrough and PCI hot-plugging. Experimental PCI/VFIO work has demonstrated attaching GPUs (and thus running GPU-accelerated inference workloads) inside Firecracker microVMs, but nothing has landed upstream as a supported feature.
Language-Level Sandboxes (WebAssembly, V8 Isolates, Deno)
WebAssembly provides memory isolation through bounds-checked linear memory, with WASI's capability model denying filesystem, network, and OS access by default. Wasmtime is in the process of implementing control-flow-integrity (CFI) mechanisms that use hardware state to keep Wasm inside its sandbox and reduce the impact of potential Cranelift compiler bugs.
Cloudflare has noted that V8 has relatively more bugs reported against it than virtual machines, so isolate-based sandboxes require additional layers of defense in depth, as discussed in Cloudflare's writeup on dynamic worker isolation. Cloudflare is also building a container platform for running containers across its network, with support for multiple runtimes including gVisor.
Python sandboxing is notoriously difficult because Python's dynamic introspection features provide multiple paths to dangerous capabilities even when surface-level imports are blocked. The correct isolation layer for Python is OS-level: tools like nsjail constrain what the entire Python process can do at the kernel interface using namespaces, resource limits, and seccomp-bpf filters.
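A short demonstration of why import blocking alone fails: starting from a plain `object`, Python's introspection walks the interpreter's loaded classes to reach `os.system` without a single `import` statement. (`os._wrap_close` is an implementation detail of CPython's `os` module used here purely as a well-known stepping stone.)

```python
# Even with __import__ and open stripped from builtins, the object graph
# still exposes dangerous capabilities through introspection alone.

def reach_os_system():
    # object.__subclasses__() enumerates every class loaded in the
    # interpreter. os._wrap_close is defined at module level in os.py,
    # so its __init__'s __globals__ is the os module namespace.
    for cls in object.__subclasses__():
        if cls.__name__ == "_wrap_close":
            return cls.__init__.__globals__.get("system")
    return None

print(callable(reach_os_system()))  # True: os.system reached with no import
```

Because escapes like this exist at the language layer, the kernel interface (namespaces, seccomp-bpf) is the boundary that actually holds.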
| Approach | Security Boundary | GPU Support | Boot Time | Escape Complexity |
|---|---|---|---|---|
| Docker + seccomp/AppArmor | Namespaces + cgroups | ✅ Native | Varies | Single CVE or filter bypass |
| gVisor (runsc) | User-space kernel | ✅ nvproxy | Milliseconds (no VM boot) | Sentry bug AND host kernel bug |
| Firecracker microVM | Hardware VM boundary | ❌ No GPU | ≤125ms | KVM or minimal virtio device bug |
| Kata Containers | VM per pod | Limited | ~150-300ms+ (varies by VMM) | Guest, hypervisor, or interaction vulnerability |
| Wasmtime (standalone) | Wasm linear memory + WASI | ❌ | Sub-ms | Cranelift compiler or related runtime bug |
| V8 Isolates | Process-internal JS VM | ❌ | ~few ms | JIT compiler OOB/UAF/type confusion |
Boot times for gVisor and Kata Containers are drawn from public benchmarks and are highly configuration-dependent; treat them as orders of magnitude rather than exact numbers.
Filesystem and Network Restrictions: What to Lock Down
Filesystem and network restrictions form the second layer of an agent execution sandbox. Even with strong isolation primitives, misconfigured access policies create paths for data exfiltration and lateral movement.
Filesystem Lockdown
A read-only root filesystem with scoped writable tmpfs prevents agents from modifying binaries, writing backdoors, or poisoning persistent state:
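In a Kubernetes pod spec, this corresponds to a small `securityContext` fragment (field names per the core/v1 API; shown here in isolation):

```yaml
# Container-level securityContext: agent code cannot modify the image's
# binaries or persist changes to the root filesystem.
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
```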
Writable space should be limited to tmpfs mounts with size constraints and noexec flags. The noexec flag prevents binary execution from the mount, and nosuid prevents setuid/setgid bits from being honored:
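A minimal sketch with Docker flags (image name illustrative; the 64 MiB cap is an arbitrary example):

```shell
# Read-only root plus a single size-capped, non-executable scratch mount.
# Agent code can write temp files under /tmp but cannot drop and execute
# binaries there, and cannot write anywhere else.
docker run --rm \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  agent-runner:latest
```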
Network Egress: Default-Deny with Allowlist
Network egress controls are among the most critical for cloud-hosted agent workloads. Agents that can reach the IMDS (169.254.169.254) can acquire host instance credentials, so the default policy should deny everything and explicitly allow only required endpoints.
Block the metadata endpoint explicitly via iptables for defense in depth, following AWS IMDS guidance:
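A sketch of the rule, applied inside the sandbox's network namespace (or on the host for sandbox-originated traffic); the IPv6 address is AWS's IMDS endpoint:

```shell
# Drop all traffic to the cloud metadata service, regardless of what the
# higher-level network policy allows. Requires root in the target netns.
iptables -A OUTPUT -d 169.254.169.254/32 -j DROP

# On AWS with IPv6 enabled, also block the IPv6 metadata endpoint.
ip6tables -A OUTPUT -d fd00:ec2::254/128 -j DROP
```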
Resource Limits via cgroups v2
Limits must be enforced at the cgroup level because application-level limits can be bypassed by agent-generated code. OWASP Top 10 for LLMs 2025 classifies unbounded resource consumption as LLM10:2025.
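As a sketch against the raw cgroup v2 filesystem (assumes cgroup2 is mounted at `/sys/fs/cgroup` and the caller has root; values are illustrative):

```shell
# Create a cgroup for one sandbox and enforce hard limits the agent's
# own code cannot raise or bypass.
mkdir /sys/fs/cgroup/agent-sandbox

echo "50000 100000" > /sys/fs/cgroup/agent-sandbox/cpu.max       # 0.5 CPU (quota/period in µs)
echo "512M"         > /sys/fs/cgroup/agent-sandbox/memory.max    # hard memory ceiling
echo "0"            > /sys/fs/cgroup/agent-sandbox/memory.swap.max  # no swap escape hatch
echo "128"          > /sys/fs/cgroup/agent-sandbox/pids.max      # fork-bomb cap

# Move the sandboxed process into the cgroup.
echo $SANDBOX_PID   > /sys/fs/cgroup/agent-sandbox/cgroup.procs
```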
Determinism Guarantees: Making Agent Runs Reproducible
Deterministic agent execution requires controlling non-determinism at three layers: LLM inference, external I/O, and runtime environment. Setting temperature=0 is insufficient because IEEE 754 makes floating-point addition non-associative, so any change in operation ordering produces different outputs.
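The non-associativity claim is easy to verify directly: reordering a three-term sum changes the result, which is exactly what happens when a different kernel schedule reorders a reduction during inference.

```python
# IEEE 754 addition is not associative, so a change in reduction order
# (a different GPU kernel schedule, a different batch size) can change
# logits even at temperature=0.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # the large terms cancel exactly, then 1.0 survives
right = a + (b + c)  # 1.0 is absorbed into -1e16 by rounding and lost

print(left, right)   # 1.0 0.0
```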
The VCR Pattern: Record and Replay External Calls
Docker's cagent captures the full request/response cycle during recording and serves from the cassette during replay, with zero network calls and millisecond replay latency:
Tool call IDs are normalized before cassette matching to keep replay behavior stable.
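The pattern can be sketched in a few lines (this is a generic illustration of the VCR technique, not cagent's actual API): hash a normalized request to index the cassette, recording on first run and replaying afterward.

```python
import hashlib
import json

class Cassette:
    """Record/replay store for tool calls, keyed by normalized request."""

    def __init__(self):
        self.tapes = {}

    @staticmethod
    def _key(request: dict) -> str:
        # Strip volatile fields (here, a hypothetical "tool_call_id") so
        # the same logical call matches the cassette across runs.
        stable = {k: v for k, v in request.items() if k != "tool_call_id"}
        return hashlib.sha256(
            json.dumps(stable, sort_keys=True).encode()
        ).hexdigest()

    def record(self, request: dict, response: dict) -> None:
        self.tapes[self._key(request)] = response

    def replay(self, request: dict) -> dict:
        return self.tapes[self._key(request)]

cassette = Cassette()
cassette.record({"tool": "search", "args": "docs", "tool_call_id": "a1"},
                {"result": "ok"})
# Same logical call with a fresh tool_call_id still hits the cassette.
print(cassette.replay({"tool": "search", "args": "docs", "tool_call_id": "b2"}))
# → {'result': 'ok'}
```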
Checkpoint/Restore for State Capture
CRIU (Checkpoint/Restore In Userspace) freezes a running container and checkpoints state to disk, capturing file descriptor information, memory maps, process credentials, and memory page contents. Firecracker provides VM-level snapshots that capture the entire guest OS state.
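Docker exposes CRIU through its experimental checkpoint commands (requires CRIU installed and the daemon started with experimental features enabled; container and checkpoint names below are illustrative):

```shell
# Checkpoint a running container's full process state to disk, then
# resume it later from exactly that point.
docker run -d --name agent-task busybox \
  sh -c 'i=0; while true; do i=$((i+1)); sleep 1; done'

docker checkpoint create agent-task snap0     # freeze + dump state
docker start --checkpoint snap0 agent-task    # resume from the snapshot
```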
E2B uses pre-warmed microVM pools and VM snapshots to achieve roughly 150ms restoration/provisioning times: boot microVMs to a fully initialized state, take a full snapshot, then restore incoming requests from that snapshot.
Runtime Environment Pinning
Temporal provides deterministic execution recovery through event history replay. After a crash, the workflow replays deterministically up to the crash point using the recorded event history, then resumes live execution.
Reproducibility gets harder when several agents collaborate on the same change. In Intent, work starts from a spec that evolves as agents make progress, agents handle execution while the developer focuses on what should be built, and as code changes the agents read from and update the spec so every human and agent stays aligned. Reruns and recoveries replay against the same plan rather than a stale prompt.
Explore how Intent's living specs keep parallel agents reproducible across coordinated runs.
Free tier available · VS Code extension · Takes 2 minutes
Escape Prevention: Common Sandbox Breakout Patterns and Mitigations
Agent sandbox security depends on understanding documented escape vectors and applying layered mitigations. Frontier models' success on apprentice-level cybersecurity tasks rose from under 10% in late 2023/early 2024 to roughly 50% in 2025, with the first expert-level task completed by a model during 2025. Sandbox designs calibrated to 2023 model capability may be insufficient for 2026+ models.
Container Runtime Escapes
Container runtime CVEs target the runtime itself rather than the syscalls that seccomp, AppArmor, or SELinux normally filter. The two most consequential examples illustrate why runtime patching matters as much as kernel hardening.
CVE-2024-21626 (Leaky Vessels): In runc ≤1.1.11, a crafted Dockerfile sets WORKDIR /proc/self/fd/[ID] where the ID points to the host filesystem. Improper handling of the container's working directory can allow it to resolve to the host filesystem, after which the attacker can traverse the host directory tree. CVSS: 8.6. Fix: runc 1.1.12.
CVE-2019-5736: runc ≤1.0-rc6 allows attackers to overwrite the host runc binary via /proc/self/exe manipulation, obtaining host root privileges. SELinux enforcement blocks this specific CVE through AVC denial, preventing container_t from writing to container_runtime_exec_t.
Both CVEs exploit the runc runtime, so seccomp, AppArmor, and SELinux profiles offer only partial protection against runtime-level vulnerabilities; AppArmor and SELinux may mitigate some attack vectors, but runtime patching remains the primary defense.
AI-Agent-Specific Escape: Configuration-Based Sandbox Escape (CBSE)
Security researchers have documented configuration-persistence and poisoning risks in some AI coding and agent sandboxes, where attackers may alter trusted agent files to affect future behavior. The configuration persists across future sessions, and the attack can be delivered via prompt injection through normal agent operations like reading malicious workspace content.
A key security concern is whether agents can modify local workspace settings or other writable configuration files to extend their reach. The mitigation is treating sandbox config as immutable code: no agent should have write access to its own approval policy or sandbox mode configuration.
Mitigation Matrix
The mitigations vary by escape category, but the layered pattern is consistent: a primary control plus at least one defense-in-depth measure that assumes the primary control fails.
| Escape Category | Primary Mitigation | Defense-in-Depth |
|---|---|---|
| Container runtime (runc CVEs) | Upgrade runc ≥1.1.12; enforce SELinux/AppArmor | Read-only rootfs; no --privileged; restrict docker exec |
| Kernel exploits from containers | Seccomp allowlist; disable unprivileged user namespaces | gVisor/Kata; patched kernel |
| VM escape (QEMU/KVM) | Minimize emulated device surface; use Firecracker | Seccomp-filtered QEMU process; disable 3D GPU, floppy, legacy NICs |
| Filesystem/symlink/TOCTOU | openat2() with RESOLVE_BENEATH; getcwd() validation | Read-only mounts; noexec; filtered /proc |
| Resource exhaustion | cgroup v2 hard limits; hypervisor-level quotas | RLIMIT_NPROC; network egress rate limits |
| AI agent prompt injection | Pre/post-execution semantic gates; tool least-privilege | Human approval for high-impact actions |
| AI agent CBSE | Immutable sandbox config; no agent self-modification | Audit all config write paths |
Research has found that several successful attacks were achieved without sandbox escape, by exploiting the agent's planning logic to produce unsafe code within the sandbox's constraints. A layered defense for AI agents can include input sanitization, static code validation, pre-execution checks, isolated execution, runtime monitoring, and post-execution review.
Production Sandbox Checklist: Minimum Requirements for Safe Agent Execution
The minimum acceptable isolation for a production agent execution sandbox is typically a Firecracker/Kata microVM, with gVisor used in some environments as a fallback or lighter-weight option depending on the threat model. Standard Docker/runc shares the host kernel and is explicitly insufficient for untrusted agent code execution. This finding is consistent with public architecture documentation for production platforms including E2B, Modal, and AWS Lambda, as well as security incident documentation. Augment Cosmos, currently in research preview for MAX users, takes a similar agent-runtime approach at the platform layer, exposing isolation and scheduling as primitives that agents can plug into across laptops, Dev-VMs, and cloud environments.
Isolation Boundary
The isolation boundary is the highest-leverage decision in the sandbox stack because every other control assumes it holds. Get this wrong and the rest of the layers reduce to defense-in-depth against an attacker who already escaped.
- Select isolation primitive: Firecracker microVM (max isolation, no GPU) OR gVisor (strong isolation with GPU) OR Kata Containers (OCI-compatible VM)
- Provision one sandbox per execution session rather than sharing across users or tenants
- Drop all Linux capabilities: `capabilities: drop: ALL`
- Apply a seccomp profile aligned with NIST SP 800-190 (the 2017 container security guide remains the primary NIST reference, though it predates the agentic-AI threat model and should be supplemented with current AI-specific guidance)
- Keep the control plane (policy enforcement) separate from the execution plane (sandbox pool)
Filesystem
Filesystem rules close the path most agent attacks reach for first: writing executables, poisoning configuration, or persisting state across sessions.
- Ephemeral root filesystem destroyed on sandbox termination
- No host filesystem bind mounts into the sandbox
- Writable layers scoped to `/tmp` with `noexec`, `nosuid`, and size limits
- Validate agent-specified file paths against allowed directories
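Path validation can be sketched as follows (the `/workspace` root is an illustrative assumption; note that `resolve()`-based checks are still subject to TOCTOU races, so pair this with read-only mounts or `openat2()` with `RESOLVE_BENEATH` where available):

```python
# Confine agent-specified paths to an allowed root. resolve() follows
# symlinks and normalizes "..", so both traversal and symlink escapes
# are rejected at this layer.
from pathlib import Path

ALLOWED_ROOT = Path("/workspace").resolve()

def resolve_inside_root(user_path: str) -> Path:
    candidate = (ALLOWED_ROOT / user_path).resolve()
    if not candidate.is_relative_to(ALLOWED_ROOT):
        raise PermissionError(f"{user_path!r} escapes the sandbox root")
    return candidate

print(resolve_inside_root("src/main.py"))   # /workspace/src/main.py
# resolve_inside_root("../etc/passwd")      # raises PermissionError
```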
Network
Network policy keeps a compromised agent from reaching credentials, internal services, or unapproved external endpoints. Default-deny is the only sustainable starting point.
- Default-deny egress; allowlist-only for required endpoints
- Block `169.254.169.254` (cloud metadata) via network policy
- Block RFC 1918 internal ranges to prevent lateral movement
- Validate resolved IPs at connection time in addition to DNS resolution (prevents DNS rebinding)
- Log all outbound connections with sandbox ID, destination, port, bytes transferred
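The connection-time validation item can be sketched as follows: resolve once, check the resolved IP against the blocked ranges, then connect to that IP directly so a rebinding DNS server can't return a different (internal) address on a second lookup. The blocked ranges below mirror the checklist; function names are illustrative.

```python
import ipaddress
import socket

# RFC 1918, link-local (incl. metadata), and loopback ranges.
BLOCKED = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "127.0.0.0/8", "::1/128", "fc00::/7", "fe80::/10",
)]

def check_egress(ip: str) -> None:
    addr = ipaddress.ip_address(ip)
    if any(addr.version == net.version and addr in net for net in BLOCKED):
        raise PermissionError(f"egress to {addr} blocked by policy")

def safe_connect(host: str, port: int) -> socket.socket:
    # Resolve once, validate the *resolved* address, then connect to the
    # IP directly -- never re-resolve the hostname at connect time.
    ip = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
    check_egress(ip)
    return socket.create_connection((ip, port), timeout=5)
```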
Resource Limits
Resource limits stop fork bombs, runaway loops, and memory exhaustion from spreading beyond a single sandbox.
- CPU: `cpu.max` via cgroup v2
- Memory: hard `memory.max` limit; disable swap or cap `memory.swap.max`
- PIDs: `pids.max` via cgroup to prevent fork bombs
- Wall-clock timeout enforced at the orchestration layer above the sandbox
Secrets
Secrets handling determines whether a compromised sandbox leaks anything beyond its own ephemeral state.
- No host credentials in sandbox filesystem or environment variables
- No Docker socket mount
- No Kubernetes service account tokens unless explicitly required with minimum permissions
- Short-lived, scoped tokens via credential proxy; rotate after sandbox termination
Monitoring
Monitoring closes the loop: even with strong isolation, anomalous behavior should produce signals before damage spreads.
- Alert on syscalls outside expected profile
- Alert on outbound connections to non-allowlisted destinations
- Alert on writes to unexpected paths or mount attempts
- Alert on sandboxes approaching time limits
- Immutable audit log written to external sink before sandbox terminates
Replit's governing principle applies to every production agent sandbox: "Every layer of the infrastructure where customer code runs is designed with defense in depth. No single control is the last line of defense; every layer assumes the one above it might fail."
Choose the Isolation Boundary Before Your Next Agent Deployment
The highest-priority decision in agent infrastructure is the isolation boundary. Standard Docker containers leave teams exposed to the shared-kernel risk described throughout this guide, while microVMs and userspace kernels add an independent control layer. Start with Firecracker for CPU-only workloads requiring maximum isolation, or gVisor for workloads that need GPU access. Then apply filesystem restrictions, network egress controls, and resource limits using the configurations above. Treat sandbox configuration as immutable code, and never let an agent modify its own approval policy.
The orchestration layer matters just as much. Intent runs each agent in an isolated git-worktree workspace, coordinates parallel agents through a Coordinator, Implementor, and Verifier model, and keeps a living spec as the canonical record of what every sandboxed step is supposed to do. That combination gives you per-agent isolation, pre and post-execution checks, and a stable contract the sandbox can enforce against. Teams looking to extend that pattern across an entire SDLC can also evaluate Augment Cosmos, the agentic-development platform now in research preview, which carries the same runtime, context, and human-in-the-loop primitives across laptops, Dev-VMs, and cloud.
See how Intent's coordinated multi-agent workspaces operate above the sandbox boundary you choose.
Free tier available · VS Code extension · Takes 2 minutes
Related
- 9 Best AI Coding Agent Desktop Apps in 2026 (Ranked by Real-World Performance)
- 9 Open-Source Agent Orchestrators for AI Coding (2026)
- 6 Best Spec-Driven Development Tools for AI Coding in 2026
- 7 Best AI Agent Observability Tools for Coding Teams in 2026
- 9 Security Integrations That Keep AI Code Compliant in Enterprise Environments
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.