Skip to content
Install
Back to Guides

Anthropic Agent SDK: What It Ships vs. What It Leaves to You

May 3, 2026
Molisha Shah
Molisha Shah
Anthropic Agent SDK: What It Ships vs. What It Leaves to You

The Anthropic Agent SDK ships a production-quality agent loop, tool use protocol, and streaming infrastructure, but organizations may still need to build context management, multi-agent orchestration, observability, security hardening, and state persistence themselves before production readiness.

TL;DR

The Anthropic Agent SDK provides strong primitives for single-agent tool-use loops, with official documentation describing it as a distinct SDK from the Client SDK rather than as two packages, and indicating support for hooks, observability, and subagents; durable execution is not clearly documented in the available sources. Teams building production systems should expect to construct additional platform layers or adopt an orchestration workspace like Intent, the macOS workspace from Augment Code, to address some of these gaps structurally.

Where the SDK Boundary Sits in Your Production Stack

Engineering teams evaluating the Anthropic Agent SDK face a specific gap: the distance between a working demo and a production deployment. Anthropic's guidance on building effective AI agents makes a similar point, recommending that teams find the simplest solution possible and only increase complexity when needed. SDK boundaries leave concerns like security to the application layer, while applications must design their own orchestration and persistence patterns around the SDK.

The result is a clean, stable API boundary with a significant build cost behind it. A documented production case illustrates the gap: running just four agents in production required ClaudeSDKClient with bypassPermissions, Docker containers, Kafka event streaming, Neo4j/Memgraph graph databases, and 15 active MCP servers. That infrastructure complexity captures the distance between the SDK's boundary and what production demands.

This guide maps what ships in the box, what teams need to build themselves, and where Intent closes those gaps as a coordinated workspace.

Intent's living specs and coordinator/verifier architecture replace large amounts of custom orchestration engineering for multi-agent coding workflows.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

What Anthropic Ships: Agent Loops, Tool Use, Streaming, Guardrails

The Anthropic Agent SDK consists of two distinct packages, and engineers should understand the separation before making architecture decisions.

anthropic-sdk-python (v0.97.0) is the core API client handling the Messages API, streaming, tool use protocol, prompt caching, and model configuration. claude-agent-sdk (v0.1.71) is the higher-level agent harness extracted from Claude Code, providing the agent loop, built-in tools, subagent spawning, and MCP integration.

ComponentWhat ShipsPackage
Agent loopGather context, take action, verify work, repeatclaude-agent-sdk
Built-in toolsbash, read, write, web_search; MCP integrationclaude-agent-sdk
Tool use protocolClient tools + server tools (two-tier model)anthropic-sdk-python
StreamingSSE events, sync/async streams, text_stream iteratoranthropic-sdk-python
Prompt caching5-minute default, 1-hour extended; cache hits can reduce input-token costs by about 90%anthropic-sdk-python
Permission systemRoutes tool requests through safety checks before dispatchclaude-agent-sdk
Context compactionConfigurable context_token_threshold (default 100,000 tokens)claude-agent-sdk
Subagent spawningagents: dict[str, AgentDefinition] in optionsclaude-agent-sdk
Multi-agent (GA)Claude Managed Agents, added in anthropic-sdk-python v0.92.0 (April 8, 2026) and launched in public betaanthropic-sdk-python

Anthropic materials describe the agent loop as an iterative process, though the cited source for a canonical cycle with an explicit self-verification phase could not be verified.

The tool use system is mature. It supports parallel tool calls with multiple tool_use blocks per response, dynamic tool discovery via tool_search to avoid 50,000+ token upfront definitions, strict: true schema enforcement, and fine-grained per-tool streaming via eager_input_streaming.

python
with client.messages.stream(
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
model="claude-opus-4-7",
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)

The stream yields text incrementally through text_stream, the behavior teams use for real-time UI updates and interactive terminals.

One architectural fact teams often discover late: the Claude Agent SDK runs its agent loop inside a prebuilt CLI binary bundled in a platform-specific wheel, separate from the Python process. Communication details between the Python layer and the bundled CLI are not publicly documented. Each release weighs in around 270 to 340 MiB depending on platform, and wheel availability has varied across releases. The size affects Docker image budgets and CI/CD design choices.

What Anthropic Leaves to You: Context, Orchestration, Security

The Anthropic Agent SDK provides no built-in observability, no durable execution, no state persistence across sessions, and no multi-agent coordination beyond spawning subagents as tools. Each of these gaps has documented production failure modes.

Context Window Management

Context compaction can fire automatically, and the SDK exposes a PreCompact hook, though Anthropic's official Agent SDK docs do not specify a fixed trigger at ~95% usage. A stop_reason: "compaction" event can be intercepted when compaction is enabled. Engineers building persistent agent sessions, such as chat bridges and long-running assistants, need to account for how compaction saves or discards state.

When accumulated subagent results approach the context window limit, the Claude Agent Python SDK automatically compresses conversation history to avoid hitting hard context limits and allow tasks to continue. Practitioner analysis from ML6.eu suggests an effective working context during agent execution of 60,000 to 80,000 tokens, despite a nominal 200,000-token window.

Multi-Agent Orchestration

Teams can assign different tool access and permissions to individual subagents. The desired pattern of a coordinator with read-only access delegating to specialists with scoped write access is not achievable natively. Structured handoff and continuation mechanisms exist for cases where an agent approaches or hits its output limits.

Anthropic's own engineering team has acknowledged that small changes to a lead agent's prompt can unpredictably affect subagent behavior.

Security

Prompt injection remains a structural problem in agent architectures that pass untrusted content into model context. OWASP guidance emphasizes validating and handling untrusted data carefully.

The prompt injection probe and filesystem sandboxing in Claude Code are Claude Code-specific features rather than general Agent SDK capabilities. Custom agents built via the API start from a significantly more exposed security posture than Claude Code's out-of-box experience suggests.

The table below summarizes documented gap categories. Specific issues backing each row include the missing compaction lifecycle hook tracked in a Python SDK issue, graceful degradation problems at context limits in a Claude Code issue, the lack of per-agent permission scoping in another Claude Code report, and broken state persistence across compaction in a separate filing.

Gap CategorySpecific Missing Capability
ContextNo compaction lifecycle hook
ContextNo graceful degradation at context limits
OrchestrationNo per-agent permission scoping
OrchestrationNo structured agent handoffs
ObservabilityNo tracing, no metrics, no logging
PersistenceNo state persistence across compaction
SecurityPrompt injection probe only in Claude Code

How the SDK Fits into a Wider Agent Platform Stack

The Anthropic Agent SDK is one component in the production agent stack. Higher layers can be handled through Anthropic-managed services or SDK features, depending on the implementation.

LangChain's analysis of agent harnesses frames the organizing principle: an agent decomposes into a model (intelligence) and a harness (everything that makes intelligence useful). Anthropic extends this with a production caveat: the harness encodes assumptions about what the model cannot do on its own, and those assumptions go stale as models improve.

text
Layer 8: Managed Infrastructure (Claude Managed Agents, public beta April 2026)
Layer 7: Observability & Evaluation ← SDK provides: built-in observability (OpenTelemetry traces, metrics, logs); evaluation patterns mostly via external tooling and guidance
Layer 6: Auth / RBAC / Human-in-the-Loop ← SDK provides: can_use_tool and tool permission controls
Layer 5: Multi-Agent Coordination ← SDK provides: subagents-as-tools
Layer 4: Orchestration / Durable Execution ← SDK provides: limited support
Layer 3: Memory (short-term + long-term) ← Anthropic offers a Memory tool in beta, but evidence does not confirm it is provided as part of the Agent SDK specifically
Layer 2: Context Engineering ← SDK provides: compaction
Layer 1: SDK / API Primitives
Layer 0: Model (Claude Haiku, Sonnet, or Opus)

The gap at Layer 4 is significant. ZenML has discussed durable execution as infrastructure for running production AI agents reliably, including recovery from worker crashes and continuation from saved workflow history. The Anthropic Agent SDK offers agent loop and context management features, while durable execution and checkpoint-based job resumption are described for Anthropic Managed Agents and third-party workflow platforms rather than as built-in SDK capabilities.

A different framing comes from Augment Cosmos, currently in research preview, which positions itself as an operating system for agentic software development. Cosmos consolidates these layers into shared primitives across the stack: an agent runtime, the Context Engine, an event bus tied to the SDLC, and an organization-wide knowledge layer that agents read from and write to. For teams weighing how much of the platform to own internally, this provides one alternative to assembling Layers 2 through 7 from independent components.

Intent sits inside that broader picture as the developer-facing workspace where multi-agent coordination happens day to day. Cosmos describes the platform across the full SDLC, while Intent goes beyond Layer 5 by treating multi-agent development as a single coordinated system: agents share a living spec and isolated workspace, stay aligned as the plan evolves, and adapt without restarts.

Capabilities and Gaps: What Works and What Does Not

The SDK's tool use system is its strongest component. Dynamic tool discovery, strict schema enforcement, parallel execution, and per-tool streaming represent mature, production-tested infrastructure. Prompt caching offers a concrete 10x cost reduction ($0.30/MTok cached versus $3.00/MTok uncached on Claude Sonnet pricing) that rewards deliberate architectural choices.

The hook system (PreToolUse, PostToolUse, Stop, SubagentStart, and others) provides extensibility for tool-level interception. MCP integration commonly uses external subprocess servers over stdio and can also use HTTP/SSE-based servers via the SDK's transport integrations.

Several rough edges remain. The boundary between SDK-provided behavior and Claude Code-only features confuses engineers systematically. Multiple security features like prompt injection probes and sandboxing live only inside Claude Code, and the documentation gap is not always clear. Several ClaudeAgentOptions fields (hooks, agents, sandbox, plugins) exist in source code but are missing from official documentation. The AgentDefinition dataclass uses camelCase while ClaudeAgentOptions uses snake_case, reflecting the underlying CLI's JSON schema instead of consistent Python conventions.

A documented bug report says Opus 4.7 can silently self-downgrade to Sonnet 4.6 mid-session. Responses can still be generated, though persistence behavior depends on the SDK and the memory implementation chosen.

The Build Cost of What Is Missing

Even teams using commercial orchestration frameworks still build custom infrastructure for the layers a framework does not cover. Spotify's advertising team used Google ADK with Google Cloud for session storage and Apollo (Spotify) for observability. Frameworks shift the engineering burden; they do not eliminate it.

Platform LayerInitial Build EstimateOngoing Maintenance
Context / Memory Management400 to 800 hoursHigh: continuous with model updates
Multi-Agent Orchestration600 to 1,200 hoursVery High: routing and boundary tuning
Security Hardening300 to 600 hoursContinuous: no stable plateau
Observability / Monitoring200 to 500 hoursModerate: updates as topology changes
Evaluation Pipeline400 to 800 hoursVery High: weekly cadence per practitioner data
State Persistence / Durable Execution300 to 600 hoursModerate: schema migrations, scaling
Total2,200 to 4,500 hoursN/A

These estimates are best read as directional synthesis, not precise budgeting data. Teams without distributed systems expertise, or in regulated industries, should plan toward the upper range. Hardening recommendations for agentic development security and weekly evaluation cadences described in practitioner field notes reinforce that this work compounds. A ZenML retrospective on more than 1,000 production deployments suggests that building these layers leans heavily on distributed systems and platform engineering skills, with implications for hiring.

Evaluation carries the highest ongoing maintenance-to-build ratio of all six layers. One organization reported evaluation costs running at 10x the baseline agent workload.

Intent's living specs and parallel agent waves replace much of this custom orchestration engineering with a coordinated workspace.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

ci-pipeline
···
$ cat build.log | auggie --print --quiet \
"Summarize the failure"
Build failed due to missing dependency 'lodash'
in src/utils/helpers.ts:42
Fix: npm install lodash @types/lodash

How Intent Fills the Gaps Anthropic Intentionally Leaves Open

Intent is a macOS desktop application from Augment Code for orchestrating multiple AI coding agents. It addresses the coordination, context, and verification gaps that the Anthropic Agent SDK leaves to custom engineering.

Open source
augmentcode/augment-swebench-agent871
Star on GitHub

Power users of AI coding tools end up with too many terminal panes, multiple agents running simultaneously, manual copy-pasting of context between them, and no reliable way to track which branch contains which changes. Intent targets the coordination problem directly by tracking branches, sharing context, and keeping agents aligned on a shared spec.

Intent uses a coordinator/implementor/verifier architecture:

  1. Coordinator agent: Uses the Context Engine to analyze the codebase, draft a spec, and delegate tasks to specialist agents.
  2. Implementor agents: Execute the approved plan in parallel waves, each running in an isolated git worktree.
  3. Verifier agent: Checks results against the spec and flags inconsistencies before returning work to the developer.

When Intent runs cross-service refactors, agents share architectural understanding because the Context Engine semantically indexes and maps code relationships across hundreds of thousands of files, instead of relying on per-session partial context.

DimensionClaude Agent SDK (Base)Intent
Execution modelSingle-session terminal agentCoordinator/specialist/verifier multi-agent
Context scopePer-session prompt (60-80k effective tokens)Persistent semantic index across 400,000+ files
Conflict preventionNo specific evidence of manual branch management in Anthropic's Agent SDK guidanceIsolated git worktrees per agent
Spec alignmentStatic initial promptLiving spec that updates as agents work
Agent providersClaude onlyClaude Code, Codex, OpenCode, plus native agents
ComplianceSOC 2 Type II, ISO/IEC 42001

Intent supports BYOA (Bring Your Own Agent): Claude Code sessions run under Intent's orchestration with CLAUDE.md configuration carrying over intact. No Augment subscription is required to use external agents. External agents receive the spec contract but lack the deep architectural context the Context Engine provides, which matters most for cross-service refactors in large codebases and less for greenfield features.

Two limitations are documented openly: Intent's public beta is macOS Apple Silicon only, and worktrees do not provide runtime isolation for external state such as databases and environment resources.

Production Readiness Checklist: SDK Plus Platform Requirements

This checklist separates what the Anthropic Agent SDK provides natively from what teams must build before deploying agents to production. Items marked [CRITICAL] have documented production failure modes.

Security

Security failures in agent systems are rarely about the model itself. They show up at the boundaries: external content entering context, MCP servers added without review, and tool access that was never scoped to least privilege. The SDK provides one of these controls; teams build the rest.

  • 🟢 Permission system routes tool requests through safety checks (SDK-native)
  • 🔴 [CRITICAL] Prompt injection defenses for all external content entering context: web retrieval, database reads, file system reads, and tool outputs should be treated as untrusted data
  • 🔴 [CRITICAL] MCP server supply chain controls remain an important security consideration, and new MCP servers should be handled cautiously
  • 🔴 Least-privilege tool access with documented justification per tool
  • 🔴 Emergency shutdown capabilities, tested in staging within 30 days of launch

Guardrails and Cost Controls

Cost overruns in production agents tend to come from runaway loops and missing budget enforcement, not per-token pricing. The SDK ships streaming refusal handling, and teams should layer explicit stopping conditions and circuit breakers above it.

  • 🟢 Streaming refusal handling (SDK-native)
  • 🔴 [CRITICAL] Explicit stopping conditions, since max_iterations is caller-managed and not enforced by the SDK
  • 🔴 [CRITICAL] Circuit breakers for agent loops guard against infinite or runaway multi-agent conversations that can drive up costs if left undetected
  • 🔴 [CRITICAL] Hard daily spending limits with automatic suspension at 50%, 80%, and 100% thresholds
  • 🔴 Session-level circuit breakers separate from aggregate cost caps

Observability and Error Handling

Agent observability is structurally different from web service monitoring. Teams need traces that capture every tool call, retry decisions that distinguish recoverable from unrecoverable errors, and deployment patterns that account for in-flight agent work.

  • 🔴 No SDK-native usage monitoring API documented for the Anthropic Agent SDK; usage and cost monitoring are available via separate Admin/Usage APIs and third-party integrations
  • 🔴 [CRITICAL] Distributed tracing for every agent step, tool call, and state transition
  • 🔴 [CRITICAL] Agent loop detection with alerting routed to on-call engineers
  • 🔴 Retry logic with exponential backoff distinguishing retryable from non-retryable errors
  • 🔴 Deployment strategy that accounts for long-running, in-flight agents during code changes

Compliance

Compliance for agent systems extends regulatory requirements that already apply to model deployments. Audit logging, PII handling, and AI Act readiness should be addressed before agents touch customer data, even when an upstream trust portal exists.

  • 🟢 Trust and compliance portal at trust.anthropic.com
  • 🔴 EU AI Act applicability assessment (full compliance currently required August 2, 2026 for high-risk Annex III AI systems; penalties up to €15M or 3% of global annual turnover). The European Commission's Digital Omnibus on AI proposed in November 2025 would push the deadline to December 2, 2027, but the proposal has not yet been adopted, so August 2, 2026 remains the operative date until further notice.
  • 🔴 Immutable audit logging with session ID, agent ID, step number, input hash, output hash, tool called, and timestamp
  • 🔴 PII controls that strip or pseudonymize PII at the input layer before it reaches the agent

Start with a Platform Assessment Before Your Next Agent Deployment

The Anthropic Agent SDK provides strong, stable primitives for tool use, streaming, and single-agent loops. The build cost above those primitives runs 2,200 to 4,500 engineer-hours of platform infrastructure that every production team will need regardless of framework choice. Teams should audit which of the six platform layers (context, orchestration, security, observability, evaluation, persistence) they can build internally versus adopt through an orchestration workspace. The SDK boundary is explicit and clean; the open question is how much of the remaining stack a team wants to own.

Intent's living specs and coordinator/verifier architecture handle context management and multi-agent coordination for multi-file tasks across large codebases.

See how Intent keeps parallel agents aligned without manual reconciliation.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

FAQ

Written by

Molisha Shah

Molisha Shah

Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.


Get Started

Give your codebase the agents it deserves

Install Augment to get started. Works with codebases of any size, from side projects to enterprise monorepos.