The Anthropic Agent SDK ships a production-quality agent loop, tool use protocol, and streaming infrastructure, but organizations may still need to build context management, multi-agent orchestration, observability, security hardening, and state persistence themselves to reach production readiness.
TL;DR
The Anthropic Agent SDK provides strong primitives for single-agent tool-use loops. Official documentation treats the Agent SDK and the Client SDK as distinct SDKs rather than two packages of one product, and indicates support for hooks, observability, and subagents; durable execution is not clearly documented in the available sources. Teams building production systems should expect to construct additional platform layers or adopt an orchestration workspace like Intent, the macOS workspace from Augment Code, to address some of these gaps structurally.
Where the SDK Boundary Sits in Your Production Stack
Engineering teams evaluating the Anthropic Agent SDK face a specific gap: the distance between a working demo and a production deployment. Anthropic's guidance on building effective AI agents makes a similar point, recommending that teams find the simplest solution possible and only increase complexity when needed. The SDK boundary leaves concerns like security to the application layer, and applications must design their own orchestration and persistence patterns around it.
The result is a clean, stable API boundary with a significant build cost behind it. A documented production case illustrates the gap: running just four agents in production required ClaudeSDKClient with bypassPermissions, Docker containers, Kafka event streaming, Neo4j/Memgraph graph databases, and 15 active MCP servers. That infrastructure complexity captures the distance between the SDK's boundary and what production demands.
This guide maps what ships in the box, what teams need to build themselves, and where Intent closes those gaps as a coordinated workspace.
Intent's living specs and coordinator/verifier architecture replace large amounts of custom orchestration engineering for multi-agent coding workflows.
Free tier available · VS Code extension · Takes 2 minutes
What Anthropic Ships: Agent Loops, Tool Use, Streaming, Guardrails
The Anthropic Agent SDK consists of two distinct packages, and engineers should understand the separation before making architecture decisions.
anthropic-sdk-python (v0.97.0) is the core API client handling the Messages API, streaming, tool use protocol, prompt caching, and model configuration. claude-agent-sdk (v0.1.71) is the higher-level agent harness extracted from Claude Code, providing the agent loop, built-in tools, subagent spawning, and MCP integration.
| Component | What Ships | Package |
|---|---|---|
| Agent loop | Gather context, take action, verify work, repeat | claude-agent-sdk |
| Built-in tools | bash, read, write, web_search; MCP integration | claude-agent-sdk |
| Tool use protocol | Client tools + server tools (two-tier model) | anthropic-sdk-python |
| Streaming | SSE events, sync/async streams, text_stream iterator | anthropic-sdk-python |
| Prompt caching | 5-minute default, 1-hour extended; cache hits can reduce input-token costs by about 90% | anthropic-sdk-python |
| Permission system | Routes tool requests through safety checks before dispatch | claude-agent-sdk |
| Context compaction | Configurable context_token_threshold (default 100,000 tokens) | claude-agent-sdk |
| Subagent spawning | agents: dict[str, AgentDefinition] in options | claude-agent-sdk |
| Multi-agent (beta) | Claude Managed Agents, added in anthropic-sdk-python v0.92.0 (April 8, 2026), launched in public beta | anthropic-sdk-python |
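To make the package split concrete, here is a minimal sketch under stated assumptions: both packages installed, ANTHROPIC_API_KEY set, and an illustrative model name, prompt, and tool allowlist. The low-level client makes a single Messages API call; the harness runs a full agent loop.

```python
# Minimal sketch of the two-package split; model name, prompt, and tool
# names are illustrative, not prescriptive.
import asyncio

from anthropic import Anthropic                          # anthropic-sdk-python
from claude_agent_sdk import ClaudeAgentOptions, query   # claude-agent-sdk

# Low-level client: one Messages API call, no agent loop.
client = Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this project's README."}],
)
print(message.content[0].text)

# High-level harness: the agent loop, built-in tools, and permission checks
# run inside the bundled CLI process, not in this Python interpreter.
async def run_agent() -> None:
    options = ClaudeAgentOptions(allowed_tools=["Read", "Grep"])
    async for event in query(prompt="Summarize this project's README.", options=options):
        print(event)

asyncio.run(run_agent())
```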
Anthropic materials describe the agent loop as an iterative process, though the cited source for a canonical cycle with an explicit self-verification phase could not be verified.
The tool use system is mature. It supports parallel tool calls with multiple tool_use blocks per response, dynamic tool discovery via tool_search to avoid 50,000+ token upfront definitions, strict: true schema enforcement, and fine-grained per-tool streaming via eager_input_streaming.
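A hedged sketch of one client-tool round trip with anthropic-sdk-python follows; the get_weather tool, its schema, and the lookup_weather helper are invented for illustration, and the pattern generalizes to the parallel tool calls described above.

```python
from anthropic import Anthropic

def lookup_weather(city: str) -> str:
    return f"Sunny and 18°C in {city}"   # stand-in for a real data source

client = Anthropic()
tools = [{
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
)

# A single response can carry several tool_use blocks (parallel tool calls);
# each one gets a matching tool_result block in the next user turn.
tool_results = [
    {
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": lookup_weather(**block.input),
    }
    for block in response.content
    if block.type == "tool_use"
]
```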
On the streaming side, the text_stream iterator yields text incrementally, the behavior teams use for real-time UI updates and interactive terminals.
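A minimal streaming sketch, assuming anthropic-sdk-python with an illustrative model and prompt:

```python
from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain SSE in one paragraph."}],
) as stream:
    # text_stream yields only the text deltas, in arrival order.
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # The fully assembled message is available once the stream completes.
    final_message = stream.get_final_message()
```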
One architectural fact teams often discover late: the Claude Agent SDK runs its agent loop inside a prebuilt CLI binary bundled in a platform-specific wheel, separate from the Python process. Communication details between the Python layer and the bundled CLI are not publicly documented. Each release weighs in around 270 to 340 MiB depending on platform, and wheel availability has varied across releases. The size affects Docker image budgets and CI/CD design choices.
What Anthropic Leaves to You: Context, Orchestration, Security
The Anthropic Agent SDK provides no built-in observability, no durable execution, no state persistence across sessions, and no multi-agent coordination beyond spawning subagents as tools. Each of these gaps has documented production failure modes.
Context Window Management
Context compaction can fire automatically, and the SDK exposes a PreCompact hook, though Anthropic's official Agent SDK docs do not specify a fixed trigger at ~95% usage. A stop_reason: "compaction" event can be intercepted when compaction is enabled. Engineers building persistent agent sessions, such as chat bridges and long-running assistants, need to account for how compaction saves or discards state.
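As a sketch only: if PreCompact is available as a hook event (the hook event name and callback signature here are assumptions based on the documented HookMatcher pattern), session state can be snapshotted before history is compressed. The snapshot path and payload handling are illustrative.

```python
import json
from pathlib import Path

from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

async def on_pre_compact(input_data, tool_use_id, context):
    # Persist whatever must survive compaction before the SDK
    # compresses conversation history.
    Path("compaction_snapshot.json").write_text(json.dumps(input_data, default=str))
    return {}

options = ClaudeAgentOptions(
    hooks={"PreCompact": [HookMatcher(hooks=[on_pre_compact])]},
)
```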
When accumulated subagent results approach the context window limit, the Claude Agent Python SDK automatically compresses conversation history to avoid hitting hard context limits and allow tasks to continue. Practitioner analysis from ML6.eu suggests an effective working context during agent execution of 60,000 to 80,000 tokens, despite a nominal 200,000-token window.
Multi-Agent Orchestration
Teams can give individual subagents different tool lists, but fine-grained permission scoping per agent is not achievable natively: the desired pattern of a coordinator with read-only access delegating to specialists with scoped write access still requires custom engineering. Structured handoff and continuation mechanisms exist for cases where an agent approaches or hits its output limits.
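Per-subagent tool lists are expressible today, even though full permission scoping is not. A hedged sketch, with illustrative agent names, prompts, and tool lists:

```python
from claude_agent_sdk import AgentDefinition, ClaudeAgentOptions

options = ClaudeAgentOptions(
    agents={
        "reviewer": AgentDefinition(
            description="Read-only code reviewer",
            prompt="Review changes; never modify files.",
            tools=["Read", "Grep", "Glob"],   # no write or bash access
        ),
        "fixer": AgentDefinition(
            description="Applies approved fixes",
            prompt="Apply the fixes the reviewer approved.",
            tools=["Read", "Edit", "Write"],
        ),
    },
)
```

The tools lists narrow what each subagent can call; they do not provide the per-agent permission scoping the gap table below records as missing.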
Anthropic's own engineering team has acknowledged that small changes to a lead agent's prompt can unpredictably affect subagent behavior.
Security
Prompt injection remains a structural problem in agent architectures that pass untrusted content into model context. OWASP guidance emphasizes validating and handling untrusted data carefully.
The prompt injection probe and filesystem sandboxing in Claude Code are Claude Code-specific features rather than general Agent SDK capabilities. Custom agents built via the API start from a significantly more exposed security posture than Claude Code's out-of-box experience suggests.
The table below summarizes documented gap categories. Specific issues backing each row include the missing compaction lifecycle hook tracked in a Python SDK issue, graceful degradation problems at context limits in a Claude Code issue, the lack of per-agent permission scoping in another Claude Code report, and broken state persistence across compaction in a separate filing.
| Gap Category | Specific Missing Capability |
|---|---|
| Context | No compaction lifecycle hook |
| Context | No graceful degradation at context limits |
| Orchestration | No per-agent permission scoping |
| Orchestration | No structured agent handoffs |
| Observability | No tracing, no metrics, no logging |
| Persistence | No state persistence across compaction |
| Security | Prompt injection probe only in Claude Code |
How the SDK Fits into a Wider Agent Platform Stack
The Anthropic Agent SDK is one component in the production agent stack. Depending on the implementation, higher layers are handled through Anthropic-managed services, SDK features, or custom engineering.
LangChain's analysis of agent harnesses frames the organizing principle: an agent decomposes into a model (intelligence) and a harness (everything that makes intelligence useful). Anthropic extends this with a production caveat: the harness encodes assumptions about what the model cannot do on its own, and those assumptions go stale as models improve.
The gap at Layer 4 is significant. ZenML has discussed durable execution as infrastructure for running production AI agents reliably, including recovery from worker crashes and continuation from saved workflow history. The Anthropic Agent SDK offers agent loop and context management features, while durable execution and checkpoint-based job resumption are described for Anthropic Managed Agents and third-party workflow platforms rather than as built-in SDK capabilities.
A different framing comes from Augment Cosmos, currently in research preview, which positions itself as an operating system for agentic software development. Cosmos consolidates these layers into shared primitives across the stack: an agent runtime, the Context Engine, an event bus tied to the SDLC, and an organization-wide knowledge layer that agents read from and write to. For teams weighing how much of the platform to own internally, this provides one alternative to assembling Layers 2 through 7 from independent components.
Intent sits inside that broader picture as the developer-facing workspace where multi-agent coordination happens day to day. Cosmos describes the platform across the full SDLC, while Intent goes beyond Layer 5 by treating multi-agent development as a single coordinated system: agents share a living spec and isolated workspace, stay aligned as the plan evolves, and adapt without restarts.
Capabilities and Gaps: What Works and What Does Not
The SDK's tool use system is its strongest component. Dynamic tool discovery, strict schema enforcement, parallel execution, and per-tool streaming represent mature, production-tested infrastructure. Prompt caching offers a concrete 10x cost reduction ($0.30/MTok cached versus $3.00/MTok uncached on Claude Sonnet pricing) that rewards deliberate architectural choices.
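A minimal caching sketch: mark the large, stable prefix (system prompt, tool definitions) with cache_control so subsequent calls read it from cache. The placeholder system prompt and model name are illustrative.

```python
from anthropic import Anthropic

# A large, unchanging prefix; caching pays off only above the minimum
# cacheable size, so keep the stable content together at the front.
STABLE_SYSTEM_PROMPT = "You are a code-review agent. ..." * 200

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    system=[{
        "type": "text",
        "text": STABLE_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},   # 5-minute default TTL
    }],
    messages=[{"role": "user", "content": "Review the attached diff."}],
)
# usage.cache_read_input_tokens shows how much of the prefix was served
# from cache on subsequent calls.
print(response.usage)
```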
The hook system (PreToolUse, PostToolUse, Stop, SubagentStart, and others) provides extensibility for tool-level interception. MCP integration commonly uses external subprocess servers over stdio and can also use HTTP/SSE-based servers via the SDK's transport integrations.
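A hedged sketch of attaching a stdio MCP server through ClaudeAgentOptions; the server name, command, and package are hypothetical, and the mcp__<server>__<tool> naming follows Claude Code's convention.

```python
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={
        "db": {
            "type": "stdio",
            "command": "npx",
            "args": ["-y", "@example/mcp-postgres"],  # hypothetical server package
        },
    },
    # MCP tools are namespaced mcp__<server>__<tool>; allow only what's needed.
    allowed_tools=["mcp__db__query"],
)
```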
Several rough edges remain. The boundary between SDK-provided behavior and Claude Code-only features is a recurring source of confusion. Multiple security features, such as prompt injection probes and sandboxing, live only inside Claude Code, and the documentation does not always make that boundary clear. Several ClaudeAgentOptions fields (hooks, agents, sandbox, plugins) exist in source code but are missing from official documentation. The AgentDefinition dataclass uses camelCase while ClaudeAgentOptions uses snake_case, reflecting the underlying CLI's JSON schema rather than consistent Python conventions.
A documented bug report says Opus 4.7 can silently self-downgrade to Sonnet 4.6 mid-session; responses continue to generate, which makes the downgrade easy to miss. Persistence behavior likewise depends on the SDK and the memory implementation chosen.
The Build Cost of What Is Missing
Even teams using commercial orchestration frameworks still build custom infrastructure for the layers a framework does not cover. Spotify's advertising team used Google ADK with Google Cloud for session storage and Apollo (Spotify) for observability. Frameworks shift the engineering burden; they do not eliminate it.
| Platform Layer | Initial Build Estimate | Ongoing Maintenance |
|---|---|---|
| Context / Memory Management | 400 to 800 hours | High: continuous with model updates |
| Multi-Agent Orchestration | 600 to 1,200 hours | Very High: routing and boundary tuning |
| Security Hardening | 300 to 600 hours | Continuous: no stable plateau |
| Observability / Monitoring | 200 to 500 hours | Moderate: updates as topology changes |
| Evaluation Pipeline | 400 to 800 hours | Very High: weekly cadence per practitioner data |
| State Persistence / Durable Execution | 300 to 600 hours | Moderate: schema migrations, scaling |
| Total | 2,200 to 4,500 hours | N/A |
These estimates are best read as directional synthesis, not precise budgeting data. Teams without distributed systems expertise, or in regulated industries, should plan toward the upper range. Hardening recommendations for agentic development security and weekly evaluation cadences described in practitioner field notes reinforce that this work compounds. A ZenML retrospective on more than 1,000 production deployments suggests that building these layers leans heavily on distributed systems and platform engineering skills, with implications for hiring.
Evaluation carries the highest ongoing maintenance-to-build ratio of all six layers. One organization reported evaluation costs running at 10x the baseline agent workload.
Intent's living specs and parallel agent waves replace much of this custom orchestration engineering with a coordinated workspace.
Free tier available · VS Code extension · Takes 2 minutes
How Intent Fills the Gaps Anthropic Intentionally Leaves Open
Intent is a macOS desktop application from Augment Code for orchestrating multiple AI coding agents. It addresses the coordination, context, and verification gaps that the Anthropic Agent SDK leaves to custom engineering.
Power users of AI coding tools end up with too many terminal panes, multiple agents running simultaneously, manual copy-pasting of context between them, and no reliable way to track which branch contains which changes. Intent targets the coordination problem directly by tracking branches, sharing context, and keeping agents aligned on a shared spec.
Intent uses a coordinator/implementor/verifier architecture:
- Coordinator agent: Uses the Context Engine to analyze the codebase, draft a spec, and delegate tasks to specialist agents.
- Implementor agents: Execute the approved plan in parallel waves, each running in an isolated git worktree.
- Verifier agent: Checks results against the spec and flags inconsistencies before returning work to the developer.
When Intent runs cross-service refactors, agents share architectural understanding because the Context Engine semantically indexes and maps code relationships across hundreds of thousands of files, instead of relying on per-session partial context.
| Dimension | Claude Agent SDK (Base) | Intent |
|---|---|---|
| Execution model | Single-session terminal agent | Coordinator/specialist/verifier multi-agent |
| Context scope | Per-session prompt (60-80k effective tokens) | Persistent semantic index across 400,000+ files |
| Conflict prevention | Manual branch management (no SDK-native mechanism documented) | Isolated git worktrees per agent |
| Spec alignment | Static initial prompt | Living spec that updates as agents work |
| Agent providers | Claude only | Claude Code, Codex, OpenCode, plus native agents |
| Compliance | — | SOC 2 Type II, ISO/IEC 42001 |
Intent supports BYOA (Bring Your Own Agent): Claude Code sessions run under Intent's orchestration with CLAUDE.md configuration carrying over intact. No Augment subscription is required to use external agents. External agents receive the spec contract but lack the deep architectural context the Context Engine provides, which matters most for cross-service refactors in large codebases and less for greenfield features.
Two limitations are documented openly: Intent's public beta is macOS Apple Silicon only, and worktrees do not provide runtime isolation for external state such as databases and environment resources.
Production Readiness Checklist: SDK Plus Platform Requirements
This checklist separates what the Anthropic Agent SDK provides natively from what teams must build before deploying agents to production. Items marked [CRITICAL] have documented production failure modes.
Security
Security failures in agent systems are rarely about the model itself. They show up at the boundaries: external content entering context, MCP servers added without review, and tool access that was never scoped to least privilege. The SDK provides one of these controls; teams build the rest.
- 🟢 Permission system routes tool requests through safety checks (SDK-native)
- 🔴 [CRITICAL] Prompt injection defenses for all external content entering context: web retrieval, database reads, file system reads, and tool outputs should be treated as untrusted data
- 🔴 [CRITICAL] MCP server supply chain controls: vet, pin, and review new MCP servers before granting them access to agent context
- 🔴 Least-privilege tool access with documented justification per tool (see the sketch after this list)
- 🔴 Emergency shutdown capabilities, tested in staging within 30 days of launch
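A hedged sketch of the least-privilege item above, assuming claude-agent-sdk's allowed_tools, disallowed_tools, and permission_mode options; the tool choices and justifications are illustrative.

```python
from claude_agent_sdk import ClaudeAgentOptions

# Keep the justification next to the configuration so reviews see both.
TOOL_JUSTIFICATION = {
    "Read": "Needed to inspect source files for the review task.",
    "Grep": "Needed to locate call sites; cannot mutate anything.",
}

options = ClaudeAgentOptions(
    allowed_tools=list(TOOL_JUSTIFICATION),  # everything not listed is unavailable
    disallowed_tools=["Bash", "Write"],      # explicit denial as a second layer
    permission_mode="default",               # keep safety checks in the loop
)
```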
Guardrails and Cost Controls
Cost overruns in production agents tend to come from runaway loops and missing budget enforcement, not per-token pricing. The SDK ships streaming refusal handling, and teams should layer explicit stopping conditions and circuit breakers above it.
- 🟢 Streaming refusal handling (SDK-native)
- 🔴 [CRITICAL] Explicit stopping conditions, since max_iterations is caller-managed and not enforced by the SDK (a minimal circuit-breaker sketch follows this list)
- 🔴 [CRITICAL] Circuit breakers for agent loops guard against infinite or runaway multi-agent conversations that can drive up costs if left undetected
- 🔴 [CRITICAL] Hard daily spending limits with automatic suspension at 50%, 80%, and 100% thresholds
- 🔴 Session-level circuit breakers separate from aggregate cost caps
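A minimal circuit-breaker sketch, referenced from the stopping-conditions item above. This is caller-side code by design, since the SDK does not enforce iteration or budget limits; every threshold here is illustrative.

```python
import time

class AgentCircuitBreaker:
    """Trips on iteration count, spend, or wall-clock time, whichever comes first."""

    def __init__(self, max_iterations=25, max_cost_usd=5.0, max_seconds=600):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.max_seconds = max_seconds
        self.iterations = 0
        self.cost_usd = 0.0
        self.started = time.monotonic()

    def record(self, step_cost_usd: float) -> None:
        self.iterations += 1
        self.cost_usd += step_cost_usd

    def tripped(self) -> str | None:
        if self.iterations >= self.max_iterations:
            return "iteration limit"
        if self.cost_usd >= self.max_cost_usd:
            return "cost limit"
        if time.monotonic() - self.started >= self.max_seconds:
            return "wall-clock limit"
        return None

# Inside the agent loop: breaker.record(step_cost) after each step, then
# stop and alert as soon as breaker.tripped() returns a reason.
breaker = AgentCircuitBreaker()
```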
Observability and Error Handling
Agent observability is structurally different from web service monitoring. Teams need traces that capture every tool call, retry decisions that distinguish recoverable from unrecoverable errors, and deployment patterns that account for in-flight agent work.
- 🔴 Usage and cost monitoring through the separate Admin/Usage APIs and third-party integrations, since no SDK-native usage monitoring API is documented for the Anthropic Agent SDK
- 🔴 [CRITICAL] Distributed tracing for every agent step, tool call, and state transition
- 🔴 [CRITICAL] Agent loop detection with alerting routed to on-call engineers
- 🔴 Retry logic with exponential backoff distinguishing retryable from non-retryable errors (see the sketch after this list)
- 🔴 Deployment strategy that accounts for long-running, in-flight agents during code changes
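A minimal backoff sketch for the retry item above, using exception classes exported by anthropic-sdk-python; attempt counts and delays are illustrative.

```python
import random
import time

from anthropic import Anthropic, APIConnectionError, RateLimitError

client = Anthropic()

def call_with_backoff(make_request, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return make_request()
        except (APIConnectionError, RateLimitError):
            # Transport and rate-limit errors are retryable; back off
            # exponentially with jitter (1s, 2s, 4s, ... plus noise).
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.random())
        # Anything else (auth failures, invalid requests) is non-retryable
        # and propagates immediately.

result = call_with_backoff(
    lambda: client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": "ping"}],
    )
)
```

The client also ships built-in retries via its max_retries setting; a caller-side layer like this earns its keep when retry decisions depend on agent-level state rather than transport errors alone.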
Compliance
Compliance for agent systems extends regulatory requirements that already apply to model deployments. Audit logging, PII handling, and AI Act readiness should be addressed before agents touch customer data, even when an upstream trust portal exists.
- 🟢 Trust and compliance portal at trust.anthropic.com
- 🔴 EU AI Act applicability assessment (full compliance currently required August 2, 2026 for high-risk Annex III AI systems; penalties up to €15M or 3% of global annual turnover). The European Commission's Digital Omnibus on AI proposed in November 2025 would push the deadline to December 2, 2027, but the proposal has not yet been adopted, so August 2, 2026 remains the operative date until further notice.
- 🔴 Immutable audit logging with session ID, agent ID, step number, input hash, output hash, tool called, and timestamp (a record-shape sketch follows this list)
- 🔴 PII controls that strip or pseudonymize PII at the input layer before it reaches the agent
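A sketch of the audit record shape named above; the field names mirror the checklist item, and the hashing and storage choices are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(session_id, agent_id, step, tool, input_text, output_text):
    return {
        "session_id": session_id,
        "agent_id": agent_id,
        "step": step,
        "tool": tool,
        # Hash inputs and outputs so the log stays append-only friendly
        # without storing raw (possibly PII-bearing) content.
        "input_hash": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_hash": hashlib.sha256(output_text.encode()).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Append each record as a JSON line to write-once storage (e.g., WORM buckets).
print(json.dumps(audit_record("sess-1", "coordinator", 3, "Read", "in", "out")))
```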
Start with a Platform Assessment Before Your Next Agent Deployment
The Anthropic Agent SDK provides strong, stable primitives for tool use, streaming, and single-agent loops. The build cost above those primitives runs 2,200 to 4,500 engineer-hours of platform infrastructure that every production team will need regardless of framework choice. Teams should audit which of the six platform layers (context, orchestration, security, observability, evaluation, persistence) they can build internally versus adopt through an orchestration workspace. The SDK boundary is explicit and clean; the open question is how much of the remaining stack a team wants to own.
Intent's living specs and coordinator/verifier architecture handle context management and multi-agent coordination for multi-file tasks across large codebases.
See how Intent keeps parallel agents aligned without manual reconciliation.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.