The Claude Agent SDK (claude-agent-sdk) is Anthropic's official Python package for building autonomous AI agents. It wraps a bundled Claude Code CLI binary over stdio, giving Python code direct access to file operations, terminal commands, and multi-step workflow chaining without manual tool-loop management.
TL;DR
The claude-agent-sdk package and the anthropic HTTP client are easy to confuse, but they use different classes, async patterns, and tool systems. This guide walks through installation, query(), ClaudeSDKClient, custom MCP tools, async execution, and multi-step workflows for the Agent SDK on Python 3.10+, then shows where Intent's spec-driven orchestration takes over once single-session agents hit coordination limits.
Picking the Right Anthropic Python Package
Python developers building their first Claude agent face an immediate problem: two separate Anthropic packages exist, and most tutorials conflate them. The anthropic package is the official Python SDK for the Anthropic REST API, with synchronous and asynchronous clients powered by httpx. The claude-agent-sdk package wraps the Claude Code CLI subprocess, giving agents autonomous access to file operations, terminal commands, and web search without manual tool-loop management.
Picking the right package is the first decision for any Python agent project. The Agent SDK handles the tool-call loop internally, so Claude autonomously reads files, runs terminal commands, and searches the web without application code managing each step. The Messages API takes a different approach, where developers write the tool execution loop in their application by following Anthropic's documented tool-use pattern. That approach offers flexibility at the cost of additional code.
The distinction matters because each package uses different classes, async patterns, and tool registration systems. Code using anthropic.Anthropic() and client.messages.create() does not use the Agent SDK at all. Each Python agent example in this guide uses the SDK's async streaming interface.
This guide covers the Agent SDK specifically: installation on Python 3.10+, creating a first agent with query() and ClaudeSDKClient, defining custom tools via the @tool decorator, running concurrent agents with asyncio.gather(), and chaining multi-step workflows with shared state. The final section covers where Intent, Augment Code's spec-driven workspace, adds orchestration capabilities for multi-agent coordination beyond a single SDK session.
Explore how Intent coordinates multi-step Python agent work through living specs that stay aligned as requirements change.
Free tier available · VS Code extension · Takes 2 minutes
Installation and Setup on Python 3.10+
The Claude Agent SDK requires Python 3.10 or higher and supports 3.10, 3.11, 3.12, and 3.13. It ships as a single pip-installable package with the CLI binary bundled inside the official wheel.
The installer skips any separate CLI download or PATH configuration. Wheel sizes vary by platform because the bundled CLI binary is compiled per architecture, so macOS Apple Silicon, macOS Intel, Linux x64, and Windows x64 each ship a different wheel.
| Extra | Install Command | Purpose |
|---|---|---|
| Development | pip install claude-agent-sdk[dev] | Contributing to the SDK |
| Examples | pip install claude-agent-sdk[examples] | Bundled example scripts |
| Tracing | pip install claude-agent-sdk[otel] | OpenTelemetry integration |
API key configuration uses environment variables:
Alternative providers include AWS Bedrock (CLAUDE_CODE_USE_BEDROCK=1), Google Cloud Vertex AI (CLAUDE_CODE_USE_VERTEX=1), and Anthropic Foundry (CLAUDE_CODE_USE_FOUNDRY=1).
Migration note: The previous claude-code-sdk package is deprecated, and the class ClaudeCodeOptions was renamed to ClaudeAgentOptions. Any existing code using the old names requires updating.
Your First Claude Agent in Python
The Claude Agent SDK provides two interaction modes for Python developers: query() for single-exchange agents and ClaudeSDKClient for persistent multi-turn conversations. Each mode handles message streaming, tool execution, and CLI subprocess management automatically, so application code can focus on prompts and response handling.
Minimal Agent with query()
The query() function is the primary entry point and returns an AsyncIterator of response messages:
The SDK uses anyio.run() in place of asyncio.run(), as documented in the Anthropic SDK docs. Developers should prefer anyio.run() because the SDK's async layer is built on anyio.
Multi-Turn Conversations with ClaudeSDKClient
ClaudeSDKClient maintains conversation history across query() calls, so the second message has full context of the first exchange. Each call to client.query() sends a message, and client.receive_response() returns an async iterator of response messages.
Configured Agent with Message Parsing
The ClaudeAgentOptions dataclass controls system prompts, model selection, and tool permissions, along with related configuration options:
| Permission Mode | Behavior |
|---|---|
| 'default' | CLI prompts for dangerous tools |
| 'acceptEdits' | Auto-accept file edits |
| 'plan' | Plan mode; restricts the agent to analysis and planning rather than file modification or command execution (verify exact behavior in current Claude Code docs) |
| 'bypassPermissions' | Allow all tools without prompts |
Common misconception: allowed_tools functions as a tool filter over the base set of available tools, and in some contexts it also controls which tools are allowed without prompting. Listed tools are auto-approved without prompting. Tools are governed by the SDK's permission flow and current permission_mode settings as documented. To actively block specific tools, use disallowed_tools.
Error Handling
The SDK defines specific error types for common failure modes:
| Error Class | Meaning |
|---|---|
| ClaudeSDKError | Base error class |
| CLINotFoundError | Claude Code not found or not installed |
| CLIConnectionError | Connection issues with CLI process |
| ProcessError | Claude Code process failed |
| CLIJSONDecodeError | JSON parsing failure in CLI response |
Adding Tools: Function Definitions and Response Handling
The Claude Agent SDK supports built-in tools such as Read, Write, Edit, Bash, Monitor, Glob, Grep, WebSearch, and WebFetch. Developers can extend this set with custom tools that run as in-process MCP servers, as shown in the Claude Agent SDK Python repository. Custom tool registration is the primary extension point for adapting agents to domain-specific workflows like database queries, internal APIs, or proprietary file formats.
Custom Tool Registration via MCP
Custom Python functions become in-process MCP servers with no subprocess overhead:
Custom tools work with both query() (by passing MCP servers in the mcpServers option) and with ClaudeSDKClient, which additionally supports stateful, interactive sessions. Tool permission names follow the pattern mcp__<server-name>__<tool-name>.
Messages API Tool Loop (anthropic Package)
For applications requiring direct control over tool execution, the anthropic package provides a manual tool-call loop where application code executes each tool and returns results. Claude returns stop_reason: "tool_use" when it wants to invoke a tool. Application code executes any requested tool, appends the result, and repeats while stop_reason == "tool_use", typically ending when the model returns stop_reason == "end_turn". One ordering constraint applies: tool_result blocks must appear before text content in the same message, or the API returns a 400 error on tool calls. The full implementation pattern is in the Anthropic tool-use documentation.
Async Patterns for Concurrent Agent Execution
The Claude Agent SDK is async-first. Production applications benefit from running multiple agent conversations concurrently, executing tool calls in parallel, and streaming responses without blocking.
The async patterns below use both the Agent SDK's query() interface and the anthropic package's AsyncAnthropic client. Multi-agent workflows often combine both: the Agent SDK for autonomous file and terminal operations, and the Messages API for lightweight conversational agents that do not need built-in tool access.
Concurrent Multi-Agent Execution
asyncio.gather() runs independent agent conversations in parallel. While one request awaits the network response, the event loop processes others:
This pattern works for independent tasks, though coordinating agents on a shared codebase requires more than parallel execution. When multiple agents modify overlapping files, prompt-level coordination breaks down quickly. Intent solves this by running each agent in an isolated git worktree under a shared living spec, eliminating the merge conflicts that asyncio.gather() alone cannot prevent.
See how Intent's isolated worktrees keep parallel Python agents from corrupting each other's work.
Free tier available · VS Code extension · Takes 2 minutes
in src/utils/helpers.ts:42
Parallel Tool Execution
When Claude returns multiple tool_use blocks in a single response, application code should extract and execute each tool call and return the corresponding tool_result blocks correctly. Synchronous tools require wrapping with asyncio.to_thread() to prevent event loop blocking:
| Pattern | Method | When to Use |
|---|---|---|
| Async client | AsyncAnthropic() | Every async context |
| Concurrent conversations | asyncio.gather() | Independent agent tasks |
| Non-blocking sync tools | asyncio.to_thread() | CPU-bound tool functions |
| Parallel tool calls | asyncio.create_task() + gather() | Multiple tools in one turn |
Known issue: Custom MCP tools may not compact mid-loop because their results arrive via the MCP protocol and bypass the compaction check path, as described in GitHub issue #531. Confirm the issue's current status before treating it as a blocker for new pipelines.
Multi-Step Workflows: Chaining Agents with State
Multi-step workflows in the Claude Agent SDK use explicit Python control flow in place of implicit model reasoning. Anthropic's research on building effective agents distinguishes workflows (predefined code paths) from agents (dynamic self-directed processes) and recommends workflows for reliability.
The workflow examples in this section use the anthropic Messages API for direct control over model calls and tool execution. The same patterns apply when wrapping Agent SDK query() calls, though the Messages API makes the control flow explicit for multi-step orchestration.
State Management Through Prompt Injection
A common inter-agent state mechanism serializes prior step outputs as JSON and injects them into each agent's user message:
Each agent reads the full state and writes only to its own step slot. No external database or shared memory store is required for same-session pipelines. The tradeoff appears when requirements change mid-workflow: every active agent prompt becomes stale, and reconciliation falls to application code. Intent's living specs address this gap by propagating updates to all active agents automatically, which the Intent documentation covers in detail.
Sequential Pipeline with Asymmetric Model Selection
Anthropic's Opus 4.5 system card documents that pairing Claude Opus 4.5 with Haiku 4.5 subagents yields 87.0% performance compared to 74.8% for Claude Opus 4.5 alone. That benchmark is specific to the Opus 4.5 and Haiku 4.5 pair, but it illustrates a pattern that generalizes: pair capable orchestrators with cost-effective worker agents and route work by complexity. Teams running newer model lineups should re-benchmark the specific pair they deploy.
A three-agent content pipeline assigns models by task complexity:
| Agent | Model | Reasoning |
|---|---|---|
| Researcher | claude-opus-4-7 | Complex synthesis and analysis |
| Drafter | claude-sonnet-4-6 | Balanced writing quality and cost |
| Reviewer | claude-sonnet-4-6 | Quality review with structured approval scoring |
Conditional Branching with Triage
A triage agent using Haiku classifies incoming queries and routes to specialized branch agents. The classification step can run on a smaller model because routing decisions are simple compared to the work they dispatch:
Production Hardening: Guards, Retries, and Cost Control
Production agent deployments require safeguards against runaway execution, transient API failures, and uncontrolled costs. The patterns below apply to both claude-agent-sdk agents and anthropic Messages API tool loops, with examples showing the Messages API for explicit control flow visibility.
Loop Guards and Token Limits
Production agents need a ceiling on iterations to prevent runaway tool cycles. A max_iterations guard of 20 catches infinite loops before they consume excessive tokens:
Structured Retry Logic
API calls in production pipelines fail intermittently due to rate limits and transient server errors. Exponential backoff prevents retry storms while recovering from temporary failures:
Non-retryable errors (400, 401, 403, 413) indicate request or configuration problems. Retrying these wastes time and tokens.
Structured Logging and Observability
The SDK supports observability configuration via the Claude Code CLI, and tracing can be added through third-party OpenTelemetry instrumentation packages. Each query() call can be instrumented with spans that capture the full agent loop. Logging the specific message types from the async iterator makes debugging multi-step workflows in production much easier, since each message type signals a different phase of agent execution.
Pair this with structured JSON logging of trace IDs and token counts to correlate agent behavior with cost and performance metrics in your observability stack.
Cost Optimization Through Model Selection
Production pipelines benefit from asymmetric model assignment, following the same orchestrator-plus-worker pattern documented in the Opus 4.5 system card. Routing simple decisions to smaller models while reserving larger models for complex reasoning reduces pipeline costs while preserving most of the quality from an all-premium configuration. Re-benchmark with the actual model pair before locking in cost projections.
| Task Type | Recommended Model | Rationale |
|---|---|---|
| Routing/classification | claude-haiku-4-5-20251001 | Simple decisions, lowest cost |
| Content generation | claude-sonnet-4-6 | Balanced quality and throughput |
| Complex reasoning | claude-opus-4-7 | Anthropic's most capable generally available model for complex reasoning and agentic coding |
Security Considerations
Production deployments should avoid bypassPermissions mode. Use acceptEdits for automated pipelines that need file write access, and restrict tool access with disallowed_tools for any tools the agent should never invoke. API keys belong in secret managers like AWS Secrets Manager or HashiCorp Vault, never in committed .env files or container images.
Deployment Patterns
Wrapping query() in a FastAPI endpoint is a common pattern for HTTP-based agent invocation. The bundled CLI binary ships inside the wheel, so Docker containers need only pip install claude-agent-sdk with no separate binary installation step. Package size can be a consideration for container images and may influence cold start behavior in some serverless environments.
Rate limiting agent invocations at the application layer prevents cost overruns. Each query() call can spawn multiple internal tool calls, so a single HTTP request may generate several LLM invocations. Application-level concurrency controls (semaphores, request queues) should cap simultaneous agent sessions based on your API tier's rate limits. For long-running agent tasks, consider returning a job ID and polling for completion in place of holding HTTP connections open through the full agent loop.
Scaling Python Agents with Intent's Orchestration Layer
Individual Python agents built with the Claude Agent SDK handle single-session execution well. Coordinating multiple agents across a shared codebase introduces challenges the SDK alone does not address. Agents drift out of alignment as requirements evolve, concurrent work collides without git branch isolation, and outputs from parallel agents may diverge from the original specification without a verification step.
Intent operationalizes spec-driven development through coordinated agents, while the Claude Agent SDK provides agent sessions, subagents, and context management. The two work together: Python developers can run SDK-built agents inside Intent's workspace as part of a larger orchestrated workflow.
How Intent Extends Agent Capabilities
Intent runs a structured multi-agent pattern. A Coordinator agent drafts the spec and delegates work, Implementor agents execute tasks in parallel inside isolated git worktrees, and a Verifier agent checks results against the spec before changes reach the main branch.
The living spec at the center of this model auto-updates as agents complete work. When requirements change mid-workflow, updates propagate to all active agents without manual intervention, eliminating the staleness problem that prompt-injected JSON state hits the moment upstream requirements shift.
Intent supports BYOA (Bring Your Own Agent), so Python developers can run agents built with the Claude Agent SDK alongside Auggie, Claude Code, Codex, and OpenCode inside the same workspace. Every agent in Intent shares the same Context Engine, which processes 400,000+ files to build dependency graphs and cross-repo semantic retrieval, giving each specialist architectural awareness beyond its immediate task scope.
For teams that need agents running across the entire SDLC rather than inside a single workspace, Augment Cosmos (research preview) extends the same coordination model to triggers like Linear tickets, Slack feedback, and CI events.
Where Each Tool Fits
The capability split between the raw SDK and Intent's orchestration layer becomes clearest when mapped against the workflow stages where each tool earns its place.
| Capability | Claude Agent SDK | Intent |
|---|---|---|
| Single-agent execution | Native query() and ClaudeSDKClient | Delegates to agent providers |
| Custom tool registration | @tool decorator + MCP servers | MCP support at platform level |
| Multi-agent coordination | Manual Python orchestration | Coordinator/Implementor/Verifier model |
| State management | Prompt-injected JSON | Living specs with auto-propagation |
| Git isolation | Application code manages branches | Automatic worktree per workspace |
| Requirement changes | Re-inject updated state manually | Spec updates propagate to all agents |
| Agent providers | Claude Code only | Claude Code, Codex, OpenCode, Auggie |
Developers who build custom tool-equipped agents with the Claude Agent SDK can run those agents inside Intent's orchestration layer while gaining coordination, verification, and spec-management capabilities that raw SDK code otherwise requires significant custom infrastructure to replicate.
Build Reliable Multi-Agent Python Workflows
Single-agent Python scripts reach a coordination limit when multiple agents modify the same codebase under changing requirements. The Claude Agent SDK handles individual agent sessions well, though keeping parallel agents aligned on a shared specification requires infrastructure the SDK does not provide.
Intent's living specs and structured agent roles (Coordinator, Implementor, Verifier) close that gap by turning manual Python orchestration into managed, spec-driven parallel execution with automatic worktree isolation and requirement propagation.
See how Intent keeps parallel Python agents aligned through living specs across cross-service refactors.
Free tier available · VS Code extension · Takes 2 minutes
FAQ
Related
- 6 Best Spec-Driven Development Tools for AI Coding in 2026
- 5 Best Agentic Development Environments for Enterprise Teams in 2026
- 9 Open-Source Agent Orchestrators for AI Coding (2026)
- 9 Best AI Coding Agent Desktop Apps in 2026 (Ranked by Real-World Performance)
- 6 Best Devin Alternatives for AI Agent Orchestration in 2026
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.