How many agents should a multi-agent system use?

The number depends on the workflow. The AdaptOrch benchmark measured a parallelism width of approximately 3.4 for coding tasks. Adding agents beyond this point increases coordination overhead faster than it increases throughput.

When should teams use multi-agent orchestration instead of a single agent?

Multi-agent orchestration is technically justified when privileged information boundaries exist between agents, when different agents represent different stakeholders, or when task scope exceeds a single context window. For tasks that fit within one context window, a single agent with unified tools remains competitive.

What is the most common failure in multi-agent systems?

Coordination failures (including communication breakdowns, state synchronization issues, and conflicting objectives) account for 36.94% of all failures across AutoGen, CrewAI, and LangGraph based on 1,600+ annotated traces.

How does multi-agent orchestration affect token costs?

Anthropic reports that multi-agent systems use approximately 15x more tokens than chat interactions, while single agents use approximately 4x more. Using fast, cheap models for verification roles and reasoning models for coordination roles helps control costs.

What prevents infinite loops in multi-agent systems?

Two mechanisms work together: intra-task iteration limits cap the number of execution cycles per agent, and inter-phase boolean exit gates prevent agents from declaring completion unless explicit success criteria are written to a shared state file.

Multi-Agent Orchestration: A Practical Architecture Without the Buzzwords

Multi-agent orchestration is a coordination layer that decomposes complex tasks into subtasks, routes each subtask to a specialized agent, maintains shared state across agent boundaries, and recovers from failures at every handoff point. The pattern works when tasks exceed what a single agent can hold in context; it fails when coordination overhead exceeds the cost of doing the work manually.

TL;DR

Multi-agent orchestration coordinates specialized AI agents through structured decomposition, routing, state management, and failure recovery. Single-agent systems break down on tasks that span multiple services or files because context windows fill up and coherence degrades. Orchestration solves this by splitting work across agents with isolated contexts and then reassembling the results through verified handoffs.

Engineering teams turn to multi-agent orchestration when single-agent systems hit their limits on complex, multi-service tasks. Anthropic reports that multi-agent systems use approximately 15x as many tokens as chat interactions, while research on failure taxonomies found that coordination failures account for 36.94% of all failures across AutoGen, CrewAI, and LangGraph. These numbers frame the central tension: orchestration adds cost and complexity, but without it, agents working on cross-service tasks lose coherence and produce conflicting outputs.

The sections that follow break down the four primitives that compose every multi-agent system: decomposition, routing, state, and recovery. They compare orchestration topologies with quantitative benchmarks, catalog five state management patterns with measured tradeoffs, and map documented failure modes to their recovery mechanisms.

Intent's living spec keeps parallel agents aligned across multi-service refactors.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

What Multi-Agent Orchestration Means

Multi-agent orchestration structures work as a graph of dependent subtasks, so each specialized agent operates within a narrower context and with clearer handoffs. In hierarchical task network (HTN) planning, a plan can be represented as a directed acyclic graph (DAG) of tasks or actions when the decomposition is partially ordered and acyclic. Not every HTN formulation is a DAG in this sense, however: some emphasize trees, hierarchical networks, or ordered method expansions instead.

Multi-agent orchestration matters for production codebases because context windows have hard limits. A single agent tasked with modifying files across multiple services will lose coherence as its context fills with conversation history, tool outputs, and prior code.

A single agent with a concatenated toolbox is the practical alternative. Research on single-agent systems demonstrates that a single LLM with general-purpose tools (code writing, code execution, and web browsing) can be competitive with multi-agent systems for tasks that fit within one context window. Multi-agent orchestration is technically justified when boundaries of privileged information exist between agents, or when multiple stakeholders are represented, each acting as a distinct principal in the system.

The Four Primitives: Task Decomposition, Routing, State, Recovery

The four primitives define how multi-agent orchestration works in practice: decomposition creates the task graph, routing assigns work, state carries context between steps, and recovery prevents local failures from cascading. Every multi-agent system comprises these four operations.

Task Decomposition

Task decomposition transforms a high-level goal into a structured set of subtasks with defined dependencies. The formal representation is discussed in the cited paper. At the implementation level, decomposition produces a DAG where nodes are subtasks and edges encode which outputs feed into which inputs.
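As a minimal sketch of the DAG described above, the snippet below encodes subtasks as a dependency map and uses Python's standard-library `graphlib` to check that the decomposition is acyclic and to produce one valid execution order. The subtask names and the `execution_order` helper are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical subtasks for a cross-service refactor.
# Each key depends on the subtasks listed in its value set.
dependencies = {
    "update_api_schema": set(),
    "migrate_billing_service": {"update_api_schema"},
    "migrate_auth_service": {"update_api_schema"},
    "run_integration_tests": {"migrate_billing_service", "migrate_auth_service"},
}

def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order, or raise ValueError if deps contain a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"decomposition is not a DAG: {exc.args[1]}") from exc

order = execution_order(dependencies)
print(order)
```

Rejecting cycles at planning time is the cheap part of decomposition; a cyclic plan would otherwise surface later as agents waiting on each other.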

The Agentic Lybic system uses a four-tier architecture: a Controller for global state and orchestration, a Manager for task decomposition and adaptive re-planning, Workers for specialized execution, and an Evaluator for continuous quality assessment and intervention. Unlike static delegation schemes that fix agent roles and topology before execution, Agentic Lybic can trigger re-planning when quality degrades, making its delegation adaptive rather than one-shot.

A known limitation: mainstream multi-agent frameworks typically adopt static role definitions and lack adaptive mechanisms to dynamically adjust delegation logic during execution.

Routing

Routing determines which agent handles a given subtask. It operates at two levels: structural routing (which agent receives the task) and conditional routing (which execution branch activates based on the current state).

An Anthropic guide identifies routing as a core building block alongside prompt chaining, parallelization, orchestrator-workers, and evaluator-optimizer patterns. LangGraph implements routing via conditional edges, in which routing functions inspect the graph state to determine the next node. CrewAI routing patterns appear in community examples and tutorials, but the official documentation does not substantiate the specific routing claim made in the broader literature.

python

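from typing import Literal

def route_content(state: "ContentState") -> Literal["technical", "creative", "business"]:
    # Return the name of the next node; ContentState is the graph's typed state, defined elsewhere.
    return state["content_type"]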

The AdaptOrch benchmark measured routing overhead at less than 50ms, compared to LLM inference latency of 2–15 seconds per call. Routing decisions are cheap; the agents doing the work are expensive.
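To make the two routing levels concrete, here is a framework-free sketch in plain Python. It is not LangGraph's API: `ContentState`, `AGENTS`, and `dispatch` are hypothetical names, and the lambda handlers stand in for LLM-backed agents.

```python
from typing import Callable, Literal, TypedDict

Route = Literal["technical", "creative", "business"]

class ContentState(TypedDict):
    content_type: Route
    text: str

def route_content(state: ContentState) -> Route:
    # Conditional routing: inspect the state and name the branch to activate.
    return state["content_type"]

# Structural routing: map each branch name to the agent that receives the task.
# These handlers are hypothetical stand-ins for real agents.
AGENTS: dict[Route, Callable[[ContentState], str]] = {
    "technical": lambda s: f"[technical agent] {s['text']}",
    "creative": lambda s: f"[creative agent] {s['text']}",
    "business": lambda s: f"[business agent] {s['text']}",
}

def dispatch(state: ContentState) -> str:
    return AGENTS[route_content(state)](state)

result = dispatch({"content_type": "technical", "text": "Explain the retry policy"})
print(result)
```

The routing decision itself is a dictionary lookup, which is consistent with the point above that routing is cheap relative to the agent calls it selects.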

State

State is the shared, persistent data structure that carries context between agent steps and across agent boundaries. LangGraph's StateGraph provides typed state schemas for state definition and data flow between nodes:

python

from langgraph.graph import MessagesState

class State(MessagesState):
    # MessagesState already supplies the `messages` field
    TEAM_MEMBERS: list[str]
    TEAM_MEMBER_CONFIGRATIONS: dict[str, dict]
    next: str            # drives routing
    full_plan: str       # carries decomposed task plan
    deep_thinking_mode: bool
    search_before_planning: bool

The full_plan field carries the decomposed task plan; next drives subsequent routing. Typed schemas prevent runtime errors from inconsistent state manipulation.

Recovery

Recovery includes detection, retry, re-planning, and escalation. Anthropic's guidance for long-running agents highlights two main failure modes, context-loss-induced incoherence and premature wrap-up near context limits, and recommends context resets with structured handoff to a fresh agent.

GraSP complements this with five local graph-repair primitives for skill-level recovery, escalating to global replanning when local repair is insufficient: Rebind (update arguments of a failed node), InsertPrereq (add a subgraph for missing preconditions), Substitute (replace a skill while preserving downstream interfaces), Rewire (edit edges locally), and Bypass (skip a node when the current state already satisfies downstream requirements).
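The detection, retry, re-planning, and escalation sequence can be sketched as a small control loop. This is an illustrative ladder under simple assumptions (any exception counts as a detected failure, and re-planning happens once); it is not GraSP's algorithm or Anthropic's harness, and `run_with_recovery`, `replan`, and `flaky` are hypothetical names.

```python
from typing import Callable

def run_with_recovery(
    task: Callable[[], str],
    replan: Callable[[], Callable[[], str]],
    max_retries: int = 2,
) -> str:
    """Try the task, retry on failure, then re-plan once, then escalate."""
    for _ in range(1 + max_retries):
        try:
            return task()          # detection: any exception counts as a failure
        except Exception:
            continue               # retry: run the same task again
    try:
        return replan()()          # re-planning: build a replacement task and run it
    except Exception as exc:
        # escalation: surface the failure to a human or a higher-level coordinator
        raise RuntimeError("escalated after retries and re-plan") from exc

attempts = {"n": 0}

def flaky() -> str:
    # Stand-in for an agent step that fails twice before succeeding.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("tool call timed out")
    return "ok"

outcome = run_with_recovery(flaky, replan=lambda: (lambda: "replanned"))
print(outcome, attempts["n"])
```

Keeping each rung explicit makes it easy to see where a more local repair step, such as the graph-repair primitives above, would slot in before global re-planning.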

Orchestration Topologies: Hub-and-Spoke vs. Mesh vs. Hierarchical

The choice of topology determines how agents communicate, where state lives, and how failures surface. No single topology dominates real workloads: the AdaptOrch benchmark on SWE-bench Verified found that adaptive topology selection achieved a 22.9% improvement over the single best baseline, with the router selecting 62% hybrid, 24% parallel, and 14% hierarchical patterns.

Dimension	Hub-and-Spoke	Mesh / Peer-to-Peer	Hierarchical	Sequential
Latency	Hub becomes a bottleneck; each delegation adds a round-trip	Lower per-hop latency; coordination overhead grows with agent count	Multiple levels multiply round-trip times	Strictly additive: total = sum of all agent execution times
Reliability	The hub is a single point of failure	No central failure point; emergent failures invisible in single-agent testing	Sub-teams operate autonomously; mid-level coordinators can still fail	Failures localized to stages; error propagation is the primary risk
Debugging	High observability: all traffic through one point	Very high complexity; 36.94% of MAST failures are coordination-type	Moderate; cross-team handoff failures hard to trace	Easiest: failures are stage-localized
State Consistency	Strong: globally owned	No global owner; generates semantic contradictions	Partitioned by level; handles context-window overflow	Strictly linear state handoff
Best Fit	Spec-driven refactors; compliance-auditable workflows	Adversarial testing; small fixed agent counts (2–4)	Monorepo features; codebase auditing at scale	Strictly ordered workflows with non-parallelizable dependencies

Hub-and-spoke puts a central orchestrator in charge of all task delegation. Specialist agents never communicate directly. LangGraph implements this via orchestrator nodes that fan out work to worker subgraphs using Send() primitives.

Mesh enables direct agent-to-agent communication without a central coordinator. Communication pathways scale as O(N²), with N being the agent count. Without a global state owner, parallel agents can produce overlapping changes from partial context, leading to merge conflicts and semantic contradictions.
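A quick calculation shows how fast mesh pathways outgrow hub-and-spoke links. The sketch assumes `n` counts specialist agents, that hub-and-spoke adds one orchestrator with a single link to each specialist, and that a mesh links every pair of specialists once; the function names are illustrative.

```python
def hub_links(n: int) -> int:
    """Links in hub-and-spoke: one orchestrator-to-specialist link per specialist."""
    return n

def mesh_links(n: int) -> int:
    """Undirected links in a full mesh of n specialists: n choose 2."""
    return n * (n - 1) // 2

for n in (2, 4, 8, 16):
    print(f"agents={n:2d}  hub={hub_links(n):3d}  mesh={mesh_links(n):3d}")
# agents= 2  hub=  2  mesh=  1
# agents= 4  hub=  4  mesh=  6
# agents= 8  hub=  8  mesh= 28
# agents=16  hub= 16  mesh=120
```

Under these assumptions the two topologies are comparable at two to four agents, which matches the small fixed agent counts listed as the best fit for mesh.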

Hierarchical topologies add multiple coordination levels. AgentOrchestra uses this pattern with a planning agent that maintains a global perspective while assigning subtasks based on expertise. The key property is that hierarchical patterns partition context, so no single agent needs the full system context. Hierarchical multi-agent systems support flexible adaptation to task demands and efficient management of large-scale systems while preserving local autonomy.

Intent's Context Engine processes 400,000+ files, giving every agent in the graph the same codebase view.

State Management Across Agent Boundaries

State management across agent boundaries determines whether agents stay aligned across long-running work, fresh sessions, and parallel execution. The five patterns below each carry distinct tradeoffs in token cost, coherence mechanism, and latency.

Blackboard / shared memory: a public space where agents post requests and results. Each agent independently evaluates whether it can respond based on capabilities and availability. In published studies, blackboard architectures outperformed RAG-based alternatives by 13% to 57% in end-to-end task success.
Graph-based message passing: structures communication along declared dependency edges. Each agent writes structured outputs to a shared execution context; downstream agents pull only what they need. Explicitly declaring the coordination graph can reduce redundant communication and serialization overhead.
Living specifications: durable, machine-readable artifacts that serve as the source of truth across context window boundaries. Unlike in-context state, living specs persist externally and survive complete context replacement. Anthropic's guidance identifies a claude-progress.txt file, together with git history, as a handoff mechanism that helps agents resume with a fresh context window.
Hierarchical summarization: a staged context-management strategy for long-running workflows. Production systems often prioritize cheaper operations first and escalate to full summarization only when context pressure requires it.
Event-driven delta delivery: a context-delivery strategy in which agents receive only the new information since their last invocation. This can reduce cumulative token cost by avoiding repeated processing of previously consumed context.

Pattern	Token Cost	Coherence Mechanism	Latency
Blackboard (autonomous)	High (~2x RAG)	Broadcast + self-selection; stale message removal critical	Variable
Graph-based message passing	Low (pull-only)	Declared dependency graph	Low
Living specifications	Minimal (external artifact)	External file read survives session resets	None (read-only reference)
Hierarchical summarization	Medium (amortized)	Structured handoffs; external memory	Medium
Event-driven delta delivery	Low (delta only)	Governance layer	Low
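Of the five patterns, event-driven delta delivery is the easiest to show in a few lines. The sketch below assumes an append-only in-memory log with one read cursor per agent; `EventLog`, `delta_for`, and the event strings are hypothetical, and a production system would persist the log and add the governance layer noted in the table.

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Append-only log with a per-agent read cursor (hypothetical design)."""
    events: list[str] = field(default_factory=list)
    cursors: dict[str, int] = field(default_factory=dict)

    def append(self, event: str) -> None:
        self.events.append(event)

    def delta_for(self, agent: str) -> list[str]:
        """Return events the agent has not seen yet and advance its cursor."""
        start = self.cursors.get(agent, 0)
        delta = self.events[start:]
        self.cursors[agent] = len(self.events)
        return delta

log = EventLog()
log.append("spec: rename field user_id -> account_id")
log.append("billing: migration merged")
first = log.delta_for("verifier")    # both events: first invocation
log.append("auth: migration merged")
second = log.delta_for("verifier")   # only the event added since the last call
print(first)
print(second)
```

Because each invocation receives only the slice after its cursor, previously consumed context is never re-sent, which is where the token savings come from.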

Failure Recovery Patterns for Multi-Agent Systems

Multi-agent failure recovery maps to five documented failure modes, each requiring a distinct recovery mechanism at the system or coordination layer. Understanding each mode determines whether an orchestration system degrades gracefully or cascades into complete failure.

Error Cascading from Upstream Deviations

Minor errors from upstream agents (incorrect parameters or hallucinated values) are consumed as valid inputs by downstream agents without detection. The deviation amplifies at each level along the collaboration chain.

Recovery: Schema validation gates between agents enforce that output matches the expected structure before passing downstream. Anthropic recommends file-based communication contracts in which one agent writes a file, and another reads and responds to it.
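A schema validation gate can be as simple as a structural check that runs before a payload crosses an agent boundary. The following is a minimal standard-library sketch; the `FieldSpec` type, the `HANDOFF_SCHEMA` contract, and its field names are assumptions made for illustration, not a format prescribed by any framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True

# Hypothetical contract for what an implementor agent hands to a reviewer.
HANDOFF_SCHEMA = [
    FieldSpec("module", str),
    FieldSpec("files_changed", list),
    FieldSpec("tests_passed", bool),
    FieldSpec("notes", str, required=False),
]

def validate_handoff(payload: dict, schema: list[FieldSpec] = HANDOFF_SCHEMA) -> list[str]:
    """Return a list of violations; an empty list means the payload may pass downstream."""
    errors = []
    for spec in schema:
        if spec.name not in payload:
            if spec.required:
                errors.append(f"missing field: {spec.name}")
            continue
        if not isinstance(payload[spec.name], spec.type_):
            errors.append(f"{spec.name}: expected {spec.type_.__name__}")
    # Reject fields the contract does not know about (possible hallucinated values).
    unknown = set(payload) - {s.name for s in schema}
    errors.extend(f"unexpected field: {name}" for name in sorted(unknown))
    return errors

good = {"module": "billing", "files_changed": ["api.py"], "tests_passed": True}
bad = {"module": "billing", "files_changed": "api.py", "retries": 3}
print(validate_handoff(good))
print(validate_handoff(bad))
```

A gate like this catches structural deviations only; a value that is well-typed but wrong still needs the verification steps covered later in this section.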

Infinite Loops and Repetitive Behavior

Feedback loops emerge when one agent's output triggers another agent whose output triggers the first, or when a single agent spins indefinitely on a tool call. Premature completion and step repetition are identified as notable error modes in recent multi-agent failure taxonomies.

Recovery: Two mechanisms work together. Intra-task iteration limits cap the number of execution cycles per agent. Inter-phase boolean exit gates block agents from declaring completion unless explicit success criteria are set in a shared state file: the implementation-review-test cycle cannot exit until tests_passed == true is written. LangGraph provides turn management through RemainingSteps, where agents inspect their remaining budget and gracefully terminate rather than hard-crash:

python

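from typing import Annotated, TypedDict

from langgraph.managed import RemainingSteps

class State(TypedDict):
    messages: Annotated[list, lambda x, y: x + y]  # reducer: append new messages
    remaining_steps: RemainingSteps                # managed value: steps left before the recursion limit

def agent_node(state: State) -> dict:
    # Wind down gracefully when the step budget is nearly exhausted.
    if state["remaining_steps"] < 2:
        return {"messages": [{"role": "system",
                "content": "Turn limit approaching: summarizing and handing off"}]}
    # Otherwise do the node's normal work (elided in the original) and return its state update.
    return {"messages": []}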
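The inter-phase exit gate can be sketched independently of any framework. The example assumes the shared state file is JSON and that the success criteria are two boolean flags; `REQUIRED_FLAGS`, `may_exit`, `run_phase`, and the `review_approved` flag are hypothetical names, with `tests_passed` taken from the description above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical success criteria for the implementation-review-test cycle.
REQUIRED_FLAGS = ("review_approved", "tests_passed")

def may_exit(state_file: Path, required: tuple[str, ...] = REQUIRED_FLAGS) -> bool:
    """Allow exit only if every required flag is literally true in the state file."""
    if not state_file.exists():
        return False
    state = json.loads(state_file.read_text())
    return all(state.get(flag) is True for flag in required)

def run_phase(state_file: Path, step, max_iterations: int = 5) -> bool:
    """Run `step` until the gate opens or the iteration cap is hit."""
    for _ in range(max_iterations):      # intra-task iteration limit
        step(state_file)
        if may_exit(state_file):         # inter-phase boolean exit gate
            return True
    return False                         # cap reached: escalate instead of looping

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"
    calls = {"n": 0}

    def step(f: Path) -> None:
        # Stand-in for one agent cycle; tests only pass on the third attempt.
        calls["n"] += 1
        f.write_text(json.dumps({"review_approved": True, "tests_passed": calls["n"] >= 3}))

    finished = run_phase(path, step)
    print(finished, calls["n"])
```

The gate reads the file rather than asking the agent, so an agent that claims completion without writing the flags cannot leave the cycle.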

Context Drift

Agents lose accurate information about the codebase state as context windows fill (factual drift) or drift from the original specifications over long sessions (alignment drift). Without a well-organized context, agents generate inconsistent or incorrect code, a problem that worsens in multi-step tasks where key information from earlier steps gets lost.

Recovery: A Verifier agent at handoff points uses a persistent living spec as the correctness standard rather than evaluating only the local diff. Intent uses this pattern: its Verifier checks results against the spec and flags inconsistencies, bugs, or missing pieces before handing work back for review.

Verifier False Passes

Agents write code that satisfies tests and passes static analysis while silently breaking a contract defined elsewhere. Verifiers exhibit agreement bias, tending to agree with prior outputs rather than independently evaluating them.

Recovery: Independent dual-agent verification separates the reviewing agent from the testing agent. A deterministic enforcement layer (lifecycle hooks and quality gates) blocks premature exit. The boolean exit gate operates at the system level, removing reliance on the generating agent's self-assessment.

Parallel Write Conflicts

As the agent count increases, the number of potential pairwise interactions grows quadratically. Without coordination, parallel agents create communication bottlenecks, ambiguity of responsibility, and goal drift.

Recovery: The one-writer-per-module rule eliminates write conflicts by construction. Multiple developer agents write code for different modules in parallel, with work segmented across isolated Git worktrees.
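The one-writer-per-module rule can be checked mechanically before any agent starts. This sketch assumes the plan is a mapping from agent name to the module paths it may write; `find_write_conflicts` and the agent and module names are hypothetical.

```python
from collections import defaultdict

def find_write_conflicts(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each module claimed by more than one agent to the agents claiming it."""
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, modules in assignments.items():
        for module in modules:
            owners[module].append(agent)
    return {m: sorted(a) for m, a in owners.items() if len(a) > 1}

# Hypothetical plan: agent name -> modules it is allowed to write.
plan = {
    "implementor_1": ["services/billing", "libs/money"],
    "implementor_2": ["services/auth"],
    "implementor_3": ["services/billing"],
}
conflicts = find_write_conflicts(plan)
print(conflicts)
# {'services/billing': ['implementor_1', 'implementor_3']}
```

A plan with an empty conflict map can then be handed to parallel agents, each in its own worktree (created, for example, with `git worktree add`).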

Failure Mode	Recovery Pattern	Enforcement Level
Error cascading	Schema validation gates at handoffs	System (harness)
Infinite loops	Two-level turn caps + boolean exit gates	System (harness)
Context drift	Living spec as correctness anchor	Coordination artifact
Verifier false passes	Independent dual-agent verification	System (harness)
Parallel write conflicts	Isolated Git worktrees	Architecture (git)
Vague handoff conditions	Boolean exit gates with explicit success criteria	System (state file)

How Intent Handles Orchestration in Production

Intent packages multi-agent orchestration as a product workflow built around a persistent spec, isolated workspaces, and blocking verification, implemented through a three-role model: Coordinator, Implementor(s), and Verifier. The Coordinator agent analyzes the codebase via the Context Engine, drafts a spec as an evolving document, decomposes the spec into a dependency-ordered DAG, and then delegates subtasks to Implementors in parallel batches called waves.

The wave structure derives directly from the DAG: tasks at the same dependency level run simultaneously within a single wave; the subsequent wave begins only after the prior wave completes. Each Implementor runs its own cycle within a scoped context, operating in an isolated Git worktree. The Verifier functions as a blocking pre-merge check against the spec, catching spec-implementation mismatches before code reaches the main branch.
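The derivation of waves from a DAG can be illustrated with Python's standard-library `graphlib`. This is a generic sketch of level-by-level batching, not Intent's implementation; `plan_waves` and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on tasks in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                        # raises CycleError if deps are not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)                 # the next wave unlocks only after this one completes
    return waves

deps = {
    "update_api_schema": set(),
    "migrate_billing_service": {"update_api_schema"},
    "migrate_auth_service": {"update_api_schema"},
    "run_integration_tests": {"migrate_billing_service", "migrate_auth_service"},
}
waves = plan_waves(deps)
print(waves)
# [['update_api_schema'], ['migrate_auth_service', 'migrate_billing_service'], ['run_integration_tests']]
```

Tasks inside one wave have no dependencies on each other, so they are the ones that can safely run in parallel in separate worktrees.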

The living spec serves as the central coordination artifact. When an agent completes work, the spec updates to reflect what was built. When requirements change mid-execution, updates propagate to all active agents. Intent's documentation describes spec-driven development in which a living spec serves as the authoritative source for active agents, and a Verifier checks results against that spec before handoff.

Intent's architecture parallels patterns documented across production multi-agent systems. These systems share a consistent design: an orchestrator layer centered on coordination and delegation, with subagents operating within minimal, scoped context windows. For multi-service refactors where requirements evolve during execution, the bidirectional spec prevents alignment drift that static task tickets cannot address.

When to Adopt Multi-Agent Orchestration

The decision between single-agent and multi-agent orchestration reduces to a concrete question: does the task exceed what one context window can hold without degrading output quality? For tasks spanning multiple services and shared contracts, multi-agent orchestration with proper state management and verification gates produces measurably better outcomes. For single-file prototypes, a single agent with general-purpose tools is faster and cheaper.

Teams evaluating production orchestration should test one workflow that already breaks single-agent execution: a cross-service refactor or a multi-file feature with shared contracts. Start with explicit task decomposition, a shared state model, and verification gates at every handoff before expanding agent count or adding more elaborate routing.

Intent's Coordinator-Implementor-Verifier pipeline enforces verified handoffs at every stage.

Written by

Ani Galstian
