
When Multi-Agent Is Overkill: A Decision Framework for Scaling AI Agent Workflows

Apr 25, 2026
Ani Galstian

Multi-agent AI coding delivers value only when tasks are parallelizable across independent modules, each agent operates against a pre-written specification, and coordination overhead stays below the time saved through parallel execution. For everything else, a single well-prompted agent is faster, cheaper, and easier to control.

TL;DR

Multi-agent workflows pay off only when work decomposes into independent, parallelizable units governed by written specs. For bug fixes, tightly coupled codebases, sequential dependency chains, or ambiguous requirements, a single agent wins. The dominant failure mode is not model capability; it is agents making conflicting implicit decisions because no shared specification constrains their choices.

The Gap Between Multi-Agent Hype and Multi-Agent Reality

Every major AI coding tool now supports multi-agent orchestration. Anthropic ships subagents and background tasks in Claude Code. OpenAI's Codex runs parallel agents across projects. GitHub Copilot's /fleet command spins up coordinated agent teams. The messaging is consistent: delegate work, run tasks in parallel, trust agents to handle substantial projects.

A Microsoft experiment evaluating a five-agent swarm on a codebase found that contract-first planning, not agent count, was the key improvement. The assumption that more agents lead to faster delivery conflates parallelism with productivity. Adding agents works when the underlying work can be executed in parallel. When it cannot, each additional agent introduces communication overhead, context fragmentation, and merge conflicts, making the system slower than a single agent working sequentially.

The Difference Between Specs That Hold and Specs That Drift

Before deciding whether a workflow needs one agent or five, teams should first determine whether the work has been specified at all. Specification quality is the single largest predictor of multi-agent success or failure, and most discussions about agent count skip it entirely.

In multi-agent systems, agents share no conversation history and operate in isolated context windows. The specification is the only shared artifact that prevents divergence. Without a spec, each agent resolves ambiguity independently: one assumes aggressive error handling while another assumes conservative defaults, one picks a naming convention while another picks a different one. The outputs compile individually but conflict when merged.

Published research on multi-agent failures identifies failure modes such as disobeying task specifications, repeating steps, and failing to recognize task completion. These are specification and system design problems, not model capability problems.

A minimum viable spec for parallel agent work includes scope boundary definitions, interface contracts agents can read but not modify, behavioral constraints in explicit tiers, measurable acceptance criteria, and a dependency graph that identifies which tasks are genuinely safe to parallelize.
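As a rough sketch, that structure can be made concrete in code. The dataclasses below are illustrative only; the field names and shape are assumptions, not a standard spec format:

```python
from dataclasses import dataclass, field

# Illustrative only: field names and structure are hypothetical,
# not a standard spec format.
@dataclass
class SubtaskSpec:
    name: str
    scope: list[str]                    # files/directories this agent may modify
    read_only_contracts: list[str]      # interfaces the agent may read but not change
    constraints: dict[str, list[str]]   # behavioral tiers, e.g. {"must": [...], "should": [...]}
    acceptance: list[str]               # measurable criteria, e.g. "pytest tests/auth passes"

@dataclass
class ParallelSpec:
    subtasks: list[SubtaskSpec]
    # Dependency graph: task name -> names of tasks it must wait for.
    depends_on: dict[str, list[str]] = field(default_factory=dict)

    def parallel_safe(self) -> list[str]:
        """Subtasks with no unmet dependencies are safe to start in parallel."""
        return [t.name for t in self.subtasks if not self.depends_on.get(t.name)]
```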

Intent keeps parallel agents aligned to living specs as requirements evolve, reducing manual reconciliation across services.

5 Questions That Separate Single-Agent Tasks From Multi-Agent Ones

Run the task through these five evaluation questions before reaching for multi-agent orchestration. If the answers mostly point toward single-agent, adding more agents will introduce overhead without a corresponding gain.

1. Can the task be decomposed into subtasks with no sequential dependencies on each other?

If step B requires the complete output of step A, the work is sequential. Multi-agent adds handoff cost without enabling parallelism. Refactoring a function, updating its callers, and then fixing broken tests form a dependency chain in which each step's correctness depends on the previous step's complete output.
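One way to test this mechanically is to group subtasks into dependency "waves": tasks in the same wave have no dependencies on each other and can run in parallel. The sketch below uses the refactor-callers-tests chain as a hypothetical graph:

```python
# A minimal sketch: group subtasks into waves that can run in parallel.
# The example graph is hypothetical.
deps = {
    "refactor_fn": [],
    "update_callers": ["refactor_fn"],
    "fix_tests": ["update_callers"],
}

def parallel_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    done: set[str] = set()
    waves = []
    while len(done) < len(deps):
        # A task is ready once all of its dependencies are done.
        wave = [t for t, d in deps.items()
                if t not in done and all(x in done for x in d)]
        if not wave:
            raise ValueError("cycle in dependency graph")
        waves.append(sorted(wave))
        done.update(wave)
    return waves

print(parallel_waves(deps))
# [['refactor_fn'], ['update_callers'], ['fix_tests']] -> one task per wave,
# so the work is purely sequential and multi-agent adds only handoff cost.
```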

2. Do the subtasks touch disjoint sets of files?

When multiple agents edit overlapping files, they operate on isolated snapshots of the codebase and cannot observe each other's in-flight changes. The overlap risk is highest in shared coordination files such as routing tables, configuration files, and component registries.
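A cheap pre-flight check, assuming each planned task declares the files it expects to touch, is pairwise set intersection:

```python
# Flag any pairwise overlap between the file sets planned agents will touch.
# The task names and paths here are illustrative.
from itertools import combinations

planned = {
    "agent_a": {"services/auth/login.py", "services/auth/tokens.py"},
    "agent_b": {"services/billing/invoice.py", "config/routes.py"},
    "agent_c": {"services/search/index.py", "config/routes.py"},
}

for (a, fa), (b, fb) in combinations(planned.items(), 2):
    overlap = fa & fb
    if overlap:
        print(f"{a} and {b} overlap on: {sorted(overlap)}")
# -> agent_b and agent_c overlap on: ['config/routes.py']
# Shared coordination files like routing tables are exactly where
# overlap risk concentrates.
```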

3. Can each subtask be fully specified without referencing another agent's intermediate output?

If one agent's work depends on decisions another agent has not yet made, the tasks are coupled. Parallel execution produces outputs built on incompatible assumptions that require human reconciliation.

4. Is the task large enough that the orchestration setup cost is proportionally small?

Multi-agent orchestration has fixed costs: defining agent roles, establishing communication protocols, managing handoffs, and reviewing coordination outputs. For a task completable in a single focused session, these costs exceed the time saved through parallelism.

5. Can the engineer review parallel output faster than agents produce it?

All agent output requires human review before entering production. If an engineer cannot review faster than agents produce, parallel agents create a review backlog that eliminates the productivity gain.

Question"Yes" points toward"No" points toward
Subtasks have no sequential dependenciesMulti-agentSingle-agent
Subtasks touch disjoint file setsMulti-agentSingle-agent
Each subtask is independently specifiableMulti-agentSingle-agent
Task is large enough to justify setup costMulti-agentSingle-agent
Engineer can review parallel outputMulti-agentSingle-agent

A majority of "no" answers means a single agent is the right choice. Multi-agent systems should only be considered when most answers are clearly "yes."
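For illustration, the checklist reduces to a few lines of code. The threshold here, requiring at least four of five "yes" answers, is one reading of "most answers are clearly yes":

```python
# A minimal encoding of the five-question checklist. A majority of "no"
# answers points to a single agent; the 4-of-5 threshold is an assumption.
def recommend(answers: dict[str, bool]) -> str:
    yes = sum(answers.values())
    return "multi-agent" if yes >= 4 else "single-agent"

answers = {
    "no_sequential_dependencies": True,
    "disjoint_file_sets": True,
    "independently_specifiable": False,
    "setup_cost_proportionally_small": False,
    "review_keeps_pace_with_output": True,
}
print(recommend(answers))  # single-agent
```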

When Single-Agent Workflows Win

Single-agent workflows outperform multi-agent setups across the scenarios most engineers see in daily work.

  • Single-file edits and targeted bug fixes: When a bug is contained within a single file or a small cluster of related files, there is no unit of work to parallelize. The entire relevant context fits within a single agent's window, and adding agents incurs only coordination overhead.
  • Tightly coupled files with shared state: When files share a global configuration object, a database schema, a singleton service, or a shared interface, parallel agents produce conflicting edits. Each agent works from its own snapshot and cannot observe real-time changes from other agents. Scaling research finds that when agents operate on diverging world states, they can produce incorrect outputs that neither agent detects on its own. The paper also shows that coordination structure matters: independent agents amplify errors, while centralized coordination can contain them through verification.
  • Sequential reasoning chains: Refactoring a function, updating callers, and then fixing tests is a dependency chain. Information loss across summarization and handoff steps compounds in multi-agent workflows.
  • Loosely defined or ambiguous requirements: A single agent applies one interpretation throughout. Multiple agents each apply their own, producing outputs that require manual reconciliation.
  • Small tasks where the setup cost exceeds the execution time: For tasks completable in minutes, the overhead of defining agent roles and reviewing parallel outputs eliminates any speed advantage.
  • Tasks requiring consistent style and architectural judgment: Code quality, naming conventions, and API ergonomics require a single coherent perspective applied throughout. Multiple agents produce output that is internally consistent within each contribution but inconsistent across the whole.
  • When the engineer's review bandwidth is the bottleneck: Parallel agents produce more output faster, but if the engineer cannot review it at the same rate, the result is a review backlog and increased risk from under-reviewed code.

When Multi-Agent Systems Provide Real Value

Multi-agent workflows earn their complexity cost under specific conditions, each of which must hold simultaneously. If any one is missing, a single agent is likely the better choice.

  • Three or more independent modules need to be implemented simultaneously: the Microsoft swarm experiment found that specialized agents with a narrow, well-defined scope were more thorough within their domain than a generalist solo agent.
  • Disjoint files across services or directories: when each agent's expected file set has no overlap, the risk of conflicting edits drops sharply, and file separation becomes a practical prerequisite for running tasks in parallel.
  • Specs defined with measurable acceptance criteria: each subtask can be handed to an agent without ambiguity, and success criteria can be evaluated independently.
  • Work too complex to script but too high-volume for manual effort: this is the intersection where parallel agents can justify their complexity, but only when task boundaries and review processes are already clear.
  • Specialization that materially improves quality: a dedicated reviewer agent, working from a fresh context, catches issues that the writing agent misses because the reviewer is not biased toward the code it just produced.

Intent coordinates specialist agents across isolated workspaces so parallel execution stays aligned to a single spec.

What Working Multi-Agent Workflows Actually Look Like

Multi-agent success follows consistent patterns across documented production implementations. Three structural elements appear in every working example: role-based separation, parallel execution across genuinely independent tasks, and pre-defined specs that eliminate implicit decision-making.

Anthropic's C compiler project ran 16 parallel Claude Code agents in isolated environments, using git branches for task claiming. Tests were the critical success factor; agent count merely enabled scale.

OpenAI's DevDay demos used Codex to perform parallel refactoring across isolated SDKs and samples; success came from stateless tasks with no shared dependencies.

GitHub's Fleet runs independent tasks in parallel while automatically sequencing dependent ones. Its multi-agent guide stresses explicit schemas, action definitions, and interfaces as failure-prevention basics, ahead of raw parallelism.

Scaling Without Full Multi-Agent Orchestration

Several intermediate approaches capture significant portions of the multi-agent benefits without coordination failures or cost overhead.

The simplest is plan-then-execute with a single agent. The agent operates in two explicit phases: a read-only planning phase, then an implementation phase. The engineer reviews the plan at the boundary before any code is written, preventing loops in which agents cycle through unhelpful iterations without human correction.
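A minimal sketch of the two-phase pattern follows. run_agent is a hypothetical placeholder for whatever agent interface is in use; the phase separation and the human gate at the boundary are the point:

```python
# Sketch of plan-then-execute with a single agent.
def run_agent(prompt: str, allow_edits: bool) -> str:
    """Hypothetical placeholder: call your agent with tool permissions
    set according to allow_edits."""
    ...

task = "Add rate limiting to the public API endpoints"

# Phase 1: read-only planning. The agent may inspect the codebase but not edit.
plan = run_agent(
    f"Produce a step-by-step implementation plan for: {task}. "
    "Do not modify any files.",
    allow_edits=False,
)

# Human review at the phase boundary, before any code is written.
print(plan)
if input("Approve plan? [y/N] ").lower() == "y":
    # Phase 2: implementation, constrained to the approved plan.
    run_agent(f"Implement exactly this approved plan:\n{plan}", allow_edits=True)
```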

Asynchronous background delegation extends this pattern further. A single agent runs a well-specified task unattended while the engineer works on other tasks or is offline. Practitioners report this as a primary workflow: assign a well-scoped job, then wait for completion rather than interrupting.
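In code, this can be as simple as launching a non-interactive run in the background. The CLI name and flags below are hypothetical; substitute the unattended mode of whatever agent tool is in use:

```python
# Fire-and-forget delegation. "agent-cli" and its flags are hypothetical.
import subprocess

log = open("migration-task.log", "w")  # stays open for the process lifetime
proc = subprocess.Popen(
    ["agent-cli", "run", "--non-interactive",
     "--spec", "specs/orm-migration.md"],   # well-scoped job, written up front
    stdout=log, stderr=subprocess.STDOUT,
)
print(f"Delegated as PID {proc.pid}; review migration-task.log on completion.")
# The engineer moves on to other work instead of interrupting the run.
```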

For tasks with measurable correctness, an iterative loop with automated quality gates often replaces a separate reviewer role entirely. A single agent runs in a loop in which tests, linters, and CI pipelines provide feedback, rather than human review, at each iteration.
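A sketch of such a loop, reusing the hypothetical run_agent placeholder from above, with pytest and ruff standing in for whatever gates a project already runs:

```python
# Iterative loop where automated gates, not a reviewer agent, provide feedback.
import subprocess

def gates() -> str | None:
    """Run tests and lint; return combined failure output, or None if clean."""
    failures = []
    for cmd in (["pytest", "-q"], ["ruff", "check", "."]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(result.stdout + result.stderr)
    return "\n".join(failures) or None

feedback = gates()
for attempt in range(5):               # bounded retries, not an open loop
    if feedback is None:
        break                          # all gates pass
    run_agent(f"Fix these failures:\n{feedback}", allow_edits=True)
    feedback = gates()
```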

Long, sequential work is well-suited to a checkpoint-based approach. A single agent handles a large, multi-step task structured around explicit checkpoints: commit points, test-passing milestones, or pause-and-review moments. Deeply coupled pipelines with sequential dependencies are better suited to a single agent working sequentially than to multiple coordinated agents.
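A simple way to enforce this is to commit only when the test suite passes, so every checkpoint is a known-good state to review or roll back to. A sketch:

```python
# Checkpoint only on green: each commit marks a known-good state.
import subprocess

def checkpoint(message: str) -> bool:
    if subprocess.run(["pytest", "-q"]).returncode != 0:
        return False                   # never checkpoint a broken state
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"checkpoint: {message}"], check=True)
    return True

# After each major step of a long migration:
# checkpoint("models migrated to new ORM")
```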

For investigation-heavy work, sub-agent delegation can serve as a context firewall rather than a coordination layer. The primary agent spawns focused sub-agents for bounded subtasks, not as role-based specialists but as isolated investigators that handle log analysis, codebase exploration, or targeted research without consuming the main agent's context window.
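The firewall property is that only a bounded summary re-enters the parent's context. A sketch, again using the hypothetical run_agent placeholder:

```python
# Context firewall: the sub-agent explores in its own context and returns
# only a short summary, keeping the parent agent's window clean.
def investigate(question: str) -> str:
    summary = run_agent(
        f"Investigate and answer: {question}. "
        "Return a summary of at most 10 bullet points.",
        allow_edits=False,
    )
    return summary  # only this summary enters the parent agent's context

notes = investigate("Which services read config/routes.py, and how?")
# The parent continues with `notes` instead of pages of exploration logs.
```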

The cleanest pattern is spec-driven execution with the agent as executor. The engineer authors a detailed specification that encodes context, constraints, workflow steps, and acceptance criteria, and the agent operates as a pure executor. Runtime routing decisions are encoded at authoring time in a readable text file rather than delegated to a coordinator agent.

| Approach | Context Windows | Coordination Overhead | Best For |
| --- | --- | --- | --- |
| Plan-then-execute | Single | None | Complex features needing architectural validation |
| Async background agent | Single | None | Well-specified refactoring and migration tasks |
| Iterative loop with quality gates | Single | None | Tasks with measurable correctness via test suites |
| Checkpoint-based execution | Multi (sequential) | None | Multi-day migrations and dependency upgrades |
| Sub-agent context firewall | Multi (bounded) | Minimal | Large codebase investigation and log analysis |
| Spec-driven execution | Single | None | Recurring task types with stable workflows |

Starting with a single-agent workflow and moving to multi-agent coordination only when single-agent approaches show clear limitations produces better outcomes in most engineering workflows. Multi-agent architecture is an upgrade path for specifically identified parallel workloads, not a default starting point.

Failure Modes That Make Multi-Agent Actively Worse

Multi-agent systems introduce failure modes that do not exist in single-agent workflows. These are structural risks.

  • Conflicting implicit decisions: parallel agents each resolve specification ambiguity independently, producing locally coherent implementations that are globally incompatible, and these conflicts often pass code review even as they produce incorrect runtime behavior.
  • Context fragmentation across handoffs: each agent in a pipeline operates on a partial snapshot of the system state, and decisions made by upstream agents are not fully visible to downstream agents, so errors compound as each agent makes locally reasonable decisions based on incomplete information.
  • Context poisoning: as an agent's context grows through exploring bad solutions, its reasoning quality degrades, and that bad exploration can propagate into downstream agents as apparently authoritative prior work.
  • Orchestration overhead consuming parallelism gains: when equal-status agents coordinate through locking and handoffs, the mechanism itself can consume the benefit of parallelism, since every additional agent introduces more intermediate text and more places for errors to compound.
  • Merge conflicts on shared files: multiple agents editing the same codebase generate textual conflicts that git catches and semantic conflicts that it does not, and two agents implementing the same interface differently produce outputs that compile individually but break when integrated.
  • The "bag of agents" anti-pattern: teams add agents without establishing a formal coordination topology, leading to compounding error rates as uncoordinated agents generate overlapping, conflicting, or redundant work, often slowing rather than speeding up a coding task.

When the Cost of Multi-Agent Turns Positive

Multi-agent workflows generally consume substantially more tokens than single-agent workflows, with reported overhead ranging from roughly 2x to over 10x depending on benchmark and architecture. Retry-adjusted cost analysis narrows the effective multiplier closer to 2-3x, but the absolute cost increase remains significant.

Two cost components are frequently undercounted. The first is the token cost, which scales with agent count. Each subagent receives its own context window, and in multi-agent systems, every subagent's response is fed back into the orchestrator's context as input on the subsequent call. This creates an input-token spiral that grows with the number of agents, and token usage can vary widely across different runs of the same task, making cost budgeting unpredictable.
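Back-of-envelope arithmetic makes the spiral concrete. Under the stated assumption that each subagent response is appended to the orchestrator's input on every subsequent call, input tokens grow quadratically with agent count; the numbers below are illustrative:

```python
# Illustrative arithmetic for the input-token spiral. Token counts are
# made-up inputs, not measurements.
base_prompt = 2_000          # orchestrator system prompt + task, in tokens
response = 1_500             # tokens per subagent response

def orchestrator_input_tokens(agents: int) -> int:
    # Call i (0-indexed) carries the base prompt plus i earlier responses.
    return sum(base_prompt + i * response for i in range(agents))

for n in (2, 4, 8):
    print(n, orchestrator_input_tokens(n))
# 2 -> 5500, 4 -> 17000, 8 -> 58000: input cost grows quadratically
# with agent count, not linearly.
```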

The second is coordination cost, which scales with the complexity of orchestration. Time spent writing specs, defining agent roles, managing handoffs, resolving conflicts, and reviewing parallel outputs is real engineering time. For short or simple tasks, this investment eliminates any speed advantage the parallelism provides.

The return on investment turns positive under a specific set of conditions: the task is large enough that orchestration setup is proportionally small, typically three or more independent modules; the spec is already written or can be written once and reused; and parallelized execution saves more time than coordination consumes.
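A hedged break-even sketch shows how these conditions interact; every number below is an illustrative input, not a measurement:

```python
# Break-even sketch: parallel wall-clock time vs. single-agent time.
setup_hours = 4.0        # writing specs, defining roles, wiring handoffs
review_hours = 3.0       # reviewing parallel output streams
single_agent_hours = 12.0
modules = 3              # independent modules run in parallel

parallel_hours = single_agent_hours / modules + setup_hours + review_hours
print(parallel_hours)    # 11.0 -> barely under 12.0, so ROI is marginal

# Halve the task and the same fixed overhead makes multi-agent strictly worse:
print(6.0 / modules + setup_hours + review_hours)  # 9.0 vs 6.0 single-agent
```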

Architecture choices matter more than agent count for cost efficiency. Anthropic's tool-use guidance documents a 37% reduction in token usage through improved architecture alone, without changing agent count. Intent's Context Engine applies semantic analysis across 400,000+ files to reduce redundant context loading across coordinator, implementer, and verifier agents.

How to Decide Between Single-Agent and Multi-Agent

Each list below describes the conditions under which a given approach is the right default. Match the task against the lists, and the agent count follows from the answers rather than from intuition.

Use a single agent when:

  • The task involves a single file or a small cluster of tightly coupled files
  • Files share state through global configuration, shared schemas, or singleton services
  • Each step depends on the complete output of the previous step
  • Requirements are vague, ambiguous, or still being discovered
  • The task is completable in a single focused session
  • Consistent style and architectural judgment must be maintained throughout
  • The engineer's review bandwidth is already saturated

Consider multi-agent when:

  • Three or more independent modules need simultaneous implementation
  • Each agent's expected file set is disjoint or nearly so
  • Written specs define each agent's scope, interfaces, and acceptance criteria
  • The work is too complex to script, but too high-volume for manual effort
  • Specialized roles, such as coder, reviewer, or researcher, add measurable quality
  • The engineer has the bandwidth to review parallel output streams

Use an intermediate approach when:

  • The task is large but sequential, requiring checkpoints rather than parallelism
  • The work is well-specified, but does not need parallel execution
  • Sub-tasks would benefit from context isolation without full orchestration
  • The team wants autonomous execution with automated quality gates

Exhausting single-agent capabilities before adding coordination layers yields better outcomes in most engineering workflows.

Write the Spec Before Choosing Agent Count

The single-agent versus multi-agent decision becomes much clearer once the work is properly specified. A clear specification reveals whether tasks are genuinely independent, whether file boundaries are disjoint, and whether acceptance criteria can be evaluated in isolation. Without that specification, adding agents increases the surface area for conflict without increasing productivity.

For teams ready to move beyond ad-hoc agent prompting, the highest-leverage investment is not more agents but better specifications. Start by writing the spec for the next feature, map the dependency graph, and identify which tasks touch overlapping files. The answer to "how many agents?" follows directly from that analysis.

See how Intent orchestrates coordinator, implementer, and verifier agents against a living spec, keeping parallel execution aligned as plans evolve.

Written by

Ani Galstian

Ani writes about enterprise-scale AI coding tool evaluation, agentic development security, and the operational patterns that make AI agents reliable in production. His guides cover topics like AGENTS.md context files, spec-as-source-of-truth workflows, and how engineering teams should assess AI coding tools across dimensions like auditability and security compliance.
