Frequently Asked Questions

When should teams move from a single agent to a multi-agent architecture?

Move to multi-agent only after a single agent with strong tooling and prompt engineering hits clear limits. The MAST taxonomy shows many remaining failures come from coordination and organizational design, so added agents help only when the bottleneck is coordination rather than model capability.

What is the main trade-off between orchestrator-worker and peer-to-peer designs?

Orchestrator-worker centralizes synthesis and gives one agent the full task picture, but its context grows over time and the orchestrator becomes a single point of failure. Peer-to-peer bounds each agent's context by role, yet emergent errors are harder to trace and local failures can propagate.

How do multi-agent systems prevent context collision?

Separate memory into three zones. A shared store holds validated facts, a private scratchpad isolates each agent's reasoning, and an episodic log preserves an audit trail so unfinished reasoning is never retrieved as validated fact.

Which coordination pattern fits coding agents?

Coding agents need retrieval, tool interfaces, sandboxed execution, and verifier review, with the pattern depending on whether change locations are known before execution. At repository scale, graph-based localization and the Context Engine inside Augment's Cosmos give agents code-structure context before edits.

What controls keep agentic workflows bounded on cost?

Step counts, time budgets, and circuit breakers cap loop length and kill stuck agents after repeated failures. Human-in-the-loop approval gates pause execution at critical decision points so cost and risk stay reviewable.

Agentic AI Architecture Patterns: 9 System Designs

System-level agentic AI architecture begins when architects choose among nine coordination patterns. Delegation, state sharing, and failure boundaries determine whether agents can complete work without uncontrolled cost or context growth. The nine documented patterns are single-agent loop, orchestrator-worker, supervisor, hierarchical, swarm, network/mesh, pipeline, evaluator-optimizer, and router. Each pattern changes control, latency, cost, and observability through a specific coordination mechanism. The right choice depends on whether teams can predict subtasks before execution and how much context agents must share during execution.

TL;DR

MAST catalogs specification issues, inter-agent misalignment, and task verification failures at coordination boundaries. Guidance from Anthropic, LangChain, and AWS, along with research published on arXiv, shows architects should choose structure before adding agents. In coding-agent workflows, Augment's Cosmos, a unified cloud agents platform, pairs its Context Engine with semantic dependency graph analysis for dependency-aware codebase understanding.

In cross-service code changes, a single agent can lose coherence as its context fills with conversation history, tool outputs, and prior code. The instinct is to add agents. Anthropic treats multi-agent execution overhead as a design constraint, especially because 'most coding tasks involve fewer truly parallelizable tasks than research.' Architects also need to bound execution through step counts, time budgets, checkpoint recovery, and failure isolation. In codebase onboarding workflows, the Context Engine behind Augment's Cosmos cuts developer onboarding from months to days.

Augment's Cosmos runs coding agents in the cloud and applies this same isolation at the platform level. Its Agent Runtime schedules and isolates parallel coding agents so each worker keeps a private memory scope, while a shared virtual filesystem with tenant memory preserves cross-file relationships and lets each agent return condensed summaries to the shared layer.

The Nine System-Level Agentic Architecture Patterns

The nine system-level patterns organize agent coordination by control model and subtask timing, including whether teams predefine subtasks or determine them at runtime. Architects use those dimensions to match autonomy, observability, and failure boundaries before adding agents. The patterns form a continuum from deterministic chains through single-agent systems to multi-agent architectures. Azure Databricks describes this continuum in its agent design patterns.

Pattern	Control Model	Subtask Definition	Key Trade-off
Single-Agent Loop	Centralized (one agent)	Dynamic, self-directed	Simplicity vs. capability ceiling
Orchestrator-Worker	Centralized orchestrator	Dynamic, determined at runtime	Flexibility vs. predictability
Supervisor / Agent-as-Tool	Centralized supervisor	Delegated to sub-agents as tools	Generality vs. performance
Hierarchical	Tree-structured, multi-level	Nested decomposition	Comprehensiveness vs. latency/cost
Swarm	Decentralized, peer handoff	Emergent	Flexibility vs. observability
Network / Mesh	Decentralized, peer-to-peer	Collaborative	Collaboration vs. coordination overhead
Pipeline	Sequential, predetermined	Pre-defined stages	Predictability vs. rigidity
Evaluator-Optimizer	Dual-role iterative	Fixed generator/evaluator roles	Quality vs. execution cost
Router	Classifier-dispatcher	Pre-classified by domain	Parallelism vs. synthesis complexity

Single-Agent Loop: The Baseline

The single-agent loop keeps coordination to one model, one tool-routing loop, and one task state. Teams get a bounded control model before inter-agent handoff surfaces appear. LangGraph's create_react_agent is the standard single-agent implementation used in single-agent benchmarking. LangChain's progression is direct: 'Start with a single agent and good prompt engineering. Add tools before adding agents. Graduate to multi-agent patterns only when you hit clear limits.'
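A minimal sketch of that baseline, assuming a stubbed model in place of a real LLM call. The names `TOOLS`, `stub_model`, and `run_single_agent` are hypothetical and are not part of LangGraph; the point is only to show one model, one tool-routing loop, and one task state under a step budget.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 files mention '{query}'",
    "read_file": lambda path: f"<contents of {path}>",
}


def stub_model(task: str, history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: returns (action, argument)."""
    if not history:
        return "search", task
    if len(history) == 1:
        return "read_file", "billing/service.py"
    return "finish", f"Answered '{task}' using {len(history)} observations"


def run_single_agent(task: str, max_steps: int = 5) -> str:
    history: list[str] = []            # the single task state
    for _ in range(max_steps):         # step budget bounds the loop
        action, arg = stub_model(task, history)
        if action == "finish":
            return arg
        tool = TOOLS.get(action)
        if tool is None:               # unknown tool: record it and continue
            history.append(f"error: unknown tool {action}")
            continue
        history.append(tool(arg))      # observation feeds the next model call
    return "stopped: step budget exhausted"
```

Because everything lives in one `history` list, there is no handoff surface to debug, which is the property the baseline is meant to preserve.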

Orchestrator-Worker: Centralized Fan-Out

The orchestrator-worker pattern uses a central LLM to decompose tasks, create worker assignments, and synthesize outputs through shared state. Architects use it when runtime flexibility matters because subtasks cannot be known in advance. Anthropic defines its flexibility this way: 'subtasks aren't pre-defined, but determined by the orchestrator based on the specific input,' according to the orchestrator-worker pattern. This applies to complex tasks such as 'coding products making complex changes to multiple files, or search tasks gathering information from multiple sources.' LangGraph's Send API enables dynamic worker node creation, where each worker has its own state.
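The fan-out can be sketched with the standard library alone. This is an illustration under stated assumptions, not LangGraph's Send API: `decompose`, `run_worker`, and `orchestrate` are hypothetical names, and the decomposition is a stub where a real orchestrator would call an LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class WorkerState:
    subtask: str
    notes: list[str] = field(default_factory=list)  # private to one worker


def decompose(task: str, files: list[str]) -> list[str]:
    """Stand-in for the orchestrator LLM: subtasks depend on the input."""
    return [f"{task} in {path}" for path in files]


def run_worker(subtask: str) -> str:
    state = WorkerState(subtask)                    # each worker owns its state
    state.notes.append(f"inspected: {subtask}")
    return f"done: {state.subtask}"


def orchestrate(task: str, files: list[str]) -> str:
    subtasks = decompose(task, files)               # determined at runtime
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_worker, subtasks))  # map preserves order
    return "; ".join(results)                       # centralized synthesis
```

The number of workers follows from the input rather than from a fixed graph, which is the flexibility the pattern trades against predictability.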

Supervisor / Agent-as-Tool

The supervisor pattern centralizes delegation by exposing sub-agents as tools, each with its own prompt, LLM, tools, and isolated scratchpad. It keeps routing centralized when calendar, email, CRM, and database workflows need controlled context exposure. Sub-agents do not share a scratchpad; the workflow appends only their final responses to a global scratchpad, per multi-agent workflows. LangChain now recommends 'using the supervisor pattern directly via tools rather than this library [langgraph-supervisor] for most use cases' because tool calling gives more control over context engineering.
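A rough sketch of the scratchpad isolation described here, with hypothetical names throughout (`make_sub_agent`, `route`, `supervise`). The routing function is a keyword stub standing in for the supervisor LLM's tool choice.

```python
from typing import Callable


def make_sub_agent(name: str) -> Callable[[str], str]:
    def sub_agent(request: str) -> str:
        # The scratchpad is local to this call and never leaves the sub-agent.
        scratchpad = [f"{name} received: {request}"]
        scratchpad.append(f"{name} picked a tool")
        return f"{name} handled '{request}' in {len(scratchpad)} steps"
    return sub_agent


SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "calendar": make_sub_agent("calendar"),
    "email": make_sub_agent("email"),
}


def route(request: str) -> str:
    """Stand-in for the supervisor LLM choosing which tool to call."""
    return "calendar" if "meeting" in request else "email"


def supervise(requests: list[str]) -> list[str]:
    global_scratchpad: list[str] = []
    for request in requests:
        agent = SUB_AGENTS[route(request)]
        global_scratchpad.append(agent(request))   # final responses only
    return global_scratchpad
```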

Hierarchical Task Decomposition

The hierarchical pattern uses tree-structured delegation. A top-level manager decomposes a goal, delegates to specialized child agents, reviews output, and approves or returns work for revision. This structure supports complex, ambiguous problems and compliance or approval gates. Google Cloud hierarchical pattern guidance identifies increased model calls, latency, operational costs, and 'considerable architectural complexity' as the cost.

Dimension	Hierarchical	Swarm	Network / Mesh	Pipeline	Router
Observability	Review gates support oversight	Central observability is reduced	Distributed collaboration reduces central visibility	Handoffs are predefined	Dispatch path is classified upfront
Failure boundary	Manager review can approve or return work	Local failures can propagate	Coordination overhead can spread across peers	Stage boundaries are explicit	Misclassification affects dispatch
Context sharing	Delegated work returns for review	Agents hand off directly	Specialists share information directly	Context moves stage by stage	Inputs route to specialized agents

Swarm and Network/Mesh: Decentralized Coordination

Swarm and network/mesh patterns use decentralized peer coordination. They trade central observability for direct specialist handoff. The swarm pattern lets each sub-agent hand off to any other agent with one agent active at a time and no central orchestrator. The network/mesh pattern lets specialist agents share information directly in peer-to-peer collaboration. Swarm fits collaborative specialist work, but swarm anti-patterns caution against it for 'deterministic tasks, resource-constrained environments, simple sequential workflows, or regulatory environments requiring explainable decisions.' Swarm also 'may not be feasible when working with third-party agents' because each sub-agent must know all other agents.
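The swarm handoff rule (one active agent at a time, no central orchestrator) can be illustrated as follows. The agents and the `max_handoffs` budget are assumptions added for the example; real swarm agents would decide handoffs through model calls.

```python
from typing import Callable, Optional

# Each agent returns (updated message, name of next agent or None when done).
AgentFn = Callable[[str], tuple[str, Optional[str]]]


def triage(msg: str) -> tuple[str, Optional[str]]:
    target = "billing" if "invoice" in msg else "support"
    return msg + " -> triaged", target


def billing(msg: str) -> tuple[str, Optional[str]]:
    return msg + " -> billed", None


def support(msg: str) -> tuple[str, Optional[str]]:
    return msg + " -> supported", None


AGENTS: dict[str, AgentFn] = {"triage": triage, "billing": billing, "support": support}


def run_swarm(message: str, start: str = "triage", max_handoffs: int = 5):
    active = start                      # exactly one agent is active at a time
    trace: list[str] = []               # minimal trace to recover observability
    for _ in range(max_handoffs):
        trace.append(active)
        message, next_agent = AGENTS[active](message)
        if next_agent is None:
            return message, trace
        active = next_agent             # peer handoff, no orchestrator involved
    raise RuntimeError("handoff budget exhausted")
```

Every agent has to know the names in `AGENTS` to hand off, which is why the pattern gets awkward with third-party agents.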

Pipeline, Evaluator-Optimizer, and Router

Pipeline, evaluator-optimizer, and router patterns constrain agent behavior with fixed sequencing, iterative review, or classifier dispatch. Teams use them when they can define stages, evaluation roles, or domains before execution. The pipeline pattern arranges agents in a fixed sequence, which limits coordination to handoff between predefined stages. The trade-off is rigidity: 'the rigid, predefined structure makes it difficult to adapt to dynamic conditions,' per pipeline constraints. The evaluator-optimizer pattern pairs a generator LLM with an evaluator LLM in an iterative cycle that 'resembles writer-editor collaboration.' The added review step raises execution cost. The router pattern classifies input, dispatches it to one or more specialized agents, and synthesizes the results. It applies to single requests, parallel execution, and large-domain inputs.
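As one illustration, the evaluator-optimizer cycle can be sketched with stubbed roles. `generate`, `evaluate`, and `refine` are hypothetical names, and the feedback strings are invented for the example. The `max_rounds` cap is an assumption that keeps the added review cost bounded.

```python
def generate(task: str, feedback: list[str]) -> str:
    """Stand-in generator: applies every piece of feedback it has received."""
    draft = f"summary of {task}"
    if "cite sources" in feedback:
        draft += " [sources: 2]"
    if "state the risk" in feedback:
        draft += " [risk: low]"
    return draft


def evaluate(draft: str) -> list[str]:
    """Stand-in evaluator: returns open issues, or an empty list to accept."""
    issues = []
    if "[sources" not in draft:
        issues.append("cite sources")
    if "[risk" not in draft:
        issues.append("state the risk")
    return issues


def refine(task: str, max_rounds: int = 3) -> tuple[str, bool]:
    feedback: list[str] = []
    draft = ""
    for _ in range(max_rounds):            # each round costs two model calls
        draft = generate(task, feedback)
        issues = evaluate(draft)
        if not issues:
            return draft, True             # evaluator accepted the draft
        feedback.extend(issues)            # writer-editor style revision
    return draft, False                    # budget spent without acceptance
```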

Orchestrator-Worker vs. Peer-to-Peer: The Central Tradeoff

The orchestrator-worker versus peer-to-peer decision controls whether agents coordinate through centralized synthesis or distributed peer exchange. A practical multi-agent orchestration walkthrough maps these roles to explicit task decomposition, shared state, and verification gates. In a centralized design, a single lead agent decomposes the task, assigns each subtask to a worker, exchanges coordination messages, and synthesizes the final answer, while each worker sees only its own context and the lead's messages. In a decentralized design, workers act as peers without an orchestrator, exchanging findings each round and stopping once they reach a consensus threshold.

Dimension	Orchestrator-Worker	Peer-to-Peer/Decentralized
Execution cost	Coordination cost grows as messages and synthesis pile up	Coordination cost grows as peer updates broadcast widely
Coordination risk	Coordination seams appear across frameworks	Emergent errors are harder to trace
Failure isolation	Orchestrator is a single point of failure	No single point of failure, but local failures can propagate
Observability	Centralized; orchestrator has full task picture	Distributed; emergent behaviors arise unprogrammed
Context growth	Orchestrator context grows over time	Distributed; each agent's context is bounded by role

DevOps teams can map these coordination seams to specific orchestration platforms at the implementation layer.

Anthropic's production Research system shows how to manage these costs. Vague instructions like 'research the semiconductor shortage' caused subagents to 'misinterpret the task or perform the exact same searches as other agents.' Each subagent now requires 'an objective, an output format, guidance on the tools and sources to use, and clear task boundaries.' To bound context growth, Anthropic uses an artifact pattern where specialized agents create outputs that persist independently, passing lightweight references back to the coordinator. Specialized sub-agents also keep isolated task state, explore focused tasks extensively, and return condensed summaries, as documented in the multi-agent research system.
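The four required brief fields and the artifact pattern can be sketched together. This is an illustration, not Anthropic's implementation: `SubagentBrief`, `ArtifactStore`, and `run_subagent` are hypothetical names, and the subagent's output is fabricated filler that stands in for real findings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SubagentBrief:
    objective: str
    output_format: str
    tool_guidance: str
    boundaries: str

    def validate(self) -> None:
        # Reject vague delegation: every field must be filled in.
        for name, value in asdict(self).items():
            if not value.strip():
                raise ValueError(f"brief is missing: {name}")


class ArtifactStore:
    """Holds full outputs so the coordinator only sees lightweight references."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, content: str) -> str:
        ref = f"artifact-{len(self._items) + 1}"
        self._items[ref] = content
        return ref

    def get(self, ref: str) -> str:
        return self._items[ref]


def run_subagent(brief: SubagentBrief, store: ArtifactStore) -> dict[str, str]:
    brief.validate()
    full_output = f"Findings for: {brief.objective}\n" + "detail line\n" * 50
    ref = store.put(full_output)            # artifact persists independently
    return {"ref": ref, "summary": f"{brief.objective}: 50 detail lines stored"}
```

The coordinator's context grows by one short summary per subagent instead of by the full output.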

Memory and Context-Sharing Architecture

Multi-agent memory architecture separates transient state, task history, durable facts, and learned procedures. Agents can then share validated knowledge without mixing private reasoning into shared state. The CoALA framework's single-agent decomposition maps these functions to working memory for transient task state, episodic memory for task histories and decision traces, semantic memory for durable facts and domain models, and procedural memory for learned workflows and tool usage patterns.

A practical memory implementation keeps shared state narrow and validated:

Separate working memory from long-term memory so transient task state does not become durable fact.
Persist within-thread graph state with checkpointers so interrupted workflows can resume from saved state.
Store cross-thread memories in scoped namespaces so agents retrieve only relevant durable facts.
Keep private scratchpads per agent so unfinished reasoning is not exposed as validated knowledge.
Write shared conclusions to a Library and retain an Episodic Log for audit and decision traces.

LangGraph separates short-term and long-term memory. Short-term, within-thread memory uses checkpointers that 'save graph state at each step,' while long-term, cross-thread memory uses 'a persistent document store that lets you put, get, and search for memories,' scoped to custom namespaces. For swarm systems, teams pass both short-term checkpointers and long-term stores at compile time. Checkpointer backends include InMemorySaver, SqliteSaver, and PostgresSaver.
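To make the within-thread idea concrete, here is a standard-library sketch of saving state at each step and resuming after an interruption. It is not LangGraph's checkpointer API: `DictCheckpointer`, `STEPS`, and `run_thread` are hypothetical, and the in-memory dict stands in for a real backend.

```python
import json
from typing import Optional

STEPS = ["plan", "edit", "test"]


class DictCheckpointer:
    """Hypothetical in-memory checkpointer keyed by thread id."""

    def __init__(self) -> None:
        self._saved: dict[str, str] = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._saved[thread_id] = json.dumps(state)   # snapshot, not a reference

    def load(self, thread_id: str) -> dict:
        raw = self._saved.get(thread_id)
        return json.loads(raw) if raw is not None else {"step": 0, "done": []}


def run_thread(thread_id: str, cp: DictCheckpointer,
               fail_at: Optional[str] = None) -> list[str]:
    state = cp.load(thread_id)                       # resume from saved state
    for index in range(state["step"], len(STEPS)):
        if STEPS[index] == fail_at:
            raise RuntimeError(f"interrupted at {fail_at}")
        state["done"].append(STEPS[index])
        state["step"] = index + 1
        cp.save(thread_id, state)                    # checkpoint after each step
    return state["done"]
```

A run that fails at `edit` leaves `plan` checkpointed, so the next call on the same thread id starts from `edit` instead of repeating completed work.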

A recurring memory failure in multi-agent systems is context collision. Without separate zones, an agent running a tool call may retrieve another agent's unfinished reasoning as validated fact. A three-zone memory design separates a Library of shared validated facts, a private Scratchpad per agent for isolated reasoning, and an Episodic Log audit trail. This matters because production coordination needs explicit memory infrastructure that an implicit, orchestrator-held context cannot provide.
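A three-zone layout might look like the sketch below. The class and method names are hypothetical, and the `validated` flag is a placeholder for whatever verification step a real system runs before a fact is shared.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreeZoneMemory:
    library: dict[str, str] = field(default_factory=dict)            # shared, validated
    scratchpads: dict[str, list[str]] = field(default_factory=dict)  # private per agent
    episodic_log: list[tuple[str, str, str]] = field(default_factory=list)  # audit trail

    def think(self, agent: str, note: str) -> None:
        # Unfinished reasoning stays in the agent's own scratchpad.
        self.scratchpads.setdefault(agent, []).append(note)
        self.episodic_log.append((agent, "think", note))

    def publish(self, agent: str, key: str, fact: str, validated: bool) -> bool:
        if not validated:
            self.episodic_log.append((agent, "rejected", key))
            return False
        self.library[key] = fact
        self.episodic_log.append((agent, "publish", key))
        return True

    def retrieve(self, agent: str, key: str) -> Optional[str]:
        # Reads see only the Library, never another agent's scratchpad.
        return self.library.get(key)
```

Since `retrieve` has no code path into `scratchpads`, a hunch recorded by one agent cannot surface as a fact for another; the log still keeps the full decision trace for audit.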

In code review, the Context Engine reaches a 59% F-score by mapping codebase structure and evaluating each change against the broader codebase before it moves forward. Cosmos ships this capability as a Deep Code Review expert that runs context-aware review across the repository.

Failure Modes and Error-Handling Patterns

Agentic failure handling requires explicit detection, bounded loops, and validated tool interfaces. These failures 'can often be silent, occurring without generating clear error signals while still deviating from the intended behavior,' per agentic failures. The MAST taxonomy catalogs failure modes in three categories.

Failure Category	Representative Modes
Specification Issues	Disobeying task spec, repeating steps, losing history, failing to recognize completion
Inter-Agent Misalignment	Conversation reset, no clarification requests, task derailment, information withholding
Task Verification Failures	Missing details in final output, premature termination

MAST ties many failures to system organization rather than single-agent capability, which changes the architecture decision. 'Many MAS failures arise from the challenges in organizational design and agent coordination rather than the limitations of individual agents,' per MAST findings. This boundary supports adding multi-agent complexity only when single-agent limits are hit and the remaining failure categories are coordination problems instead of missing model capability.

Architectural mitigations use schemas, whitelists, budgets, supervisors, and retry controls to bound probabilistic reasoning across tool access, loop length, and retry behavior:

Whitelist tool names before execution to block hallucinated tool calls.
Validate tool data against schemas before execution.
Catch infinite loops with step counts or time budgets.
Add deadlock detection and supervisor pre-emption for stuck workflows.
Use circuit breakers to monitor repeated failed requests per agent.
Apply retry policies for transient failures.
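The first two mitigations in the list above can be sketched as a single pre-execution check. The tool names and the type-based schema format are assumptions made for the example; a production system would more likely use a schema library.

```python
# Hypothetical whitelist: tool name -> required argument names and types.
ALLOWED_TOOLS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "run_tests": {"target": str, "timeout_s": int},
}


def validate_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may execute."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:                       # blocks hallucinated tool names
        return [f"tool not whitelisted: {name}"]
    errors = []
    for key, expected in schema.items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    errors.extend(f"unexpected argument: {key}" for key in args if key not in schema)
    return errors
```

Returning the problems as data rather than raising lets the caller feed them back to the model as an observation and count the attempt against a retry budget.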

One practitioner running enterprise systems learned this directly: a legal review system entered an infinite loop of replanning when one agent consistently failed, so circuit breakers now kill stuck agents after repeated failures.
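A per-agent circuit breaker of the kind described can be sketched in a few lines. The threshold of three consecutive failures and the names `CircuitBreaker` and `run_with_breaker` are assumptions for illustration, not details from the practitioner's system.

```python
from typing import Callable


class CircuitBreaker:
    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: dict[str, int] = {}   # consecutive failures per agent

    def record(self, agent: str, ok: bool) -> None:
        self.failures[agent] = 0 if ok else self.failures.get(agent, 0) + 1

    def is_open(self, agent: str) -> bool:
        return self.failures.get(agent, 0) >= self.max_failures


def run_with_breaker(agent: str, attempt: Callable[[], bool],
                     breaker: CircuitBreaker, max_steps: int = 10) -> str:
    for step in range(1, max_steps + 1):     # step budget as a second bound
        if breaker.is_open(agent):
            return f"{agent} stopped by circuit breaker after {step - 1} attempts"
        ok = attempt()
        breaker.record(agent, ok)
        if ok:
            return f"{agent} succeeded on attempt {step}"
    return f"{agent} stopped: step budget exhausted"
```

With an agent that always fails, the loop ends after three attempts instead of replanning indefinitely.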

Reliability comes from component boundaries, validated interfaces, control loops, and telemetry. It 'is earned through principled componentisation (goal manager, planner, tool-router, executor, memory, verifiers, safety monitor, telemetry), disciplined interfaces (schema-constrained, validated, least-privilege tool calls), and explicit control loops,' per componentized reliability.

Coding-Agent Architecture: A Concrete Application

Coding-agent architecture depends on retrieval, tool use, planning, sandboxing, and review around the model. SWE-agent introduced the Agent-Computer Interface comprising search/navigation, file viewer, file editor, and context management. It runs a ReAct loop atop the Linux shell with Docker containers for sandboxed execution.

SWE-Bench codebases can exceed the capacity of a single model invocation, per SWE-Bench capacity. Three localization approaches address codebase scale. Embedding-based retrieval selects relevant code by similarity. LLM-based agentic localization uses agents to find change locations. Graph-based navigation builds dependency graphs over code entities in systems like LocAgent and RepoGraph, with OrcaLoca designing specialized sub-agents for localization.
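To show the general idea behind graph-based navigation, the sketch below builds a module-level import graph with Python's `ast` module and walks it in reverse to find what a change could affect. It is a deliberately small illustration and not the method used by LocAgent, RepoGraph, or OrcaLoca. The function names and the `sources` mapping of module name to source text are assumptions.

```python
import ast
from collections import deque


def build_import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repo modules it imports."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            graph[name].update(t for t in targets if t in sources)
    return graph


def impacted_by(changed: str, graph: dict[str, set[str]]) -> list[str]:
    """Breadth-first walk over reverse edges: who depends on `changed`?"""
    dependents: dict[str, set[str]] = {name: set() for name in graph}
    for importer, imported in graph.items():
        for target in imported:
            dependents[target].add(importer)
    seen, queue = {changed}, deque([changed])
    while queue:
        current = queue.popleft()
        for importer in sorted(dependents[current]):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return sorted(seen - {changed})
```

For a toy repository where `auth` imports `models` and `api` imports from `auth`, a change to `models` reports both `api` and `auth`, while an unrelated module is left out of the agent's context.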

Coding-agent system design has to bind codebase understanding to execution controls:

Retrieve relevant files and symbols before planning edits.
Use tool interfaces for search/navigation, file viewing, and file editing.
Run model actions in a sandboxed environment when executing code changes.
Localize changes through embedding retrieval, agentic localization, or graph-based navigation.
Review outputs through verifier loops before changes move forward.

Dependency-aware indexing gives coding agents code-structure context before edits. Inside Cosmos, the Context Engine reaches a 70.6% SWE-bench Verified score and processes entire codebases across 400,000+ files through semantic dependency graph analysis. It maintains a real-time knowledge graph that updates within seconds of code changes. Dependency-aware understanding is one axis that separates tools built for complex codebases from retrieval-only approaches.

Coding teams can justify multi-agent design when boundaries of privileged information exist between agents, or when distinct agents must represent different stakeholders. Cosmos composes these workflows from three primitives. Environments define where agents run and what they can touch, Experts define how each agent behaves and which tools it uses, and Sessions turn one-off prompts into auditable, replayable runs. Agents share a virtual filesystem with tenant and private memory, so specialist Experts such as PR Author and Deep Code Review can run in parallel over common context while keeping their own reasoning isolated. For adjacent IT automation scenarios, computer-using agents frame implementation and verification around predefined delivery constraints.

Human-in-the-Loop and Cost-Control Architecture

Human-in-the-loop and cost-control architecture uses approval gates, checkpoint recovery, and execution budgets to keep autonomous workflows reviewable and bounded. Human-in-the-loop architecture pauses agent execution for human review at critical decision points. Human-on-the-loop lets agents execute independently while humans monitor at a supervisory level. LangChain's HumanInTheLoopMiddleware implements approval gates through an interrupt_on configuration map where each tool has its own configuration.
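A per-tool approval gate can be sketched without any framework. This is not the HumanInTheLoopMiddleware API: the `interrupt_on` dictionary here only borrows the idea of a per-tool map, and `execute_with_gate`, `run_tool`, and `approve` are hypothetical names. Defaulting unknown tools to review is a design assumption of the sketch.

```python
from typing import Callable


def execute_with_gate(
    tool: str,
    args: dict,
    interrupt_on: dict[str, bool],
    run_tool: Callable[[str, dict], str],
    approve: Callable[[str, dict], bool],
) -> str:
    # Tools missing from the map default to requiring review.
    if interrupt_on.get(tool, True):
        if not approve(tool, args):          # execution pauses on the reviewer
            return f"{tool}: rejected by reviewer"
    return run_tool(tool, args)
```

In use, `approve` would block on a human decision, for example a ticket or a chat prompt, while low-risk tools marked `False` run without interruption.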

Choose a Coordination Structure Before Adding Agents

The instinct to add agents when a single agent loses coherence often creates harder problems. Coordination cost rises, central observability drops, and local failures propagate across peers. Before scaling out, weigh whether teams can predict subtasks in advance and how tightly agents must share state while they run. Those two answers point to a specific pattern, from a single-agent loop through orchestrator-worker to decentralized peer designs. Bound every choice with step counts, time budgets, checkpoint recovery, and failure isolation so autonomy stays reviewable. For coding workflows, Augment's Cosmos composes these patterns from Environments, Experts, and Sessions over a shared filesystem with tenant and private memory, so parallel agents keep isolated reasoning while reusing common codebase context.

Written by

Ani Galstian
