The right multi-agent architecture depends on task interdependency. Swarm patterns fit independent workloads where routing logic is embedded in the task itself and agents hand off sequentially through distributed control. Supervisor patterns fit workflows that require dynamic routing, ordered execution, and conflict resolution through a central coordinator.
TL;DR
Multi-agent systems fail when the orchestration pattern mismatches the task structure. Swarm architectures distribute routing decisions across autonomous agents and excel at independent workloads. Supervisor architectures centralize coordination through a routing agent and handle complex dependencies. Production teams increasingly deploy hybrids: supervisor planning with parallel execution.
Why Pattern Selection Matters More Than Agent Count
Engineering teams building multi-agent systems face a fundamental architectural decision before writing a single line of orchestration code. Choose a swarm pattern for a task requiring strict ordering, and agents drift into contradictory outputs. Choose a supervisor for embarrassingly parallel work, and the coordinator becomes a throughput bottleneck.
Multi-agent systems consume approximately 15x more tokens than chat interactions in production. Research across coordination architectures shows that performance swings widely depending on whether the architecture matches the task structure. Mismatched patterns produce measurable degradation even when individual agents perform well.
See how Cosmos coordinates parallel agents with shared context and tenant memory that compounds across your team.
Free tier available · VS Code extension · Takes 2 minutes
What Is a Swarm Architecture?
A swarm architecture distributes routing intelligence across agents, with each agent encapsulating its own instructions, tools, and handoff logic. Each agent decides independently when to transfer control to another specialist. No single agent has a global view of the workflow, which makes swarms lightweight to set up but difficult to debug when handoff logic produces unexpected routing.
OpenAI's Swarm framework, deprecated in March 2025 and superseded by the OpenAI Agents SDK, defined this pattern through two primitives: Agent objects and handoffs. An agent hands off execution when one of its functions returns another Agent object, and the framework switches the active agent while preserving conversation history. The Agents SDK retains the same model while adding guardrails, tracing, and production-grade state management.
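The handoff mechanic can be illustrated without any framework. The sketch below is a simplified model, not the Swarm or Agents SDK API: the `Agent` dataclass, the `step` callables, and `run_swarm` are all hypothetical names, and each step function stands in for an LLM turn that may return the next agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Hypothetical stand-in for a swarm agent: a name plus a step function."""
    name: str
    # step(history) returns (reply_text, next_agent_or_None)
    step: Callable[[list], tuple]


def run_swarm(start: Agent, user_message: str, max_turns: int = 10) -> tuple:
    """Run one agent at a time; a returned Agent triggers a handoff."""
    history = [("user", user_message)]  # shared history survives every handoff
    active = start
    for _ in range(max_turns):
        reply, next_agent = active.step(history)
        history.append((active.name, reply))
        if next_agent is None:  # no handoff requested, so the run is finished
            break
        active = next_agent  # switch the active agent, keep the history
    return active.name, history


def billing_step(history):
    return "Refund issued.", None


billing = Agent("billing", billing_step)


def triage_step(history):
    # Routing logic lives inside the agent itself, not in a coordinator.
    if "refund" in history[0][1].lower():
        return "Routing to billing.", billing
    return "How can I help?", None


triage = Agent("triage", triage_step)

final_agent, transcript = run_swarm(triage, "I need a refund")
```

Only one agent is ever active inside the loop, and the `max_turns` bound is an added safeguard against handoff cycles rather than something the pattern itself prescribes.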
A Critical Distinction: Swarm vs. Fan-Out Parallelism
A common misconception conflates swarm agent design with parallel execution. A multi-architecture benchmark clarifies: in a swarm, "only one agent can be active at any given time." Swarm is strictly decentralized sequential control transfer where each agent acts in turn.
| Dimension | Swarm (Decentralized Handoffs) | Fan-Out Parallelism |
|---|---|---|
| Execution model | Sequential; one active agent at a time | Truly parallel; N agents run simultaneously |
| Routing authority | Distributed across agents | Central coordinator assigns work |
| Context model | Active agent holds full context; passes on handoff | Each agent receives a task slice |
| Topology | Mesh (peer-to-peer) | Star (hub-and-spoke) |
| Coordinator required | No | Yes |
| API calls (multi-domain) | 7+ | ~5 |
Fan-out parallelism, where multiple agents execute simultaneously on independent sub-tasks, requires some form of coordination. Swarm patterns eliminate a central coordinator, with agents coordinating through local interactions or handoffs.
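For contrast with sequential handoffs, fan-out can be sketched with `asyncio.gather` from the Python standard library. The `worker` and `fan_out` functions are hypothetical, and the short sleep stands in for an LLM or tool call.

```python
import asyncio


async def worker(name: str, task_slice: str) -> str:
    """Hypothetical worker: each agent sees only its own task slice."""
    await asyncio.sleep(0.01)  # stands in for an LLM or tool call
    return f"{name} finished {task_slice}"


async def fan_out(slices: dict) -> list:
    """Coordinator assigns one slice per worker and awaits them together."""
    jobs = [worker(name, task) for name, task in slices.items()]
    return await asyncio.gather(*jobs)  # results keep the submission order


results = asyncio.run(fan_out({"a": "lint", "b": "tests", "c": "docs"}))
```

Here `fan_out` plays the coordinator role from the table: it owns the assignment of slices and collects results, which is exactly the piece a swarm omits.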
What Is a Supervisor Architecture?
A supervisor architecture routes tasks through a central coordinator that dynamically selects which worker agent handles each sub-task based on runtime state. LangGraph's multi-agent documentation describes the supervisor as an orchestrator that routes tasks to individual worker agents.
Three core responsibilities distinguish the supervisor pattern from static sequential chains:
- Dynamic task routing: The coordinator reasons about agent capabilities and task state at each decision point, re-routing based on partial results. The supervisor prompt must contain clear capability descriptions for each worker; vague descriptions produce random routing.
- Output validation and ordering: The coordinator evaluates outcomes before any output advances the workflow. This enforces semantic dependency ordering without hardcoded step sequences. Validation catches errors that would cascade through downstream agents in a swarm.
- Conflict and loop prevention: CrewAI defaults allow_delegation to False on specialist agents to block delegation loops. Without explicit loop guards, supervisor architectures can enter infinite re-dispatch cycles that burn tokens without progress.
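The three responsibilities above can be sketched as a single control loop. This is a minimal illustration under stated assumptions, not LangGraph or CrewAI code: `WORKERS`, `route`, `validate`, and `supervise` are hypothetical, and `route` stands in for the LLM call that would pick a worker from capability descriptions.

```python
# Hypothetical workers keyed by capability name.
WORKERS = {
    "search": lambda task: f"notes on {task}",
    "write": lambda task: f"draft about {task}",
}


def route(state: dict) -> str:
    """Stand-in for the LLM routing call: pick a worker from runtime state."""
    if "notes" not in state:
        return "search"
    if "draft" not in state:
        return "write"
    return "done"


def validate(output: str) -> bool:
    """Gate every output before it advances the workflow."""
    return bool(output.strip())


def supervise(task: str, max_dispatches: int = 6) -> dict:
    state = {"task": task}
    for _ in range(max_dispatches):  # loop guard: bounded re-dispatch
        choice = route(state)
        if choice == "done":
            return state
        output = WORKERS[choice](task)
        if validate(output):  # rejected outputs never reach shared state
            key = "notes" if choice == "search" else "draft"
            state[key] = output
    raise RuntimeError("dispatch budget exhausted without finishing")
```

The `max_dispatches` budget is one simple form of loop guard; a rejected output leaves the state unchanged, so the next routing decision re-dispatches until the budget runs out.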
When Swarm Patterns Work
Swarm architectures excel when agents operate independently with self-contained context and when routing logic is embedded in the task itself. When any of these prerequisites is missing, default to a supervisor.
Read-heavy exploration, triage, and summarization. Documentation on parallel subagents identifies this as the canonical use case: "Use parallel agents for read-heavy tasks such as exploration, tests, triage, and summarization." Agents that read and analyze without writing to shared state eliminate the conflict resolution overhead that negates parallelism advantages.
Parallel code operations across isolated modules. One implementation demonstrates Git worktree isolation, where multiple branches are worked simultaneously without conflicts. When agents need to read each other's in-progress changes, swarm patterns break down.
High-volume routing where logic is self-evident. Swarm patterns fit cases where each specialist agent's instructions already contain the routing logic for its domain. A central router adds overhead without adding value.
| Signal | Favors Swarm |
|---|---|
| Agent interdependency | Low; no shared in-progress output |
| State mutation | Read-heavy or isolated writes |
| Context per agent | Self-contained and bounded |
| Task queue structure | Agents self-assign without blocking |
| Third-party agents involved | No; swarm requires full mesh awareness |
Benchmark analysis of swarm architectures describes each sub-agent as aware of and able to hand off to any other agent in the group. This mutual awareness requirement makes swarms less feasible when integrating third-party agents that lack visibility into the full mesh.
When Supervisor Patterns Work
Supervisor architectures earn their coordination overhead when tasks require dynamic routing, ordered execution, or centralized conflict resolution. Expect 20-40% more tokens per run compared to a swarm, offset by reduced duplicate work and fewer cascading failures.
Ordered execution with validation gates. When Task B cannot start until Task A completes and passes validation, the supervisor holds global context and enforces logical ordering. A documented research automation workflow demonstrates this pattern with reviewers providing feedback and revisers iterating on drafts.
Heterogeneous capability routing. Different subtasks requiring different capabilities (web search, code execution, database queries) need a coordinator that selects the correct specialist at each step. AutoGen's GroupChatManager routing relies on agent capability descriptions, and clearer descriptions improve selection accuracy.
Tasks exceeding a single context window. Anthropic's agent architecture guide identifies multi-agent architectures as appropriate when tasks exceed a single context window, with specialized sub-agents handling focused technical work.
A coordination study reinforces the supervisor case: applying a supervisor layer to the Smolagent framework reduced average token consumption by 29.68%. Uncoordinated agents produced more waste through redundant work than the coordination overhead consumed.
Cosmos implements a version of this pattern. Its Experts framework decomposes complex workflows into parallel agent sessions while maintaining cross-service dependency awareness through the Context Engine.
Hybrid: Supervisor Planning with Parallel Execution
Production multi-agent systems increasingly combine supervisor planning with parallel execution. One layer handles task decomposition while a second handles autonomous work. The tradeoff is infrastructure complexity. Hybrid architectures require state isolation between tiers, failure detection across execution boundaries, and careful schema design. Teams with fewer than 4 agent roles should default to a flat supervisor before introducing hybrid complexity.
How Hybrid Architectures Work
The hybrid pattern separates concerns into a planning tier (supervisor) that decomposes objectives and validates results, and an execution tier (parallel agents) that carries out assigned work in isolated contexts.
| Dimension | LangGraph Nested | CrewAI Hierarchical | AutoGen Nested Chat | Cosmos |
|---|---|---|---|---|
| Planning mechanism | Supervisor node + Command routing | Manager LLM with capability matching | Custom on_messages in coordinator | Experts with shared context and tenant memory |
| Execution isolation | Subgraph state isolation via independent state schemas | Manager validates each output | Information silo (inner chat invisible to outer) | Environment isolation per agent session |
| Inner-to-outer communication | Subgraphs read/write shared graph state keys | Manager reviews and approves outputs | Summaries only via SocietyOfMindAgent | Deep Code Review validates against codebase context |
| Hierarchy depth | No documented nesting-depth limit, subject to configurable recursion limit | Two levels | Multiple chat patterns; MagenticOne adds lead orchestrator | Three layers: plan, execute, review |
| Setup complexity | Moderate; requires StateGraph composition and state key mapping | Low; declarative process config | High; custom on_messages and summary handlers | Low; Experts compose from primitives (Environments, Experts, Sessions) |
| Debugging difficulty | Graph state inspection via LangSmith | Manager output logs | Inner chat transcripts hidden by default | Structured event log surfaces failures explicitly |
LangGraph implements this through nested subgraphs compiled as nodes in a parent graph. CrewAI's hierarchical process allocates tasks based on capabilities and reviews outputs before completion. AutoGen's SocietyOfMind pattern wraps an inner group chat and surfaces only a consolidated response to the outer coordinator.
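Stripped of any framework, the two-tier split looks roughly like the sketch below. It assumes a trivially decomposable objective; `plan`, `execute`, `validate`, and `run_hybrid` are hypothetical names, and the uppercase transform stands in for real agent work.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(objective: str) -> list:
    """Planning tier: decompose the objective into independent sub-tasks."""
    return [f"{objective}: {part}" for part in ("api", "ui", "docs")]


def execute(subtask: str) -> dict:
    """Execution tier: each call builds its own state and shares nothing."""
    return {"subtask": subtask, "result": subtask.upper()}


def validate(result: dict) -> bool:
    """Planning tier again: check each result before accepting it."""
    return result["result"] == result["subtask"].upper()


def run_hybrid(objective: str) -> list:
    subtasks = plan(objective)
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(execute, subtasks))  # order matches subtasks
    return [r["result"] for r in results if validate(r)]
```

Because `execute` returns a fresh dictionary per sub-task, no worker can overwrite another worker's state, which is the isolation property the frameworks in the table provide through their own mechanisms.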
Explore how Cosmos isolates agent execution environments while maintaining shared context across your entire codebase.
Cosmos: Environments, Experts, and Sessions
Cosmos illustrates the hybrid pattern through its three primitives. Environments define where agents run and what they can touch. Experts define how agents behave and what events they subscribe to. Sessions turn prompts into auditable, replayable workflows. Augment Code's Context Engine maintains semantic understanding of code and relationships across repos and services.
The platform ships with reference Experts including Deep Code Review, which validates agent-generated changes against full codebase context before human review. The Experts framework handles structured planning and coordination while agent sessions execute concurrently in isolated Environments.
Failure Modes That Determine Architecture Choice
Understanding how each pattern fails reveals which architecture fits a given workload. The MAST taxonomy analyzed 7 multi-agent frameworks across 1,600+ execution traces and identified 14 unique failure modes. Teams evaluating why systems fail at scale should map these modes against their workload characteristics.
Swarm Failure Modes
Agent drift. Progressive degradation of behavior as interactions accumulate. Research on agent drift shows this takes multiple forms. Agents deviate from task intent (semantic drift), consensus among agents breaks down (coordination drift), and unintended strategies emerge (behavioral drift). Each agent turn introduces probabilistic deviations that compound across handoffs. Swarm pipelines exceeding 8-10 sequential handoffs show measurable quality degradation that prompt tuning alone cannot resolve.
Duplicate work and task collisions. Without a central task registry, multiple agents independently pick up the same task and produce conflicting outputs. Planner-worker decomposition eliminates the race condition at the source by assigning tasks explicitly.
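One way to picture the central task registry that a pure swarm lacks is an atomic claim operation. The `TaskRegistry` class below is a hypothetical in-process sketch; a production system would more likely back it with a database or queue.

```python
import threading


class TaskRegistry:
    """Hypothetical central registry: each task can be claimed exactly once."""

    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._owners = {task: None for task in tasks}

    def claim(self, task: str, agent: str) -> bool:
        """Return True only for the first agent to claim a known task."""
        with self._lock:  # makes the check-and-set atomic across threads
            if task in self._owners and self._owners[task] is None:
                self._owners[task] = agent
                return True
            return False

    def owner(self, task: str):
        return self._owners.get(task)
```

An agent that receives `False` from `claim` skips the task instead of producing a second, conflicting output.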
Cascading failures. A failure classification aligned with OWASP ASI08 describes how errors in one agent propagate through connected components. Swarm architectures lack a central agent with global state to detect propagation or circuit-break failing paths. In supervisor architectures, the coordinator can halt dispatch to a failing worker after one bad output; in swarms, corrupted context passes forward until the pipeline terminates.
O(n²) failure surface scaling. In a fully connected swarm, every pair of agents is a potential handoff link, so n agents have n(n-1)/2 potential failure points: 4 agents produce 6; 10 agents produce 45. Above 8 agents, the combinatorial failure surface exceeds what end-to-end tests can cover, and hierarchical orchestration with explicit failure boundaries becomes a reliability requirement.
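The 4-agent and 10-agent figures come from counting distinct agent pairs. A small helper (the function name is illustrative) makes the growth easy to check for any mesh size:

```python
def handoff_links(agent_count: int) -> int:
    """Distinct agent pairs in a fully connected swarm: n * (n - 1) / 2."""
    return agent_count * (agent_count - 1) // 2


# Link counts for a few mesh sizes, including the 8-agent threshold.
link_counts = {n: handoff_links(n) for n in (4, 8, 10)}
```

At the 8-agent threshold the mesh already has 28 pairwise links, each of which would need its own handoff test.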
Supervisor Failure Modes
Single point of failure. In hub-and-spoke topology, coordinator failure halts the entire system. Azure's agent design patterns discuss checkpoint persistence for workflow pause/resume and structured output enforcement via Pydantic schemas as mitigation.
Context window saturation. The supervisor accumulates full message history from all sub-agent interactions. Routing accuracy drops noticeably after 8-12 sub-agent round trips as historical messages crowd out current task state. CrewAI addresses this with respect_context_window=True for automatic summarization; LangGraph's subgraphs isolate graph state per agent or team.
Bottleneck effects. The supervisor's synchronous routing loop forces serial execution even when sub-tasks are logically independent. LLM-level parallelism flags do not propagate to the agent graph; true parallel dispatch requires graph-level primitives like LangGraph's Send API.
Over-centralization. Azure's design pattern guidance warns that flow-control overhead often exceeds the benefits of breaking work into multiple agents. When the supervisor prompt contains logic for 5+ distinct responsibilities, a well-prompted single agent is often more effective.
Cosmos addresses the validation gap through its Deep Code Review Expert, which checks agent-generated outputs against full codebase context before human review.
Decision Matrix: Task Characteristics to Architecture Pattern
This 8-dimension framework maps task characteristics to the appropriate agent orchestration pattern. Teams running coding workspaces should weigh the dimensions most relevant to their deployment context.
| Dimension | Swarm | Supervisor | Hybrid |
|---|---|---|---|
| Task interdependency | Low; independent steps | High; cross-task dependencies | Complex web; nonlinear dependencies |
| Execution order | Fixed, predefined | Logically ordered, runtime-determined | Dynamic per sub-team, globally coordinated |
| Output validation needs | None or final-only | Intermediate validation required | Multi-level quality gates |
| Error handling | Abort on failure acceptable | Re-delegate to different agent | Hierarchical escalation |
| Latency tolerance | Latency-critical | Moderate tolerance | Latency-tolerant |
| Budget constraints | Tight; predictable per-run cost | Moderate flexibility | Cost secondary to capability |
| Context window requirements | Fits single context per agent | Requires diverse expertise | Multiple parallel workstreams |
| Maintainability | Stable, tightly coupled pipeline | Independent agent lifecycles | Team-level independent deployment |
Reading the matrix: When 5+ dimensions cluster in a single column, that architecture fits. Mixed results suggest starting with a flat supervisor.
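The reading rule can be expressed as a small tally. This sketch assumes each of the eight dimensions has been scored by hand as favoring one column; `pick_architecture` and its `threshold` parameter are hypothetical names.

```python
from collections import Counter


def pick_architecture(votes: dict, threshold: int = 5) -> str:
    """votes maps each dimension name to 'swarm', 'supervisor', or 'hybrid'."""
    if not votes:
        return "supervisor"
    pattern, top = Counter(votes.values()).most_common(1)[0]
    # Mixed results fall back to a flat supervisor.
    return pattern if top >= threshold else "supervisor"


clustered = pick_architecture({
    "interdependency": "swarm", "order": "swarm", "validation": "swarm",
    "errors": "swarm", "latency": "swarm", "budget": "supervisor",
    "context": "hybrid", "maintainability": "supervisor",
})
```

With five of eight dimensions in the swarm column, `clustered` resolves to `"swarm"`; a 3/3/2 split would resolve to `"supervisor"`.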
Starting point by team scale:
- 1-3 agent roles: Default to a sequential pipeline with a terminal reviewer.
- 3-5 agent roles: Start with a flat supervisor (CrewAI Process.hierarchical, LangGraph supervisor node, or AutoGen GroupChatManager).
- 5+ agent roles with independent sub-teams: Evaluate hybrid architecture when sub-teams can execute in parallel and the coordinator's context would otherwise saturate.
Decision Flowchart
Step 1: Does the task fit within a single LLM context window and require no output validation between steps? If yes, use a sequential architecture. If no, proceed.
Step 2: Does the task require coordination across multiple domains or dynamic re-routing based on runtime state? If no, add a terminal reviewer to a sequential pipeline. If yes, proceed.
Step 3: Does the task require multi-level validation, independent sub-team development, or recursive decomposition? If no, use a flat supervisor (CrewAI Process.hierarchical, LangGraph supervisor node, or AutoGen GroupChatManager). If yes, use a hybrid with nested supervisors and parallel execution.
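The three steps translate directly into a guard-clause function. The boolean parameter names are assumptions chosen for this sketch; each one corresponds to the question asked at that step.

```python
def choose_pattern(fits_one_context: bool,
                   needs_step_validation: bool,
                   needs_dynamic_routing: bool,
                   needs_multilevel: bool) -> str:
    """Walk the three-step flowchart and return the suggested pattern."""
    # Step 1: small task with no validation between steps.
    if fits_one_context and not needs_step_validation:
        return "sequential"
    # Step 2: no cross-domain coordination or runtime re-routing.
    if not needs_dynamic_routing:
        return "sequential + terminal reviewer"
    # Step 3: no multi-level validation, sub-team independence, or recursion.
    if not needs_multilevel:
        return "flat supervisor"
    return "hybrid"
```

For example, `choose_pattern(False, True, True, False)` returns `"flat supervisor"`.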
Anti-Patterns to Avoid
Premature complexity. Teams jump to multi-agent hierarchies when a well-instrumented sequential pipeline would suffice. Each supervisor node is an LLM call that must be justified by task complexity. Framework defaults cover only part of the risk: CrewAI now defaults allow_delegation to False on all agents, which blocks delegation loops out of the box, but retry storms still require global rate limiting at the infrastructure level. When coordination problems emerge, evaluate the architecture before rewriting prompts.
Missing state isolation. Without careful state schema design in hybrid architectures, sub-team state modifications collide. Cosmos addresses this through isolated Environments per agent session. Context Engine provides each agent with architectural understanding across 400,000+ files, so parallel agents work with accurate, current cross-service context from the dependency graph.
Ignoring token cost in pattern selection. Handoff-based swarm patterns generate 7+ API calls and 14,000+ tokens on multi-domain tasks, compared to ~5 calls and ~9,000 tokens for subagent patterns with parallel support. Teams that select architectures based on capability alone without modeling per-run costs frequently discover 3-5x budget overruns. Profile token consumption on representative tasks before committing to an architecture.
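A rough per-run cost model makes the comparison concrete. The token counts below are the approximate figures quoted above; the run volume and the price per million tokens used in the example are purely hypothetical, and the function names are illustrative.

```python
HANDOFF_TOKENS_PER_RUN = 14_000   # approximate figure quoted for handoff-based swarms
SUBAGENT_TOKENS_PER_RUN = 9_000   # approximate figure quoted for subagent patterns


def monthly_cost(tokens_per_run: int, runs_per_month: int,
                 usd_per_million_tokens: float) -> float:
    """Projected monthly spend under a single flat token price (an assumption)."""
    return tokens_per_run * runs_per_month / 1_000_000 * usd_per_million_tokens


def token_ratio(pattern_a_tokens: int, pattern_b_tokens: int) -> float:
    """How many times more tokens pattern A uses than pattern B."""
    return pattern_a_tokens / pattern_b_tokens


# Hypothetical workload: 10,000 runs per month at 3 USD per million tokens.
swarm_cost = monthly_cost(HANDOFF_TOKENS_PER_RUN, 10_000, 3.0)
subagent_cost = monthly_cost(SUBAGENT_TOKENS_PER_RUN, 10_000, 3.0)
```

Real pricing usually distinguishes input from output tokens, so this flat-rate model is only a starting point; measured token counts from representative tasks should replace the constants.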
What to Do Next
Start with the task. Evaluate the workload against the decision criteria and identify where signals cluster. If the result is mixed, begin with a flat supervisor and add hierarchy only after measured bottlenecks justify the coordination cost.
Explore how Cosmos Experts compose planning, execution, and validation into auditable agent workflows across your codebase.
Written by

Ani Galstian
Ani writes about enterprise-scale AI coding tool evaluation, agentic development security, and the operational patterns that make AI agents reliable in production. His guides cover topics like AGENTS.md context files, spec-as-source-of-truth workflows, and how engineering teams should assess AI coding tools across dimensions like auditability and security compliance.