When should a team add a supervisor layer to an existing swarm?

When agents produce redundant work or contradictory outputs despite functioning correctly in isolation. In one coordination study, adding a supervisor layer to the Smolagent framework cut average token consumption by 29.68%, which suggests coordination overhead can cost less than the waste from uncoordinated execution, even when each individual agent performs well.

How does the hybrid pattern prevent supervisor context window overflow?

Hybrid architectures isolate execution-tier message traffic from the planning tier. AutoGen's SocietyOfMindAgent surfaces only a consolidated response to the outer supervisor. LangGraph supports isolation through independent subgraph state schemas and separate checkpointers. CrewAI provides respect_context_window=True for automatic summarization.

What monitoring signals indicate the wrong architecture was chosen?

For swarms: duplicate task execution, superlinear token cost growth, or quality degradation after 8+ sequential handoffs. For supervisors: context window filling before workflow completion, or the supervisor prompt accumulating 5+ routing responsibilities. For hybrids: planning-tier overhead exceeding 30% of total run cost.
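The thresholds above can be turned into a simple check over per-run metrics. The following is a minimal sketch, not a feature of any framework: the `RunMetrics` class and all of its field names are hypothetical, and the cutoffs simply mirror the figures quoted in this answer. Superlinear token cost growth is left out because detecting it needs data from several runs rather than one.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical per-run metrics; field names are illustrative only."""
    architecture: str                   # "swarm", "supervisor", or "hybrid"
    duplicate_tasks: int = 0            # tasks executed more than once
    sequential_handoffs: int = 0        # longest handoff chain in the run
    routing_responsibilities: int = 0   # distinct duties in the supervisor prompt
    context_filled_early: bool = False  # window filled before the workflow finished
    planning_cost: float = 0.0          # planning-tier spend for the run
    total_cost: float = 0.0             # total spend for the run


def mismatch_signals(m: RunMetrics) -> list[str]:
    """Return the warning signals from the text that apply to this run."""
    signals = []
    if m.architecture == "swarm":
        if m.duplicate_tasks > 0:
            signals.append("duplicate task execution")
        if m.sequential_handoffs >= 8:
            signals.append("8+ sequential handoffs")
    elif m.architecture == "supervisor":
        if m.context_filled_early:
            signals.append("context window filled before completion")
        if m.routing_responsibilities >= 5:
            signals.append("5+ routing responsibilities")
    elif m.architecture == "hybrid":
        if m.total_cost > 0 and m.planning_cost / m.total_cost > 0.30:
            signals.append("planning overhead above 30% of run cost")
    return signals


swarm_run = RunMetrics("swarm", duplicate_tasks=2, sequential_handoffs=9)
swarm_signals = mismatch_signals(swarm_run)
```

An empty list means none of the listed signals fired for that run; it does not prove the architecture is the right one.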

Can LangGraph's supervisor pattern run sub-agents in parallel?

Yes, but not through an LLM flag. Setting parallel_tool_calls=true on the supervisor LLM does not achieve true parallelism, because the flag does not propagate to the agent graph. Parallelism in LangGraph requires graph primitives: the Send API or @task futures for concurrent subgraph execution.

How does Cosmos implement the hybrid pattern for multi-agent orchestration?

Cosmos is built around Environments (where agents run), Experts (how agents behave), and Sessions (auditable, replayable workflows). Experts handle planning and validation while agent sessions execute in parallel within isolated Environments, each backed by Augment Code's Context Engine for cross-repo understanding.

What is the minimum task complexity that justifies multi-agent architecture?

Single-agent pipelines handle most tasks that fit within one context window and require no intermediate validation. Multi-agent architecture earns its overhead when the task requires heterogeneous capabilities, validation gates between steps, or parallel execution across isolated contexts. Below 3 agent roles, coordination cost typically exceeds the benefit.

Swarm vs. Supervisor: Multi-Agent Architecture Guide

The right multi-agent architecture depends on task interdependency. Swarm patterns fit independent workloads where routing logic is embedded in the task itself and agents hand off sequentially through distributed control. Supervisor patterns fit workflows requiring dynamic routing, ordered execution, and conflict resolution through a central coordinator.

TL;DR

Multi-agent systems fail when the orchestration pattern mismatches the task structure. Swarm architectures distribute routing decisions across autonomous agents and excel at independent workloads. Supervisor architectures centralize coordination through a routing agent and handle complex dependencies. Production teams increasingly deploy hybrids: supervisor planning with parallel execution.

Why Pattern Selection Matters More Than Agent Count

Engineering teams building multi-agent systems face a fundamental architectural decision before writing a single line of orchestration code. Choose a swarm pattern for a task requiring strict ordering, and agents drift into contradictory outputs. Choose a supervisor for embarrassingly parallel work, and the coordinator becomes a throughput bottleneck.

Multi-agent systems consume approximately 15x more tokens than chat interactions in production, so a mismatched pattern is expensive. Research across coordination architectures shows that performance swings depending on whether the architecture matches the task structure: mismatched patterns produce measurable degradation even when individual agents perform well.

See how Cosmos coordinates parallel agents with shared context and tenant memory that compounds across your team.

Try Cosmos

Free tier available · VS Code extension · Takes 2 minutes

What Is a Swarm Architecture?

A swarm architecture distributes routing intelligence across agents, with each agent encapsulating its own instructions, tools, and handoff logic. Each agent decides independently when to transfer control to another specialist. No single agent has a global view of the workflow, which makes swarms lightweight to set up but difficult to debug when handoff logic produces unexpected routing.

OpenAI's Swarm framework, deprecated in March 2025 and superseded by the OpenAI Agents SDK, defined this pattern through two primitives: Agent objects and handoffs. An agent hands off execution when one of its functions returns another Agent object, and the framework switches the active agent while preserving conversation history. The Agents SDK retains the same model while adding guardrails, tracing, and production-grade state management.
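The handoff mechanic can be pictured with a few lines of plain Python. This is a sketch of the idea only: the `Agent` dataclass, its `handoff` field, and the `run` loop below are hypothetical stand-ins and are not the Swarm or Agents SDK classes. It shows one active agent at a time, a switch when a handoff returns another agent, and a single history that survives the switch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Agent:
    """Illustrative stand-in for a swarm agent; not the OpenAI Swarm class."""
    name: str
    # Given the shared history, return the next Agent or None to stop.
    handoff: Optional[Callable[[list], Optional["Agent"]]] = None


def run(start: Agent, user_message: str, max_turns: int = 10) -> list:
    """Run one agent at a time, switching on handoff and keeping one history."""
    history = [("user", user_message)]
    active = start
    for _ in range(max_turns):  # cap turns so a handoff cycle cannot run forever
        history.append((active.name, "handled turn"))
        next_agent = active.handoff(history) if active.handoff else None
        if next_agent is None:
            break
        active = next_agent  # control transfers; history is preserved
    return history


billing = Agent("billing")
triage = Agent("triage", handoff=lambda history: billing)

transcript = run(triage, "I was charged twice")
speakers = [speaker for speaker, _ in transcript]
```

Note that no component in this loop sees the whole workflow in advance: the route emerges from each agent's own handoff decision, which is the property that makes swarms light to set up and harder to debug.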

A Critical Distinction: Swarm vs. Fan-Out Parallelism

A common misconception conflates swarm agent design with parallel execution. A multi-architecture benchmark clarifies: in a swarm, "only one agent can be active at any given time." Swarm is strictly decentralized sequential control transfer where each agent acts in turn.

Dimension	Swarm (Decentralized Handoffs)	Fan-Out Parallelism
Execution model	Sequential; one active agent at a time	Truly parallel; N agents run simultaneously
Routing authority	Distributed across agents	Central coordinator assigns work
Context model	Active agent holds full context; passes on handoff	Each agent receives a task slice
Topology	Mesh (peer-to-peer)	Star (hub-and-spoke)
Coordinator required	No	Yes
API calls (multi-domain)	7+	~5

Fan-out parallelism, where multiple agents execute simultaneously on independent sub-tasks, requires a coordinator to assign work and collect results. Swarm patterns have no central coordinator; agents coordinate only through local interactions and handoffs.
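For contrast with sequential handoffs, fan-out can be sketched with the standard library's asyncio. The `worker` and `fan_out` functions are hypothetical, and the short sleep stands in for an LLM or tool call; the point is only that a coordinator hands each worker a task slice and awaits all of them at once.

```python
import asyncio


async def worker(name: str, task_slice: str) -> str:
    """Hypothetical worker; the sleep stands in for an LLM or tool call."""
    await asyncio.sleep(0.01)
    return f"{name} finished {task_slice}"


async def fan_out(slices: list[str]) -> list[str]:
    """Coordinator assigns one slice per worker and awaits them concurrently."""
    jobs = [worker(f"worker-{i}", s) for i, s in enumerate(slices)]
    # gather returns results in the order the jobs were passed in
    return await asyncio.gather(*jobs)


results = asyncio.run(fan_out(["auth module", "billing module", "search module"]))
```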

What Is a Supervisor Architecture?

A supervisor architecture routes tasks through a central coordinator that dynamically selects which worker agent handles each sub-task based on runtime state. LangGraph's multi-agent documentation describes the supervisor as an orchestrator that routes tasks to individual worker agents.

Three core responsibilities distinguish the supervisor pattern from static sequential chains:

Dynamic task routing: The coordinator reasons about agent capabilities and task state at each decision point, re-routing based on partial results. The supervisor prompt must contain clear capability descriptions for each worker; vague descriptions produce random routing.
Output validation and ordering: The coordinator evaluates outcomes before any output advances the workflow. This enforces semantic dependency ordering without hardcoded step sequences. Validation catches errors that would cascade through downstream agents in a swarm.
Conflict and loop prevention: The coordinator guards against agents delegating work back and forth. CrewAI, for example, defaults allow_delegation to False, which blocks delegation loops among specialist agents. Without explicit loop guards, supervisor architectures can enter infinite re-dispatch cycles that burn tokens without progress.
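The three responsibilities above can be sketched as one small loop. This is an illustration under stated assumptions, not any framework's API: `supervise`, `is_valid`, and `max_dispatches` are hypothetical names, and dictionary order stands in for the routing decision that a real supervisor would make with an LLM call.

```python
from typing import Callable, Optional

Worker = Callable[[str], str]


def supervise(task: str,
              workers: dict[str, Worker],
              is_valid: Callable[[str], bool],
              max_dispatches: int = 3) -> Optional[str]:
    """Dispatch to workers in turn until an output validates or the cap is hit."""
    dispatched = 0
    for name, worker in workers.items():   # routing: next candidate worker
        if dispatched >= max_dispatches:   # loop guard: hard cap on re-dispatch
            break
        dispatched += 1
        output = worker(task)
        if is_valid(output):               # validation gate before advancing
            return output
    return None                            # give up and escalate, never spin


workers = {
    "code": lambda task: "",                        # simulates an unusable output
    "search": lambda task: f"results for {task}",
}
answer = supervise("find the changelog", workers, is_valid=lambda out: bool(out.strip()))
```

Returning None instead of retrying indefinitely is the design choice that matters here: the caller decides whether to escalate to a human or fail the run.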

When Swarm Patterns Work

Swarm architectures excel when agents operate independently with self-contained context and when routing logic is embedded in the task itself. When any of these prerequisites is missing, default to a supervisor.

Read-heavy exploration, triage, and summarization. Documentation on parallel subagents identifies this as the canonical use case: "Use parallel agents for read-heavy tasks such as exploration, tests, triage, and summarization." Agents that read and analyze without writing to shared state eliminate the conflict resolution overhead that negates parallelism advantages.

Parallel code operations across isolated modules. One implementation demonstrates Git worktree isolation, where multiple branches are worked simultaneously without conflicts. When agents need to read each other's in-progress changes, swarm patterns break down.

High-volume routing where logic is self-evident. Swarm patterns fit cases where each specialist agent's instructions already contain the routing logic for its domain. A central router adds overhead without adding value.

Signal	Favors Swarm
Agent interdependency	Low; no shared in-progress output
State mutation	Read-heavy or isolated writes
Context per agent	Self-contained and bounded
Task queue structure	Agents self-assign without blocking
Third-party agents involved	No; swarm requires full mesh awareness

Benchmark analysis of swarm architectures describes each sub-agent as aware of and able to hand off to any other agent in the group. This mutual awareness requirement makes swarms less feasible when integrating third-party agents that lack visibility into the full mesh.

When Supervisor Patterns Work

Supervisor architectures earn their coordination overhead when tasks require dynamic routing, ordered execution, or centralized conflict resolution. As a rough planning figure, expect 20-40% more tokens per run than a swarm; reduced duplicate work and fewer cascading failures can offset that overhead.

Ordered execution with validation gates. When Task B cannot start until Task A completes and passes validation, the supervisor holds global context and enforces logical ordering. A documented research automation workflow demonstrates this pattern with reviewers providing feedback and revisers iterating on drafts.

Heterogeneous capability routing. Different subtasks requiring different capabilities (web search, code execution, database queries) need a coordinator that selects the correct specialist at each step. AutoGen's GroupChatManager routing relies on agent capability descriptions, and clearer descriptions improve selection accuracy.

Tasks exceeding a single context window. Anthropic's agent architecture guide identifies multi-agent architectures as appropriate when tasks exceed a single context window, with specialized sub-agents handling focused technical work.

A coordination study reinforces the supervisor case: applying a supervisor layer to the Smolagent framework reduced average token consumption by 29.68%. Uncoordinated agents produced more waste through redundant work than the coordination overhead consumed.

Cosmos implements a version of this pattern. Its Experts framework decomposes complex workflows into parallel agent sessions while maintaining cross-service dependency awareness through the Context Engine.

Hybrid: Supervisor Planning with Parallel Execution

Production multi-agent systems increasingly combine supervisor planning with parallel execution. One layer handles task decomposition while a second handles autonomous work. The tradeoff is infrastructure complexity. Hybrid architectures require state isolation between tiers, failure detection across execution boundaries, and careful schema design. Teams with fewer than 4 agent roles should default to a flat supervisor before introducing hybrid complexity.

How Hybrid Architectures Work

The hybrid pattern separates concerns into a planning tier (supervisor) that decomposes objectives and validates results, and an execution tier (parallel agents) that carries out assigned work in isolated contexts.

Dimension	LangGraph Nested	CrewAI Hierarchical	AutoGen Nested Chat	Cosmos
Planning mechanism	Supervisor node + Command routing	Manager LLM with capability matching	Custom on_messages in coordinator	Experts with shared context and tenant memory
Execution isolation	Subgraph state isolation via independent state schemas	Manager validates each output	Information silo (inner chat invisible to outer)	Environment isolation per agent session
Inner-to-outer communication	Subgraphs read/write shared graph state keys	Manager reviews and approves outputs	Summaries only via SocietyOfMindAgent	Deep Code Review validates against codebase context
Hierarchy depth	No documented nesting-depth limit, subject to configurable recursion limit	Two levels	Multiple chat patterns; MagenticOne adds lead orchestrator	Three layers: plan, execute, review
Setup complexity	Moderate; requires StateGraph composition and state key mapping	Low; declarative process config	High; custom on_messages and summary handlers	Low; Experts compose from primitives (Environments, Experts, Sessions)
Debugging difficulty	Graph state inspection via LangSmith	Manager output logs	Inner chat transcripts hidden by default	Structured event log surfaces failures explicitly

LangGraph implements this through nested subgraphs compiled as nodes in a parent graph. CrewAI's hierarchical process allocates tasks based on capabilities and reviews outputs before completion. AutoGen's SocietyOfMind pattern wraps an inner group chat and surfaces only a consolidated response to the outer coordinator.
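The common thread across these implementations is that verbose execution traffic stays inside the execution tier and only a consolidated result crosses back to the planner. A minimal sketch of that boundary, using standard-library threads and hypothetical `execute` and `plan_and_run` functions rather than any of the frameworks named above:

```python
from concurrent.futures import ThreadPoolExecutor


def execute(subtask: str) -> dict:
    """Hypothetical execution-tier agent that keeps a verbose transcript."""
    transcript = [f"step {i} of {subtask}" for i in range(50)]
    return {"summary": f"{subtask}: done", "transcript": transcript}


def plan_and_run(subtasks: list[str]) -> list[str]:
    """Planning tier: fan out subtasks in parallel, then keep only summaries."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(execute, subtasks))  # map preserves input order
    # Only summaries cross the tier boundary; transcripts stay with the workers.
    return [r["summary"] for r in results]


supervisor_context = plan_and_run(["run tests", "update docs"])
```

Here the supervisor ends up holding two short strings instead of one hundred transcript lines, which is the effect the summary-only designs aim for.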

Explore how Cosmos isolates agent execution environments while maintaining shared context across your entire codebase.

Cosmos: Environments, Experts, and Sessions

Cosmos illustrates the hybrid pattern through its three primitives. Environments define where agents run and what they can touch. Experts define how agents behave and what events they subscribe to. Sessions turn prompts into auditable, replayable workflows. Augment Code's Context Engine maintains semantic understanding of code and relationships across repos and services.

The platform ships with reference Experts including Deep Code Review, which validates agent-generated changes against full codebase context before human review. The Experts framework handles structured planning and coordination while agent sessions execute concurrently in isolated Environments.

Failure Modes That Determine Architecture Choice

Understanding how each pattern fails reveals which architecture fits a given workload. The MAST taxonomy analyzed 7 multi-agent frameworks across 1,600+ execution traces and identified 14 unique failure modes. Teams evaluating why systems fail at scale should map these modes against their workload characteristics.

Swarm Failure Modes

Agent drift. Progressive degradation of behavior as interactions accumulate. Research on agent drift shows this takes multiple forms: agents deviate from task intent (semantic drift), consensus among agents breaks down (coordination drift), and unintended strategies emerge (behavioral drift). Each agent turn introduces probabilistic deviations that compound across handoffs. Swarm pipelines exceeding 8-10 sequential handoffs show measurable quality degradation that prompt tuning alone cannot resolve.

Duplicate work and task collisions. Without a central task registry, multiple agents independently pick up the same task and produce conflicting outputs. Planner-worker decomposition eliminates the race condition at the source by assigning tasks explicitly.
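One way to picture the central task registry that a swarm lacks is a claim operation that succeeds exactly once per task. The `TaskRegistry` class below is a hypothetical sketch for a single process; a real deployment would need a shared store with an atomic claim, which is outside the scope of this example.

```python
import threading


class TaskRegistry:
    """Hypothetical central registry: each task can be claimed exactly once."""

    def __init__(self, tasks):
        self._unclaimed = set(tasks)
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, task: str, agent: str) -> bool:
        """Atomically assign the task; return False if it is taken or unknown."""
        with self._lock:
            if task not in self._unclaimed:
                return False
            self._unclaimed.remove(task)
            self._owners[task] = agent
            return True

    def owner(self, task: str):
        """Return the agent that claimed the task, or None."""
        with self._lock:
            return self._owners.get(task)


registry = TaskRegistry(["fix-login-bug"])
first = registry.claim("fix-login-bug", "agent-a")
second = registry.claim("fix-login-bug", "agent-b")
```

The second claim is rejected, so the second agent moves on instead of producing a conflicting fix.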

Cascading failures. A failure classification aligned with OWASP ASI08 describes how errors in one agent propagate through connected components. Swarm architectures lack a central agent with global state to detect propagation or circuit-break failing paths. In supervisor architectures, the coordinator can halt dispatch to a failing worker after one bad output; in swarms, corrupted context passes forward until the pipeline terminates.
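The circuit-breaking behavior described for supervisors can be sketched as follows. The `Coordinator` class, its `max_failures` parameter, and the validity check are hypothetical; the sketch only shows a coordinator dropping a bad output and refusing further dispatch to the worker that produced it.

```python
class Coordinator:
    """Hypothetical coordinator that stops dispatching after a bad output."""

    def __init__(self, workers, is_valid, max_failures=1):
        self.workers = workers
        self.is_valid = is_valid
        self.max_failures = max_failures
        self.failures = {name: 0 for name in workers}

    def dispatch(self, name, task):
        """Run one worker; open the circuit once it reaches max_failures."""
        if self.failures[name] >= self.max_failures:
            raise RuntimeError(f"circuit open for {name}")
        output = self.workers[name](task)
        if not self.is_valid(output):
            self.failures[name] += 1
            return None  # bad output is dropped, not passed downstream
        return output


coordinator = Coordinator({"flaky": lambda task: ""}, is_valid=lambda out: bool(out))
first_result = coordinator.dispatch("flaky", "parse invoice")
try:
    coordinator.dispatch("flaky", "parse invoice")
    circuit_opened = False
except RuntimeError:
    circuit_opened = True
```

A swarm has no equivalent place to put this check, because no single agent sees every output.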

O(n²) failure surface scaling. A fully connected swarm of n agents has n(n-1)/2 pairwise handoff links, each a potential failure point: 4 agents produce 6; 10 agents produce 45. Above 8 agents, the combinatorial failure surface exceeds what end-to-end tests can cover, and hierarchical orchestration with explicit failure boundaries becomes a reliability requirement.
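The counts above follow from the number of distinct agent pairs in a fully connected mesh, n(n-1)/2. A short check, with `handoff_links` as an illustrative helper name:

```python
def handoff_links(n_agents: int) -> int:
    """Number of distinct agent pairs in a fully connected mesh: n(n-1)/2."""
    return n_agents * (n_agents - 1) // 2


link_counts = {n: handoff_links(n) for n in (4, 8, 10)}
print(link_counts)  # {4: 6, 8: 28, 10: 45}
```

For comparison, a hub-and-spoke topology with one coordinator and the same n workers has only n links, which is why the failure surface grows linearly there.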

Supervisor Failure Modes

Single point of failure. In hub-and-spoke topology, coordinator failure halts the entire system. Azure's agent design patterns discuss checkpoint persistence for workflow pause/resume and structured output enforcement via Pydantic schemas as mitigation.

Context window saturation. The supervisor accumulates full message history from all sub-agent interactions. Routing accuracy drops noticeably after 8-12 sub-agent round trips as historical messages crowd out current task state. CrewAI addresses this with respect_context_window=True for automatic summarization; LangGraph's subgraphs isolate graph state per agent or team.
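The general idea behind summarization-based mitigation can be sketched without any framework. This is an assumption-laden illustration, not how CrewAI implements respect_context_window: `compact_history` is a hypothetical helper, character counts stand in for tokens, and a placeholder string stands in for an LLM-written summary.

```python
def compact_history(messages: list[str], max_chars: int, keep_recent: int = 4) -> list[str]:
    """Collapse older messages into one placeholder summary when over budget."""
    over_budget = sum(len(m) for m in messages) > max_chars
    if not over_budget or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # A real system would ask an LLM to summarize `older`; this only counts them.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


history = [f"round trip {i}: " + "x" * 200 for i in range(12)]
compacted = compact_history(history, max_chars=1000)
```

Keeping the most recent messages verbatim preserves current task state, which is what routing accuracy depends on.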

Bottleneck effects. The supervisor's synchronous routing loop forces serial execution even when sub-tasks are logically independent. LLM-level parallelism flags do not propagate to the agent graph; true parallel dispatch requires graph-level primitives like LangGraph's Send API.

Over-centralization. Azure's design pattern guidance warns that flow-control overhead often exceeds the benefits of breaking work into multiple agents. When the supervisor prompt contains logic for 5+ distinct responsibilities, a well-prompted single agent is often more effective.

Cosmos addresses the validation gap through its Deep Code Review Expert, which checks agent-generated outputs against full codebase context before human review.

Decision Matrix: Task Characteristics to Architecture Pattern

This 8-dimension framework maps task characteristics to the appropriate agent orchestration pattern. Teams running coding workspaces should weigh the dimensions most relevant to their deployment context.

Dimension	Swarm	Supervisor	Hybrid
Task interdependency	Low; independent steps	High; cross-task dependencies	Complex web; nonlinear dependencies
Execution order	Fixed, predefined	Logically ordered, runtime-determined	Dynamic per sub-team, globally coordinated
Output validation needs	None or final-only	Intermediate validation required	Multi-level quality gates
Error handling	Abort on failure acceptable	Re-delegate to different agent	Hierarchical escalation
Latency tolerance	Latency-critical	Moderate tolerance	Latency-tolerant
Budget constraints	Tight; predictable per-run cost	Moderate flexibility	Cost secondary to capability
Context window requirements	Fits single context per agent	Requires diverse expertise	Multiple parallel workstreams
Maintainability	Stable, tightly coupled pipeline	Independent agent lifecycles	Team-level independent deployment

Reading the matrix: When 5+ dimensions cluster in a single column, that architecture fits. Mixed results suggest starting with a flat supervisor.
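The reading rule can be expressed as a small tally. In this sketch the dimension keys and the `recommend` function are hypothetical names chosen for illustration; the threshold of 5 and the fallback to a flat supervisor come from the rule above.

```python
from collections import Counter

DIMENSIONS = [
    "task_interdependency", "execution_order", "output_validation",
    "error_handling", "latency_tolerance", "budget",
    "context_window", "maintainability",
]


def recommend(assessment: dict[str, str], threshold: int = 5) -> str:
    """Return the column holding `threshold`+ dimensions, else a flat supervisor."""
    missing = set(DIMENSIONS) - set(assessment)
    if missing:
        raise ValueError(f"unassessed dimensions: {sorted(missing)}")
    column, count = Counter(assessment[d] for d in DIMENSIONS).most_common(1)[0]
    return column if count >= threshold else "supervisor"


# Example: seven dimensions point to swarm, one to supervisor.
assessment = dict.fromkeys(DIMENSIONS, "swarm")
assessment["output_validation"] = "supervisor"
choice = recommend(assessment)
```

The tally treats all eight dimensions as equally important; teams that weigh some dimensions more heavily for their deployment would need a weighted variant.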

Starting point by team scale:

1-2 agent roles: Default to a sequential pipeline with a terminal reviewer.
3-5 agent roles: Start with a flat supervisor (CrewAI Process.hierarchical, LangGraph supervisor node, or AutoGen GroupChatManager).
5+ agent roles with independent sub-teams: Evaluate hybrid architecture when sub-teams can execute in parallel and the coordinator's context would otherwise saturate.

Decision Flowchart

Step 1: Does the task fit within a single LLM context window and require no output validation between steps? If yes, use a sequential architecture.

Step 2: Does the task require coordination across multiple domains or dynamic re-routing based on runtime state? If no, add a terminal reviewer to a sequential pipeline. If yes, proceed.

Step 3: Does the task require multi-level validation, independent sub-team development, or recursive decomposition? If no, use a flat supervisor (CrewAI Process.hierarchical, LangGraph supervisor node, or AutoGen GroupChatManager). If yes, use a hybrid with nested supervisors and parallel execution.
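The three steps read naturally as a function. The sketch below uses hypothetical parameter names for the yes/no questions and returns plain labels; it encodes only the branching described above.

```python
def choose_architecture(fits_one_context: bool,
                        needs_validation_between_steps: bool,
                        needs_cross_domain_or_dynamic_routing: bool,
                        needs_multilevel_or_recursive: bool) -> str:
    """Walk the three flowchart steps and return the suggested pattern."""
    # Step 1: small task with no validation between steps
    if fits_one_context and not needs_validation_between_steps:
        return "sequential"
    # Step 2: no cross-domain coordination or runtime re-routing needed
    if not needs_cross_domain_or_dynamic_routing:
        return "sequential with terminal reviewer"
    # Step 3: single-level coordination is enough
    if not needs_multilevel_or_recursive:
        return "flat supervisor"
    return "hybrid"


suggestion = choose_architecture(
    fits_one_context=False,
    needs_validation_between_steps=True,
    needs_cross_domain_or_dynamic_routing=True,
    needs_multilevel_or_recursive=False,
)
```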

Anti-Patterns to Avoid

Premature complexity. Teams jump to multi-agent hierarchies when a well-instrumented sequential pipeline would suffice. Each supervisor node is an LLM call that must be justified by task complexity. Added hierarchy also brings two hazards: delegation loops and retry storms. CrewAI now defaults allow_delegation to False on all agents, which blocks delegation loops out of the box; retry storms still require global rate limiting at the infrastructure level. When coordination problems emerge, evaluate the architecture before rewriting prompts.

Missing state isolation. Without careful state schema design in hybrid architectures, sub-team state modifications collide. Cosmos addresses this through isolated Environments per agent session. Context Engine provides each agent with architectural understanding across 400,000+ files. Parallel agents work with accurate, current cross-service context from the dependency graph.

Ignoring token cost in pattern selection. Handoff-based swarm patterns generate 7+ API calls and 14,000+ tokens on multi-domain tasks, compared to ~5 calls and ~9,000 tokens for subagent patterns with parallel support. Teams that select architectures based on capability alone without modeling per-run costs frequently discover 3-5x budget overruns. Profile token consumption on representative tasks before committing to an architecture.
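A back-of-the-envelope model makes the comparison concrete. The token counts below are the figures quoted above; the price per million tokens and the daily run volume are assumptions chosen only for illustration and should be replaced with real numbers.

```python
def per_run_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of one run at a flat, blended token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


PRICE = 3.00            # assumed blended USD price per million tokens
RUNS_PER_DAY = 10_000   # assumed daily volume

handoff_cost = per_run_cost(14_000, PRICE)    # handoff-based swarm figure
subagent_cost = per_run_cost(9_000, PRICE)    # subagent pattern figure
ratio = 14_000 / 9_000
daily_difference = (handoff_cost - subagent_cost) * RUNS_PER_DAY

print(f"handoff: ${handoff_cost:.3f}/run, subagent: ${subagent_cost:.3f}/run")
print(f"ratio: {ratio:.2f}x, daily difference: ${daily_difference:.2f}")
```

Under these assumptions the handoff pattern costs about 1.56x as much per run. A flat price ignores the difference between input and output token pricing, so profiling on representative tasks remains the more reliable guide.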

What to Do Next

Start with the task. Evaluate the workload against the decision criteria and identify where signals cluster. If the result is mixed, begin with a flat supervisor and add hierarchy only after measured bottlenecks justify the coordination cost.

Explore how Cosmos Experts compose planning, execution, and validation into auditable agent workflows across your codebase.

Written by

Ani Galstian
