How should engineering teams choose between a sequential pipeline and parallel fan-out?

Choose by matching the pattern to the task's dependency structure. Sequential pipelines suit ordered work in which each step depends on the previous output. Parallel fan-out suits independent subtasks that an aggregator can synthesize. Tasks with prerequisite sequential checks can fail under fan-out because dispatch and aggregation proceed before the prerequisite path finishes.

What stops a self-correcting loop from running forever?

A defined retry boundary stops a self-correcting loop. Teams can enforce that boundary as a maximum iteration count, a budget ceiling, or a watchdog that detects repeated cycles. Beyond a simple cap, track the delta between consecutive attempts and terminate when the agent only rephrases the same query.

Why does a supervisor-worker pattern reduce error cascades?

A supervisor-worker pattern limits error cascades through centralized validation and dynamic replanning. The orchestrator decomposes tasks, monitors progress, and replans when stuck. This still depends on clear delegation: explicit task boundaries in each subagent description prevent misrouting and duplicate work.

Do structured output schemas guarantee correct handoffs?

No, schema adherence only validates format, not meaning. OpenAI's strict: true guarantees adherence to the structural schema, but the model can still insert semantically incorrect values into valid fields. Production systems need schema validation for format and semantic validation for logical correctness.

Where should human-in-the-loop checkpoints be placed?

Place checkpoints before irreversible side effects such as destructive database operations, infrastructure changes, and financial transactions. Approvals after side effects provide retrospective review rather than control. Persistence is mandatory because resuming without a checkpointer restarts from scratch.

How do multi-agent patterns control coordination cost?

Multi-agent patterns control coordination cost at the handoff boundary. They isolate task-relevant state instead of sharing full context, cap retries, and route each subtask only to the agents that need it rather than sending every subtask through the same expensive path.

Agentic Workflow Patterns: Building Agents That Coordinate

Agentic workflow patterns give multi-agent systems a reusable coordination architecture. They define how agents pass work, share context, validate outputs, and recover from failures. Pattern selection matters early because it fixes the handoff, routing, validation, and recovery mechanisms at each coordination boundary.

TL;DR

Multi-agent systems fail when coordination outgrows a single prompt. A UC Berkeley MAST analysis of 1,600+ multi-agent execution traces identified 14 failure modes clustering into system design issues, inter-agent misalignment, and task verification. Five coordination patterns map workflow shape to validation, observability, and human checkpoints, preventing errors from cascading across stages.

Multi-agent coordination fails when handoffs carry silent intermediate errors into later stages. Workflow patterns add contracts, routing, and audit checkpoints before downstream agents consume flawed context. In a sequential chain, a subtly incorrect intermediate output can pass through intact, and every downstream agent then treats it as fact, with no stack trace or alert appearing.

The MAST taxonomy groups multi-agent failures into specification and system design issues, inter-agent misalignment, and task verification. That makes coordination design part of the reliability boundary, not a secondary implementation detail.

A workflow pattern differs from a one-off agent prompt because it defines the structure of coordination. It specifies what each agent receives, what it returns, how control transfers, and where execution pauses. This guide covers five patterns staff engineers can apply directly, each mapped to the failure mode it addresses.

For long-running and parallel work, those patterns also depend on persistent context, escalation rules, and review checkpoints remaining available across sessions. Augment Cosmos is a cloud agents platform built for exactly this coordination layer: its Environments, Experts, and Sessions turn individual prompts into auditable, replayable workflows with organizational memory that carries effective configurations team-wide rather than keeping them in one engineer's local setup.

Why Do Coordination Patterns Affect Multi-Agent Reliability?

Agentic coordination patterns matter because routing, validation, and recovery mechanisms give multi-agent systems known places to catch flawed handoffs. Research on multi-agent systems shows that errors can propagate and sometimes amplify across agents, especially when communication is dense or poorly controlled, while validation checkpoints and other gating mechanisms can limit that spread.

Production failures appear in three ways. Context loss occurs when long reasoning chains cause an agent to miss or dilute relevant context, producing hallucinations when critical information is missing. Cascading errors emerge when a single step fails and the agent then explores an entirely different trajectory; Anthropic describes this as the compound nature of errors in agentic systems. Unauditable outputs occur when a task completes incorrectly without generating an error signal, leaving no trace for root cause analysis.

Coordination failure	How it appears	Pattern-level control
Context loss	Agent misses or dilutes relevant context	Constrain shared state to task-relevant history and handoff fields
Cascading errors	One step fails; the agent explores a different trajectory	Add validation gates before downstream agents consume output
Unauditable outputs	Task completes incorrectly without an error signal	Emit structured events for root-cause analysis
Topology mismatch	Static pattern selection does not match the workflow topology	Select patterns from the task shape before implementation

Task topology should drive pattern selection. Fan-out benefits parallelizable tasks but hurts sequential ones. Static pattern selection becomes a failure mode when the workflow topology does not match the coordination structure.

What Prerequisites Do Multi-Agent Coordination Patterns Require?

Before choosing a pattern, define shared context, handoff contracts, and observability hooks. Skipping one removes a recovery mechanism.

A shared context strategy governs how agents access information without overwhelming one another. LangChain frames context engineering as write, select, compress, and isolate. Anthropic's three techniques are compaction, structured note-taking, and context isolation via sub-agents. Multi-agent systems add coordination work when agents receive irrelevant history, so context precision constrains shared state to task-relevant history and handoff fields.
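Constraining shared state to handoff fields can be sketched as a simple filter. This is a minimal illustration, not any framework's API: the `HANDOFF_FIELDS` mapping, the agent names, and the `isolate_context` helper are all hypothetical.

```python
from typing import Any

# Hypothetical mapping from agent name to the state keys it may read.
HANDOFF_FIELDS: dict[str, set[str]] = {
    "planner": {"ticket", "constraints"},
    "coder": {"plan", "constraints"},
    "reviewer": {"plan", "diff"},
}


def isolate_context(shared_state: dict[str, Any], agent: str) -> dict[str, Any]:
    """Return only the keys the named agent is allowed to see."""
    allowed = HANDOFF_FIELDS[agent]
    return {key: value for key, value in shared_state.items() if key in allowed}


shared_state = {
    "ticket": "Fix flaky login test",
    "constraints": ["no schema changes"],
    "plan": "Stub the clock in the session fixture",
    "diff": "--- a/tests/test_login.py ...",
    "scratchpad": "long exploratory notes irrelevant to downstream agents",
}

coder_view = isolate_context(shared_state, "coder")
print(sorted(coder_view))  # ['constraints', 'plan']
```

The scratchpad never reaches the coder or the reviewer, which is the isolation step in miniature; compaction and note-taking would operate on what remains.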

Augment Cosmos's Context Engine uses semantic dependency graph analysis across 400,000+ files to select task-relevant code, linked issues, PR feedback, documentation, and ticketing context for each task.

Handoff contracts define what flows between agents. OpenAI's structured outputs with strict: true guarantee schema adherence, but structural conformance does not guarantee semantic correctness. Production systems need both schema validation and semantic validation. Agent observability tools make every action a structured event; without this instrumentation, a task can complete incorrectly without emitting any event that root cause analysis could use.
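The gap between format and meaning is easiest to see with a payload that passes one check and fails the other. The sketch below uses only the standard library; the `RefundHandoff` contract, its fields, and both validator functions are hypothetical and stand in for whatever schema a real handoff uses.

```python
from dataclasses import dataclass


@dataclass
class RefundHandoff:
    order_id: str
    order_total: float
    refund_amount: float


def validate_schema(payload: dict) -> RefundHandoff:
    """Format check: required keys exist and have the expected types."""
    expected = {
        "order_id": str,
        "order_total": (int, float),
        "refund_amount": (int, float),
    }
    for key, types in expected.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(payload[key], bool) or not isinstance(payload[key], types):
            raise ValueError(f"wrong type for field: {key}")
    return RefundHandoff(
        payload["order_id"],
        float(payload["order_total"]),
        float(payload["refund_amount"]),
    )


def validate_semantics(handoff: RefundHandoff) -> list[str]:
    """Meaning check: the values must make sense together."""
    problems = []
    if handoff.refund_amount <= 0:
        problems.append("refund_amount must be positive")
    if handoff.refund_amount > handoff.order_total:
        problems.append("refund_amount exceeds order_total")
    return problems


payload = {"order_id": "A-100", "order_total": 40.0, "refund_amount": 400.0}
handoff = validate_schema(payload)      # passes: the structure is valid
problems = validate_semantics(handoff)  # fails: the values are implausible
print(problems)  # ['refund_amount exceeds order_total']
```

A downstream agent that only checked the schema would accept this refund; the semantic rule is what stops it.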

Step-by-Step Workflow: Five Coordination Patterns

Five agentic coordination patterns map workflow topology to validation, routing, and recovery mechanisms. Use the shape of the work to choose the structure before choosing a framework.

Pattern	Workflow topology	Main control point	Failure mode addressed	Primary limit
Sequential pipeline	Ordered stages	Validation gates between stages	Error compounding	Cross-table reference failure
Parallel fan-out	Independent subtasks	Aggregator synthesis	Race conditions	Prerequisite sequential checks
Supervisor-worker	Dynamic subtasks	Central supervisor	Single-agent context collapse	Vague delegation
Self-correcting loop	Retryable work with clear criteria	Retry boundary	Unbounded loop cost	Easier prompts can deteriorate
Human-in-the-loop	Irreversible or reviewable actions	Policy-defined pause	Unauthorized irreversible actions	Resume requires persisted state

Step 1: Sequential Pipeline

The sequential pipeline, also called prompt chaining, is a deterministic agent graph. Each agent processes the output of the previous one and passes a response downstream. Teams accept added stage-to-stage latency in exchange for narrower LLM calls, with programmatic checks on intermediate steps to keep the process on track.

LangGraph enforces handoffs using a state variable, such as current_step or active_agent, that persists across turns. Conversation history integrity is critical: when handing off, include both the tool call and its ToolMessage response, or the history becomes malformed.

Sequential pipeline controls: persist the current stage's state with a state variable; preserve the integrity of conversation history by including the tool call and its ToolMessage response; add validation gates between stages before downstream agents consume intermediate output.
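The controls above can be sketched without any framework. This is an illustrative pipeline runner under stated assumptions, not LangGraph code: the stage functions, the gates, and `GateError` are hypothetical, and `current_step` mirrors the state variable named earlier.

```python
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], State]
Gate = Callable[[State], bool]


class GateError(RuntimeError):
    """Raised when a validation gate rejects a stage's output."""


def extract(state: State) -> State:
    # Stand-in for an LLM call that pulls fields out of a document.
    return {**state, "fields": {"amount": "42"}}


def normalize(state: State) -> State:
    # Stand-in for a second, narrower call that types the extracted fields.
    return {**state, "amount": int(state["fields"]["amount"])}


def run_pipeline(stages: list[tuple[str, Stage, Gate]], state: State) -> State:
    """Run stages in order; stop before a flawed output reaches the next stage."""
    for name, stage, gate in stages:
        state = {**stage(state), "current_step": name}
        if not gate(state):
            raise GateError(f"validation gate failed after stage '{name}'")
    return state


stages = [
    ("extract", extract, lambda s: "amount" in s["fields"]),
    ("normalize", normalize, lambda s: s["amount"] > 0),
]
result = run_pipeline(stages, {"document": "Invoice total: 42"})
print(result["current_step"], result["amount"])  # normalize 42
```

Raising at the gate gives the failure a location and a stack trace, which is exactly what a silently propagated intermediate error lacks.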

This pattern addresses error compounding. The MAST taxonomy identifies step repetition and disobeying task constraints as distinct failure modes that validation gates between stages can prevent. The pattern's weakness is cross-table reference failure, as information moves through a fixed order and may not revisit earlier sections.

Step 2: Parallel Fan-Out with Aggregation

Parallel fan-out with aggregation routes a task from a dispatcher to multiple specialist agents operating concurrently, then flows their outputs to an aggregator for synthesis. The pattern fits work that can be decomposed into independent subtasks. Each sub-agent must operate independently of the others.

The aggregator controls the main risk, as concurrent agents can produce inconsistent outputs that still appear valid in isolation. The Claude Opus 4.5 multi-agent system card names this synthesis problem as a core difficulty for the orchestrator. Synthesis functions include majority voting, weighted synthesis, orchestrator overrides, and termination based on consensus or quality thresholds.

Fan-out aggregation concentrates conflict resolution in one place: decompose independent subtasks before dispatch; assign specialists with isolated task-relevant state; monitor concurrent outputs for disagreement; reconcile inconsistent outputs; terminate based on consensus or quality thresholds.
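A minimal sketch of dispatch and majority-vote aggregation follows, using threads from the standard library. The `specialist` function returns canned verdicts in place of real model calls, and the specialist names, the `quorum` threshold, and the result shape are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def specialist(name: str, subtask: str) -> str:
    """Stand-in for an LLM-backed specialist; returns a canned verdict."""
    canned = {"logs": "db_timeout", "metrics": "db_timeout", "deploys": "bad_release"}
    return canned[name]


def aggregate(outputs: dict[str, str], quorum: float = 0.5) -> dict:
    """Majority vote; report dissenters so disagreement is visible, not averaged away."""
    answer, votes = Counter(outputs.values()).most_common(1)[0]
    agreed = votes / len(outputs) > quorum
    dissenters = sorted(name for name, out in outputs.items() if out != answer)
    return {"answer": answer if agreed else None, "agreed": agreed, "dissenters": dissenters}


def fan_out(subtasks: dict[str, str]) -> dict:
    """Dispatch independent subtasks concurrently, then reconcile in one place."""
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = {name: pool.submit(specialist, name, task) for name, task in subtasks.items()}
        outputs = {name: future.result() for name, future in futures.items()}
    return aggregate(outputs)


result = fan_out({
    "logs": "scan error logs for the incident window",
    "metrics": "check latency and saturation dashboards",
    "deploys": "list releases in the last 24 hours",
})
print(result)  # {'answer': 'db_timeout', 'agreed': True, 'dissenters': ['deploys']}
```

Majority voting is only one of the synthesis functions listed above; a weighted scheme or an orchestrator override would replace `aggregate` without changing the dispatch code.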

Workflow orchestration platforms can coordinate specialized agents with isolated context in incident-response-style decomposition. This pattern addresses race conditions. The hard limit is task shape: tasks requiring prerequisite sequential checks can fail when dispatcher and aggregation work occurs before the prerequisite path finishes.

Cosmos's Parallel Tool Calls connect to this pattern by reducing serial execution bottlenecks. Its Auggie agent executes independent tool calls concurrently, while the Tasklist capability breaks complex work into actionable steps with progress tracking.

Step 3: Supervisor-Worker Delegation

Supervisor-worker delegation uses a central orchestrating agent that breaks down tasks, delegates them to worker agents, and synthesizes their results. The distinguishing property is dynamic subtask selection: the orchestrator determines subtasks from the specific input rather than relying on predefined subtasks.

Microsoft's Magentic-One uses a dual-ledger design. A Task Ledger maintains facts, while a Progress Ledger lets the orchestrator self-reflect on progress and replan when stuck. The LangGraph supervisor pattern routes work so that only the supervisor responds to the user, centralizing logging, monitoring, and escalation rules.

Delegation needs worker boundaries to prevent duplicate work: define the worker's objective, specify the expected output format, name the tools the worker should use, and define task boundaries to prevent overlapping assignments.
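One way to make those four elements checkable is to put them in a typed brief and test boundaries for overlap before dispatch. The `WorkerBrief` structure, the `owns_paths` field, and the worker names below are hypothetical; file paths are just one possible way to express a task boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerBrief:
    worker: str
    objective: str
    output_format: str
    tools: tuple[str, ...]
    owns_paths: frozenset[str]  # task boundary: the only paths this worker may touch


def find_overlaps(briefs: list[WorkerBrief]) -> list[tuple[str, str, list[str]]]:
    """Return every pair of workers whose boundaries intersect."""
    overlaps = []
    for index, first in enumerate(briefs):
        for second in briefs[index + 1:]:
            shared = first.owns_paths & second.owns_paths
            if shared:
                overlaps.append((first.worker, second.worker, sorted(shared)))
    return overlaps


briefs = [
    WorkerBrief("api_worker", "Add rate limiting to login", "unified diff",
                ("read_file", "edit_file"), frozenset({"src/auth", "src/api"})),
    WorkerBrief("test_worker", "Cover login rate limiting", "pytest file",
                ("read_file", "run_tests"), frozenset({"tests/auth", "src/auth"})),
]
print(find_overlaps(briefs))  # [('api_worker', 'test_worker', ['src/auth'])]
```

A supervisor could refuse to dispatch until `find_overlaps` returns an empty list, turning vague delegation into a pre-flight failure instead of duplicated work.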

Evaluating AI coding tools for this pattern requires checking whether the tool can preserve task boundaries while agents work across complex codebases. This pattern addresses single-agent context collapse. Delegation fails when instructions are vague: Anthropic fixed duplicate work from vague delegation by requiring each subagent description to include an objective, output format, tool guidance, and clear task boundaries.

Step 4: Self-Correcting Loop

The self-correcting loop lets an agent evaluate its own output against criteria and retry within a boundary. Reflexion separates the loop into an Actor, an Evaluator, and a Self-Reflector: the Actor produces a trajectory, the Evaluator produces a scalar score, and the Self-Reflector converts failed trajectories into verbal feedback diagnosing what went wrong. Teams should use this pattern only when clear evaluation criteria exist.

The retry boundary is the main engineering parameter. LangGraph uses conditional edge routing to END when the run exceeds the retry limit. Without bounds, one documented production incident had two agents stuck in an infinite conversation loop with costs escalating before detection.

A bounded self-correcting loop depends on stop conditions: set a finite retry limit; route to END when the run exceeds the retry limit; track the delta between consecutive attempts; terminate if retries only rephrase the same query with minor variations.
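These stop conditions can be sketched in a few lines. The example below is framework-independent and uses `difflib.SequenceMatcher` from the standard library as a rough text-similarity measure; the `min_delta` threshold, the status labels, and the canned attempts are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Callable


def run_bounded_loop(
    attempt: Callable[[int], str],
    is_good: Callable[[str], bool],
    max_retries: int = 5,
    min_delta: float = 0.1,
) -> dict:
    """Retry until accepted, stalled (attempts barely change), or out of retries."""
    previous = None
    for n in range(1, max_retries + 1):
        output = attempt(n)
        if is_good(output):
            return {"status": "accepted", "attempts": n, "output": output}
        if previous is not None:
            # Delta is 1 minus similarity: near zero means the retry only rephrased.
            delta = 1.0 - SequenceMatcher(None, previous, output).ratio()
            if delta < min_delta:
                return {"status": "stalled", "attempts": n, "output": output}
        previous = output
    return {"status": "retry_limit", "attempts": max_retries, "output": previous}


# Canned attempts standing in for an agent that keeps rephrasing one query.
queries = ["find user orders in 2023", "find user orders in 2023!", "find user orders in 2023?"]
result = run_bounded_loop(lambda n: queries[n - 1], lambda out: "JOIN" in out)
print(result["status"], result["attempts"])  # stalled 2
```

The loop exits on the second attempt rather than burning the full retry budget, because the second query differs from the first by a single character.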

This pattern addresses the cost of unbounded loops and repetitive, non-improving retries. Self-reflection works best when initial accuracy is low and external verification is available. It risks performance deterioration on easier prompts.

Step 5: Human-in-the-Loop Checkpoint

The human-in-the-loop checkpoint pauses execution at policy-defined escalation points before the agent continues. LangGraph's middleware checks each tool call against a configurable interrupt_on policy that maps tool names to approval configs. When a model proposes a reviewable action, the middleware issues an interrupt and halts execution. The human can approve, edit, reject, or respond before execution resumes.

State persistence makes pauses resumable. LangGraph requires a checkpointer to persist agent state between interrupt and resume, with AsyncPostgresSaver recommended for production. Without a persistent checkpointer, resuming restarts from scratch.

Human-in-the-loop checkpoints control side effects:

Checkpoint concern	Required mechanism	Failure prevented
Reviewable tool calls	interrupt_on policy mapping tool names to approval configs	Unreviewed writes or SQL execution
Resumable pauses	Persistent checkpointer and same thread_id on resume	Restarting from scratch after an interruption
In-flight state changes	Versioned state classes with migration functions	Broken resume after schema changes
Irreversible actions	Approval before side effects	Destructive operations reviewed only after the fact
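The first two rows of the table can be illustrated with a toy policy and a file-backed store. This is a sketch of the idea only and does not use LangGraph: `INTERRUPT_ON` imitates the shape of an interrupt policy, `FileCheckpointer` is a hypothetical stand-in for a production checkpointer such as AsyncPostgresSaver, and treating unknown tools as reviewable is a design assumption.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical policy: tool name -> whether a human must approve the call.
INTERRUPT_ON = {"execute_sql": True, "read_file": False}


class FileCheckpointer:
    """Toy persistent store keyed by thread_id; survives process restarts."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory

    def save(self, thread_id: str, state: dict) -> None:
        (self.directory / f"{thread_id}.json").write_text(json.dumps(state))

    def load(self, thread_id: str) -> dict:
        return json.loads((self.directory / f"{thread_id}.json").read_text())


def propose(checkpointer, thread_id: str, tool: str, args: dict) -> str:
    """Pause before the side effect when the policy marks the tool as reviewable."""
    if INTERRUPT_ON.get(tool, True):  # unknown tools default to requiring approval
        checkpointer.save(thread_id, {"pending": {"tool": tool, "args": args}})
        return "interrupted"
    return f"ran {tool}"


def resume(checkpointer, thread_id: str, decision: str) -> str:
    """Continue from the persisted pending call using the same thread_id."""
    pending = checkpointer.load(thread_id)["pending"]
    return f"ran {pending['tool']}" if decision == "approve" else f"skipped {pending['tool']}"


with tempfile.TemporaryDirectory() as tmp:
    first = propose(FileCheckpointer(Path(tmp)), "thread-1", "execute_sql",
                    {"query": "DROP TABLE staging_orders"})
    # A fresh checkpointer instance simulates resuming in a new process.
    second = resume(FileCheckpointer(Path(tmp)), "thread-1", "approve")
print(first, "->", second)  # interrupted -> ran execute_sql
```

The resume step works only because the pending call was written to disk under the same `thread_id`; with an in-memory store, the second instance would have nothing to load.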

OpenAI's Codex deployment shows the production scenario: auto-review replaces user approval at the sandbox boundary with review by a separate agent that considers intent, environment, and likely impact.

This pattern addresses unauthorized irreversible actions such as destructive database operations, infrastructure changes, and financial transactions. The EU AI Act, Art. 14, makes human oversight a regulatory requirement for high-risk systems.

Augment Cosmos connects this pattern to policy-defined pauses by assigning intermediate workflow stages to agent experts, while humans review prioritization, spec intent, and code-evolution context. Cosmos's code review agent achieved a 59% F-score in code review benchmarks. Teams comparing AI code review tools should evaluate whether review checkpoints happen before side effects and whether review output is auditable.

Choosing and Wiring a Pattern Before You Write Code

Choose the coordination pattern from the workflow topology before writing code. Ordered work, independent subtasks, retry criteria, and irreversible actions each point to different controls. No single structure fits every task.

Before implementation: identify whether subtasks are independent or ordered; locate where irreversible actions occur; determine where evaluation criteria are clear enough to bound a retry loop; select the pattern that matches the workflow topology; define the handoff contract and wire observability before deploying.
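The checklist above can be read as a small decision function. The sketch below is one possible encoding, not a rule from any framework: the `TaskShape` fields and the precedence between supervisor-worker, fan-out, and pipeline are assumptions that a team would adjust to its own workloads.

```python
from dataclasses import dataclass


@dataclass
class TaskShape:
    ordered: bool                # each step depends on the previous output
    independent_subtasks: bool   # subtasks can run without each other's results
    dynamic_subtasks: bool       # subtasks are only known after seeing the input
    clear_eval_criteria: bool    # output can be scored well enough to bound retries
    irreversible_actions: bool   # the workflow performs destructive side effects


def select_patterns(shape: TaskShape) -> list[str]:
    """Pick a base topology, then layer on retry and approval controls."""
    if shape.dynamic_subtasks:
        patterns = ["supervisor-worker"]
    elif shape.independent_subtasks and not shape.ordered:
        patterns = ["parallel fan-out"]
    else:
        patterns = ["sequential pipeline"]
    if shape.clear_eval_criteria:
        patterns.append("self-correcting loop")
    if shape.irreversible_actions:
        patterns.append("human-in-the-loop checkpoint")
    return patterns


migration = TaskShape(ordered=True, independent_subtasks=False, dynamic_subtasks=False,
                      clear_eval_criteria=True, irreversible_actions=True)
print(select_patterns(migration))
# ['sequential pipeline', 'self-correcting loop', 'human-in-the-loop checkpoint']
```

Patterns compose: the example selects a base topology and then adds the retry boundary and approval checkpoint as separate controls.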

Production systems also need defined execution environments, behavior rules, and persisted state. Augment Cosmos uses Environments, Experts, and Sessions to turn prompts into workflows that persist across long-running and parallel work, with organizational memory that carries effective agent configurations team-wide rather than keeping them in one engineer's local setup.

Written by

Molisha Shah
