What is the coordinator agent pattern in coding?

A multi-agent architecture where a central Coordinator decomposes tasks into a dependency-ordered DAG, delegates scoped subtasks to parallel Implementors, and uses a Verifier to validate outputs against the specification. It prevents file conflicts, context drift, and cascading errors. VeriMAP (EACL 2026) provides the published reference; Cosmos implements it in production.

How does the Verifier differ from running tests?

The Verifier coordinates multiple validation modalities instead of a single test suite, running deterministic checks first (linters, type checkers, tests) then LLM semantic review. SpecLoop research showed structured diagnostic feedback outperforms binary pass/fail. Test suites feed into the Verifier as one input among several.

When should teams avoid CIV?

When a single well-prompted agent can handle the task. Osmani's orchestrator guidance favors starting single-agent and adding orchestration only when complexity and parallelism justify it. CIV fits parallelizable multi-file work with cross-service dependencies; single-file changes and exploratory work usually do not need it.

How does Cosmos's shared context differ from GitHub Spec Kit?

Spec Kit produces spec.md, plan.md, and tasks.md that agents consume but do not update. Cosmos's coordination layer is bidirectional: agents write back corrections and completions as work progresses, and tenant memory persists those learnings across sessions so the next run starts smarter than the last.

What prevents file conflicts between parallel agents?

Git worktrees. Anthropic's Claude Code common workflows, Cosmos's isolated Environments, and the debugging parallel agents guide all use worktree-level isolation, with each Implementor on its own branch.

Coordinator-Implementor-Verifier Pattern for Dev Teams

The Coordinator-Implementor-Verifier (CIV) pattern is a three-role agent architecture where a Coordinator decomposes tasks into a dependency-ordered plan, parallel Implementors execute scoped subtasks in isolated contexts, and a Verifier validates each output against the original specification before proceeding.

TL;DR

Uncoordinated AI coding agents produce conflicts, duplicated code, and cascading errors. CIV addresses this by splitting work into planning (Coordinator), execution (Implementors), and validation (Verifier) with dependency ordering and isolation. Cosmos implements CIV with shared context, memory, and isolated environments. The pattern carries real cost in tokens and Coordinator quality dependence, making it wrong for single-file work.

Why Parallel Agent Work Needs Coordination

Running three AI agents on a feature branch sounds like 3x throughput until two agents edit the same file and one silently erases the other's work. Errors in agentic systems compound, so minor issues that are manageable in traditional software derail agents entirely. Codex documentation and community discussions note that concurrent work on the same files routinely produces merge conflicts a human would have caught at plan time.

The engineering response is better architecture. A plan-execute-validate pattern appears in research and practitioner discussions, including Megagon Labs' VeriMAP framework for verification-aware multi-agent planning. Augment Cosmos, a unified cloud agents platform for running coordinated agents across the software development lifecycle, operationalizes the pattern: agents share context and memory, run in isolated environments, and route work through a Verifier before anything merges.

Uncoordinated Agents Produce Structurally Predictable Failures

Multi-agent coding failures resemble distributed-systems problems, driven by isolated context windows, stateless execution, and the lack of native shared-state primitives. Five failure modes recur often enough to treat as architectural:

Silent file overwrites. Two agents edit the same file concurrently; the codebase still compiles, one agent's contribution is gone, and no warning fires because no locking or conflict detection exists in the agent execution layer by default. Git worktree isolation is the standard remediation, and the debugging parallel agents guide walks through how to apply it.
Context drift across agent boundaries. One agent references a function a peer already refactored; another writes to a shared file a downstream agent depends on, with no notification mechanism. Anthropic's context engineering work explains the mechanism: every token attends to every other token, producing n² pairwise relationships, so stale state from peers crowds out signal.
Duplicated implementations. Two agents each write a date-parsing utility with subtly different timezone handling; one caller gets ISO-8601, the other gets local time, and the bug surfaces in production.
Cascading errors. An upstream agent hallucinates a function signature; downstream agents wire up to it as ground truth. Anthropic's NIST submission on agent safety describes this as a trust chain.
Premature task completion. An agent joins a workflow mid-stream and concludes the task is done. Anthropic's guidance on long-running agents recommends leaving incremental artifacts so the next session picks up correctly.

Failure Pattern	Primary Root Cause	System Layer
Silent file overwrites	No write locking; no shared filesystem awareness	Execution layer
Context drift and stale state	Isolated context windows; no push-update mechanism	Context layer
Duplicated implementations	Stateless generation; no shared implementation registry	Context + data layer
Cascading errors	No inter-agent output validation; trust chain propagation	Orchestration layer
Premature task completion	No shared progress state; absent stop conditions	Task management layer

Every failure maps to a missing architectural primitive. The CIV pattern introduces three roles that together address these failure modes.
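Silent overwrites in particular can often be caught before any agent runs. A minimal sketch, assuming each planned subtask declares the set of files it expects to write (the `planned_writes` structure and the subtask names are hypothetical, not part of any framework cited here):

```python
from itertools import combinations


def find_write_conflicts(
    planned_writes: dict[str, set[str]],
) -> list[tuple[str, str, set[str]]]:
    """Return pairs of subtasks whose declared write sets overlap."""
    conflicts = []
    # Compare every unordered pair of subtasks once, in a stable order.
    for a, b in combinations(sorted(planned_writes), 2):
        shared = planned_writes[a] & planned_writes[b]
        if shared:
            conflicts.append((a, b, shared))
    return conflicts


# Hypothetical plan: two subtasks both intend to edit api/schemas.py.
plan = {
    "add-endpoint": {"api/routes.py", "api/schemas.py"},
    "update-client": {"client/sdk.py"},
    "rename-field": {"api/schemas.py", "db/models.py"},
}
conflicts = find_write_conflicts(plan)
print(conflicts)  # [('add-endpoint', 'rename-field', {'api/schemas.py'})]
```

A non-empty result is a signal to add a dependency edge between the two subtasks or to merge them, rather than to run them in parallel.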

The CIV Pattern: Three Roles, Two Nested Control Loops

Cosmos decomposes multi-agent development into three roles with isolated execution contexts and two nested control loops. The inner loop is a ReAct-style reason-act-observe cycle that each Implementor runs in its own context. The outer loop spans the Coordinator and Verifier: a plan-execute-verify-replan cycle that crosses agent boundaries using structured data contracts. A flat ReAct agent with tool use has only the inner loop, so when it produces wrong output, no mechanism above it exists to detect the failure, revise the plan, or route a corrected subtask to a fresh context.

The peer-reviewed VeriMAP system at EACL 2026 provides the most mechanically detailed published description of verification-aware planning with DAG-structured subtasks and dependencies.

Original Specification
         │
         ▼
  ┌─────────────┐
  │ COORDINATOR  │  ← Decomposes spec into DAG
  │              │    Computes topological order
  │              │    Compiles scoped context per node
  └──────┬───────┘
         │ task + scoped context
         ▼
  ┌─────────────────────────────────┐
  │  IMPLEMENTOR(s)                 │  ← Parallel where DAG allows
  │  (isolated context windows)     │    Each receives only its node's context
  └──────────────┬──────────────────┘
                 │ structured output
                 ▼
          ┌─────────────┐
          │  VERIFIER    │  ← Validates against original spec
          └──────┬───────┘
                 │
         ┌───────┴────────┐
         │                │
        PASS             FAIL
         │                │
         ▼                ▼
   Coordinator      Coordinator retries
   proceeds to      (up to N=3 attempts)
   next DAG node    with updated context:
                    prior attempt +
                    verifier feedback
                         │
                    (if limit exceeded)
                         ▼
                   Dynamic replanning
                   or escalation
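The execute-verify-retry portion of this flow can be sketched in a few lines. This is an illustration under stated assumptions, not VeriMAP's or Cosmos's implementation: `run_subtask`, `Verdict`, and `RetryBudgetExhausted` are hypothetical names, and the implementor and verifier are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""  # structured diagnostics would go here


class RetryBudgetExhausted(Exception):
    """Signals the Coordinator to replan or escalate."""


def run_subtask(
    task: str,
    implement: Callable[[str, list], str],
    verify: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Run one DAG node; return (output, attempts_used) on success."""
    history: list[tuple[str, str]] = []  # (prior attempt, verifier feedback)
    for attempt in range(1, max_attempts + 1):
        output = implement(task, history)
        verdict = verify(task, output)
        if verdict.passed:
            return output, attempt
        # Feed the failed attempt and its diagnostics into the next try.
        history.append((output, verdict.feedback))
    raise RetryBudgetExhausted(f"{task}: {max_attempts} attempts failed")


# Stubs standing in for real agents: the first attempt fails, the second passes.
def stub_implement(task: str, history: list) -> str:
    return "v2" if history else "v1"


def stub_verify(task: str, output: str) -> Verdict:
    return Verdict(output == "v2", "" if output == "v2" else "expected v2")


result = run_subtask("add-endpoint", stub_implement, stub_verify)
print(result)  # ('v2', 2)
```

Raising on exhaustion, rather than returning the last failed output, keeps an unverified result from flowing to dependent nodes by accident.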

How the Coordinator Plans and Delegates

The Coordinator transforms a specification into a directed acyclic graph (DAG) where each node is a bounded subtask and each edge is a dependency. The VeriMAP arXiv companion frames the Coordinator as the central orchestrator that follows a DAG-structured task plan to support reliable, adaptive execution across agents.
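To make the DAG concrete, here is a small sketch that groups subtasks into waves that can run in parallel, using the standard library's `graphlib.TopologicalSorter`. The subtask names and the `parallel_waves` helper are hypothetical; real planners attach far more metadata to each node.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves; every node in a wave has all parents done.

    `deps` maps each subtask to the set of subtasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # in a real run, mark done after verification
    return waves


# Hypothetical feature plan: the API and client both depend on the schema.
deps = {
    "schema": set(),
    "api": {"schema"},
    "client": {"schema"},
    "e2e-tests": {"api", "client"},
}
waves = parallel_waves(deps)
print(waves)  # [['schema'], ['api', 'client'], ['e2e-tests']]
```

Each inner list is a set of subtasks with no unmet dependencies, which is the natural unit for dispatching parallel Implementors.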

Decomposition couples task generation with verification design: for each subtask, VeriMAP's planner produces both the subtask instruction and a verification module, so each subtask arrives with its own definition of done. Two decomposition strategies dominate. Repository-level decomposition (SWE-agent, OpenDevin) treats the whole repo as the unit of analysis; it works well for monorepos but scales poorly past a few hundred thousand files. Standardized output coordination (MetaGPT, ChatDev) uses structured output schemas; it handles polyrepo cleanly but a wrong schema forces cascading plan revisions. Cosmos uses repository-level decomposition backed by the Context Engine, which is why it performs best on large, interconnected codebases.

VeriMAP gates execution on verification: the Coordinator cannot proceed to dependent subtasks until upstream verification succeeds, preventing unverified parents from poisoning children. For context assembly, VeriMAP uses a pull model where the Coordinator merges structured outputs into a dictionary and provides it as context to each child Executor. Pull keeps contexts small but requires the Coordinator to anticipate what each child needs; push (forwarding full history) avoids that burden but produces "lost in the middle" effects. Google's write-up on multi-agent framework design discusses context management at a production level.
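Both ideas, gating on verified parents and assembling a small per-child context from a shared dictionary, fit in a short sketch. The function names, key names, and values below are illustrative assumptions; VeriMAP's actual data structures may differ.

```python
def ready_nodes(deps: dict[str, set[str]], verified: set[str]) -> list[str]:
    """Unverified nodes whose parents have all passed verification."""
    return sorted(
        node
        for node, parents in deps.items()
        if node not in verified and parents <= verified
    )


def assemble_context(
    node: str, needs: dict[str, list[str]], outputs: dict[str, str]
) -> dict[str, str]:
    """Give a child only the upstream keys it declared, nothing else."""
    missing = [key for key in needs[node] if key not in outputs]
    if missing:
        raise KeyError(f"{node} is missing upstream outputs: {missing}")
    return {key: outputs[key] for key in needs[node]}


# Hypothetical state: only "schema" has passed verification so far.
deps = {"schema": set(), "api": {"schema"}, "docs": {"api"}}
verified = {"schema"}
runnable = ready_nodes(deps, verified)

# Merged structured outputs from verified nodes, keyed by name.
outputs = {
    "schema.table_name": "orders",
    "schema.migration_id": "0042",
    "scratch.notes": "internal reasoning that children should not see",
}
needs = {"api": ["schema.table_name"]}
context = assemble_context("api", needs, outputs)

print(runnable)  # ['api']
print(context)   # {'schema.table_name': 'orders'}
```

Here `docs` stays blocked because its parent `api` is unverified, and `api` receives one key rather than the full dictionary.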

Cosmos runs the Coordinator on top of the Context Engine, which performs semantic codebase analysis before decomposition. This architectural grounding lets the Coordinator produce scoped context per subtask rather than handing each Implementor an undifferentiated slice of the repository.

The retry budget is the single most important production configuration in CIV. VeriMAP's published defaults cap the per-subtask execution-verification loop at 3 attempts, with a separate 5-iteration cap on replanning. Teams should calibrate against cost per round (if a subtask consumes roughly 40K tokens per attempt, a 10-round budget burns 400K tokens on that subtask alone), failure shape (format errors recover cheaply; semantic failures rarely improve with more retries, but replanning does), and task determinism (pure code generation converges under retry; external-API tasks may not). On exhaustion, escalation should trigger replanning or human routing.
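The cost arithmetic is worth writing down before choosing a budget. A rough sketch follows; the `replan_rounds` multiplier assumes each replanning round re-executes the subtask with a fresh attempt budget, which is a simplifying assumption rather than something the cited defaults specify.

```python
def worst_case_tokens(
    tokens_per_attempt: int, max_attempts: int, replan_rounds: int = 0
) -> int:
    """Upper bound on tokens one subtask can consume.

    Assumes every attempt costs the same and every replanning round
    restarts the subtask with a full attempt budget.
    """
    return tokens_per_attempt * max_attempts * (1 + replan_rounds)


print(worst_case_tokens(40_000, 10))                  # 400000
print(worst_case_tokens(40_000, 3))                   # 120000
print(worst_case_tokens(40_000, 3, replan_rounds=5))  # 720000
```

Under these assumptions, a 3-attempt cap bounds a 40K-token subtask at 120K tokens before replanning, versus 400K for a 10-round budget.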

How Implementors Execute in Parallel with Isolated Context

Implementors are execution agents that receive a single scoped subtask and carry it out using available tools, under two hard constraints: a per-subtask retry cap (VeriMAP defaults to 3 attempts), and a structured output contract the Coordinator merges by key name for downstream nodes. The Plan-Execute paper notes that this separation produces cost-efficiency and reasoning-quality advantages. Because Implementor scope is narrower than the Coordinator's, teams can route Implementor work to cheaper models while reserving the strongest model for the Coordinator.

Preventing the failure modes documented earlier requires three isolation layers: input isolation (each Implementor runs in its own context window, populated only with what the Coordinator assembled), output isolation (structured contracts ensure only declared variables pass forward), and filesystem isolation (git worktrees prevent concurrent writes). Without the worktree layer, input and output isolation still leave the silent-overwrite failure mode open. Anthropic's Claude Agent SDK work confirms the rationale: subagents use isolated context windows and only send relevant information back to the orchestrator.
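For the filesystem layer, a minimal sketch of how an orchestrator might build one `git worktree add` command per Implementor. The branch prefix, directory layout, and helper name are assumptions for illustration; the sketch only constructs the command and does not run it.

```python
from pathlib import PurePosixPath


def worktree_add_cmd(
    subtask_id: str, base_ref: str = "main", root: str = "../worktrees"
) -> list[str]:
    """Build `git worktree add -b <branch> <path> <base_ref>` for one subtask."""
    branch = f"agent/{subtask_id}"  # hypothetical branch naming scheme
    path = PurePosixPath(root) / subtask_id
    return ["git", "worktree", "add", "-b", branch, str(path), base_ref]


cmd = worktree_add_cmd("add-endpoint")
print(" ".join(cmd))
# git worktree add -b agent/add-endpoint ../worktrees/add-endpoint main
```

Inside a repository, the list can be passed to `subprocess.run(cmd, check=True)`; each Implementor then works in its own directory on its own branch, so concurrent writes never touch the same checkout.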

Cosmos runs each Implementor in an isolated Environment so agents execute in parallel waves without filesystem conflicts. Simpler multi-agent systems have Implementors consume the plan and produce code; Cosmos introduces a bidirectional relationship where Implementors read from and write back to the shared context as work completes, keeping plan and codebase synchronized. Teams new to this model can start with the Bring Your Own Agent guide.

How the Verifier Closes the Loop

The Verifier validates Implementor output against the original specification, producing pass/fail plus structured feedback that feeds back into the Coordinator's retry context. An SE survey on verification in agentic systems argues that effective verification requires layered pipelines combining LLM-based reasoning with static analysis, dynamic testing, and formal methods, since no single technique suffices alone. VeriMAP follows suit: each executor node can be equipped with verification functions, and a subtask is complete only when all pass. Anthropic's research on effective agents notes code is particularly amenable to this because solutions are verifiable through tests and output quality can be measured objectively.

Feedback quality determines retry-loop effectiveness. SpecLoop's formal equivalence approach feeds counterexamples back to refine specifications, showing that structured error feedback substantially outperforms binary pass/fail. Verifier accuracy has real failure modes: false positives block correct work and waste retry budget; false negatives ship bugs. Teams should log Verifier decisions, sample them for human review early, and order verification stages so deterministic checks (linters, type checkers, test runners) run before expensive LLM semantic review.

Verifier Component	Mechanism	Source
Spec compliance (formal)	Formal equivalence checking	arXiv 2603.02895
Automated testing	Dedicated test executor agent	arXiv 2312.13010 (AgentCoder)
Verification function assignment	Planner generates verification module per subtask; all must pass	arXiv 2510.17109
Feedback format	Structured diagnostics (compiler errors, counterexamples)	arXiv 2603.02895
Retry limit	Default 3 attempts per subtask; 5-iteration replanning cap	arXiv 2510.17109

Cosmos's Verifier checks results against the spec before developer review, routing failed subtasks back through the Coordinator with structured feedback so reviewers see triaged work instead of a raw failed diff.

Industry Convergence and Where Implementations Diverge

Several industry actors have moved toward three-role decompositions under the same structural pressure. The more interesting question is where implementations disagree, since divergences reveal which CIV properties are fundamental.

Addy Osmani's self-improving agents describes Planners, Workers, and a Judge, and his orchestrator guidance warns against jumping to hierarchical systems when a single agent suffices. Composio's agent orchestrator plans tasks, spawns agents, and "autonomously handles CI fixes, merge conflicts, and code reviews" via a review-and-merge gate. GitHub's Spec Kit formalizes the workflow as Constitution → Specify → (Clarify) → Plan → Tasks → Implement (see also Martin Fowler's analysis). Anthropic's multi-agent research system uses an orchestrator-worker pattern with specialized subagents, and Claude Code best practices add a Plan Mode for safe complex changes.

Role mapping is consistent, but operational layers diverge in three meaningful ways:

Runtime vs. human Verifier: VeriMAP and Cosmos run the Verifier as an automated gate per subtask. Spec Kit has no runtime Verifier; verification is human PR review. Spec Kit assumes a reviewer is always present; CIV does not.
Static vs. shared context: Spec Kit artifacts are written once and consumed. Cosmos's coordination layer is bidirectional: agents write back as work progresses, keeping downstream agents current. Osmani's model treats the codebase itself as the spec source, which collapses if the codebase drifts during long-running tasks.
Isolation primitive: Osmani relies on per-agent context only. Composio uses Docker sandboxes. Anthropic and Cosmos use git worktrees for filesystem isolation on top of context isolation. Context isolation alone leaves file overwrites open; worktrees close them.

This table maps each CIV role, along with the isolation and spec mechanisms, across four implementations:

CIV Role	Osmani	Composio	GitHub Spec Kit	Anthropic
Coordinator	Planner	Orchestrator	Specify + Plan	Lead Agent
Implementor	Worker	Spawned subagents	Implement	Subagent/Worker
Verifier	Judge	CI Review + Merge	PR Review (human)	Evaluator
Isolation	Per-agent context	Docker sandbox	Version-controlled artifacts	Git worktrees
Spec mechanism	Codebase reading	Design spec	spec.md + plan.md	Plan Mode output

VeriMAP (EACL 2026) adds a fifth research-side data point with a Subtask Coordinator, Executor, and per-node Verifier.

When CIV Is the Wrong Pattern

CIV carries real cost and hurts more than it helps in several cases:

Single-file or isolated changes: Coordination overhead dominates; a single well-prompted agent finishes faster.
Exploratory tasks: DAG decomposition requires a known target. A single agent with tool access outperforms a Coordinator that cannot plan what it does not yet understand.
Cost-sensitive environments: Token budgets can run roughly 5-15x a single-agent baseline depending on replanning frequency.
Weak Coordinator model: Output quality is capped by Coordinator planning quality. A frontier model for the Coordinator with cheaper Implementors is defensible; cheap models throughout produce poor DAGs.
Verifier false-positive dominance: Uncalibrated verification blocks correct work and burns retry budget, easily mistaken for "the pattern does not work."

The Coordinator is also a single point of failure: a bad decomposition can only be corrected by replanning. Teams should instrument Coordinator output (DAG quality, verification pass rates, token spend per subtask) before scaling.
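One piece of that instrumentation is Verifier calibration. A minimal sketch, assuming a sample of Verifier decisions has been labeled by human reviewers; the tuple layout is hypothetical, and "false positive" follows this article's usage (the Verifier rejects work a human judges correct).

```python
def verifier_error_rates(samples: list[tuple[bool, bool]]) -> dict[str, float]:
    """Compute error rates from (verifier_passed, human_says_correct) pairs."""
    on_correct = [v for v, human_ok in samples if human_ok]
    on_incorrect = [v for v, human_ok in samples if not human_ok]
    # False positive: correct work that the Verifier blocked.
    fp = on_correct.count(False) / len(on_correct) if on_correct else 0.0
    # False negative: incorrect work that the Verifier let through.
    fn = on_incorrect.count(True) / len(on_incorrect) if on_incorrect else 0.0
    return {"false_positive_rate": fp, "false_negative_rate": fn}


# Hypothetical review sample of six Verifier decisions.
samples = [
    (True, True), (False, True), (True, True),
    (True, False), (False, False), (True, True),
]
rates = verifier_error_rates(samples)
print(rates)  # {'false_positive_rate': 0.25, 'false_negative_rate': 0.5}
```

Tracking both rates separately matters because they fail differently: one burns retry budget, the other ships bugs.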

How Cosmos Maps onto the CIV Pattern

Cosmos's architecture maps cleanly onto CIV: a unified cloud agents platform coordinating agents in isolated Environments through a persistent knowledge layer, with its runtime roles corresponding to the Coordinator, Implementor, and Verifier. It fits cross-service refactors across 3+ repositories, feature work with clear decomposition, and codebases where context management is the binding constraint.

Cosmos's shared filesystem and tenant memory address specification drift: Implementors read from and write to the shared context, so the coordination layer stays synchronized with actual work and updates propagate to all active agents. Every agent shares the same Context Engine, which processes 400,000+ files, so the Coordinator's decomposition, Implementors' execution, and the Verifier's validation all reason about the same architecture. In internal testing, this produces 5-10x faster completion and 40% fewer hallucinations on large-repo edits compared to tools that process files in isolation.

Cosmos enforces human-in-the-loop checkpoints as a platform feature. In a CIV workflow, this means a human reviews the Coordinator's spec, DAG, and parallelism plan before any code is written, then reviews Verifier output and approves commit, PR, and merge, seeing passed subtasks and retry history rather than a raw diff. Teams set the policies for where human judgment is required, and Cosmos enforces them. Amelia Wattenberger's interview with Refactoring describes the workspace primitive as an isolated environment bundling the codebase, agents, and a spec.

CIV inherits from classical distributed-systems patterns (Microsoft's saga pattern, the PEVR loop) with one key divergence: classical sagas assume deterministic, predefined compensating transactions, while CIV generates compensation (retry, replan, or escalate) at runtime. That is why retry budgets are architectural concerns and why Cosmos surfaces retry history at review time instead of silently consuming failures.

Adopt the CIV Pattern for Your Next Multi-Service Feature

A concrete starting configuration for a first adoption:

Pick one feature with 3-6 independent subtasks across at least two services
Cap parallel Implementors at 3-4 for the first run
Set a retry budget of 3 attempts per subtask (VeriMAP's default); escalate to replanning on exhaustion, capped at 5 iterations
Gate each subtask on deterministic checks first (lint, type, unit tests), then semantic review
Instrument Coordinator DAG quality and Verifier decisions from day one
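The starting configuration above could be captured as a small, validated settings object. The class and field names are hypothetical and do not correspond to any Cosmos or VeriMAP configuration schema; the defaults mirror the numbers in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CivRunConfig:
    max_parallel_implementors: int = 3
    max_attempts_per_subtask: int = 3  # VeriMAP's published default
    max_replan_iterations: int = 5
    verification_stages: tuple[str, ...] = (
        "lint", "type_check", "unit_tests", "semantic_review",
    )

    def __post_init__(self) -> None:
        for name in (
            "max_parallel_implementors",
            "max_attempts_per_subtask",
            "max_replan_iterations",
        ):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")
        stages = self.verification_stages
        # Deterministic checks should gate the expensive semantic review.
        if "semantic_review" in stages and stages[-1] != "semantic_review":
            raise ValueError("semantic_review must be the last stage")


config = CivRunConfig()
print(config.max_attempts_per_subtask, config.verification_stages[-1])
# 3 semantic_review
```

Validating at construction time means a misordered pipeline or a zero retry budget fails before any agent is spawned.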

Written by

Paula Hingel
