The Coordinator-Implementor-Verifier (CIV) pattern is a three-role agent architecture where a Coordinator decomposes tasks into a dependency-ordered plan, parallel Implementors execute scoped subtasks in isolated contexts, and a Verifier validates each output against the original specification before proceeding.
TL;DR
Uncoordinated AI coding agents produce conflicts, duplicated code, and cascading errors. CIV addresses this by splitting work into planning (Coordinator), execution (Implementors), and validation (Verifier) with dependency ordering and isolation. Cosmos implements CIV with shared context, memory, and isolated environments. The pattern costs extra tokens and its output depends on Coordinator quality, so it is the wrong choice for single-file work.
Why Parallel Agent Work Needs Coordination
Running three AI agents on a feature branch sounds like 3x throughput until two agents edit the same file and one silently erases the other's work. Errors in agentic systems compound, so minor issues that are manageable in traditional software derail agents entirely. Codex documentation and community discussions note that concurrent work on the same files routinely produces merge conflicts a human would have caught at plan time.
The engineering response is better architecture. A plan-execute-validate pattern appears in research and practitioner discussions, including Megagon Labs' VeriMAP framework for verification-aware multi-agent planning. Augment Cosmos, a unified cloud agents platform for running coordinated agents across the software development lifecycle, operationalizes the pattern: agents share context and memory, run in isolated environments, and route work through a Verifier before anything merges.
See how Cosmos keeps parallel agents aligned through shared context and tenant memory across cross-service refactors.
Free tier available · VS Code extension · Takes 2 minutes
Uncoordinated Agents Produce Structurally Predictable Failures
Multi-agent coding failures resemble distributed-systems problems, driven by isolated context windows, stateless execution, and the lack of native shared-state primitives. Five failure modes recur often enough to treat as architectural:
- Silent file overwrites. Two agents edit the same file concurrently; the codebase still compiles, one agent's contribution is gone, and no warning fires because no locking or conflict detection exists in the agent execution layer by default. Git worktree isolation is the standard remediation, and the debugging parallel agents guide walks through how to apply it.
- Context drift across agent boundaries. One agent references a function a peer already refactored; another writes to a shared file a downstream agent depends on, with no notification mechanism. Anthropic's context engineering work explains the mechanism: every token attends to every other token, producing n² pairwise relationships, so stale state from peers crowds out signal.
- Duplicated implementations. Two agents each write a date-parsing utility with subtly different timezone handling; one caller gets ISO-8601, the other gets local time, and the bug surfaces in production.
- Cascading errors. An upstream agent hallucinates a function signature; downstream agents wire up to it as ground truth. Anthropic's NIST submission on agent safety describes this as a trust chain.
- Premature task completion. An agent joins a workflow mid-stream and concludes the task is done. Anthropic's guidance on long-running agents recommends leaving incremental artifacts so the next session picks up correctly.
| Failure Pattern | Primary Root Cause | System Layer |
|---|---|---|
| Silent file overwrites | No write locking; no shared filesystem awareness | Execution layer |
| Context drift and stale state | Isolated context windows; no push-update mechanism | Context layer |
| Duplicated implementations | Stateless generation; no shared implementation registry | Context + data layer |
| Cascading errors | No inter-agent output validation; trust chain propagation | Orchestration layer |
| Premature task completion | No shared progress state; absent stop conditions | Task management layer |
Every failure maps to a missing architectural primitive. The CIV pattern introduces three roles that together address these failure modes.
The CIV Pattern: Three Roles, Two Nested Control Loops
Cosmos decomposes multi-agent development into three roles with isolated execution contexts and two nested control loops. The inner loop is a ReAct-style reason-act-observe cycle each Implementor runs in its own context. The outer loop spans the Coordinator and Verifier: a plan-execute-verify-replan cycle that crosses agent boundaries using structured data contracts. A flat ReAct agent with tool use has only the inner loop, so when it produces wrong output, nothing above that loop can detect the failure, revise the plan, or route a corrected subtask to a fresh context.
The peer-reviewed VeriMAP system at EACL 2026 provides the most mechanically detailed published description of verification-aware planning with DAG-structured subtasks and dependencies.
How the Coordinator Plans and Delegates
The Coordinator transforms a specification into a directed acyclic graph (DAG) where each node is a bounded subtask and each edge is a dependency. The VeriMAP arXiv companion frames the Coordinator as the central orchestrator that follows a DAG-structured task plan to support reliable, adaptive execution across agents.
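The DAG idea can be sketched with Python's standard-library `graphlib`. This is a minimal illustration, not VeriMAP's or Cosmos's planner: the subtask names and the `plan_waves` helper are hypothetical, and the only claim is that a dependency map can be grouped into waves whose members are safe to run in parallel.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not acyclic
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose parents are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical plan: each key is a subtask, each value the subtasks it depends on.
plan = {
    "schema_migration": set(),
    "api_handler": {"schema_migration"},
    "client_sdk": {"api_handler"},
    "docs": {"api_handler"},
}

print(plan_waves(plan))
# [['schema_migration'], ['api_handler'], ['client_sdk', 'docs']]
```

A cyclic plan fails at `prepare()`, which is a cheap way to reject a malformed decomposition before any Implementor starts.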
Decomposition couples task generation with verification design: for each subtask, VeriMAP's planner produces both the subtask instruction and a verification module, so each subtask arrives with its own definition of done. Two decomposition strategies dominate. Repository-level decomposition (SWE-agent, OpenDevin) treats the whole repo as the unit of analysis; it works well for monorepos but scales poorly past a few hundred thousand files. Standardized output coordination (MetaGPT, ChatDev) uses structured output schemas; it handles polyrepo cleanly but a wrong schema forces cascading plan revisions. Cosmos uses repository-level decomposition backed by the Context Engine, which is why it performs best on large, interconnected codebases.
VeriMAP gates execution on verification: the Coordinator cannot proceed to dependent subtasks until upstream verification succeeds, preventing unverified parents from poisoning children. For context assembly, VeriMAP uses a pull model where the Coordinator merges structured outputs into a dictionary and provides it as context to each child Executor. Pull keeps contexts small but requires the Coordinator to anticipate what each child needs; push (forwarding full history) avoids that burden but produces "lost in the middle" effects. Google's write-up on multi-agent framework design discusses context management at a production level.
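A small sketch may make the gating and pull-model ideas concrete. Everything here is illustrative: `Subtask`, `PlanState`, and the field names are assumptions rather than VeriMAP's data structures. The point is that a child receives context only once every parent is verified, and only the keys it declared.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    name: str
    depends_on: list[str]
    needs: list[str]  # keys this subtask pulls from verified upstream outputs


@dataclass
class PlanState:
    # Outputs are recorded here only after the Verifier has passed them.
    verified_outputs: dict[str, dict[str, str]] = field(default_factory=dict)

    def record_verified(self, name: str, output: dict[str, str]) -> None:
        self.verified_outputs[name] = output

    def is_ready(self, task: Subtask) -> bool:
        return all(dep in self.verified_outputs for dep in task.depends_on)

    def build_context(self, task: Subtask) -> dict[str, str]:
        """Pull model: merge parent outputs, then keep only the requested keys."""
        if not self.is_ready(task):
            raise RuntimeError(f"{task.name}: upstream subtasks are not verified yet")
        merged: dict[str, str] = {}
        for dep in task.depends_on:
            merged.update(self.verified_outputs[dep])
        missing = [key for key in task.needs if key not in merged]
        if missing:
            raise KeyError(f"{task.name}: upstream outputs lack {missing}")
        return {key: merged[key] for key in task.needs}


state = PlanState()
api = Subtask("api_handler", depends_on=["schema_migration"], needs=["table_name"])
state.record_verified(
    "schema_migration",
    {"table_name": "orders_v2", "migration_log": "long log text"},
)
print(state.build_context(api))  # {'table_name': 'orders_v2'}
```

The `migration_log` entry never reaches the child, which is the trade-off described above: contexts stay small, but the planner has to declare `needs` correctly.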
Cosmos runs the Coordinator on top of the Context Engine, which performs semantic codebase analysis before decomposition. This architectural grounding lets the Coordinator produce scoped context per subtask rather than handing each Implementor an undifferentiated slice of the repository.
The retry budget is the single most important production configuration in CIV. VeriMAP's published defaults cap the per-subtask execution-verification loop at 3 attempts, with a separate 5-iteration cap on replanning. Teams should calibrate against cost per round (if a subtask consumes roughly 40K tokens per attempt, a 10-round budget burns 400K tokens on that subtask alone), failure shape (format errors recover cheaply; semantic failures rarely improve with more retries, but replanning does), and task determinism (pure code generation converges under retry; external-API tasks may not). On exhaustion, escalation should trigger replanning or human routing.
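The two caps can be sketched as nested loops. This is one possible reading of the budget, assuming the initial plan plus up to five replans; the `execute`, `verify`, and `replan` callables are hypothetical stand-ins for the Implementor, Verifier, and Coordinator.

```python
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # per-subtask execution-verification cap cited above
MAX_REPLANS = 5   # replanning cap cited above

Execute = Callable[[str, Optional[str]], str]
Verify = Callable[[str, str], tuple[bool, Optional[str]]]
Replan = Callable[[str, Optional[str]], str]


def run_with_budget(task: str, execute: Execute, verify: Verify,
                    replan: Replan) -> tuple[str, int]:
    """Return (status, attempts_used); escalate once both caps are exhausted."""
    attempts_used = 0
    for plan_round in range(MAX_REPLANS + 1):  # initial plan + replans (assumption)
        feedback: Optional[str] = None
        for _ in range(MAX_ATTEMPTS):
            attempts_used += 1
            output = execute(task, feedback)
            passed, feedback = verify(task, output)
            if passed:
                return "passed", attempts_used
        if plan_round < MAX_REPLANS:
            task = replan(task, feedback)  # Coordinator revises the subtask
    return "escalate_to_human", attempts_used


def worst_case_tokens(tokens_per_attempt: int) -> int:
    """Upper bound on spend for one subtask under the caps above."""
    return tokens_per_attempt * MAX_ATTEMPTS * (MAX_REPLANS + 1)


print(worst_case_tokens(40_000))  # 720000
```

Under these assumptions a subtask costing roughly 40K tokens per attempt is bounded at 720K tokens before a human sees it, which is the kind of figure worth knowing before raising either cap.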
How Implementors Execute in Parallel with Isolated Context
Implementors are execution agents that receive a single scoped subtask and carry it out using available tools, under two hard constraints: a per-subtask retry cap (VeriMAP defaults to 3 attempts), and a structured output contract the Coordinator merges by key name for downstream nodes. The Plan-Execute paper notes that this separation produces cost-efficiency and reasoning-quality advantages. Because Implementor scope is narrower than the Coordinator's, teams can route Implementor work to cheaper models while reserving the strongest model for the Coordinator.
Preventing the failure modes documented earlier requires three isolation layers: input isolation (each Implementor runs in its own context window, populated only with what the Coordinator assembled), output isolation (structured contracts ensure only declared variables pass forward), and filesystem isolation (git worktrees prevent concurrent writes). Without the worktree layer, input and output isolation still leave the silent-overwrite failure mode open. Anthropic's Claude Agent SDK work confirms the rationale: subagents use isolated context windows and only send relevant information back to the orchestrator.
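The filesystem layer can be sketched with `git worktree add`, which checks out a new branch into its own directory. The branch prefix `civ/` and the sibling-directory naming are assumptions made for illustration, not a convention from Cosmos or Anthropic.

```python
import subprocess
from pathlib import Path


def worktree_command(repo_root: Path, subtask: str, base: str = "main") -> list[str]:
    """Build the git command that gives one Implementor its own checkout."""
    # Hypothetical layout: worktrees live next to the repo, one per subtask.
    worktree_path = repo_root.parent / f"{repo_root.name}-wt-{subtask}"
    return [
        "git", "-C", str(repo_root),
        "worktree", "add",
        "-b", f"civ/{subtask}",  # new branch for this subtask (assumed naming)
        str(worktree_path),
        base,                    # commit-ish the new branch starts from
    ]


def create_worktree(repo_root: Path, subtask: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git rejects it."""
    subprocess.run(worktree_command(repo_root, subtask, base), check=True)


print(" ".join(worktree_command(Path("shop"), "api_handler")))
```

Because each Implementor writes to a separate directory and branch, concurrent edits to the same file surface later as an ordinary merge conflict instead of a silent overwrite.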
Cosmos runs each Implementor in an isolated Environment so agents execute in parallel waves without filesystem conflicts. Simpler multi-agent systems have Implementors consume the plan and produce code; Cosmos introduces a bidirectional relationship where Implementors read from and write back to the shared context as work completes, keeping plan and codebase synchronized. Teams new to this model can start with the Bring Your Own Agent guide.
Explore how Cosmos coordinates agents through shared context and tenant memory, preventing stale-context failures across parallel execution.
How the Verifier Closes the Loop
The Verifier validates Implementor output against the original specification, producing pass/fail plus structured feedback that feeds back into the Coordinator's retry context. An SE survey on verification in agentic systems argues that effective verification requires layered pipelines combining LLM-based reasoning with static analysis, dynamic testing, and formal methods, since no single technique suffices alone. VeriMAP follows suit: each executor node can be equipped with verification functions, and a subtask is complete only when all pass. Anthropic's research on effective agents notes code is particularly amenable to this because solutions are verifiable through tests and output quality can be measured objectively.
Feedback quality determines retry-loop effectiveness. SpecLoop's formal equivalence approach feeds counterexamples back to refine specifications, showing that structured error feedback substantially outperforms binary pass/fail. Verifier accuracy has real failure modes: false positives block correct work and waste retry budget; false negatives ship bugs. Teams should log Verifier decisions, sample them for human review early, and order verification stages so deterministic checks (linters, type checkers, test runners) run before expensive LLM semantic review.
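The stage-ordering advice can be sketched as a short-circuiting pipeline that returns structured diagnostics instead of a bare pass/fail. The checks below are toy stand-ins: `lint_check` only flags tab characters and `llm_review` is a placeholder for a model call, so treat the names and behavior as assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    stage: str
    passed: bool
    diagnostics: list[str]  # structured feedback for the retry context


Check = Callable[[str], CheckResult]


def run_verifier(output: str, deterministic: list[Check],
                 semantic: list[Check]) -> list[CheckResult]:
    """Run cheap deterministic checks first; stop at the first failure."""
    results: list[CheckResult] = []
    for check in [*deterministic, *semantic]:
        result = check(output)
        results.append(result)
        if not result.passed:
            break  # later, more expensive stages never run
    return results


def lint_check(output: str) -> CheckResult:
    # Toy linter: reports the line number of every tab character.
    problems = [
        f"line {number}: tab character"
        for number, line in enumerate(output.splitlines(), start=1)
        if "\t" in line
    ]
    return CheckResult("lint", not problems, problems)


def llm_review(output: str) -> CheckResult:
    # Placeholder for an expensive semantic review; always passes in this sketch.
    return CheckResult("semantic_review", True, [])


results = run_verifier("def f():\n\treturn 1", [lint_check], [llm_review])
print([(r.stage, r.passed, r.diagnostics) for r in results])
# [('lint', False, ['line 2: tab character'])]
```

The failed lint stage hands the Implementor a line-level diagnostic and the semantic review is skipped entirely, so the retry costs no model tokens on the Verifier side.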
| Verifier Component | Mechanism | Source |
|---|---|---|
| Spec compliance (formal) | Formal equivalence checking | arXiv 2603.02895 |
| Automated testing | Dedicated test executor agent | arXiv 2312.13010 (AgentCoder) |
| Verification function assignment | Planner generates verification module per subtask; all must pass | arXiv 2510.17109 |
| Feedback format | Structured diagnostics (compiler errors, counterexamples) | arXiv 2603.02895 |
| Retry limit | Default 3 attempts per subtask; 5-iteration replanning cap | arXiv 2510.17109 |
Cosmos's Verifier checks results against the spec before developer review, routing failed subtasks back through the Coordinator with structured feedback so reviewers see triaged work instead of a raw failed diff.
Industry Convergence and Where Implementations Diverge
Several industry actors have moved toward three-role decompositions under the same structural pressure. The more interesting question is where implementations disagree, since divergences reveal which CIV properties are fundamental.
Addy Osmani's write-up on self-improving agents describes Planners, Workers, and a Judge, and his orchestrator guidance warns against jumping to hierarchical systems when a single agent suffices. Composio's agent orchestrator plans tasks, spawns agents, and "autonomously handles CI fixes, merge conflicts, and code reviews" via a review-and-merge gate. GitHub's Spec Kit formalizes the workflow as Constitution → Specify → (Clarify) → Plan → Tasks → Implement (see also Martin Fowler's analysis). Anthropic's multi-agent research system uses an orchestrator-worker pattern with specialized subagents, and Claude Code best practices add a Plan Mode for safe complex changes.
Role mapping is consistent, but operational layers diverge in three meaningful ways:
- Runtime vs. human Verifier: VeriMAP and Cosmos run the Verifier as an automated gate per subtask. Spec Kit has no runtime Verifier; verification is human PR review. Spec Kit assumes a reviewer is always present; CIV does not.
- Static vs. shared context: Spec Kit artifacts are written once and consumed. Cosmos's coordination layer is bidirectional: agents write back as work progresses, keeping downstream agents current. Osmani's model treats the codebase itself as the spec source, which collapses if the codebase drifts during long-running tasks.
- Isolation primitive: Osmani relies on per-agent context only. Composio uses Docker sandboxes. Anthropic and Cosmos use git worktrees for filesystem isolation on top of context isolation. Context isolation alone leaves file overwrites open; worktrees close them.
This table maps each CIV role across four implementations:
| CIV Role | Osmani | Composio | GitHub Spec Kit | Anthropic |
|---|---|---|---|---|
| Coordinator | Planner | Orchestrator | Specify + Plan | Lead Agent |
| Implementor | Worker | Spawned subagents | Implement | Subagent/Worker |
| Verifier | Judge | CI Review + Merge | PR Review (human) | Evaluator |
| Isolation | Per-agent context | Docker sandbox | Version-controlled artifacts | Git worktrees |
| Spec mechanism | Codebase reading | Design spec | spec.md + plan.md | Plan Mode output |
VeriMAP (EACL 2026) adds a fifth research-side data point with a Subtask Coordinator, Executor, and per-node Verifier.
When CIV Is the Wrong Pattern
CIV carries real cost and hurts more than it helps in several cases:
- Single-file or isolated changes: Coordination overhead dominates; a single well-prompted agent finishes faster.
- Exploratory tasks: DAG decomposition requires a known target. A single agent with tool access outperforms a Coordinator that cannot plan what it does not yet understand.
- Cost-sensitive environments: Token budgets can run roughly 5-15x a single-agent baseline depending on replanning frequency.
- Weak Coordinator model: Output quality is capped by Coordinator planning quality. A frontier model for the Coordinator with cheaper Implementors is defensible; cheap models throughout produce poor DAGs.
- Verifier false-positive dominance: Uncalibrated verification blocks correct work and burns retry budget, easily mistaken for "the pattern does not work."
The Coordinator is also a single point of failure: a bad decomposition can only be corrected by replanning. Teams should instrument Coordinator output (DAG quality, verification pass rates, token spend per subtask) before scaling.
How Cosmos Maps onto the CIV Pattern
Cosmos's architecture maps cleanly onto CIV: a unified cloud agents platform coordinating agents in isolated Environments through a persistent knowledge layer, with its runtime roles corresponding to the Coordinator, Implementor, and Verifier. It fits cross-service refactors across 3+ repositories, feature work with clear decomposition, and codebases where context management is the binding constraint.
Cosmos's shared filesystem and tenant memory address specification drift: Implementors read from and write to the shared context, so the coordination layer stays synchronized with actual work and updates propagate to all active agents. Every agent shares the same Context Engine, which processes 400,000+ files, so the Coordinator's decomposition, Implementors' execution, and the Verifier's validation all reason about the same architecture. In internal testing, this produces 5-10x faster completion and 40% fewer hallucinations on large-repo edits compared to tools that process files in isolation.
Cosmos enforces human-in-the-loop checkpoints as a platform feature. In a CIV workflow, this means a human reviews the Coordinator's spec, DAG, and parallelism plan before any code is written, then reviews Verifier output and approves commit, PR, and merge, seeing passed subtasks and retry history rather than a raw diff. Teams set the policies for where human judgment is required, and Cosmos enforces them. Amelia Wattenberger's interview with Refactoring describes the workspace primitive as an isolated environment bundling the codebase, agents, and a spec.
CIV inherits from classical distributed-systems patterns (Microsoft's saga pattern, the PEVR loop) with one key divergence: classical sagas assume deterministic, predefined compensating transactions, while CIV generates compensation (retry, replan, or escalate) at runtime. That is why retry budgets are architectural concerns and why Cosmos surfaces retry history at review time instead of silently consuming failures.
Adopt the CIV Pattern for Your Next Multi-Service Feature
A concrete starting configuration for a first adoption:
- Pick one feature with 3-6 independent subtasks across at least two services
- Cap parallel Implementors at 3-4 for the first run
- Set a retry budget of 3 attempts per subtask (VeriMAP's default); escalate to replanning on exhaustion, capped at 5 iterations
- Gate each subtask on deterministic checks first (lint, type, unit tests), then semantic review
- Instrument Coordinator DAG quality and Verifier decisions from day one
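The checklist above can be captured as a small config object that flags drift from the suggested starting point. The class and field names are hypothetical, and the thresholds simply restate the bullets; this is a sketch of how a team might encode them, not a Cosmos setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstRunConfig:
    subtask_count: int
    service_count: int
    max_parallel_implementors: int = 3
    max_attempts_per_subtask: int = 3  # VeriMAP's default, per the list above
    max_replans: int = 5
    check_order: tuple[str, ...] = ("lint", "type", "unit_tests", "semantic_review")

    def problems(self) -> list[str]:
        """Return human-readable deviations from the starting configuration."""
        issues: list[str] = []
        if not 3 <= self.subtask_count <= 6:
            issues.append("pick a feature with 3-6 independent subtasks")
        if self.service_count < 2:
            issues.append("feature should span at least two services")
        if not 1 <= self.max_parallel_implementors <= 4:
            issues.append("cap parallel Implementors at 3-4 for the first run")
        if self.check_order and self.check_order[-1] != "semantic_review":
            issues.append("run deterministic checks before semantic review")
        return issues


config = FirstRunConfig(subtask_count=4, service_count=2)
print(config.problems())  # []
```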
Cosmos keeps coordinated agents aligned through shared context and tenant memory, and its isolated Environments reduce file-collision risk during parallel execution.
Written by

Paula Hingel
Paula writes about the patterns that make AI coding agents actually work — spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.