
Coordinator-Implementor-Verifier Pattern for Dev Teams

Apr 20, 2026 · Last updated: May 22, 2026
Paula Hingel

The Coordinator-Implementor-Verifier (CIV) pattern is a three-role agent architecture where a Coordinator decomposes tasks into a dependency-ordered plan, parallel Implementors execute scoped subtasks in isolated contexts, and a Verifier validates each output against the original specification before proceeding.

TL;DR

Uncoordinated AI coding agents produce conflicts, duplicated code, and cascading errors. CIV addresses this by splitting work into planning (Coordinator), execution (Implementors), and validation (Verifier) with dependency ordering and isolation. Cosmos implements CIV with shared context, memory, and isolated environments. The pattern costs extra tokens and its output is only as good as the Coordinator's plan, which makes it the wrong choice for single-file work.

Why Parallel Agent Work Needs Coordination

Running three AI agents on a feature branch sounds like 3x throughput until two agents edit the same file and one silently erases the other's work. Errors in agentic systems compound, so minor issues that are manageable in traditional software derail agents entirely. Codex documentation and community discussions note that concurrent work on the same files routinely produces merge conflicts a human would have caught at plan time.

The engineering response is better architecture. A plan-execute-validate pattern appears in research and practitioner discussions, including Megagon Labs' VeriMAP framework for verification-aware multi-agent planning. Augment Cosmos, a unified cloud agents platform for running coordinated agents across the software development lifecycle, operationalizes the pattern: agents share context and memory, run in isolated environments, and route work through a Verifier before anything merges.

See how Cosmos keeps parallel agents aligned through shared context and tenant memory across cross-service refactors.

Try Cosmos

Free tier available · VS Code extension · Takes 2 minutes

Uncoordinated Agents Produce Structurally Predictable Failures

Multi-agent coding failures resemble distributed-systems problems, driven by isolated context windows, stateless execution, and the lack of native shared-state primitives. Five failure modes recur often enough to treat as architectural:

  • Silent file overwrites. Two agents edit the same file concurrently; the codebase still compiles, one agent's contribution is gone, and no warning fires because no locking or conflict detection exists in the agent execution layer by default. Git worktree isolation is the standard remediation, and the debugging parallel agents guide walks through how to apply it.
  • Context drift across agent boundaries. One agent references a function a peer already refactored; another writes to a shared file a downstream agent depends on, with no notification mechanism. Anthropic's context engineering work explains the mechanism: every token attends to every other token, producing n² pairwise relationships, so stale state from peers crowds out signal.
  • Duplicated implementations. Two agents each write a date-parsing utility with subtly different timezone handling; one caller gets ISO-8601, the other gets local time, and the bug surfaces in production.
  • Cascading errors. An upstream agent hallucinates a function signature; downstream agents wire up to it as ground truth. Anthropic's NIST submission on agent safety describes this as a trust chain.
  • Premature task completion. An agent joins a workflow mid-stream and concludes the task is done. Anthropic's guidance on long-running agents recommends leaving incremental artifacts so the next session picks up correctly.

| Failure Pattern | Primary Root Cause | System Layer |
|---|---|---|
| Silent file overwrites | No write locking; no shared filesystem awareness | Execution layer |
| Context drift and stale state | Isolated context windows; no push-update mechanism | Context layer |
| Duplicated implementations | Stateless generation; no shared implementation registry | Context + data layer |
| Cascading errors | No inter-agent output validation; trust chain propagation | Orchestration layer |
| Premature task completion | No shared progress state; absent stop conditions | Task management layer |

Every failure maps to a missing architectural primitive. The CIV pattern introduces three roles that together address these failure modes.

The CIV Pattern: Three Roles, Two Nested Control Loops

Cosmos decomposes multi-agent development into three roles with isolated execution contexts and two nested control loops. The inner loop is a ReAct-style reason-act-observe cycle each Implementor runs in its own context. The outer loop spans the Coordinator and Verifier: a plan-execute-verify-replan cycle that crosses agent boundaries using structured data contracts. A flat ReAct agent with tool use has only the inner loop, so when it produces wrong output, nothing above that loop can detect the failure, revise the plan, or route a corrected subtask to a fresh context.

The peer-reviewed VeriMAP system at EACL 2026 provides the most mechanically detailed published description of verification-aware planning with DAG-structured subtasks and dependencies.

        Original Specification
                   │
                   ▼
            ┌─────────────┐
            │ COORDINATOR │  ← Decomposes spec into DAG
            │             │    Computes topological order
            │             │    Compiles scoped context per node
            └──────┬──────┘
                   │ task + scoped context
                   ▼
    ┌─────────────────────────────────┐
    │ IMPLEMENTOR(s)                  │  ← Parallel where DAG allows
    │ (isolated context windows)      │    Each receives only its node's context
    └──────────────┬──────────────────┘
                   │ structured output
                   ▼
            ┌─────────────┐
            │  VERIFIER   │  ← Validates against original spec
            └──────┬──────┘
           ┌───────┴────────┐
           │                │
         PASS             FAIL
           │                │
           ▼                ▼
      Coordinator     Coordinator retries
      proceeds to     (up to N=3 attempts)
      next DAG node   with updated context:
                      prior attempt +
                      verifier feedback
                            │
                            ▼ (if limit exceeded)
                      Dynamic replanning
                      or escalation

How the Coordinator Plans and Delegates

The Coordinator transforms a specification into a directed acyclic graph (DAG) where each node is a bounded subtask and each edge is a dependency. The VeriMAP arXiv companion frames the Coordinator as the central orchestrator that follows a DAG-structured task plan to support reliable, adaptive execution across agents.
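
To make the DAG idea concrete, here is a minimal sketch of turning a dependency map into parallel "waves" using Python's standard-library graphlib. The subtask names and the `parallel_waves` helper are hypothetical illustrations, not Cosmos or VeriMAP APIs.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; every task in a wave has all its parents done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # a real Coordinator would wait for verification here
    return waves


# Hypothetical plan: each key depends on the subtasks in its set.
plan = {
    "api_schema": set(),
    "db_migration": set(),
    "service_handler": {"api_schema", "db_migration"},
    "client_sdk": {"api_schema"},
    "integration_tests": {"service_handler", "client_sdk"},
}

waves = parallel_waves(plan)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Each wave is a set of subtasks that could go to Implementors at the same time; the next wave starts only after the current one is marked done.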

Decomposition couples task generation with verification design: for each subtask, VeriMAP's planner produces both the subtask instruction and a verification module, so each subtask arrives with its own definition of done. Two decomposition strategies dominate. Repository-level decomposition (SWE-agent, OpenDevin) treats the whole repo as the unit of analysis; it works well for monorepos but scales poorly past a few hundred thousand files. Standardized output coordination (MetaGPT, ChatDev) uses structured output schemas; it handles polyrepo cleanly but a wrong schema forces cascading plan revisions. Cosmos uses repository-level decomposition backed by the Context Engine, which is why it performs best on large, interconnected codebases.

VeriMAP gates execution on verification: the Coordinator cannot proceed to dependent subtasks until upstream verification succeeds, preventing unverified parents from poisoning children. For context assembly, VeriMAP uses a pull model where the Coordinator merges structured outputs into a dictionary and provides it as context to each child Executor. Pull keeps contexts small but requires the Coordinator to anticipate what each child needs; push (forwarding full history) avoids that burden but produces "lost in the middle" effects. Google's write-up on multi-agent framework design discusses context management at a production level.
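
The gating and context-assembly behavior described above can be sketched roughly as follows. This is an illustration under assumed names (`Subtask`, `ready_subtasks`, `build_context`, and the output keys are all hypothetical), not VeriMAP's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    name: str
    depends_on: list[str]
    needs: list[str] = field(default_factory=list)  # output keys this node reads


def ready_subtasks(subtasks: list[Subtask], verified: set[str]) -> list[Subtask]:
    """Return unfinished subtasks whose parents have all passed verification."""
    return [
        task for task in subtasks
        if task.name not in verified
        and all(parent in verified for parent in task.depends_on)
    ]


def build_context(task: Subtask, merged_outputs: dict) -> dict:
    """Hand the child only the keys it declared; fail loudly if one is absent."""
    missing = [key for key in task.needs if key not in merged_outputs]
    if missing:
        raise KeyError(f"{task.name} is missing upstream outputs: {missing}")
    return {key: merged_outputs[key] for key in task.needs}


tasks = [
    Subtask("schema", depends_on=[]),
    Subtask("handler", depends_on=["schema"], needs=["schema_path"]),
    Subtask("tests", depends_on=["handler"], needs=["schema_path", "handler_module"]),
]
verified: set[str] = set()
merged: dict = {}

before = [task.name for task in ready_subtasks(tasks, verified)]

# Pretend "schema" ran and passed verification; merge its structured output.
verified.add("schema")
merged.update({"schema_path": "api/schema.json", "scratch_notes": "not forwarded"})

after = [task.name for task in ready_subtasks(tasks, verified)]
handler_context = build_context(tasks[1], merged)
```

In this sketch the "tests" subtask stays blocked until "handler" is verified, and the handler receives only `schema_path`, not the upstream scratch notes.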

Cosmos runs the Coordinator on top of the Context Engine, which performs semantic codebase analysis before decomposition. This architectural grounding lets the Coordinator produce scoped context per subtask rather than handing each Implementor an undifferentiated slice of the repository.

The retry budget is the single most important production configuration in CIV. VeriMAP's published defaults cap the per-subtask execution-verification loop at 3 attempts, with a separate 5-iteration cap on replanning. Teams should calibrate against cost per round (if a subtask consumes roughly 40K tokens per attempt, a 10-round budget burns 400K tokens on that subtask alone), failure shape (format errors recover cheaply; semantic failures rarely improve with more retries, but replanning does), and task determinism (pure code generation converges under retry; external-API tasks may not). On exhaustion, escalation should trigger replanning or human routing.
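
A retry budget of this kind can be sketched as a bounded execute-verify loop that carries verifier feedback into the next attempt. The function and type names below are hypothetical, and the 40K-token figure is the illustrative number from the paragraph above, not a measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


@dataclass
class Outcome:
    status: str  # "passed" or "escalate"
    attempts: int
    tokens_spent: int
    output: Optional[str] = None


def run_with_budget(execute, verify, max_attempts: int = 3) -> Outcome:
    """execute(history) -> (output, tokens_used); verify(output) -> Verdict."""
    history = []  # (prior output, verifier feedback) pairs fed to the next attempt
    tokens_spent = 0
    for attempt in range(1, max_attempts + 1):
        output, tokens = execute(history)
        tokens_spent += tokens
        verdict = verify(output)
        if verdict.passed:
            return Outcome("passed", attempt, tokens_spent, output)
        history.append((output, verdict.feedback))
    # Budget exhausted: the caller should replan or route to a human.
    return Outcome("escalate", max_attempts, tokens_spent)


def worst_case_tokens(tokens_per_attempt: int, max_attempts: int) -> int:
    """Upper bound on spend for one subtask if every attempt fails."""
    return tokens_per_attempt * max_attempts


# Stand-in executor and verifier: the second attempt succeeds.
def fake_execute(history):
    return ("v2" if history else "v1", 40_000)


def fake_verify(output):
    return Verdict(output == "v2", "" if output == "v2" else "signature mismatch")


result = run_with_budget(fake_execute, fake_verify)
never_passes = run_with_budget(fake_execute, lambda output: Verdict(False, "still wrong"))
```

Keeping the token count on the returned outcome makes it easy to compare actual spend against `worst_case_tokens` when calibrating the budget.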

How Implementors Execute in Parallel with Isolated Context

Implementors are execution agents that receive a single scoped subtask and carry it out using available tools, under two hard constraints: a per-subtask retry cap (VeriMAP defaults to 3 attempts), and a structured output contract the Coordinator merges by key name for downstream nodes. The Plan-Execute paper notes that this separation produces cost-efficiency and reasoning-quality advantages. Because Implementor scope is narrower than the Coordinator's, teams can route Implementor work to cheaper models while reserving the strongest model for the Coordinator.
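
One way to picture a structured output contract is a function that validates an Implementor's raw result against the keys the plan declared and drops everything else. The contract shape and key names here are assumptions made for illustration.

```python
def apply_output_contract(raw_output: dict, declared: dict[str, type]) -> dict:
    """Validate an Implementor result and forward only the declared keys."""
    missing = [key for key in declared if key not in raw_output]
    if missing:
        raise ValueError(f"output is missing declared keys: {missing}")
    wrong_type = [
        key for key, expected in declared.items()
        if not isinstance(raw_output[key], expected)
    ]
    if wrong_type:
        raise TypeError(f"declared keys have unexpected types: {wrong_type}")
    # Undeclared keys (scratch notes, traces) never reach downstream nodes.
    return {key: raw_output[key] for key in declared}


# Hypothetical Implementor result and contract.
raw = {
    "handler_module": "services/orders/handler.py",
    "exported_symbols": ["create_order"],
    "debug_trace": "long reasoning transcript",
}
contract = {"handler_module": str, "exported_symbols": list}

forwarded = apply_output_contract(raw, contract)
```

Because the Coordinator merges by key name, rejecting a result with a missing or mistyped key at this boundary is cheaper than discovering it in a dependent subtask.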

Preventing the failure modes documented earlier requires three isolation layers: input isolation (each Implementor runs in its own context window, populated only with what the Coordinator assembled), output isolation (structured contracts ensure only declared variables pass forward), and filesystem isolation (git worktrees prevent concurrent writes). Without the worktree layer, input and output isolation still leave the silent-overwrite failure mode open. Anthropic's Claude Agent SDK work confirms the rationale: subagents use isolated context windows and only send relevant information back to the orchestrator.
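
For the filesystem layer, a small sketch of how an orchestrator might build the git commands that give each Implementor its own worktree and branch. The directory layout and the `civ/` branch prefix are hypothetical conventions; the functions only construct argument lists and do not run git.

```python
from pathlib import PurePosixPath


def worktree_path(repo_root: str, subtask_id: str) -> str:
    """Place worktrees in a sibling 'worktrees' directory (assumed layout)."""
    return str(PurePosixPath(repo_root).parent / "worktrees" / subtask_id)


def worktree_add_command(repo_root: str, subtask_id: str, base_ref: str = "main") -> list[str]:
    """Build: git worktree add -b <branch> <path> <base_ref>."""
    branch = f"civ/{subtask_id}"  # hypothetical branch naming scheme
    return [
        "git", "-C", repo_root, "worktree", "add",
        "-b", branch, worktree_path(repo_root, subtask_id), base_ref,
    ]


def worktree_remove_command(repo_root: str, subtask_id: str) -> list[str]:
    """Build: git worktree remove <path>, for cleanup after merge."""
    return ["git", "-C", repo_root, "worktree", "remove", worktree_path(repo_root, subtask_id)]


add_cmd = worktree_add_command("/repos/shop", "handler")
remove_cmd = worktree_remove_command("/repos/shop", "handler")
print(" ".join(add_cmd))
```

Passing one of these lists to `subprocess.run(cmd, check=True)` would execute it; each Implementor then writes only inside its own directory, so concurrent edits surface as ordinary merge conflicts instead of silent overwrites.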

Cosmos runs each Implementor in an isolated Environment so agents execute in parallel waves without filesystem conflicts. Simpler multi-agent systems have Implementors consume the plan and produce code; Cosmos introduces a bidirectional relationship where Implementors read from and write back to the shared context as work completes, keeping plan and codebase synchronized. Teams new to this model can start with the Bring Your Own Agent guide.



How the Verifier Closes the Loop

The Verifier validates Implementor output against the original specification, producing pass/fail plus structured feedback that feeds back into the Coordinator's retry context. An SE survey on verification in agentic systems argues that effective verification requires layered pipelines combining LLM-based reasoning with static analysis, dynamic testing, and formal methods, since no single technique suffices alone. VeriMAP follows suit: each executor node can be equipped with verification functions, and a subtask is complete only when all pass. Anthropic's research on effective agents notes code is particularly amenable to this because solutions are verifiable through tests and output quality can be measured objectively.

Feedback quality determines retry-loop effectiveness. SpecLoop's formal equivalence approach feeds counterexamples back to refine specifications, showing that structured error feedback substantially outperforms binary pass/fail. Verifier accuracy has real failure modes: false positives block correct work and waste retry budget; false negatives ship bugs. Teams should log Verifier decisions, sample them for human review early, and order verification stages so deterministic checks (linters, type checkers, test runners) run before expensive LLM semantic review.
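
The stage ordering recommended above can be sketched as a small pipeline that runs cheap deterministic checks first and stops before the semantic stage if any of them fail. The `syntax_check` stage uses Python's built-in `compile`; the semantic reviewer is a stand-in function, since a real one would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable

# A check returns diagnostic messages; an empty list means the stage passed.
Check = Callable[[str], list[str]]


@dataclass
class Report:
    passed: bool = True
    stages_run: list[str] = field(default_factory=list)
    findings: list[tuple[str, str]] = field(default_factory=list)  # (stage, message)


def run_verifier(output: str, deterministic: dict[str, Check], semantic: Check) -> Report:
    report = Report()
    for name, check in deterministic.items():
        report.stages_run.append(name)
        messages = check(output)
        if messages:
            report.passed = False
            report.findings.extend((name, message) for message in messages)
            return report  # stop before paying for semantic review
    report.stages_run.append("semantic")
    messages = semantic(output)
    if messages:
        report.passed = False
        report.findings.extend(("semantic", message) for message in messages)
    return report


def syntax_check(source: str) -> list[str]:
    """Deterministic stage: does the generated Python even parse?"""
    try:
        compile(source, "<implementor-output>", "exec")
        return []
    except SyntaxError as err:
        return [f"line {err.lineno}: {err.msg}"]


semantic_calls = []


def fake_semantic_review(source: str) -> list[str]:
    """Stand-in for an LLM reviewer; records calls so the ordering is visible."""
    semantic_calls.append(source)
    return [] if "timezone" in source else ["spec requires explicit timezone handling"]


stages = {"syntax": syntax_check}
broken = run_verifier("def parse(:\n    pass\n", stages, fake_semantic_review)
calls_after_broken = len(semantic_calls)
incomplete = run_verifier("def parse(s):\n    return s\n", stages, fake_semantic_review)
```

The findings list is the structured feedback: it names the stage and the diagnostic, which gives the Coordinator more to work with on retry than a bare pass/fail.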

| Verifier Component | Mechanism | Source |
|---|---|---|
| Spec compliance (formal) | Formal equivalence checking | arXiv 2603.02895 |
| Automated testing | Dedicated test executor agent | arXiv 2312.13010 (AgentCoder) |
| Verification function assignment | Planner generates verification module per subtask; all must pass | arXiv 2510.17109 |
| Feedback format | Structured diagnostics (compiler errors, counterexamples) | arXiv 2603.02895 |
| Retry limit | Default 3 attempts per subtask; 5-iteration replanning cap | arXiv 2510.17109 |

Cosmos's Verifier checks results against the spec before developer review, routing failed subtasks back through the Coordinator with structured feedback so reviewers see triaged work instead of a raw failed diff.

Industry Convergence and Where Implementations Diverge

Several industry actors have moved toward three-role decompositions under the same structural pressure. The more interesting question is where implementations disagree, since divergences reveal which CIV properties are fundamental.

Addy Osmani's self-improving agents describes Planners, Workers, and a Judge, and his orchestrator guidance warns against jumping to hierarchical systems when a single agent suffices. Composio's agent orchestrator plans tasks, spawns agents, and "autonomously handles CI fixes, merge conflicts, and code reviews" via a review-and-merge gate. GitHub's Spec Kit formalizes the workflow as Constitution → Specify → (Clarify) → Plan → Tasks → Implement (see also Martin Fowler's analysis). Anthropic's multi-agent research system uses an orchestrator-worker pattern with specialized subagents, and Claude Code best practices add a Plan Mode for safe complex changes.

Role mapping is consistent, but operational layers diverge in three meaningful ways:

  • Runtime vs. human Verifier: VeriMAP and Cosmos run the Verifier as an automated gate per subtask. Spec Kit has no runtime Verifier; verification is human PR review. Spec Kit assumes a reviewer is always present; CIV does not.
  • Static vs. shared context: Spec Kit artifacts are written once and consumed. Cosmos's coordination layer is bidirectional: agents write back as work progresses, keeping downstream agents current. Osmani's model treats the codebase itself as the spec source, which collapses if the codebase drifts during long-running tasks.
  • Isolation primitive: Osmani relies on per-agent context only. Composio uses Docker sandboxes. Anthropic and Cosmos use git worktrees for filesystem isolation on top of context isolation. Context isolation alone leaves file overwrites open; worktrees close them.

This table maps each CIV role, plus the isolation and spec mechanisms, across four implementations:

| CIV Role | Osmani | Composio | GitHub Spec Kit | Anthropic |
|---|---|---|---|---|
| Coordinator | Planner | Orchestrator | Specify + Plan | Lead Agent |
| Implementor | Worker | Spawned subagents | Implement | Subagent/Worker |
| Verifier | Judge | CI Review + Merge | PR Review (human) | Evaluator |
| Isolation | Per-agent context | Docker sandbox | Version-controlled artifacts | Git worktrees |
| Spec mechanism | Codebase reading | Design spec | spec.md + plan.md | Plan Mode output |

VeriMAP (EACL 2026) adds a fifth research-side data point with a Subtask Coordinator, Executor, and per-node Verifier.

When CIV Is the Wrong Pattern

CIV carries real cost and hurts more than it helps in several cases:

  • Single-file or isolated changes: Coordination overhead dominates; a single well-prompted agent finishes faster.
  • Exploratory tasks: DAG decomposition requires a known target. A single agent with tool access outperforms a Coordinator that cannot plan what it does not yet understand.
  • Cost-sensitive environments: Token budgets can run roughly 5-15x a single-agent baseline depending on replanning frequency.
  • Weak Coordinator model: Output quality is capped by Coordinator planning quality. A frontier model for the Coordinator with cheaper Implementors is defensible; cheap models throughout produce poor DAGs.
  • Verifier false-positive dominance: Uncalibrated verification blocks correct work and burns retry budget, easily mistaken for "the pattern does not work."

The Coordinator is also a single point of failure: a bad decomposition can only be corrected by replanning. Teams should instrument Coordinator output (DAG quality, verification pass rates, token spend per subtask) before scaling.
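
As a starting point for that instrumentation, a sketch that rolls per-attempt records up into attempts, token spend, and first-pass rate per subtask. The record shape and metric names are assumptions; a real setup would pull these from whatever logging the orchestration layer already emits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttemptRecord:
    subtask: str
    passed: bool
    tokens: int


def summarize(records: list[AttemptRecord]) -> dict[str, dict]:
    """Aggregate attempts per subtask, in the order they were recorded."""
    by_subtask = defaultdict(list)
    for record in records:
        by_subtask[record.subtask].append(record)
    return {
        name: {
            "attempts": len(attempts),
            "first_pass": attempts[0].passed,
            "tokens": sum(attempt.tokens for attempt in attempts),
        }
        for name, attempts in by_subtask.items()
    }


def first_pass_rate(summary: dict[str, dict]) -> float:
    """Share of subtasks verified on the first attempt; a rough DAG-quality signal."""
    if not summary:
        return 0.0
    return sum(entry["first_pass"] for entry in summary.values()) / len(summary)


# Hypothetical run: one clean subtask, one that needed a retry.
records = [
    AttemptRecord("schema", passed=True, tokens=30_000),
    AttemptRecord("handler", passed=False, tokens=40_000),
    AttemptRecord("handler", passed=True, tokens=42_000),
]
summary = summarize(records)
rate = first_pass_rate(summary)
```

A falling first-pass rate alongside rising tokens per subtask is one hint that decomposition quality, not Implementor quality, is the bottleneck.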

How Cosmos Maps onto the CIV Pattern

Cosmos's architecture maps cleanly onto CIV: a unified cloud agents platform coordinating agents in isolated Environments through a persistent knowledge layer, with its runtime roles corresponding to the Coordinator, Implementor, and Verifier. It fits cross-service refactors across 3+ repositories, feature work with clear decomposition, and codebases where context management is the binding constraint.

Cosmos's shared filesystem and tenant memory address specification drift: Implementors read from and write to the shared context, so the coordination layer stays synchronized with actual work and updates propagate to all active agents. Every agent shares the same Context Engine, which processes 400,000+ files, so the Coordinator's decomposition, Implementors' execution, and the Verifier's validation all reason about the same architecture. In internal testing, this produces 5-10x faster completion and 40% fewer hallucinations on large-repo edits compared to tools that process files in isolation.

Cosmos enforces human-in-the-loop checkpoints as a platform feature. In a CIV workflow, this means a human reviews the Coordinator's spec, DAG, and parallelism plan before any code is written, then reviews Verifier output and approves commit, PR, and merge, seeing passed subtasks and retry history rather than a raw diff. Teams set the policies for where human judgment is required, and Cosmos enforces them. Amelia Wattenberger's interview with Refactoring describes the workspace primitive as an isolated environment bundling the codebase, agents, and a spec.

CIV inherits from classical distributed-systems patterns (Microsoft's saga pattern, the plan-execute-verify-replan or PEVR loop) with one key divergence: classical sagas assume deterministic, predefined compensating transactions, while CIV generates compensation (retry, replan, or escalate) at runtime. That is why retry budgets are architectural concerns and why Cosmos surfaces retry history at review time instead of silently consuming failures.

Adopt the CIV Pattern for Your Next Multi-Service Feature

A concrete starting configuration for a first adoption:

  • Pick one feature with 3-6 independent subtasks across at least two services
  • Cap parallel Implementors at 3-4 for the first run
  • Set a retry budget of 3 attempts per subtask (VeriMAP's default); escalate to replanning on exhaustion, capped at 5 iterations
  • Gate each subtask on deterministic checks first (lint, type, unit tests), then semantic review
  • Instrument Coordinator DAG quality and Verifier decisions from day one
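
The starting configuration above could be captured as a small settings object with a sanity check. The class and field names are hypothetical and do not correspond to a Cosmos configuration format; the defaults mirror the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CivStarterConfig:
    max_parallel_implementors: int = 3
    max_attempts_per_subtask: int = 3  # VeriMAP's published default
    max_replan_iterations: int = 5
    deterministic_checks: tuple[str, ...] = ("lint", "type", "unit_tests")
    semantic_review: bool = True

    def stage_order(self) -> list[str]:
        """Deterministic checks first, then the semantic stage if enabled."""
        stages = list(self.deterministic_checks)
        if self.semantic_review:
            stages.append("semantic_review")
        return stages

    def problems(self, subtask_count: int, service_count: int) -> list[str]:
        """Flag a pilot feature that falls outside the suggested starting range."""
        issues = []
        if not 3 <= subtask_count <= 6:
            issues.append("pick a feature with 3-6 independent subtasks")
        if service_count < 2:
            issues.append("pick a feature that spans at least two services")
        if not 3 <= self.max_parallel_implementors <= 4:
            issues.append("cap parallel Implementors at 3-4 for the first run")
        return issues


config = CivStarterConfig()
stage_order = config.stage_order()
good_fit = config.problems(subtask_count=4, service_count=2)
poor_fit = config.problems(subtask_count=1, service_count=1)
```

Freezing the dataclass keeps the budget from being changed mid-run, so retry and replanning metrics stay comparable across the pilot.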

Cosmos keeps coordinated agents aligned through shared context and tenant memory, and its isolated Environments reduce file-collision risk during parallel execution.



Written by

Paula Hingel


Paula writes about the patterns that make AI coding agents actually work: spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.
