The Coordinator-Implementor-Verifier (CIV) pattern is a three-role agent architecture where a Coordinator decomposes tasks into a dependency-ordered plan, parallel Implementors execute scoped subtasks in isolated contexts, and a Verifier validates each output against the original specification before proceeding.
TL;DR
Parallel AI coding agents without coordination produce conflicts, duplicated code, and cascading errors. The CIV pattern addresses this with three roles: a Coordinator that plans and delegates with dependency ordering, Implementors that execute in isolated contexts, and a Verifier that validates against the spec. Intent treats multi-agent development as a single, coordinated system where agents share a living spec and workspace and adapt without restarts. The pattern carries real costs in token spend and dependence on Coordinator quality, which makes it the wrong fit for single-file work.
Why Parallel Agent Work Needs Coordination
Running three AI agents on a feature branch sounds like 3x throughput until two agents edit the same file and one silently erases the other's work. Errors in agentic systems compound, so minor issues that are manageable in traditional software derail agents entirely. Codex documentation and community discussions note that concurrent work on the same files routinely produces merge conflicts a human would have caught at plan time.
The engineering response is better architecture. A plan-execute-validate pattern appears in research and practitioner discussions, including Megagon Labs' VeriMAP framework for verification-aware multi-agent planning. Intent operationalizes the pattern inside a developer workspace where agents share a living spec, run in isolated worktrees, and route work through a Verifier before anything merges.
See how Intent's living specs keep parallel agents aligned across cross-service refactors.
Free tier available · VS Code extension · Takes 2 minutes
Uncoordinated Agents Produce Structurally Predictable Failures
Multi-agent coding failures resemble distributed-systems problems, driven by isolated context windows, stateless execution, and the lack of native shared-state primitives. Five failure modes recur often enough to be treated as architectural:
- Silent file overwrites. Two agents edit the same file concurrently; the codebase still compiles, one agent's contribution is gone, and no warning fires because no locking or conflict detection exists in the agent execution layer by default. Git worktree isolation is the standard remediation, and the debugging parallel agents guide walks through how to apply it.
- Context drift across agent boundaries. One agent references a function a peer already refactored; another writes to a shared file a downstream agent depends on, with no notification mechanism. Anthropic's context engineering work explains the mechanism: every token attends to every other token, producing n² pairwise relationships, so stale state from peers crowds out signal.
- Duplicated implementations. Two agents each write a date-parsing utility with subtly different timezone handling; one caller gets ISO-8601, the other gets local time, and the bug surfaces in production.
- Cascading errors. An upstream agent hallucinates a function signature; downstream agents wire up to it as ground truth. Anthropic's NIST submission on agent safety describes this as a trust chain.
- Premature task completion. An agent joins a workflow mid-stream and concludes the task is done. Anthropic's guidance on long-running agents recommends leaving incremental artifacts so the next session picks up correctly.
| Failure Pattern | Primary Root Cause | System Layer |
|---|---|---|
| Silent file overwrites | No write locking; no shared filesystem awareness | Execution layer |
| Context drift and stale state | Isolated context windows; no push-update mechanism | Context layer |
| Duplicated implementations | Stateless generation; no shared implementation registry | Context + data layer |
| Cascading errors | No inter-agent output validation; trust chain propagation | Orchestration layer |
| Premature task completion | No shared progress state; absent stop conditions | Task management layer |
Every failure maps to a missing architectural primitive. The CIV pattern introduces three roles that together address the failure modes above.
The CIV Pattern: Three Roles, Two Nested Control Loops
Intent's architecture decomposes multi-agent development into three roles with isolated execution contexts and two nested control loops. The inner loop is a ReAct-style reason-act-observe cycle each Implementor runs in its own context. The outer loop spans the Coordinator and Verifier: a plan-execute-verify-replan cycle that crosses agent boundaries using structured data contracts. A flat ReAct agent with tool use has only the inner loop, so when it produces wrong output, no mechanism exists above it to detect the failure, revise the plan, or route a corrected subtask to a fresh context.
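The two loops can be sketched in a few lines. This is a hypothetical illustration, not Intent's API: `execute` and `verify` are toy stand-ins for an Implementor agent and a Verifier, and the toy Verifier's feedback is simply the expected answer so the retry visibly converges.

```python
# Hypothetical sketch of CIV's two nested control loops. execute/verify are
# toy stand-ins: the "Implementor" succeeds once structured feedback from the
# "Verifier" tells it the expected value.

def execute(subtask, feedback):
    # Toy Implementor: first attempt is wrong; a retry uses the feedback.
    return feedback if feedback else "wrong"

def verify(subtask, output):
    # Toy Verifier: pass/fail plus structured feedback for the retry context.
    expected = subtask["expected"]
    return (output == expected, expected)

def implementor_loop(subtask, max_attempts=3):
    """Inner loop: reason-act-observe inside one Implementor's context."""
    feedback = None
    for _ in range(max_attempts):
        output = execute(subtask, feedback)      # act
        ok, feedback = verify(subtask, output)   # observe
        if ok:
            return output                        # subtask verified
    return None                                  # retry budget exhausted

def coordinator_loop(subtasks, max_replans=5):
    """Outer loop: execute-verify-replan across agent boundaries."""
    for _ in range(max_replans):
        failed = [t for t in subtasks if implementor_loop(t) is None]
        if not failed:
            return "complete"
        subtasks = failed                        # replan: reroute only failures
    return "escalate"                            # replanning cap hit
```

The point of the sketch is the asymmetry: the inner loop can only retry, while the outer loop can revise the plan and route corrected subtasks to fresh contexts.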
The peer-reviewed VeriMAP system at EACL 2026 provides the most mechanically detailed published description of verification-aware planning with DAG-structured subtasks and dependencies.
How the Coordinator Plans and Delegates
The Coordinator transforms a specification into a directed acyclic graph (DAG) where each node is a bounded subtask and each edge is a dependency. The VeriMAP arXiv companion defines the Coordinator as "the central orchestrator of multi-agent task execution, following the task plan (represented as a DAG) to support reliable and adaptive execution."
Decomposition couples task generation with verification design: for each subtask, VeriMAP's planner produces both the subtask instruction (the work unit) and a verification module (a suite of verification functions), so each subtask arrives with its own definition of done. Two decomposition strategies dominate and suit different codebase shapes. Repository-level decomposition (SWE-agent, OpenDevin) treats the whole repo as the unit of analysis; it works well for monorepos but scales poorly past a few hundred thousand files. Standardized output coordination (MetaGPT, ChatDev) uses structured output schemas; it handles polyrepo cleanly but a wrong schema forces cascading plan revisions. Intent uses repository-level decomposition backed by the Context Engine, which is why it performs best on large, interconnected codebases. The MaintainCoder paper offers related context on architectural awareness in planning.
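A Coordinator's decomposition output can be modeled as a plain predecessor map. The sketch below is illustrative (the subtask names are invented); it uses Python's standard-library `graphlib` to group DAG nodes into waves of subtasks whose dependencies have all been verified, which is the scheduling shape the Coordinator needs.

```python
# Hypothetical DAG a Coordinator might emit: each key is a subtask, each
# value is the set of subtasks that must verify before it can start.
from graphlib import TopologicalSorter

dag = {
    "write-migration":        set(),                 # no dependencies
    "update-user-service":    {"write-migration"},
    "update-billing-service": {"write-migration"},
    "integration-tests":      {"update-user-service", "update-billing-service"},
}

def parallel_waves(dag):
    """Yield tuples of subtasks whose dependencies are all satisfied."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    while ts.is_active():
        ready = tuple(ts.get_ready())   # one wave of independent subtasks
        yield ready
        ts.done(*ready)                 # in CIV: only after verification passes

waves = list(parallel_waves(dag))
```

Here the two service updates land in the same wave and can run on parallel Implementors, while `integration-tests` waits for both.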
VeriMAP gates execution on verification: the Coordinator cannot proceed to dependent subtasks until upstream verification succeeds, which prevents unverified parents from poisoning children. For context assembly, VeriMAP uses a pull model: "The Coordinator merges their structured outputs into a single dictionary. It then provides this dictionary as context to the child node's Executor." Pull keeps contexts small but requires the Coordinator to anticipate what each child needs; push (forwarding full history) avoids that burden but accumulates context and produces "lost in the middle" effects. Pull wins for independent DAG nodes; push wins for short exploratory tasks. Google's write-up on multi-agent framework design discusses context management at a production level.
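The pull model's merge step is small enough to show directly. This is a sketch of the idea, not VeriMAP's code; the output keys are invented, and the collision check is an added assumption to make bad contracts fail loudly instead of silently overwriting a parent's output.

```python
# Hypothetical pull-model context assembly: merge parents' structured
# outputs into one dict and hand only that dict to the child subtask.

def assemble_child_context(parent_outputs):
    """Merge structured parent outputs by key (pull model)."""
    context = {}
    for output in parent_outputs:
        overlap = context.keys() & output.keys()
        if overlap:
            # A key collision means two parents claim the same contract
            # slot; surface it rather than overwrite (assumed policy).
            raise ValueError(f"conflicting contract keys: {sorted(overlap)}")
        context.update(output)
    return context

# Illustrative parent outputs feeding one child subtask.
ctx = assemble_child_context([
    {"schema_version": "v2", "migration_file": "0042_add_tz.sql"},
    {"user_api_endpoints": ["/users", "/users/{id}"]},
])
```

The child sees three keys and nothing else, which is exactly the trade-off described above: small contexts, but the Coordinator must anticipate what the child needs.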
Intent's Coordinator runs on top of the Context Engine, which performs semantic codebase analysis before decomposition and drafts a spec that serves as a living document for the project. This architectural grounding lets specialist agents receive scoped context per subtask rather than an undifferentiated slice of the repository.
The retry budget is the single most important production configuration in CIV. VeriMAP's published defaults cap the per-subtask execution-verification loop at 3 attempts, with a separate 5-iteration cap on replanning. Teams should calibrate against cost per round (at 40K tokens per attempt, a 10-round budget burns 400K tokens per subtask), failure shape (format errors recover cheaply with low budgets; semantic failures rarely improve with more retries, but replanning does), and task determinism (pure code generation converges under retry; external-API tasks may never converge). On exhaustion, escalation should trigger replanning or human routing; indefinite retry loops waste tokens without improving outcomes.
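The budget arithmetic and the escalation order can be made concrete. The token figures below simply mirror the example in the text, and the defaults follow VeriMAP's published caps; the function names are illustrative.

```python
# Illustrative budget arithmetic and escalation policy for CIV retry loops.

def worst_case_tokens(tokens_per_attempt, retry_cap, subtasks=1):
    """Upper bound on spend if every subtask exhausts its retry budget."""
    return tokens_per_attempt * retry_cap * subtasks

def next_action(attempts_used, retry_cap=3, replans_used=0, replan_cap=5):
    """On each failure: retry within budget, then replan, then escalate."""
    if attempts_used < retry_cap:
        return "retry"
    if replans_used < replan_cap:
        return "replan"
    return "human-review"

# The text's example: 40K tokens per attempt under a 10-attempt budget.
spend = worst_case_tokens(40_000, retry_cap=10)
```

The important property is the terminal branch: exhausting both caps routes to a human rather than looping indefinitely.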
How Implementors Execute in Parallel with Isolated Context
Implementors are execution agents that receive a single scoped subtask and carry it out using available tools, under two hard constraints: a per-subtask retry cap (VeriMAP defaults to 3 attempts), and a structured output contract the Coordinator merges by key name for downstream nodes. The Plan-Execute paper notes that this separation produces cost-efficiency and reasoning-quality advantages. Because Implementor scope is narrower than the Coordinator's, teams can route Implementor work to cheaper models while reserving the strongest model for the Coordinator.
Preventing the failure modes documented earlier requires three isolation layers: input isolation (each Implementor runs in its own context window, populated only with what the Coordinator assembled), output isolation (structured contracts ensure only declared variables pass forward), and filesystem isolation (git worktrees prevent concurrent writes). Without the worktree layer, input and output isolation still leave the silent-overwrite failure mode open. Anthropic's Claude Agent SDK work states the rationale: subagents enable parallelization and help manage context because they use isolated context windows and only send relevant information back to the orchestrator.
Intent runs each Implementor in an isolated git worktree so Specialist Agents execute in parallel waves without filesystem conflicts. Simpler multi-agent systems have Implementors consume the plan and produce code; Intent introduces a bidirectional relationship where Implementors read from and write back to the spec as work completes, keeping plan and codebase synchronized. Teams new to this model can start with the Bring Your Own Agent guide.
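The worktree layer amounts to giving each Implementor its own checkout on its own branch. The sketch below shows what a Coordinator might derive per subtask; the paths, branch naming scheme, and subtask IDs are assumptions for illustration, though the `git worktree add -b <branch> <path>` command itself is standard git.

```python
# Hypothetical per-Implementor isolation plan: one worktree and branch per
# subtask, so parallel writes never touch the same checkout.

def worktree_plan(repo_root, subtask_ids):
    """Return the (path, branch, command) each Implementor would use."""
    plan = []
    for sid in subtask_ids:
        path = f"{repo_root}/.worktrees/{sid}"     # assumed layout
        branch = f"agent/{sid}"                    # assumed naming scheme
        cmd = ["git", "worktree", "add", "-b", branch, path]
        plan.append({"subtask": sid, "path": path, "branch": branch, "cmd": cmd})
    return plan

plan = worktree_plan("/repo", ["auth-api", "auth-ui"])
```

Merging the branches back then happens through the Verifier gate rather than through ad hoc concurrent writes.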
How the Verifier Closes the Loop
The Verifier validates Implementor output against the original specification, producing pass/fail plus structured feedback that feeds back into the Coordinator's retry context. An SE survey on LLM verification characterizes effective verification as "layered, hybrid pipelines that integrate LLM-based reasoning with static analysis, dynamic testing, and formal methods... no single technique is sufficient in isolation." VeriMAP follows suit: each executor node can be equipped with verification functions, and a subtask is complete only when all pass. Anthropic's research on effective agents notes code is particularly amenable to this because solutions are verifiable through tests and output quality can be measured objectively.
Feedback quality determines retry-loop effectiveness. SpecLoop's formal equivalence approach feeds counterexamples back to refine specifications, showing that structured error feedback substantially outperforms binary pass/fail. Praetorian's deterministic AI orchestration post describes a related blocked-exit return mechanism. Verifier accuracy has real failure modes: false positives block correct work and waste retry budget; false negatives ship bugs. Teams should log Verifier decisions, sample them for human review early, and order verification stages so deterministic checks (linters, type checkers, test runners) run before expensive LLM semantic review.
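The stage-ordering advice above reduces to a short-circuiting pipeline: deterministic checks run first and the first failure returns structured diagnostics for the retry context. The stages below are toy lambdas standing in for real lint/type/test/LLM stages; the dict shape of the result is an assumption.

```python
# Hypothetical layered Verifier: cheap deterministic stages before the
# expensive LLM semantic review, stopping at the first failure.

def run_verifier(output, stages):
    """Run stages in order; return pass/fail plus structured feedback."""
    for name, check in stages:
        ok, diagnostics = check(output)
        if not ok:
            return {"pass": False, "failed_stage": name, "feedback": diagnostics}
    return {"pass": True, "failed_stage": None, "feedback": None}

# Toy stages standing in for linting, type checking, and LLM review.
stages = [
    ("lint",       lambda o: (o.strip() == o, "leading/trailing whitespace")),
    ("types",      lambda o: ("None" not in o, "possible None return")),
    ("llm-review", lambda o: (True, None)),   # expensive stage runs last
]

result = run_verifier("def parse(ts): return None", stages)
```

In this toy run, the type stage fails before any tokens are spent on semantic review, and its diagnostic string is exactly what the Coordinator would feed into the retry context.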
| Verifier Component | Mechanism | Source |
|---|---|---|
| Spec compliance (formal) | Formal equivalence checking | arXiv 2603.02895 |
| Automated testing | Dedicated test executor agent | arXiv 2312.13010 (AgentCoder) |
| Verification function assignment | Planner generates verification module per subtask; all must pass | arXiv 2510.17109 |
| Feedback format | Structured diagnostics (compiler errors, counterexamples) | arXiv 2603.02895 |
| Retry limit | Default 3 attempts per subtask; 5-iteration replanning cap | arXiv 2510.17109 |
Intent's Verifier checks results against the spec before developer review, routing failed subtasks back through the Coordinator with structured feedback so reviewers see triaged work instead of a raw failed diff.
Explore how Intent's Coordinator, Implementors, and Verifier share a living spec that prevents stale-context failures across parallel agents.
Free tier available · VS Code extension · Takes 2 minutes

Industry Convergence and Where Implementations Diverge
Several industry actors have moved toward three-role decompositions under the same structural pressure. The more interesting question is where implementations disagree, since divergences reveal which CIV properties are fundamental.
Addy Osmani's self-improving agents describes Planners, Workers, and a Judge, and his orchestrator guidance warns against jumping to hierarchical systems when a single agent suffices. Composio's agent orchestrator plans tasks, spawns agents, and "autonomously handles CI fixes, merge conflicts, and code reviews" via a review-and-merge gate. GitHub's Spec Kit formalizes the workflow as Constitution → Specify → (Clarify) → Plan → Tasks → Implement (see also Martin Fowler's analysis). Anthropic's multi-agent research system uses an orchestrator-worker pattern with specialized subagents, and Claude Code best practices add a Plan Mode for safe complex changes.
Role mapping is consistent, but operational layers diverge in three meaningful ways:
- Runtime vs. human Verifier: VeriMAP and Intent run the Verifier as an automated gate per subtask. Spec Kit has no runtime Verifier; verification is human PR review. Spec Kit assumes a reviewer is always present; CIV does not.
- Static vs. living spec: Spec Kit artifacts are written once and consumed. Intent's living spec is bidirectional, with Implementors writing back. Osmani's model treats the codebase itself as the spec source, which collapses if the codebase drifts during long-running tasks.
- Isolation primitive: Osmani relies on per-agent context only. Composio uses Docker sandboxes. Anthropic and Intent use git worktrees for filesystem isolation on top of context isolation. Context isolation alone leaves file overwrites open; worktrees close them.
| CIV Role | Osmani | Composio | GitHub Spec Kit | Anthropic |
|---|---|---|---|---|
| Coordinator | Planner | Orchestrator | Specify + Plan | Lead Agent |
| Implementor | Worker | Spawned subagents | Implement | Subagent/Worker |
| Verifier | Judge | CI Review + Merge | PR Review (human) | Evaluator |
| Isolation | Per-agent context | Docker sandbox | Version-controlled artifacts | Git worktrees |
| Spec mechanism | Codebase reading | Design spec | spec.md + plan.md | Plan Mode output |
VeriMAP (EACL 2026) adds a fifth research-side data point with a Subtask Coordinator, Executor, and per-node Verifier.
When CIV Is the Wrong Pattern
CIV carries real cost and hurts more than it helps in several cases:
- Single-file or isolated changes: Coordination overhead dominates; a single well-prompted agent finishes faster.
- Exploratory tasks: DAG decomposition requires a known target. A single agent with tool access outperforms a Coordinator that cannot plan what it does not yet understand.
- Cost-sensitive environments: Token budgets can run 5-15x a single-agent baseline depending on replanning frequency.
- Weak Coordinator model: System output quality is capped by Coordinator planning quality. A frontier model for the Coordinator with cheaper Implementors is defensible; cheap models throughout produce poor DAGs.
- Verifier false-positive dominance: Uncalibrated verification blocks correct work and burns retry budget, which is easy to mistake for "the pattern does not work."
The Coordinator is also a single point of failure: a bad decomposition can only be corrected by replanning. Teams should instrument Coordinator output (DAG quality, verification pass rates, token spend per subtask) before scaling.
Intent as the Reference CIV Implementation
Intent is the production implementation: a living specification coordinates multiple agents, each in an isolated git worktree, with the Coordinator, Implementor, and Verifier roles mapped directly onto CIV. It fits cross-service refactors touching 3+ repositories, feature work with clear decomposition, migrations where spec drift is a real risk, and codebases large enough that context management is the binding constraint.
Intent's living spec addresses specification drift: Implementors read from and write to the spec, so the coordination artifact stays synchronized with actual work and updates propagate to all active agents. Every agent shares the same Context Engine, which processes 400,000+ files, so the Coordinator's decomposition, Implementors' execution, and the Verifier's validation all reason about the same architecture. Based on Augment Code's internal testing, this produces 5-10x faster completion and 40% fewer hallucinations on large-repo edits compared to tools that process files in isolation.
Intent has two mandatory human checkpoints: Checkpoint 1 (human reviews the Coordinator's spec, DAG, and parallelism plan before any code is written) and Checkpoint 2 (human reviews Verifier output and approves commit, PR, and merge, seeing passed subtasks and retry history rather than a raw diff), plus an optional manual spec-edit step between them. Amelia Wattenberger's interview with Refactoring describes the workspace primitive: an isolated environment bundling a copy of the codebase with its own branch and git worktree, a set of agents, notes, terminals, and a spec.
CIV inherits from classical distributed-systems patterns (Microsoft's saga pattern, the PEVR loop) with one key divergence: classical sagas assume deterministic, predefined compensating transactions, while CIV generates compensation (retry, replan, or escalate) at runtime. That is why retry budgets are architectural concerns and why Intent surfaces retry history at Checkpoint 2 instead of silently consuming failures.
Adopt the CIV Pattern for Your Next Multi-Service Feature
A concrete starting configuration for a first adoption:
- Pick one feature with 3-6 independent subtasks across at least two services
- Cap parallel Implementors at 3-4 for the first run
- Set a retry budget of 3 attempts per subtask (VeriMAP's default); escalate to replanning on exhaustion, capped at 5 iterations
- Gate each subtask on deterministic checks first (lint, type, unit tests), then semantic review
- Instrument Coordinator DAG quality and Verifier decisions from day one
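The checklist above can be pinned down as a configuration object. The field names and schema here are invented for illustration, not Intent's actual configuration; only the default values come from the checklist itself.

```python
# Hypothetical first-adoption config mirroring the checklist; the schema
# and field names are assumptions, the values follow the text.
from dataclasses import dataclass

@dataclass
class CivRolloutConfig:
    max_parallel_implementors: int = 4        # cap at 3-4 for the first run
    retry_budget_per_subtask: int = 3         # VeriMAP's default
    replan_cap: int = 5                       # escalate to a human past this
    deterministic_gates: tuple = ("lint", "typecheck", "unit-tests")
    semantic_review_after_gates: bool = True  # LLM review runs last
    log_coordinator_dags: bool = True         # instrument from day one
    log_verifier_decisions: bool = True

config = CivRolloutConfig()
```

Writing the knobs down this way also makes the instrumentation point concrete: the two `log_*` flags are what produce the DAG-quality and Verifier-accuracy data the previous section says to collect before scaling.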
Intent's living specs keep coordinated agents aligned as implementation changes accumulate, and its isolated workspaces reduce file-collision risk during parallel execution.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Paula Hingel
Paula writes about the patterns that make AI coding agents actually work — spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.