The year was 2025, and things were simple.
You worked with an agent like a pair programmer. You prompted, it coded. You reviewed, it fixed. The feedback loop was tight, and it felt familiar—like working with a new colleague who typed really, really fast.
That model held as long as work stayed mostly linear.
Then developers started doing more.
They opened multiple instances of VS Code on the same machine to run agent sessions in parallel. They spawned sub-agents for different tasks. Instead of handing work to an agent one piece at a time, they started assembling small teams of AI collaborators and letting them run.
On paper, this looked like a straightforward productivity win. In practice, it exposed coordination problems that most development workflows weren’t built to handle.
We are running into the limits of Git
Git works well when collaboration is mostly serial. Branches diverge occasionally, merge points are infrequent, and a human can usually read a diff and understand what changed.
That model breaks down when several agents are writing code against the same system at the same time.

Multi-agent merge patterns
Merge conflicts pile up faster than anyone can reason about them. Diffs get too big to review with confidence. Commit history keeps growing, but it’s no longer a useful record of intent. Git stops being a coordination tool and starts being the thing you work around.
Teams have started experimenting with workarounds. Git worktrees isolate agents into separate directories. File-level locking prevents simultaneous edits. These approaches can help in the short term, but they are adaptations layered on infrastructure that assumes a different style of work.¹
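The worktree pattern is simple enough to script. As a rough sketch (Python, with hypothetical agent names and directory layout; git worktree add is the real Git command doing the isolation), each agent session gets its own checkout and branch:

# A rough sketch of the worktree workaround: one isolated checkout per
# agent session so parallel edits never share a working directory.
# Agent names and paths are hypothetical; `git worktree add` is the
# real Git command doing the isolation.
import subprocess
from pathlib import Path

AGENTS = ["agent-auth", "agent-billing", "agent-onboarding"]  # hypothetical task slices

def create_worktrees(repo_root: str, base_branch: str = "main") -> None:
    for agent in AGENTS:
        worktree_dir = Path(repo_root).resolve().parent / f"worktree-{agent}"
        # New branch and new directory per agent, all rooted at the same base.
        subprocess.run(
            ["git", "-C", repo_root, "worktree", "add",
             "-b", f"{agent}/work", str(worktree_dir), base_branch],
            check=True,
        )

if __name__ == "__main__":
    create_worktrees(".")

The isolation holds while the agents run. The coordination cost doesn't disappear, though; it comes back the moment those branches have to merge.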
What’s becoming clear is that we’re pushing a coordination model beyond the conditions it was designed for.
The Mythical Man-Month, at machine speed
None of this is new.
In The Mythical Man-Month, Fred Brooks explained why software productivity doesn’t scale linearly with additional people. As more contributors are added, communication overhead increases, assumptions diverge, and integration becomes the dominant cost. Adding manpower to a late software project makes it later, not faster.
AI agents don’t repeal this dynamic. They accelerate it.
When several agents are assigned to a poorly specified task, each can make progress quickly. The problem is that they’re often making progress against different interpretations of what “done” means. The resulting work may function in isolation and still fail at integration. The coordination cost Brooks described is still there—it just shows up over hours instead of months.
Parallel execution is no longer the hard part. Coordination is.
Tickets don’t describe agent work
This becomes especially visible when you look at how work is typically defined.
Issue trackers and task lists work reasonably well for humans. A ticket like “improve the onboarding flow” assumes an engineer will infer missing context, ask clarifying questions, and adjust as they learn more.
Humans read between the lines. Agents don’t.

A ticket relies on human interpretation. A spec is machine-executable.
If you don’t make the goal, constraints, and success criteria explicit, each agent will interpret the task in isolation. That’s how you end up with five implementations that technically work and still don’t fit together.
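To make the contrast concrete, here's a minimal sketch of the same work item expressed both ways. The structure and field names are illustrative assumptions, not a format any particular tool requires:

# Hypothetical illustration: a vague ticket versus an explicit task definition.
from dataclasses import dataclass, field

ticket = "Improve the onboarding flow"  # leaves goal, constraints, and "done" implicit

@dataclass
class TaskSpec:
    goal: str
    constraints: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

onboarding_task = TaskSpec(
    goal="Reduce drop-off on the signup step of the onboarding flow",
    constraints=[
        "Do not change the existing /signup API contract",
        "Reuse the shared form-validation components",
    ],
    success_criteria=[
        "Signup completes in three steps or fewer",
        "All existing onboarding integration tests still pass",
    ],
)

An agent handed the first string has to guess. An agent handed the second can be checked against it.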
You see this most clearly once parallel execution outpaces a team’s ability to reason about what’s actually being built.
Recent research on agent-authored pull requests found that vague task descriptions and missing context are among the top reasons agent PRs get rejected. Agents that don’t understand the full scope of what they’re building produce work that doesn’t integrate.²

Source: Ehsani et al., "Where Do AI Coding Agents Fail?" arXiv:2601.15195 (2026)
This isn’t a limitation of the agents so much as a mismatch between how work is described and how agents actually operate.
The spec becomes the coordination layer
Once you’re running multiple agents in parallel, the spec stops being process and starts being infrastructure. It’s the only place their work reliably comes together.
Agents don’t share intuition or negotiate ambiguity. They act on whatever has been made explicit. Without a shared reference point, each agent makes reasonable local decisions and the system drifts. You only see the problem at the end, when the pieces don’t fit and the time you saved on execution gets lost to integration.
A thin spec doesn’t help. “Build a payment integration” fails for the same reason vague tickets fail: it leaves critical decisions unstated. Specs that work make two things explicit: human judgment about what success means and which constraints matter, and system context about the patterns, dependencies, and failure modes the code needs to respect.

Human intent meets system context
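As an illustration, a spec that carries both halves might look something like the sketch below. The fields are assumptions made for the example, extending the earlier hypothetical TaskSpec rather than describing any tool's actual format:

# Hypothetical sketch of a spec that pairs human judgment with system context.
from dataclasses import dataclass, field

@dataclass
class SpecDocument:
    # Human judgment: what success means and which constraints matter.
    goal: str
    constraints: list[str]
    success_criteria: list[str]
    # System context: patterns, dependencies, and failure modes to respect.
    existing_patterns: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    known_failure_modes: list[str] = field(default_factory=list)

payments_spec = SpecDocument(
    goal="Add invoice payments through the existing billing service",
    constraints=["No new third-party payment SDKs"],
    success_criteria=["Failed charges are retried at most twice, then surfaced to support"],
    existing_patterns=["All outbound HTTP goes through the shared client wrapper"],
    dependencies=["billing-service v2 API"],
    known_failure_modes=["Payment webhooks can deliver duplicate events"],
)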
When both are present, review moves earlier and gets cheaper. You validate intent once instead of reconciling multiple implementations after the fact.
At that point, the plan becomes the product. The code is simply the result of executing it.
Why specs are foundational
As agent execution improves, the leverage point for humans shifts.
Code generation is becoming inexpensive. Every model improvement lowers the cost of implementation. What doesn’t get cheaper at the same rate is deciding what to build, defining boundaries, and recognizing edge cases before they cause damage. That work still requires judgment, and it doesn’t parallelize well.
This changes what it means to be effective. In agent-driven systems, the developers who succeed aren’t the ones who prompt most cleverly. They’re the ones who can make intent explicit and constraints clear.
Agent coordination makes this unavoidable. You can’t run multiple agents on vibes. They need a shared reference point—something more structured than chat history and more actionable than a static document. A spec can play that role, but only when it’s grounded in real system context: existing patterns, known failure modes, and the tribal knowledge teams usually carry informally.
This also exposes limits in existing tooling. Git assumes human-paced, mostly serial collaboration. Under multi-agent parallel execution, it tracks changes but not intent. Specs give teams something to version that reflects what they meant to build, not just the files that were generated.³
As output increases, review has to move up the stack. Reviewing large volumes of generated code doesn’t scale. Reviewing intent does. Catching a problem at the plan level is cheaper than reconciling it after execution.
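Some of that plan-level review can even be mechanical. A rough sketch, assuming the illustrative spec fields above: refuse to dispatch agents while the plan is missing the things they cannot infer.

# Hypothetical plan-level check: surface gaps in a spec before any agent runs.
# The required fields mirror the illustrative spec structure above.
REQUIRED_FIELDS = ("goal", "constraints", "success_criteria", "known_failure_modes")

def review_plan(spec: dict) -> list[str]:
    """Return plan-level problems to resolve before dispatching agents."""
    return [f"missing or empty: {name}" for name in REQUIRED_FIELDS if not spec.get(name)]

thin_spec = {"goal": "Build a payment integration"}  # the thin spec from above
for problem in review_plan(thin_spec):
    print(problem)
# missing or empty: constraints
# missing or empty: success_criteria
# missing or empty: known_failure_modes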
What changes when the spec is primary
Treating the spec as primary changes where coordination happens.
Disagreements surface earlier, at the plan level, rather than in conflicting implementations. Parallel work is easier to reason about because everyone is working from the same playbook.
This changes what gets reviewed and versioned. Instead of discovering misalignment in code, teams catch it while there’s still time to adjust the plan.
What Intent is designed to support

Intent coordinating parallel agents to build a product page
Intent was built around this shift.
Intent treats the spec as the system of record, not an annotation layered on top of generated code. The spec is where intent lives, where work is divided, and where correctness is evaluated. Code is produced downstream of the spec, not treated as the primary artifact.
Faster execution doesn’t help if coordination collapses. The problem to solve is keeping teams aligned as output increases.
A change in where effort lives
This shift is bigger than tooling preferences.
For a long time, developer productivity meant translating intent into code more efficiently: better editors, better languages, better frameworks. All optimized for execution.
That assumption is weakening. Implementation is becoming cheap. The constraint is no longer "can we build it?" but "do we know what we're building?"
When intent isn’t explicit, agent teams execute efficiently against the wrong goal. Speed stops being an advantage and starts amplifying mistakes.
The developers who adapt will be the ones who can make the definition of “done” explicit before any code is written, and spot failure modes at the plan level instead of in generated output.
The tools that win will be built for that reality. Specs won’t be documentation written after the fact. They’ll be the primary artifact.
At that point, the plan becomes the product.
References
¹ Nick Mitchinson, Using Git Worktrees for Multi-Feature Development with AI Agents (October 2025)
https://www.nrmitchi.com/2025/10/using-git-worktrees-for-multi-feature-development-with-ai-agents/
² Ehsani et al., Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub (January 2026)
https://arxiv.org/abs/2601.15195
³ Birgitta Böckeler, Understanding Spec-Driven Development (October 2025)
https://martinfowler.com/articles/exploring-gen-ai/sdd-3-tools.html
Written by

Sylvain Giuliani
Sylvain Giuliani is the head of growth at Augment Code, leveraging more than a decade of experience scaling developer-focused SaaS companies from $1M to $100M+ ARR. Before Augment, he built and led the go-to-market and operations engines at Census and served as CRO at Pusher, translating deep data insights into outsized revenue gains.
