Living specs are the most reliable way to guide AI agent development because implementation changes flow back into the specification and prevent spec drift.
TL;DR
AI-generated code drifts quickly when teams keep regenerating it based on stale requirements. Static specs fail because they only move information one way. Living specs reduce that gap by writing implementation decisions back into the spec, keeping requirements, constraints, and code aligned across repeated development cycles.
Engineering teams using AI coding agents quickly run into the same failure mode: the markdown says one thing, the repository says another, and the next-generation cycle amplifies the mismatch. InfoQ research describes this as the "spec gap," a drift problem where gaps in the specification widen with direct code changes and keep resurfacing because AI generation is non-deterministic. That makes static specs fragile in workflows that expect repeated regeneration, review, or multi-step handoffs.
Living specs change the direction of information flow. Instead of treating the spec as a one-time prompt, teams treat it as an evolving artifact that captures requirements, constraints, and implementation decisions. ThoughtWorks, Anthropic, and Addy Osmani all point toward the same operating principle: write enough structure for the agent to act correctly, then update that structure as reality changes.
For teams evaluating orchestration tools, some vendors now describe living specs as workflow infrastructure rather than just documentation. Augment Code's Intent product, for example, uses repository context from its Context Engine to draft and coordinate spec-driven work, and there is some independent validation of its product behavior, though much of it comes from vendor-adjacent rather than fully independent sources. This guide explains what living specs are, how to structure them, where teams overspecify, and how to review agent-updated specs without creating documentation debt.
Why Static Specs Fail AI Agent Workflows
Static specs fail AI agent workflows because they only move information in one direction, which lets implementation drift compound across regeneration cycles. Teams that regenerate code from outdated requirements repeatedly see the same pattern: the specification says one thing, the implementation does another, and the next pass widens the mismatch.
InfoQ's research on enterprise spec-driven development, along with the related work it cites on specification authoring and deterministic generation, documents this pattern.
The root cause is directional. Static specs flow one way: a developer writes requirements, an agent consumes them, and the spec remains unchanged while the codebase evolves. Living specs add a feedback loop. As the InfoQ presentation explains, writing specs and requirements down helps align both agents and humans in AI-native development workflows.
This feedback loop changes the role of the spec. A Spec Kit discussion captures the idea clearly: teams routinely version the source code generated by agents but neglect version control for the specs that produced it, thereby inverting the dependency relationship that matters most. In living spec-driven workflows, specifications become the source of truth, and implementation becomes the compiled output.
Not every change needs to originate in the spec for every tool, but teams that routinely regenerate code from stale specs should expect drift to recur unless implementation decisions are written back.
Intent's living specifications are designed around this exact problem, keeping specs connected to the repository as the codebase changes rather than treating them as one-time inputs.
See how Intent handles spec synchronization across large repositories.
Free tier available · VS Code extension · Takes 2 minutes
The Four Phases of Bidirectional Spec Updates
Bidirectional spec updates require clear separation between initial intent and implementation, followed by review and refinement to keep both aligned. ThoughtWorks guidance documents this split between design and implementation with a human always in the loop.
The four phases below illustrate how multi-agent code-generation workflows are structured when specs are treated as coordination infrastructure rather than as prompts.
| Phase | Direction | What Happens |
|---|---|---|
| 1. Initial Intent | Developer → Spec | Developer writes high-level requirements; AI expands them into structured specs |
| 2. Implementation | Spec → Agent → Code | Agents read the finalized spec and generate code, tests, and documentation |
| 3. Bidirectional Update | Implementation → Spec | Agents or developers update the spec to reflect what was actually built |
| 4. Continuous Refinement | Production → Spec | Metrics, incidents, and operational learnings feed back into the spec |
Phase 3 is the defining characteristic. Without bidirectional updates, specifications are just elaborate prompts. With bidirectional updates, specifications become coordination infrastructure that more reliably reflects the current system state over time.
A practical warning from InfoQ: adopting spec-driven workflows without changing how product, architecture, engineering, and QA stakeholders collaborate risks creating layers of outdated documentation that no one maintains.
Anatomy of a Living Spec: Seven Essential Sections
A useful living spec gives agents enough context to implement correctly without turning the document into a brittle script. Based on the AGENTS.md standard, the Osmani guide, and the O'Reilly guide, several sections recur in effective agent-facing specs.
1. Agent role and project overview
The agent role and project overview reduce ambiguity by defining priorities, domain, and stack before implementation begins. That framing helps the agent make tradeoffs that match the repository rather than generic defaults.
2. Key commands
Key commands improve execution reliability because agents repeatedly need exact build, test, lint, and migration syntax. Full commands reduce guesswork and cut avoidable tool errors.
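As an illustration, a key-commands section of an AGENTS.md file might look like the sketch below. The commands are hypothetical placeholders; substitute your project's actual tooling:

```markdown
## Key Commands
- Build: `npm run build`
- Run a single test file: `npm test -- path/to/file.test.ts`
- Lint and autofix: `npm run lint -- --fix`
- Run database migrations: `npm run db:migrate`
```

Spelling out full invocations, including flags, is what removes the guesswork.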
3. Architecture and critical files
Architecture and critical files improve navigation because file:line references point agents to the real control points in the codebase. That reduces exploration overhead and lowers the chance of editing the wrong layer.
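A sketch of what such an architecture section might contain; the paths and line numbers here are invented stand-ins for a repository's real control points:

```markdown
## Architecture
- API routing: `src/server/routes.ts:18` (all endpoints registered here)
- Auth middleware: `src/server/auth.ts:42` (token validation; do not bypass)
- Database models: `src/db/models/` (one file per table)
```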
4. Code style via examples
Code-style examples improve consistency because a single working snippet shows patterns, error handling, and logging conventions more clearly than prose. That makes the generated code easier to review and merge.
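A code-style section can embed one short working snippet like the following. This is a hypothetical convention for illustration; the function name, store shape, and logging style are not taken from any real codebase:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_user(user_id: int, store: dict) -> dict:
    """Project convention: validate inputs, log failures with
    structured context, then raise a typed error."""
    if user_id not in store:
        # Log with context before raising, so failures are traceable.
        logger.warning("user not found", extra={"user_id": user_id})
        raise KeyError(f"user {user_id} not found")
    return store[user_id]
```

One snippet like this shows an agent the expected error-handling and logging shape more reliably than a paragraph of style rules.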
5. Three-tier boundaries
Three-tier boundaries control agent autonomy by separating default-safe actions from changes that require approval or are prohibited outright. That prevents accidental edits to high-risk areas.
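One common way to express the three tiers in a spec; the tier contents and paths below are illustrative, not a prescribed template:

```markdown
## Boundaries
- Always allowed: edit files under `src/features/`, add tests, update docs
- Ask first: schema migrations, changes to shared types, new dependencies
- Never: modify `src/auth/`, touch secrets, change CI configuration
```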
6. Implementation status
Implementation status tracking improves coordination because the spec shows what is complete, in progress, or blocked. That visibility helps reviewers and parallel agents work from the same state.
7. Decision log
Decision logs prevent repeated debate because architectural choices and their reasons stay attached to the spec. Future agents and reviewers can then preserve intent rather than rediscover it.
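A decision log entry can stay short. Something like this hypothetical record is usually enough to preserve intent:

```markdown
## Decision Log
- 2025-01-14 — Chose cursor-based pagination over offset for the feed API.
  Reason: offset scans degraded past large row counts.
  Do not revert without a load test.
```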
Writing Requirements at the Right Granularity
Requirement granularity determines whether a living spec guides implementation or overwhelms it. If requirements are too vague, agents fill gaps with assumptions. If requirements are overly detailed, agents may ignore the specification or follow it too literally, introducing unnecessary complexity.
The over-specification problem
Over-specification creates unstable agent behavior because detailed instructions can be either ignored or followed too literally, and both outcomes degrade implementation quality. That is why teams need enough structure to constrain risk without dictating every step.
Birgitta Böckeler's hands-on research, published on Martin Fowler's site, examines spec-driven development workflows and concludes that generating exhaustive acceptance criteria for small tasks creates more overhead than it provides in accuracy. Kent Beck's critique, also on Fowler, identifies the philosophical flaw: heavy upfront specification assumes nothing will be learned during implementation, which rarely holds in practice.
The practical implication is straightforward: write enough spec to orient the agent and establish constraints, then iterate. Treat the spec as a hypothesis about what needs to be built, not a complete blueprint.
Declarative outcomes beat imperative instructions
Declarative requirements produce better agent behavior because they describe the desired outcome and constraints instead of prescribing every implementation step. That gives the agent room to apply existing patterns while preserving reviewable success criteria.
The contrast between declarative and imperative approaches is illustrated by this TDS article:
| Approach | Example |
|---|---|
| Imperative (over-specified) | "Import numpy. Define a function called cosine_distance. Convert inputs to numpy arrays. Calculate the dot product. Calculate norms. Return 1 minus the quotient." |
| Declarative (outcome-focused) | "Write a short and fast function in Python to compute the cosine distance between two input vectors." |
Osmani emphasizes guiding agents with clear problem definitions and success criteria rather than expecting them to work unattended.
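For concreteness, the declarative requirement in the table above might yield an implementation like this sketch. The exact code an agent produces will vary; what matters is that reviewers can check the outcome against the requirement:

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 - (a·b) / (|a||b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

The declarative requirement left the agent free to choose this structure; the imperative version in the table would have dictated each of these steps, adding nothing a review could not already verify.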
Before and after: a complete requirement rewrite
A good requirement rewrite improves implementation accuracy by separating context, constraints, output, and success criteria into fields that agents can act on and reviewers can verify.
Before (under-specified, mixed concerns):
> Create a user dashboard that shows analytics. Use Redux for state management. It should load fast. Use Material-UI. Make it look modern.
This mixes functional requirements, technical mandates, unmeasurable performance goals, UI library choices, and subjective design language.
After (properly structured living spec):
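One way the requirement could be restructured is sketched below. The field values, paths, and numbers are illustrative, not a prescribed template:

```markdown
## Context
Internal analytics dashboard for account managers; React + TypeScript frontend.

## Functional Requirements
- Display daily active users, conversion rate, and revenue for a selected date range
- Support date ranges up to 90 days

## Technical Constraints
- State management: Redux (existing store in `src/store/`)
- UI components: Material-UI, following the existing theme

## Success Criteria
- Initial render under 2 seconds (measured via Lighthouse)
- All metrics covered by component tests
```

Each section is now separately reviewable: constraints are explicit, performance is measurable, and "modern" has been replaced by an existing theme the agent can follow.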
Anthropic context docs articulate the governing principle: strive for the minimal set of information that fully outlines expected behavior.
How Multi-Agent Coordinators Use Living Specs
Multi-agent coordinators depend on living specs because parallel agents need a shared record of current intent, task boundaries, and accepted decisions. Once multiple agents work in parallel, the spec becomes operational coordination rather than planning text. Understanding how autonomous agents transform development workflows helps clarify why this coordination layer matters.
Augment Code Intent is one vendor example of this architecture. According to the Intent page, the product is designed to draft specs, break work into tasks, and coordinate specialist agents against a shared plan. Those descriptions are product-stated behavior rather than independently established benchmarks.
A coordinator in this model typically handles four functions:
- Context analysis: Inspect relevant repository structure and dependencies before task assignment
- Specification drafting: Create or refine the working spec from developer intent
- Task decomposition: Break the spec into executable units with handoff points
- Delegation management: Assign work while accounting for inter-task dependencies
Dependency-aware decomposition matters because shared type or schema changes can create avoidable race conditions and merge conflicts if tasks are split without dependency order. One external review from AwesomeAgents.ai described this behavior in a single observed Intent workflow, but broader independent validation remains limited.
According to Intent docs, the product includes six specialist agent personas, each designed for specific types of work:
| Agent | Responsibility |
|---|---|
| Investigate | Explores codebase, assesses feasibility |
| Implement | Executes implementation plans |
| Verify | Checks implementations match specifications |
| Critique | Reviews specs for feasibility |
| Debug | Analyzes and fixes issues |
| Code Review | Automated reviews with severity classification |
Augment documentation says specialists can run in isolated Git worktrees, but it does not specify a mandatory human-approval checkpoint before code generation begins. Readers should treat those points as vendor-described workflow characteristics, not independent proof of performance.
If spec coordination across parallel agents is the bottleneck, Intent's orchestration model is worth understanding.
Free tier available · VS Code extension · Takes 2 minutes
Reviewing Agent-Updated Living Specs
Reviewing agent-updated specs requires checking both implementation accuracy and whether the spec still captures the team's real decisions. GitHub's spec-driven development guidance emphasizes phases such as Specify, Plan, Tasks, and Implement, with specifications versioned alongside the repository.
Four triggers for spec review
Spec review should happen at predictable transition points because drift usually appears when code, requirements, or data models change faster than documentation. Regular triggers keep the spec close enough to implementation to remain useful.
- After each agent implementation cycle: Review incremental changes rather than thousand-line code dumps
- Before transitioning from spec to coding phase: Validate the spec itself before implementation begins
- When agents surface ambiguities or edge cases: These moments indicate gaps requiring clarification
- When data models or requirements change: Trigger spec updates immediately
What to focus on during spec review
Spec review should focus on high-risk mismatches because correctness problems usually hide in architecture, security boundaries, and undocumented decisions rather than syntax. Reviewing those areas first keeps agent-written changes maintainable over repeated regeneration cycles.
The Anthropic guide on building effective agents recommends that engineers review agent outputs and findings to confirm accuracy and refine results, with human oversight remaining an important part of the process. Three areas deserve particular scrutiny:
- Architectural coherence: Consistency across the codebase and alignment with system design
- Security-critical sections: Bright Security advises teams to be stricter around authentication, authorization, and state changes
- Decision log entries: Verify that the architectural choices recorded in the spec actually reflect team intent
Why version-controlled specs reduce drift
Version-controlled specs reduce drift because the same review, diff, and history tools used for code also expose requirement changes over time. That creates institutional memory for both humans and agents.
In Osmani's workflow, commit the spec file to the repo so the agent can use git diff or git blame to understand changes across sessions.
When specs are stored in version control, agents retain memory across sessions. Scale claims deserve scrutiny here: Augment Code says its Context Engine analyzes codebases of 400,000+ files through semantic dependency-graph analysis, but that vendor-provided figure should not be treated as general evidence for all teams or repositories.
Eight Antipatterns That Derail Agent Workflows
Agent workflows break when the specification either omits critical constraints or tries to control every implementation detail. Osmani's research frames the core principle: a spec for an AI agent is something teams iterate on, planning, verifying, and refining rather than treating as a one-and-done artifact.
| Antipattern | What Goes Wrong | Fix |
|---|---|---|
| Under-specification | Agents fill gaps with assumptions; no opportunity to ask for clarification in automated workflows | Use structured acceptance criteria with testable requirements |
| Over-specification | Agents may ignore detailed specs or follow them too literally, creating duplicates or unrequested features | Specify outcomes and constraints, not implementation steps |
| Mixed functional/technical concerns | Agents cannot distinguish must-have constraints from suggestions without explicit prioritization | Use separate sections: functional requirements, technical stack, performance constraints, boundaries |
| Missing context continuity | Agents repeat previously corrected mistakes when conventions are not preserved | Maintain an AGENTS.md file and a project notes file for recurring errors |
| Vague success criteria | Agents have no clear stopping rule, so iteration becomes arbitrary | Use quantifiable, testable criteria such as response time or test coverage requirements |
| Jumping to solutions | Agents implement the described solution rather than the actual problem | Follow Specify → Plan → Tasks → Implement |
| Environmental context blindness | Code works locally but ignores runtime, deployment, or secrets boundaries | Include deployment context, secrets boundaries, and infrastructure constraints |
| Token-insensitive specs | Long, unfocused context can degrade performance and review quality as task complexity grows | Provide targeted context relevant to the specific task |
The over-specification paradox deserves special attention. Böckeler's supervised coding sessions, documented in the same Fowler article, surfaced agent failure modes including misdiagnosis of problems, brute-force fixes, and misunderstood requirements. More detail does not guarantee more control. In practice, intent plus constraints produces more stable outcomes than procedures plus exhaustive detail.
Protecting critical decisions without over-specifying
Protected-decision markers preserve architectural constraints by clearly separating non-negotiable choices from implementation details that agents can adapt. That keeps critical security or compliance decisions from being rewritten accidentally.
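One lightweight way to mark protected decisions inside a spec is shown below. The marker syntax is made up for illustration; teams should pick whatever convention their tooling and reviewers will actually enforce:

```markdown
<!-- PROTECTED: do not change without human approval -->
- Authentication uses JWT with RS256; token lifetime is 15 minutes.
<!-- /PROTECTED -->

- Pagination defaults to 50 items per page (agents may adjust if tests stay green)
```

The contrast is the point: one line is non-negotiable, the other is an adaptable default, and the spec says which is which.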
Living Specs Across the Tool Landscape
The spec-driven development landscape includes a range of tools and workflow styles. The table below shows how agentic and spec-driven coding approaches differ across tools, helping teams choose based on workflow shape rather than marketing labels.
| Tool | Spec Type | Best Fit |
|---|---|---|
| Augment Intent | Vendor-described living specs with multi-agent orchestration | Enterprise brownfield codebases, parallel agent execution |
| AWS Kiro | Specs using EARS notation with human review gates | Formal, compliance-heavy greenfield AWS projects |
| GitHub Spec Kit | Cross-agent spec-driven development toolkit | Teams using multiple AI tools that need tool-agnostic specs |
| Cursor + .cursorrules | Static rules-based configuration | Individual developer productivity, iterative work |
| Claude Code + CLAUDE.md | Static instruction files | Well-defined tasks with active human review |
According to Augment documentation, Intent combines living specs with explicit multi-agent coordination. That matters most for teams that need orchestration across large, existing codebases rather than single-agent assistance, but the specific behavior claims come primarily from vendor materials.
ThoughtWorks describes spec-driven development as an emerging approach to AI-assisted coding workflows, and multiple sources describe AGENTS.md as an emerging open standard for cross-tool interoperability.
Version the Spec Like You Version the Code
Pick one task this week that is large enough to drift but still reviewable in a single pull request: a JWT auth change, a billing workflow fix, or a cross-service refactor. Write the spec in the repo, define 3-5 measurable success criteria, and require implementation updates to the decision log before merge. That process change usually reveals whether the team has a spec problem, a review problem, or a coordination problem.
If the work spans multiple services or parallel agents, a workflow that keeps specs synchronized with implementation matters more than any individual tool choice. Intent is built around this coordination problem.
See how Intent keeps specs and implementation in sync.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
GTM and Customer Champion