
How to Write Living Specs for AI Agent Development

Mar 19, 2026
Molisha Shah

Living specs are the most reliable way to guide AI agent development because implementation changes flow back into the specification and prevent spec drift.

TL;DR

AI-generated code drifts quickly when teams keep regenerating it based on stale requirements. Static specs fail because they only move information one way. Living specs reduce that gap by writing implementation decisions back into the spec, keeping requirements, constraints, and code aligned across repeated development cycles.

Engineering teams using AI coding agents quickly run into the same failure mode: the markdown says one thing, the repository says another, and the next-generation cycle amplifies the mismatch. InfoQ research describes this as the "spec gap," a drift problem where gaps in the specification widen with direct code changes and keep resurfacing because AI generation is non-deterministic. That makes static specs fragile in workflows that expect repeated regeneration, review, or multi-step handoffs.

Living specs change the direction of information flow. Instead of treating the spec as a one-time prompt, teams treat it as an evolving artifact that captures requirements, constraints, and implementation decisions. ThoughtWorks, Anthropic, and Addy Osmani all point toward the same operating principle: write enough structure for the agent to act correctly, then update that structure as reality changes.

For teams evaluating orchestration tools, some vendors now describe living specs as workflow infrastructure rather than just documentation. Augment Code's Intent product, for example, uses repository context from its Context Engine to draft and coordinate spec-driven work; some third-party reviews describe its behavior, though much of that coverage comes from vendor-adjacent rather than fully independent sources. This guide explains what living specs are, how to structure them, where teams overspecify, and how to review agent-updated specs without creating documentation debt.

Why Static Specs Fail AI Agent Workflows

Static specs fail AI agent workflows because they only move information in one direction, which lets implementation drift compound across regeneration cycles. Teams that regenerate code from outdated requirements repeatedly see the same pattern: the specification says one thing, the implementation does another, and the next pass widens the mismatch.

InfoQ research discusses enterprise spec-driven development and references related work on specification authoring and deterministic generation.

The root cause is directional. Static specs flow one way: a developer writes requirements, an agent consumes them, and the spec remains unchanged while the codebase evolves. Living specs add a feedback loop. As the InfoQ presentation explains, writing specs and requirements down helps align both agents and humans in AI-native development workflows.

This feedback loop changes the role of the spec. A Spec Kit discussion captures the idea clearly: teams routinely version the source code generated by agents but neglect version control for the specs that produced it, thereby inverting the dependency relationship that matters most. In living spec-driven workflows, specifications become the source of truth, and implementation becomes the compiled output.

Not every change needs to originate in the spec for every tool, but teams that routinely regenerate code from stale specs should expect drift to recur unless implementation decisions are written back.

Intent's living specifications are designed around this exact problem, keeping specs connected to the repository as the codebase changes rather than treating them as one-time inputs.

See how Intent handles spec synchronisation across large repositories.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

The Four Phases of Bidirectional Spec Updates

Bidirectional spec updates require clear separation between initial intent and implementation, followed by review and refinement to keep both aligned. ThoughtWorks guidance documents this split between design and implementation with a human always in the loop.

The four phases below illustrate how multi-agent code-generation workflows are structured when specs are treated as coordination infrastructure rather than as prompts.

| Phase | Direction | What Happens |
|--------------------------|----------------------|--------------|
| 1. Initial Intent | Developer → Spec | Developer writes high-level requirements; AI expands them into structured specs |
| 2. Implementation | Spec → Agent → Code | Agents read the finalized spec and generate code, tests, and documentation |
| 3. Bidirectional Update | Implementation → Spec | Agents or developers update the spec to reflect what was actually built |
| 4. Continuous Refinement | Production → Spec | Metrics, incidents, and operational learnings feed back into the spec |

Phase 3 is the defining characteristic. Without bidirectional updates, specifications are just elaborate prompts. With bidirectional updates, specifications become a coordination infrastructure that more reliably reflects the current system state over time.

A practical warning from InfoQ: adopting spec-driven workflows without changing how product, architecture, engineering, and QA stakeholders collaborate risks creating layers of outdated documentation that no one maintains.

Anatomy of a Living Spec: Seven Essential Sections

A useful living spec gives agents enough context to implement correctly without turning the document into a brittle script. Based on the AGENTS.md standard, the Osmani guide, and the O'Reilly guide, several sections recur in effective agent-facing specs.

1. Agent role and project overview

The agent role and project overview reduce ambiguity by defining priorities, domain, and stack before implementation begins. That framing helps the agent make tradeoffs that match the repository rather than generic defaults.

text
## Agent Role
You are an implementor for a Node.js REST API serving financial transaction data.
Priority order: correctness > security > performance > code elegance.
## Project Overview
Mission: Real-time transaction processing API serving 10k+ merchants.
Stack: Node 20, TypeScript 5.3, PostgreSQL 15, Redis 7, Docker.

2. Key commands

Key commands improve execution reliability because agents repeatedly need exact build, test, lint, and migration syntax. Full commands reduce guesswork and cut avoidable tool errors.

text
## Commands
- Run tests: `npm test`
- Run single test: `npm test -- --grep "auth"`
- Build: `npm run build`
- Lint: `npm run lint`
- Database migrations: `npm run migrate:latest`
- Type check: `npx tsc --noEmit`

3. Architecture and critical files

Architecture and critical files improve navigation because file:line references point agents to the real control points in the codebase. That reduces exploration overhead and lowers the chance of editing the wrong layer.

text
## Critical Files
| What | Where |
|-------------------|------------------------------|
| App entry point | `src/index.ts` |
| Route definitions | `src/routes/index.ts:15` |
| Auth middleware | `src/middleware/auth.ts:42` |
| DB connection | `src/database/connection.ts` |

4. Code style via examples

Code-style examples improve consistency because a single working snippet shows patterns, error handling, and logging conventions more clearly than prose. That makes the generated code easier to review and merge.

text
// TypeScript 5.x
// Behavior: returns { error: "User not found" } and logs structured context on failure.
const result = await fetchUser(id)
if (result.error) {
  logger.error("Failed to fetch user", { id, error: result.error })
  return { error: "User not found" }
}
return { data: result.data }

5. Three-tier boundaries

Three-tier boundaries control agent autonomy by separating default-safe actions from changes that require approval or are prohibited outright. That prevents accidental edits to high-risk areas.

text
## Boundaries
### ✅ Always
- Write tests before implementation
- Use TypeScript strict mode
- Log errors with structured fields (never plain strings)
### ⚠️ Ask First
- Adding new dependencies
- Changing database schema
- Modifying authentication logic
### 🚫 Never
- Commit credentials or API keys
- Modify `.env.production`
- Push directly to `main`
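Such boundaries can also be enforced mechanically. The sketch below (the tier names mirror the example above; the file patterns and the `classifyEdit` helper are hypothetical illustrations, not part of any specific tool) shows how an agent harness might pre-classify a proposed file edit:

```typescript
// Hypothetical pre-flight check mapping a proposed file edit to a boundary tier.
// The patterns loosely mirror the example boundaries above; a real harness
// would load them from the spec rather than hard-code them.
type Tier = "always" | "ask-first" | "never";

const NEVER_PATTERNS = [/\.env\.production$/, /credentials|api[_-]?key/i];
const ASK_FIRST_PATTERNS = [/^package\.json$/, /migrations\//, /middleware\/auth/];

function classifyEdit(filePath: string): Tier {
  if (NEVER_PATTERNS.some((p) => p.test(filePath))) return "never";
  if (ASK_FIRST_PATTERNS.some((p) => p.test(filePath))) return "ask-first";
  return "always";
}
```

A harness could then refuse "never" edits outright and pause for human approval on "ask-first", so the tier list stays enforceable rather than advisory.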

6. Implementation status

Implementation status tracking improves coordination because the spec shows what is complete, in progress, or blocked. That visibility helps reviewers and parallel agents work from the same state.

text
├─ ✓ Hero Section (completed)
├─ ✓ Feature Sections (completed)
├─ ◐ Redesign Hero (in progress)
├─ ◐ Mobile View (in progress)
└─ ○ Animations (not started)
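That shared state is also easy to surface programmatically. As a minimal sketch (the marker glyphs are the ones used above; the parser itself is hypothetical), a coordinator could tally the tree before assigning new work:

```typescript
// Tallies the status-tree markers used above:
// ✓ = completed, ◐ = in progress, ○ = not started.
interface StatusSummary {
  completed: number;
  inProgress: number;
  notStarted: number;
}

function summarizeStatus(tree: string): StatusSummary {
  const summary: StatusSummary = { completed: 0, inProgress: 0, notStarted: 0 };
  for (const line of tree.split("\n")) {
    if (line.includes("✓")) summary.completed++;
    else if (line.includes("◐")) summary.inProgress++;
    else if (line.includes("○")) summary.notStarted++;
  }
  return summary;
}
```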

7. Decision log

Decision logs prevent repeated debate because architectural choices and their reasons stay attached to the spec. Future agents and reviewers can then preserve intent rather than rediscover it.

text
## Decision Log
- 2026-03-15: Using RS256 for token signing (security audit requirement)
- 2026-03-16: Repository pattern for data access (consistency with existing services)

Writing Requirements at the Right Granularity

Requirement granularity determines whether a living spec guides implementation or overwhelms it. If requirements are too vague, agents fill gaps with assumptions. If requirements are overly detailed, agents may ignore the specification or follow it too literally, introducing unnecessary complexity.

The over-specification problem

Over-specification creates unstable agent behavior because detailed instructions can be either ignored or followed too literally, and both outcomes degrade implementation quality. That is why teams need enough structure to constrain risk without dictating every step.

Birgitta Böckeler's hands-on research, published on Martin Fowler's site, examines spec-driven development workflows and concludes that generating exhaustive acceptance criteria for small tasks creates more overhead than it provides in accuracy. Kent Beck's critique, also on Fowler, identifies the philosophical flaw: heavy upfront specification assumes nothing will be learned during implementation, which rarely holds in practice.

The practical implication is straightforward: write enough spec to orient the agent and establish constraints, then iterate. Treat the spec as a hypothesis about what needs to be built, not a complete blueprint.

Declarative outcomes beat imperative instructions

Declarative requirements produce better agent behavior because they describe the desired outcome and constraints instead of prescribing every implementation step. That gives the agent room to apply existing patterns while preserving reviewable success criteria.

The contrast between declarative and imperative approaches is illustrated by this TDS article:

| Approach | Example |
|----------|---------|
| Imperative (over-specified) | "Import numpy. Define a function called cosine_distance. Convert inputs to numpy arrays. Calculate the dot product. Calculate norms. Return 1 minus the quotient." |
| Declarative (outcome-focused) | "Write a short and fast function in Python to compute the cosine distance between two input vectors." |

Osmani emphasises guiding agents with clear problem definitions and success criteria rather than expecting them to work unattended.
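Given the declarative prompt in the table above, an agent might plausibly produce something like the following (rendered here in TypeScript to match this article's other examples; this is one possible output, not a canonical one). The point is that a reviewer verifies the outcome, correctness and speed, not the individual steps:

```typescript
// Computes cosine distance (1 - cosine similarity) between two vectors.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```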

Before and after: a complete requirement rewrite

A good requirement rewrite improves implementation accuracy by separating context, constraints, output, and success criteria into fields that agents can act on and reviewers can verify.

Before (under-specified, mixed concerns):

Add user authentication so new users can reach protected endpoints. Use JWT. It should be secure. Make login fast.

This mixes functional requirements, technical mandates, unmeasurable performance goals, and subjective quality language in a single request.

After (properly structured living spec):

text
## User Authentication
Context: New users cannot access protected endpoints. We need JWT-based auth.
Constraints:
- Use only standard libraries; no external auth services
- Tokens expire in 15 minutes
- Refresh tokens valid for 7 days
- Must work with existing user DB schema
Output Specification:
- POST /auth/login endpoint returning JWT
- Middleware function for route protection
- Unit tests covering happy path + 3 error cases
- Return format: JSON with {token, refreshToken, expiresIn}
Success Criteria:
- All tests pass
- No breaking changes to existing user endpoints
- Response time < 100ms for token validation

Anthropic context docs articulate the governing principle: strive for the minimal set of information that fully outlines expected behavior.
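To make the "standard libraries only" constraint concrete, here is a sketch of what it permits (hypothetical helper names; a production implementation would follow RFC 7519 rather than this simplified format, and would not allow dots in user IDs):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;

// Signs "userId.expiry" with an HMAC, using only node:crypto.
function signToken(userId: string, secret: string, now = Date.now()): string {
  const body = `${userId}.${now + FIFTEEN_MINUTES_MS}`;
  const sig = createHmac("sha256", secret).update(body).digest("hex");
  return `${body}.${sig}`;
}

// Returns the userId if the token is authentic and unexpired, else null.
function verifyToken(token: string, secret: string, now = Date.now()): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [userId, expiry, sig] = parts;
  const expected = createHmac("sha256", secret).update(`${userId}.${expiry}`).digest("hex");
  if (sig.length !== expected.length) return null;
  if (!timingSafeEqual(Buffer.from(sig), Buffer.from(expected))) return null;
  if (now > Number(expiry)) return null;
  return userId;
}
```

Each success criterion in the spec maps to something checkable here: the 15-minute expiry is a named constant, and validation is a pure function whose latency is easy to measure.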

How Multi-Agent Coordinators Use Living Specs

Multi-agent coordinators depend on living specs because parallel agents need a shared record of current intent, task boundaries, and accepted decisions. Once multiple agents work in parallel, the spec becomes operational coordination rather than planning text. Understanding how autonomous agents transform development workflows helps clarify why this coordination layer matters.

Augment Code Intent is one vendor example of this architecture. According to the Intent page, the product is designed to draft specs, break work into tasks, and coordinate specialist agents against a shared plan. Those descriptions are product-stated behavior rather than independently established benchmarks.

A coordinator in this model typically handles four functions:

  • Context analysis: Inspect relevant repository structure and dependencies before task assignment
  • Specification drafting: Create or refine the working spec from developer intent
  • Task decomposition: Break the spec into executable units with handoff points
  • Delegation management: Assign work while accounting for inter-task dependencies

Dependency-aware decomposition matters because shared type or schema changes can create avoidable race conditions and merge conflicts if tasks are split without dependency order. One external review from AwesomeAgents.ai described this behavior in a single observed Intent workflow, but broader independent validation remains limited.

According to Intent docs, the product includes six specialist agent personas, each designed for specific types of work:

| Agent | Responsibility |
|-------------|----------------|
| Investigate | Explores codebase, assesses feasibility |
| Implement | Executes implementation plans |
| Verify | Checks implementations match specifications |
| Critique | Reviews specs for feasibility |
| Debug | Analyzes and fixes issues |
| Code Review | Automated reviews with severity classification |

Augment documentation says specialists can run in isolated Git worktrees, but it does not specify a mandatory human-approval checkpoint before code generation begins. Readers should treat those points as vendor-described workflow characteristics, not independent proof of performance.

If spec coordination across parallel agents is the bottleneck, Intent's orchestration model is worth understanding.



Reviewing Agent-Updated Living Specs

Reviewing agent-updated specs requires checking both implementation accuracy and whether the spec still captures the team's real decisions. GitHub's spec-driven development guidance emphasises phases such as Specify, Plan, Tasks, and Implement, with specifications versioned alongside the repository.


Four triggers for spec review

Spec review should happen at predictable transition points because drift usually appears when code, requirements, or data models change faster than documentation. Regular triggers keep the spec close enough to implementation to remain useful.

  • After each agent implementation cycle: Review incremental changes rather than thousand-line code dumps
  • Before transitioning from spec to coding phase: Validate the spec itself before implementation begins
  • When agents surface ambiguities or edge cases: These moments indicate gaps requiring clarification
  • When data models or requirements change: Trigger spec updates immediately

What to focus on during spec review

Spec review should focus on high-risk mismatches because correctness problems usually hide in architecture, security boundaries, and undocumented decisions rather than syntax. Reviewing those areas first keeps agent-written changes maintainable over repeated regeneration cycles.

The Anthropic guide on building effective agents recommends that engineers review agent outputs and findings to confirm accuracy and refine results, with human oversight remaining an important part of the process. Three areas deserve particular scrutiny:

  • Architectural coherence: Consistency across the codebase and alignment with system design
  • Security-critical sections: Bright Security advises teams to be stricter around authentication, authorization, and state changes
  • Decision log entries: Verify that the architectural choices recorded in the spec actually reflect team intent

Why version-controlled specs reduce drift

Version-controlled specs reduce drift because the same review, diff, and history tools used for code also expose requirement changes over time. That creates institutional memory for both humans and agents.

In Osmani's workflow, commit the spec file to the repo so the agent can use git diff or git blame to understand changes across sessions.

When specs are stored in version control, agents retain memory across sessions. The relationship between context engine and context windows is worth understanding here: Augment Code says its Context Engine analyzes codebases across 400,000+ files through semantic dependency graph analysis, but that vendor-provided boundary should not be treated as general evidence for all teams or repositories.

Eight Antipatterns That Derail Agent Workflows

Agent workflows break when the specification either omits critical constraints or tries to control every implementation detail. Osmani's research frames the core principle: a spec for an AI agent is something teams iterate on, planning, verifying, and refining rather than treating as a one-and-done artifact.

| Antipattern | What Goes Wrong | Fix |
|-------------|-----------------|-----|
| Under-specification | Agents fill gaps with assumptions; no opportunity to ask for clarification in automated workflows | Use structured acceptance criteria with testable requirements |
| Over-specification | Agents may ignore detailed specs or follow them too literally, creating duplicates or unrequested features | Specify outcomes and constraints, not implementation steps |
| Mixed functional/technical concerns | Agents cannot distinguish must-have constraints from suggestions without explicit prioritisation | Use separate sections: functional requirements, technical stack, performance constraints, boundaries |
| Missing context continuity | Agents repeat previously corrected mistakes when conventions are not preserved | Maintain an AGENTS.md file and a project notes file for recurring errors |
| Vague success criteria | Agents have no clear stopping rule, so iteration becomes arbitrary | Use quantifiable, testable criteria such as response time or test coverage requirements |
| Jumping to solutions | Agents implement the described solution rather than the actual problem | Follow Specify → Plan → Tasks → Implement |
| Environmental context blindness | Code works locally but ignores runtime, deployment, or secrets boundaries | Include deployment context, secrets boundaries, and infrastructure constraints |
| Token-insensitive specs | Long, unfocused context can degrade performance and review quality as task complexity grows | Provide targeted context relevant to the specific task |

The over-specification paradox deserves special attention. Böckeler's research, published on Fowler's site, documented agent failure modes in supervised coding sessions, including misdiagnosis of problems, brute-force fixes, and misunderstood requirements. More detail does not guarantee more control. In practice, intent plus constraints produces more stable outcomes than procedures plus exhaustive detail.

Protecting critical decisions without over-specifying

Protected-decision markers preserve architectural constraints by clearly separating non-negotiable choices from implementation details that agents can adapt. That keeps critical security or compliance decisions from being rewritten accidentally.

text
<!-- BEGIN USER-SPECIFIED -->
Authentication Design Decision:
We use JWT tokens with 15-minute expiration and refresh token rotation.
DO NOT change this to session-based auth or increase token duration.
Rationale: Security audit requirement from 2026-01-15.
<!-- END USER-SPECIFIED -->
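A review gate can enforce those markers automatically. The sketch below (the marker strings match the example above; the function names and the gating logic are hypothetical) extracts protected blocks and verifies that an agent-updated spec left each one intact:

```typescript
// Extracts USER-SPECIFIED blocks so a review gate can verify that an
// agent-updated spec did not rewrite any protected decision.
const BLOCK_RE = /<!-- BEGIN USER-SPECIFIED -->([\s\S]*?)<!-- END USER-SPECIFIED -->/g;

function protectedBlocks(spec: string): string[] {
  return [...spec.matchAll(BLOCK_RE)].map((m) => m[1].trim());
}

// True when every protected block in the old spec survives verbatim.
function protectedBlocksIntact(oldSpec: string, newSpec: string): boolean {
  const after = new Set(protectedBlocks(newSpec));
  return protectedBlocks(oldSpec).every((block) => after.has(block));
}
```

Wiring a check like this into CI turns "DO NOT change this" from a polite request into a failing build.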

Living Specs Across the Tool Landscape

The spec-driven development landscape includes a range of tools and workflow styles. The table below shows how agentic and spec-driven coding approaches differ across tools, helping teams choose based on workflow shape rather than marketing labels.

| Tool | Spec Type | Best Fit |
|------|-----------|----------|
| Augment Intent | Vendor-described living specs with multi-agent orchestration | Enterprise brownfield codebases, parallel agent execution |
| AWS Kiro | Specs using EARS notation with human review gates | Formal, compliance-heavy greenfield AWS projects |
| GitHub Spec Kit | Cross-agent spec-driven development toolkit | Teams using multiple AI tools that need tool-agnostic specs |
| Cursor + .cursorrules | Static rules-based configuration | Individual developer productivity, iterative work |
| Claude Code + CLAUDE.md | Static instruction files | Well-defined tasks with active human review |
According to Augment documentation, Intent combines living specs with explicit multi-agent coordination. That matters most for teams that need orchestration across large, existing codebases rather than single-agent assistance, but the specific behavior claims come primarily from vendor materials.

ThoughtWorks describes spec-driven development as an emerging approach to AI-assisted coding workflows, and multiple sources describe AGENTS.md as an emerging open standard for cross-tool interoperability.

Version the Spec Like You Version the Code

Pick one task this week that is large enough to drift but still reviewable in a single pull request: a JWT auth change, a billing workflow fix, or a cross-service refactor. Write the spec in the repo, define 3-5 measurable success criteria, and require implementation updates to the decision log before merge. That process change usually reveals whether the team has a spec problem, a review problem, or a coordination problem.

If the work spans multiple services or parallel agents, a workflow that keeps specs synchronised with implementation matters more than any individual tool choice. Intent is built around this coordination problem.

See how Intent keeps specs and implementation in sync.



Written by

Molisha Shah

GTM and Customer Champion

