
How to Write Living Specs for AI Agent Development

Mar 19, 2026
Molisha Shah

Living specs are the most reliable way to guide AI agent development because implementation changes flow back into the specification and prevent spec drift.

TL;DR

AI-generated code drifts quickly when teams keep regenerating it based on stale requirements. Static specs fail because they only move information one way. Living specs reduce that gap by writing implementation decisions back into the spec, keeping requirements, constraints, and code aligned across repeated development cycles.

Engineering teams using AI coding agents quickly run into the same failure mode: the markdown says one thing, the repository says another, and the next-generation cycle amplifies the mismatch. InfoQ research describes this as the "spec gap," a drift problem where gaps in the specification widen with direct code changes and keep resurfacing because AI generation is non-deterministic. That makes static specs fragile in workflows that expect repeated regeneration, review, or multi-step handoffs.

Living specs change the direction of information flow. Instead of treating the spec as a one-time prompt, teams treat it as an evolving artifact that captures requirements, constraints, and implementation decisions. ThoughtWorks, Anthropic, and Addy Osmani all point toward the same operating principle: write enough structure for the agent to act correctly, then update that structure as reality changes.

For teams evaluating orchestration tools, some vendors now describe living specs as workflow infrastructure rather than just documentation. Augment Code's Intent product, for example, uses repository context from its Context Engine to draft and coordinate spec-driven work; some third-party reviews describe its behavior, though much of that coverage comes from vendor-adjacent rather than fully independent sources. This guide explains what living specs are, how to structure them, where teams overspecify, and how to review agent-updated specs without creating documentation debt.

Why Static Specs Fail AI Agent Workflows

Static specs fail AI agent workflows because they only move information in one direction, which lets implementation drift compound across regeneration cycles. Teams that regenerate code from outdated requirements repeatedly see the same pattern: the specification says one thing, the implementation does another, and the next pass widens the mismatch.

InfoQ research discusses enterprise spec-driven development and references related work on specification authoring and deterministic generation.

The root cause is directional. Static specs flow one way: a developer writes requirements, an agent consumes them, and the spec remains unchanged while the codebase evolves. Living specs add a feedback loop. As the InfoQ presentation explains, writing specs and requirements down helps align both agents and humans in AI-native development workflows.

This feedback loop changes the role of the spec. A Spec Kit discussion captures the idea clearly: teams routinely version the source code generated by agents but neglect version control for the specs that produced it, thereby inverting the dependency relationship that matters most. In living spec-driven workflows, specifications become the source of truth, and implementation becomes the compiled output.

Not every change needs to originate in the spec for every tool, but teams that routinely regenerate code from stale specs should expect drift to recur unless implementation decisions are written back.

Intent's living specifications are designed around this exact problem, keeping specs connected to the repository as the codebase changes rather than treating them as one-time inputs.

See how Intent handles spec synchronisation across large repositories.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

The Four Phases of Bidirectional Spec Updates

Bidirectional spec updates require clear separation between initial intent and implementation, followed by review and refinement to keep both aligned. ThoughtWorks guidance documents this split between design and implementation with a human always in the loop.

The four phases below illustrate how multi-agent code-generation workflows are structured when specs are treated as coordination infrastructure rather than as prompts.

| Phase | Direction | What Happens |
|--------------------------|----------------------|--------------|
| 1. Initial Intent | Developer → Spec | Developer writes high-level requirements; AI expands them into structured specs |
| 2. Implementation | Spec → Agent → Code | Agents read the finalized spec and generate code, tests, and documentation |
| 3. Bidirectional Update | Implementation → Spec | Agents or developers update the spec to reflect what was actually built |
| 4. Continuous Refinement | Production → Spec | Metrics, incidents, and operational learnings feed back into the spec |

Phase 3 is the defining characteristic. Without bidirectional updates, specifications are just elaborate prompts. With bidirectional updates, specifications become a coordination infrastructure that more reliably reflects the current system state over time.

A practical warning from InfoQ: adopting spec-driven workflows without changing how product, architecture, engineering, and QA stakeholders collaborate risks creating layers of outdated documentation that no one maintains.

Anatomy of a Living Spec: Seven Essential Sections

A useful living spec gives agents enough context to implement correctly without turning the document into a brittle script. Based on the AGENTS.md standard, the Osmani guide, and the O'Reilly guide, several sections recur in effective agent-facing specs.

1. Agent role and project overview

The agent role and project overview reduce ambiguity by defining priorities, domain, and stack before implementation begins. That framing helps the agent make tradeoffs that match the repository rather than generic defaults.

text
## Agent Role
You are an implementor for a Node.js REST API serving financial transaction data.
Priority order: correctness > security > performance > code elegance.
## Project Overview
Mission: Real-time transaction processing API serving 10k+ merchants.
Stack: Node 20, TypeScript 5.3, PostgreSQL 15, Redis 7, Docker.

2. Key commands

Key commands improve execution reliability because agents repeatedly need exact build, test, lint, and migration syntax. Full commands reduce guesswork and cut avoidable tool errors.

text
## Commands
- Run tests: `npm test`
- Run single test: `npm test -- --grep "auth"`
- Build: `npm run build`
- Lint: `npm run lint`
- Database migrations: `npm run migrate:latest`
- Type check: `npx tsc --noEmit`

3. Architecture and critical files

Architecture and critical files improve navigation because file:line references point agents to the real control points in the codebase. That reduces exploration overhead and lowers the chance of editing the wrong layer.

text
## Critical Files
| What | Where |
|-------------------|------------------------------|
| App entry point | `src/index.ts` |
| Route definitions | `src/routes/index.ts:15` |
| Auth middleware | `src/middleware/auth.ts:42` |
| DB connection | `src/database/connection.ts` |

4. Code style via examples

Code-style examples improve consistency because a single working snippet shows patterns, error handling, and logging conventions more clearly than prose. That makes the generated code easier to review and merge.

text
// TypeScript 5.x
// Behavior: returns { error: "User not found" } and logs structured context on failure.
const result = await fetchUser(id)
if (result.error) {
  logger.error("Failed to fetch user", { id, error: result.error })
  return { error: "User not found" }
}
return { data: result.data }

5. Three-tier boundaries

Three-tier boundaries control agent autonomy by separating default-safe actions from changes that require approval or are prohibited outright. That prevents accidental edits to high-risk areas.

text
## Boundaries
### ✅ Always
- Write tests before implementation
- Use TypeScript strict mode
- Log errors with structured fields (never plain strings)
### ⚠️ Ask First
- Adding new dependencies
- Changing database schema
- Modifying authentication logic
### 🚫 Never
- Commit credentials or API keys
- Modify `.env.production`
- Push directly to `main`
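Such boundaries can also be enforced mechanically. The sketch below (the tier names mirror the example above; the file patterns and the `classifyEdit` helper are hypothetical illustrations, not part of any specific tool) shows how an agent harness might pre-classify a proposed file edit:

```typescript
// Hypothetical pre-flight check mapping a proposed file edit to a boundary tier.
// The patterns loosely mirror the example boundaries above; a real harness
// would load them from the spec rather than hard-code them.
type Tier = "always" | "ask-first" | "never";

const NEVER_PATTERNS = [/\.env\.production$/, /credentials|api[_-]?key/i];
const ASK_FIRST_PATTERNS = [/^package\.json$/, /migrations\//, /middleware\/auth/];

function classifyEdit(filePath: string): Tier {
  if (NEVER_PATTERNS.some((p) => p.test(filePath))) return "never";
  if (ASK_FIRST_PATTERNS.some((p) => p.test(filePath))) return "ask-first";
  return "always";
}
```

A harness could then refuse "never" edits outright and pause for human approval on "ask-first", so the tier list stays enforceable rather than advisory.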

6. Implementation status

Implementation status tracking improves coordination because the spec shows what is complete, in progress, or blocked. That visibility helps reviewers and parallel agents work from the same state.

text
├─ ✓ Hero Section (completed)
├─ ✓ Feature Sections (completed)
├─ ◐ Redesign Hero (in progress)
├─ ◐ Mobile View (in progress)
└─ ○ Animations (not started)
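That shared state is also easy to surface programmatically. As a minimal sketch (the marker glyphs are the ones used above; the parser itself is hypothetical), a coordinator could tally the tree before assigning new work:

```typescript
// Tallies the status-tree markers used above:
// ✓ = completed, ◐ = in progress, ○ = not started.
interface StatusSummary {
  completed: number;
  inProgress: number;
  notStarted: number;
}

function summarizeStatus(tree: string): StatusSummary {
  const summary: StatusSummary = { completed: 0, inProgress: 0, notStarted: 0 };
  for (const line of tree.split("\n")) {
    if (line.includes("✓")) summary.completed++;
    else if (line.includes("◐")) summary.inProgress++;
    else if (line.includes("○")) summary.notStarted++;
  }
  return summary;
}
```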

7. Decision log

Decision logs prevent repeated debate because architectural choices and their reasons stay attached to the spec. Future agents and reviewers can then preserve intent rather than rediscover it.

text
## Decision Log
- 2026-03-15: Using RS256 for token signing (security audit requirement)
- 2026-03-16: Repository pattern for data access (consistency with existing services)

Writing Requirements at the Right Granularity

Requirement granularity determines whether a living spec guides implementation or overwhelms it. If requirements are too vague, agents fill gaps with assumptions. If requirements are overly detailed, agents may ignore the specification or follow it too literally, introducing unnecessary complexity.

The over-specification problem

Over-specification creates unstable agent behavior because detailed instructions can be either ignored or followed too literally, and both outcomes degrade implementation quality. That is why teams need enough structure to constrain risk without dictating every step.

Birgitta Böckeler's hands-on research, published on Martin Fowler's site, examines spec-driven development workflows and concludes that generating exhaustive acceptance criteria for small tasks creates more overhead than it provides in accuracy. Kent Beck's critique, also on Fowler, identifies the philosophical flaw: heavy upfront specification assumes nothing will be learned during implementation, which rarely holds in practice.

The practical implication is straightforward: write enough spec to orient the agent and establish constraints, then iterate. Treat the spec as a hypothesis about what needs to be built, not a complete blueprint.

Declarative outcomes beat imperative instructions

Declarative requirements produce better agent behavior because they describe the desired outcome and constraints instead of prescribing every implementation step. That gives the agent room to apply existing patterns while preserving reviewable success criteria.

The contrast between declarative and imperative approaches is illustrated by this TDS article:

| Approach | Example |
|----------|---------|
| Imperative (over-specified) | "Import numpy. Define a function called cosine_distance. Convert inputs to numpy arrays. Calculate the dot product. Calculate norms. Return 1 minus the quotient." |
| Declarative (outcome-focused) | "Write a short and fast function in Python to compute the cosine distance between two input vectors." |

Osmani emphasises guiding agents with clear problem definitions and success criteria rather than expecting them to work unattended.
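Given the declarative prompt in the table above, an agent might plausibly produce something like the following (rendered here in TypeScript to match this article's other examples; this is one possible output, not a canonical one). The point is that a reviewer verifies the outcome, correctness and speed, not the individual steps:

```typescript
// Computes cosine distance (1 - cosine similarity) between two vectors.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```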

Before and after: a complete requirement rewrite

A good requirement rewrite improves implementation accuracy by separating context, constraints, output, and success criteria into fields that agents can act on and reviewers can verify.

Before (under-specified, mixed concerns):

Add user authentication so new users can reach protected endpoints. Use JWT. It should be secure. Make login fast.

This mixes functional requirements, technical mandates, unmeasurable performance goals, and subjective quality language in a single request.

After (properly structured living spec):

text
## User Authentication
Context: New users cannot access protected endpoints. We need JWT-based auth.
Constraints:
- Use only standard libraries; no external auth services
- Tokens expire in 15 minutes
- Refresh tokens valid for 7 days
- Must work with existing user DB schema
Output Specification:
- POST /auth/login endpoint returning JWT
- Middleware function for route protection
- Unit tests covering happy path + 3 error cases
- Return format: JSON with {token, refreshToken, expiresIn}
Success Criteria:
- All tests pass
- No breaking changes to existing user endpoints
- Response time < 100ms for token validation

Anthropic context docs articulate the governing principle: strive for the minimal set of information that fully outlines expected behavior.
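To make the "standard libraries only" constraint concrete, here is a sketch of what it permits (hypothetical helper names; a production implementation would follow RFC 7519 rather than this simplified format, and would not allow dots in user IDs):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;

// Signs "userId.expiry" with an HMAC, using only node:crypto.
function signToken(userId: string, secret: string, now = Date.now()): string {
  const body = `${userId}.${now + FIFTEEN_MINUTES_MS}`;
  const sig = createHmac("sha256", secret).update(body).digest("hex");
  return `${body}.${sig}`;
}

// Returns the userId if the token is authentic and unexpired, else null.
function verifyToken(token: string, secret: string, now = Date.now()): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [userId, expiry, sig] = parts;
  const expected = createHmac("sha256", secret).update(`${userId}.${expiry}`).digest("hex");
  if (sig.length !== expected.length) return null;
  if (!timingSafeEqual(Buffer.from(sig), Buffer.from(expected))) return null;
  if (now > Number(expiry)) return null;
  return userId;
}
```

Each success criterion in the spec maps to something checkable here: the 15-minute expiry is a named constant, and validation is a pure function whose latency is easy to measure.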

How Multi-Agent Coordinators Use Living Specs

Multi-agent coordinators depend on living specs because parallel agents need a shared record of current intent, task boundaries, and accepted decisions. Once multiple agents work in parallel, the spec becomes operational coordination rather than planning text. Understanding how autonomous agents transform development workflows helps clarify why this coordination layer matters.

Augment Code Intent is one vendor example of this architecture. According to the Intent page, the product is designed to draft specs, break work into tasks, and coordinate specialist agents against a shared plan. Those descriptions are product-stated behavior rather than independently established benchmarks.

A coordinator in this model typically handles four functions:

  • Context analysis: Inspect relevant repository structure and dependencies before task assignment
  • Specification drafting: Create or refine the working spec from developer intent
  • Task decomposition: Break the spec into executable units with handoff points
  • Delegation management: Assign work while accounting for inter-task dependencies

Dependency-aware decomposition matters because shared type or schema changes can create avoidable race conditions and merge conflicts if tasks are split without dependency order. One external review from AwesomeAgents.ai described this behavior in a single observed Intent workflow, but broader independent validation remains limited.

According to Intent docs, the product includes six specialist agent personas, each designed for specific types of work:

| Agent | Responsibility |
|-------------|----------------|
| Investigate | Explores codebase, assesses feasibility |
| Implement | Executes implementation plans |
| Verify | Checks implementations match specifications |
| Critique | Reviews specs for feasibility |
| Debug | Analyzes and fixes issues |
| Code Review | Automated reviews with severity classification |

Augment documentation says specialists can run in isolated Git worktrees, but it does not specify a mandatory human-approval checkpoint before code generation begins. Readers should treat those points as vendor-described workflow characteristics, not independent proof of performance.

If spec coordination across parallel agents is the bottleneck, Intent's orchestration model is worth understanding.



Reviewing Agent-Updated Living Specs

Reviewing agent-updated specs requires checking both implementation accuracy and whether the spec still captures the team's real decisions. GitHub's spec-driven development guidance emphasises phases such as Specify, Plan, Tasks, and Implement, with specifications versioned alongside the repository.


Four triggers for spec review

Spec review should happen at predictable transition points because drift usually appears when code, requirements, or data models change faster than documentation. Regular triggers keep the spec close enough to implementation to remain useful.

  • After each agent implementation cycle: Review incremental changes rather than thousand-line code dumps
  • Before transitioning from spec to coding phase: Validate the spec itself before implementation begins
  • When agents surface ambiguities or edge cases: These moments indicate gaps requiring clarification
  • When data models or requirements change: Trigger spec updates immediately

What to focus on during spec review

Spec review should focus on high-risk mismatches because correctness problems usually hide in architecture, security boundaries, and undocumented decisions rather than syntax. Reviewing those areas first keeps agent-written changes maintainable over repeated regeneration cycles.

The Anthropic guide on building effective agents recommends that engineers review agent outputs and findings to confirm accuracy and refine results, with human oversight remaining an important part of the process. Three areas deserve particular scrutiny:

  • Architectural coherence: Consistency across the codebase and alignment with system design
  • Security-critical sections: Bright Security advises teams to be stricter around authentication, authorization, and state changes
  • Decision log entries: Verify that the architectural choices recorded in the spec actually reflect team intent

Why version-controlled specs reduce drift

Version-controlled specs reduce drift because the same review, diff, and history tools used for code also expose requirement changes over time. That creates institutional memory for both humans and agents.

In Osmani's workflow, commit the spec file to the repo so the agent can use git diff or git blame to understand changes across sessions.

When specs are stored in version control, agents retain memory across sessions. The relationship between context engine and context windows is worth understanding here: Augment Code says its Context Engine analyzes codebases across 400,000+ files through semantic dependency graph analysis, but that vendor-provided boundary should not be treated as general evidence for all teams or repositories.

Eight Antipatterns That Derail Agent Workflows

Agent workflows break when the specification either omits critical constraints or tries to control every implementation detail. Osmani's research frames the core principle: a spec for an AI agent is something teams iterate on, planning, verifying, and refining rather than treating as a one-and-done artifact.

| Antipattern | What Goes Wrong | Fix |
|-------------|-----------------|-----|
| Under-specification | Agents fill gaps with assumptions; no opportunity to ask for clarification in automated workflows | Use structured acceptance criteria with testable requirements |
| Over-specification | Agents may ignore detailed specs or follow them too literally, creating duplicates or unrequested features | Specify outcomes and constraints, not implementation steps |
| Mixed functional/technical concerns | Agents cannot distinguish must-have constraints from suggestions without explicit prioritisation | Use separate sections: functional requirements, technical stack, performance constraints, boundaries |
| Missing context continuity | Agents repeat previously corrected mistakes when conventions are not preserved | Maintain an AGENTS.md file and a project notes file for recurring errors |
| Vague success criteria | Agents have no clear stopping rule, so iteration becomes arbitrary | Use quantifiable, testable criteria such as response time or test coverage requirements |
| Jumping to solutions | Agents implement the described solution rather than the actual problem | Follow Specify → Plan → Tasks → Implement |
| Environmental context blindness | Code works locally but ignores runtime, deployment, or secrets boundaries | Include deployment context, secrets boundaries, and infrastructure constraints |
| Token-insensitive specs | Long, unfocused context can degrade performance and review quality as task complexity grows | Provide targeted context relevant to the specific task |

The over-specification paradox deserves special attention. Böckeler's research, published on Fowler's site, documented agent failure modes in supervised coding sessions, including misdiagnosis of problems, brute-force fixes, and misunderstood requirements. More detail does not guarantee more control. In practice, intent plus constraints produces more stable outcomes than procedures plus exhaustive detail.

Protecting critical decisions without over-specifying

Protected-decision markers preserve architectural constraints by clearly separating non-negotiable choices from implementation details that agents can adapt. That keeps critical security or compliance decisions from being rewritten accidentally.

text
<!-- BEGIN USER-SPECIFIED -->
Authentication Design Decision:
We use JWT tokens with 15-minute expiration and refresh token rotation.
DO NOT change this to session-based auth or increase token duration.
Rationale: Security audit requirement from 2026-01-15.
<!-- END USER-SPECIFIED -->
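A review gate can enforce those markers automatically. The sketch below (the marker strings match the example above; the function names and the gating logic are hypothetical) extracts protected blocks and verifies that an agent-updated spec left each one intact:

```typescript
// Extracts USER-SPECIFIED blocks so a review gate can verify that an
// agent-updated spec did not rewrite any protected decision.
const BLOCK_RE = /<!-- BEGIN USER-SPECIFIED -->([\s\S]*?)<!-- END USER-SPECIFIED -->/g;

function protectedBlocks(spec: string): string[] {
  return [...spec.matchAll(BLOCK_RE)].map((m) => m[1].trim());
}

// True when every protected block in the old spec survives verbatim.
function protectedBlocksIntact(oldSpec: string, newSpec: string): boolean {
  const after = new Set(protectedBlocks(newSpec));
  return protectedBlocks(oldSpec).every((block) => after.has(block));
}
```

Wiring a check like this into CI turns "DO NOT change this" from a polite request into a failing build.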

Living Specs Across the Tool Landscape

The spec-driven development landscape includes a range of tools and workflow styles. The table below shows how agentic and spec-driven coding approaches differ across tools, helping teams choose based on workflow shape rather than marketing labels.

| Tool | Spec Type | Best Fit |
|------|-----------|----------|
| Augment Intent | Vendor-described living specs with multi-agent orchestration | Enterprise brownfield codebases, parallel agent execution |
| AWS Kiro | Specs using EARS notation with human review gates | Formal, compliance-heavy greenfield AWS projects |
| GitHub Spec Kit | Cross-agent spec-driven development toolkit | Teams using multiple AI tools that need tool-agnostic specs |
| Cursor + .cursorrules | Static rules-based configuration | Individual developer productivity, iterative work |
| Claude Code + CLAUDE.md | Static instruction files | Well-defined tasks with active human review |
According to Augment documentation, Intent combines living specs with explicit multi-agent coordination. That matters most for teams that need orchestration across large, existing codebases rather than single-agent assistance, but the specific behavior claims come primarily from vendor materials.

ThoughtWorks describes spec-driven development as an emerging approach to AI-assisted coding workflows, and multiple sources describe AGENTS.md as an emerging open standard for cross-tool interoperability.

Version the Spec Like You Version the Code

Pick one task this week that is large enough to drift but still reviewable in a single pull request: a JWT auth change, a billing workflow fix, or a cross-service refactor. Write the spec in the repo, define 3-5 measurable success criteria, and require implementation updates to the decision log before merge. That process change usually reveals whether the team has a spec problem, a review problem, or a coordination problem.

If the work spans multiple services or parallel agents, a workflow that keeps specs synchronised with implementation matters more than any individual tool choice. Intent is built around this coordination problem.

See how Intent keeps specs and implementation in sync.



Written by

Molisha Shah

GTM and Customer Champion

