Living specs generally produce more reliable agent output than static specs on multi-step tasks with changing requirements or dependencies. The reason is bidirectional synchronization: it keeps the specification aligned with the code as implementation evolves.
TL;DR
Static specs give agents a strong start, but they drift as code and dependencies change. A study of 600 rejected AI-generated pull requests found that alignment loss during execution caused more failures than incorrect task descriptions. Living specs are most useful on longer, higher-volatility tasks because they reduce execution-time misalignment by keeping spec and code synchronized.
Why Spec Lifecycle Matters for Agent Output
An IEEE review on detecting and managing documentation drift confirms that outdated or inaccurate documentation hinders effective development and that robust synchronization between code and documentation remains an unsolved challenge. That challenge becomes sharper when AI coding agents, not humans, are making many implementation decisions in sequence: each step can compound a small misalignment into a large one.
The industry has responded with two lifecycle models. Static specs, exemplified by Amazon's Kiro, generate requirements, design, and task documents upfront, with documented checkpoints or review steps between phases before implementation proceeds. Living specs keep the specification current as work progresses, so later agent steps inherit updated intent rather than an obsolete plan. Intent operationalizes that persistent context for multi-session agent work by treating the spec as a living artifact that updates as agents complete tasks.
This guide examines where static specs work, where they fail, and why living specs become more valuable as task duration, ambiguity, and dependency volatility increase; it also covers where living specs introduce overhead that may not be worth paying. Thoughtworks' Technology Radar places spec-driven development in the Assess ring as of Volume 33 (November 2025), noting that the workflows remain elaborate and opinionated, a caution this guide returns to after examining the evidence.
Intent keeps specs aligned with code as agents work, using living specs and multi-agent orchestration to reduce drift across multi-file tasks.
Free tier available · VS Code extension · Takes 2 minutes
What Static Specs Are: Kiro's Upfront Blueprint Model
Static specifications are formal documents created before implementation and not automatically updated when code diverges from the plan. Amazon's Kiro is a current example: it turns a natural language prompt into structured requirements (using EARS syntax), a design document, and a task list. User review gates sit between each phase. AWS describes the value in a Connect blog post as clear traceability from requirements through implementation, but the traceability is primarily one-way: spec to code, not code back to spec.
Birgitta Böckeler's tool analysis classifies Kiro as Level 1: Spec-First, where a spec is written upfront for the task at hand but may not persist as the long-term source of truth.
The model works well when its assumptions hold: requirements are understood upfront, dependencies are pinned, and the task finishes before anything changes. The first thing that breaks is usually dependency alignment. The spec says "use library X," but by the time the agent reaches step 4, a new version has shipped, a downstream service has changed its contract, or a constraint surfaced during implementation invalidates part of the design. At that point, the spec is wrong but still authoritative, so the agent keeps following it. Kiro allows developers to manually request spec updates after code changes, but does not automatically reconcile specs during execution: the developer must recognize when drift has occurred and trigger the refresh.
What Living Specs Are: Bidirectional Sync During Execution
Living specifications are structured specs that update as implementation changes, so later agent steps operate from the current state rather than the original plan. In Böckeler's framework, this aligns with Level 2: Spec-Anchored and, in some systems, may approach a spec-as-source model.
The core mechanism is bidirectional synchronization: the spec informs code, and code changes flow back to update the spec. When an agent discovers an API has changed, a database field is missing, or a constraint conflicts with an upstream service, the system reconciles the spec against what was actually built before the next step begins. Without that reverse flow, a static spec stays frozen unless a human manually updates it.
Research on trustworthy AI-augmented engineering discusses end-to-end frameworks and design principles for maintaining alignment between agent output and developer intent. Martin Fowler's team identifies the gap in work on context anchoring: code shows what was built, but not the rejected options, accepted tradeoffs, or unresolved constraints. Once that reasoning disappears, later agent steps have to infer intent from artifacts instead of reading it directly. Spec by Example proved that executable checks catch forward drift; what it could not do was update the spec when implementation forced a design change. Updating the spec automatically is what bidirectional sync adds.
In a Spec Kit discussion, developers noted that agents forced to rediscover intent from production code burn context on reverse engineering. A structured, current spec carries the same intent at higher signal density.
Intent is a production example of this approach. Its workflow maintains a living spec as agents execute tasks, routing context through a coordinator agent that plans and reconciles work, specialist agents that execute in parallel, and a verifier that checks results against spec constraints. From the developer's perspective, reconciliation surfaces as spec updates visible in the agent workspace: the developer can see what changed, review the updated intent, and intervene before the next agent step proceeds. Review happens at reconciliation points rather than only at the end. Augment Code's Context Engine processes codebases across 400,000+ files, which matters when agents must reconcile intent against multi-file implementation reality rather than a single prompt. That architectural context allows agents to reuse current intent rather than re-read stale artifacts. It also means reconciliation considers cross-service dependencies rather than only the files a single agent happened to touch.
The Spec Drift Problem: Why Static Specs Degrade Over Time
Specification drift is the growing gap between what the spec says and what the code actually does. Research and practitioner analysis point to three recurring causes.
Structural causes of drift
Training data lag creates version mismatch. A Testkube analysis illustrates the broader pattern: agents often make incorrect assumptions about libraries, infrastructure, or APIs because their training data predates the version actually running in production.
Non-determinism creates divergence even when the spec stays constant. Research on AI-augmented engineering confirms that identical prompts can yield different code. This undermines requirement-to-code consistency even when the spec is technically correct.
Path of least resistance makes manual spec maintenance unreliable. InfoQ's treatment of spec-driven development emphasizes that specification authoring is part of implementation and should be treated with the same rigor as source code. Without an enforcement mechanism, drift is the default outcome.
Why AI agents amplify drift
AI systems compound the problem because change no longer comes from one developer working linearly. In spec-driven workflows, changes flow through specifications from features, bug fixes, and refactoring simultaneously. That multiplicity turns divergence from an exception into a normal operating condition.
The strongest quantitative evidence comes from an empirical study of 33,000 agent-authored PRs across GitHub. The researchers qualitatively analyzed 600 rejected PRs (562 accessible after excluding deleted or archived cases) and built a four-level taxonomy of rejection patterns:
| Rejection category | PR count | Share | What it means |
|---|---|---|---|
| Reviewer-level abandonment | 228 | 38% | PRs closed without meaningful human engagement |
| Pull request-level issues | 188 | 31% | Duplicates, unwanted features, wrong branch |
| Code-level failures | 133 | 22% | CI/test failures (99), incorrect implementation (19), incomplete implementation (15) |
| Agentic failures | 13 | 2% | Licensing violations, misalignment with reviewer instructions |
Code-level failures are the most relevant category for spec design. The largest sub-category is CI/test failure (99 PRs), where automated builds or tests broke because of the submitted changes. Incorrect implementations (19 PRs) and incomplete implementations (15 PRs) round out the category: cases where the agent understood the task but produced wrong or partial code. Agentic failures (13 PRs) represent a different problem, where the agent's behavior diverged from reviewer expectations or project norms. Both code-level and agentic failures represent alignment loss during execution, not bad task descriptions.
This study does not prove that every living-spec workflow outperforms every static-spec workflow. It does establish a boundary condition: when an agent receives a correct task description and still produces code that fails CI, breaks tests, or implements the wrong thing, alignment loss during execution is the primary failure mode. At 22% of all rejections, code-level failures are not a marginal problem. Living specs target exactly this failure mode.
Thoughtworks' Technology Radar adds an important caution here: even within spec-driven development, teams risk reverting to traditional antipatterns like heavy upfront specification and big-bang releases. If spec reconciliation adds a 30-second wait between agent steps on a task that should take 10 minutes, the overhead is undermining the workflow it is supposed to support. The same applies if every minor code change triggers a spec review that interrupts the developer's flow. The drift problem is real, but the solution carries its own overhead.
Two Drift Scenarios That Break Agent Output
The following scenarios illustrate how spec drift manifests in practice and how each spec model responds differently.
Scenario 1: API change mid-implementation
A Testkube blog post describes AI-generated Terraform targeting AWS provider v4.x while the platform runs provider v5.x with breaking changes. Here is how that plays out step by step:
- Spec created: The requirements specify provisioning an S3 bucket with server-side encryption using the project's Terraform configuration.
- Agent starts implementation: The agent generates a `resource "aws_s3_bucket"` block with `server_side_encryption_configuration` inline, following the v4.x pattern its training data reflects.
- Divergence appears: In AWS provider v5.x, `server_side_encryption_configuration` is a separate resource (`aws_s3_bucket_server_side_encryption_configuration`), not an inline block. The generated code is syntactically valid Terraform but architecturally wrong for the provider version the project uses.
- Static spec response: The spec still says "provision S3 with encryption." Nothing in the spec reflects the provider version constraint. Syntax validation passes. The error surfaces at `terraform plan` or deployment, after the agent has already built dependent resources on top of the wrong structure.
- Living spec response: When the agent reads the project's `.terraform.lock.hcl` and provider configuration, it detects the v5.x provider. The system updates the spec to note the provider version constraint before the next step. Subsequent resources reference the correct separate-resource pattern instead of compounding the v4.x assumption across the configuration.
The benefit is bounded: it is strongest when dependency checks happen during implementation rather than only at CI or deploy time. On a multi-resource Terraform configuration, catching the mismatch at step 3 instead of step 10 avoids cascading rework.
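In Terraform terms, the two patterns look like this (a minimal sketch shown side by side; bucket names are placeholders, and only one form is valid in a given configuration):

```hcl
# Inline pattern the agent's training data reflects
# (removed in AWS provider v5.x):
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts" # placeholder name

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }
}

# Separate-resource pattern the v5.x provider requires:
resource "aws_s3_bucket" "artifacts" {
  bucket = "example-artifacts"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "artifacts" {
  bucket = aws_s3_bucket.artifacts.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```

Dependent resources that reference the bucket compound whichever pattern the agent chose first, which is why catching the mismatch early matters.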
Scenario 2: Wrong implementation despite a correct task
The empirical study of failed agentic PRs documents cases where the task description is accurate, yet the implementation still solves the wrong problem. This pattern looks like:
- Task assigned: "Add rate limiting to the /api/orders endpoint."
- Agent implements: The agent adds rate limiting at the API gateway level, applying it globally to all routes.
- Divergence: The project's architecture applies rate limiting per-service, not at the gateway. The existing rate-limiting pattern lives in middleware, and the gateway is intentionally thin. The agent's implementation is functionally correct (it rate-limits orders) but architecturally wrong (it changes the gateway contract for every service).
- Static spec response: The spec says "add rate limiting to /api/orders." That is exactly what the agent did. The architectural mismatch only surfaces during code review or when another service's traffic patterns change unexpectedly.
- Living spec response: If the system reconciles the spec against existing middleware patterns and gateway configuration, it updates the spec to specify per-service middleware rate limiting before the agent writes code. The resolved architectural decision carries forward to subsequent tasks involving other endpoints.
Intent's workflow addresses this directly: the spec persists as shared state across tasks, so resolved decisions carry forward instead of being lost between sessions. For teams comparing this to vibe coding, the key difference is whether intent survives the next handoff.
| Drift scenario | Root cause | Static spec: when is it caught? | Living spec: when is it caught? |
|---|---|---|---|
| API version mismatch | Dependency reality changed | At terraform plan, deploy, or code review (after dependent resources are built) | During implementation, before dependent steps compound the error |
| Correct task, wrong architecture | Alignment lost during execution | At code review or production incident | During spec reconciliation, before the agent writes code |
| Non-deterministic agent behavior | Same prompt, different code | Not caught until manual comparison across runs | Persistent spec constrains later runs, reducing variance |
See how Intent's persistent spec context and coordinated agents handle multi-file tasks.
Free tier available · VS Code extension · Takes 2 minutes
Where Living Specs Add Cost or Risk
Living specs are not free. Continuous reconciliation introduces overhead and failure modes that teams should evaluate before adopting them.
Reconciliation can introduce errors. When an agent updates the spec based on implementation changes, it can silently drop a requirement, weaken a constraint, or misinterpret a design tradeoff. The spec mutation itself becomes a source of bugs. These errors are most likely when reconciliation crosses domain boundaries (the agent understands the code change but misreads its business implications) and least likely on mechanical changes like dependency version bumps or schema additions where the constraint is explicit. Teams using living-spec workflows need review checkpoints on spec changes, not just code changes. Intent's verifier agent checks results against spec constraints, but no automated system catches every semantic drift in the spec itself.
Overhead scales with task frequency. Each reconciliation cycle costs compute, latency, and token usage. For a three-step task on a small repo, that overhead may exceed the cost of occasional manual re-prompting. The benefit only exceeds the cost when tasks are long enough or complex enough for drift to cause rework.
Review burden shifts rather than disappearing. Static specs front-load review at gates between phases. Living specs distribute review across reconciliation points throughout execution. The total review burden may not decrease; it redistributes from a few concentrated checkpoints to ongoing monitoring of spec changes.
Small tasks get over-engineered. A feature that touches one file, uses stable dependencies, and finishes in a single session does not benefit from continuous reconciliation. Adding spec-sync overhead to throwaway prototypes or single-function changes slows iteration without reducing risk.
Vendor architecture coupling. Intent's living-spec model is tightly integrated with Augment Code's Context Engine and multi-agent orchestration. That coupling is the cost of deep codebase-aware reconciliation: the Context Engine's cross-service dependency analysis is what makes reconciliation work at scale across 400,000+ files. Teams that prioritize vendor portability over deep integration should evaluate open frameworks like GitHub Spec Kit and cc-sdd that retrofit structure onto existing agents, accepting narrower reconciliation scope in exchange for modularity.
The Tradeoff Spectrum: Three Levels of Spec Commitment
Böckeler's framework offers the most useful lens for evaluating spec-driven systems: the main tradeoff is how long the spec remains authoritative after initial creation.
| Level | Name | Tool example | Ongoing cost | Alignment strength | Best for |
|---|---|---|---|---|---|
| 1 | Spec-First | Kiro | Low: no sync overhead | Weakens as task grows | Single-session, stable-dependency tasks |
| 2 | Spec-Anchored | Intent | Medium: reconciliation cycles | Maintains through execution | Multi-session, multi-file, volatile tasks |
| 3 | Spec-as-Source | Tessl | High: spec is the primary edit surface | Strongest, but non-determinism risk grows | Long-lived features with strict compliance needs |
Most current tools still operate at Level 1 or have no native spec model at all. External frameworks such as GitHub Spec Kit and cc-sdd exist to retrofit structure onto agents that otherwise begin from prompts alone. Intent is built around the persistent spec model at Level 2 rather than treating it as a separate framework bolted on afterward. That model also extends to multi-agent workflows, where different agents must share the same updated intent instead of forking their own interpretations.
When to Use Each Model
The most practical approach for spec-driven development is not choosing one model permanently but segmenting by task type. A team might use Kiro for stable one-session features and Intent for multi-service migrations within the same quarter. The question is where the boundary falls for a given task. Four variables determine that threshold:
| Variable | Static spec likely sufficient | Living spec likely worth the overhead |
|---|---|---|
| Task duration | Single session, under 2 hours | Multi-session, spanning days or handoffs |
| Files and services touched | 1-3 files in one service | 4+ files across 2+ services |
| Dependency stability | Stable APIs, pinned versions, no active migrations | Active API changes, library upgrades, or infra shifts |
| Agent handoff count | One agent, one run | Multiple agents or resumed sessions |
A concrete example: a team is upgrading a payment service from Stripe API v2 to v3 across 8 files in 2 services. The Stripe API contract is actively changing, the work will span at least 3 sessions, and two agents will divide the frontend and backend migration. All four indicators point to the right column: multi-session, multi-service, unstable dependencies, multiple handoffs. A static spec written on day one will be wrong by day three. A living spec that updates as each migration step resolves Stripe's new field names and webhook contracts keeps the second agent from repeating the first agent's discoveries.
If a task hits two or more indicators in the right column, the risk of drift-induced rework likely exceeds the overhead of continuous reconciliation. Teams should test this by running one volatile task through a living-spec workflow and comparing the rework rate against their last equivalent task using static specs or prompt-only workflows.
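The two-or-more-indicators heuristic can be written down directly. Thresholds mirror the table above; the scoring function itself is an illustrative sketch, not a published formula:

```python
def needs_living_spec(sessions: int, files: int, services: int,
                      deps_volatile: bool, handoffs: int) -> bool:
    """True when 2+ right-column indicators from the table fire."""
    indicators = [
        sessions > 1,                   # multi-session task
        files >= 4 and services >= 2,   # 4+ files across 2+ services
        deps_volatile,                  # active API/library/infra changes
        handoffs > 1,                   # multiple agents or resumed sessions
    ]
    return sum(indicators) >= 2

# Stripe v2 -> v3 migration: 3 sessions, 8 files, 2 services,
# volatile dependency, 2 agents -> all four indicators fire.
print(needs_living_spec(3, 8, 2, True, 2))   # True

# One-file fix, single session, pinned deps -> static spec suffices.
print(needs_living_spec(1, 1, 1, False, 1))  # False
```

Teams can tune the thresholds against their own rework data, but the structure of the decision stays the same: count volatility signals, not gut feel.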
For organizations operating in that higher-volatility range, Intent provides persistent specifications, reconciliation against real code via the Context Engine, and coordinated context across specialist agents.
Get started with Intent and bring living specs and coordinated agents into your development workflow.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
GTM and Customer Champion