
The Spec as Source of Truth: Why Codebases Should Be Rebuildable from Documentation

Apr 9, 2026
Ani Galstian

Treating the spec as the single source of truth for AI coding means the specification, not the codebase, is the primary artifact engineers maintain, and code becomes a derived, regenerable output that AI agents produce from that spec on demand.

TL;DR

Most teams cannot rebuild their codebase from documentation alone, and the rebuild test reveals exactly why: delete src/, open a fresh agent session, point it at the spec, and regenerate. The divergences that surface are almost never due to missing endpoints or incorrect data types. They are about implicit decisions: why a particular error code was chosen, why a specific caching strategy was used, and why one library was selected over another. Those decisions live in developer memory, not in documentation.

Every engineering team has experienced the moment: a critical service breaks, the developer who built it left six months ago, and the documentation describes a system that no longer exists. The codebase is the only record of what the software actually does.

This problem intensifies with AI development. AI coding agents are stateless between sessions. They cannot remember decisions made in prior work or infer why a particular authentication pattern was chosen. Without a persistent, complete specification, each new session starts from zero, and the resulting code drifts.

Research on spec-as-source development frames this directly: "Spec-driven development inverts the traditional workflow by treating specifications as the source of truth and code as a generated or verified secondary artifact." The question for engineering teams is not whether specifications matter; it is how complete, how machine-readable, and how rigorously enforced those specifications need to be.

This guide covers the rebuild test as a completeness benchmark, what "source of truth" means when AI agents are the primary code producers, and a practical spectrum of three rigor levels teams can adopt incrementally.

The Rebuild Test: A Spec Completeness Benchmark

The rebuild test is a concrete, binary assessment of spec quality: delete the entire src/ directory, open a clean AI agent session with no prior context, point it at the specification files, and attempt to regenerate the codebase. If the regenerated output passes the existing test suite and matches production behavior, the spec may be adequate for the behaviors covered. If it diverges, the spec has gaps.

This test is not theoretical. A bootstrapping study demonstrated the mechanism directly: "Starting from a 926-word specification and a first implementation produced by an existing agent (Claude Code), a newly generated agent re-implements the same specification correctly from scratch." The paper emphasizes the importance of specifications in guiding agent behavior and development.

What the Rebuild Test Reveals

The rebuild test surfaces a specific category of missing information: implicit decisions. Production codebases accumulate decisions that never get written down. Why does the cancellation endpoint return 200 on duplicate requests instead of 409? Why does the authentication middleware check for a specific header format? Why does the rate limiter use a sliding window instead of a fixed window?

When an AI agent encounters these ambiguities during regeneration, it makes its own decisions. Those decisions will differ from the original ones. The test suite catches some divergences, but many implicit decisions affect behavior in ways that standard tests do not cover.

| Rebuild Test Outcome | What It Means | Required Action |
| --- | --- | --- |
| All contract tests pass | Spec covers behavioral contracts | Expand to edge case coverage |
| Unit tests fail on business logic | Spec omits decision rationale | Add business rules with enforcement levels |
| Integration tests fail | Spec omits cross-service contracts | Add dependency specs and API contracts |
| Tests pass, but behavior differs | Tests are insufficient; spec may also be incomplete | Strengthen both spec and test coverage |
| The agent cannot start the generation | Spec lacks structural information (stack, dependencies, folder structure) | Add architectural context to spec |

When the rebuild test produces multiple failure rows simultaneously, fix them in this order: structural gaps first (the agent cannot start generation without stack and folder information), then integration test failures (cross-service contract gaps cause cascading failures that obscure unit-level diagnosis), and finally business logic gaps. The row "tests pass, but behavior differs" requires stronger test coverage before spec changes.
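The triage order can be encoded directly. This is a minimal sketch; the failure-category labels are hypothetical names for the table's rows, not part of any existing tool.

```python
# Sketch of the triage order above. Category names are illustrative labels,
# not output of any real test runner.

# Fix order: structural gaps first, then integration failures, then business
# logic; behavioral divergence calls for stronger tests before spec edits.
FIX_ORDER = [
    "structural",           # agent cannot start generation
    "integration",          # cross-service contract gaps
    "business_logic",       # missing decision rationale
    "behavior_divergence",  # tests pass, behavior differs
]

def triage(failures: set[str]) -> list[str]:
    """Return the observed rebuild-test failures in the order to fix them."""
    return [category for category in FIX_ORDER if category in failures]

# A rebuild run that surfaces business-logic and structural gaps at once:
print(triage({"business_logic", "structural"}))
```

Running the sorted list through a team's issue tracker, one fix per sprint, keeps the rebuild test from turning into an undifferentiated pile of failures.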

The Community Signal

Some practitioners have been testing this pattern in production for extended periods, treating code as explicitly disposable: "If the implementation is bad, delete it. Write tests first if you need to, then delete and regenerate." The spec becomes a checkpoint before execution begins.

Intent's living specs keep agents aligned with evolving requirements across sessions.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes


What "Source of Truth" Means for AI Agents

Spec documentation for AI coding agents encodes decisions, not just requirements. The distinction matters because AI agents are stateless. They do not retain memory between completions. Every agent session starts fresh. The specification is the only mechanism for persisting context across sessions.

How Agents Actually Consume Specs

Every major AI coding tool uses the same fundamental mechanic: context files are injected at the start of the model's context window on each session. Cursor's rules documentation states this explicitly: "Large language models don't retain memory between completions. Rules provide persistent, reusable context at the prompt level."

Anthropic's context engineering guide uses a hybrid strategy: "CLAUDE.md files are naively dropped into context up front, while primitives like glob and grep allow it to navigate its environment and retrieve files just-in-time." The spec file is an important source of project context for the agent.

Project-root context files are now a common mechanism used by major AI coding tools to provide repository-level instructions.

| Tool | Primary Context File | Additional Files |
| --- | --- | --- |
| Cursor | .cursor/rules/*.mdc | .cursorrules (legacy fallback) |
| Claude Code | CLAUDE.md | @imported files |
| GitHub Copilot | .github/copilot-instructions.md | .github/instructions/*.instructions.md, AGENTS.md |
| Aider | CONVENTIONS.md | Multiple via .aider.conf.yml |

GitHub Copilot's agent documentation explicitly lists AGENTS.md, CLAUDE.md, and GEMINI.md as supported instruction files, which means a single AGENTS.md file can serve as a cross-tool spec artifact.
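The injection mechanic reduces to a read-and-prepend step at session start. This is a minimal sketch, assuming a preference order over the file names from the table above; the function name and ordering are illustrative, not any tool's actual implementation.

```python
from pathlib import Path

# Candidate repo-root context files, in a hypothetical preference order.
CONTEXT_FILES = [
    "AGENTS.md",
    "CLAUDE.md",
    ".github/copilot-instructions.md",
    "CONVENTIONS.md",
]

def build_prompt(repo_root: str, task: str) -> str:
    """Prepend the first context file found to the task prompt.

    Mimics the shared mechanic: context is injected at the start of the
    model's context window on every session, because agents are stateless.
    """
    for name in CONTEXT_FILES:
        path = Path(repo_root) / name
        if path.is_file():
            return path.read_text() + "\n\n" + task
    return task  # no spec found: the session starts from zero
```

The fallthrough case is the failure mode the article describes: with no context file, every session begins with an empty decision history.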

Decisions vs. Requirements

A requirement says "all API endpoints require authentication." A decision says "authorization failures return 403, not 404, to avoid enumeration attacks on order IDs (see CWE-200), per security review Q3 2024." The requirement specifies what the agent should build. The decision tells the agent why a specific implementation was chosen over alternatives.

Specs that encode only requirements produce code that works, but makes different implementation choices on every regeneration. Specs that encode decisions produce code that converges toward the same implementation regardless of which agent or model generates it.

The spec-as-source research captures this: the planning phase "encodes constraints that the implementation must respect, for example, 'use PostgreSQL for persistence' or 'all API endpoints require authentication.' When using AI coding assistants, the plan provides crucial context: the AI learns not just what to build but how the system is structured and what conventions it should follow."

The Session Fidelity Problem

Static specs create a specific failure mode when agents execute across multiple sessions. An analysis of 600 rejected pull requests identifies alignment loss during execution, not poor task descriptions, as the primary driver of agentic workflow failures. The agent receives an accurate spec but still produces code that fails CI because the spec does not update to reflect intermediate decisions made during implementation.

Intent addresses this with living specs that update as agents complete work. When an agent finishes a task, the spec reflects what was actually built. When requirements change, updates propagate to all active agents. Intent's coordinator documentation describes a Coordinator Agent that analyzes the codebase, drafts a spec as a living document, and then generates tasks for specialist agents to execute. Users can stop the Coordinator at any time to manually edit the spec.

| Pattern | Flow | Session Continuity |
| --- | --- | --- |
| Static spec | Requirements → Agent → Code (spec unchanged) | Lost between sessions |
| Living spec | Requirements → Agent → Code → Spec updates → Next agent inherits current state | Preserved across sessions |

This architectural distinction highlights a structural problem: teams routinely version the source code generated by agents but neglect to version the specs that produced it, inverting the dependency relationship that matters most.
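Mechanically, the living-spec flow is a read-modify-write loop over the spec file. The sketch below illustrates the loop under assumed names; it is not Intent's actual API, and a real implementation would version the spec alongside the code it produced.

```python
import json
from pathlib import Path

def record_decision(spec_path: str, task: str, decision: str) -> None:
    """Append a decision made during implementation to a living spec file.

    The next agent session reads the updated spec, so the intermediate
    decision survives the session boundary instead of living only in the
    finished agent's discarded context window.
    """
    path = Path(spec_path)
    spec = json.loads(path.read_text()) if path.exists() else {"decisions": []}
    spec["decisions"].append({"task": task, "decision": decision})
    path.write_text(json.dumps(spec, indent=2))
```

Committing the spec file in the same change set as the generated code is what keeps the dependency relationship right-side up.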

The Practical Spectrum: Three Rigor Levels for Spec-Driven Development

Engineering teams do not adopt spec-driven development in a single step. The three levels below represent a progression from lightweight spec discipline to full spec-as-source, each with distinct enforcement mechanisms, organizational tradeoffs, and named production examples. Teams can enter at any level, but each builds on the practices of the one before it.

Level 1: Spec-First

Spec-first development means the specification is written before implementation begins. The spec is an input artifact that enables parallel development, but drift between spec and implementation accumulates over time without structural prevention.

How it works: Teams write an OpenAPI YAML, Protocol Buffer definition, or structured Markdown spec before writing code. Consumer teams develop against mock servers generated from the spec while the provider team builds the real implementation.

Named example: Stripe maintains an OpenAPI repository (stripe/openapi) with versioned spec files in JSON and YAML. Per Stripe docs: "developers of third-party SDKs and custom Stripe clients powered by the OpenAPI specifications in stripe/openapi can use the unified files to drive their code generation."

Core tradeoff: Spec-first provides high parallelism (consumers develop before providers ship) but carries high drift risk because there is no automated enforcement of spec-code parity. Enforcement is social and manual.

Level 2: Spec-Anchored

Spec-anchored development places the spec in a shared, centrally governed repository. Automated contract testing at build and CI time enforces that implementations conform to the spec.

How it works: A central contract repository, owned by neither provider nor consumer teams exclusively, holds the specifications. CI gates fail builds when implementation diverges from the shared spec. Contract testing tools like Pact automatically validate conformance.
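At its core, the CI gate is a diff between what the spec declares and what the implementation does. This stdlib-only sketch stands in for a real contract-testing tool such as Pact or oasdiff; the data shapes are assumptions for illustration.

```python
def contract_violations(spec: dict, implemented: dict) -> list[str]:
    """Compare spec-declared response codes per endpoint against the codes
    the implementation actually returns. Any divergence, in either
    direction, should fail the build."""
    problems = []
    for endpoint, declared in spec.items():
        actual = implemented.get(endpoint, set())
        for code in sorted(actual - set(declared)):
            problems.append(f"{endpoint}: returns {code}, not declared in spec")
        for code in sorted(set(declared) - actual):
            problems.append(f"{endpoint}: spec declares {code}, never returned")
    return problems

# An implementation that drifted: it returns 404 where the spec says 409.
spec = {"POST /orders/{id}/cancel": {200, 409}}
implemented = {"POST /orders/{id}/cancel": {200, 404}}
for problem in contract_violations(spec, implemented):
    print(problem)
```

A real contract test also checks payload schemas and consumer expectations, but the status-code diff alone already catches the enumeration-style divergences discussed later in this guide.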

Named example: Netflix uses federated GraphQL schemas as the shared contract. Each Domain Graph Service is a standalone spec-compliant GraphQL service.

Named example: Shopify uses Sorbet to enforce component contracts during monolith decomposition. Sorbet expresses input and output contracts on component boundaries, making them machine-checkable at the type level.

Core tradeoff: Spec-anchored development provides low drift risk through automated CI gates, but introduces medium organizational friction because provider teams must relinquish sole ownership of the spec.

Level 3: Spec-as-Source

Spec-as-source development treats the spec as the primary artifact to be maintained. Engineers edit specs; machines produce code. Any change to behavior means changing the spec and regenerating, not editing code directly.

How it works: The specification becomes the thing engineers maintain. Code generation pipelines produce implementation artifacts from the spec. Changes to behavior require changing the spec and regenerating. martinfowler.com documents this pattern with Tessl, where generated code is marked with // GENERATED FROM SPEC - DO NOT EDIT.
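The "do not edit" marker only works if something enforces it. A pre-commit hook built on a check like the one below rejects manual edits to generated files, making "change the spec and regenerate" structural rather than social. The function name is hypothetical; the marker text comes from the Tessl example above.

```python
# Marker emitted at the top of every generated file (per the Tessl pattern).
GENERATED_MARKER = "GENERATED FROM SPEC - DO NOT EDIT"

def editable(file_text: str) -> bool:
    """Return False for files that must be changed via the spec, not by hand.

    A pre-commit hook would call this on each staged file and abort the
    commit if any generated file was hand-modified.
    """
    lines = file_text.splitlines()
    first_line = lines[0] if lines else ""
    return GENERATED_MARKER not in first_line
```

The inverse check is equally useful: a CI job can verify that every file under the generated output directory still carries the marker, catching files that drifted out of the generation pipeline.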

Named example: Uber's design system uses spec-as-source across seven implementation stacks: UIKit, SwiftUI, Android XML, Android Compose, Web React, Go, and SDUI. A Figma link serves as the spec input; the resulting design is translated into detailed technical specifications tailored to each of the seven platform stacks.

Named example: Netflix's Unified Data Architecture (UDA) treats the domain model as the spec; GraphQL schemas, Iceberg tables, and Data Mesh sources are generated projections.

Core tradeoff: Spec-as-source provides near-zero drift risk by construction, but introduces high organizational friction and high debugging complexity because generated code is harder to trace.

Consolidated Comparison

| Dimension | Spec-First | Spec-Anchored | Spec-as-Source |
| --- | --- | --- | --- |
| Primary function | Alignment and parallelism | Enforced shared contract | Primary maintained artifact |
| Spec ownership | Provider team | Shared/central repo | Spec team or tooling |
| Enforcement | Social/manual | Automated CI gates | Generation or continuous validation |
| Drift risk | High | Low | Near-zero |
| Organizational friction | Low | Medium | High |
| Debugging complexity | Low | Medium | High |
| Tooling maturity (2025–2026) | High | High | Low to medium |
| Named examples | Stripe | Netflix (GraphQL Federation) | Netflix (UDA), Uber (design system) |

The rigor levels are not alternatives; they are a progression. Teams that attempt Level 3 without first establishing Level 2's CI gates and contract tests find that their specs become aspirational documents rather than enforced contracts.


The Adoption Sequence

A practical adoption path:

  1. Pick one critical service (authentication, payments, core API)
  2. Write a machine-readable spec covering its behavioral contracts
  3. Add a CI lint gate using Spectral or oasdiff in report-only mode
  4. Promote to the blocking gate after observing violations for one to two sprints. The signal that the team is ready to promote: the violations the lint gate catches are consistently real gaps, not false positives from teams legitimately iterating on the spec.
  5. Add contract tests to validate consumer-provider compatibility
  6. Test the rebuild: delete generated code, regenerate from spec, confirm tests pass
  7. Scale to other services once the workflow proves reliable
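Step 3's report-only gate might look like the following in GitHub Actions. The workflow name and spec path are illustrative; `continue-on-error` is what keeps the gate non-blocking until the team promotes it in step 4.

```yaml
# .github/workflows/spec-lint.yml (illustrative)
name: spec-lint
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint the spec (report-only for now)
        continue-on-error: true  # set to false when promoting to a blocking gate
        run: npx @stoplight/spectral-cli lint openapi.yaml
```

Flipping the single `continue-on-error` flag is the entire promotion mechanism, which keeps the report-only period cheap to run and cheap to end.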

Intent's living specs update automatically as agents work, keeping decision context current across regeneration cycles.


What Generation-Grade Specs Actually Contain

A spec that passes the rebuild test contains more than endpoints and request schemas. Generation-grade specs encode business rules with enforcement levels, architectural decisions with rationale, security constraints with CWE mappings, and generation provenance metadata.

The spec-as-source research captures the distinction: the planning phase "encodes constraints that the implementation must respect... Without this context, even a perfect functional spec may yield code that contradicts organizational standards or architectural decisions."

Documentation-Grade vs. Generation-Grade

| Dimension | Documentation-Grade | Generation-Grade |
| --- | --- | --- |
| Business rules | In prose comments or an external wiki | Inline, machine-readable, with enforcement levels |
| Rationale | Absent or in a separate ADR document | Co-located with the rule, linked by ID |
| Constraints | Described in the description fields | Encoded as executable expressions (CEL, regex, enum) |
| Generation metadata | None | Model version, timestamp, rebuild trigger |
| Error semantics | HTTP status codes only | Reason codes, retry semantics, business-rule references |
| Security constraints | Implicit in implementation | Explicit, with CWE mappings and enforcement levels |

Example: OpenAPI with Generation-Grade Extensions

The OpenAPI spec supports specification extensions via the x- prefix. Generation-grade specs use these extensions to carry business logic, decisions, and provenance:

```yaml
paths:
  /orders/{id}/cancel:
    post:
      summary: Cancel an order
      x-business-rules:
        - id: BR-042
          rule: "Orders may only be cancelled within 30 minutes of placement"
          rationale: "Fulfillment SLA requires warehouse pick within 45 minutes"
          enforcement: MUST
        - id: BR-043
          rule: "Orders in SHIPPED status cannot be cancelled"
          rationale: "Carrier handoff is irreversible; use /returns"
          enforcement: MUST
      x-architectural-decisions:
        - adr-ref: "ADR-017"
          decision: "Cancellation is idempotent; repeated calls return 200"
          rationale: "Retry safety for mobile clients on unreliable networks"
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
          x-validation-note: "Must be caller's own order; returns 403 not 404"
      responses:
        '409':
          description: Order cannot be cancelled
          x-business-rule-ref: "BR-042, BR-043"
```

The x-business-rules entries carry both the rule and the rationale inline. The x-architectural-decisions entry links to a specific ADR with its reasoning. The x-validation-note on the path parameter encodes a security constraint (returning 403 rather than 404 to prevent enumeration) that documentation-grade specs leave entirely implicit.

The Constitution Pattern

Constitutional spec-driven development research introduces a separate constraint layer: a versioned document that encodes non-negotiable security requirements and includes CWE vulnerability mappings. Feature specs are governed by the constitution's constraints and feature-specific logic.

```yaml
# constitution.yaml
version: "2.0.1"
principles:
  - id: SEC-001
    enforcement: MUST
    cwe-mapping: CWE-285
    rule: "All API endpoints require authentication and appropriate authorization checks"
    rationale: "No anonymous access to business data"
  - id: DATA-001
    enforcement: MUST
    rule: "All state mutations emit an audit event"
    rationale: "Supports logging, monitoring, and auditability objectives relevant to SOC 2 Type II"
```

This separation means security and compliance constraints are enforced structurally, rather than relying on each feature spec to remember to include them.
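Structural enforcement can be sketched as a check that every MUST principle in the constitution is acknowledged by each feature spec. The field names mirror the YAML above, but the checker itself, including the assumed `satisfies` field on feature specs, is hypothetical.

```python
def missing_must_principles(constitution: dict, feature_spec: dict) -> list[str]:
    """Return ids of MUST principles the feature spec does not acknowledge.

    A non-empty result should fail the build, so no feature spec can
    silently skip a security or compliance constraint.
    """
    required = {p["id"] for p in constitution["principles"]
                if p["enforcement"] == "MUST"}
    covered = set(feature_spec.get("satisfies", []))
    return sorted(required - covered)

# Parsed from constitution.yaml above (SHOULD-level entry added for contrast).
constitution = {
    "principles": [
        {"id": "SEC-001", "enforcement": "MUST"},
        {"id": "DATA-001", "enforcement": "MUST"},
        {"id": "STYLE-001", "enforcement": "SHOULD"},
    ]
}
feature = {"satisfies": ["SEC-001"]}
print(missing_must_principles(constitution, feature))
```

Because the check runs against the constitution rather than against each feature spec's own text, adding a new MUST principle immediately gates every feature, which is what "enforced structurally" means in practice.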

Run the Rebuild Test on Your Most Critical Service This Week

The gap between "we have documentation" and "we have a rebuildable spec" is the gap between intent and execution. Every implicit decision not captured in the spec is one that the next AI agent session will make differently.

The right starting point is the one service where an implicit decision has already caused a production incident or confusing agent output. The first spec element to write is not the happy-path API contract, but the decision rationale for the behavior that is hardest to explain to a fresh agent. That single entry, "cancellation returns 200 on duplicate requests because retry safety matters more than strict idempotency semantics", is the difference between a spec that describes what the code does and one that explains why. Run the rebuild test. The failures are the roadmap.

Intent's living specs keep every agent session working from the current project context.



Written by

Ani Galstian
