Spec-driven development in brownfield enterprise codebases is most effective when teams write change-level specifications rather than full-system specs, because undocumented dependencies and scale make comprehensive upfront specifications impractical.
TL;DR
Brownfield codebases break greenfield SDD assumptions because legacy behavior, dependencies, and contracts are rarely documented. The practical approach is to write specs only for the change being made, incrementally grow coverage, and verify against existing tests and production-observed behavior.
Martin Fowler's article on spec-driven development tools evaluates Kiro, spec-kit, and Tessl but does not quantify the effort to introduce them into existing codebases. This matters for teams maintaining repositories with hundreds of thousands of files, 10-15 years of technical debt, and little surviving architectural documentation.
The problem is structural. SDD demos usually assume blank-slate requirements analysis, but brownfield systems already exist, their contracts are often implicit, and their dependencies are rarely fully documented. Teams need a workflow that begins with understanding the current system and writes narrow specs only for the intended change. Intent's Context Engine gives teams working in large codebases the architectural understanding to map dependencies across 400,000+ files through semantic dependency graph analysis before drafting a change-level spec.
Intent keeps your specs in sync with the live codebase.
Free tier available · VS Code extension · Takes 2 minutes
What Teams Need Before Starting Brownfield SDD
Brownfield SDD starts from an existing codebase rather than a blank page, so its entry requirements differ from greenfield approaches. Three conditions make adoption practical rather than aspirational.
The first condition is a semantic analysis tool capable of building dependency maps across the existing repository. Manual code reading does not scale beyond a few hundred files, and AI-assisted discovery without a dedicated context layer produces incomplete maps that lead to downstream specification gaps.
The second condition is at least one engineer who understands the existing architecture well enough to review and correct the dependency maps the tool produces. Tribal knowledge cannot be fully automated away, but it can be structured and captured. The third condition is a baseline measurement for two DORA metrics: current lead time for changes and change failure rate. Without a pre-adoption baseline, there is no way to assess whether the SDD workflow is delivering the expected improvement in merge speed and regression reduction.
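Both baselines can be computed directly from deployment records. A minimal sketch in Python, assuming a hypothetical list of change records with commit time, deploy time, and a failure flag (the record shape is illustrative, not any tool's actual export format):

```python
from datetime import datetime, timedelta

# Hypothetical change records: commit time, deploy time, and whether the
# deployment caused a failure requiring remediation.
changes = [
    {"committed": datetime(2025, 1, 6, 9),  "deployed": datetime(2025, 1, 7, 9),   "failed": False},
    {"committed": datetime(2025, 1, 8, 10), "deployed": datetime(2025, 1, 10, 10), "failed": True},
    {"committed": datetime(2025, 1, 9, 14), "deployed": datetime(2025, 1, 10, 14), "failed": False},
    {"committed": datetime(2025, 1, 12, 8), "deployed": datetime(2025, 1, 13, 8),  "failed": False},
]

def lead_time_for_changes(changes):
    """Median commit-to-deploy interval (DORA Lead Time for Changes)."""
    deltas = sorted(c["deployed"] - c["committed"] for c in changes)
    mid = len(deltas) // 2
    if len(deltas) % 2:
        return deltas[mid]
    return (deltas[mid - 1] + deltas[mid]) / 2

def change_failure_rate(changes):
    """Share of deployments that required remediation (DORA CFR)."""
    return sum(c["failed"] for c in changes) / len(changes)

print(lead_time_for_changes(changes))          # median lead time as a timedelta
print(f"{change_failure_rate(changes):.0%}")   # change failure rate as a percentage
```

Run once before adoption to record the baseline, then re-run quarterly against the same record shape to track the trend.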
Why Spec-Driven Development Breaks in Brownfield Codebases
Spec-driven development breaks in brownfield environments because five interconnected failure modes invalidate the assumptions built into greenfield SDD workflows. Understanding each failure mode is a prerequisite for adapting the approach.
Comprehensive Specification Is Impractical at Scale
Enterprise codebases with 100,000+ files cannot be comprehensively specified without exceeding human review capacity and practical retrieval limits. InfoQ's analysis states directly that when existing applications are large, it becomes impractical for LLMs to create specifications without exceeding context limits, and even if specifications could be generated, they would be too large for effective human review. The resolution is scope reduction: granular specs must stay closest to the area of change.
Tribal Knowledge Silos Block Specification Authoring
Thoughtworks documents the knowledge loss problem in a whitepaper: with no architecture decision records and little to no test coverage, there is no safety net for incremental feature development. The engineers who understood architectural intent have departed. Modern engineers are not trained on legacy technologies. Writing specifications requires understanding what the system does, and that understanding has evaporated.
Undocumented Dependencies Create Specification Blind Spots
Legacy systems accumulate what GitHub Engineering describes as technical debt and massive, intimidating codebases. Even dedicated boundary-enforcement tooling struggles: a Packwerk review reveals that the technical debt introduced by privacy checks is still a long way from being paid off, despite explicit tooling designed to enforce dependency boundaries.
Intent's Context Engine maps cross-module relationships across 400,000+ files using semantic dependency graph analysis, narrowing the blind spots that leave boundary specs incomplete for teams tracing legacy modernization workflows.
Implicit Behavioral Contracts Resist Formalization
Brownfield systems contain behavioral expectations between components that were never documented: shared timing assumptions, ordering dependencies, and undocumented error-handling behaviors. These implicit contracts must be discovered before they can be encoded in specs. InfoQ warns that gaps between specification and actual system behavior compound over time, resurfacing in different forms whenever code is regenerated based on an incomplete spec.
AI Performance Degrades in Unhealthy Code
Fowler cites Adam Tornhill's research showing LLMs produce a 30% higher defect risk in less-healthy code, and notes that the study's less-healthy code was nowhere near as unhealthy as much legacy code. Kent Beck sharpens the critique on Fowler's blog: writing whole specifications before implementation encodes the assumption that teams will learn nothing during implementation that would change the specification. In a brownfield codebase, every implementation reveals hidden coupling that can invalidate upfront specs.
| Failure Mode | Greenfield Impact | Brownfield Impact |
|---|---|---|
| Scale of specification | Manageable: new system, defined scope | Impractical: 100K+ files exceed review capacity |
| Knowledge availability | The developer defines intent directly | Tribal knowledge lost; original architects departed |
| Dependency visibility | Defined at design time | Undocumented; accumulated over 10-15 years |
| Behavioral contracts | Specified before implementation | Implicit; must be reverse-engineered from production |
| AI code quality | Clean code, lower defect risk | Higher defect risk in unhealthy codebases |
Five Steps Teams Can Use to Apply a Spec-Driven Workflow to Brownfield Codebases
The brownfield SDD workflow differs from greenfield in a fundamental way: specification follows understanding, not the reverse. Instead of writing a comprehensive spec and generating code from it, brownfield SDD requires building an architectural understanding of existing code first, then writing narrow specifications scoped to the intended change.
Step 1: Build Semantic Understanding Across the Existing Codebase
Brownfield SDD begins with understanding the codebase, not with specification authoring. The AI system must understand what exists before anyone can specify what should change. For repositories spanning hundreds of thousands of files, this requires semantic dependency analysis that maps relationships between components, identifies architectural patterns, and surfaces implicit contracts.
Intent's Context Engine processes entire codebases spanning 400,000+ files through semantic dependency graph analysis, building the architectural understanding that enables change-level specification. Without this foundation, specifications are written against an incomplete model of the system, leading to the integration failures Fowler warned about: just because the context windows are larger does not mean AI will properly capture everything inside them.
The RPI Loop formalizes this step: an agent scans the codebase, produces a compact markdown summary of only the relevant state, and does not write code. Research and implementation run in separate phases to prevent context contamination.
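For a Python repository, the research phase can be approximated with the standard-library `ast` module. This is a minimal read-only sketch; the function names and the markdown shape are illustrative, not the output of Intent or any specific RPI tool:

```python
import ast
from pathlib import Path

def scan_imports(root: str) -> dict[str, set[str]]:
    """Research phase: map each module to the modules it imports.

    Read-only by design -- this phase produces understanding, never code.
    """
    graph = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        deps = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[str(path.relative_to(root))] = deps
    return graph

def to_markdown(graph: dict[str, set[str]], focus: str) -> str:
    """Emit a compact markdown summary of only the state relevant to `focus`."""
    lines = [f"## Dependency summary for `{focus}`"]
    for module, deps in sorted(graph.items()):
        if focus in deps or module == focus:
            lines.append(f"- `{module}` imports: {', '.join(sorted(deps)) or 'nothing'}")
    return "\n".join(lines)
```

A production context engine resolves far more than import statements, but even this sketch illustrates the phase separation: the summary feeds the implementation phase, keeping raw codebase content out of that context.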
Step 2: Write Change-Level Specs, Not Full-Codebase Specs
The second step is the paradigm shift that makes brownfield SDD viable: specifications scoped to the delta of the intended change, not the entire system. Rather than attempting to retroactively document an entire legacy codebase, teams write narrow specs covering only what the current change touches. Specification coverage grows organically with each modification, concentrated where it provides the most value: modules under active development.
A change-level spec defines four elements:
- Current behavior: what the system does today, if discoverable from tests or production traffic
- Target behavior: the precise delta from the current state
- Invariants: what must not change in adjacent systems
- Scope boundary: what is explicitly excluded from this change
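The four elements can be captured as a small machine-checkable structure rather than prose alone. A sketch, assuming a hypothetical `ChangeSpec` type and example content rather than any particular tool's format:

```python
from dataclasses import dataclass, field

@dataclass
class ChangeSpec:
    """A change-level spec: the delta of one modification, not the whole system."""
    current_behavior: str   # what the system does today (from tests or prod traffic)
    target_behavior: str    # the precise delta from the current state
    invariants: list[str] = field(default_factory=list)    # must not change nearby
    out_of_scope: list[str] = field(default_factory=list)  # explicitly excluded

    def validate(self) -> None:
        # A spec without a target or without invariants is not reviewable.
        if not self.target_behavior.strip():
            raise ValueError("target behavior is required")
        if not self.invariants:
            raise ValueError("name at least one invariant in adjacent systems")

spec = ChangeSpec(
    current_behavior="POST /orders accepts a missing `currency` and defaults to USD",
    target_behavior="reject a missing `currency` with HTTP 422",
    invariants=["GET /orders response shape unchanged", "existing USD orders untouched"],
    out_of_scope=["multi-currency pricing"],
)
spec.validate()
```

Making the spec a typed object rather than free text lets CI refuse changes whose specs omit invariants or a scope boundary.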
Intent's living specifications give teams drafting these narrow specs the ability to anchor them to dependency evidence from the live codebase, rather than relying on tribal memory or documentation that may be years out of date.
Step 3: Decompose Against Existing Architecture
The change-level spec must be decomposed into implementation tasks that respect the existing architectural structure. Decomposition in brownfield differs from greenfield because the architecture already exists, constraints are real, and deviations from established patterns create a maintenance burden.
Stripe's Minions system validates this at enterprise scale: guidance is applied at a scoped or subdirectory level primarily to avoid a massive global rules file that would exceed the model's context. Decomposition must account for the reality that different parts of the same codebase follow different conventions.
Step 4: Execute in Isolated Worktrees
Implementation tasks execute in parallel using Git worktrees, which provide complete filesystem isolation between concurrent workstreams while sharing the underlying repository data. Each specialist agent receives its own directory, branch, and filesystem state.
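The mechanics are plain git. A sketch of the per-task setup, using a throwaway scratch repo for illustration (directory and branch names are hypothetical; in practice you run the `worktree` commands inside your real checkout):

```shell
set -e
# Scratch repo so the example is self-contained.
repo=$(mktemp -d)/repo
git init -q "$repo" && cd "$repo"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "init"

# Give each concurrent task its own working directory and branch.
git worktree add ../wt-auth-fix -b task/auth-fix
git worktree add ../wt-api-migration -b task/api-migration

git worktree list   # all worktrees share one object store

# Clean up once a task's branch merges.
git worktree remove ../wt-auth-fix
git branch -d task/auth-fix
```

Because worktrees share the repository's object database, the per-task cost is a checkout, not a full clone.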
Boris Cherny, creator and head of Claude Code at Anthropic, describes running many parallel Claude Code sessions and using separate git checkouts for each local session when working on large batch changes, such as codebase-wide migrations. Google patterns further formalize this by assigning specific roles to individual agents, creating systems that are more modular, testable, and reliable.
Intent's multi-agent orchestration executes decomposed brownfield tasks in parallel worktrees, maintaining architectural context across concurrent work streams throughout the execution phase. Resource considerations are real: a 2GB codebase can consume nearly 10GB of disk space in a 20-minute multi-worktree session, requiring per-worktree database instances and worktree-indexed Docker volume names at enterprise scale.
Step 5: Verify Against Spec and Existing Tests
Verification in brownfield serves two purposes: confirming the change matches the spec and confirming the change does not break existing behavior. The second purpose distinguishes brownfield verification from greenfield verification, where no existing behavior needs to be protected. Verification should compare the implementation against both the change-level spec and the existing test suite. That dual check catches the technical debt injection that Fowler's team identified: AI-generated code that includes unrequested features and must integrate with existing systems the AI does not fully understand.
A machine-checkable contract validation step further strengthens verification. The pattern below uses an OpenAPI contract as an example of turning a prose spec into a CI-enforceable artifact:
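A minimal contract of that shape might look like the following; the service name, path, and schema are illustrative, not taken from any real system:

```yaml
openapi: "3.0.3"
info:
  title: Orders boundary contract   # illustrative service name
  version: "1.0.0"
paths:
  /orders/{id}:
    get:
      summary: Fetch one order from the legacy order store
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
      responses:   # deleting this block makes validation fail
        "200":
          description: Order found
          content:
            application/json:
              schema:
                type: object
                required: [id, status]
                properties:
                  id: { type: string }
                  status: { type: string }
        "404":
          description: Unknown order id
```

Validated with `npx swagger-cli@4.0.4 validate contract.yaml`, which exits non-zero when required OpenAPI fields are missing, so CI can gate merges on it.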
Runnable validation: swagger-cli 4.0.4 on Node.js 20 validates this contract and fails fast if required OpenAPI fields are missing.
Common failure mode: if responses are omitted, validation fails with a schema validation error identifying the offending path. In brownfield teams, this turns a spec from informal prose into a machine-checkable contract that can run in CI before implementation merges.
Intent's multi-agent orchestration connects verification back to the same architectural map used during discovery, helping confirm that the implemented delta remained within the intended boundary.
Intent runs parallel agents across your brownfield codebase without losing architectural context.
Free tier available · VS Code extension · Takes 2 minutes
Three Specification Patterns for Brownfield Codebases
Brownfield specification patterns differ from greenfield patterns because they acknowledge the reality IEEE documented decades ago: legacy code itself is often the only reliable documentation. Three patterns address different layers of brownfield complexity.
Pattern 1: Change Specs (Delta Specifications)
Change specs capture only the behavioral delta of the intended modification. Every bug fix, feature addition, and refactoring becomes an opportunity to add specifications for the code being touched. The discipline requirement from InfoQ: every AI-assisted change must update the spec, not just the code. Direct code modifications without spec updates widen the specification gap, which resurfaces as non-deterministic AI generation failures in subsequent sessions.
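That discipline can be enforced mechanically in CI. A sketch in Python, assuming hypothetical path conventions (`src/` for code, `specs/` for change specs) that you would adapt to your repository layout:

```python
def spec_gate(changed_files: list[str]) -> bool:
    """Pass only when a code change is accompanied by a spec change.

    Path conventions (src/, specs/) are hypothetical; adapt to your repo.
    """
    code_changed = any(f.startswith("src/") for f in changed_files)
    spec_changed = any(f.startswith("specs/") for f in changed_files)
    return spec_changed or not code_changed

# In CI this list would come from `git diff --name-only origin/main...HEAD`.
print(spec_gate(["src/billing/invoice.py"]))                      # False: spec gap widens
print(spec_gate(["src/billing/invoice.py", "specs/billing.md"]))  # True
print(spec_gate(["docs/README.md"]))                              # True: no code touched
```

A failing gate does not prove the spec update is meaningful, but it makes silently widening the specification gap impossible.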
Pattern 2: Dependency Boundary Specs (Service Contract Specifications)
Dependency boundary specs formalize implicit contracts at integration points between legacy and modern systems. Required components include machine-readable artifacts such as OpenAPI for REST and Avro or Protobuf for events, plus non-functional concerns such as failure modes, SLOs, and versioning, all tied to a shared vocabulary across teams.
Anti-corruption layers serve as boundary spec implementations. Fowler describes a Backend for Frontend (BFF) as an Anti-Corruption Layer (ACL), creating an isolating layer that maintains the same domain model as the frontend while translating between legacy interfaces and modern systems. GitHub's SAML hardening demonstrates brownfield boundary specification in practice: bootstrap initial schemas from real-world production traffic, A/B test using the Scientist framework, and converge on a minimal schema validated against millions of production requests.
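In code, an anti-corruption layer is often just a translation function at the boundary. A minimal sketch, where the legacy field names, status codes, and `Order` model are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Order:
    """The modern domain model the rest of the system works with."""
    order_id: str
    status: str        # "open" | "shipped" | "cancelled"
    total_cents: int

# Hypothetical legacy status codes; a real ACL would bootstrap this table
# from observed production traffic, not from stale documentation.
_LEGACY_STATUS = {"O": "open", "S": "shipped", "X": "cancelled"}

def from_legacy(record: dict) -> Order:
    """Anti-corruption layer: translate a legacy record at the boundary so
    legacy naming and encodings never leak into the modern model."""
    return Order(
        order_id=str(record["ORD_NO"]),
        status=_LEGACY_STATUS[record["STAT_CD"]],
        total_cents=int(round(float(record["TOT_AMT"]) * 100)),
    )

print(from_legacy({"ORD_NO": 10442, "STAT_CD": "S", "TOT_AMT": "19.99"}))
```

The boundary spec is then the contract for `from_legacy`: the record shapes it must accept and the invariants the resulting `Order` must satisfy.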
Pattern 3: Migration Specs (Incremental Modernization Specifications)
Migration specs define the target state and the incremental steps to reach it from the current state, and are designed to be executed without stopping feature delivery. Three components are required per CircleCI analysis:
- Target state vision: specific enough to validate intermediate steps
- Incremental steps: each is individually deployable and independently valuable
- Integration layer design: the facade that mediates between old and new during transition
Shopify's implementation of the Strangler Fig pattern validates this approach: build a facade, identify independent, extractable modules, migrate incrementally, and continuously monitor. Peer-reviewed research confirms that direct rewrites are rarely feasible in enterprise environments due to risks of functional regression and loss of institutional domain knowledge.
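The integration-layer facade reduces to a routing decision per module. A sketch, with hypothetical module names and handlers standing in for the legacy and extracted systems:

```python
# Strangler fig facade: route to new code once a module is migrated,
# fall back to legacy otherwise. Module names here are hypothetical.
MIGRATED = {"inventory"}   # grows as modules are extracted, one step at a time

def legacy_handler(module: str, request: str) -> str:
    return f"legacy:{module}:{request}"

def modern_handler(module: str, request: str) -> str:
    return f"modern:{module}:{request}"

def facade(module: str, request: str) -> str:
    """Single entry point that mediates between old and new during transition."""
    handler = modern_handler if module in MIGRATED else legacy_handler
    return handler(module, request)

print(facade("inventory", "list"))   # modern:inventory:list
print(facade("billing", "invoice"))  # legacy:billing:invoice
```

Each migration-spec step then amounts to moving one name into `MIGRATED` behind monitoring, which keeps every step independently deployable and reversible.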
| Pattern | Scope | When to Use | Key Discipline |
|---|---|---|---|
| Change spec | Single modification delta | Bug fixes, feature additions, refactoring | Update spec with every AI-assisted change |
| Dependency boundary spec | Integration point contract | Service extraction, monolith decomposition | Validate against production traffic, not docs |
| Migration spec | Multi-phase architectural change | System modernization, database migration | Each step must be independently deployable |
What Does Not Work: Full-Pipeline SDD for Brownfield Changes
AWS Kiro's mandatory three-phase pipeline creates structural friction for brownfield codebases. Kiro's own product team acknowledged this: not everyone starts from requirements, especially when working on existing brownfield apps where the technical architecture is already mapped out.
Three specific limitations make full-pipeline SDD impractical for routine brownfield changes. First, spec generation and full agent hook execution add significant per-task overhead. For a single-line bug fix in a legacy system, triggering the full pipeline is overhead without value. Second, the agent generally starts from scratch and needs to understand the system in each session, meaning every brownfield session begins with the agent relearning the codebase context. Third, AWS's own case study noted that the approach was demonstrated on a small codebase.
Intent addresses these limitations directly. Living specifications persist architectural context across sessions, eliminating the cold-start problem by maintaining a continuously updated codebase model rather than regenerating understanding from scratch on each task. For teams working on incremental legacy changes, this is more practical than forcing every task through a fresh requirements-first pipeline.
Full-pipeline SDD remains valuable for large greenfield features inside brownfield codebases: building a new service, designing a new API surface, or creating a new subsystem. The distinction is between specifying new work, where upfront specification adds value, and modifying existing work, where change-level specs are more appropriate.
How Teams Should Measure Brownfield SDD Effectiveness
DORA has acknowledged the topic but has not published SDD-specific findings. No standardized, empirically validated metrics exist in peer-reviewed literature. Teams adopting brownfield SDD in 2026 are establishing baselines rather than following mature industry standards. Four metrics adapted from adjacent research provide a starting framework.
Drift Rate
InfoQ identifies drift as the natural state that must be continuously governed. Specification drift measures divergence between specs and actual system behavior over time. Operational proxies teams can instrument immediately include schema validation failures per sprint, contract test failures indicating implementation divergence, and spec revision frequency.
Regression Rate
Defect density in specced versus unspecced areas of the codebase provides the clearest signal. The industry standard baseline is 1 defect per 1,000 lines. AI-assisted changes with formal specifications should trend below this baseline. SDD's hypothesis is that formal specifications reduce quality degradation, but rigorous brownfield before-and-after data has not been published.
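The comparison is a simple ratio against the 1-defect-per-1,000-lines baseline. A sketch, with hypothetical defect and line counts standing in for issue-tracker labels and line-count output:

```python
def defects_per_kloc(defects: int, lines: int) -> float:
    """Defect density, compared against the baseline of 1 defect / 1,000 lines."""
    return defects / (lines / 1000)

# Hypothetical counts; in practice, tag defects by whether the touched
# module had a change-level spec, and count lines with a tool like cloc.
specced   = defects_per_kloc(defects=4,  lines=12_000)   # ~0.33 per KLOC
unspecced = defects_per_kloc(defects=19, lines=15_000)   # ~1.27 per KLOC
print(f"specced: {specced:.2f}/KLOC, unspecced: {unspecced:.2f}/KLOC")
```

A persistent gap between the two densities, not either number alone, is the signal that specs are paying for themselves.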
Time-to-Merge vs. Baseline
DORA Lead Time for Changes is the closest proxy. Establish a baseline before SDD adoption, then track quarterly.
| Performance Tier | Lead Time for Changes | Change Failure Rate |
|---|---|---|
| Elite (Top 10%) | < 1 hour | < 2% |
| High (Top 25%) | < 1 day | < 4% |
| Medium (Median) | 1 day to 1 week | 8-16% |
| Low (Bottom 25%) | > 1 month | > 32% |
Source: 2024 DORA benchmarks
Specification Coverage Growth
Measure specifications added per sprint rather than total coverage percentage. Brownfield coverage starts near zero and grows slowly. Meaningful denominators include the percentage of critical-path components with formal specs, the percentage of API endpoints with machine-readable contracts, and the percentage of active-development modules with change-level specs.
Adopt Change-Level Specs Before Your Next Legacy Refactor
The tension at the core of brownfield SDD is scope: comprehensive specifications are impractical at enterprise scale, but unspecified AI-assisted changes introduce compounding drift. Change-level specs resolve this tension by scoping specification effort to the delta of each modification and building coverage organically where it matters most.
The next concrete step is simple: on the next brownfield change, write a change-level spec that defines the current behavior, target behavior, invariants, and scope boundaries before generating code. Measure whether the resulting change merges faster and introduces fewer regressions than the team's baseline.
Intent maps your codebase, writes living specs, and coordinates agents across the full change cycle.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Molisha Shah
GTM and Customer Champion