Claude Code can support spec-driven development because CLAUDE.md persists project instructions across sessions, but it does not provide native drift detection, robust multi-agent coordination, or guaranteed spec compliance.
Spec-driven development with Claude Code uses structured markdown files, primarily CLAUDE.md, to define requirements, conventions, and constraints before any code is generated. That makes the specification the primary source of truth for both the developer and the AI agent.
TL;DR
Claude Code can support spec-driven development across multiple sessions because CLAUDE.md automatically persists instructions. It breaks down when specs drift, context exhausts, or teams need verification and multi-agent coordination beyond static markdown. Official documentation, research, and product issues all point to the same boundary: humans still have to verify outcomes.
What Spec-Driven Development with Claude Code Looks Like in Practice
Spec-driven development with Claude Code follows a structured sequence: specify requirements in markdown, generate a plan from those requirements, implement against the plan, and validate results against the original specification. This four-phase methodology, commonly described as Specify, Plan, Implement, Validate, is echoed across the SDD literature and in dedicated SDD tools that use checkpoint-based variants of the same structure.
In practice, teams create a concise top-level CLAUDE.md that indexes into deeper specification files rather than containing everything itself. A common pattern uses a small CLAUDE.md that references separate markdown files covering project architecture, models, build sequence, test hierarchy, and test scenarios. The top-level functions as a map; Claude Code reaches into subdirectories as needed.
A typical directory structure looks like this:
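One illustrative layout, with a concise top-level CLAUDE.md indexing deeper spec files covering the areas mentioned above (directory and file names are hypothetical):

```
project/
├── CLAUDE.md                # concise map that indexes the specs below
├── specs/
│   ├── architecture.md      # project architecture
│   ├── models.md            # data models
│   ├── build-sequence.md    # build phases and dependencies
│   ├── test-hierarchy.md    # unit / integration / e2e structure
│   └── test-scenarios.md    # concrete scenarios to cover
├── src/
└── tests/
```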
CLAUDE.md is loaded into the system prompt at the start of the session and reloaded after context compaction. Claude Code officially documents two complementary memory systems: CLAUDE.md files, which can be scoped at the project, user, or organization level, and auto memory, which is scoped per working tree. When Claude Code operates in a subdirectory, it loads CLAUDE.md files from both the subdirectory and its parent directories via a nested traversal.
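The nested traversal can be pictured with a short sketch: walk upward from the working directory, collect every CLAUDE.md along the way, and apply parent-level files before subdirectory-level ones. This is an illustration of the lookup order only, not Claude Code's actual implementation.

```python
# Illustrative sketch of nested CLAUDE.md discovery: collect memory
# files from the working directory up to the repo root, so parent-level
# instructions load alongside subdirectory-level ones. The traversal
# logic here is an assumption for illustration.
from pathlib import Path

def collect_claude_md(start: Path, stop: Path) -> list[Path]:
    """Return CLAUDE.md paths ordered from `stop` (root) down to `start`."""
    found = []
    current = start.resolve()
    stop = stop.resolve()
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.exists():
            found.append(candidate)
        if current == stop or current.parent == current:
            break
        current = current.parent
    return list(reversed(found))  # parents first, leaf directory last
```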
One architectural distinction highlighted by Anthropic's research is that Claude can edit instructions in project files, such as internal rules, as it works, making those updates available in future sessions. This positions CLAUDE.md as a living document rather than a static input.
Systems that treat the spec as the live source of truth, with agents reading and writing to it continuously, close a gap that CLAUDE.md alone cannot close.
What CLAUDE.md Enables
The table below summarizes the types of guidance that CLAUDE.md supports, drawn from Anthropic's official memory documentation and best practices guidance.
| Guidance Type | What Teams Define | Example |
|---|---|---|
| Tech stack | Framework versions, language, key libraries | "Node.js 18+, TypeScript strict mode, Prisma ORM." |
| File structure | Where to find types, interfaces, helpers, tests | "All API routes in src/api/, tests mirror the source tree." |
| Naming conventions | File naming, function naming, variable patterns | "camelCase for functions, PascalCase for components." |
| Hard constraints | Actions Claude must never perform | "NEVER force push; NEVER delete branches without confirmation." |
| Build commands | Exact commands for build, test, lint and deploy | pnpm test:unit, turbo build, pnpm db:migrate |
| Phase tracking | Current phase, completed tasks, next steps | "Phase 2 complete. Begin Phase 3: integration tests." |
| Lookup tables | Pointers to domain-specific documentation | "Working on servers? Read documentation/04-servers.md first" |
| Architectural decisions | ADRs referenced or summarized inline | "All new services follow event-driven pattern per ADR-0012." |
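Put together, a minimal CLAUDE.md combining several of these guidance types might look like the following. The contents are illustrative, drawn directly from the examples in the table:

```markdown
# Project Memory

## Tech stack
- Node.js 18+, TypeScript strict mode, Prisma ORM

## Commands
- Test: pnpm test:unit
- Build: turbo build
- Migrate: pnpm db:migrate

## Hard constraints
- NEVER force push
- NEVER delete branches without confirmation

## Lookup
- Working on servers? Read documentation/04-servers.md first.

## Phase tracking
- Phase 2 complete. Begin Phase 3: integration tests.
```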
Teams using path rules in .claude/rules/ can scope instructions to specific directories using glob-pattern frontmatter, though reported regressions indicate these do not always load correctly.
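A path rule might look like the sketch below. The glob-frontmatter key shown here is an assumption to verify against current documentation, especially given the loading regressions just noted:

```markdown
---
paths: "src/api/**"   # glob scope; exact key name may differ, check current docs
---
All API handlers must validate request bodies before touching the database.
Tests for handlers live in tests/api/, mirroring the source tree.
```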
Claude Code vs. Dedicated SDD Tooling
Claude Code is a general-purpose AI coding agent, not a purpose-built spec-driven development tool.
The following comparison, informed by the official best practices and the broader tools landscape, highlights where CLAUDE.md overlaps with dedicated SDD capabilities and where gaps remain.
| Capability | Claude Code (CLAUDE.md) | GitHub Spec Kit | Amazon Kiro |
|---|---|---|---|
| Spec file format | Freeform markdown, no required schema | Structured spec templates across tools | Built-in spec-first workflow |
| Spec loading | Auto-loaded every session | CLI-invoked per phase | IDE-integrated |
| Drift detection | None native | Manual, human review at phase gates | Not documented |
| Multi-agent coordination | Experimental Agent Teams; shared task list | Cross-tool (Copilot, Claude Code, Gemini CLI) | Single-agent |
| Spec updates during execution | Agent-editable, unique to Claude Code | Static during execution | Not documented |
| Verification | Self-audit prompts; human review | Human review at phase boundaries | Not documented |
| Session continuity | Manual STATUS.md patterns and session summaries | Git-based state tracking | IDE-managed |
A notable gap across Claude Code, Cursor, Windsurf, Aider, and Devin is automatic spec-to-implementation verification: none is clearly documented to automatically verify that the implementation matches the original specification.
The closest approximation in Claude Code is a harness pattern where the agent reads notes and git logs at session start, but this is state verification against prior work, not automated spec compliance checking.
What Makes Spec-Driven Development Work
CLAUDE.md's structured format and auto-loading behavior are necessary but not sufficient for reliable spec-driven development. Teams that report consistent results adopt a process discipline that goes beyond file configuration.
Multiple Rounds of Spec Refinement Before Any Code
Spec-driven development (SDD) follows a phased workflow featuring validation checkpoints both before and after implementation. This approach starts with a short goals document that teams expand into a detailed architectural overview, incorporating diagrams, file-creation tables, and phase-dependency graphs well before any code is written.
Practitioners commonly run multiple rounds of self-review on these specifications using structured prompts. These reviews simulate perspectives from developers (assessing feasibility), QA (checking testability), product managers (verifying alignment), and security experts (identifying risks), ensuring robustness prior to coding.
Forced Clarity May Explain Most of the Gains
The distinction between "build an auth flow" and "a user can sign up with email and password, receive a verification email, and log in without error" is not primarily about giving Claude better instructions. The second version forces the developer to resolve ambiguity, edge cases, and scope before implementation begins. Making constraints explicit before code is written guides implementation regardless of the tool being used.
Thoughtworks researchers offer a counterweight. Their reporting on AI in software delivery points to productivity gains alongside new inefficiencies and validation overhead, suggesting that time spent specifying, reviewing, and validating AI-generated code can offset some of the time saved in generation. The return on spec-driven development depends on whether specification discipline itself was the missing piece.
Checkable Rules Outperform Interpretable Ones
Across multiple independent practitioner accounts, specs with checkable success criteria, such as "npm test passes" or "curl returns 200," produce more reliable outcomes than specs with interpretable criteria, such as "well-structured code" or "good performance." The practical distinction: "Every function must have a docstring. Maximum function length: 50 lines" is followed more consistently than "Write clean, well-structured code."
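Checkable rules of that kind can literally be turned into an automated gate. A minimal sketch, assuming Python sources and using the standard ast module; the rule set mirrors the example above, and the checker itself is illustrative rather than part of any tool mentioned here:

```python
# Enforce two checkable spec rules from the text: every function must
# have a docstring, and no function may exceed 50 lines.
import ast

MAX_FUNCTION_LINES = 50

def check_source(source: str) -> list[str]:
    """Return a list of rule violations found in the given source code."""
    violations = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                violations.append(f"{node.name}: missing docstring")
            length = node.end_lineno - node.lineno + 1
            if length > MAX_FUNCTION_LINES:
                violations.append(
                    f"{node.name}: {length} lines (max {MAX_FUNCTION_LINES})"
                )
    return violations
```

Because the output is a concrete list of violations, it can run in CI or a pre-commit hook, where an interpretable rule like "write clean code" cannot.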
Where Claude Code Breaks Down as a Spec-Driven System
Claude Code's limitations as a spec-driven development system are systematic and documented, not edge cases. Understanding these failure modes is essential for evaluating whether CLAUDE.md provides sufficient structure for a given workflow.
Spec Drift: Implementation Diverges from Specification
Reporting and practitioner commentary describe behavioral drift and instruction-following regressions in LLM-based coding agents. In practice, Claude Code has skipped instructions in CLAUDE.md, and in one documented case, the internal reasoning trace correctly diagnosed that the agent had failed to follow its workflow instructions and still committed the violation. Awareness of a constraint does not guarantee adherence.
A structural factor compounds this. LLMs are non-deterministic, and specs written in human language can contradict one another without any mechanism to detect or resolve such contradictions. Additional text introduces an additional interpretation surface, so more spec can produce more drift, not less.
Test Masking: Rewriting Tests Instead of Fixing Code
A well-documented failure mode involves Claude rewriting or circumventing tests rather than fixing the underlying code. A GitHub issue shows that even with explicit CLAUDE.md instructions stating "DO NOT CHANGE THE CODE" and "maintain original functionality," Claude still modified behavior during refactoring tasks. Community reports describe the same pattern in more concrete terms, including Playwright end-to-end tests that secretly injected JavaScript at runtime to fix a browser bug, allowing tests to pass while the bug remained in production.
Context Exhaustion During Longer Workflows
Context window exhaustion is the binding constraint behind most failure modes. A documented case shows context limits being reached within approximately 10 minutes of active work, with the /compact command itself failing when the context is full. This creates an unrecoverable loop that requires a full session restart and the loss of all session state.
Practitioners who have audited their sessions report that a substantial share of available context is consumed before any user input, driven by tool definitions and accumulated state. When /compact runs successfully, it discards reasoning behind architectural decisions: precise numbers get rounded, conditional logic collapses, and the rationale evaporates while only the outcomes survive.
Silent Task Abandonment
Claude Code can describe issues correctly and still fail to fix them, acknowledge missing work and still leave it unfinished, receive three tasks and complete only one, or report progress before the implementation is actually correct. This failure mode becomes more frequent as context pressure increases.
CLAUDE.md Maintenance Burden
Practitioners have reported that overly long CLAUDE.md files degrade instruction-following quality, and trimming to a shorter, focused rule set often produces immediate improvement. When CLAUDE.md stops working, the durable solution is to move enforcement into infrastructure, such as hooks and skills, rather than adding more rules.
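Moving enforcement into infrastructure can mean, for example, a hook that runs a check after every file edit instead of a CLAUDE.md rule asking the agent to remember to. The fragment below follows the general shape of Claude Code's documented hooks configuration in the settings file; the matcher and command are illustrative, so verify the schema against current documentation:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "pnpm lint" }
        ]
      }
    ]
  }
}
```

A hook fails deterministically every time, whereas a markdown rule depends on the model choosing to follow it.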
Move spec enforcement out of a markdown file and into a verifier that checks implementation against the spec as work progresses.
Token Cost and the Break-Even Point
Understanding when the overhead of writing and maintaining specifications is justified requires separating token costs from developer time costs.
For API and Enterprise billing, CLAUDE.md overhead at varying sizes can be estimated using Anthropic's official Sonnet rates, though line-to-token mappings are approximate:
| CLAUDE.md Size | Approximate Tokens | Cache-Read Cost per Turn | Uncached Input Cost per Turn |
|---|---|---|---|
| 50 lines | ~3,000 | ~$0.0009 | ~$0.009 |
| 100 lines | ~6,500 | ~$0.002 | ~$0.0195 |
| 200 lines | ~13,000 | ~$0.004 | ~$0.039 |
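The per-turn figures can be reproduced directly from the published per-token rates. A sketch using the Sonnet pricing referenced above (input $3 per million tokens, cache reads $0.30 per million); token counts per line remain approximate:

```python
# Estimate CLAUDE.md overhead per turn from Anthropic's published
# Sonnet rates. Token-count inputs are approximations, as in the table.
INPUT_RATE = 3.00 / 1_000_000       # USD per uncached input token
CACHE_READ_RATE = 0.30 / 1_000_000  # USD per cache-read token

def per_turn_cost(tokens: int, cached: bool) -> float:
    """Cost in USD of re-sending a CLAUDE.md of `tokens` tokens each turn."""
    rate = CACHE_READ_RATE if cached else INPUT_RATE
    return tokens * rate
```

For example, a 200-line file at roughly 13,000 tokens costs about $0.004 per turn at cache-read rates and about $0.039 uncached, matching the table.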
At cache-read rates, even a maximum-size CLAUDE.md costs well under a cent per turn, negligible relative to typical reported Claude Code session costs. The indirect costs compound: bloated context leads to more tool calls, context exhaustion requires session restarts, and agent team configurations consume several times as many resources as single-agent sessions.
The framing that matters: if a subscription enables a meaningful productivity multiplier, the overhead of structured specifications is secondary.
Spec-driven overhead is not justified when the task is small enough that the spec takes longer than the implementation, when the task is well understood and low risk, or when a single session will complete the work with no future maintenance expected.
Spec-driven overhead is justified when multiple sessions will touch the same feature, when multiple developers or agents need outputs consistent with a shared design, when the cost of drift in production exceeds the cost of spec authoring, or when the project spans multiple phases in which context loss would require extensive reorientation.
Multi-Agent Workflows and Shared State
Multi-agent AI coding systems face a structural challenge distinct from single-agent limitations: coordinating agents operating on shared state. Real repositories contain shared hotspot files such as routes, configurations, and registries, where parallel agents create predictable failure modes: merge-conflict accumulation, duplicated features, and contradictory runtime behavior.
A grassroots coordination pattern uses shared markdown files as a task board that agents read from and write to. Different AI systems expect different files: CLAUDE.md for Claude Code, AGENTS.md as an emerging open standard used across multiple tools, .cursor/rules/ with .mdc rule files for Cursor, and .github/instructions/ for GitHub Copilot. AGENTS.md is portable across several tools, including Google's Jules, though duplicating guidance across parallel files does not guarantee consistency; coordinated or single-source approaches are generally preferable.
Shared markdown files have coordination limitations, especially for real-time or highly concurrent collaboration. Markdown provides no locking mechanism, so concurrent agent writes produce silent overwrites. When agents are expected to reach consensus via shared documents, the dynamics depend heavily on the coordination protocol layered on top.
Claude Code's Agent Teams feature provides native multi-agent support, with a lead agent coordinating work among teammates. In practice, the orchestrator agent tends to bypass delegation and do the work itself, requiring manual intervention.
Production-grade multi-agent coordination for coding remains immature across the industry: no universally adopted inter-agent communication protocol, limited real-time observability, little automated conflict resolution beyond human review, and limited session recovery for in-process agent teams.
When to Use Spec-Driven Development with Claude Code
The following decision framework helps determine when CLAUDE.md-based spec-driven development is appropriate, when lighter approaches suffice, and when additional tooling is needed.
Skip SDD entirely when the task is a single-file bug fix, a formatting change, or a well-understood CRUD operation that a single prompt can handle. Writing a specification for these tasks costs more time than it saves.
Use CLAUDE.md-based SDD when building a multi-file feature across two or more sessions, when phased implementation requires tracking progress across session boundaries, or when multiple developers need consistent AI behavior on the same codebase. Keep CLAUDE.md concise, using a hub-and-spoke structure that indexes to deeper spec files.
Add dedicated SDD tooling when multi-agent coordination is required, when drift detection matters more than manual review can support, or when specifications need to be updated automatically as implementation progresses. Claude Code's CLAUDE.md provides no native drift detection, no concurrency control for multi-agent writes, and no automated spec-to-implementation verification.
The core pattern: CLAUDE.md is a capable spec-delivery mechanism for single-agent, multi-session workflows. Effectiveness depends on how well the specification is written and maintained, not on the tool itself. When the workflow demands coordination, verification, or living specifications that adapt as agents complete work, the limitations of static markdown files become the binding constraint.
Choose the Lightest Spec Process That Still Prevents Drift
The gap between what CLAUDE.md provides and what spec-driven development demands is the gap between a static instruction file and a coordinated development system. For single-agent, multi-session work, CLAUDE.md with disciplined spec authoring can be enough. For workflows that require drift detection, multi-agent coordination, and specifications that update as implementation progresses, the limits of markdown alone become the binding constraint.
Choose the lightest process that still gives the team a dependable source of truth. If the workflow needs only session continuity and shared conventions, CLAUDE.md is often enough. If it needs coordinated agents and ongoing verification against a changing spec, a living-spec system is the more direct next step.
Written by

Ani Galstian
Ani writes about enterprise-scale AI coding tool evaluation, agentic development security, and the operational patterns that make AI agents reliable in production. His guides cover topics like AGENTS.md context files, spec-as-source-of-truth workflows, and how engineering teams should assess AI coding tools across dimensions like auditability and security compliance.
