Cursor is a strong execution environment for spec-driven development but a poor SDD system. It can consume specs and generate code from them, but it provides no native spec lifecycle management, no structured spec-to-task traceability, and no enforcement that agents stay aligned to a specification as the codebase evolves.
TL;DR
Cursor's rules system injects instructions into the system prompt without versioning specs, tracing requirements to code, or validating that generated output conforms to a specification. Single-session tasks work fine in Cursor alone. Multi-session features, team workflows, and agent-driven development require either external tooling (Spec Kit, OpenSpec) or a purpose-built SDD environment like Intent, where living specs and coordinated agents replace the structural gap.
What SDD Actually Requires From a Toolchain
Spec-driven development shifts the unit of delivery from the codebase to the specification itself. The reorientation InfoQ describes is direct: the spec becomes the authoritative source, and every downstream artifact traces back to it. Thoughtworks placed SDD on their Technology Radar as an emerging approach to AI-assisted coding workflows.
If you're new to the methodology, the foundational SDD guide covers core concepts. If you're already building AI-powered spec workflows, the underlying patterns are covered in depth there.
Five capabilities determine whether a toolchain can support real SDD. Persistent project memory and contract validation are the highest-impact gaps because their absence accumulates cost across every session. Formal spec format and gated phase transitions matter more for teams than for solo developers.
| SDD Requirement | What It Means in Practice | Cursor Support |
|---|---|---|
| Formal spec format | Machine-readable structure (EARS, OpenAPI, GIVEN/WHEN/THEN) that agents parse deterministically | ❌ None; rules are freeform markdown |
| Version-controlled spec artifacts | Specs in git as first-class artifacts with lifecycle states (Draft → Approved → Implemented) | ⚠️ Partial; rules version in git but have no lifecycle states |
| Gated phase transitions | Agent cannot implement until humans approve the spec and plan | ❌ None; agent acts immediately |
| Contract testing and conformance validation | Automated checks that implementation matches the specification | ❌ None |
| Persistent project memory | Agent loads full spec context at session start without re-explanation | ❌ None; manual @file reference each session |
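As a reference point for the first row, "machine-readable structure" might look like the following — a sketch of one EARS-style requirement paired with a GIVEN/WHEN/THEN acceptance criterion (the feature, ID scheme, and lifecycle label are illustrative, not from any specific tool):

```
## REQ-AUTH-007 (Draft)

EARS: WHEN a session token is older than 24 hours,
the system SHALL reject the request with a 401 response.

Acceptance criterion:
GIVEN a session token issued more than 24 hours ago
WHEN any authenticated endpoint is called with that token
THEN the response status is 401
AND no handler logic executes
```

The point of the structure is deterministic parseability: an agent or verifier can key off `REQ-AUTH-007` and the SHALL clause directly, rather than inferring intent from freeform prose.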
This is the gap Intent was built to close.
Specs in Intent live as the source of truth for parallel agents, with a coordinator decomposing work and a verifier checking results against the spec.
Free tier available · VS Code extension · Takes 2 minutes
What Cursor Actually Gives You for SDD
Cursor offers three primary mechanisms for managing instructions and context, each with a specific SDD limitation.
Rules System fails as constraint enforcement. Cursor's Rules documentation describes Rules as persistent project-level context, stored as .mdc files in .cursor/rules/ with four behaviors: always-applied, glob/file-scoped, intelligently applied, and manual. The SDD problem: rules are injected into the system prompt as advisory text, so the model treats them as suggestions rather than gating constraints. No mechanism exists to block agent action when a rule is violated.
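For concreteness, a typical rule file looks roughly like this — the frontmatter fields follow Cursor's documented `.mdc` format, while the rule content itself is illustrative:

```
---
description: Error-handling conventions for service code
globs: ["src/services/**/*.ts"]
alwaysApply: false
---

- Use Result types, not exceptions, for recoverable errors.
- Every new endpoint must have a contract test before merge.
```

Nothing in this file is enforceable: both bullets read identically to the model, even though one is a hard requirement and the other a style preference, and no signal tells you whether either was honored on a given generation.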
Plan Mode acknowledges drift but does not prevent it. Cursor's Plan Mode documentation tells you that agents sometimes build something that doesn't match what you wanted, and recommends returning to the plan rather than chaining follow-up prompts. Plans save as Markdown. The SDD problem: drift recovery is manual (revert, refine, re-run), and plans have no link to a versioned spec that the agent must satisfy.
@Context References surface specs unreliably. @Files includes specific files with their full contents, subject to context limits. @codebase retrieves chunks by semantic similarity, so a spec document only surfaces if the prompt happens to be semantically close to its content. @Docs indexes public URLs only; private docs are not supported. Specs need deterministic, complete loading rather than similarity-based retrieval.
When I tested these three mechanisms on a multi-file authentication refactor spanning three sessions, the rules stayed active but the agent's interpretation drifted. By session two, it had stopped following the team's "use Result types, not exceptions" rule on new code. The plan from session one was not loaded in session two without manual @-reference, so the agent re-derived an approach that conflicted with completed work. The @codebase reference pulled different fragments of the same auth spec depending on whether I prompted with "validate token" or "check session expiry."
The Pseudo-Spec Problem: Why Cursor Rules Are Not Specifications
The most common mistake in Cursor-based SDD workflows is treating .cursor/rules/ files as specifications. The structural gap is covered above. The more practical question is what happens when a team conflates the two.
Three concrete consequences:
- No audit trail. When a generated feature contradicts a stated requirement, you have no way to determine whether the rule was loaded, ignored, or never written. `.cursor/rules/` compliance is inconsistent: sometimes rules fire, sometimes they don't, with no observable signal either way.
- No review workflow. Rules edits ship like any other code change with no approval state, no "this rule governs feature X" link, and no diff against an authoritative requirement.
- No onboarding artifact. A new engineer reading `.cursor/rules/` cannot tell which rules encode hard requirements, which capture team preferences, and which are historical accidents.
I watched someone build a markdown DSL for UI components that worked around the unreliability by mapping structural layout tokens deterministically to flex-column logic, so the LLM's text-parsing strength carried the load rather than its instruction-following. That pattern works for UI scaffolding because the output space is bounded and visually verifiable. It fails for business logic because acceptance criteria, state transitions, and invariants cannot be encoded as parseable layout tokens. The agent has to reason about behavior, and rules-as-suggestions reasoning produces inconsistent output.
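The DSL itself wasn't published, but the pattern is easy to sketch: map a small, closed vocabulary of layout tokens deterministically to flex CSS, so parsing — not instruction-following — does the work. All token names below are assumptions for illustration:

```python
# Sketch of a deterministic layout-token mapper (token names are hypothetical).
# A closed vocabulary maps 1:1 to output, so the LLM only has to emit tokens
# the renderer can parse reliably -- no instruction-following required.
TOKEN_TO_CSS = {
    "col":    "display:flex; flex-direction:column;",
    "row":    "display:flex; flex-direction:row;",
    "center": "align-items:center; justify-content:center;",
    "gap-sm": "gap:8px;",
    "gap-lg": "gap:24px;",
}

def render(tokens: list[str]) -> str:
    """Translate a token list into a style string; unknown tokens fail loudly."""
    unknown = [t for t in tokens if t not in TOKEN_TO_CSS]
    if unknown:
        raise ValueError(f"unmappable layout tokens: {unknown}")
    return " ".join(TOKEN_TO_CSS[t] for t in tokens)

print(render(["col", "center", "gap-sm"]))
# -> display:flex; flex-direction:column; align-items:center; justify-content:center; gap:8px;
```

The failure mode named above is visible in the code: the technique only works because `TOKEN_TO_CSS` is a closed set. Acceptance criteria and state invariants have no such enumerable vocabulary, so the mapping trick cannot carry business logic.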
Where Cursor SDD Workflows Break in Practice
Failure 1: Spec Memory Does Not Persist Across Sessions
The single largest failure mode in Cursor-based SDD is the absence of persistent spec context. Three symptoms surface from one root cause:
- Context window exhaustion crowds out spec content. When the context fills, Cursor's dynamic context discovery runs a summarization step, and because summarization is lossy, the agent's knowledge of the spec degrades afterward. The practical workaround: pass specs to a new session and continue there, treating session restarts as a routine workflow step.
- Sessions do not carry forward. When custom modes and memory features were removed in v2.1, workflows that depended on persistent, spec-driven modes broke. I consistently hit 15-20 minutes of re-explanation overhead per session just to get the agent back to where the previous session ended.
- Agent drift accumulates silently. Agents take the shortest path to "done," skipping the steps experienced engineers rely on: checking for similar code, following conventions, writing tests. Without a spec that the agent is structurally bound to load on every session, plausible-looking code that violates earlier requirements accumulates.
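A common stopgap for the re-explanation tax is a bootstrap script that assembles spec context deterministically at session start, instead of hoping similarity retrieval surfaces the right chunks. A minimal sketch, assuming specs live as markdown files under a `specs/` directory (the directory layout and header format are assumptions):

```python
from pathlib import Path

def build_session_preamble(spec_dir: str = "specs") -> str:
    """Concatenate every spec file, in stable filename order, into one
    deterministic preamble to paste into a fresh agent session."""
    parts = []
    for path in sorted(Path(spec_dir).glob("*.md")):
        parts.append(f"=== SPEC: {path.name} ===\n{path.read_text()}")
    if not parts:
        raise FileNotFoundError(f"no spec files found in {spec_dir}/")
    return "\n\n".join(parts)
```

Crude, but it replaces "did `@codebase` happen to retrieve the right chunk?" with a guarantee that every spec is loaded, in full, every session.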
Intent's living spec eliminates the root cause. The coordinator loads the spec at session start, agents read and write to it as they work, and updates propagate to all active agents when requirements change.
Failure 2: Rules Fail Silently
Cursor's rule mechanism has documented fragility. A 3.0.16 regression reported on the Cursor forum silently downgraded all alwaysApply: true rules to "requestable," breaking automatic rule injection and forcing you to rely on explicit @ references as a workaround. A version update removed the only mechanism for reliable context injection, with no warning and no change to the rule file itself. With rules treated as advisory context, no observable signal indicates when a rule fires versus when it is silently dropped.
Failure 3: No Spec-to-Code Validation
Validating implementation against the spec remains a manual slog. A SANER registered report on specification-driven code generation argued that AI coding tools lack a structured approach to requirements specification and testing. I hit this wall myself on a medium-sized feature: specs get written, the agent writes the code, but no automated check confirms the implementation satisfies the spec. Cursor includes IDE error checking, test iteration, and agent verification loops, but none of them validate against an authoritative spec. They check type signatures, test assertions, and the agent's own memory of the prompt. Intent's verifier agent closes this loop by running spec acceptance criteria against generated output before handoff.
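The simplest automated check that closes part of this gap is a traceability gate: flag every requirement ID declared in the spec that no test even references. This is a sketch under assumptions (REQ-style IDs in both spec and test sources), not any tool's actual implementation — it cannot prove conformance, but it turns "no automated check exists" into something a CI job can enforce:

```python
import re

REQ_ID = re.compile(r"\bREQ-[A-Z]+-\d+\b")

def untested_requirements(spec_text: str, test_sources: list[str]) -> set[str]:
    """Return requirement IDs declared in the spec that no test mentions."""
    declared = set(REQ_ID.findall(spec_text))
    referenced: set[str] = set()
    for src in test_sources:
        referenced |= set(REQ_ID.findall(src))
    return declared - referenced

# Example (IDs and content are illustrative):
spec = "REQ-AUTH-001: reject expired tokens. REQ-AUTH-002: rotate keys daily."
tests = ["# covers REQ-AUTH-001\ndef test_expired_token(): ..."]
print(untested_requirements(spec, tests))  # -> {'REQ-AUTH-002'}
```

A coverage gate like this checks claims, not behavior — which is precisely the layer (acceptance criteria executed against output) that still has to come from a verifier.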
Failure 4: Team-Scale Coordination Breaks
Team coordination is where the rules-as-spec model breaks most expensively.
- Worktrees prevent file conflicts but not semantic conflicts. Two agents working in isolated git worktrees can produce individually coherent changes that contradict at integration. Agent A updates the user service to emit a new event payload while agent B updates the notification service to consume the old payload, and both PRs pass tests in isolation.
- Rules files do not scale. Rules can grow to over 1,500 lines, mixing coding standards, business invariants, and project history. Newer rules get less weight as context fills, and no team-visible state shows which rules are active for a given task.
- Spec governance moves outside the IDE. I watched an enterprise team run full spec-driven TDD with Cursor where the entire spec layer sat outside Cursor: Speckit connected to Notion for documentation and Jira for task management, with Cursor only as the execution layer. The team was hiring analysts for UML-style diagrams, and developers were close to quitting because the workflow felt like "not programming" anymore.
Cursor has no shared spec state for parallel agents to reference. Each agent loads its own context, applies rules independently, and produces output without knowing what sibling agents are committing in adjacent worktrees. Teams already running multiple agents will find deeper coverage in the multi-agent system guide and the multi-agent coding workspace guide.
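The payload mismatch above is exactly what a shared contract test catches before integration. A minimal sketch, assuming both services' test suites import one schema definition (the event name and fields are illustrative):

```python
# A single shared contract both agents' code must satisfy. If agent A emits
# a field the consumer doesn't know, or agent B expects a field the producer
# dropped, the same check fails in both PRs -- before integration.
USER_UPDATED_V2 = {"user_id": str, "email": str, "updated_at": str}

def validate_event(payload: dict, contract: dict = USER_UPDATED_V2) -> None:
    """Raise if the payload's fields or types diverge from the contract."""
    missing = contract.keys() - payload.keys()
    extra = payload.keys() - contract.keys()
    if missing or extra:
        raise ValueError(f"contract violation: missing={missing} extra={extra}")
    for field, typ in contract.items():
        if not isinstance(payload[field], typ):
            raise ValueError(f"{field}: expected {typ.__name__}")

# Producer- and consumer-side tests both run their fixture payloads through
# validate_event, so a divergence fails each suite in isolation.
validate_event({"user_id": "u1", "email": "a@b.c", "updated_at": "2026-01-01"})
```

The contract file plays the role of a shared spec fragment: it is the one artifact both worktrees must agree with, which is the property Cursor's isolated agent contexts lack.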
Intent removes this organizational tax by consolidating the spec layer, the agent execution layer, and the verification layer into a single workspace. The coordinator routes work to specialist implementor agents, the verifier validates results against the living spec, and every agent reads from and writes to the same source of truth.
How to Fill the Gaps: Three Patterns
Every serious SDD practitioner working with Cursor has built a second layer on top of it or moved to an environment built around the spec. The patterns vary by team size and governance requirements.
Pattern 1: Minimal (Solo Developer, OpenSpec)
OpenSpec adds three commands to Cursor: /openspec-proposal drafts a change proposal with proposal.md, tasks.md, and spec deltas; /openspec-apply implements against the agreed spec; /openspec-archive archives completed specs. Setup requires npm install -g @fission-ai/openspec@latest and openspec init. No API keys required.
When this works: Solo developer, brownfield codebase, features that span 2-3 sessions. Overhead is low enough that spec creation does not bottleneck a single-developer workflow.
When this breaks: No project-wide governance equivalent to /speckit.constitution, no automated drift detection between sessions, no team-shared spec state.
Pattern 2: Team (Spec Kit)
GitHub's open-source Spec Kit includes /speckit.specify for documenting requirements, /speckit.plan for generating an implementation plan, /speckit.tasks for creating a dependency-aware task breakdown, and /speckit.implement for executing against the agreed plan. The /speckit.constitution command encodes non-negotiable project principles. A side-by-side breakdown of governance models is available in the Intent vs. GitHub Spec Kit comparison.
Spec Kit does not ship with built-in drift-reconciliation tooling. A community-proposed /speckit.reconcile command and a third-party spec-kit-reconcile extension update feature specs based on gap reports, but both are unofficial. Birgitta Böckeler observes on martinfowler.com that because all artifacts live directly in the workspace, Spec Kit is the most customizable of the SDD tools examined, which is also why teams end up extending it.
When this works: Teams needing shared governance and medium-to-large features. Some teams use Spec Kit only to generate markdown templates, filling them manually to reduce token consumption while preserving the structured artifact format for review.
When this breaks: A known cursor-agent bug causes slash commands to be ignored or have arguments dropped when using cursor-agent as the CLI agent. Practitioners report 1-3+ hours of overhead per feature once review cycles are included. Static specs that do not update during implementation create drift on longer tasks unless teams pull in community extensions.
Pattern 3: SDD-First (Intent)
Intent takes a different architectural approach: living specs that update bidirectionally as agents complete work, a coordinator agent that decomposes specs into parallel tasks, and a verifier agent that checks results against the spec before handoff.
Intent supports BYOA (bring your own agent), so teams already paying for Claude Code, Codex, or OpenCode can run those agents inside Intent's coordinated workspace.
When this works: Teams running parallel agents across multiple services, features where requirements evolve during implementation, and brownfield codebases where spec-implementation alignment must be enforced. The brownfield SDD guide covers adaptation patterns for existing codebases.
When this breaks: Intent is currently a macOS desktop app, so Windows and Linux teams will need to wait. Solo developers on small features will find the coordinator-implementor-verifier model heavier than they need; OpenSpec is a better fit at that scale. Teams that want a pure IDE rather than a workspace should review the agentic IDE vs. agentic development environment comparison before switching.
Decision Framework: When Cursor Is Enough vs. When It Is Not
Workflow scope determines the right tool. The sharpest distinction I've found: Cursor is optimized for human-driven "vibing," with great autocomplete and human steering, which explains its popularity in pair-programming workflows. Spec-first environments like Intent are optimized around the spec itself, which turns out to be more effective for driving agents. SDD overhead is front-loaded; vibe-coding overhead is distributed across every session and often invisible until integration.
Three diagnostics decide which side of the line your workflow falls on:
- Re-explanation tax. If you re-explain decisions at the start of each session, Cursor's native memory model is already costing you time.
- Cross-session contradiction. If generated code contradicts requirements approved two sessions ago, Cursor's native spec handling is too weak for the job.
- Multi-agent or multi-teammate alignment. If multiple agents or teammates need the same source of truth, rules files no longer suffice.
| Your Situation | Recommendation | Reason |
|---|---|---|
| Single-session, single-file interactive tasks | Cursor alone (Agent mode) | Spec overhead adds friction without payoff; Cursor's autocomplete and inline agent are optimized for this |
| Multi-file features spanning 2-3 sessions | Cursor + OpenSpec | Lightweight spec layer prevents drift between sessions; three commands, no API keys |
| Multi-session features needing team alignment | Cursor + Spec Kit | Constitution file encodes team standards; community drift-reconciliation extensions help close gaps; governance at the cost of 1-3 hour overhead per feature |
| Team workflow with shared product/design/eng artifacts | Intent | Cursor Rules are not traceable requirements; Intent provides team-visible spec state, automated verification, and living specs |
| Parallel agent execution across services | Intent for orchestration; Cursor optionally for editing | Cursor's parallel agents share no spec context; Intent's coordinator maintains alignment across implementors through a shared living spec |
Choose a Spec Layer Before Your Next Multi-Session Feature
The real bottleneck in AI-assisted development is problem decomposition, well ahead of typing speed or agent parallelism. The hard part is embedding enough context in each task that the agent produces correct code rather than plausible-looking output. Cursor solves the execution problem well. The spec problem, the one that determines whether execution produces the right output, requires tooling that Cursor does not yet provide.
For single-session work, Cursor alone is the right choice. For multi-session features, OpenSpec or Spec Kit creates a structured spec layer on top of Cursor. For team-scale workflows where specs need to stay alive as parallel agents work across services, the gap calls for an environment built around the spec.
Intent treats the living spec as the coordination layer across parallel agents. Specs update as agents build, the verifier checks results against the spec, and requirements propagate to all active agents when they change.
Related
- 6 Best Spec-Driven Development Tools for AI Coding in 2026
- 8 Best AI Tools for Spec-Driven Development
- 7 Best AI Agent Observability Tools for Coding Teams in 2026
- DIY Multi-Agent Setups vs. Intent: Build or Buy for Agent Orchestration
- Claude Code vs Intent: Single-Session Agent or Multi-Agent Orchestration?
Written by

Paula Hingel
Paula writes about the patterns that make AI coding agents actually work — spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.