The micro-spec pattern improves AI agent test coverage by decomposing broad features into atomic contracts, each requiring a single implementation path and a single test.
TL;DR
AI agents, given broad feature specs, skip edge cases and conflate requirements, resulting in incomplete test coverage. For high-risk modules like authentication, payments, and compliance logic, micro-specs compound coverage benefits quickly: each spec mandates one test, and each failure traces to one spec. For simple CRUD paths with few branching conditions, the overhead rarely justifies the structure.
Engineering teams adopting AI coding agents hit the same wall: the agent generates code that passes a handful of tests, only to ship logic that breaks in production. Industry benchmarks, including the Cortex 2026 State of AI Benchmark, report that AI-assisted development can increase incident rates and change failure rates by 20–30% when test coverage and spec rigor are not enforced.
The root cause is structural. AI agents excel at narrow, well-defined tasks but struggle to interpret broad requirements. A 200-line feature spec gives an agent too many degrees of freedom; it generates happy-path code and declares completion precisely where the real work begins.
Micro-specs exploit this asymmetry by decomposing every feature into atomic contracts: one behavior, one set of acceptance criteria, one test. Intent supports this workflow as a living-spec workspace coordinating agent execution across dependency graphs.
This guide walks through the micro-spec pattern with a worked authentication example.
What Micro-Specs Are (and What They Replace)
Micro-specs are atomic, single-behavior specifications that AI agents implement and test in isolation, replacing broad feature specs that leave too much room for interpretation. Break work into tasks that can be implemented and tested independently: for example, "validate email format on the registration endpoint" rather than "build authentication."
The formal underpinning is SDD (spec-driven development): specs act as contracts that guide tools and AI agents to generate, test, and validate code. Micro-specs are the atomic units within SDD.
| Dimension | Traditional Specs | Micro-Specs |
|---|---|---|
| Granularity | Module or feature-level; hundreds of lines | Atomic rule-level; 1-3 sentences, single behavior |
| Setup time | Low; write once, hand off to developers | Higher upfront; decomposition adds planning time per feature, though it may reduce rework and regeneration cycles in some narrow domains such as security compliance |
| AI comprehension | Agents miss edge cases or conflate requirements | Agents parse cleanly; each spec produces one test and one implementation unit |
| Test coverage | Often incomplete; edge cases omitted | Structurally higher; each micro-spec mandates a corresponding test, so coverage scales with spec completeness |
| Debuggability | Failures require tracing through large spec sections | Failure maps to one micro-spec; instant root cause |
| Regeneration stability | High drift; AI reinterprets broad specs inconsistently | Low drift; atomic specs produce more consistent output across regenerations |
| Human review overhead | High; reviewers parse large docs | Low; reviewers validate one micro-spec at a time |
The distinction matters because AI agents fail differently from human developers. A human brings contextual judgment to fill gaps; an AI agent interprets loosely, so boundary conditions get skipped. The decomposition investment is worth it when a missed edge case would otherwise surface as a silent production bug.
The Four-Phase Micro-Spec Workflow
The micro-spec workflow turns a broad feature request into a sequence of atomic, verifiable tasks, following the spec-first pattern: Spec First, Decompose, Agent Executes, Tests + Implementation Generated Together.
Phase 1: Write the Main Spec
The main spec defines the feature's purpose, constraints, and success criteria before implementation begins. OpenAI's guide to building AI-native engineering teams emphasizes that "defining high-quality tests is often the first step" in enabling agents to implement features reliably.
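As a sketch of what a main spec might contain before decomposition (the feature details below are illustrative assumptions, not a prescribed format):

```markdown
# Feature: User Authentication

**Purpose:** Allow registered users to obtain a session token via email + password.
**Constraints:** Stateless JWT sessions; rate-limited login; password strength policy.
**Success criteria:** Every acceptance test derived from the decomposed micro-specs
passes; no endpoint ships without a spec-mapped test.
```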
Phase 2: Decompose into Micro-Specs
Micro-spec decomposition turns the main spec into atomic units that can be tested independently and scheduled safely. Each micro-spec must pass four atomicity criteria:
- Independent: can execute in parallel with other micro-specs
- Time-bounded: completable in under 2 hours for rapid feedback
- Clear I/O: has a defined input and output that can be tested
- No shared state: executes without conflicting with other tasks
Each micro-spec includes acceptance criteria (in Given/When/Then format), test requirements, and a position in the dependency graph.
Phase 3: Agents Execute One Micro-Spec at a Time
Agents pick up one micro-spec at a time and implement it in isolation; the difference is observable in the test output. A broad spec produces a single test file containing four to six cases. The same feature decomposed into seven micro-specs produces seven test files with explicitly scoped assertions, because each spec names each edge case.
Phase 4: Each Produces Tests and Implementation
The final phase generates both implementation and tests, closing the loop through test execution. Failures feed back into the next prompt, creating the code-test-fix-repeat loop that produces quality outcomes.
See how Intent's living specs coordinate parallel agents across dependency graphs.
Free tier available · VS Code extension · Takes 2 minutes
Worked Example: User Authentication Decomposed into Micro-Specs
A worked authentication example shows how micro-spec decomposition turns one broad requirement into a dependency graph of isolated behaviors and tests, applying the dependency-aware agent orchestration model formalized in multi-agent research.
Top-Level Spec
Dependency Graph (Wave Structure)
Micro-specs organize into parallel waves based on their dependencies:
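A sketch of how the waves might lay out for this feature (only MS01, MS02, MS03, and MS05 are named in this guide; the remaining micro-specs are elided):

```
Wave 1 (no dependencies, run in parallel): MS01 password validation, MS02 JWT generation, ...
Wave 2 (depends on Wave 1):                MS03 rate limiting, ...
Wave 3 (integration):                      MS05 login endpoint (consumes MS01 + MS02 interfaces)
```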
Micro-Spec MS01: Password Validation Service
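A hedged sketch of what the MS01 micro-spec might contain; the policy values (12-character minimum, error codes) are illustrative assumptions:

```markdown
## MS01: Password Validation Service

**Behavior:** Reject passwords that fail the strength policy before any credential is stored.

**Acceptance criteria (Given/When/Then):**
- Given a registration request, When the password is shorter than 12 characters,
  Then return 400 with body `{error: "PASSWORD_TOO_SHORT"}`.
- Given a registration request, When the password lacks an uppercase letter or digit,
  Then return 400 with the matching error code.

**Test:** MS01_password_validation.test.ts
**Dependencies:** none (Wave 1)
**Out of scope:** hashing, storage, rate limiting (MS03)
```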
Micro-Spec MS02: JWT Token Generation Service
Micro-Spec MS05: Login Endpoint Integration
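MS05 sits in the final wave because it consumes upstream interfaces. A sketch under the same illustrative assumptions (route names and status codes are not from the original spec):

```markdown
## MS05: Login Endpoint Integration

**Behavior:** POST /login validates credentials and returns a signed JWT.

**Acceptance criteria (Given/When/Then):**
- Given a registered user, When valid credentials are posted, Then return 200
  with a JWT produced by the MS02 token service.
- Given a registered user, When the password is wrong, Then return 401.

**Test:** MS05_login_endpoint.test.ts
**Dependencies:** MS01 (password validation), MS02 (JWT generation) — Wave 3
```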
Each micro-spec maps to exactly one test file. MS01 produces MS01_password_validation.test.ts. MS05 produces MS05_login_endpoint.test.ts. When a test fails, the micro-spec ID in the test name provides instant root-cause identification.
Coordinating Parallel Work
Parallel execution requires that concurrently executing agents be independent with respect to inputs, outputs, and dependencies. Intent's living-spec workspace keeps task status and downstream handoffs aligned as each micro-spec moves through its wave.
How AI Agents Behave Differently with Micro-Specs
Micro-specs change agent behavior by reducing interpretation space, increasing test specificity, and localizing failures. The differences below are observable in four recurring dimensions.
- Test generation completeness: A traditional-spec agent generates a handful of tests for the happy path. A micro-spec agent generates one test per spec, with each assertion explicitly scoped.
- Code coherence: Traditional specs lead agents to merge behaviors into a single path. Micro-specs constrain the agent to one behavior at a time, keeping code modular.
- Regeneration stability: Research on constitutional AI and spec-guided development suggests that explicit behavioral constraints reduce security defects in compliance-sensitive contexts, supporting the micro-spec approach of atomic, testable specifications. Living specs support this mechanism by preserving a single source of truth across regeneration cycles.
- Error localization: When a test named `test_MS03_rate_limiting_429` fails, the root cause is MS03's rate-limiting logic. Traditional-spec failures require tracing through large spec sections to isolate which requirement broke.
Practical Techniques for Implementing Micro-Specs
Three implementation techniques make micro-specs operational: a spec template, a prompt template, and a directory convention with CI enforcement.
The Micro-Spec Template
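A minimal template sketch, consistent with the Given/When/Then and "one When clause" rules described in this guide (field names are illustrative, not a fixed standard):

```markdown
## MS<ID>: <Single-Behavior Title>

**Behavior:** One sentence, one behavior.
**Acceptance criteria:** Given <precondition>, When <single trigger>, Then <observable output>.
**Test:** MS<ID>_<slug>.test.ts
**Dependencies:** <upstream micro-spec IDs, or "none">
**Out of scope:** <adjacent behaviors handled by other micro-specs>
```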
Agent Prompt Template
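A sketch of a prompt that keeps the agent inside one micro-spec's boundary; the exact wording is an assumption, but the constraints follow the anti-patterns section (share only the API contract, one test file per spec):

```
Implement exactly the micro-spec below. Do not implement anything outside it.

SPEC: <paste the micro-spec file verbatim>
CONTEXT: <API contract only — no implementation details from other micro-specs>

Requirements:
1. Produce one implementation unit and one test file named MS<ID>_<slug>.test.ts.
2. Cover every Given/When/Then clause with at least one assertion.
3. If the spec is ambiguous, stop and ask; do not guess.
```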
Directory Structure and CI Enforcement
The CI gate script checks that every spec file has a corresponding test file:
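A minimal sketch of such a gate, assuming the MS-prefixed naming convention used throughout this guide (`specs/MS01_*.md` paired with `tests/MS01_*.test.ts`); the function and file names are illustrative:

```typescript
// scripts/check_spec_coverage.ts — hypothetical CI gate sketch

// Pure check: every MSxx_*.md spec must have a test file sharing its MSxx prefix.
export function specsMissingTests(specFiles: string[], testFiles: string[]): string[] {
  // "MS01_password_validation.test.ts" -> "MS01"
  const testedIds = new Set(testFiles.map((f) => f.split("_")[0]));
  return specFiles.filter((f) => !testedIds.has(f.split("_")[0]));
}

// Example: MS03 has a spec but no test file yet.
const missing = specsMissingTests(
  ["MS01_password_validation.md", "MS02_jwt_generation.md", "MS03_rate_limiting.md"],
  ["MS01_password_validation.test.ts", "MS02_jwt_generation.test.ts"]
);
// missing -> ["MS03_rate_limiting.md"]
// In CI, the real script would read the specs/ and tests/ directories,
// log the missing entries, and call process.exit(1) to fail the job.
```

Wired into the workflow without `continue-on-error: true`, a non-empty `missing` list blocks the merge.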
When this gate runs with continue-on-error set to false, no PR merges without a passing test for every micro-spec. Coverage becomes a structural property of the pipeline.
Intent can coordinate spec execution across parallel agents.
Anti-Patterns That Break the Micro-Spec Pattern
Micro-specs fail when teams violate the pattern's constraints. The following anti-patterns recur in micro-spec and spec-driven development.
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| Writing micro-specs after code | Specs become documentation, not drivers; tests remain incomplete | Write specs before any implementation begins; spec approval gates block premature coding |
| Grouping multiple behaviors in one micro-spec | Agent merges requirements; coverage drops | Split until each micro-spec has exactly one When clause in its acceptance criteria |
| Letting the agent write both spec and tests | Circular validation: agent generates tests from the same mental model that produced the bug | Keep spec authoring and test generation in separate agent contexts; share only the API contract. Related spec automation materials describe living specs, isolated workspaces, and spec-based verification as ways to support this separation rather than relying only on prompt discipline. |
| No CI enforcement | Developers skip micro-spec tests under deadline pressure; coverage erodes | Remove continue-on-error: true from test workflow steps; require checks on branch protection |
| Overly abstract micro-specs | "Handle errors properly" gives the agent too much interpretation space; tests become flaky | Define exact error codes and HTTP status codes: "Return 401 with body {error: 'TOKEN_EXPIRED'}" |
| Storing micro-specs in code comments | Comments lack structured format; agents may not parse them as actionable specifications during generation | Store in dedicated .md files in the /specs/ directory; feed explicitly to the agent as input |
Circular validation is the most dangerous anti-pattern: the same agent writes the code and the tests that validate it. The structural fix: separate code-writing and test-writing contexts so agents share only the spec and API contract.
Tradeoffs and Limitations
Micro-specs improve coverage and regeneration stability, but the pattern introduces costs that teams should evaluate before adopting it broadly.
| Tradeoff | Impact | Mitigation |
|---|---|---|
| Spec fragmentation | As features grow, the number of micro-specs can become difficult to track manually | Use dependency graphs (DAG model) and tooling like Intent's coordinator agent to automate wave scheduling and status tracking |
| Over-specifying trivial logic | Writing micro-specs for simple getters or pass-through functions creates busywork without improving coverage on logic that actually fails | Apply micro-specs selectively to high-risk modules (validation, auth, payments, compliance); skip trivial CRUD with no branching logic |
| Human authoring overhead | Decomposing a feature into atomic micro-specs adds upfront planning time proportional to feature complexity | Have AI draft initial micro-specs from user stories, then refine manually; in security compliance contexts, that overhead aligned with reduced rework in research on constitutional AI and spec-guided development |
| Agent misinterpretation of atomicity | Agents sometimes treat a micro-spec as broader than intended, generating code that overlaps with adjacent micro-specs | Enforce the "one When clause" rule from the anti-patterns section; include explicit "Out of scope" constraints in each micro-spec template |
A fifth tradeoff is test brittleness at scale. A refactored function signature propagates failures across every micro-spec test that calls it. The mitigation: scope acceptance criteria to observable behavior. Tests asserting "given X input, return Y output" survive refactoring far better than tests asserting internal implementation details.
When Micro-Specs Outperform Broad Specifications
Micro-specs outperform broad specifications when the failure cost of missed edge cases exceeds the cost of decomposition. Five scenarios consistently favor micro-specs over broader approaches.
- AI-generated CRUD APIs: Micro-specs per endpoint ensure that request validation, response formats, and error cases each get dedicated tests. The decision threshold: if an endpoint has more than three distinct error states, micro-specs outperform a broad spec.
- Complex validation logic: Each validation rule becomes one micro-spec with one test. A payment module decomposes into micro-specs for Luhn checks, expiry date validation, CVV length, and amount bounds. The practical signal: if a bug in a single rule would cause a silent production failure, it warrants a micro-spec.
- Multi-agent workflows: When a multi-agent workflow separates planning, coding, and testing, micro-specs cut coordination overhead because each agent's task boundary is explicit. Intent's coordinator agent delegates tasks and keeps the living spec current as agents complete work.
- Compliance auditing: Each regulatory requirement maps to one micro-spec, one test, and one traceability matrix row. A HIPAA audit logging module decomposes into micro-specs for PHI event capture, timestamp formatting, retention policy, and tamper-detection hashing.
- Flaky CI pipelines: Atomic specs force deterministic edge-case tests. Broad specs let agents generate non-deterministic test structures across regenerations; micro-specs eliminate that interpretation space.
How Living Specs Support Micro-Spec Decomposition at Scale
Living specs support micro-spec decomposition at scale by keeping dependency state up to date as agents complete, fail, or unblock work. Static spec documents cannot track that moving state.
Living spec systems treat specs as dynamic artifacts that update as agents execute. Intent's coordinator agent updates task status and dependency state as implementor agents complete work, unblocking the next wave of parallel execution.
Centralized coordination prevents spec drift by maintaining a single source of truth that every agent reads from and writes to as specifications evolve during execution.
The Augment Code Context Engine supports this workflow by providing each agent with architectural awareness across the full codebase via semantic search, enabling downstream agents like MS05 to discover interfaces from MS01 and MS02 without manual search.
Start with One High-Risk Module
A high-risk module is the right place to start because missed edge cases are most expensive there. A module qualifies if a bug would be silent, irreversible, or expose compliance issues. Authentication, payments, and compliance logging consistently qualify. Measure the coverage delta before expanding to the next module.
Intent's living specs auto-update as work progresses, keeping parallel agents aligned as tasks complete.