
Steve Yegge's 8 Levels of AI Development: Where's Your Team?

Apr 20, 2026
Paula Hingel

The 8 levels of AI-assisted development, as defined by Steve Yegge, map a spectrum from zero AI usage through autocomplete and chat assistants to full agent orchestration, with each level representing a distinct shift in developer trust, tooling, and daily workflow.

TL;DR

Most engineering teams operate at Levels 1-3, where AI shows up as autocomplete, chat, or inline edits. Agentic IDEs push some teams to Levels 4-5. Levels 6-8 require a structural shift from single-agent coding to parallel agent orchestration with spec-driven delegation, along with stronger verification, intent articulation, and coordination skills.

Mapping the Spectrum from Autocomplete to Agent Fleets

Steve Yegge's 8-level framework ranges from no AI use to orchestrating multiple agents at once. In a recent conversation with Gergely Orosz, he put AI coding approaches on a spectrum, and in his essay on coding agents he tracks a progression where trust in the agent gradually increases from zero to the point where it takes over the IDE, spills into the CLI, and then multiplies from there.

The framework works best as a diagnostic. A linear 1-to-8 progression implies every team should keep climbing, but climbing has real costs: verification burden, token spend, and the orchestration skills required to manage parallel agents safely. Teams in small, well-factored codebases often get most of the available ROI at Level 3 and pay a tax if they push further. The framework earns its keep for teams that have plateaued and want to understand what structural change is required to move.

Intent, the workspace built for spec-driven agent orchestration, is designed for teams ready to move beyond the single-agent ceiling that caps Levels 1-5.

Explore how Intent coordinates parallel agents through living specs so teams can progress past the single-agent ceiling.

Build with Intent

Free tier available · VS Code extension · Takes 2 minutes

Yegge's 8 Levels: The Complete Spectrum

The definitions below draw on Yegge's framework as discussed by The Pragmatic Engineer and covered by O'Reilly.

| Level | Name | Trust State | Where the Developer Works |
|-------|------|-------------|---------------------------|
| 1 | No AI | None | IDE, writing all code |
| 2 | IDE Agent, Permissions On | Low | IDE, carefully reviewing |
| 3 | IDE Agent, YOLO Mode | Growing | IDE, less friction |
| 4 | Watching Agent, Not Diffs | Moderate | IDE, conversation-focused |
| 5 | CLI-First, IDE Abandoned | High | Terminal/CLI |
| 6 | Several Agents in Parallel | High | Multiplexing agents |
| 7 | 10+ Agents by Hand | High (frustrated) | Juggling contexts |
| 8 | Custom Orchestrator | Full | Directing agent infrastructure |

The critical architectural break happens at Level 5. The distinction maps directly to Addy Osmani's conductor-to-orchestrator transition: the conductor model gives you one agent working synchronously against your context window, while the orchestrator model gives you multiple agents with their own context windows, working asynchronously while you plan and check in. Crossing that line requires restructuring how work is decomposed, delegated, and verified.
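The conductor-versus-orchestrator distinction can be sketched in a few lines. This is an illustrative Python sketch, not any product's API; the agent functions are stubs standing in for real model calls:

```python
import asyncio

def conductor(tasks, run_agent):
    # Conductor model: one agent, one shared context window, strictly sequential;
    # the developer reviews each result before starting the next task.
    return [run_agent(task) for task in tasks]

async def orchestrator(tasks, run_agent_async):
    # Orchestrator model: each task gets its own agent and context window,
    # running concurrently while the developer plans and checks in.
    return await asyncio.gather(*(run_agent_async(t) for t in tasks))

def fake_agent(task):                 # stub standing in for a real agent call
    return f"done: {task}"

async def fake_agent_async(task):
    await asyncio.sleep(0)            # stands in for real agent latency
    return f"done: {task}"

tasks = ["refactor backend", "implement feature", "write integration tests"]
print(conductor(tasks, fake_agent))
print(asyncio.run(orchestrator(tasks, fake_agent_async)))
```

Both calls produce the same results in the same order; what changes is that the orchestrator version makes progress on every task while the developer's attention is elsewhere.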

Levels 1-3: Autocomplete, Chat, and Inline Edits

Levels 1 through 3 differ on three axes: who initiates the interaction, where AI output lands, and whether the developer has to manually bridge the gap between suggestion and code. Each level has its own stuck point.

Level 1: Autocomplete

The developer types; the tool watches editing context and surfaces "ghost text" suggestions accepted with Tab or dismissed with Esc. GitHub Copilot offers ghost text completions and next-edit predictions; Tabnine and Amazon Q offer similar inline experiences.

Stuck point: Accept-fatigue. Suggestions arrive on every keystroke, and developers start tab-completing reflexively. Quality regressions go unnoticed until code review, because the feedback loop sits inside the typing rhythm.

Level 2: Chat Assistants

The developer writes a prompt in a side panel, receives a response, and manually copies or applies the output. GitHub Copilot's Ask Mode operates at this level within the IDE; ChatGPT and Claude web chat function at Level 2 without IDE integration or persistent project context.

Stuck point: Context loss at the copy-paste boundary. Every translation from chat window back to editor drops information the model had and introduces transcription errors.

Level 3: Inline Edits

The AI writes directly to the file. The developer selects code, issues a natural language instruction, and the AI modifies the code in place. GitHub Copilot's Edit Mode and Tabnine's inline actions both operate here.

Stuck point: Scope. Inline edits work well for single-function changes and poorly for cross-file refactors where the model cannot see the dependent call sites. Level 3 is often the right ceiling for small codebases; larger codebases hit this limit fast and either move to Level 4 or regress to Level 2.

| Capability | Level 1 | Level 2 | Level 3 |
|------------|---------|---------|---------|
| Developer initiates? | No (passive) | Yes (prompt) | Yes (select + instruct) |
| AI writes to file? | No (ghost text) | No (side panel) | Yes (in-file diff) |
| Manual bridging? | Accept/reject only | Copy/paste required | Review diff, accept/reject |
| Typical tools | Copilot, Tabnine, Amazon Q | ChatGPT, Claude chat, Copilot Ask Mode | Copilot Edit Mode, Tabnine Inline Actions |

Surveys from The Pragmatic Engineer place the majority of engineers at Levels 1-2.

Levels 4-5: Agent Mode and Multi-File Changes

The developer stops authoring code character by character and starts directing an agent that can read, write, and run code across multiple files. Trust increases, diff review decreases, and the conversation itself becomes the primary interface.

Level 4: Watching the Agent, Not the Diffs

Developers stop inspecting every diff and start watching what the agent is doing, letting more code through while focusing on the conversation. Attention shifts from asking whether the code is correct toward asking whether the agent is headed in the right direction.

Cursor 3 moved Cursor toward a unified workspace built around agents that can autonomously explore codebases, edit multiple files, run commands, and fix errors. The tradeoff: agent mode degrades on large monorepos where the index cannot fit relevant context, leading to confident edits based on incomplete understanding.

Windsurf Cascade uses Flow Awareness to track developer actions, including edits, commands, and clipboard contents, to infer intent without requiring the developer to restate context. The tradeoff is surveillance surface: teams in regulated industries often disable Flow Awareness features because the same signals that help the agent also expose sensitive data.

GitHub Copilot Agent Mode operates in VS Code as an autonomous peer programmer that responds to compile and lint errors, monitors test output, and auto-corrects in a loop. The tradeoff: the auto-correct loop can burn substantial tokens on wrong-path tasks before a human intervenes, and the cost stays invisible until the bill arrives.

Level 5: CLI-First, IDE Abandoned

The developer has moved out of the IDE as the primary workspace. Yegge's characterization is direct: developers just want the agent and will look at the code in the IDE later.

GitHub Copilot's coding agent exemplifies this level. A developer assigns an issue, Copilot opens a draft pull request, works asynchronously in a GitHub Actions environment, and requests review when complete. CLI tools like Aider operate with git-native atomicity, where every AI edit is automatically committed. The tradeoff with atomic commits is history hygiene: a day of agent work produces dozens of micro-commits that must be squashed before merge.

Some Level 5 workflows extend the loop further into CI/CD. Reports on agents in CI pipelines describe AI agents operating in sandbox environments for pull requests, navigating codebases, running CLI commands, and analyzing syntax trees before supporting human review.

Levels 6-8: Orchestration, Parallel Agents, and Spec-Driven Delegation

Levels 6-8 represent a categorically different mode of development. Andrej Karpathy's Verifiability essay argues that in the new programming paradigm, the tasks most amenable to automation are those where outputs can be verified, which pushes the developer's job toward specifying objectives and checking results rather than writing every line directly.

Level 6: Several Agents in Parallel

Yegge frames the trend around running multiple AI agents in parallel and orchestrating them. Reports from OpenAI's Codex team describe engineers running several agents simultaneously, typically in the single-digit range per developer. The workflow shifts from sequential task completion to what Osmani calls the factory model: spin up many agents in parallel, where one handles a backend refactor, another implements a feature, and another writes integration tests.

Intent approaches this shift through a structured agent model. A Coordinator Agent analyzes the codebase and delegates to Implementor Agents executing tasks in parallel waves, while a Verifier Agent checks results against the living spec. That role separation addresses the coordination failures documented across multi-agent failure taxonomies, where parallel agents break down without clear role boundaries.
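As a rough sketch of that role separation, the loop below splits coordination, implementation, and verification into distinct functions. All names are illustrative, not Intent's actual interfaces:

```python
from concurrent.futures import ThreadPoolExecutor

def coordinate(spec):
    # Coordinator role: decompose the spec into independent tasks (one wave).
    return [{"id": i, "goal": g} for i, g in enumerate(spec["goals"])]

def implement(task):
    # Implementor role: each task runs in its own worker, in parallel.
    return {"id": task["id"], "output": f"patch for {task['goal']}"}

def verify(result, spec):
    # Verifier role: check the result against the spec, not against the prompt
    # the implementor happened to receive.
    return spec["goals"][result["id"]] in result["output"]

spec = {"goals": ["auth refactor", "rate limiter", "integration tests"]}
tasks = coordinate(spec)
with ThreadPoolExecutor() as pool:
    results = list(pool.map(implement, tasks))
print(all(verify(r, spec) for r in results))  # every patch traces back to the spec
```

The design point is that the verifier reads the shared spec rather than each implementor's prompt, which is what closes off silent divergence between agents.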

See how Intent's agent model keeps parallel execution coordinated through structured roles.



Level 7: 10+ Agents, Managed by Hand

Coordination quickly becomes confusing and error-prone at this scale. Manual management produces a consistent set of failure modes:

  • Spec drift across agents. Without a shared living spec, each agent works from the prompt it was given, and the specs diverge silently. Two agents end up implementing incompatible versions of the same interface.
  • Duplicated work. Agents assigned adjacent tasks often reimplement utilities their neighbors already wrote, because no shared index tracks what has been completed.
  • Merge conflict storms. Ten agents writing to overlapping files in the same branch produce conflicts that take longer to resolve than the original work would have taken to write by hand.
  • Review collapse. Human reviewers cannot keep up with ten parallel PR streams. Review becomes rubber-stamping, and defects that would have been caught at Level 5 ship at Level 7.

Microsoft's internal Project Societas produced 110,000+ lines of code that were 98% AI-generated, with human work shifting from authoring to directing. That scale is unreachable without coordination infrastructure.

Level 8: Build Your Own Orchestrator

Yegge describes this as the point where developers build their own orchestrator. His Gas Town project, a Go-based orchestrator for Claude Code that can manage 20-30 agents in parallel, adds three capabilities on top of raw agent calls: a shared task queue that prevents duplicated work, a coordinator process that assigns tasks based on agent availability, and checkpointing so that a crashed agent can be resumed without losing state. Those are the minimum primitives any Level 8 system needs.
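Those three primitives fit in a short sketch: a shared queue so a task is claimed exactly once, assignment to whichever agent is free, and per-task checkpoints so a restarted agent resumes where it stopped. The names here are illustrative, not Gas Town's actual interfaces:

```python
import queue

task_queue = queue.Queue()   # shared queue: a task is claimed once, never duplicated
checkpoints = {}             # task id -> last completed step; survives an agent crash

for tid in ["t1", "t2", "t3"]:
    task_queue.put(tid)

def run_agent(agent_id, steps=3):
    completed = []
    while True:
        try:
            tid = task_queue.get_nowait()  # work goes to whichever agent is free
        except queue.Empty:
            return completed
        start = checkpoints.get(tid, 0)    # a restarted agent resumes from here
        for step in range(start, steps):
            checkpoints[tid] = step + 1    # persist progress after every step
        completed.append(tid)

done = run_agent("agent-a")
print(done)          # this single-agent demo claims every queued task in order
print(checkpoints)
```

In a real orchestrator the queue and checkpoints live in durable storage and multiple agent processes pull from them concurrently; the control flow is the same.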

Intent provides those primitives as a product and adds resumable workspace sessions so that a team can pause a multi-agent project and pick it back up the next day without re-seeding context. Living specs sit at the center as the single source of truth, auto-updating as agents complete work, which removes the spec drift that derails Level 7.

| Capability | Level 6 | Level 7 | Level 8 |
|------------|---------|---------|---------|
| Agent count | 2-5 parallel | 10+ | Fleet-scale |
| Coordination | Ad-hoc multiplexing | Manual (error-prone) | Systematic orchestration |
| Spec management | Informal | Fragmented across agents | Living specs, single source of truth |
| Verification | Per-agent review | Overwhelmed | Automated verification pipeline |
| Tools | Claude Code swarms, Codex parallel | Manual terminal management | Intent, Factory.ai, custom orchestrators |

Osmani's six-step production line defines the orchestration-level workflow: Plan, Spawn, Monitor, Verify, Integrate, and Retro. Verification has become the bottleneck, taking over from generation, and that is the gap Intent's Verifier Agent is designed to close.

Self-Assessment: Where Is Your Team Today?

Score each statement from 0 (never) to 2 (consistently). The total is a directional indicator: a team scoring 12 is somewhere around Level 5, not precisely at it.

| # | Statement | Score (0-2) |
|---|-----------|-------------|
| 1 | Team members use AI autocomplete or chat daily | |
| 2 | Developers accept AI suggestions without reviewing every line | |
| 3 | AI edits files directly; developers review diffs rather than writing code | |
| 4 | Developers describe goals to agents rather than specifying implementation steps | |
| 5 | At least some developers work primarily in terminal/CLI with agents rather than IDEs | |
| 6 | Developers run 2+ agents simultaneously on different tasks | |
| 7 | The team has built specs, AGENTS.md files, or orchestration tooling to coordinate agents | |
| 8 | Verification infrastructure, including automated tests and trust constraints, governs what agents can commit | |
| 9 | Parallel agent work merges cleanly without frequent conflicts or rework | |
| 10 | The team measures success by decision velocity and system reliability rather than lines of code | |

Score interpretation:

  • 0-4: Levels 1-2. Focus on increasing trust through inline edits and edit mode workflows.
  • 5-8: Levels 3-4. Experiment with CLI-first agentic tools.
  • 9-13: Level 5. Ready for parallel agent workflows.
  • 14-17: Levels 6-7. Invest in spec-driven orchestration and verification infrastructure.
  • 18-20: Level 8. Focus on governance, observability, and scaling.
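The banding above is simple enough to express as a lookup, which is handy if you score several teams at once. A minimal sketch:

```python
def interpret(score):
    # Map a 0-20 self-assessment total to the level bands above.
    if not 0 <= score <= 20:
        raise ValueError("score must be between 0 and 20")
    bands = [(4, "Levels 1-2"), (8, "Levels 3-4"), (13, "Level 5"),
             (17, "Levels 6-7"), (20, "Level 8")]
    return next(label for upper, label in bands if score <= upper)

print(interpret(12))  # -> Level 5
```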

The Skill Shift at Each Level: From Typing to Reviewing to Orchestrating

The verification burden grows at each level, and the skills needed to handle it change with it:

| Dimension | Levels 1-3 | Levels 4-5 | Levels 6-8 |
|-----------|------------|------------|------------|
| Primary activity | Writing code, reviewing suggestions | Reviewing agent output, approving commands | Decomposing tasks, designing verification systems |
| Core skill | Syntax mastery, prompt engineering | Verification judgment, task framing | Intent articulation, orchestration design |
| Code review task | Symmetric peer review | Asymmetric AI output evaluation | Trust constraint system design |
| Performance metric | Lines of code, PR volume | PR quality, rework rate | Decision velocity, system reliability |
| Time allocation | Majority in construction | 50%+ in evaluation | Primarily async oversight and exception handling |

BCG's analysis of AI workforce impact reinforces the direction: the writing and maintenance of code will be deprioritized, while higher-order systems thinking and proficiency with AI tools grow in importance. Intent's BYOA model supports that shift in practice, letting teams route a planning task to Claude Opus, an implementation task to Codex, and a verification task to a cheaper Haiku-class model inside a single workspace.

How to Progress from Level 5 to Level 6+

ThoughtWorks places "team of coding agents" at Assess stage on its Technology Radar, worth exploring but not yet broadly recommended for production. Gartner predicts 40% of agentic AI projects will be canceled by the end of 2027. The transition requires deliberate preparation.

Phase 1 (Months 1-3): Spec-First Foundation

ThoughtWorks describes spec-driven development as an emerging workflow for AI-assisted and multi-step agentic coding. Spec-writing is the gating skill: decomposing projects into precisely specified, independently verifiable subtasks.


A working spec for agent execution typically includes:

  1. A goal statement in one sentence describing the user-visible outcome
  2. A scope boundary listing which files, services, or modules are in and out of scope
  3. Interface contracts for any function signatures or API shapes the agent must match
  4. Acceptance tests the agent should be able to run locally before declaring the task complete
  5. A rollback plan describing how to revert the change if verification fails

A bad spec, by contrast, is a paragraph of prose with no scope boundary and no tests. Agents will accept it, produce plausible code, and fail silently.
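As a sketch, the five elements above can be captured in a small data structure; the class and field names here are hypothetical, not Intent's schema:

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    goal: str                       # one sentence, user-visible outcome
    in_scope: list[str]             # files/modules the agent may touch
    out_of_scope: list[str]         # explicitly off-limits
    interface_contracts: list[str]  # signatures or API shapes to match
    acceptance_tests: list[str]     # commands the agent runs before declaring done
    rollback: str = "git revert"    # how to undo if verification fails

    def is_executable(self) -> bool:
        # Reject the "bad spec" case: no scope boundary or no tests.
        return bool(self.in_scope) and bool(self.acceptance_tests)

spec = TaskSpec(goal="Add rate limiting to /login",
                in_scope=["src/auth/"], out_of_scope=["src/billing/"],
                interface_contracts=["limit(user_id: str) -> bool"],
                acceptance_tests=["pytest tests/test_rate_limit.py"])
print(spec.is_executable())  # True
```

A gate like `is_executable` makes the prose-only spec fail loudly at assignment time instead of silently at review time.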

Intent's living specs provide this foundation as a product capability, auto-updating as agents complete work so spec rot never sets in. The Coordinator Agent drafts the spec and generates tasks; developers can stop it at any point to manually edit the spec before agents proceed.

Phase 2 (Months 3-6): Controlled Parallelism

Begin running 2-3 agents in parallel on isolated, well-scoped tasks. A good candidate task meets four tests: it touches a bounded set of files, it has existing test coverage, it does not require coordination with other in-flight work, and it can be reverted with a single git command. Backend refactors, test generation, and documentation updates usually pass all four. Cross-cutting changes like auth refactors or database migrations almost never do.
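Those four tests are worth encoding as an explicit gate before assigning a task to a parallel agent. This is an illustrative sketch; the file-count threshold is a placeholder choice, not a fixed rule:

```python
def good_parallel_candidate(task):
    # The four tests above, applied before a task goes to a parallel agent.
    return (task["files_touched"] <= 10        # bounded file set (threshold illustrative)
            and task["has_test_coverage"]      # existing tests to verify against
            and not task["needs_coordination"] # independent of other in-flight work
            and task["single_command_revert"]) # revertible with one git command

doc_update = {"files_touched": 3, "has_test_coverage": True,
              "needs_coordination": False, "single_command_revert": True}
auth_refactor = {"files_touched": 40, "has_test_coverage": False,
                 "needs_coordination": True, "single_command_revert": False}

print(good_parallel_candidate(doc_update))     # True
print(good_parallel_candidate(auth_refactor))  # False
```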

Establish cost monitoring and token budget governance before authorizing parallel agents. O'Reilly's coverage of agentic coding frames the conductor-to-orchestrator shift as a progression: less experienced developers build confidence driving a single agent before taking on parallel coordination, while senior engineers lead the early parallel operations. The pull-back signal is consistent: when more than 20% of parallel agent output requires manual rework, the tasks are poorly scoped and the team should return to sequential execution.

Intent's isolated git worktrees give each agent its own workspace, preventing the merge conflicts and cross-contamination that plague ad hoc parallel agent setups.

Phase 3 (Months 6+): Orchestration Architecture

Anthropic's engineering team has documented the pattern at scale: a lead agent coordinates the process while delegating to specialized subagents that operate in parallel. Teams at this phase need at least three observability metrics in place before scaling agent count further:

  • Spec adherence rate: percentage of agent-produced PRs that match their originating spec without manual correction
  • Verification latency: time from agent task completion to human or automated sign-off
  • Cost per merged PR: total token spend divided by PRs that actually reach main
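The three metrics above reduce to straightforward arithmetic over PR records. A minimal sketch, assuming hypothetical record keys (timestamps in minutes, spend in tokens):

```python
def metrics(prs):
    # prs: records for agent-produced PRs; keys and units are illustrative.
    merged = [p for p in prs if p["merged"]]
    return {
        "spec_adherence_rate": sum(p["matched_spec"] for p in prs) / len(prs),
        "verification_latency_avg": sum(p["signoff_ts"] - p["done_ts"] for p in prs) / len(prs),
        "cost_per_merged_pr": sum(p["tokens"] for p in prs) / len(merged),
    }

prs = [
    {"merged": True,  "matched_spec": True,  "done_ts": 0, "signoff_ts": 30, "tokens": 1000},
    {"merged": True,  "matched_spec": False, "done_ts": 0, "signoff_ts": 90, "tokens": 3000},
    {"merged": False, "matched_spec": True,  "done_ts": 0, "signoff_ts": 60, "tokens": 2000},
]
m = metrics(prs)
print(round(m["spec_adherence_rate"], 2),     # 0.67: 2 of 3 PRs matched their spec
      m["verification_latency_avg"],          # 60.0 minutes from done to sign-off
      m["cost_per_merged_pr"])                # 3000.0: all token spend / merged PRs
```

Note that cost per merged PR divides total spend, including abandoned work, by merged PRs only, so it rises when agents burn tokens on branches that never land.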

Teams should also establish pre-merge verification review processes to address the architectural drift that surfaces as AI adoption scales.

Pitfalls Worth Naming Separately

Two failure modes cut across all three phases. The first is context standardization fragmentation: teams using multiple agent tools maintain parallel context files per tool (CLAUDE.md, .cursorrules, AGENTS.md) without a current interoperability standard, and those files drift apart over time. The second is cost explosion: token usage scales dramatically with parallel agents, and budget governance must precede parallel scaling, never follow it.

Map Your Team's Next Level, Then Build the Prerequisites

The gap between where most teams operate, Levels 1-3, and where the industry is heading, Levels 6-8, is primarily a coordination problem. Tooling is the easier half. Stack Overflow's AI sentiment data shows positive sentiment toward AI tools has declined to 60%, down from above 70% in prior years, which fits a pattern where verification burden grows faster than the skills needed to manage it.

Teams scoring 9-13 on the self-assessment face the most consequential decision: continue optimizing single-agent workflows or begin the structural transition to orchestrated, parallel development. Progression depends on the ability to decompose work into precisely specified, independently verifiable tasks that agents can execute in parallel, which is what a spec-driven workspace is built to support.

With Intent, teams can reach Level 6+ without building orchestration infrastructure from scratch.



Written by

Paula Hingel


Paula writes about the patterns that make AI coding agents actually work — spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.
