The best AI model for coding agents in 2026 depends on the agent's role: Opus 4.6 for coordination, Sonnet 4.6 for implementation, Haiku 4.5 for file navigation, and GPT-5.2 for code review, because each role has distinct reasoning, speed, and cost requirements that no single model satisfies.
TL;DR
The best AI model for coding agents in 2026 varies by role: Opus 4.6 for coordination, Sonnet 4.6 for implementation, Haiku 4.5 for file navigation, and GPT-5.2 for code review. Running one model across all roles overspends on simple tasks and underperforms on complex ones. Anthropic's multi-agent research system outperformed single-agent Opus by 90.2% on research retrieval tasks, validating role-based routing in principle across domains.
Why One Model Does Not Fit All Agent Roles
A single model assigned to every agent role in a multi-agent coding system creates two simultaneous failure modes: over-provisioning on simple tasks wastes compute budget and adds latency, while under-provisioning on complex tasks causes quality degradation that compounds at every step of the agent pipeline. The cost is measurable. Routing Opus 4.6 to file navigation tasks increases input token costs 5x versus Haiku 4.5, based on Anthropic's published pricing ($5/M vs. $1/M input tokens). Routing Haiku to architectural planning produces malformed subtask specs that no downstream agent can correct.
The MasRouter paper formalizes why fixed single-model assignment fails: model routing depends on "the task's domain and difficulty, as well as their corresponding role." The same research refutes the assumption that the largest model always wins: "larger LLMs have been shown not to always outperform their smaller counterparts."
Four concrete mismatch problems drive this failure.
Planning receives the wrong model, and errors cascade. A coordinator agent decomposes ambiguous requirements into executable subtasks, reasons about inter-agent dependencies, and manages parallel workstreams. When a weaker model handles planning, every downstream agent operates on malformed inputs with no upstream correction mechanism. Anthropic positions Opus 4.6 for tasks that demand the deepest reasoning, such as codebase refactoring and coordinating agents in a workflow.
File navigation runs on frontier models at 5x cost. Directory listing, symbol resolution, and import path tracing are structurally simple retrieval operations requiring pattern matching, not deep reasoning. Anthropic's own model guide positions Haiku for high-volume intelligent processing, cost-sensitive deployments, and sub-agent tasks. The pricing gap between Opus 4.6 ($5.00/MTok input) and Haiku 4.5 ($1.00/MTok input) means routing hundreds of file reads through the frontier model exhausts the compute budget before reaching tasks that need it.
Code generation gets a fixed context window regardless of task size. CrewAI docs map task-to-context-window requirements explicitly: small tasks (up to 4K tokens) fit standard models; medium tasks (4K-32K) need enhanced models; large tasks (over 32K) need large context models. A fixed model applies a fixed context window uniformly, truncating instructions on large files or wasting resources on 500-token edits.
Code review forces a choice between bottleneck and missed bugs. A frontier model applied exclusively to code review creates a serialized bottleneck. A low-tier model misses vulnerabilities. Sonnet 4.6 improved coding performance and consistency, but the right model for review depends on whether it is synchronous or async.
| Agent Role | Over-Provisioning Cost | Under-Provisioning Cost |
|---|---|---|
| Planning/Orchestration | Unnecessary spend on largest model | Poor decomposition cascades errors downstream |
| File Navigation/Search | 5x cost inflation on high-frequency retrieval | Minimal: more capable models add no quality benefit on retrieval |
| Code Generation | Wasted context on small files | Truncation discards instructions on large files |
| Code Review | Serialized bottleneck limits parallel reviews | Missed bugs, security vulnerabilities |
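The context-window tiers cited above can be expressed as a simple routing guard. This is an illustrative sketch using the thresholds from the CrewAI docs (4K / 32K); the tier names and the function itself are not CrewAI API.

```python
# Sketch: map an estimated task token count to a model tier, using the
# CrewAI-style thresholds cited above (4K / 32K). Tier names are
# illustrative labels, not part of any framework's API.

def context_tier(estimated_tokens: int) -> str:
    """Pick a model tier from a rough token estimate for the task."""
    if estimated_tokens <= 4_000:
        return "standard"        # small tasks: fit any model
    if estimated_tokens <= 32_000:
        return "enhanced"        # medium tasks: mid-tier context window
    return "large-context"       # large tasks: 1M-class context models

print(context_tier(500))      # a 500-token edit needs no large window
print(context_tier(80_000))   # a large multi-file task does
```

A guard like this prevents both failure modes in the table: small edits never pay for a large-context model, and large files never hit a window that truncates their instructions.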
Augment Code reports a 40% reduction in hallucinations through the Context Engine's intelligent model routing, which becomes critical when multiple agents operate on the same codebase simultaneously. Intent's model picker makes differentiated assignment practical by letting teams assign the right model to each specialist agent within a coordinated workspace.
See how Intent's model picker assigns the right model to each agent role.
Free tier available · VS Code extension · Takes 2 minutes
Routing the Best Model to Each Coding Agent Role
Model routing directs easy, high-frequency requests to smaller models and harder, reasoning-intensive requests to capable models, matching cost to task complexity at each tier. Three implementation approaches exist in production systems, each suited to different team contexts.
Static routing uses predefined rules to distribute tasks, often without examining the content of each request. The Claude sub-agents API provides the most direct production example: the model field in subagent definitions accepts a model alias (sonnet, opus, haiku), a full model ID (claude-opus-4-6), or inherit to mirror the parent session. Static routing works best for teams with predictable workloads where task types map cleanly to agent roles, such as a fixed Coordinator + Implementor + Reviewer pipeline. It breaks down when task complexity varies significantly within a single role, forcing the team to either over-provision or accept quality drops on harder tasks.
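The static pattern reduces to a fixed role-to-model table resolved before any request is made. The sketch below mirrors the Coordinator + Implementor + Reviewer pipeline described above in plain Python; the role names and dict are illustrative, not the Claude sub-agents API itself (which expresses the same binding through the model field in each subagent definition).

```python
# Static routing: each agent role is bound to a model alias up front.
# Aliases follow the Claude alias convention (opus/sonnet/haiku); the
# role names and this dispatch table are illustrative assumptions.

ROLE_MODELS = {
    "coordinator": "opus",    # deepest reasoning for planning
    "implementor": "sonnet",  # balanced cost/quality for code generation
    "navigator":   "haiku",   # cheap, fast file navigation and search
    "reviewer":    "sonnet",  # or an external review model
}

def model_for(role: str) -> str:
    """Resolve a role to its statically assigned model alias."""
    try:
        return ROLE_MODELS[role]
    except KeyError:
        raise ValueError(f"no model assigned for role {role!r}")

print(model_for("navigator"))  # haiku
```

Note that the lookup never inspects the request content: that is exactly the property that makes static routing cheap (no classification latency) and exactly what breaks when task complexity varies within a role.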
Dynamic routing selects models at runtime based on task complexity. RouteLLM is an open-source complexity-based router that configures a strong_model and weak_model, then evaluates each prompt's complexity against a threshold to route automatically. Dynamic routing adds a classification step before every request, introducing latency (typically 50-200ms per routing decision) that compounds across hundreds of agent calls per session. For teams running fewer than 500 agent calls per day, the engineering overhead of maintaining a routing classifier often exceeds the cost savings it produces.
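The dynamic pattern adds a pre-request classification step. RouteLLM uses a trained router for that step; the sketch below substitutes a deliberately crude keyword-and-length heuristic to show the strong/weak threshold mechanic only, and is not RouteLLM's actual API or classifier.

```python
# Dynamic routing sketch: score each prompt's complexity and compare it
# to a threshold, routing to a strong or weak model. The heuristic below
# is a toy stand-in for a trained classifier like RouteLLM's.

STRONG_MODEL = "claude-opus-4-6"    # illustrative model IDs
WEAK_MODEL = "claude-haiku-4-5"

COMPLEX_HINTS = ("refactor", "architecture", "debug", "race condition")

def complexity_score(prompt: str) -> float:
    """Toy classifier: fraction of complexity hints present, plus length."""
    hits = sum(h in prompt.lower() for h in COMPLEX_HINTS)
    return min(1.0, hits / len(COMPLEX_HINTS) + len(prompt) / 10_000)

def route(prompt: str, threshold: float = 0.25) -> str:
    """This scoring call is the 50-200ms of added latency per request."""
    return STRONG_MODEL if complexity_score(prompt) >= threshold else WEAK_MODEL

print(route("List the files in src/"))                             # weak
print(route("Refactor the architecture to fix a race condition"))  # strong
```

The threshold is the tuning knob: raising it sends more traffic to the weak model and saves cost at the risk of under-provisioning borderline tasks.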
Hybrid routing combines a static planner with dynamic execution model selection. OpenAI's reasoning guide describes the pattern: a reasoning model serves as the planner, producing a detailed multistep solution and then selecting the right GPT model for each step based on whether high intelligence or low latency matters most. Hybrid routing requires a frontier model for the planning step, meaning it only saves cost on execution, not orchestration.
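In the hybrid pattern, the planner's output carries a per-step model requirement that drives execution-time selection. The sketch below hardcodes the planner's output to show the shape of the pattern; in a real system, plan() would be a call to a reasoning model, and the model names here are illustrative placeholders.

```python
# Hybrid routing sketch: a frontier "planner" decomposes the job and tags
# each step with the capability it needs, per OpenAI's planner pattern.
# plan() is a hardcoded stand-in for a real planner-model call.

from dataclasses import dataclass

@dataclass
class Step:
    description: str
    needs_reasoning: bool  # the planner's per-step judgment

def plan(task: str) -> list[Step]:
    """Stand-in: a real system would get this decomposition from the API."""
    return [
        Step("locate affected modules", needs_reasoning=False),
        Step("design the migration", needs_reasoning=True),
        Step("apply mechanical renames", needs_reasoning=False),
    ]

def execution_model(step: Step) -> str:
    return "frontier-model" if step.needs_reasoning else "fast-model"

for step in plan("migrate auth module"):
    print(step.description, "->", execution_model(step))
```

This makes the cost profile explicit: the plan() call always pays frontier prices, so hybrid routing only saves on the execution steps that follow.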
Choosing a Routing Approach
| Factor | Static | Dynamic | Hybrid |
|---|---|---|---|
| Best for | Fixed pipelines with clear role boundaries | High-volume systems with variable task complexity | Complex workflows where planning requires frontier models |
| Setup complexity | Low: assign model per agent | Medium: train/tune routing classifier | Medium: configure planner + execution pool |
| Latency overhead | None | 50-200ms per routing decision | Planning step only |
| Breaks when | Task complexity varies within a role | Call volume is too low to justify classifier maintenance | Planning model cost dominates the budget |
| Team size fit | Any | 5+ engineers with ML ops capacity | 3+ engineers comfortable with multi-model orchestration |
| Framework | Routing Type | Model Assignment Mechanism |
|---|---|---|
| Claude Code | Static per-subagent | model field in subagent definition (alias or full ID) |
| CrewAI | Static per-agent | LLM instance passed to each Agent object |
| LangGraph | Dynamic via routing node | Structured LLM output classifies; downstream nodes use different models |
| OpenAI Agents SDK | Hybrid (static planner, dynamic doer) | Reasoning model selects execution model per step |
| RouteLLM | Preference-based with cost-threshold routing | Trained router predicts strong-model win probability and compares it to a cost threshold α for routing |
Intent's multi-agent architecture implements static routing with manual model selection per workspace, consistent with Anthropic's orchestrator-worker and evaluator-style multi-agent patterns. This makes Intent best suited for teams with well-defined agent pipelines (Coordinator → Implementor → Reviewer) where each specialist handles a consistent task type. Teams select models for each specialist agent through Intent's model picker, then the Coordinator delegates tasks to those agents with the appropriate model already assigned.
Coordinator: Opus 4.6 for Planning and Architecture Decisions
Claude Opus 4.6 is Anthropic's current flagship model, positioned for coordination: the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring and coordinating agents in a workflow.
Three capabilities make Opus 4.6 suited for the coordinator role.
Adaptive thinking lets the model decide when to apply extended thinking, with developer-configurable effort controls that trade off intelligence, speed, and cost. On SWE-bench, third-party sources report strong performance for Opus 4.5 relative to Sonnet 4.5, though official benchmark documentation has not verified the detailed effort-level token-efficiency and cost-scaling claims.
1M token context window provides the retrieval depth coordinators need for large codebase work and is now generally available for Claude Opus 4.6 at standard pricing. Opus 4.6 is the strongest verified 1M-context retrieval option in this analysis for coordinator-scale work.
Native agent teams in Claude Code (research preview) enable spinning up multiple agents working in parallel with autonomous coordination, a capability built directly into the model's operational framework.
Benchmark Evidence for Coordination Tasks
The MCP Atlas benchmark measures tool use and multi-tool orchestration across MCP servers, making it a relevant benchmark for MCP-based coordination tasks. Scores vary significantly by effort level and evaluation configuration:
| Model | MCP Atlas (max effort) |
|---|---|
| Claude Opus 4.5 | 62.3% |
| Claude Opus 4.6 | 59.5% |
| Claude Sonnet 4.5 | 43.8% |
Source: Anthropic's Opus 4.6 system card (Opus 4.6 max-effort score), Anthropic's Opus 4.5 system card (Opus 4.5 and Sonnet 4.5 scores). Anthropic's Opus 4.6 announcement page reported 62.7% at a "high effort" configuration; the system card, published one day later, reports 59.5% at max effort, which is slightly below Opus 4.5. The Scale Labs leaderboard shows higher scores (71.8-75.8%) at yet another configuration. The figures above use the system card as the more rigorous source.
The key takeaway for routing decisions is the Opus-to-Sonnet gap: both Opus versions score 15-19 points above Sonnet 4.5 on MCP Atlas, reflecting the reasoning depth that coordination tasks demand. Downgrading the orchestrator from Opus to Sonnet to save cost produces measurably worse coordination outcomes regardless of which Opus evaluation is used.
On SWE-bench Verified (500 validated real GitHub issues), Opus 4.6 achieves 80.84% averaged across 25 trials, the strongest result for any coordinator-class model in this analysis. That SWE-bench lead, combined with the 15-19 point MCP Atlas advantage over Sonnet, makes the case for Opus as coordinator on both code quality and tool orchestration grounds.
Named enterprise partners validate the coordinator role specifically:
- Anthropic: "Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision."
- Sola: "Claude Opus 4.6 is the best orchestration model we've used for complex multi-agent work. It tracks how sub-agents are doing, proactively steers them, and terminates when needed. Active management is new."
Pricing: $5.00/$25.00 per MTok input/output per Anthropic pricing.
When Sonnet Is Good Enough for Coordination
Opus 4.6 is the correct coordinator model for complex pipelines, but Sonnet 4.6 can serve as coordinator in three specific scenarios. Small agent pipelines (2-3 specialists) with well-scoped tasks rarely need Opus-level decomposition. Tight iteration loops where Opus's extended thinking adds unacceptable latency benefit from Sonnet's faster response times. Budget-constrained prototyping where the team is testing pipeline structure before committing to production-grade assignments can defer the Opus cost until the architecture stabilizes. The Opus-to-Sonnet MCP Atlas gap matters most for multi-step tool orchestration across unfamiliar codebases. For pipelines with clear, pre-decomposed tasks where the coordinator primarily dispatches rather than reasons, that gap narrows.
In Intent's pipeline, the Coordinator agent is where Opus 4.6 earns its cost: it analyzes the codebase, drafts the living spec, generates tasks, and delegates to specialist agents. A weaker model at this stage produces malformed task decompositions that no downstream specialist can recover from.
Implementors: Sonnet 4.6 for Fast, Reliable Code Generation
Claude Sonnet 4.6, released February 17, 2026, is Anthropic's current best model for coding and complex agents. It scores 79.6% on SWE-bench Verified (up from 77.2% on Sonnet 4.5) at the same $3/$15 per MTok pricing, and developers in Claude Code testing preferred it over Sonnet 4.5 roughly 70% of the time. For the implementor role in multi-agent pipelines, Sonnet 4.6 is now the correct default.
Tool Call Efficiency: What Sonnet 4.5 Telemetry Showed
The most detailed tool-call telemetry available is from Sonnet 4.5, since Sonnet 4.6 has not yet accumulated comparable production data. Internal evaluation of one-shot instruction-to-PR task completion found that switching from Sonnet 4.0 to Sonnet 4.5 produced 34% fewer tool calls on average and approximately 26% faster overall task completion time while maintaining the same accuracy.
This measurement reflects two simultaneous changes: the model switch from Sonnet 4.0 to Sonnet 4.5, and the introduction of parallel tool-call architecture. The isolated model-only effect, measured from separate production telemetry across several billion tokens, shows a 21% per-message reduction:
| Model | Avg Tool Calls per User Message |
|---|---|
| Sonnet 4.5 | 12.33 |
| Sonnet 4.0 | 15.65 |
| GPT-5 | 11.58 |
Sonnet 4.5 performs more internal reasoning before acting, while Sonnet 4.0 issues more frequent tool calls with quicker execution. The 21% per-message figure is the cleaner measurement of the model's own efficiency gain; the 34% full-task figure includes both model and infrastructure improvements. Sonnet 4.6 is expected to maintain or improve on these efficiency gains, given Anthropic reports it is 70% more token-efficient than Sonnet 4.5 on filesystem benchmarks, but comparable production telemetry for Sonnet 4.6 has not yet been published.
Benchmark Position
Sonnet 4.6 holds a strong coding benchmark position, though the competitive landscape has tightened since Gemini 3.1 Pro's release two days after Sonnet 4.6's:
| Model | SWE-bench Verified |
|---|---|
| Claude Opus 4.6 | 80.8% |
| Claude Opus 4.5 | 80.9% |
| Gemini 3.1 Pro | 80.6% |
| Claude Sonnet 4.6 | 79.6% |
| Claude Sonnet 4.5 | 77.2% |
| GPT-5.1 | 76.3% |
Sources: Anthropic's Opus 4.5 system card, Anthropic's Sonnet 4.6 announcement, Google's Gemini 3.1 Pro model card
Gemini 3.1 Pro at 80.6% now sits between the Opus tier and Sonnet 4.6 on SWE-bench Verified, making the implementor model choice more nuanced for teams not committed to the Claude ecosystem. Within Claude-only routing (the focus of this guide's cost model), Sonnet 4.6 at 79.6% closes to within 1.2 points of Opus 4.6, the smallest Sonnet-to-Opus gap in any Claude generation. A customer's internal code editing benchmark reported a 0% error rate on Sonnet 4.5 versus 9% on Sonnet 4; Sonnet 4.6 further improved on consistency and instruction following.
Why Sonnet 4.6 Fits the Implementor Role
Opus output tokens cost 67% more than Sonnet ($25/MTok vs. $15/MTok). For sustained implementation work across multiple parallel agents, this differential compounds quickly. In a session with three implementation tasks at 12K input / 8K output tokens each, Opus costs $0.78 where Sonnet costs $0.468: a $0.31 difference that multiplies across every session. When using Augment Code's Context Engine, teams running Sonnet 4.6 as their implementor model see faster task completion on complex multi-file tasks because Sonnet 4.6 makes smarter use of the codebase context provided to each implementor agent.
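The per-session figures above can be reproduced directly from the pricing table. This is a straight arithmetic check, using the published $/MTok rates and the three-task workload described in the paragraph.

```python
# Reproduce the implementation cost comparison above: three tasks at
# 12K input / 8K output tokens each, Opus 4.6 vs. Sonnet 4.6 pricing.

def session_cost(in_tok, out_tok, in_price, out_price, runs=1):
    """Cost in dollars; prices are $/MTok from Anthropic's published pricing."""
    return runs * (in_tok * in_price + out_tok * out_price) / 1_000_000

opus = session_cost(12_000, 8_000, 5.00, 25.00, runs=3)
sonnet = session_cost(12_000, 8_000, 3.00, 15.00, runs=3)

print(f"Opus:   ${opus:.3f}")                       # $0.780
print(f"Sonnet: ${sonnet:.3f}")                     # $0.468
print(f"Delta:  ${opus - sonnet:.3f} per session")  # $0.312
```

At scale, the delta dominates: ten such sessions a day is roughly $3 of pure routing savings on implementation work alone, before touching the cheaper Haiku tiers.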
Quick Tasks: Haiku 4.5 for File Navigation and Simple Edits
Claude Haiku 4.5 is Anthropic's fastest model, positioned for high-volume, straightforward tasks such as file navigation, simple edits, linting, and sub-agent execution. Its $1.00/MTok input cost is 3x cheaper than Sonnet and 5x cheaper than Opus, with cache reads at $0.10/MTok. Those savings compound across hundreds of repeated tool invocations per session.
What Quality Do You Lose by Routing to Haiku?
Haiku 4.5's SWE-bench Verified score of 73.3% sits within 3.9 percentage points of Sonnet 4.5 (77.2%) and 6.3 points below Sonnet 4.6 (79.6%), but SWE-bench measures full issue resolution, not file navigation quality. No public benchmark specifically measures the tasks being recommended for Haiku: grep, directory listing, symbol resolution, and boilerplate generation. These operations are structurally simpler than SWE-bench issues, relying on pattern matching and retrieval rather than multi-step reasoning. The SWE-bench gap likely overstates the quality difference for these use cases.
Anthropic's Opus 4.5 system card reports that when given Claude Sonnet 4.5 subagents, Claude Opus 4.5 as orchestrator achieved 85.4%, suggesting subagent choice can materially affect end-to-end pipeline performance even when individual task quality appears comparable.
The practical test: if a Haiku agent's file navigation output requires Sonnet-level correction more than 20% of the time, the re-prompting cost negates the 3x pricing advantage. Monitor error rates on Haiku-assigned tasks during the first week of deployment.
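As a sanity check on that threshold, here is the raw token-cost break-even under a simplifying assumption: each correction is modeled as one full Sonnet re-run of the same task. On tokens alone the break-even sits well above 20%; the stricter 20% operational threshold also prices in re-review latency and pipeline disruption, which this sketch deliberately omits.

```python
# Break-even check for Haiku routing: at what correction rate does
# "Haiku + occasional Sonnet re-run" stop beating "Sonnet for everything"?
# Assumption: a correction costs one full Sonnet re-run of the task.

def cost(in_tok, out_tok, in_p, out_p):
    return (in_tok * in_p + out_tok * out_p) / 1_000_000

IN, OUT = 3_000, 1_500                    # a typical quick task
haiku = cost(IN, OUT, 1.00, 5.00)         # $0.0105
sonnet = cost(IN, OUT, 3.00, 15.00)       # $0.0315

def routed_cost(correction_rate: float) -> float:
    return haiku + correction_rate * sonnet  # Sonnet re-run on failure

for rate in (0.0, 0.2, 0.7):
    verdict = "cheaper" if routed_cost(rate) < sonnet else "not cheaper"
    print(f"{rate:.0%} corrections: Haiku routing is {verdict}")
```

Monitoring the observed correction rate against this kind of model during the first week of deployment turns the routing decision into a measurable one rather than a guess.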
Routing Decision Tree
Anthropic describes Haiku 4.5 as suited for real-time, low-latency applications, cost-efficient deployments, and sub-agent or multi-agent tasks. The following decision framework applies that positioning to specific agent operations:
| Route to Haiku | Route to Sonnet/Opus |
|---|---|
| File search, grep, directory listing | Complex debugging |
| Linting, formatting checks | Architectural decisions |
| Simple codebase questions (single-file scope) | Multi-file refactoring |
| Boilerplate generation from templates | Security analysis |
| Formatting files | Production code with edge cases |
For production code or architectural reasoning, more capable models produce a better total cost of task completion when accounting for review, correction, and re-prompting overhead.
Code Review: GPT-5.2 for Deep Analysis
GPT-5.2 is OpenAI's December 2025 frontier model built for professional knowledge work. Its behavioral profile of deeper investigation through exhaustive tool use makes it well suited for asynchronous code review, where thoroughness matters more than speed.
Why GPT-5.2 and not GPT-5.4? GPT-5.4 launched in March 2026 and improved on GPT-5.2 across general benchmarks (33% fewer factual errors, 75% OSWorld). Augment Code itself briefly used GPT-5.4 as the default model before switching to Opus 4.6. The GPT-5.2 recommendation for code review is based on evaluated performance: the published blog post and benchmark results confirm GPT-5.2 was extensively tested and tuned for async code review. The DryRun Security evidence below also predates GPT-5.4's release. GPT-5.4's code review quality relative to GPT-5.2 has not been independently evaluated, so this recommendation should be revisited when GPT-5.4-specific code review benchmarks become available.
Why GPT-5.2 for Review Instead of Sonnet
GPT-5.2 takes more time and makes more tool calls but produces more thoroughly reasoned analysis. For async code review, an extra 30 seconds is a fair trade if it catches subtle bugs that a faster but shallower review would miss.
This directly inverts the "fewer tool calls" optimization used for implementor agents. For interactive coding, fewer tool calls means faster iteration. For async code review, more exhaustive tool use means better bug detection. The optimal model depends on the task type.
See how Intent's Code Review agent uses GPT-5.2 for thorough analysis while routing Claude models to Coordinator and Implementor roles.
Free tier available · VS Code extension · Takes 2 minutes
Security Evidence
The DryRun Security coding report (March 2026) provides the most relevant independent security evaluation comparing agent-generated code. Three agents each built two applications; security scanning ran against every PR produced:
| Agent | Baseline Issues | Final Issues | Net Change |
|---|---|---|---|
| Codex (GPT-5.2) | 9 | 8 | -1 |
| Gemini | 9 | 11 | +2 |
| Claude | 9 | 13 | +4 |
Source: DryRun Security report
Codex finished with the fewest vulnerabilities of the three agents evaluated. Claude carried an IDOR vulnerability and an unauthenticated destructive endpoint unresolved from early PRs through to the final version.
Benchmark Context
GPT-5.2 Thinking achieves 80.0% on SWE-bench. On ARC-AGI-2 abstract reasoning, GPT-5.2 Thinking scores 52.9% versus GPT-5.1's 17.6%: a 35.3 percentage point jump in abstract pattern recognition relevant to identifying non-obvious code issues.
The review benchmark confirms the model-specificity finding: the GPT model series has consistently performed best for code review specifically. The same evaluation found that prompts, toolsets, and guardrails must often be tuned for each model, and treating the model as a drop-in component rarely produces high-quality results.
Intent's built-in Code Review specialist agent applies this principle directly: it uses GPT-5.2 for thorough analysis while the Coordinator and Implementor agents run on Claude models matched to their respective roles.
Cost Modeling: How Routing Saves 40-60% Versus Opus-for-Everything
Model routing achieves 40-60% cost reduction versus uniform frontier model deployment by matching per-token pricing to task complexity across a typical multi-agent coding session. The savings below are calculated from published API pricing from Anthropic and OpenAI.
Current API Pricing (April 2026)
| Model | Input ($/MTok) | Output ($/MTok) | Cache Read ($/MTok) |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| GPT-5.2 | $1.75 | $14.00 | $0.175 |
| GPT-5.4 | $2.50 | $15.00 | $0.25 |
Note: Claude Opus 4 (legacy, without version suffix) is priced at $15.00/$75.00 per MTok, a separate model from Opus 4.6 at $5.00/$25.00. Sonnet 4.6 pricing matches Sonnet 4.5 exactly ($3/$15). Both OpenAI cached input prices follow the standard 90% discount on base input pricing.
Worked Cost Model: Typical Multi-Agent Session
Token assumptions per session, consistent with a coding agent making approximately 200 API calls per session:
| Task Type | Input Tokens | Output Tokens | Frequency |
|---|---|---|---|
| Architecture planning | 8,000 | 4,000 | 1x |
| Complex implementation | 12,000 | 8,000 | 3x |
| Quick edits/small fixes | 3,000 | 1,500 | 8x |
| Code review/linting | 5,000 | 2,000 | 4x |
| Test generation | 4,000 | 3,000 | 4x |
Session totals: approximately 104,000 input tokens / 60,000 output tokens.
Three-Tier Routing vs. Uniform Opus 4.6
The three-tier Claude routing assigns Opus 4.6 to architecture planning only, Sonnet 4.6 to complex implementation and test generation, and Haiku 4.5 to quick edits and code review. This models a Claude-only scenario; teams following the article's GPT-5.2 recommendation for code review would substitute GPT-5.2 pricing ($1.75/$14 per MTok) for the Haiku code review row, increasing code review cost from $0.060 to $0.147 per session while gaining the deeper analysis described above. The following table shows the Claude-only per-task-type cost breakdown:
| Task Type | Model Assigned | Input Cost | Output Cost | Task Cost | Uniform Opus Cost |
|---|---|---|---|---|---|
| Architecture (1x) | Opus 4.6 | $0.040 | $0.100 | $0.140 | $0.140 |
| Implementation (3x) | Sonnet 4.6 | $0.108 | $0.360 | $0.468 | $0.780 |
| Quick edits (8x) | Haiku 4.5 | $0.024 | $0.060 | $0.084 | $0.420 |
| Code review (4x) | Haiku 4.5 | $0.020 | $0.040 | $0.060 | $0.300 |
| Test generation (4x) | Sonnet 4.6 | $0.048 | $0.180 | $0.228 | $0.380 |
| Session total | | | | $0.98 | $2.02 |
Three-tier routing costs $0.98 per session versus $2.02 for uniform Opus 4.6: a 51% reduction that falls squarely within the 40-60% range. The largest savings come from routing quick edits and code review to Haiku, which together account for $0.66 in Opus costs reduced to $0.14.
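The full worked model above can be recomputed from the pricing table in a few lines. Task-type names and frequencies come from the session assumptions; the prices are the published April 2026 rates.

```python
# Recompute the worked session model from the pricing table:
# each entry is (task, input_tokens, output_tokens, frequency, routed_model).

PRICES = {  # $/MTok input, output (April 2026 table above)
    "opus":   (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (1.00, 5.00),
}

TASKS = [
    ("architecture",    8_000, 4_000, 1, "opus"),
    ("implementation", 12_000, 8_000, 3, "sonnet"),
    ("quick edits",     3_000, 1_500, 8, "haiku"),
    ("code review",     5_000, 2_000, 4, "haiku"),
    ("test generation", 4_000, 3_000, 4, "sonnet"),
]

def cost(in_tok, out_tok, model):
    in_p, out_p = PRICES[model]
    return (in_tok * in_p + out_tok * out_p) / 1_000_000

routed = sum(n * cost(i, o, m) for _, i, o, n, m in TASKS)
uniform = sum(n * cost(i, o, "opus") for _, i, o, n, _ in TASKS)

print(f"Routed:  ${routed:.2f}")               # $0.98
print(f"Uniform: ${uniform:.2f}")              # $2.02
print(f"Savings: {1 - routed / uniform:.0%}")  # 51%
```

Swapping a row's model (for example, code review from "haiku" to a GPT-5.2 entry at $1.75/$14) lets teams re-derive the session cost for their own routing variant before committing to it.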
AWS reports up to 30% savings through Intelligent Prompt Routing in Amazon Bedrock, a more conservative figure reflecting dynamic routing overhead. Both Anthropic and OpenAI offer 50% batch processing discounts for async tasks, further reducing costs for code review and test generation workflows that tolerate latency.
How Intent's Model Picker Enables Per-Agent Routing
Intent supports per-workspace model selection across all major frontier and mid-tier models, applying the routing strategies described in this guide. The Coordinator, Specialist, and Verifier pipeline maps directly to the tiered model assignments above: assign a frontier model to the Coordinator, a balanced model to Implementors, and match the remaining specialists to their optimal model tier.
Bring-Your-Own-Agent Support
The most important routing decision in Intent is whether to use the native Auggie agent or an external provider. Auggie receives full Context Engine integration, providing architectural-level understanding across 400,000+ files through semantic dependency graph analysis. External agents (Claude Code, OpenAI Codex, OpenCode) work within Intent's workspace and coordination layer but operate without the Context Engine's codebase-wide context. For routing scenarios where cross-file dependency awareness determines code quality, such as multi-service refactoring or architectural planning, the native Auggie agent produces higher-quality output because it accesses the full codebase context. For isolated, single-file tasks, external agents perform comparably.
Teams with existing subscriptions to external providers can use those agents directly within Intent, with model usage billed through those providers.
Supported Models and Recommended Roles
| Recommended Model | Agent Role / Task Type |
|---|---|
| Claude Opus 4.5/4.6 | Complex tasks and agentic planning |
| Claude Sonnet 4.6 | Rapid iteration and implementation |
| GPT-5.2 | Deep code analysis and code review |
| Claude Haiku 4.5 | Fast, lightweight tasks |
As the Intent product page states: "Not every task needs the same model. Intent supports all major models; mix and match based on what each task needs."
Model selection works across all Augment Code surfaces, including the JetBrains panel dropdown, the --model flag in Auggie CLI, and cost tier indicators ($, $$, $$$) in the model picker. Full details are available in the model docs.
Intent's Multi-Agent Pipeline
Intent structures work through built-in specialist agents, each designed for a distinct role in the development workflow:
| Agent | Responsibility | Recommended Model Tier |
|---|---|---|
| Investigate | Explore codebase and assess feasibility | Opus (deep reasoning) |
| Implementor agents | Execute implementation plans | Sonnet 4.6 (code generation) |
| Verify | Check implementations against the living spec | Sonnet or Opus (correctness) |
| Critique | Review specs for feasibility before implementation | Opus (architectural judgment) |
| Debug | Analyze and fix issues | Sonnet or GPT-5.2 (thorough analysis) |
| Code Review | Automated reviews with severity ratings | GPT-5.2 (exhaustive investigation) |
The Coordinator breaks down a submitted spec, specialist agents execute in parallel in their own contexts, and the Coordinator manages handoffs. Each specialist agent receives architectural-level understanding across 400,000+ files through the Context Engine's semantic dependency graph analysis. This context-aware execution applies regardless of which model tier handles the task. For CLI-based configuration, pass auggie --model "claude-sonnet-4-6" to select a model per session; see the CLI docs for the full reference.
Route Models by Agent Role Before Your Next Spec
The core tension in multi-agent coding systems is that quality demands frontier models while volume demands cost efficiency. Model routing resolves this by applying frontier capability only where it changes outcomes: Opus 4.6 for coordination decisions that cascade through every downstream agent, Sonnet 4.6 for the high-volume implementation work that benefits from fewer tool calls and a 79.6% SWE-bench score at $3/MTok, Haiku 4.5 for the hundreds of file operations that need speed over reasoning depth, and GPT-5.2 for the async code review that benefits from exhaustive investigation. The worked cost model shows three-tier Claude routing saves 51% versus uniform Opus 4.6 deployment. Start by assigning models to your Coordinator and Implementor agents in Intent, then expand routing as production telemetry reveals which tasks tolerate cheaper models.
See how Intent's model picker lets teams route the right model to each specialist while keeping living specs and parallel agents aligned.
Free tier available · VS Code extension · Takes 2 minutes