The best AI model for coding agents in 2026 depends on the agent's role: Opus 4.6 for coordination, Sonnet 4.6 for implementation, Haiku 4.5 for file navigation, and GPT-5.2 for code review, because each role has distinct reasoning, speed, and cost requirements that no single model satisfies.
TL;DR
The best AI model for coding agents in 2026 varies by role: Opus 4.6 for coordination, Sonnet 4.6 for implementation, Haiku 4.5 for file navigation, and GPT-5.2 for code review. Running one model across all roles overspends on simple tasks and underperforms on complex ones. Anthropic's multi-agent research system outperformed single-agent Opus by 90.2% on research retrieval tasks, validating role-based routing in principle across domains.
Why One Model Does Not Fit All Agent Roles
A single model assigned to every agent role in a multi-agent coding system creates two simultaneous failure modes: over-provisioning on simple tasks wastes compute budget and adds latency, while under-provisioning on complex tasks causes quality degradation that compounds at every step of the agent pipeline. The cost is measurable. Routing Opus 4.6 to file navigation tasks increases input token costs 5x versus Haiku 4.5, based on Anthropic's published pricing ($5/M vs. $1/M input tokens). Routing Haiku to architectural planning produces malformed subtask specs that no downstream agent can correct.
The MasRouter paper formalizes why fixed single-model assignment fails: model routing depends on "the task's domain and difficulty, as well as their corresponding role." The same research refutes the assumption that the largest model always wins: "larger LLMs have been shown not to always outperform their smaller counterparts."
Four concrete mismatch problems drive this failure.
Planning receives the wrong model, and errors cascade. A coordinator agent decomposes ambiguous requirements into executable subtasks, reasons about inter-agent dependencies, and manages parallel workstreams. When a weaker model handles planning, every downstream agent operates on malformed inputs with no upstream correction mechanism. Anthropic positions Opus 4.6 for tasks that demand the deepest reasoning, such as codebase refactoring and coordinating agents in a workflow.
File navigation runs on frontier models at 5x cost. Directory listing, symbol resolution, and import path tracing are structurally simple retrieval operations requiring pattern matching, not deep reasoning. Anthropic's own model guide positions Haiku for high-volume intelligent processing, cost-sensitive deployments, and sub-agent tasks. The pricing gap between Opus 4.6 ($5.00/MTok input) and Haiku 4.5 ($1.00/MTok input) means routing hundreds of file reads through the frontier model exhausts the compute budget before reaching tasks that need it.
Code generation gets a fixed context window regardless of task size. CrewAI docs map task-to-context-window requirements explicitly: small tasks (up to 4K tokens) fit standard models; medium tasks (4K-32K) need enhanced models; large tasks (over 32K) need large context models. A fixed model applies a fixed context window uniformly, truncating instructions on large files or wasting resources on 500-token edits.
Code review forces a choice between bottleneck and missed bugs. A frontier model applied exclusively to code review creates a serialized bottleneck. A low-tier model misses vulnerabilities. Sonnet 4.6 improved coding performance and consistency, but the right model for review depends on whether it is synchronous or async.
| Agent Role | Over-Provisioning Cost | Under-Provisioning Cost |
|---|---|---|
| Planning/Orchestration | Unnecessary spend on largest model | Poor decomposition cascades errors downstream |
| File Navigation/Search | 5x cost inflation on high-frequency retrieval | Minimal: more capable models add no quality benefit on retrieval |
| Code Generation | Wasted context on small files | Truncation discards instructions on large files |
| Code Review | Serialized bottleneck limits parallel reviews | Missed bugs, security vulnerabilities |
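The context-window tiers cited above can be expressed as a simple routing guard. This is an illustrative sketch using the thresholds from the CrewAI docs (4K / 32K); the tier names and the function itself are not CrewAI API.

```python
# Sketch: map an estimated task token count to a model tier, using the
# CrewAI-style thresholds cited above (4K / 32K). Tier names are
# illustrative labels, not part of any framework's API.

def context_tier(estimated_tokens: int) -> str:
    """Pick a model tier from a rough token estimate for the task."""
    if estimated_tokens <= 4_000:
        return "standard"        # small tasks: fit any model
    if estimated_tokens <= 32_000:
        return "enhanced"        # medium tasks: mid-tier context window
    return "large-context"       # large tasks: 1M-class context models

print(context_tier(500))      # a 500-token edit needs no large window
print(context_tier(80_000))   # a large multi-file task does
```

A guard like this prevents both failure modes in the table: small edits never pay for a large-context model, and large files never hit a window that truncates their instructions.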
Augment Code reports a 40% reduction in hallucinations through the Context Engine's intelligent model routing, which becomes critical when multiple agents operate on the same codebase simultaneously. Intent's model picker makes differentiated assignment practical by letting teams assign the right model to each specialist agent within a coordinated workspace.
See how Intent's model picker assigns the right model to each agent role.
Free tier available · VS Code extension · Takes 2 minutes
Routing the Best Model to Each Coding Agent Role
Model routing directs easy, high-frequency requests to smaller models and harder, reasoning-intensive requests to capable models, matching cost to task complexity at each tier. Three implementation approaches exist in production systems, each suited to different team contexts.
Static routing uses predefined rules to distribute tasks, often without examining the content of each request. The Claude sub-agents API provides the most direct production example: the model field in subagent definitions accepts a model alias (sonnet, opus, haiku), a full model ID (claude-opus-4-6), or inherit to mirror the parent session. Static routing works best for teams with predictable workloads where task types map cleanly to agent roles, such as a fixed Coordinator + Implementor + Reviewer pipeline. It breaks down when task complexity varies significantly within a single role, forcing the team to either over-provision or accept quality drops on harder tasks.
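The static pattern reduces to a fixed role-to-model table resolved before any request is made. The sketch below mirrors the Coordinator + Implementor + Reviewer pipeline described above in plain Python; the role names and dict are illustrative, not the Claude sub-agents API itself (which expresses the same binding through the model field in each subagent definition).

```python
# Static routing: each agent role is bound to a model alias up front.
# Aliases follow the Claude alias convention (opus/sonnet/haiku); the
# role names and this dispatch table are illustrative assumptions.

ROLE_MODELS = {
    "coordinator": "opus",    # deepest reasoning for planning
    "implementor": "sonnet",  # balanced cost/quality for code generation
    "navigator":   "haiku",   # cheap, fast file navigation and search
    "reviewer":    "sonnet",  # or an external review model
}

def model_for(role: str) -> str:
    """Resolve a role to its statically assigned model alias."""
    try:
        return ROLE_MODELS[role]
    except KeyError:
        raise ValueError(f"no model assigned for role {role!r}")

print(model_for("navigator"))  # haiku
```

Note that the lookup never inspects the request content: that is exactly the property that makes static routing cheap (no classification latency) and exactly what breaks when task complexity varies within a role.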
Dynamic routing selects models at runtime based on task complexity. RouteLLM is an open-source complexity-based router that configures a strong_model and weak_model, then evaluates each prompt's complexity against a threshold to route automatically. Dynamic routing adds a classification step before every request, introducing latency (typically 50-200ms per routing decision) that compounds across hundreds of agent calls per session. For teams running fewer than 500 agent calls per day, the engineering overhead of maintaining a routing classifier often exceeds the cost savings it produces.
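The dynamic pattern adds a pre-request classification step. RouteLLM uses a trained router for that step; the sketch below substitutes a deliberately crude keyword-and-length heuristic to show the strong/weak threshold mechanic only, and is not RouteLLM's actual API or classifier.

```python
# Dynamic routing sketch: score each prompt's complexity and compare it
# to a threshold, routing to a strong or weak model. The heuristic below
# is a toy stand-in for a trained classifier like RouteLLM's.

STRONG_MODEL = "claude-opus-4-6"    # illustrative model IDs
WEAK_MODEL = "claude-haiku-4-5"

COMPLEX_HINTS = ("refactor", "architecture", "debug", "race condition")

def complexity_score(prompt: str) -> float:
    """Toy classifier: fraction of complexity hints present, plus length."""
    hits = sum(h in prompt.lower() for h in COMPLEX_HINTS)
    return min(1.0, hits / len(COMPLEX_HINTS) + len(prompt) / 10_000)

def route(prompt: str, threshold: float = 0.25) -> str:
    """This scoring call is the 50-200ms of added latency per request."""
    return STRONG_MODEL if complexity_score(prompt) >= threshold else WEAK_MODEL

print(route("List the files in src/"))                             # weak
print(route("Refactor the architecture to fix a race condition"))  # strong
```

The threshold is the tuning knob: raising it sends more traffic to the weak model and saves cost at the risk of under-provisioning borderline tasks.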
Hybrid routing combines a static planner with dynamic execution model selection. OpenAI's reasoning guide describes the pattern: a reasoning model serves as the planner, producing a detailed multistep solution and then selecting the right GPT model for each step based on whether high intelligence or low latency matters most. Hybrid routing requires a frontier model for the planning step, meaning it only saves cost on execution, not orchestration.
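In the hybrid pattern, the planner's output carries a per-step model requirement that drives execution-time selection. The sketch below hardcodes the planner's output to show the shape of the pattern; in a real system, plan() would be a call to a reasoning model, and the model names here are illustrative placeholders.

```python
# Hybrid routing sketch: a frontier "planner" decomposes the job and tags
# each step with the capability it needs, per OpenAI's planner pattern.
# plan() is a hardcoded stand-in for a real planner-model call.

from dataclasses import dataclass

@dataclass
class Step:
    description: str
    needs_reasoning: bool  # the planner's per-step judgment

def plan(task: str) -> list[Step]:
    """Stand-in: a real system would get this decomposition from the API."""
    return [
        Step("locate affected modules", needs_reasoning=False),
        Step("design the migration", needs_reasoning=True),
        Step("apply mechanical renames", needs_reasoning=False),
    ]

def execution_model(step: Step) -> str:
    return "frontier-model" if step.needs_reasoning else "fast-model"

for step in plan("migrate auth module"):
    print(step.description, "->", execution_model(step))
```

This makes the cost profile explicit: the plan() call always pays frontier prices, so hybrid routing only saves on the execution steps that follow.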
Choosing a Routing Approach
| Factor | Static | Dynamic | Hybrid |
|---|---|---|---|
| Best for | Fixed pipelines with clear role boundaries | High-volume systems with variable task complexity | Complex workflows where planning requires frontier models |
| Setup complexity | Low: assign model per agent | Medium: train/tune routing classifier | Medium: configure planner + execution pool |
| Latency overhead | None | 50-200ms per routing decision | Planning step only |
| Breaks when | Task complexity varies within a role | Call volume is too low to justify classifier maintenance | Planning model cost dominates the budget |
| Team size fit | Any | 5+ engineers with ML ops capacity | 3+ engineers comfortable with multi-model orchestration |
| Framework | Routing Type | Model Assignment Mechanism |
|---|---|---|
| Claude Code | Static per-subagent | model field in subagent definition (alias or full ID) |
| CrewAI | Static per-agent | LLM instance passed to each Agent object |
| LangGraph | Dynamic via routing node | Structured LLM output classifies; downstream nodes use different models |
| OpenAI Agents SDK | Hybrid (static planner, dynamic doer) | Reasoning model selects execution model per step |
| RouteLLM | Preference-based with cost-threshold routing | Trained router predicts strong-model win probability and compares it to a cost threshold α for routing |
Intent's multi-agent architecture implements static routing with manual model selection per workspace, consistent with Anthropic's orchestrator-worker and evaluator-style multi-agent patterns. This makes Intent best suited for teams with well-defined agent pipelines (Coordinator → Implementor → Reviewer) where each specialist handles a consistent task type. Teams select models for each specialist agent through Intent's model picker, then the Coordinator delegates tasks to those agents with the appropriate model already assigned.
Coordinator: Opus 4.6 for Planning and Architecture Decisions
Claude Opus 4.6 is Anthropic's current flagship model, positioned for coordination: the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring and coordinating agents in a workflow.
Three capabilities make Opus 4.6 suited for the coordinator role.
Adaptive thinking lets the model decide when to apply extended thinking, with developer-configurable effort controls that trade off intelligence, speed, and cost. On SWE-bench, third-party sources report strong performance for Opus 4.5 relative to Sonnet 4.5, though official benchmark documentation has not verified the detailed effort-level token-efficiency and cost-scaling claims.
1M token context window provides the retrieval depth coordinators need for large codebase work and is now generally available for Claude Opus 4.6 at standard pricing. Opus 4.6 is the strongest verified 1M-context retrieval option in this analysis for coordinator-scale work.
Native agent teams in Claude Code (research preview) enable spinning up multiple agents working in parallel with autonomous coordination, a capability built directly into the model's operational framework.
Benchmark Evidence for Coordination Tasks
The MCP Atlas benchmark measures tool use and multi-tool orchestration across MCP servers, making it a relevant benchmark for MCP-based coordination tasks. Scores vary significantly by effort level and evaluation configuration:
| Model | MCP Atlas (max effort) |
|---|---|
| Claude Opus 4.5 | 62.3% |
| Claude Opus 4.6 | 59.5% |
| Claude Sonnet 4.5 | 43.8% |
Source: Anthropic's Opus 4.6 system card (Opus 4.6 max-effort score), Anthropic's Opus 4.5 system card (Opus 4.5 and Sonnet 4.5 scores). Anthropic's Opus 4.6 announcement page reported 62.7% at a "high effort" configuration; the system card, published one day later, reports 59.5% at max effort, which is slightly below Opus 4.5. The Scale Labs leaderboard shows higher scores (71.8-75.8%) at yet another configuration. The figures above use the system card as the more rigorous source.
The key takeaway for routing decisions is the Opus-to-Sonnet gap: both Opus versions score 15-19 points above Sonnet 4.5 on MCP Atlas, reflecting the reasoning depth that coordination tasks demand. Downgrading the orchestrator from Opus to Sonnet to save cost produces measurably worse coordination outcomes regardless of which Opus evaluation is used.
On SWE-bench Verified (500 validated real GitHub issues), Opus 4.6 achieves 80.84% averaged across 25 trials, the strongest result for any coordinator-class model in this analysis. That SWE-bench lead, combined with the 15-19 point MCP Atlas advantage over Sonnet, makes the case for Opus as coordinator on both code quality and tool orchestration grounds.
Named enterprise partners validate the coordinator role specifically:
- Anthropic: "Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision."
- Sola: "Claude Opus 4.6 is the best orchestration model we've used for complex multi-agent work. It tracks how sub-agents are doing, proactively steers them, and terminates when needed. Active management is new."
Pricing: $5.00/$25.00 per MTok input/output per Anthropic pricing.
When Sonnet Is Good Enough for Coordination
Opus 4.6 is the correct coordinator model for complex pipelines, but Sonnet 4.6 can serve as coordinator in three specific scenarios. Small agent pipelines (2-3 specialists) with well-scoped tasks rarely need Opus-level decomposition. Tight iteration loops where Opus's extended thinking adds unacceptable latency benefit from Sonnet's faster response times. Budget-constrained prototyping where the team is testing pipeline structure before committing to production-grade assignments can defer the Opus cost until the architecture stabilizes. The Opus-to-Sonnet MCP Atlas gap matters most for multi-step tool orchestration across unfamiliar codebases. For pipelines with clear, pre-decomposed tasks where the coordinator primarily dispatches rather than reasons, that gap narrows.
In Intent's pipeline, the Coordinator agent is where Opus 4.6 earns its cost: it analyzes the codebase, drafts the living spec, generates tasks, and delegates to specialist agents. A weaker model at this stage produces malformed task decompositions that no downstream specialist can recover from.
Implementors: Sonnet 4.6 for Fast, Reliable Code Generation
Claude Sonnet 4.6, released February 17, 2026, is Anthropic's current best model for coding and complex agents. It scores 79.6% on SWE-bench Verified (up from 77.2% on Sonnet 4.5) at the same $3/$15 per MTok pricing, and developers in Claude Code testing preferred it over Sonnet 4.5 roughly 70% of the time. For the implementor role in multi-agent pipelines, Sonnet 4.6 is now the correct default.
Tool Call Efficiency: What Sonnet 4.5 Telemetry Showed
The most detailed tool-call telemetry available is from Sonnet 4.5, since Sonnet 4.6 has not yet accumulated comparable production data. Internal evaluation of one-shot instruction-to-PR task completion found that switching from Sonnet 4.0 to Sonnet 4.5 produced 34% fewer tool calls on average and approximately 26% faster overall task completion time while maintaining the same accuracy.
This measurement reflects two simultaneous changes: the model switch from Sonnet 4.0 to Sonnet 4.5, and the introduction of parallel tool-call architecture. The isolated model-only effect, measured from separate production telemetry across several billion tokens, shows a 21% per-message reduction:
| Model | Avg Tool Calls per User Message |
|---|---|
| Sonnet 4.5 | 12.33 |
| Sonnet 4.0 | 15.65 |
| GPT-5 | 11.58 |
Sonnet 4.5 performs more internal reasoning before acting, while Sonnet 4.0 issues more frequent tool calls with quicker execution. The 21% per-message figure is the cleaner measurement of the model's own efficiency gain; the 34% full-task figure includes both model and infrastructure improvements. Sonnet 4.6 is expected to maintain or improve on these efficiency gains, given Anthropic reports it is 70% more token-efficient than Sonnet 4.5 on filesystem benchmarks, but comparable production telemetry for Sonnet 4.6 has not yet been published.
Benchmark Position
Sonnet 4.6 holds a strong coding benchmark position, though the competitive landscape has tightened since Gemini 3.1 Pro's release two days after Sonnet 4.6's:
| Model | SWE-bench Verified |
|---|---|
| Claude Opus 4.6 | 80.8% |
| Claude Opus 4.5 | 80.9% |
| Gemini 3.1 Pro | 80.6% |
| Claude Sonnet 4.6 | 79.6% |
| Claude Sonnet 4.5 | 77.2% |
| GPT-5.1 | 76.3% |
Sources: Anthropic's Opus 4.5 system card, Anthropic's Sonnet 4.6 announcement, Google's Gemini 3.1 Pro model card
Gemini 3.1 Pro at 80.6% now sits between the Opus tier and Sonnet 4.6 on SWE-bench Verified, making the implementor model choice more nuanced for teams not committed to the Claude ecosystem. Within Claude-only routing (the focus of this guide's cost model), Sonnet 4.6 at 79.6% closes to within 1.2 points of Opus 4.6, the smallest Sonnet-to-Opus gap in any Claude generation. A customer's internal code editing benchmark reported a 0% error rate on Sonnet 4.5 versus 9% on Sonnet 4; Sonnet 4.6 further improved on consistency and instruction following.
Why Sonnet 4.6 Fits the Implementor Role
Opus output tokens cost 67% more than Sonnet ($25/MTok vs. $15/MTok). For sustained implementation work across multiple parallel agents, this differential compounds quickly. In a session with three implementation tasks at 12K input / 8K output tokens each, Opus costs $0.78 where Sonnet costs $0.468: a $0.31 difference that multiplies across every session. When using Augment Code's Context Engine, teams running Sonnet 4.6 as their implementor model see faster task completion on complex multi-file tasks because Sonnet 4.6 makes smarter use of the codebase context provided to each implementor agent.
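The per-session figures above can be reproduced directly from the pricing table. This is a straight arithmetic check, using the published $/MTok rates and the three-task workload described in the paragraph.

```python
# Reproduce the implementation cost comparison above: three tasks at
# 12K input / 8K output tokens each, Opus 4.6 vs. Sonnet 4.6 pricing.

def session_cost(in_tok, out_tok, in_price, out_price, runs=1):
    """Cost in dollars; prices are $/MTok from Anthropic's published pricing."""
    return runs * (in_tok * in_price + out_tok * out_price) / 1_000_000

opus = session_cost(12_000, 8_000, 5.00, 25.00, runs=3)
sonnet = session_cost(12_000, 8_000, 3.00, 15.00, runs=3)

print(f"Opus:   ${opus:.3f}")                       # $0.780
print(f"Sonnet: ${sonnet:.3f}")                     # $0.468
print(f"Delta:  ${opus - sonnet:.3f} per session")  # $0.312
```

At scale, the delta dominates: ten such sessions a day is roughly $3 of pure routing savings on implementation work alone, before touching the cheaper Haiku tiers.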
Quick Tasks: Haiku 4.5 for File Navigation and Simple Edits
Claude Haiku 4.5 is Anthropic's fastest model, positioned for high-volume, straightforward tasks such as file navigation, simple edits, linting, and sub-agent execution. Its $1.00/MTok input cost is 3x cheaper than Sonnet and 5x cheaper than Opus, with cache reads at $0.10/MTok. Those savings compound across hundreds of repeated tool invocations per session.
What Quality Do You Lose by Routing to Haiku?
Haiku 4.5's SWE-bench Verified score of 73.3% sits within 3.9 percentage points of Sonnet 4.5 (77.2%) and 6.3 points below Sonnet 4.6 (79.6%), but SWE-bench measures full issue resolution, not file navigation quality. No public benchmark specifically measures the tasks being recommended for Haiku: grep, directory listing, symbol resolution, and boilerplate generation. These operations are structurally simpler than SWE-bench issues, relying on pattern matching and retrieval rather than multi-step reasoning. The SWE-bench gap likely overstates the quality difference for these use cases.
Anthropic's Opus 4.5 system card reports that when given Claude Sonnet 4.5 subagents, Claude Opus 4.5 as orchestrator achieved 85.4%, suggesting subagent choice can materially affect end-to-end pipeline performance even when individual task quality appears comparable.
The practical test: if a Haiku agent's file navigation output requires Sonnet-level correction more than 20% of the time, the re-prompting cost negates the 3x pricing advantage. Monitor error rates on Haiku-assigned tasks during the first week of deployment.
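As a sanity check on that threshold, here is the raw token-cost break-even under a simplifying assumption: each correction is modeled as one full Sonnet re-run of the same task. On tokens alone the break-even sits well above 20%; the stricter 20% operational threshold also prices in re-review latency and pipeline disruption, which this sketch deliberately omits.

```python
# Break-even check for Haiku routing: at what correction rate does
# "Haiku + occasional Sonnet re-run" stop beating "Sonnet for everything"?
# Assumption: a correction costs one full Sonnet re-run of the task.

def cost(in_tok, out_tok, in_p, out_p):
    return (in_tok * in_p + out_tok * out_p) / 1_000_000

IN, OUT = 3_000, 1_500                    # a typical quick task
haiku = cost(IN, OUT, 1.00, 5.00)         # $0.0105
sonnet = cost(IN, OUT, 3.00, 15.00)       # $0.0315

def routed_cost(correction_rate: float) -> float:
    return haiku + correction_rate * sonnet  # Sonnet re-run on failure

for rate in (0.0, 0.2, 0.7):
    verdict = "cheaper" if routed_cost(rate) < sonnet else "not cheaper"
    print(f"{rate:.0%} corrections: Haiku routing is {verdict}")
```

Monitoring the observed correction rate against this kind of model during the first week of deployment turns the routing decision into a measurable one rather than a guess.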
Routing Decision Tree
Anthropic describes Haiku 4.5 as suited for real-time, low-latency applications, cost-efficient deployments, and sub-agent or multi-agent tasks. The following decision framework applies that positioning to specific agent operations:
| Route to Haiku | Route to Sonnet/Opus |
|---|---|
| File search, grep, directory listing | Complex debugging |
| Linting, formatting checks | Architectural decisions |
| Simple codebase questions (single-file scope) | Multi-file refactoring |
| Boilerplate generation from templates | Security analysis |
| Formatting files | Production code with edge cases |
For production code or architectural reasoning, more capable models produce a better total cost of task completion when accounting for review, correction, and re-prompting overhead.
Code Review: GPT-5.2 for Deep Analysis
GPT-5.2 is OpenAI's December 2025 frontier model built for professional knowledge work. Its behavioral profile of deeper investigation through exhaustive tool use makes it well suited for asynchronous code review, where thoroughness matters more than speed.
Why GPT-5.2 and not GPT-5.4? GPT-5.4 launched in March 2026 and improved on GPT-5.2 across general benchmarks (33% fewer factual errors, 75% OSWorld). Augment Code itself briefly used GPT-5.4 as the default model before switching to Opus 4.6. The GPT-5.2 recommendation for code review is based on evaluated performance: the published blog post and benchmark results confirm GPT-5.2 was extensively tested and tuned for async code review. The DryRun Security evidence below also predates GPT-5.4's release. GPT-5.4's code review quality relative to GPT-5.2 has not been independently evaluated, so this recommendation should be revisited when GPT-5.4-specific code review benchmarks become available.
Why GPT-5.2 for Review Instead of Sonnet
GPT-5.2 takes more time and makes more tool calls but produces more thoroughly reasoned analysis. For async code review, an extra 30 seconds is a fair trade if it catches subtle bugs that a faster but shallower review would miss.
This directly inverts the "fewer tool calls" optimization used for implementor agents. For interactive coding, fewer tool calls means faster iteration. For async code review, more exhaustive tool use means better bug detection. The optimal model depends on the task type.
See how Intent's Code Review agent uses GPT-5.2 for thorough analysis while routing Claude models to Coordinator and Implementor roles.
Free tier available · VS Code extension · Takes 2 minutes
Security Evidence
The DryRun Security coding report (March 2026) provides the most relevant independent security evaluation comparing agent-generated code. Three agents each built two applications; security scanning ran against every PR produced:
| Agent | Baseline Issues | Final Issues | Net Change |
|---|---|---|---|
| Codex (GPT-5.2) | 9 | 8 | -1 |
| Gemini | 9 | 11 | +2 |
| Claude | 9 | 13 | +4 |
Source: DryRun Security report
Codex finished with the fewest vulnerabilities of the three agents evaluated. Claude carried an IDOR vulnerability and an unauthenticated destructive endpoint unresolved from early PRs through to the final version.
Benchmark Context
GPT-5.2 Thinking achieves 80.0% on SWE-bench. On ARC-AGI-2 abstract reasoning, GPT-5.2 Thinking scores 52.9% versus GPT-5.1's 17.6%: a 35.3 percentage point jump in abstract pattern recognition relevant to identifying non-obvious code issues.
The review benchmark confirms the model-specificity finding: the GPT model series has consistently performed best for code review specifically. The same evaluation found that prompts, toolsets, and guardrails must often be tuned for each model, and treating the model as a drop-in component rarely produces high-quality results.
Intent's built-in Code Review specialist agent applies this principle directly: it uses GPT-5.2 for thorough analysis while the Coordinator and Implementor agents run on Claude models matched to their respective roles.
Cost Modeling: How Routing Saves 40-60% Versus Opus-for-Everything
Model routing achieves 40-60% cost reduction versus uniform frontier model deployment by matching per-token pricing to task complexity across a typical multi-agent coding session. The savings below are calculated from published API pricing from Anthropic and OpenAI.
Current API Pricing (April 2026)
| Model | Input ($/MTok) | Output ($/MTok) | Cache Read ($/MTok) |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| GPT-5.2 | $1.75 | $14.00 | $0.175 |
| GPT-5.4 | $2.50 | $15.00 | $0.25 |
Note: Claude Opus 4 (legacy, without version suffix) is priced at $15.00/$75.00 per MTok, a separate model from Opus 4.6 at $5.00/$25.00. Sonnet 4.6 pricing matches Sonnet 4.5 exactly ($3/$15). Both OpenAI cached input prices follow the standard 90% discount on base input pricing.
Worked Cost Model: Typical Multi-Agent Session
Token assumptions per session, consistent with a coding agent making approximately 200 API calls per session:
| Task Type | Input Tokens | Output Tokens | Frequency |
|---|---|---|---|
| Architecture planning | 8,000 | 4,000 | 1x |
| Complex implementation | 12,000 | 8,000 | 3x |
| Quick edits/small fixes | 3,000 | 1,500 | 8x |
| Code review/linting | 5,000 | 2,000 | 4x |
| Test generation | 4,000 | 3,000 | 4x |
Session totals: approximately 104,000 input tokens / 60,000 output tokens.
Three-Tier Routing vs. Uniform Opus 4.6
The three-tier Claude routing assigns Opus 4.6 to architecture planning only, Sonnet 4.6 to complex implementation and test generation, and Haiku 4.5 to quick edits and code review. This models a Claude-only scenario; teams following the article's GPT-5.2 recommendation for code review would substitute GPT-5.2 pricing ($1.75/$14 per MTok) for the Haiku code review row, increasing code review cost from $0.060 to $0.147 per session while gaining the deeper analysis described above. The following table shows the Claude-only per-task-type cost breakdown:
| Task Type | Model Assigned | Input Cost | Output Cost | Task Cost | Uniform Opus Cost |
|---|---|---|---|---|---|
| Architecture (1x) | Opus 4.6 | $0.040 | $0.100 | $0.140 | $0.140 |
| Implementation (3x) | Sonnet 4.6 | $0.108 | $0.360 | $0.468 | $0.780 |
| Quick edits (8x) | Haiku 4.5 | $0.024 | $0.060 | $0.084 | $0.420 |
| Code review (4x) | Haiku 4.5 | $0.020 | $0.040 | $0.060 | $0.300 |
| Test generation (4x) | Sonnet 4.6 | $0.048 | $0.180 | $0.228 | $0.380 |
| Session total | | | | $0.98 | $2.02 |
Three-tier routing costs $0.98 per session versus $2.02 for uniform Opus 4.6: a 51% reduction that falls squarely within the 40-60% range. The largest savings come from routing quick edits and code review to Haiku, which together account for $0.66 in Opus costs reduced to $0.14.
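The full worked model above can be recomputed from the pricing table in a few lines. Task-type names and frequencies come from the session assumptions; the prices are the published April 2026 rates.

```python
# Recompute the worked session model from the pricing table:
# each entry is (task, input_tokens, output_tokens, frequency, routed_model).

PRICES = {  # $/MTok input, output (April 2026 table above)
    "opus":   (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (1.00, 5.00),
}

TASKS = [
    ("architecture",    8_000, 4_000, 1, "opus"),
    ("implementation", 12_000, 8_000, 3, "sonnet"),
    ("quick edits",     3_000, 1_500, 8, "haiku"),
    ("code review",     5_000, 2_000, 4, "haiku"),
    ("test generation", 4_000, 3_000, 4, "sonnet"),
]

def cost(in_tok, out_tok, model):
    in_p, out_p = PRICES[model]
    return (in_tok * in_p + out_tok * out_p) / 1_000_000

routed = sum(n * cost(i, o, m) for _, i, o, n, m in TASKS)
uniform = sum(n * cost(i, o, "opus") for _, i, o, n, _ in TASKS)

print(f"Routed:  ${routed:.2f}")               # $0.98
print(f"Uniform: ${uniform:.2f}")              # $2.02
print(f"Savings: {1 - routed / uniform:.0%}")  # 51%
```

Swapping a row's model (for example, code review from "haiku" to a GPT-5.2 entry at $1.75/$14) lets teams re-derive the session cost for their own routing variant before committing to it.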
AWS reports up to 30% savings through Intelligent Prompt Routing in Amazon Bedrock, a more conservative figure reflecting dynamic routing overhead. Both Anthropic and OpenAI offer 50% batch processing discounts for async tasks, further reducing costs for code review and test generation workflows that tolerate latency.
How Intent's Model Picker Enables Per-Agent Routing
Intent supports per-workspace model selection across all major frontier and mid-tier models, applying the routing strategies described in this guide. The Coordinator, Specialist, and Verifier pipeline maps directly to the tiered model assignments above: assign a frontier model to the Coordinator, a balanced model to Implementors, and match the remaining specialists to their optimal model tier.
Bring-Your-Own-Agent Support
The most important routing decision in Intent is whether to use the native Auggie agent or an external provider. Auggie receives full Context Engine integration, providing architectural-level understanding across 400,000+ files through semantic dependency graph analysis. External agents (Claude Code, OpenAI Codex, OpenCode) work within Intent's workspace and coordination layer but operate without the Context Engine's codebase-wide context. For routing scenarios where cross-file dependency awareness determines code quality, such as multi-service refactoring or architectural planning, the native Auggie agent produces higher-quality output because it accesses the full codebase context. For isolated, single-file tasks, external agents perform comparably.
Teams with existing subscriptions to external providers can use those agents directly within Intent, with model usage billed through those providers.
Supported Models and Recommended Roles
| Recommended Model | Agent Role / Task Type |
|---|---|
| Claude Opus 4.5/4.6 | Complex tasks and agentic planning |
| Claude Sonnet 4.6 | Rapid iteration and implementation |
| GPT-5.2 | Deep code analysis and code review |
| Claude Haiku 4.5 | Fast, lightweight tasks |
As the Intent product page states: "Not every task needs the same model. Intent supports all major models; mix and match based on what each task needs."
Model selection works across all Augment Code surfaces, including the JetBrains panel dropdown, the --model flag in Auggie CLI, and cost tier indicators ($, $$, $$$) in the model picker. Full details are available in the model docs.
Intent's Multi-Agent Pipeline
Intent structures work through built-in specialist agents, each designed for a distinct role in the development workflow:
| Agent | Responsibility | Recommended Model Tier |
|---|---|---|
| Investigate | Explore codebase and assess feasibility | Opus (deep reasoning) |
| Implementor agents | Execute implementation plans | Sonnet 4.6 (code generation) |
| Verify | Check implementations against the living spec | Sonnet or Opus (correctness) |
| Critique | Review specs for feasibility before implementation | Opus (architectural judgment) |
| Debug | Analyze and fix issues | Sonnet or GPT-5.2 (thorough analysis) |
| Code Review | Automated reviews with severity ratings | GPT-5.2 (exhaustive investigation) |
The Coordinator breaks down a submitted spec, specialist agents execute in parallel in their own contexts, and the Coordinator manages handoffs. Each specialist agent receives architectural-level understanding across 400,000+ files through the Context Engine's semantic dependency graph analysis. This context-aware execution applies regardless of which model tier handles the task. For CLI-based configuration, pass auggie --model "claude-sonnet-4-6" to select a model per session; see the CLI docs for the full reference.
Route Models by Agent Role Before Your Next Spec
The core tension in multi-agent coding systems is that quality demands frontier models while volume demands cost efficiency. Model routing resolves this by applying frontier capability only where it changes outcomes: Opus 4.6 for coordination decisions that cascade through every downstream agent, Sonnet 4.6 for the high-volume implementation work that benefits from fewer tool calls and a 79.6% SWE-bench score at $3/MTok, Haiku 4.5 for the hundreds of file operations that need speed over reasoning depth, and GPT-5.2 for the async code review that benefits from exhaustive investigation. The worked cost model shows three-tier Claude routing saves 51% versus uniform Opus 4.6 deployment. Start by assigning models to your Coordinator and Implementor agents in Intent, then expand routing as production telemetry reveals which tasks tolerate cheaper models.
See how Intent's model picker lets teams route the right model to each specialist while keeping living specs and parallel agents aligned.
Free tier available · VS Code extension · Takes 2 minutes