
Why GPT-5.2 is our model of choice for Augment Code Review
December 11, 2025
Augment Code Review achieved the highest accuracy on the only public benchmark for AI-assisted code review, outperforming Cursor's Bugbot, CodeRabbit, and others by ~10 points on overall quality. A key reason? Our choice of GPT-5.2 as the foundation model for code review, paired with a model-agnostic approach that lets us pick the best tool for each stage of the software development lifecycle. Augment Code Review was originally built on GPT-5; we upgraded to GPT-5.2 after observing a clear quality improvement from OpenAI's latest reasoning model.

See full benchmark results and methodology →
What makes a great code review model?
Code review has fundamentally different requirements from interactive coding. Three main factors make a model great at code review:
First, raw reasoning capability. Given a piece of code, can the model reason deeply about correctness, identify subtle bugs, and understand architectural implications?
Second, effective tool use in an agent harness. The best code review requires comprehensive context: dependency chains, call sites, type definitions, test files, and historical changes. A great review model needs to make the right tool calls to gather all the relevant information, not just examine the diff in isolation (see the sketch below).
Third, strong instruction-following for precision and recall balance. We've carefully tuned every sentence in our prompts to catch the right set of issues and make the right trade-offs between precision and recall. The model needs to follow these nuanced instructions faithfully, catching real bugs while avoiding low-signal noise.
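To make the second factor concrete, here's a minimal sketch of the kind of agentic loop a review model runs in: the model alternates between reasoning and tool calls until it has enough context to comment. The tool names and the `model.next_action` interface below are illustrative assumptions, not Augment's actual harness.

```python
# A minimal, hypothetical sketch of an agentic code-review loop.
# Tool names and the model interface are assumptions for illustration,
# not Augment's actual implementation.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    kind: str                                     # "tool_call" or "final"
    name: str = ""                                # tool to invoke
    args: dict = field(default_factory=dict)      # tool arguments
    comments: list = field(default_factory=list)  # final review comments

# Stub tools the model can call to gather context beyond the diff itself.
def read_file(path: str) -> str:
    return f"<contents of {path}>"

def find_call_sites(symbol: str) -> list:
    return [f"caller_of_{symbol}.py:42"]

TOOLS: dict[str, Callable] = {
    "read_file": read_file,
    "find_call_sites": find_call_sites,
}

def review(diff: str, model) -> list:
    """Alternate between model reasoning and tool calls until the
    model emits its final set of review comments."""
    context: dict = {"diff": diff}
    while True:
        step: Step = model.next_action(context)
        if step.kind == "tool_call":
            context[step.name] = TOOLS[step.name](**step.args)
        else:
            return step.comments
```

The point of the loop is that context gathering is model-driven: a model that makes the right tool calls sees the dependency chains and call sites it needs, while a model that skips them is effectively reviewing the diff in isolation.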
Why GPT-5.2 excels for code review
GPT-5.2 delivers on all three requirements. It excels at making thorough tool calls and pulling in comprehensive context. It takes longer and does more reasoning than models optimized for interactive speed, but that's exactly what we want for asynchronous code review where correctness matters more than latency.
This deliberate, reasoned approach works seamlessly with Augment's Context Engine. GPT-5.2 consistently retrieves the right dependencies, call sites, and cross-file relationships needed to evaluate correctness in large, long-lived codebases. It catches the kinds of cross-system issues and architectural problems that require deep reasoning: the bugs that actually cause incidents.
Why GPT-5.2 over Sonnet for code review
We extensively evaluated both OpenAI's GPT and Anthropic's Sonnet model families for code review. While Sonnet models excel at interactive use cases, delivering fast, high-quality responses with lower latency, GPT-5.2 proved superior for our asynchronous code review workload.
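Evaluating a review model typically means scoring its comments against a set of labeled bugs for precision and recall. The standard formulation, as a sketch (not Augment's actual evaluation code):

```python
# Hypothetical scoring of review comments against labeled bugs.
# Precision: how many flagged issues are real; recall: how many
# real bugs get flagged. Not Augment's actual evaluation harness.

def score(flagged: set, labeled_bugs: set) -> dict:
    hits = flagged & labeled_bugs
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(labeled_bugs) if labeled_bugs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: the model leaves 3 comments, 2 of which match the 4 labeled bugs.
print(score({"bug-1", "bug-2", "noise-1"},
            {"bug-1", "bug-2", "bug-3", "bug-4"}))
# precision ≈ 0.67, recall = 0.50, f1 ≈ 0.57
```

A model that follows nuanced prompt instructions faithfully moves both numbers at once: it keeps flagging real bugs (recall) without drowning reviewers in low-signal comments (precision).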
The key difference comes down to optimization targets. Sonnet models are optimized for the interactive case: quick iterations, conversational flow, and rapid feedback loops. They're excellent when a developer is waiting for a response. But code review is fundamentally different. It's asynchronous, doesn't require immediate responses, and benefits from deeper, more exhaustive reasoning.
GPT-5.2 takes more time and makes more tool calls, but produces more thoroughly reasoned analysis. For code review, this trade-off is exactly right. We'd rather wait an extra 30 seconds and catch a subtle concurrency bug than get a fast-but-shallow review that misses cross-system issues.
This isn't a judgment on model quality. Both families are exceptional at what they're optimized for. It's about matching the right model to the right use case. For interactive coding in our IDE, Sonnet's speed advantage matters. For code review, GPT-5.2's deliberate reasoning wins.
The Augment advantage: model agnostic by design
One of Augment's core strengths is that we're model-agnostic.
As models continue to evolve, we can adopt the best tool for each specific job. This pragmatic, use-case-driven approach is how we maintain our benchmark-leading performance while giving developers the best experience at every stage of development.
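In practice, model-agnostic design can be as simple as routing each workload to whichever model currently wins its evaluations. A hypothetical sketch (the identifiers below are illustrative, not Augment's configuration):

```python
# Hypothetical per-workload model routing, for illustration only.
# Model and workload identifiers are assumptions, not Augment's config.

WORKLOAD_MODELS = {
    "code_review": "gpt-5.2",        # asynchronous: favor deep reasoning
    "interactive_coding": "sonnet",  # latency-sensitive: favor speed
}

def pick_model(workload: str) -> str:
    """Route each stage of the development lifecycle to the model
    best suited for it; swapping models is a config change."""
    return WORKLOAD_MODELS[workload]
```

Under a setup like this, an upgrade such as moving code review from GPT-5 to GPT-5.2 is a configuration change rather than an architectural one.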
Given GPT-5.2's strong performance in code review, our team is currently evaluating whether to surface it in the model picker for general code generation use cases. The model's thorough reasoning approach requires some additional tuning for interactive workflows, but we're exploring how to bring its capabilities to more parts of the development lifecycle.
Ready to see GPT-5.2-powered code review in action? Augment Code Review is available now for all Augment Code users. Learn more → or read the full benchmark analysis →

Akshay Utture
Akshay Utture builds intelligent agents that make software development faster, safer, and more reliable. At Augment Code, he leads the engineering behind the company’s AI Code Review agent, bringing research-grade program analysis and modern GenAI techniques together to automate one of the most time-consuming parts of the SDLC. Before Augment, Akshay spent several years at Uber advancing automated code review and repair systems, and conducted research across AWS’s Automated Reasoning Group, Google’s Android Static Analysis team, and UCLA. His work sits at the intersection of AI, software engineering, and programming-language theory.