August 7, 2025

GPT-5 is here. And we now have a Model Picker

With the GPT-5 announcement today, we’re excited to bring it into Augment. Starting today and rolling out gradually to everyone, you can choose between two models:

  • Claude Sonnet 4 – still the default
  • OpenAI GPT-5 – now available via the new model picker

This is the first time we’ve run two models side by side in production. It follows weeks of controlled internal testing — the closest head-to-head we’ve ever seen.

Model Comparison: Sonnet 4 vs. GPT-5

We spent the past few weeks testing both models on the same set of coding tasks: single-file edits, multi-file refactors, test generation, and bug fixes across large repositories.

⚠️ Note: These tests used Claude Sonnet 4 with reasoning mode disabled. We’re planning a follow-up comparison with reasoning mode enabled, which may impact relative performance on multi-step or large-context tasks.

| Metric / Dimension | Claude Sonnet 4 | GPT-5 |
|---|---|---|
| Preference rate* | ~44% | ~47% |
| Tie rate | 4% | 4% |
| Single-file edits | More direct; fewer tangential suggestions | Occasionally verbose; more context framing |
| Multi-file changes | Handles well but sometimes misses cross-file dependencies | Stronger cross-file reasoning; better dependency resolution |
| Refactor complexity | Faster on small/mid-size changes | Handles larger changes with more caution and explicit validation |
| Code quality comments | Concise, focused on the main change | More thorough; includes edge-case coverage |
| Failure modes | Occasional under-specification on complex changes | Occasional over-explanation and slower iteration |

*Preference rate indicates how often users preferred one model’s output over the other when shown both responses to the same prompt and code state.
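
For a concrete sense of the metric, here is a minimal sketch, assuming blind A/B votes recorded per comparison, of how a pairwise preference and tie rate like this could be computed. The Python below, including the vote labels and counts, is an illustrative assumption, not Augment’s actual evaluation code or data.

# Illustrative sketch only: computing pairwise preference and tie rates
# from blind A/B votes. The labels and counts are hypothetical and do not
# reflect Augment's actual evaluation data.
from collections import Counter

def preference_rates(votes):
    """votes: iterable of "sonnet", "gpt5", or "tie", one entry per comparison."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("sonnet", "gpt5", "tie")}

# Hypothetical example: five blind comparisons of the same prompt and code state.
votes = ["gpt5", "sonnet", "gpt5", "tie", "sonnet"]
print(preference_rates(votes))  # {'sonnet': 0.4, 'gpt5': 0.4, 'tie': 0.2}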

Observations From Testing

Both models are excellent coding agents within Augment, each with different tradeoffs.

  • Speed vs. Thoroughness – Sonnet 4 returns faster, more direct responses and is more likely to make assumptions in order to finish the task quickly. GPT-5 takes longer and makes more tool calls, but surfaces more detailed reasoning and is more likely to ask clarifying questions when something’s ambiguous.
    • Use Sonnet for: quicker answers, more targeted edits, or when speed and decisiveness matter.
    • Use GPT-5 for: complex debugging, cross-file refactors, or when you want caution, completeness, or thoroughness.
  • Consistency Across Tasks – No single model won outright. Tester preferences clustered clearly: some preferred Sonnet for speed, while others chose GPT-5 for completeness.
    • Use Sonnet for: fast iteration and review cycles.
    • Use GPT-5 for: writing robust code with edge-case handling.
  • Scaling With Context – GPT-5 performed better in large-context scenarios, especially when changes spanned multiple files and required understanding project-wide constraints.

Why We’re Shipping a Picker

We’ve long said we wouldn’t ship a model picker — and for good reason. Our goal has always been to abstract away the complexity of LLMs and let users focus on getting work done, not choosing engines.

However, for the first time, we have two models — Claude Sonnet 4 and GPT-5 — that both deliver high-quality outputs while occupying different points on the latency-vs-quality spectrum. Rather than declaring a “winner,” we’re giving you optionality where it matters:

  • Thoroughness vs. speed – Some users prefer precision and edge-case coverage; others want to iterate quickly. Now you can choose based on the task.
  • Fallback resilience – If one provider experiences latency or quality drift, you can switch models with zero workflow changes.
  • Signal for tuning – Seeing how and when users switch gives us valuable insight for future routing, prompt optimization, and agent behavior.

What’s Next

We’ll be monitoring:

  • Usage distribution between models
  • Task types where GPT-5 adoption spikes
  • Latency trends and failure modes over time

Sonnet 4 remains Augment’s default. GPT-5 is there when you want a different approach. Both will keep evolving, and your feedback will help shape the next round of tuning.

Molisha Shah

GTM and Customer Champion