Everyone in our industry is talking about AI and coding. How many lines can agents write? How many PRs can they generate? How fast can they ship? But if you pop out one level, the question that matters is different: what does coding quickly actually give us?
The rate of change in AI right now is so high that I find myself constantly comparing notes with other engineering leaders just to keep up. Lately, I've been talking with CTOs and VPs of Engineering about their AI-native transformations. Their teams are deep into adoption. Agents are generating a significant and growing chunk of their code. But the word that keeps coming up in these conversations isn't velocity or productivity. It's confidence.
Every one of these leaders is wrestling with the same question: how do you build confidence that the code agents produce actually does what it's supposed to do? And across the board, this problem is largely unsolved.
At Augment, we committed earlier this year to journaling and sharing our own AI-native journey as we navigate it. Not a polished narrative after the fact, but the raw learnings alongside the ups and downs. This post is part of that series. And the confidence question is one we've been living with ourselves.
The real complexity was never about coding
I've been rereading a paper by Fred Brooks called "No Silver Bullet" that was written in 1986 and has become freshly relevant. Brooks separated software complexity into two categories. "Accidental complexity" is the mechanics of translating a known solution into working code. Syntax, boilerplate, remembering APIs, wiring things together. "Essential complexity" is everything else. Figuring out what to build. How to design it. How it fails. How it evolves. How humans maintain it over time.
His argument was that most of the real difficulty in software lives in the essential complexity, and no single tool would ever eliminate it. That was true in 1986. It's even more true now.
AI agents are rapidly eating the accidental complexity. They're getting extraordinarily good at taking a well-defined problem and producing working code. But the essential complexity? That's still fully ours.
This connects to something I keep coming back to: Goldratt's Theory of Constraints. The throughput of any system is determined by its tightest constraint. For decades, that constraint was writing code. AI agents are removing it. But that means the constraint has shifted. And if you don't identify the new constraint, you end up optimizing the wrong thing.

The new constraint is everything agents can't do: verifying that the product behavior is correct, writing specs that are clear enough to constrain the solution space, making architectural judgments about whether a technically correct change is the right one for your system. The work that was always underneath the code is now the bottleneck.
Software engineering was always much more than coding. Now we can finally see it.
What's actually at risk
About three months ago at Augment, we pushed hard on doing everything with agents. We quickly found that the more code our agents generated, the less confident we were in what was shipping. We had to fundamentally course-correct.
Here's what we learned, and what I'm hearing echoed across the industry.
Agents start generating more code. PR volume goes up. Reviews become the new bottleneck, and teams feel pressured to rubber-stamp them. Everyone celebrates the velocity. But underneath that velocity, something is eroding.
Coding used to be how engineers built system intuition. You'd discover edge cases while writing. You'd feel the friction of a bad abstraction in your hands. When agents do the writing, that understanding doesn't form the same way.
This raises a set of questions every engineering leader should be sitting with right now:
- What happens when the team's collective understanding of the codebase has quietly eroded, and nobody realizes it until something breaks?
- Who debugs issues when nobody on the team wrote the code? How do you even know what "good" looks like when you can't trace how you got here?
The real challenge is that these effects don't show up immediately. By the time you notice, the gap between what's in production and what your team understands is already wide.
This is what makes confidence the central challenge. Not just "does the code work?" but "does our team understand it sufficiently to build on it and make the right tradeoffs on an ongoing basis?"
Where human judgment matters most
Coming out of our course-correction, we took time to pause and deeply reflect with our team at large. We converged on three clear leverage points.
Specs and spec quality. If the constraint has moved upstream to the "what" and "why," then the quality of specifications becomes the highest-leverage input we have. What does a great spec look like in an agent-first world? How do you write one that actually constrains the solution space rather than leaving agents to guess? Our team arrived at a striking conclusion: spec reviews may become more important than code reviews. One group reframed the whole aspiration. Ticket to PR is the wrong goal. It should be intent to production. This is a big part of why we built Intent, a workspace designed for spec-driven, agent-first development.
Code review, reimagined. Code reviews, when done well, have always served two purposes. First, they ensure the code being shipped is high quality and safe. Second, they help the team understand how the codebase is evolving.
Code review solutions on the market today can help with the first purpose. We've invested in comprehensive automated code review that leverages our context engine's deep understanding of the full codebase to catch quality issues, security concerns, and deviations from established patterns with a thoroughness that's hard to match manually at scale. Each PR gets flagged as low, medium, or high risk.
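To make the risk-tiered workflow concrete, here is a minimal sketch of how flagged PRs might be routed. Everything here is hypothetical: the risk signals, thresholds, and function names are illustrative, not Augment's actual implementation.

```python
# Hypothetical sketch: route a PR based on an automated risk flag.
# Signal names and thresholds are illustrative only.

RISK_LOW, RISK_MEDIUM, RISK_HIGH = "low", "medium", "high"

def assess_risk(pr):
    """Toy heuristic: score a PR by the kinds of signals an automated
    reviewer might surface, then bucket it into low/medium/high."""
    score = 0
    score += 2 if pr.get("touches_auth_or_payments") else 0
    score += 1 if pr.get("lines_changed", 0) > 500 else 0
    score += 1 if pr.get("new_external_dependency") else 0
    if score >= 3:
        return RISK_HIGH
    if score >= 1:
        return RISK_MEDIUM
    return RISK_LOW

def route_review(pr):
    """Low-risk PRs ship with the automated review alone; medium- and
    high-risk PRs go to a guided, human-in-the-loop review."""
    if assess_risk(pr) == RISK_LOW:
        return "auto-review only"
    return "guided human review"
```

The point of the sketch is the routing decision, not the scoring: the automated pass handles volume, and human attention is reserved for the changes where it matters.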
But what about the second purpose? That's the harder problem. We've been experimenting with a new workflow. For medium- and high-risk PRs, we invite the engineer to a different kind of experience in our Agentic SDLC platform. The agent guides them through understanding how the PR is evolving the codebase. Not by reading the diff line by line, but by interacting with the agent to understand the key ways the system is changing. That's a fundamentally different experience than traditional code review, and it's how we think teams will maintain system understanding in an agent-first world.
Validation. This is the one that comes up the most in my conversations with engineering leaders. Can we build confidence that the system does what we intended, end to end? Agents can generate code that compiles and passes unit tests but misses the bigger picture entirely. You need a system that allows your agents to run end-to-end tests against your actual infrastructure for every change. Not "does it compile." Not "do the unit tests pass." Does it actually do the right thing? That's the bar, and most teams don't have the infrastructure to clear it today.
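The bar described above can be sketched as a simple validation gate: a change only passes if it clears compile, unit tests, and an end-to-end behavior check, in that order. This is a minimal illustration under assumed names (`validate_change`, the stage labels), not any specific product's API.

```python
# Hypothetical sketch of a validation gate for agent-generated changes.
# Each check is a callable returning True/False; the stage names and
# structure are illustrative.

def validate_change(checks):
    """Run validation stages in order and report the first failure.
    The 'e2e' stage is the one most teams lack today: does the change
    do the right thing against real infrastructure, not just compile
    and pass unit tests?"""
    for stage in ("compile", "unit_tests", "e2e"):
        check = checks.get(stage)
        if check is None or not check():
            return (False, stage)
    return (True, None)

# A change that compiles and passes unit tests but fails the
# end-to-end behavior check is still rejected.
result = validate_change({
    "compile": lambda: True,
    "unit_tests": lambda: True,
    "e2e": lambda: False,  # behavior is wrong despite green unit tests
})
```

The ordering encodes the argument in the paragraph above: "does it compile" and "do the unit tests pass" are necessary but cheap; the expensive, end-to-end check is what actually earns confidence.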
Today's tooling isn't built for this
The tooling we have today was designed for a world where humans wrote the code and needed help along the way.
We're entering a world where agents write the code and humans need to steer, verify, and build confidence. That requires fundamentally different infrastructure and different tools.
We've been feeling that pain firsthand. And we've been building a platform that makes this new way of working natural, where human judgment is applied at the moments it matters most and the system keeps learning and improving over time.
We've been kicking the tires on this platform inside Augment with a lot of positive early feedback, and we're starting to share it with a few alpha customers. If this resonates with the challenges your team is facing, we'd love to hear from you.
The learning journey continues
What I believe more and more is this: organizations don't actually want to generate more code. They want to go from intent to production with confidence. That's the real job. And figuring out how to do it well is a software engineering challenge, not a coding challenge.
The engineers who will thrive in this world are the ones who were always strong at the engineering underneath the code. The judgment, the taste, the rigor. We wrote recently about the dimensions that matter most now and how we're rethinking what great engineering looks like. The craft is moving up the abstraction stack, and that's an elevation, not a demotion.
If you've had a moment recently where the engineering underneath the code mattered more than the code itself, I'd love to hear about it. What's the new constraint in your team? Where is human judgment creating the most leverage?
We're all on this learning journey. Let's navigate it together.
This is part of an ongoing series about Augment's AI-native transformation. Previous posts: We're in an exponential, The hardest part about going AI-native isn't just technical, and How we hire AI-native engineers now.
Written by

Vinay Perneti
VP of Engineering
As VP of Engineering, Vinay supports product, research, and engineering teams building AI agents that truly understand large, complex codebases. Before joining Augment, Vinay led product and platform organizations at Meta and Pure Storage. He's drawn to problems that live at the intersection of technology and people, like how teams evolve, how AI reshapes the craft of software engineering, and what it takes to build things that evoke delight.