
Podcast · April 23, 2026

From FOMO to flow: The AI metrics that actually matter

Engineering leaders are drowning in AI hype. Everyone's hearing promises of 5x, 10x productivity gains, and very few are actually seeing them. Justin Reock, CTO at DX (recently acquired by Atlassian), has the data to cut through the noise. Host Emma Webb and Justin get into what top-performing teams are doing differently, why AI coding tools are only attacking 16% of the problem, and what "AI readiness" really means in practice. They also cover rubric-based self-validation for agentic code, token cost optimization, and the danger of Goodhart's Law in engineering metrics. Plus a hot take: if developer experience is the leading indicator of developer productivity, agent experience is going to be the leading indicator of agent productivity. If you lead an engineering team and you're trying to figure out where AI fits and whether it's working, this one's for you.

Transcript

Emma Webb: I have been so excited to talk to you because I think there's so much anxiety, so much uncertainty right now about what engineering leaders should be doing with AI.

Justin Reock: Absolutely. Everyone's feeling FOMO. Everybody's feeling left behind. No one knows what to measure. No one really knows what kind of impact these tools are having. People keep hearing things like, "We're expecting 2x, 5x productivity gains," and none of that is actually happening. The nice thing about working at DX is that we can point to the data. We have a massive dataset across hundreds of thousands of engineers and hundreds of companies, so we can actually look at where AI is moving the needle. We're about to release a longitudinal study looking at PR velocity across 500 companies between November 2024 and February of this year. I think some people will be disappointed. The median uplift was about 7.5%.

Emma Webb: What's the P90?

Justin Reock: Around 40–60%.

Emma Webb: Still less than I would've expected.

Justin Reock: Less than most people expect. But I think that's because we're still not addressing all the bottlenecks. Engineers only spend about 16% of their time actually writing code. So even if you had a perfect AI that generated flawless code instantly, you're still only improving 16% of the system. That's why I think agent orchestration, workflows, and broader applications of AI are where we'll see larger gains in the future. But the data is still encouraging. A 10–15% productivity improvement is a great outcome. People just need to recalibrate their expectations.
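The 16% point can be made concrete with Amdahl's law, which bounds the whole-system speedup when only part of the work gets faster. The sketch below is an illustration, not DX's methodology: it assumes time saved on coding translates directly into throughput, and the function and constant names are hypothetical. Only the 16% figure comes from the conversation.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: speedup of the whole system when only `fraction`
    of the work becomes `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Share of engineer time spent writing code, as quoted in the episode.
CODING_SHARE = 0.16

if __name__ == "__main__":
    # Assumed scenarios: coding gets 2x faster, 10x faster, or takes no time at all.
    for label, factor in [("2x", 2.0), ("10x", 10.0), ("instant", float("inf"))]:
        gain = overall_speedup(CODING_SHARE, factor)
        print(f"coding {label}: overall {gain:.3f}x")
```

Under these assumptions, even instant code generation tops out at roughly 1.19x overall, and doubling coding speed yields about 1.09x. Larger gains have to come from the other 84% of the work.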

Emma Webb: So you're basically telling engineering leaders to relax.

Justin Reock: Pretty much. A lot of people are panicking because they think other organizations are seeing 200% gains and they're getting left behind. That's not what we're seeing. The strongest performer in our study saw about a 70% improvement. That's exceptional. But it's still not 2x or 5x.

Emma Webb: I want to know what they're doing.

Justin Reock: A lot of it comes down to education, context engineering, AI readiness, platform improvements, code modularity, documentation quality, and accessibility of information. Education has been one of the strongest indicators of success. Not just providing training materials, but giving engineers time to learn, experiment, and fail safely. Across every company and demographic, we saw a learning curve. When organizations first adopted AI tools, productivity and quality often went down. Then things normalized. Eventually, heavy users began outperforming non-users.

Emma Webb: Any examples?

Justin Reock: One of my favorite examples is Zapier. I interviewed the person leading their agent ecosystem. They've built an environment where you can onboard a new agent and have it running in production in just a couple of days.

Their philosophy is simple: If you do something more than once, build an agent for it. Everything from email signature generation to operational tasks is automated. They're seeing about a 15% productivity improvement per engineer. What I love is that they're hiring more engineers than ever because they know each engineer can now generate more value. Imagine a company with no backlog. Imagine responding to customer feedback almost immediately because you have that much engineering capacity. That's the opportunity.

Emma Webb: Two years ago, if someone had told me we'd live in a world with no backlog and bugs getting fixed almost immediately, I'd have thought they were disconnected from reality.

Justin Reock: That's induced demand. It's the same phenomenon we see with roads. You add more lanes and traffic increases. You increase engineering capacity and suddenly you find more valuable things to build.

Emma Webb: One thing we're talking about a lot at Augment is confidence. If I didn't write the code and maybe didn't even review all of it, how can I be confident in what I'm shipping?

Justin Reock: The first thing is measurement. But I think one of the most important emerging patterns is self-validating agent loops. One approach I like is rubric-based validation. You sit down with the model and define what quality means:

* Security
* Reliability
* Input validation
* Maintainability

Then you build a scoring rubric. Once the rubric exists, you have the model generate the code and continuously score itself against the rubric. Then it provides both the output and its self-assessment. Humans still need to review, but now they're reviewing with much more context.
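One way to picture the rubric loop described here is the minimal sketch below. Everything in it is an assumption for illustration: the weights, the threshold, the `Criterion` and `Result` types, and the `fake_generate` / `fake_score` stand-ins, which take the place of real model calls. It is not a description of any DX or Augment product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: int          # relative importance (hypothetical values)
    description: str


# The four criteria named in the conversation; weights are made up.
RUBRIC = [
    Criterion("security", 3, "no injection paths or leaked secrets"),
    Criterion("reliability", 3, "handles timeouts and partial failures"),
    Criterion("input_validation", 2, "rejects malformed or oversized input"),
    Criterion("maintainability", 2, "small functions, clear names, tests"),
]


@dataclass
class Result:
    code: str
    scores: dict[str, float]   # per-criterion self-assessment, 0.0 to 1.0
    overall: float
    attempt: int               # attempt number that produced this result


def weighted_score(scores: dict[str, float], rubric: list[Criterion]) -> float:
    total = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.name] for c in rubric) / total


def generate_with_rubric(
    task: str,
    generate: Callable[[str, list[str]], str],
    score: Callable[[str, list[Criterion]], dict[str, float]],
    rubric: list[Criterion],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Result:
    """Generate, self-score, and retry with feedback on the weakest criterion.
    Returns the best attempt together with its self-assessment."""
    notes: list[str] = []
    best = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, notes)
        scores = score(code, rubric)
        overall = weighted_score(scores, rubric)
        if best is None or overall > best.overall:
            best = Result(code, scores, overall, attempt)
        if overall >= threshold:
            break
        weakest = min(rubric, key=lambda c: scores[c.name])
        notes.append(f"Improve {weakest.name}: {weakest.description}")
    return best


def fake_generate(task: str, notes: list[str]) -> str:
    # Stand-in for a model call: echoes the feedback it was given.
    return f"# task: {task}\n" + "".join(f"# addressed: {n}\n" for n in notes)


def fake_score(code: str, rubric: list[Criterion]) -> dict[str, float]:
    # Stand-in scorer: a criterion scores higher once feedback mentions it.
    return {c.name: 1.0 if c.name in code else 0.5 for c in rubric}


if __name__ == "__main__":
    result = generate_with_rubric(
        "parse an uploaded CSV", fake_generate, fake_score, RUBRIC, max_attempts=4
    )
    print(result.attempt, result.overall, result.scores)
```

The `Result` carries both the output and the per-criterion scores, so a human reviewer sees the self-assessment alongside the code. Passing `generate` and `score` as separate callables leaves room for different models to back each step.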

Emma Webb: Do you prefer one agent doing everything or separate agents?

Justin Reock: If you can afford it, the team-of-experts model is fantastic. Maybe a more creative model helps generate the rubric. Then a lower-temperature reasoning model performs the validation. Different models can specialize in different parts of the workflow.

Emma Webb: I've been thinking about something. Maybe our standards can actually go up now. Historically we'd say, "That test is flaky, but it's probably fine." Do we live in a world where we never lower the bar because we can just have agents keep trying until they get it right?

Justin Reock: I actually think we should raise the bar. Running something a thousand times is expensive. We burn a lot of tokens doing that.

And let's be honest: most of these companies still aren't making meaningful profits on AI. We don't know what pricing structures will look like long-term. So anything we can do today around token optimization, cost optimization, and reducing unnecessary loops is valuable.
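As a rough illustration of reducing unnecessary loops, the sketch below caps a retry loop by both attempt count and token spend. The names (`Attempt`, `RunSummary`, `run_within_budget`) and the budget numbers are hypothetical, and `attempt_once` stands in for whatever actually runs the agent and reports its token usage.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Attempt:
    passed: bool
    tokens_used: int


@dataclass
class RunSummary:
    passed: bool
    attempts: int
    tokens_spent: int


def run_within_budget(
    attempt_once: Callable[[], Attempt],
    token_budget: int,
    max_attempts: int = 5,
) -> RunSummary:
    """Retry until an attempt passes, the attempt cap is hit, or the
    token budget is used up, whichever comes first."""
    spent = 0
    attempts = 0
    # The budget is checked before each attempt, so the last attempt
    # can overshoot the budget by at most its own token usage.
    while attempts < max_attempts and spent < token_budget:
        result = attempt_once()
        attempts += 1
        spent += result.tokens_used
        if result.passed:
            return RunSummary(True, attempts, spent)
    return RunSummary(False, attempts, spent)


if __name__ == "__main__":
    # Assumed scenario: two failed runs, then a pass, 4,000 tokens each.
    outcomes = iter([Attempt(False, 4000), Attempt(False, 4000), Attempt(True, 4000)])
    summary = run_within_budget(lambda: next(outcomes), token_budget=20000)
    print(summary)
```

Recording `tokens_spent` next to the pass/fail outcome makes the cost of each retry visible instead of hiding it behind an eventual green check.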

Emma Webb: You're about to launch something around measuring agents, right?

Justin Reock: Yes. Part of DX has always been collecting signals through surveys. We get incredibly high participation rates from engineers. Now we're doing the same thing with agents. After an agent completes a task, we send it a survey.

Emma Webb: Wait. What?

Justin Reock: I'm completely serious. We're calling it the Agent Experience Index. If developer experience predicts developer productivity, then agent experience will likely predict agent productivity.

And here's the interesting part: What's good for humans is usually good for agents. All the things we've been telling organizations to improve for developer experience over the last decade suddenly matter even more because poor experiences now translate directly into token costs.

Emma Webb: I was struck by how pragmatic you are about costs. Most people aren't talking about that yet.

Justin Reock: Maybe because we're still trying to figure out cloud costs 15 years later. But we have to think about it. There are different philosophies. Some people take a "just ship more" approach. Steve Yegge's Gas Town project is a good example.

His philosophy is basically: Fill as many barrels with fish as possible and don't worry about the fish that fall out. The problem is that it's incredibly expensive.

Emma Webb: I heard there was a token leaderboard at Meta.

Justin Reock: There was. And engineers did exactly what you'd expect. They started gaming it. People would do wildly wasteful things just to climb the leaderboard. Which is why we need to be careful. We shouldn't celebrate token usage for its own sake.

Emma Webb: That sounds like Goodhart's Law.

Justin Reock: Exactly. When a measure becomes a target, it stops being a good measure. There's a great example called the Cobra Effect. A king offered money for dead cobras. People started breeding cobras. When the reward program ended, they released all the cobras. The problem got worse. That's what happens when you optimize a single metric.

Emma Webb: One thing I'm hearing is that traditional developer experience work suddenly matters a lot again.

Justin Reock: Absolutely. And I think the thing people are still missing is the broader system. We spend too much time thinking about code generation and not enough time thinking about the entire software delivery lifecycle. We need to map how value flows through organizations. Not just from ticket to pull request. But from idea to customer value. From ideation all the way to revenue. Once you understand that flow, it becomes obvious where AI can have the biggest impact.

Emma Webb: That sounds like a return to first principles.

Justin Reock: Exactly. This is the moment to revisit first principles. How does value actually move through the organization? How do we apply these tools in ways that align with our culture instead of blindly copying what everyone else is doing? This requires intentionality. Culture doesn't happen automatically. You either intentionally shape it or you inherit whatever grows by default.

Emma Webb: That's interesting because agents are taking over more execution work. Maybe our job becomes defining vision, culture, and intent.

Justin Reock: I think that's right. We're stressing the system. Pressure doesn't create flow. Removing friction creates flow. That's why we need to revisit fundamentals.

Emma Webb: You have a pretty optimistic perspective.

Justin Reock: I do. I think there's never been a better time to be a software engineer.

Emma Webb: That's a hot take.

Justin Reock: Maybe. But I've lived through multiple hype cycles where people claimed we wouldn't need engineers anymore. Every single time, demand for engineers increased. Right now we're in an awkward transition period. There are macroeconomic factors, geopolitical factors, and AI all hitting at once. But I'm talking to engineering managers who are shipping four times as much code as they were a few years ago. They're having fun. They're builders again.

Emma Webb: The infamous METR study found productivity declines, but one thing that stood out was that engineers still felt more productive.

Justin Reock: Exactly. Even when the measured outcomes weren't great, people felt productive. They enjoyed using the tools. And I think that's important. It's fun. It's fun to build with these tools. It's fun to remove friction. It's fun to stay in flow.

So my advice is simple: Have fun. This is the best time to be writing software.

Emma Webb: Justin, thank you so much. This was a delight.

Justin Reock: This was wonderful. Thanks for having me.
