If you're leading an engineering team at scale, you've probably felt a familiar tension around AI automation. The energy is real. Everyone knows the potential is there. But when you're responsible for production systems that millions of people depend on, the path from experimentation to confident deployment isn't always clear.
I've spent the past few months in conversations with engineering leaders working through exactly this: how do you move from interest and early experiments with AI to automating real-world production use cases across the SDLC?

[Image: AI automation in the SDLC]
In this post, I’ll share the pattern that’s emerging. Teams finding success aren't trying to boil the ocean overnight. They're taking a deliberate approach: identify leverage points, iterate using evaluation datasets, and measure the direct business impact. And it’s an approach you can start to implement in your own development lifecycle today.
How to choose your starting point
The most tractable automation wins we're seeing cluster around a specific part of the development lifecycle: the pull request. Agentic coding tools are now the standard for helping developers craft their PRs. This makes sense when you think about it, because the PR is where developer work becomes visible, measurable, and repeatable. It's where you have clear inputs and outputs, historical data to learn from, and well-defined success criteria. But the rest of the per-merge lifecycle around a PR - the reviews, tests, builds, documentation, and other miscellaneous tasks - is often still manual. Here are a few real-world examples of what we're seeing work.
Code review: the fastest path to impact
Code review is the most obvious example here. It runs on every single pull request, which means high volume and immediate impact. If your team is doing hundreds of PRs a week, even small improvements in review efficiency compound quickly.
When the team at automotive technology company Tekion adopted AI code review, they increased delivery throughput, reduced feedback latency from days to minutes, and dramatically cut human review effort.
- Average time to merge dropped from 3 days 4 hours to 1 day 7 hours, resulting in roughly 60% faster merges
- Time to first human review dropped from 3 days to 1 day
- 21% more merge requests merged with the same number of engineers
Build analyzer: high-value automation when things break
Not every automation needs to run constantly to be valuable. Take build and test failures in CI/CD pipelines. They can be relatively infrequent, but when they happen they consume significant engineering time, often involving multiple changes, long-running builds, and deep institutional knowledge to diagnose. AI-driven build analysis can triage failures by analyzing logs, correlating recent changes, and providing a first-pass root cause analysis. This reduces time to resolution, lowers support burden, and turns institutional knowledge into reusable automation. It's a low-volume use case with outsized ROI.
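To make that concrete, here's a minimal sketch of what a triage step could look like, assuming you're wiring it up yourself: grab the tail of the failing log and the most recent commits, and hand both to a model with a focused prompt. The `call_model` parameter is a placeholder for whatever LLM client your team uses; the function names and log path are illustrative, not a specific tool's API.

```python
import subprocess

def collect_failure_context(log_path: str, max_log_lines: int = 200) -> str:
    """Gather the tail of the failing build log plus the most recent commits."""
    with open(log_path) as f:
        log_tail = "".join(f.readlines()[-max_log_lines:])
    recent_commits = subprocess.run(
        ["git", "log", "--oneline", "-n", "10"],
        capture_output=True, text=True, check=True,
    ).stdout
    return (
        "Recent commits:\n" + recent_commits +
        "\nLast lines of the failing build log:\n" + log_tail
    )

def triage_failure(log_path: str, call_model) -> str:
    """First-pass root cause analysis; call_model is your LLM client of choice."""
    prompt = (
        "A CI build failed. Identify the most likely root cause, the commit(s) "
        "most likely responsible, and a suggested next step.\n\n"
        + collect_failure_context(log_path)
    )
    return call_model(prompt)
```

Even a sketch this small captures the shape of the win: the expensive part for humans is gathering and correlating context, and that's exactly the part you can automate.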
Unit test and doc generation: maintaining quality at scale
Test coverage and quality documentation are two tasks that don’t always fit cleanly into fast-moving developer workflows. Asynchronous automation changes that. By generating tests and documentation in parallel with PR creation, AI can evaluate changes in broader codebase context and apply existing standards consistently. The result is a way to scale quality (and keep your documentation AI-ready) without scaling headcount.
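The key word is "in parallel." As a rough sketch of the asynchronous pattern, the generation jobs run alongside the PR rather than blocking the developer; `generate_tests` and `generate_docs` here are placeholders for whatever agent-backed jobs you wire up, not a specific product API.

```python
from concurrent.futures import ThreadPoolExecutor

def on_pr_opened(diff: str, generate_tests, generate_docs) -> dict:
    """Kick off test and doc generation alongside the PR, not in the developer's way.

    generate_tests(diff) and generate_docs(diff) are placeholder callables for your
    model-backed jobs; their output lands on the PR as suggestions, not as gates.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        tests = pool.submit(generate_tests, diff)
        docs = pool.submit(generate_docs, diff)
        return {"suggested_tests": tests.result(), "doc_updates": docs.result()}
```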
Ticket-to-PR automation: closing the loop from issue to implementation
The gap between issue tracking and code implementation is where context gets lost and velocity slows down. Ticket-to-PR automation closes this loop: it analyzes the requirements in an issue ticket, implements the changes, runs tests, and creates a properly linked pull request, all without a developer babysitting the handoff.
This automation shines for well-scoped work like simple bug fixes, routine maintenance, dependency updates, or features following established patterns. Start with low-risk backlog tasks off the critical path (documentation updates, test additions, simple refactors) and expand as you build confidence. Autonomous code generation has more failure modes than review, so the crawl-walk-run approach pays dividends.
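For a sense of the moving parts, here's a hedged sketch of that loop. The `ticket` dict shape, the `implement_change` coding agent, the `open_pull_request` client for your Git host, and the `pytest` test command are all assumptions for illustration; swap in your own equivalents.

```python
import subprocess

def ticket_to_pr(ticket: dict, implement_change, open_pull_request) -> str:
    """Minimal sketch of a ticket-to-PR loop.

    `ticket` is assumed to have 'id', 'title', and 'description' fields;
    `implement_change` and `open_pull_request` are placeholders for your
    coding agent and your Git host's API client.
    """
    branch = f"auto/{ticket['id']}"
    subprocess.run(["git", "checkout", "-b", branch], check=True)

    # Let the agent turn the ticket's requirements into a concrete change.
    implement_change(ticket["description"])

    # Gate on the existing test suite before anything reaches a human.
    subprocess.run(["pytest", "-q"], check=True)

    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"{ticket['id']}: {ticket['title']}"], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)

    # Link the PR back to the originating ticket so context isn't lost.
    return open_pull_request(
        branch=branch,
        title=ticket["title"],
        body=f"Closes {ticket['id']}\n\n{ticket['description']}",
    )
```

Notice that the test run is a hard gate and the human review still happens on the PR itself: autonomy on the low-risk path, humans on the merge decision.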
The implementation challenge: quality takes iteration
Once you’ve chosen a starting point, standing up an automation is relatively easy. Getting it to deliver consistently high-quality results in production takes discipline.
The teams that move beyond experimentation share three traits: they use historical data to define “good,” they iterate deliberately, and they invest in context.
Start with evaluation datasets
You need examples of what “good” looks like in your specific context. For code review, that might be a corpus of past reviews, including the PRs themselves and the comments that reviewers left. For test generation, it's your existing test suites. For documentation, it's the docs that your team has actually written and maintained.
This historical data serves two purposes:
- It gives you a baseline to evaluate against. You can see whether your automation is catching the same issues that human reviewers caught, or generating tests that look like the ones your team would write.
- It gives you something to iterate on. You're not just guessing whether your prompts are working; you can measure improvement over time.
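If your code lives on GitHub, assembling that corpus can be as simple as the sketch below: pull each historical PR's diff plus the human review comments into a JSONL file. The `review_evals.jsonl` filename and the parameters are illustrative; it uses the public GitHub REST API, so swap in your own Git host's equivalent as needed.

```python
import json
import requests

def build_review_eval_set(owner: str, repo: str, pr_numbers: list[int],
                          token: str, out_path: str = "review_evals.jsonl") -> None:
    """Collect past PR diffs and the human review comments left on them."""
    headers = {"Authorization": f"Bearer {token}",
               "Accept": "application/vnd.github+json"}
    with open(out_path, "w") as out:
        for number in pr_numbers:
            base = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
            # Fetch the raw diff for the PR.
            diff = requests.get(
                base, headers={**headers, "Accept": "application/vnd.github.diff"}
            ).text
            # Fetch the human review comments; these become the "gold" issues.
            comments = requests.get(f"{base}/comments", headers=headers).json()
            out.write(json.dumps({
                "pr": number,
                "diff": diff,
                "gold_comments": [c["body"] for c in comments],
            }) + "\n")
```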
Iterate toward quality
From there, take an iterative approach. Let me give you a concrete example from our code review automation. We started with a corpus of historical pull requests and the review comments that engineers had left on them. That became our gold standard for the kinds of issues we want the AI to catch.
When we ran our first version of the prompt against that dataset, it caught some things and missed others. So we iterated on the prompt, ran it again, and measured improvement. It's this hill-climbing process, where you make a change, re-run against your dataset, and measure the delta, that helps you move AI-assisted engineering from "fine" to "actually valuable."
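The evaluation loop itself doesn't need to be fancy. Here's a hedged sketch that scores a prompt version against the JSONL dataset from the earlier sketch; `run_review(diff)` stands in for your AI reviewer and `comment_matches(ai, gold)` for however you decide an AI comment covers a human one (an LLM judge or a simple similarity check).

```python
import json

def score_review_prompt(run_review, comment_matches,
                        dataset_path: str = "review_evals.jsonl") -> float:
    """Fraction of gold-standard human review comments this prompt version recovers."""
    caught, total = 0, 0
    with open(dataset_path) as f:
        for line in f:
            example = json.loads(line)
            ai_comments = run_review(example["diff"])
            for gold in example["gold_comments"]:
                total += 1
                # Credit the prompt if any AI comment covers this human comment.
                if any(comment_matches(ai, gold) for ai in ai_comments):
                    caught += 1
    return caught / total if total else 0.0
```

Track that number across prompt versions; a change that doesn't move it, or moves it the wrong way, doesn't ship.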
Tilt, a fintech company with a large, monolithic codebase, took this approach even further. They documented engineering knowledge, codified review guidelines, and integrated Augment with their Notion MCP for core concepts and Linear MCP for QA notes. Every rule change was validated with before-and-after PR comparisons. The results were transformative:
- 30% increase in PR velocity
- 40% reduction in merge times
- 23.5% drop in human review comments
Context is the quality multiplier
Context engineering is a piece that teams often underestimate, and it's where having the right infrastructure makes a huge difference. One of the things you have to figure out is what context to bring into your automation: what parts of the codebase are relevant, what metadata matters, what historical information helps the model make better decisions.
For most automation use cases, semantic codebase context is the single most important thing. You need the specific code changes, yes, but you also need the surrounding context, like related files, similar patterns elsewhere in the codebase, architectural decisions that inform how things should be structured. And because the agent is running by itself in these automation scenarios, without a developer in the loop to course-correct, that context becomes even more critical than it is in IDE use cases.
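If you were assembling that context yourself, the shape of the problem looks something like the sketch below: use the diff as a retrieval query, pull back related code, and stay within a budget. `search_codebase(query, k)` is a placeholder for whatever semantic code search you have available and is assumed to return (file path, snippet) pairs.

```python
def assemble_context(diff: str, search_codebase, max_snippets: int = 8) -> str:
    """Sketch of context assembly for an autonomous review, test, or doc agent.

    `search_codebase` is a placeholder for your semantic code search; it is
    assumed to return a list of (file_path, snippet) pairs relevant to the query.
    """
    # Use the changed code itself as the query so related files, similar
    # patterns, and call sites come back alongside the diff.
    snippets = search_codebase(diff, k=max_snippets)
    context_blocks = [f"# {path}\n{snippet}" for path, snippet in snippets]
    return diff + "\n\n--- Related code ---\n" + "\n\n".join(context_blocks)
```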
This is where Augment's context engine becomes a real advantage, like it did for Tilt. Instead of having to architect your own solution for semantic code search, or experiment with different embedding approaches, or try to approximate what context matters, you get that out of the box. It's one less thing to build and maintain, and it's built by people who've spent years thinking about nothing but code context.
Moving from energy to execution
The gap between excitement about AI automation and actually shipping something valuable doesn't have to be as wide as it feels right now. The teams that are making progress aren't doing anything magical; they're just being deliberate about where they start, and they're taking the time to fine-tune.
Pick one leverage point in your PR lifecycle. Build an evaluation dataset from your historical data. Start iterating. Measure your results. Don't try to automate everything at once.
If you're looking for the highest-probability first win, code review is hard to beat. It's high volume, the success criteria are clear, and you can start seeing value immediately while you learn the patterns that will help you expand to other use cases.
Ready to get started? Explore Augment Code Review and see the difference our context engine makes.
Written by

John Edstrom
Director of Engineering
John is a seasoned engineering leader currently redefining how engineering teams work with complex codebases in his role as Director of Engineering at Augment Code. With deep expertise in scaling developer tools and infrastructure, John previously held leadership roles at Patreon, Instagram, and Facebook, where he focused on developer productivity and platform engineering. He holds advanced degrees in computer science and brings a blend of technical leadership and product vision to his writing and work in the engineering community.
