How does AI change the SDLC at the planning stage specifically?

AI agents automate ticket analysis, task decomposition, and requirements extraction. Human labor shifts to specification authorship, which directs all downstream agents.

What is the biggest risk of AI in the testing stage?

Circular validation is a critical risk because when AI generates both test cases and implementation code, tests can confirm the implementation's assumptions instead of independently verifying behavior. Specification-driven testing addresses this failure mode.

Does AI adoption improve DORA metrics?

AI adoption may improve some throughput measures while increasing delivery instability. DORA's 2025 report associates AI adoption with both effects and finds that teams with loosely coupled architectures benefit more than teams with tightly coupled ones.

What new roles does an agent-native SDLC require?

An agent-native SDLC increases demand for orchestration, oversight, and infrastructure roles because agents take on more execution work. Active hiring signals include Agentic DevOps Engineer (Accenture), Engineering Manager for AgentOps (Scale AI), and Software Engineer for Agent Infrastructure (OpenAI).

Should organizations start AI adoption at the coding stage?

Not necessarily. A Thoughtworks podcast recommends moving rigor upstream to specs, tests, and review of what produced the code, and DORA findings suggest that teams without strong DevOps foundations see more instability as AI adoption increases.

How fast should organizations expect productivity gains?

Organizations should set conservative expectations because productivity outcomes depend on specification quality, review capacity, rollback controls, and the condition of the codebase.

How AI Changes the SDLC: A Six-Stage Guide

The agent-native SDLC moves delivery into agent-orchestrated workflows. Specifications guide autonomous agents, while developers spend more time validating, orchestrating, and overseeing work across the lifecycle.

TL;DR

Most teams adopt AI unevenly across the SDLC, with coding assistance advancing faster than upstream planning and downstream governance. Agents increasingly replace, augment, or restructure work at each stage, raising throughput while exposing stability risk where review boundaries stay undefined. Human oversight grows more important as autonomy expands across the lifecycle.

The Fork Every Engineering Leader Faces

Engineering leaders face uneven AI adoption across all six SDLC stages. Coding assistance is advancing faster than upstream planning and downstream governance. The result is higher throughput in some stages and higher stability risk in others. Forrester's analysis describes uneven adoption across SDLC stages, and the DORA report finds that AI adoption can raise throughput while also increasing change failure rates.

The common constraint across stages is control over multi-file work: codebase-wide context lets agents map dependencies, and explicit review boundaries define where humans must intervene. Architects approve framework and infrastructure choices, QA verifies tests against specs, and release owners hold rollback gates before automation expands into production.

SDLC Stage	Primary Shift
Planning	Specifications become the control plane
Design	More architectural decisions require explicit review
Implementation	Developers move toward orchestration and verification
Testing	Specification governance becomes central
Deployment	Throughput gains require stronger rollback controls
Maintenance	Operations move toward autonomous detection and remediation

Augment Cosmos is the unified cloud agents platform with shared context and memory that compounds across the team and the software development lifecycle. Built on the Context Engine, it coordinates agents across every stage of delivery instead of bolting one agent onto a single step.

Stage 1: Requirements Gathering and Planning

Requirements gathering changes first because autonomous agents need precise inputs. Where developers once consumed specifications, approved specs now direct and constrain downstream execution, so human work centers on requirement quality, ambiguity resolution, and specification ownership. For a detailed framework on specification-driven agent workflows, see spec-driven development tools.

GitHub's open-source Spec Kit positions the specification artifact as the central mechanism connecting human intent to agent execution. A Microsoft guide describes three layered artifacts that organize implementation: requirements capture intent, plans translate it into technical decisions, and task lists break the plan into units agents can implement.
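The three layers can be pictured as linked records. The sketch below is illustrative only: the class names, fields, and ID scheme are assumptions made for this example and do not reflect Spec Kit's actual file formats. It shows how traceability between layers lets a reviewer spot requirements that no task implements or that lack testable acceptance criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """Captures intent, stated so that it can be tested."""
    req_id: str
    statement: str
    acceptance_criteria: list[str] = field(default_factory=list)


@dataclass
class PlanDecision:
    """Translates one or more requirements into a technical decision."""
    decision: str
    satisfies: list[str]  # requirement ids


@dataclass
class Task:
    """A unit of work small enough to hand to an agent."""
    task_id: str
    description: str
    traces_to: list[str]  # requirement ids


def untraced_requirements(requirements, tasks):
    """Return ids of requirements that no task implements."""
    covered = {rid for task in tasks for rid in task.traces_to}
    return [r.req_id for r in requirements if r.req_id not in covered]


def ambiguous_requirements(requirements):
    """Return ids of requirements that have no acceptance criteria."""
    return [r.req_id for r in requirements if not r.acceptance_criteria]


# Hypothetical sample data for illustration.
requirements = [
    Requirement("REQ-1", "Users can reset a password by email",
                ["Reset link expires after 30 minutes"]),
    Requirement("REQ-2", "Failed logins are rate limited"),
]
plan = [PlanDecision("Store reset tokens with a TTL", ["REQ-1"])]
tasks = [Task("T-1", "Add reset-token endpoint", ["REQ-1"])]

print(untraced_requirements(requirements, tasks))  # ['REQ-2']
print(ambiguous_requirements(requirements))        # ['REQ-2']
```

Both checks flag REQ-2, which is the kind of gap a human spec owner would resolve before agents start work.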

Autonomous planning work shifts left because agents can parse unstructured inputs into execution-ready artifacts, but humans still own business intent and ambiguity resolution.

Requirements quality becomes the delivery bottleneck because faster agent implementation exposes planning constraints that human teams previously absorbed later in the lifecycle.

Activity	Agent Role	Human Role
Ticket analysis	Generates structured plans, identifies open questions	Reviews decomposition, validates scope
Requirements extraction	Collects from meetings, emails, documents via NLP	Validates business intent, resolves ambiguities
Effort estimation	Produces estimates with justifications	Evaluates estimates, provides calibrating feedback
Specification authoring	Drafts layered specs from intent	Approves specs as the control plane for downstream agents

A dedicated intent-engineering role emerges because developers increasingly translate ambiguous business goals into testable specifications for agent execution.

Stage 2: Software Design and Architecture

Software design changes when AI agents make framework, infrastructure, and integration choices faster than normal review processes can govern them, which increases the number of architectural decisions that need explicit human review. An arXiv paper frames this as the "vibe architecting" problem: choices made in seconds become architectural decisions even when no one reviews them that way. In three years, agents have moved from line-level autocomplete to system-level scaffolding of entire projects from a single prompt.

Architectural Area	Why Review Matters
Framework selection	Changes long-term implementation constraints
Infrastructure scaffolding	Sets platform and deployment assumptions
Integration wiring	Creates cross-system dependencies that outlast the prompt

Architecture agents add value when they analyze repository patterns and draft decision records, because repository-wide context reduces contradictory design choices in large multi-file codebases. The Context Engine builds semantic understanding of dependencies and architecture across an entire codebase.

Human architects remain necessary because code context rarely contains boundary conditions, quality attributes, and business trade-offs. An arXiv paper on rethinking software engineering states that engineers must articulate boundary conditions, quality attributes, and design trade-offs that generative models cannot infer from context alone. Organizational standards, compliance implications, and business logic remain human-dependent.

Risk-aware architectural gating reduces deployment risk when reviewers treat high-volume code changes as architecture-level decisions. Meta's DRS operates as an AI-driven, risk-aware gatekeeper. During a major partner event in 2024, DRS let teams land more than 10,000 code changes that previously could not have landed during a code freeze, with minimal production impact.
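A risk-aware gate can be sketched as a score plus routing thresholds. The example below is a deliberately simple, hand-weighted heuristic and is not how Meta's DRS works; the `Change` fields, weights, and thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    files_touched: int
    touches_infra: bool     # e.g. deployment or platform configuration
    adds_dependency: bool   # new framework or library
    has_passing_tests: bool


def risk_score(change: Change) -> float:
    """Hypothetical additive score in [0, 1]; the weights are illustrative."""
    score = min(change.files_touched / 50, 1.0) * 0.4
    score += 0.25 if change.touches_infra else 0.0
    score += 0.25 if change.adds_dependency else 0.0
    score += 0.0 if change.has_passing_tests else 0.1
    return round(score, 2)


def gate(change: Change, review_at: float = 0.3, block_at: float = 0.7) -> str:
    """Route a change to auto-land, human architecture review, or a block."""
    score = risk_score(change)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "architecture-review"
    return "auto-land"


print(gate(Change(3, False, False, True)))    # auto-land
print(gate(Change(20, True, False, True)))    # architecture-review
print(gate(Change(60, True, True, False)))    # block
```

The design point is the middle band: changes that touch infrastructure or add dependencies are routed to a human reviewer even when tests pass.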

Architect roles shift toward decision engineering because AI can generate design artifacts faster than organizations can evaluate trade-offs.

Stage 3: Implementation and Development

Implementation becomes agentic when AI systems plan, generate, modify, test, and explain software artifacts across multiple SDLC stages. Developers then spend more time reviewing plans, validating outputs, approving changes, and setting boundaries for autonomous work. Forrester defines Agentic Software Development (ASD) as agents doing this work alongside human developers with a degree of autonomy. ASD gives agents more agency than earlier AI coding tools, spans design through delivery, and targets professional developers in complex codebases.

Multi-agent execution changes implementation because specialized agents can coordinate multi-step repository work while humans retain approval over plans and strategic choices. Augment's Auggie CLI scored 51.80% on SWE-bench Pro in February 2026, the top published result among coding agents at the time, powered by a Context Engine that processes entire codebases across 400,000+ files through semantic dependency graph analysis.

Implementation results improve when teams define plan approval, code review, architecture review, and release gates before they increase agent output. Task-level adoption raises output volume without resolving those bottlenecks.

Developer value bifurcates because AI absorbs code translation work faster than it absorbs business judgment and SDLC redesign work. Forrester's analysis emphasizes that developers spend much of their time on activities beyond coding, including design, testing, bug fixing, and meeting with stakeholders.

New implementation roles emerge because higher agent autonomy increases the need for orchestration, verification, and accountable judgment. An arXiv review identifies the shift from authorship toward orchestration, verification, and accountable judgment in software engineering work, but it does not explicitly name the roles "AI workflow/orchestration engineer" or "AI governance/assurance lead."

Stage 4: Testing and Quality Assurance

Testing becomes agentic when systems observe an application, decide what to test, generate tests, execute them, and report findings with minimal human intervention. QA work then centers on spec quality, risk review, and coverage judgment. The structural difference from AI-assisted testing: the system owns more of the test loop, while engineers still define the requirements and review the results.

Circular validation becomes the core testing risk because AI-generated tests can inherit the same assumptions as AI-generated implementation unless tests derive from stable specifications. Recent sources on generative AI identify hallucinations and inadequate evaluation frameworks among the main risks in AI-assisted workflows. In testing, the risk appears when AI generates both test cases and implementation code: the tests may then confirm the implementation's assumptions instead of verifying behavior against requirements. Specification-driven testing keeps tests aligned with expected behavior and reduces some sources of non-determinism. For tool-specific comparisons, see code review tools.

Self-healing testing reduces locator brittleness because ML models track multiple UI properties instead of a single selector in changing interfaces.
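The idea of matching on several properties at once can be shown without any ML. The sketch below substitutes a fixed weighted comparison for a learned model; the property names, weights, and threshold are assumptions for illustration and do not correspond to any particular testing platform.

```python
def match_score(fingerprint: dict, candidate: dict, weights: dict) -> float:
    """Weighted fraction of saved properties that the candidate still matches."""
    total = sum(weights.values())
    matched = sum(
        weight for key, weight in weights.items()
        if fingerprint.get(key) is not None
        and fingerprint.get(key) == candidate.get(key)
    )
    return matched / total


def locate(fingerprint, candidates, weights, threshold=0.5):
    """Return the best-matching element, or None if nothing is close enough."""
    best = max(candidates,
               key=lambda c: match_score(fingerprint, c, weights),
               default=None)
    if best is None or match_score(fingerprint, best, weights) < threshold:
        return None
    return best


WEIGHTS = {"id": 3, "text": 2, "role": 1, "tag": 1}  # illustrative weights

# Properties recorded when the test was written.
saved = {"id": "submit-btn", "text": "Submit", "role": "button", "tag": "button"}

# Current page: the button's id was renamed in a refactor.
page = [
    {"id": "cancel", "text": "Cancel", "role": "button", "tag": "button"},
    {"id": "send-btn", "text": "Submit", "role": "button", "tag": "button"},
]

found = locate(saved, page, WEIGHTS)
print(found["id"])  # send-btn
```

A single-selector test keyed on `id` would have failed here. The threshold is also where false positives come from, which is why the human role in the table that follows is to monitor them.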

Autonomous testing still requires human oversight because agents can misread requirement gaps, invent features, and report success while failures remain unresolved. A Fowler article documents a Thoughtworks experiment in which the agentic workflow generated features that were not requested, made shifting assumptions around requirement gaps, and declared success even when tests were failing.

Testing Capability	Agent Maturity	Human Role
Self-healing test locators	Production-ready in some modern testing platforms, with growing adoption	Monitor false-positive rates
Test generation from specs	Functional, requires spec quality	Author and maintain specifications
Autonomous test strategy	Still emerging, requires human oversight	Define risk-based strategy, review coverage
Circular validation prevention	Requires architectural controls	Ensure tests derive from specs, not code

Repository-wide analysis gives QA teams visibility into coverage, implementation patterns, and spec artifacts before release. The Context Engine processes codebases of 400,000+ files, with semantic understanding across code, dependencies, architecture, and commit history. Teams comparing test automation options can review agent evaluation tools.

QA roles shift toward specification governance because humans must keep AI-generated tests grounded in requirements. When AI generates both code and tests, QA engineers must verify that test generation reflects requirements instead of the implementation.
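A small example makes the circular-validation failure concrete. In this sketch, the shipping rule, function names, and test cases are all hypothetical. The implementation carries a deliberate boundary bug, and the two checks differ only in where the expected value comes from.

```python
# Spec (hypothetical): orders of 100.00 or more ship free; otherwise 5.00.
SPEC_CASES = [
    {"order_total": 99.99, "expected_shipping": 5.00},
    {"order_total": 100.00, "expected_shipping": 0.00},
    {"order_total": 250.00, "expected_shipping": 0.00},
]


def shipping_cost(order_total: float) -> float:
    """Implementation under test, with a deliberate bug: > instead of >=."""
    return 0.00 if order_total > 100.00 else 5.00


def circular_failures(cases):
    """Anti-pattern: expected values are produced by the code being tested."""
    failures = []
    for case in cases:
        expected = shipping_cost(case["order_total"])  # derived from the code
        if shipping_cost(case["order_total"]) != expected:
            failures.append(case["order_total"])
    return failures


def spec_failures(cases):
    """Expected values come from the specification, independent of the code."""
    return [
        case["order_total"] for case in cases
        if shipping_cost(case["order_total"]) != case["expected_shipping"]
    ]


print(circular_failures(SPEC_CASES))  # []
print(spec_failures(SPEC_CASES))      # [100.0]
```

The circular check can never fail, so it reports a clean run. Only the spec-derived check catches the boundary error at exactly 100.00.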

Stage 5: Deployment, CI/CD, and Release Management

Deployment, CI/CD, and release management change when AI-assisted pipelines accelerate throughput and raise stability risk in tightly coupled systems. Teams need stronger governance and rollback controls as automation expands. In 2025, DORA introduced its inaugural "State of AI-assisted Software Development" report, signaling AI's prominence in its research agenda.

The DORA report finds that AI adoption has a positive relationship with software delivery throughput and product performance but a negative relationship with software delivery stability. Teams working in loosely coupled architectures with fast feedback loops see gains. Teams operating tightly coupled systems with slow processes see little or no benefit.

DORA Metric	Directional Finding
Deployment Frequency	Organizational delivery metrics such as deployment frequency often remain flat or worsen with AI adoption without proper value stream management
Lead Time for Changes	May decrease as AI improves throughput, though AI can also expose downstream weaknesses and instability
Change Failure Rate	Increases with AI adoption
Failed Deployment Recovery Time	Fast rollback can reduce recovery time after failed deployments
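Teams that want to watch these four metrics before and after expanding agent output can compute them from deployment records. The sketch below uses simplified definitions and a hypothetical `Deployment` record; real pipelines would pull these fields from CI/CD and incident tooling, and the sample data is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def deployment_frequency(deploys, window_days):
    """Deployments per day over the observation window."""
    return len(deploys) / window_days


def median_lead_time_hours(deploys):
    """Median hours from commit to production deployment."""
    return median((d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deploys)


def change_failure_rate(deploys):
    """Share of deployments that failed in production."""
    return sum(d.failed for d in deploys) / len(deploys)


def median_recovery_minutes(deploys):
    """Median minutes from a failed deployment to recovery, if any failed."""
    durations = [(d.recovered_at - d.deployed_at).total_seconds() / 60
                 for d in deploys if d.failed and d.recovered_at]
    return median(durations) if durations else None


t0 = datetime(2025, 1, 6, 9, 0)
day, hour, minute = timedelta(days=1), timedelta(hours=1), timedelta(minutes=1)
deploys = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               recovered_at=t0 + day + 4 * hour + 30 * minute),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour, failed=True,
               recovered_at=t0 + 3 * day + 8 * hour + 90 * minute),
]

print(median_lead_time_hours(deploys))   # 5.0
print(change_failure_rate(deploys))      # 0.5
print(median_recovery_minutes(deploys))  # 60.0
```

Tracking change failure rate alongside lead time is what makes the throughput and stability trade-off visible.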

Deployment agents operate inside bounded release systems because prediction, rollback, and misconfiguration detection still depend on platform permissions and governance controls. Augment Cosmos makes those boundaries explicit: teams set the policies for where human judgment is required, and Cosmos enforces them across agent runs.

Rising change failure rates make repository-wide awareness most relevant in deployment work that crosses dependency graphs and cross-file relationships. Teams evaluating supporting infrastructure can compare CI tools.

Stage 6: Maintenance, Monitoring, and Operations

Maintenance, monitoring, and operations change when AI agents detect incidents, suggest remediation, and learn from repeated operational patterns. Human operations work concentrates on exceptions, hardening, and earlier risk reduction.

Operations Area	Agent Contribution	Human Focus
Incident response	Detection and remediation	Exception handling
Repeated failures	Learned remediation patterns	Infrastructure hardening
Postmortems	Support for analysis	Risk management earlier in the lifecycle

Maintenance delivers the highest value when AI reduces comprehension bottlenecks, because understanding code consumes more engineering time than writing it. An arXiv paper from CodeScene and Lund University notes that understanding existing code is a major bottleneck in software development. Program comprehension consumes approximately 70% of developer time. AI-native maintenance shifts debt management from periodic cleanup cycles to continuous hygiene. Code quality, test coverage, documentation, and dependency upgrades become always-on capabilities.

Operational learning compounds over time because agents can convert repeated incident patterns into reusable remediation skills. AWS docs describe AI agents in general as systems that can learn from past interactions.

SRE work moves earlier in the lifecycle because AI improves detection, mitigation, and postmortem support while humans focus on risk management and hardening. Google's SRE guidance describes AI as a way to improve incident detection and investigation: AI enriches alerts with context, shortens initial investigation, and supports root-cause analysis. Google's incident-management guidance also emphasizes immediate postmortems that examine improvements to detection, mitigation, coordination, and communication. Teams evaluating operational support stacks can compare observability tools.

The New Roles Across the Agent-Native SDLC

As agents take on more execution work, engineering teams need people who can orchestrate agent workflows, validate outputs, and define accountability across the lifecycle.

Oversight capacity can become a limiting factor as organizations deploy AI systems at scale, because they must redesign governance, accountability, and performance processes to support human oversight.

Roles with active hiring signals:

Role	Company	Key Requirement
Agentic DevOps Engineer	Accenture	Minimum 1 year with LLMs, agentic frameworks (LangGraph, CrewAI, AutoGen), and prompt engineering/RAG
Engineering Manager, AgentOps	Scale AI	Managing the engineering team and driving technical delivery for the AgentOps team
Software Engineer, Agent Infrastructure	OpenAI	Container orchestration, FastAPI/gRPC APIs, agent training and deployment

These signals come from current job postings at Accenture, Scale AI, and OpenAI.

Skills increasing in value because agent execution raises the premium on oversight, architectural reasoning, and specification quality:

Translating ambiguous business requirements into precise, testable specifications for AI agents
System-level oversight and validation across multiple agent outputs
Architectural skills and business domain knowledge
Agentic framework experience (LangGraph, CrewAI, AutoGen)

Entry-level pipeline risk grows when organizations automate foundational tasks faster than they redesign junior roles around review and validation. Reducing entry-level headcount makes teams top-heavy when experienced engineers absorb the AI-supervision burden, and the pipeline for future senior engineers narrows as the tasks that once built foundational skills are automated.

The Amplifier Effect: Why Your Foundation Determines Your Outcome

DORA's 2025 report characterizes AI's primary role as an amplifier of existing organizational strengths and weaknesses. Disciplined engineering culture lets teams move faster without losing control; weak delivery practices create technical debt at speed.

Architecture reviews, release controls, workflow ownership, and governance gates determine AI returns because tool adoption alone does not remove delivery constraints. Two research streams support the view that AI outcomes depend on the underlying organizational system:

DORA 2025: underlying organizational systems shape AI returns more than tool selection
CMU study of 807 GitHub repositories: AI briefly accelerates code generation, which then returns to baseline rates, while static analysis issues rise approximately 30% and code complexity rises more than 40%

Before scaling agent deployment, CTOs should stress-test their current review and release controls.

The DORA report quantifies two interventions. Transparently addressing job-displacement fears correlates with 125% more team AI adoption, while dedicated learning time during work hours is linked to a 131% increase.

Foundation Question	Why It Matters
Are architecture reviews rigorous enough?	Agents can make implicit architectural decisions faster
Is test coverage strong enough?	Higher code generation volume raises verification demands
Are governance controls explicit?	Throughput gains can coincide with higher change failure rates

Audit One SDLC Stage Before You Scale Agents

Audit one SDLC stage before scaling agents to clarify where agents can act faster and which approvals, rollback checks, and failure signals must stay under human control. Gains remain uneven when organizations add AI to isolated tasks without redesigning review boundaries and ownership.

Choose one stage where adoption is already moving quickly, and work through five decisions:

Identify where agents can act autonomously
Define the human review gates that remain non-negotiable
Decide which artifacts control agent behavior
Decide which approvals stay human
Decide which failure signal to watch first

Written by

Paula Hingel
