How do most organizations currently score on agentic engineering maturity?

Most organizations operate at Level 1 or Level 2. Market surveys suggest deployment of AI agents remains limited, and the article's maturity model reflects that most teams have not yet progressed to orchestrated, memory-backed, or governed autonomous workflows.

What is the single highest-ROI maturity transition?

Empirical studies show that organizations in the first two AI maturity stages underperform their industry financially, while organizations in the last two stages outperform their industry peers. MIT CISR's research places the crossover at the Stage 2 to Stage 3 boundary and identifies that move as the highest-impact transition. Reaching it requires changing how work flows through the team, beyond adopting new tools.

Should every organization aim for Level 5?

No. CMU SEI's AI Adoption Maturity Model is built so that organizations choose what level of maturity to pursue, scoped to a business unit, an organization, or the full enterprise. Pushing toward Level 5 without the governance, data, and organizational foundations creates failure modes.

How do teams avoid overestimating their maturity level?

Replace subjective self-assessment with objective delivery metrics. Track DORA's five metrics and report throughput separately from instability. The instability metrics reveal whether AI adoption is creating downstream disorder that throughput-only measurement misses.

What role does data readiness play in maturity advancement?

Data readiness is a non-negotiable prerequisite at every level. MIT Sloan research states that no amount of algorithmic sophistication will overcome a lack of data. CMU SEI highlights foundational gaps in areas such as data and AI lifecycle management, governance, architectural standards, monitoring, security, and workforce training. Those gaps can lead to fragmented AI adoption, operational risk, and unsustainable implementations. Start closing them at Level 1 rather than deferring the work to later levels.

Agentic Engineering Maturity Model: 5-Level Self-Assessment

Agentic engineering maturity is a five-level framework for assessing how systematically engineering organizations adopt AI agents.

TL;DR

Engineering teams often confuse individual AI tool use with organizational agent maturity. Conventional adoption metrics miss the instability that comes with faster shipping. MIT CISR and CMU SEI have published different maturity frameworks, and MIT CISR reports that the greatest financial impact in its enterprise AI maturity framework comes from moving from Stage 2 to Stage 3.

Why CTOs Need a Maturity Baseline Before Scaling Agents

Engineering leaders face a practical frustration: AI use is spreading faster than organizations can measure, govern, or operationalize it. Developers report productivity gains, but leadership still needs a reliable way to tell whether teams share prompts, version agent instructions, capture audit trails, assign owners, and measure delivery outcomes. DORA 2025 research shows AI adoption can increase both throughput and instability.

The words assistant and agent often get used interchangeably, which makes maturity hard to judge by labels alone. A structured maturity model instead maps agent adoption to observable practices. The five levels below synthesize concepts from MIT CISR, CMU SEI, Microsoft, Gartner, and DORA. Together, they form a self-assessment framework for engineering organizations adopting agentic workflows. Teams evaluating supporting infrastructure often compare categories such as AI coding tools, CI tools, and observability platforms before connecting agents to their pipelines and review process.

The product examples below use Augment Cosmos, a unified cloud agents platform that gives agents shared context and memory across the software development lifecycle.

Level 1: Ad-Hoc (Individual Agents, No Shared Patterns)

Level 1 describes organizations where agent use remains personal instead of institutional. Engineers experiment in isolation, so the organization gets weak reproducibility, weak governance, and little shared learning.

Level 1 operating patterns keep agent use isolated to individual experiments

At this stage, agent activity rarely leaves an auditable path that another engineer or team can reuse. Teams usually show this pattern through experiments, ownership, and review:

AI experiments run in notebooks with no versioning, no pipeline automation, and no CI/CD integration
No centralized model registry or feature store exists
AI initiatives depend on individual champions rather than team-level practices
No documented evaluation process determines which workflows benefit from AI
No human-in-the-loop review process governs AI outputs

At Level 1, the first practical need is capture. Augment Cosmos Sessions capture each run as an audit trail that teams can replay, so the organization can reuse workflow knowledge instead of leaving expertise trapped in individual configs.

Level 1 underperforms financially because isolated experimentation does not compound

MIT CISR's 2022 survey of 721 companies found that enterprises in the first two stages of AI maturity had financial performance below industry average, while enterprises in Stages 3 and 4 performed well above. In that research, 28% of enterprises were in the first stage.

Dimension	Level 1 Signal
Agent configuration	Per-engineer, no version control
Knowledge sharing	Trapped in individual prompts and configs
Governance	No AI-specific policies
Measurement	No tracking of AI's effect on delivery metrics
Organizational design	No shared understanding of what AI can and cannot do

Level 2: Standardized (AGENTS.md, Shared Rules)

Level 2 introduces shared defaults that make agent behavior more consistent across a team. Version-controlled guidance gives teams repeatability before they add orchestration and shared learning.

AGENTS.md standardizes agent behavior through persistent repository guidance

A repo-level AGENTS.md file gives AI coding agents persistent, project-specific operational guidance before any work begins. OpenAI's Codex documentation describes the core behavior: Codex reads AGENTS.md files before doing any work. By layering global guidance with project-specific overrides, each task starts with consistent expectations regardless of which repository is opened.

The file's hierarchical discovery system enables layered standardization across global, project, and subdirectory scopes:

Scope	Location	Audience
Personal global defaults	~/.codex/AGENTS.md	Individual engineer, applies across all repos
Team standards	AGENTS.md at repo root	Version-controlled, shared across all contributors
Module-specific overrides	Subdirectory AGENTS.md files	Service or module teams
Directory-level overrides	AGENTS.override.md	Optional higher-priority file; when present in a directory, takes precedence over AGENTS.md for that directory
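
The layering in the table can be sketched as a walk from the broadest scope to the most specific one. This is a minimal illustration, not Codex's actual implementation: the function name `resolve_guidance`, the explicit `global_dir` parameter, and the rule that an override file replaces `AGENTS.md` only within its own directory are simplifying assumptions taken from the table.

```python
from pathlib import Path


def resolve_guidance(repo_root: Path, workdir: Path, global_dir: Path) -> list[Path]:
    """Return instruction files ordered from broadest to most specific scope.

    Hypothetical helper: real tools may apply size limits, fallback file
    names, or different merge rules.
    """
    repo_root, workdir = repo_root.resolve(), workdir.resolve()

    # Build the chain of directories from the repo root down to workdir.
    # relative_to raises ValueError if workdir is outside the repository.
    dirs = [repo_root]
    for part in workdir.relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)

    found: list[Path] = []
    for directory in [global_dir, *dirs]:
        override = directory / "AGENTS.override.md"
        default = directory / "AGENTS.md"
        # Assumption: an override wins over AGENTS.md in the same directory.
        if override.is_file():
            found.append(override)
        elif default.is_file():
            found.append(default)
    return found
```

An agent runner following this sketch would concatenate the returned files in order, so later and more specific guidance can refine earlier and broader guidance.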

Before AGENTS.md, major coding tools each relied on their own instruction files: GitHub Copilot reads .github/copilot-instructions.md, while Cursor moved from a legacy .cursorrules file to .cursor/rules/ with per-rule .mdc files. This fragmentation forced organizations to maintain engineering standards in multiple locations. AGENTS.md addresses that fragmentation.

Advancing beyond Level 2 requires process change that creates coordinated workflows

The next step is to turn shared standards into coordinated workflows and organizational learning.

MIT CISR identifies the Stage 2 to Stage 3 transition as the highest-value transition in enterprise AI maturity. Financial performance crosses from below-average to above-average at this boundary. MIT Sloan research specifies that organizations making this transition intentionally change processes, broadly and deeply, to facilitate organizational learning with AI.

Dimension	Level 2: Standardized	Level 3: Orchestrated
Primary mechanism	Version-controlled guidance	Orchestration across identities, triggers, and pipelines
Core operating pattern	Shared defaults through AGENTS.md	Coordinated execution across agents, systems, and delivery workflows
Coordination model	Consistent repository guidance	Multi-agent coordination through intent
Delivery integration	Repeatability that can later support orchestration	CI/CD-integrated execution with cross-system identity and ownership
Main organizational outcome	More consistent agent behavior across a team	Teams can see which agent ran, what triggered it, what systems it touched, and who owns the result

Level 3: Orchestrated (Multi-Agent, Spec-Driven, CI/CD)

Level 3 shifts from shared rules to coordinated execution across agents, systems, and delivery workflows. At this level, teams need traceable agent identities, event triggers, pipeline integration, and ownership paths for non-human execution.

Level 3 orchestration coordinates agents through parallelism, visibility, and intent

Traditional workflow orchestration follows deterministic paths. Agent orchestration involves non-deterministic routing decisions made by the agents themselves, which makes failures harder to trace and test than failures in conventional CI pipelines. Teams comparing implementation paths often evaluate workflow orchestration platforms and autonomous agents when they move beyond isolated experimentation.

Level 3 CI/CD integration becomes the bottleneck across identities, triggers, and controls

CI/CD integration becomes the practical bottleneck at this level because identity, ownership, and triggering all become cross-system concerns. The organization needs non-human execution paths that fit existing build, test, review, and deployment controls.

A specific operational problem at Level 3 is pipeline authentication through individual user accounts. Token lifecycle and ownership become unmanageable at scale. Without centralized identity management, running 10 agents across 20 tools means 200 separate OAuth flows (10 × 20), because every agent authenticates to every tool on its own.
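
The arithmetic behind that figure is a simple product. The sketch below contrasts it with one assumed alternative, in which each agent and each tool registers once with a central identity broker. The `brokered_flows` model is a hypothetical illustration of why centralization changes the growth rate, not a description of any specific product.

```python
def pairwise_flows(agents: int, tools: int) -> int:
    """Every agent authenticates to every tool separately."""
    return agents * tools


def brokered_flows(agents: int, tools: int) -> int:
    """Assumed alternative: each agent and each tool registers once with a broker."""
    return agents + tools


print(pairwise_flows(10, 20))  # 200
print(brokered_flows(10, 20))  # 30
```

Under the pairwise model, adding one more agent adds 20 flows. Under the assumed brokered model, it adds one.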

For teams using Augment Cosmos, Service Accounts address that identity and ownership problem: service-account execution gives non-human runs one ownership model. Connected systems supply triggers from tickets, incident alerts, and PR submissions, and teams do not rewire new agents into the stack because the platform already connects to the build, test, review, and deployment pipeline.
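
A generic version of that pattern, independent of any platform, can be sketched as an event router that records which agent ran, what triggered it, which systems it touched, and who owns the result. All names here (`Route`, `RunRecord`, `TriggerRouter`, the event type strings) are hypothetical and do not represent the Augment Cosmos API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Route:
    agent: str
    service_account: str        # non-human identity the run executes under
    owner: str                  # team accountable for the result
    systems: tuple[str, ...]    # systems the agent is allowed to touch


@dataclass(frozen=True)
class RunRecord:
    trigger: str
    route: Route
    started_at: datetime


@dataclass
class TriggerRouter:
    routes: dict[str, Route] = field(default_factory=dict)
    log: list[RunRecord] = field(default_factory=list)

    def register(self, event_type: str, route: Route) -> None:
        self.routes[event_type] = route

    def dispatch(self, event_type: str) -> Optional[RunRecord]:
        route = self.routes.get(event_type)
        if route is None:
            return None  # unrouted events start no agent run
        record = RunRecord(event_type, route, datetime.now(timezone.utc))
        self.log.append(record)  # every run leaves an auditable entry
        return record
```

Because each record carries the service account and the owner, the audit log answers ownership questions without tracing a token back to an individual engineer.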

Level 3 governance detects failure patterns before faster delivery increases incidents

Level 3 governance stays close to delivery work.

CMU SEI's AI Adoption Maturity Model, released with Accenture in 2026, defines capability areas that organizations build as they mature, including experimentation, responsible AI, and AI architecting.

DORA's research identifies the delivery risk introduced earlier: AI adoption can raise throughput while increasing instability. DORA tracks the deployment rework rate metric as one of its instability measures. Organizations that track throughput and stability separately can identify emerging delivery risks before they surface as incidents.
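
Separating the two signals can be as simple as reporting them side by side from deployment records. The following is a minimal sketch with simplified definitions. The `Deployment` fields and the `delivery_report` function are assumptions for illustration, and only one throughput measure and two instability measures are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    failed: bool             # needed remediation after release
    unplanned_rework: bool   # shipped to fix a problem from an earlier deploy


def delivery_report(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Report throughput and instability as separate numbers."""
    total = len(deploys)
    if total == 0 or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    return {
        # Throughput: how often the team ships.
        "deploys_per_day": total / window_days,
        # Instability: how often shipping causes follow-up work.
        "change_fail_rate": sum(d.failed for d in deploys) / total,
        "rework_rate": sum(d.unplanned_rework for d in deploys) / total,
    }
```

A team whose `deploys_per_day` rises after adopting agents while `rework_rate` rises with it is shipping faster and less stably at the same time, which a throughput-only dashboard would report as pure improvement.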

With Augment Cosmos, teams use shared context and reusable records across workflows, so agents draw on previous runs instead of starting from scratch.

Level 4: Systematic (Shared Memory, Experts, Learning Flywheel)

Level 4 turns repeated agent activity into accumulated team capability. Corrections, context, and specialization persist across sessions, so each run can draw on prior work.

Shared memory turns repeated agent sessions into accumulated organizational context

Without project memory, every agent session starts cold. Teams must reconstruct institutional knowledge from scratch, including architectural decisions, prior debugging approaches, and team conventions. The CoALA (Cognitive Architectures for Language Agents) paper maps four memory types to agent implementations:

Memory Type	What Is Stored	Agent Example
Working	Current goals, intermediate reasoning	Active context window
Procedural	Rules determining behavior	AGENTS.md, system prompts
Semantic	Facts about the world	Facts about a user or codebase
Episodic	Sequences of past behaviors	Past agent actions, prior debugging approaches
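
The four memory types in the table can be sketched as one small data structure. This is an illustrative layout under stated assumptions, not the CoALA reference design. The `AgentMemory` class, its method names, and the flat string storage are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    working: dict[str, str] = field(default_factory=dict)    # current goals, intermediate reasoning
    procedural: list[str] = field(default_factory=list)      # rules, e.g. lines from AGENTS.md
    semantic: dict[str, str] = field(default_factory=dict)   # facts about a user or codebase
    episodic: list[str] = field(default_factory=list)        # past actions, oldest first

    def build_context(self, goal: str, max_episodes: int = 3) -> str:
        """Assemble a prompt context from all four memory types."""
        self.working["goal"] = goal
        # Guard against episodic[-0:], which would return the whole list.
        recent = self.episodic[-max_episodes:] if max_episodes > 0 else []
        lines = [f"RULE: {rule}" for rule in self.procedural]
        lines += [f"FACT: {key} = {value}" for key, value in self.semantic.items()]
        lines += [f"PAST: {episode}" for episode in recent]
        lines.append(f"GOAL: {goal}")
        return "\n".join(lines)

    def end_session(self, summary: str) -> None:
        """Record what happened, then clear only working memory."""
        self.episodic.append(summary)
        self.working.clear()
```

The point of the sketch is the asymmetry in `end_session`: working memory is discarded, while procedural, semantic, and episodic memory survive, so the next session does not start cold.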

Augment Cosmos's organizational knowledge layer addresses this memory problem at team scope. Organization-level shared memory persists context across sessions and across the team rather than scoping it per engineer or per repository. That reduces repeated context reconstruction in shared cross-session agent workflows.

Specialized agents create durable domain competence through narrow scope and persistent corrections

Specialization at Level 4 separates general agent access from durable domain competence. Teams narrow scope, preserve corrections, and reuse learned behavior across the team.

The Agentic Software Engineering: Foundational Pillars paper presents the Structured Agentic Software Engineering (SASE) vision and outlines several foundational pillars for the future of software engineering.

Augment Cosmos Experts fit this narrow-scope pattern. Each expert combines narrow task scope, shared memory that persists across runs, and coaching-based feedback that distills important information into stored knowledge.

The learning flywheel compounds agent performance through stored corrections across sessions

The learning flywheel converts one corrected run into a better future run. The outcome is compounding improvement across future sessions.

The learning flywheel follows a four-stage architecture:

Execute
Coach
Distill
Improve

The sequence turns corrected runs into stored improvements that carry into future sessions.
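
The four stages can be sketched as a loop around any agent call. This is a hypothetical outline of the pattern, not the Augment Cosmos implementation. The `Flywheel` class, the `execute` and `coach` callables, and the whitespace-only `distill` step are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def distill(correction: str) -> str:
    """Reduce a free-form correction to a reusable lesson (here: whitespace cleanup only)."""
    return " ".join(correction.split())


@dataclass
class Flywheel:
    lessons: list[str] = field(default_factory=list)

    def run(
        self,
        task: str,
        execute: Callable[[str, list[str]], str],
        coach: Callable[[str], Optional[str]],
    ) -> str:
        output = execute(task, list(self.lessons))  # Execute with stored lessons
        correction = coach(output)                  # Coach: reviewer feedback, if any
        if correction:
            lesson = distill(correction)            # Distill into a short lesson
            if lesson not in self.lessons:
                self.lessons.append(lesson)         # Improve: the next run sees it
        return output
```

In practice `execute` would wrap an agent invocation and `coach` would wrap human review. The stored `lessons` list is what makes a correction from one session available to the next.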

Augment Cosmos's learning flywheel applies that sequence to coaching-based agent improvement. The flywheel distills each corrected session into stored knowledge, so those corrections carry into future runs instead of disappearing between executions.

Level 5: Autonomous (Agents Execute, Humans Steer, Knowledge Compounds)

Level 5 keeps humans responsible for oversight while agents execute more work inside governed boundaries. Human-on-the-loop oversight becomes the operating model for increasingly capable workflows, and governance risk concentrates as agent scope expands.

Level 5 operating models expand agent execution while keeping humans in control

The governance shift at Level 5 moves from human-in-the-loop to human-on-the-loop. Forrester's AEGIS Framework, a security model that defines 39 controls across six domains for securing enterprise AI agents, treats human oversight as a core requirement at this stage.

Augment Cosmos human-in-the-loop controls enforce policy-based approval boundaries at the handoffs where teams require human judgment. Teams set the policies for where human judgment is required, and Cosmos enforces them.
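
A policy-based approval boundary can be sketched as a routing function over change attributes. The three outcomes mirror the tiers in the self-assessment material later in this article (auto-approval, automated correctness analysis, human pull-in). The `Change` fields, the 20-line threshold, and the `route` function are hypothetical and would be set by each team's own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    AUTOMATED_ANALYSIS = "automated_analysis"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Change:
    lines_changed: int
    touches_protected_path: bool   # e.g. auth, billing, or infrastructure code
    tests_passed: bool


def route(change: Change, low_risk_max_lines: int = 20) -> Decision:
    """Decide how much oversight a change needs under an assumed policy."""
    # Judgment calls: protected areas or failing tests always reach a human.
    if change.touches_protected_path or not change.tests_passed:
        return Decision.HUMAN_REVIEW
    # Low risk: small, passing changes outside protected paths.
    if change.lines_changed <= low_risk_max_lines:
        return Decision.AUTO_APPROVE
    # Medium risk: larger changes get automated correctness analysis first.
    return Decision.AUTOMATED_ANALYSIS
```

Keeping the policy in one reviewable function makes the approval boundary itself something the team can version, test, and audit.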

Level 5 knowledge compounding reinforces human and machine learning together

Knowledge compounding means corrected agent behavior, stored context, and human feedback keep accumulating across workflows. At this stage, the execute, coach, distill, improve pattern becomes an operating model in which human and machine learning reinforce each other.

MIT Sloan describes a centralized AI structure in which a global data science and analytics team builds enterprise AI capabilities and works with business units and centers of excellence to scale and operate AI solutions.

Level 5 autonomous execution concentrates governance risk when institutional understanding lags

Level 5 risk concentration comes from autonomous execution outpacing institutional understanding. The failure mode includes bad output and the loss of knowledge needed to diagnose and correct it.

Self-Assessment Matrix

The self-assessment matrix turns the five maturity levels into observable operating signals. Use it to compare current agent practices across configuration, knowledge sharing, governance, measurement, and organizational design rather than relying on perception alone.

Dimension	Level 1	Level 2	Level 3	Level 4	Level 5
Agent config	Per-engineer	Repo-root AGENTS.md	CI/CD-integrated, service accounts	Expert-scoped, memory-backed	Self-improving, org-wide
Knowledge sharing	Individual prompts	Shared rules files	Spec-driven coordination	Cross-agent shared memory	Compounding knowledge base
Governance	None	Informal policies	Documented AI-specific policies	Named agent owner with accountability	Tiered auto-approval, human pull-in for judgment
Measurement	No AI metrics	Ad-hoc tracking	DORA metrics with throughput/stability split	Agent behavior audit trails	Strategic dashboards, risk-aware merge policies
Organizational design	Individual champions	Small specialist group	AI director with cross-functional authority	Cross-functional transformation squads	Bifurcated: factory team + operations team
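
One way to turn the matrix into a score is to rate each dimension from 1 to 5 and report the weakest one. The rule that overall maturity equals the lowest dimension is an assumption of this sketch, chosen to counter optimistic self-assessment. The article's sources do not prescribe a scoring formula, and the `assess` function and dimension keys are hypothetical.

```python
DIMENSIONS = (
    "agent_config",
    "knowledge_sharing",
    "governance",
    "measurement",
    "organizational_design",
)


def assess(scores: dict[str, int]) -> dict[str, object]:
    """Summarize per-dimension levels (1-5) into an overall level.

    Assumption: an organization is only as mature as its weakest dimension.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dimension in DIMENSIONS:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} must be between 1 and 5")

    overall = min(scores[d] for d in DIMENSIONS)
    lagging = [d for d in DIMENSIONS if scores[d] == overall]
    return {"overall_level": overall, "lagging_dimensions": lagging}
```

A team with service accounts and CI/CD integration (Level 3 configuration) but no AI metrics (Level 1 measurement) would score Level 1 overall under this rule, with measurement flagged as the dimension to fix first.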

Organizations overestimate agentic maturity through definitional confusion and weak measurement

Organizations can overestimate their maturity when definitions blur, perceptions outrun evidence, and teams track shipping speed without tracking instability. Three specific error patterns cause this:

Agentwashing: Organizations often conflate AI assistants with AI agents, which can lead teams to overestimate their maturity level.
Perception gap: Self-reported maturity tends to run ahead of the evidence. Objective delivery metrics, particularly DORA's five metrics including the deployment rework rate, provide a more accurate signal than self-assessment.
Throughput-only tracking: Organizations that measure only shipping speed miss the instability signal described in the DORA research.

Maturity advancement consumes organizational effort in process, governance, data, and integration

Mature agentic engineering practices require more than model selection and prompt engineering. Teams often quantify that operational burden with ROI frameworks before expanding deployment.

DORA 2025 frames the takeaway directly: AI returns depend far more on the strength of a team's delivery system than on the tools it adopts.

How Coordination, Memory, and Governance Map to Levels 3-5

Levels 3-5 are where coordination, memory, and governed execution become infrastructure requirements. At the top of that range, self-improving development tools let agents extend the platform itself.

Level	Organizational Requirement	Capability
3: Orchestrated	Multi-agent coordination with CI/CD integration	Event bus triggers agents from software development lifecycle events; Service Accounts provide non-human identities; AGENTS.md support standardizes behavior
3: Orchestrated	Spec-driven agent execution	Agents respond to structured specifications rather than step-by-step prompts
4: Systematic	Persistent organizational memory	Organizational knowledge layer shares context across sessions and team members
4: Systematic	Specialized domain agents	Expert Registry with coaching-based feedback; corrections persist across runs
4: Systematic	Continuous improvement loops	Learning flywheel: Execute, Coach, Distill, Improve
5: Autonomous	Tiered approval governance	Auto-approval for low-risk changes; line-by-line correctness analysis for medium risk; human pull-in for judgment calls
5: Autonomous	Self-improving system	Agents work on the platform itself, building automations, specifying experts, and debugging existing workflows

Benchmark Your Organization, Then Build the System

Leaders need to balance speed with control. Individual developers can move quickly with AI tools, but organizations get durable gains only when they can measure instability, standardize behavior, and turn individual wins into versioned instructions, audit trails, named owners, and measured delivery outcomes.

For engineering organizations at Level 2 or above, the concrete next step is to baseline current workflows against the maturity matrix, then split measurement into throughput and stability. That exposes where AI adoption is creating downstream disorder, which teams need shared standards, and where orchestration or memory infrastructure becomes necessary. Only after that baseline should leaders expand agent scope or automate more of delivery.

Written by

Ani Galstian
