Agentic Engineering Maturity Model: 5-Level Self-Assessment

Jun 12, 2026 (Last updated: Jun 15, 2026)
Ani Galstian

Agentic engineering maturity is a five-level framework for assessing how systematically engineering organizations adopt AI agents.

TL;DR

Engineering teams often confuse individual AI tool use with organizational agent maturity. Conventional adoption metrics miss the instability that comes with faster shipping. MIT CISR and CMU SEI have published different maturity frameworks, and MIT CISR reports that the greatest financial impact in its enterprise AI maturity framework comes from moving from stage 2 to stage 3.

Why CTOs Need a Maturity Baseline Before Scaling Agents

Engineering leaders face a practical frustration: AI use is spreading faster than organizations can measure, govern, or operationalize it. Developers report productivity gains, but leadership still needs a reliable way to tell whether teams share prompts, version agent instructions, capture audit trails, assign owners, and measure delivery outcomes. DORA 2025 research shows AI adoption can increase both throughput and instability.

The words assistant and agent often get used interchangeably, which makes maturity hard to judge by labels alone. A structured maturity model instead maps agent adoption to observable practices. The five levels below synthesize concepts from MIT CISR, CMU SEI, Microsoft, Gartner, and DORA. Together, they form a self-assessment framework for engineering organizations adopting agentic workflows. Teams evaluating supporting infrastructure often compare categories such as AI coding tools, CI tools, and observability platforms before connecting agents to their pipelines and review process.

The product examples below use Augment Cosmos, a unified cloud agents platform now in public preview that gives agents shared context and memory across the software development lifecycle.


Level 1: Ad-Hoc (Individual Agents, No Shared Patterns)

Level 1 describes organizations where agent use remains personal instead of institutional. Engineers experiment in isolation, so the organization gets weak reproducibility, weak governance, and little shared learning.

Level 1 operating patterns keep agent use isolated to individual experiments

At this stage, agent activity rarely leaves an auditable path that another engineer or team can reuse. Teams usually show this pattern through experiments, ownership, and review:

  • AI experiments run in notebooks with no versioning, no pipeline automation, and no CI/CD integration
  • No centralized model registry or feature store exists
  • AI initiatives depend on individual champions rather than team-level practices
  • No documented evaluation process determines which workflows benefit from AI
  • No human-in-the-loop review process governs AI outputs

At Level 1, the first practical need is capture. Augment Cosmos Sessions capture each run as an audit trail that teams can replay, so the organization can reuse workflow knowledge instead of leaving expertise trapped in individual configs.

Level 1 underperforms financially because isolated experimentation does not compound

MIT CISR's 2022 survey of 721 companies found that enterprises in the first two stages of AI maturity had financial performance below industry average, while enterprises in stages 3 and 4 performed well above it. In that research, 28% of enterprises were in the first stage.

Dimension | Level 1 Signal
Agent configuration | Per-engineer, no version control
Knowledge sharing | Trapped in individual prompts and configs
Governance | No AI-specific policies
Measurement | No tracking of AI's effect on delivery metrics
Organizational design | No shared understanding of what AI can and cannot do

Level 2: Standardized (AGENTS.md, Shared Rules)

Level 2 introduces shared defaults that make agent behavior more consistent across a team. Version-controlled guidance gives teams repeatability before they add orchestration and shared learning.

AGENTS.md standardizes agent behavior through persistent repository guidance

A repo-level AGENTS.md file gives AI coding agents persistent, project-specific operational guidance before any work begins. OpenAI's Codex documentation describes the core behavior: Codex reads AGENTS.md files before doing any work. By layering global guidance with project-specific overrides, each task starts with consistent expectations regardless of which repository is opened.

The file's hierarchical discovery system enables layered standardization across global, project, and subdirectory scopes:

Scope | Location | Audience
Personal global defaults | ~/.codex/AGENTS.md | Individual engineer, applies across all repos
Team standards | AGENTS.md at repo root | Version-controlled, shared across all contributors
Module-specific overrides | Subdirectory AGENTS.md files | Service or module teams
Directory-level overrides | AGENTS.override.md | Optional higher-priority file; when present in a directory, takes precedence over AGENTS.md for that directory
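The layered lookup in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Codex's actual discovery code: the function names `pick_guidance_file` and `resolve_guidance_chain` are hypothetical, and the sketch assumes the override file is checked the same way at every scope.

```python
from pathlib import Path
from typing import List, Optional


def pick_guidance_file(directory: Path) -> Optional[Path]:
    """Return the guidance file for one directory, preferring the override file."""
    for name in ("AGENTS.override.md", "AGENTS.md"):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def resolve_guidance_chain(global_dir: Path, repo_root: Path, workdir: Path) -> List[Path]:
    """List guidance files from the broadest scope to the narrowest.

    Later entries are more specific, so a consumer would let them
    take priority over earlier ones.
    """
    repo_root = repo_root.resolve()
    workdir = workdir.resolve()
    # Raises ValueError if workdir is not inside the repository.
    relative = workdir.relative_to(repo_root)

    # Scopes: global defaults, repo root, then each directory down to workdir.
    scopes = [global_dir, repo_root]
    current = repo_root
    for part in relative.parts:
        current = current / part
        scopes.append(current)

    chain = []
    for scope in scopes:
        found = pick_guidance_file(scope)
        if found is not None:
            chain.append(found)
    return chain
```

Calling `resolve_guidance_chain(Path.home() / ".codex", repo, repo / "services" / "billing")` would return the global file first and the most specific file last, skipping directories that contain no guidance file.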

Before AGENTS.md, major coding tools used their own tool-specific instruction files: GitHub Copilot uses .github/copilot-instructions.md, while Cursor moved from a legacy .cursorrules file to .cursor/rules/ with per-rule .mdc files. This fragmentation forced organizations to maintain engineering standards in multiple locations. AGENTS.md addresses that fragmentation.

Advancing beyond Level 2 requires process change that creates coordinated workflows

The next step is to turn shared standards into coordinated workflows and organizational learning.

MIT CISR identifies the Stage 2 to Stage 3 transition as the highest-value transition in enterprise AI maturity. Financial performance crosses from below-average to above-average at this boundary. MIT Sloan research specifies that organizations making this transition intentionally change processes, broadly and deeply, to facilitate organizational learning with AI.

Dimension | Level 2: Standardized | Level 3: Orchestrated
Primary mechanism | Version-controlled guidance | Orchestration across identities, triggers, and pipelines
Core operating pattern | Shared defaults through AGENTS.md | Coordinated execution across agents, systems, and delivery workflows
Coordination model | Consistent repository guidance | Multi-agent coordination through intent
Delivery integration | Repeatability that can later support orchestration | CI/CD-integrated execution with cross-system identity and ownership
Main organizational outcome | More consistent agent behavior across a team | Teams can see which agent ran, what triggered it, what systems it touched, and who owns the result

Level 3: Orchestrated (Multi-Agent, Spec-Driven, CI/CD)

Level 3 shifts from shared rules to coordinated execution across agents, systems, and delivery workflows. At this level, teams need traceable agent identities, event triggers, pipeline integration, and ownership paths for non-human execution.

Level 3 orchestration coordinates agents through parallelism, visibility, and intent

Traditional workflow orchestration follows deterministic paths. Agent orchestration involves non-deterministic routing decisions made by the agents themselves, which makes failures harder to trace and test than failures in conventional CI pipelines. Teams comparing implementation paths often evaluate workflow orchestration platforms and autonomous agents when they move beyond isolated experimentation.

Level 3 CI/CD integration becomes the bottleneck across identities, triggers, and controls

CI/CD integration becomes the practical bottleneck at this level because identity, ownership, and triggering all become cross-system concerns. The organization needs non-human execution paths that fit existing build, test, review, and deployment controls.

A specific operational problem at Level 3 is pipeline authentication through individual user accounts. Token lifecycle and ownership become unmanageable at scale. Without centralized identity management, running 10 agents across 20 tools can mean up to 200 separate OAuth flows (10 × 20), each with its own token to issue, rotate, and revoke.
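The multiplication behind that figure is easy to check. The sketch below compares per-agent authorization with a shared non-human identity; the function name is hypothetical, and the "one authorization per tool" figure for the shared case is an assumption for illustration, not a claim about any specific platform.

```python
def credential_flows(agents: int, tools: int, shared_service_account: bool) -> int:
    """Count the authorization flows someone has to set up and maintain."""
    if agents < 0 or tools < 0:
        raise ValueError("agents and tools must be non-negative")
    if shared_service_account:
        # Assumption: one non-human identity is authorized once per tool.
        return tools
    # Each agent authenticates to each tool with its own user-bound token.
    return agents * tools


print(credential_flows(10, 20, shared_service_account=False))  # 200
print(credential_flows(10, 20, shared_service_account=True))   # 20
```

The per-agent count grows with the product of agents and tools, while the shared-identity count grows only with the number of tools.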

For teams using Augment Cosmos, Service Accounts address that identity and ownership problem: service-account execution gives non-human runs one ownership model. Connected systems supply triggers from tickets, incident alerts, and PR submissions, and teams do not rewire new agents into the stack because the platform already connects to the build, test, review, and deployment pipeline.

Level 3 governance detects failure patterns before faster delivery increases incidents

Level 3 governance stays close to delivery work.

CMU SEI's AI Adoption Maturity Model, released with Accenture in 2026, defines capability areas that organizations build as they mature, including experimentation, responsible AI, and AI architecting.

DORA's research identifies the delivery risk introduced earlier: AI adoption can raise throughput while increasing instability. DORA tracks the deployment rework rate metric as one of its instability measures. Organizations that track throughput and stability separately can identify emerging delivery risks before they surface as incidents.
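Tracking throughput and stability separately can be as simple as reporting them as distinct numbers instead of one blended score. The sketch below is illustrative only: the `Deployment` record and its two flags are hypothetical, and the rate definitions are simplified stand-ins rather than DORA's official metric definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Deployment:
    failed: bool            # caused a production failure that needed remediation
    unplanned_rework: bool  # shipped only to fix a problem found in production


def delivery_signals(deployments: List[Deployment], window_days: int) -> Dict[str, float]:
    """Report throughput and instability as separate signals."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total = len(deployments)
    if total == 0:
        return {"deployments_per_day": 0.0, "change_fail_rate": 0.0, "rework_rate": 0.0}
    return {
        # Throughput: how often the team ships.
        "deployments_per_day": total / window_days,
        # Instability: how often shipping goes wrong or needs unplanned fixes.
        "change_fail_rate": sum(d.failed for d in deployments) / total,
        "rework_rate": sum(d.unplanned_rework for d in deployments) / total,
    }
```

A team whose `deployments_per_day` rises while `rework_rate` also rises is shipping faster and less stably at the same time, which a single speed metric would hide.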

With Augment Cosmos, teams use shared context and reusable records across workflows, so agents draw on previous runs instead of starting from scratch.


Level 4: Systematic (Shared Memory, Experts, Learning Flywheel)

Level 4 turns repeated agent activity into accumulated team capability. Corrections, context, and specialization persist across sessions, so each run can draw on prior work.

Shared memory turns repeated agent sessions into accumulated organizational context

Without project memory, every agent session starts cold. Teams must reconstruct institutional knowledge from scratch, including architectural decisions, prior debugging approaches, and team conventions. The CoALA (Cognitive Architectures for Language Agents) paper maps four memory types to agent implementations:

Memory Type | What Is Stored | Agent Example
Working | Current goals, intermediate reasoning | Active context window
Procedural | Rules determining behavior | AGENTS.md, system prompts
Semantic | Facts about the world | Facts about a user or codebase
Episodic | Sequences of past behaviors | Past agent actions, prior debugging approaches
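One way to picture the four memory types is as separate stores with different lifetimes. The following is a rough sketch, not an implementation from the CoALA paper or from any product: the `AgentMemory` class and its `end_session` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentMemory:
    # Working: current goals and intermediate reasoning for the active session.
    working: List[str] = field(default_factory=list)
    # Procedural: rules that shape behavior, such as AGENTS.md contents.
    procedural: List[str] = field(default_factory=list)
    # Semantic: facts about a user or codebase.
    semantic: Dict[str, str] = field(default_factory=dict)
    # Episodic: records of past actions and debugging approaches.
    episodic: List[str] = field(default_factory=list)

    def end_session(self, summary: str) -> None:
        """Archive a session summary and clear working memory.

        Procedural, semantic, and episodic stores persist, so the next
        session does not start cold.
        """
        self.episodic.append(summary)
        self.working.clear()
```

Only working memory is cleared at the end of a session in this sketch; the other three stores are what a later session would draw on.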

Augment Cosmos's organizational knowledge layer addresses this memory problem at team scope. Organization-level shared memory persists context across sessions and across the team rather than scoping it per engineer or per repository. That reduces repeated context reconstruction in shared cross-session agent workflows.

Specialized agents create durable domain competence through narrow scope and persistent corrections

Specialization at Level 4 separates general agent access from durable domain competence. Teams narrow scope, preserve corrections, and reuse learned behavior across the team.

The Agentic Software Engineering: Foundational Pillars paper presents the Structured Agentic Software Engineering (SASE) vision and outlines several foundational pillars for the future of software engineering.

Augment Cosmos Experts fit this narrow-scope pattern. Each expert combines narrow task scope, shared memory that persists across runs, and coaching-based feedback that distills important information into stored knowledge.

The learning flywheel compounds agent performance through stored corrections across sessions

The learning flywheel converts one corrected run into a better future run. The outcome is compounding improvement across future sessions.

The learning flywheel follows a four-stage architecture:

  1. Execute
  2. Coach
  3. Distill
  4. Improve

The sequence turns corrected runs into stored improvements that carry into future sessions.
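The four stages can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the Cosmos implementation: `run_flywheel` and the three demo callables are hypothetical, and the knowledge store is a plain list standing in for persistent shared memory.

```python
from typing import Callable, List


def run_flywheel(
    task: str,
    execute: Callable[[str, List[str]], str],
    coach: Callable[[str], str],
    distill: Callable[[str], str],
    knowledge: List[str],
) -> str:
    """Run one pass of Execute, Coach, Distill, Improve."""
    output = execute(task, knowledge)   # 1. Execute with stored knowledge
    correction = coach(output)          # 2. Coach: feedback, or "" if none
    if correction:
        lesson = distill(correction)    # 3. Distill the feedback into a lesson
        if lesson and lesson not in knowledge:
            knowledge.append(lesson)    # 4. Improve: future runs see the lesson
    return output


# Demo stand-ins for an agent run and a human reviewer.
def demo_execute(task: str, knowledge: List[str]) -> str:
    return f"{task} (lessons applied: {len(knowledge)})"


def demo_coach(output: str) -> str:
    return "  Use the shared retry helper.  " if output.endswith("(lessons applied: 0)") else ""


def demo_distill(correction: str) -> str:
    return correction.strip()


knowledge: List[str] = []
first = run_flywheel("fix flaky test", demo_execute, demo_coach, demo_distill, knowledge)
second = run_flywheel("fix flaky test", demo_execute, demo_coach, demo_distill, knowledge)
```

In the demo, the first run receives a correction and stores it, so the second run starts with one lesson already available and receives no further correction.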

Augment Cosmos's learning flywheel applies that sequence to coaching-based agent improvement. The flywheel distills each corrected session into stored knowledge, so those corrections carry into future runs instead of disappearing between executions.

Level 5: Autonomous (Agents Execute, Humans Steer, Knowledge Compounds)

Level 5 keeps humans responsible for oversight while agents execute more work inside governed boundaries. Human-on-the-loop oversight becomes the operating model for increasingly capable workflows, and governance risk concentrates as agent scope expands.

Level 5 operating models expand agent execution while keeping humans in control

The governance shift at Level 5 moves from human-in-the-loop to human-on-the-loop. Forrester's AEGIS Framework, a security model that defines 39 controls across six domains for securing enterprise AI agents, treats human oversight as a core requirement at this stage.

Augment Cosmos human-in-the-loop controls enforce policy-based approval boundaries at the handoffs where teams require human judgment. Teams set the policies for where human judgment is required, and Cosmos enforces them.

Level 5 knowledge compounding reinforces human and machine learning together

Knowledge compounding means corrected agent behavior, stored context, and human feedback keep accumulating across workflows. At this stage, the execute, coach, distill, improve pattern becomes an operating model in which human and machine learning reinforce each other.

MIT Sloan describes a centralized AI structure in which a global data science and analytics team builds enterprise AI capabilities and works with business units and centers of excellence to scale and operate AI solutions.

Level 5 autonomous execution concentrates governance risk when institutional understanding lags

Level 5 risk concentration comes from autonomous execution outpacing institutional understanding. The failure mode is not only bad output but also the loss of the knowledge needed to diagnose and correct it.

Self-Assessment Matrix

The self-assessment matrix turns the five maturity levels into observable operating signals. Use it to compare current agent practices across configuration, knowledge sharing, governance, measurement, and organizational design rather than relying on perception alone.

Dimension | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Agent config | Per-engineer | Repo-root AGENTS.md | CI/CD-integrated, service accounts | Expert-scoped, memory-backed | Self-improving, org-wide
Knowledge sharing | Individual prompts | Shared rules files | Spec-driven coordination | Cross-agent shared memory | Compounding knowledge base
Governance | None | Informal policies | Documented AI-specific policies | Named agent owner with accountability | Tiered auto-approval, human pull-in for judgment
Measurement | No AI metrics | Ad-hoc tracking | DORA metrics with throughput/stability split | Agent behavior audit trails | Strategic dashboards, risk-aware merge policies
Organizational design | Individual champions | Small specialist group | AI director with cross-functional authority | Cross-functional transformation squads | Bifurcated: factory team + operations team
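A team could record its matrix scores and summarize them with a short script. The sketch below is one possible approach, not a scoring method defined by this guide: the `assess` function is hypothetical, and treating the lowest-scoring dimension as the overall level is an assumption chosen so that one strong dimension cannot mask a weak one.

```python
from typing import Dict, List, Union

DIMENSIONS = (
    "Agent config",
    "Knowledge sharing",
    "Governance",
    "Measurement",
    "Organizational design",
)


def assess(scores: Dict[str, int]) -> Dict[str, Union[int, List[str]]]:
    """Summarize per-dimension levels (1 to 5) from the matrix."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dimension in DIMENSIONS:
        if not 1 <= scores[dimension] <= 5:
            raise ValueError(f"{dimension} must be between 1 and 5")
    # Assumption: the overall level is capped by the weakest dimension.
    overall = min(scores[d] for d in DIMENSIONS)
    weakest = [d for d in DIMENSIONS if scores[d] == overall]
    return {"overall_level": overall, "weakest_dimensions": weakest}
```

For example, a team scoring 3 on agent config and knowledge sharing but 1 on measurement would be reported at Level 1 overall, with measurement flagged as the dimension to address first.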

Organizations overestimate agentic maturity through definitional confusion and weak measurement

Organizations can overestimate their maturity when definitions blur, perceptions outrun evidence, and teams track shipping speed without tracking instability. Three specific error patterns cause this:

  1. Agentwashing: Organizations often conflate AI assistants with AI agents, which can lead teams to overestimate their maturity level.
  2. Perception gap: Self-reported productivity gains can outrun the evidence. Objective delivery metrics, particularly DORA's five metrics including the deployment rework rate, provide a more accurate signal.
  3. Throughput-only tracking: Organizations that measure only shipping speed miss the instability signal described in the DORA research.

Maturity advancement consumes organizational effort in process, governance, data, and integration

Mature agentic engineering practices require more than model selection and prompt engineering. Teams often quantify that operational burden with ROI frameworks before expanding deployment.

DORA 2025 frames the takeaway directly: AI returns depend far more on the strength of a team's delivery system than on the tools it adopts.

How Coordination, Memory, and Governance Map to Levels 3-5

Levels 3-5 are where coordination, memory, and governed execution become infrastructure requirements, up to self-improving development tools that let agents extend the platform itself.

Level | Organizational Requirement | Capability
3: Orchestrated | Multi-agent coordination with CI/CD integration | Event bus triggers agents from software development lifecycle events; Service Accounts provide non-human identities; AGENTS.md support standardizes behavior
3: Orchestrated | Spec-driven agent execution | Agents respond to structured specifications rather than step-by-step prompts
4: Systematic | Persistent organizational memory | Organizational knowledge layer shares context across sessions and team members
4: Systematic | Specialized domain agents | Expert Registry with coaching-based feedback; corrections persist across runs
4: Systematic | Continuous improvement loops | Learning flywheel: Execute, Coach, Distill, Improve
5: Autonomous | Tiered approval governance | Auto-approval for low-risk changes; line-by-line correctness analysis for medium risk; human pull-in for judgment calls
5: Autonomous | Self-improving system | Agents work on the platform itself, building automations, specifying experts, and debugging existing workflows
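The tiered approval row in the table can be illustrated with a small routing function. This is a sketch of the general pattern, not how Cosmos evaluates risk: the path prefixes, file suffixes, and function names are hypothetical placeholders that a team would replace with its own policy.

```python
from enum import Enum
from typing import List


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical policy inputs, for illustration only.
HIGH_RISK_PREFIXES = ("infra/", "auth/")
LOW_RISK_SUFFIXES = (".md", ".txt")


def classify(paths: List[str]) -> Risk:
    """Assign a risk tier from the files a change touches."""
    if any(p.startswith(HIGH_RISK_PREFIXES) for p in paths):
        return Risk.HIGH
    if paths and all(p.endswith(LOW_RISK_SUFFIXES) for p in paths):
        return Risk.LOW
    return Risk.MEDIUM


def route_change(paths: List[str]) -> str:
    """Map a change to the approval path for its tier."""
    routes = {
        Risk.LOW: "auto-approve",
        Risk.MEDIUM: "line-by-line correctness analysis",
        Risk.HIGH: "human review",
    }
    return routes[classify(paths)]
```

Anything the classifier cannot place as clearly low or high risk falls into the medium tier, so an unrecognized change is never auto-approved by default.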

Benchmark Your Organization, Then Build the System

Leaders need to balance speed with control. Individual developers can move quickly with AI tools, but organizations get durable gains only when they can measure instability, standardize behavior, and turn individual wins into versioned instructions, audit trails, named owners, and measured delivery outcomes.

For engineering organizations at Level 2 or above, the concrete next step is to baseline current workflows against the maturity matrix, then split measurement into throughput and stability. That exposes where AI adoption is creating downstream disorder, which teams need shared standards, and where orchestration or memory infrastructure becomes necessary. Only after that baseline should leaders expand agent scope or automate more of delivery.



Written by

Ani Galstian

Ani writes about enterprise-scale AI coding tool evaluation, agentic development security, and the operational patterns that make AI agents reliable in production. His guides cover topics like AGENTS.md context files, spec-as-source-of-truth workflows, and how engineering teams should assess AI coding tools across dimensions like auditability and security compliance.
