FAQ

How do I know my DIY agent infrastructure needs replacement rather than iteration?

Track harness complexity over time against model capability improvements. If complexity keeps rising while the models improve, the pattern matches Philipp Schmid's warning about over-engineering. The Brex CTO described a related tipping point on the latent.space podcast: the team could no longer articulate a coherent verbal framework for all of its LLM investments, and without that framework it could not set a vision or a roadmap to present to leadership.
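One way to make that tracking concrete is to record, for each harness release, a rough complexity measure and an eval score for the model in use, then flag releases where both went up. This is a minimal sketch that assumes two hypothetical metrics: harness lines of code and an internal eval pass rate. Substitute whatever your team already measures.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One harness release. Both metrics are hypothetical placeholders."""
    release: str
    harness_loc: int          # e.g. lines of deterministic harness code
    model_eval_score: float   # e.g. pass rate on an internal eval suite


def over_engineering_signals(history: list) -> list:
    """Return releases where harness complexity grew while the model also improved.

    A better model should let the harness shrink or hold steady. Growth in
    both at once is the warning sign described above.
    """
    flagged = []
    for prev, curr in zip(history, history[1:]):
        complexity_up = curr.harness_loc > prev.harness_loc
        capability_up = curr.model_eval_score > prev.model_eval_score
        if complexity_up and capability_up:
            flagged.append(curr.release)
    return flagged


history = [
    Snapshot("v1", 12_000, 0.61),
    Snapshot("v2", 15_500, 0.68),  # both rose: flagged
    Snapshot("v3", 14_000, 0.74),  # harness shrank: not flagged
    Snapshot("v4", 19_000, 0.79),  # both rose: flagged
]
print(over_engineering_signals(history))  # ['v2', 'v4']
```

A single flagged release may be noise. A sustained run of them is the pattern worth raising.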

Is self-hosted open-source (LangGraph, CrewAI) economically equivalent to buying a managed platform?

No. Self-hosted open-source maps more closely to the build path than the buy path. The license cost is zero, but the adopting organization carries every dollar of infrastructure, maintenance, governance, and upgrade cost.

What is the realistic timeline for migrating from custom agent infrastructure to a managed platform?

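A strangler fig migration moves functionality incrementally and decommissions the legacy system only after that move is complete. The phased plan in this article has four stages: a six-week audit and parallel setup, shadow testing through week 16, incremental workflow migration in months 4-9, and legacy decommission from month 9 onward. A realistic horizon is therefore roughly 9-12 months or longer. Existing custom stacks may require more migration effort than greenfield adoption, because effort depends heavily on the ratio of custom to standard components.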

Does Cosmos support on-premise deployment for regulated industries?

Cosmos is in public preview for MAX plan users, and Cosmos-specific on-premise availability is not documented in current public materials. The broader Augment Code platform does document on-premises deployment options, Customer Managed Keys, zero data retention policies, SOC 2 Type II compliance, and ISO/IEC 42001 certification. Confirm Cosmos-specific deployment and compliance posture directly before including it in compliance assessments.

When does building custom agent infrastructure remain the right decision?

Building remains justified when AI infrastructure is the core product being sold, when the organization can dedicate 5-10 engineers for 12-18 months without impacting the core product, or when compliance requires on-premise deployment with full operational ownership.

DIY Agent Infrastructure vs Agentic Platform (2026)

Teams building custom agent infrastructure should reassess the approach after the first production rebuild, especially given the ongoing costs of maintenance, monitoring, integration, and compliance.

TL;DR

For teams assigning 5-10 engineers for 12-18 months, internal platform labor and maintenance are the main 24-month cost variables. A community analysis of Claude Code's leaked codebase found that 98.4% of that code is deterministic infrastructure, with only 1.6% handling AI decision logic. Rebuild pressure usually comes from the runtime, memory, observability, and governance layers that teams discover after the prototype succeeds.

Why Agent Infrastructure Scope Expands After the Prototype

The trap usually starts with a working prototype. A senior engineer connects a framework to a few tools, gets an agent demo running in two weeks, and leadership sees a fast path to production. After the demo, the organization owns far more infrastructure than the prototype suggested. Permission scoping, checkpoint persistence, model migration, context isolation, observability, and governance rarely block the prototype, but they all become production responsibilities the moment it goes live.

Many teams discover too late that they budgeted for the agent logic when the larger cost sits in the runtime, memory, context, observability, and governance layers underneath it. The examples below show how that gap surfaces in production as rebuilds, tool reduction, architecture simplification, and migration pressure. The sections that follow map that hidden scope, compare DIY against Cosmos across six dimensions, and outline when to keep building and when to stop.

Cosmos is Augment Code's Unified Cloud Agents Platform. It runs agents in the cloud with shared context and memory that compound across the team and the software development lifecycle, treating runtime, context, governance, and human-in-the-loop as shared primitives the platform provides directly.

The Build Instinct Stays Rational Only Up to a Point

A familiar pattern emerges in engineering organizations adopting agent frameworks. A senior engineer spins up a LangGraph pipeline, connects it to a few tools, gets a demo working in two weeks, and the CTO greenlights a full build. Six months later, the team has spent more time on permission scoping, checkpoint persistence, and model migration than on the agent logic itself.

The instinct to build is rational: your codebase is unique, your compliance requirements are specific, and your team has the talent. The problem is scope. Teams budget for building an agent, but the actual scope covers the platform the agent runs on. Choosing to build means assembling agentic frameworks, orchestration layers, custom governance, and the underlying infrastructure to run it all. Much of the implementation burden sits outside model logic, especially in governance and workflow integration.

In production, the organization owns a deeper set of responsibilities than the prototype suggested. The community analysis of Claude Code's leaked codebase examined its size, tool structure, and architecture, and the deterministic layer it documents includes the runtime and governance work teams usually discover only after the prototype succeeds. That is the harness engineering burden in practice: a deterministic substrate that has to be built, maintained, and governed for every model the team plugs in.

Why Most Teams Build Agent Infrastructure Twice

Engineering teams rebuild agent infrastructure because the failure modes that trigger a rebuild stay invisible at the prototype stage. They do not appear in demos, synthetic test data does not trigger them, and standard APM tooling does not surface them.

Three documented production cases illustrate the pattern:

Vercel removed 80% of their agent's tools and saw the success rate climb from 80% to 100%. The team reported the result but did not attach a quoted conclusion attributing the gain to simplification.

Manus rebuilt their agent framework multiple times. Each rebuild followed the discovery that context management was the actual bottleneck, a pattern the team documents in their context engineering lessons: "We affectionately refer to this manual process of architecture searching, prompt fiddling, and empirical guesswork as 'Stochastic Graduate Descent'".

EGO Digital discovered a high failure rate in their document processing agent. After decomposing it into three specialized agents with individual model tiers, timeout budgets, and output schemas, reliability improved materially and cost per document fell.
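The EGO Digital fix can be sketched as a pipeline of narrow stages, each with its own model tier, timeout budget, and output schema. This is a minimal sketch, not EGO Digital's implementation, which the source does not publish. The stage names, tier labels, and stub functions standing in for model calls are all hypothetical.

```python
import concurrent.futures as cf
from dataclasses import dataclass
from typing import Callable


class StageError(Exception):
    """Raised when a stage misses its timeout budget or its output schema."""


@dataclass(frozen=True)
class Stage:
    name: str
    model_tier: str              # hypothetical tier label, e.g. "small" or "large"
    timeout_s: float             # per-stage timeout budget in seconds
    required_keys: frozenset     # minimal output schema: keys the stage must return
    run: Callable[[dict], dict]  # stub standing in for a real model call


def run_pipeline(stages: list, doc: dict) -> dict:
    """Run stages in order, enforcing each stage's timeout and output keys.

    future.result(timeout=...) stops waiting but cannot kill the worker
    thread, so a real system needs cancellable model calls.
    """
    state = dict(doc)
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        for stage in stages:
            future = pool.submit(stage.run, dict(state))
            try:
                out = future.result(timeout=stage.timeout_s)
            except cf.TimeoutError:
                raise StageError(f"{stage.name}: exceeded {stage.timeout_s}s budget") from None
            missing = stage.required_keys - set(out)
            if missing:
                raise StageError(f"{stage.name}: missing keys {sorted(missing)}")
            state.update(out)
    return state


stages = [
    Stage("classify", "small", 2.0, frozenset({"doc_type"}), lambda s: {"doc_type": "invoice"}),
    Stage("extract", "large", 5.0, frozenset({"total"}), lambda s: {"total": 120.0}),
    Stage("validate", "small", 2.0, frozenset({"valid"}), lambda s: {"valid": s["total"] > 0}),
]
print(run_pipeline(stages, {"text": "..."}))
```

Splitting the work this way means a failure names the stage that caused it, and each stage can use the cheapest tier that meets its schema.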

Across these cases, the recurring signals are consistent:

repeated rebuilds
tool reduction improving outcomes
behavioral issues discovered late
architecture simplification outperforming added complexity

Philipp Schmid, a Staff Engineer at Google DeepMind, captured the pattern precisely in his context engineering work: if your harness keeps getting more complex while models improve, that is a sign of over-engineering.

For teams evaluating managed options, Cosmos is the operating system for agentic software development, combining agent runtime, shared context, and governance-related capabilities in one platform.

What You're Actually Building: Runtime, Context, Memory, Observability, Governance

When mapped as a production-grade agent platform, the scope expands across runtime, context, memory, observability, and governance.

Runtime

Official LangChain materials highlight production-agent runtime capabilities such as streaming, task queues, checkpointing, human-in-the-loop support, and tracing. Treat these as representative requirements, not a standardized checklist. Available sources do not confirm a fixed six-feature list consisting of exactly parallelization, streaming, task queue, checkpointing, human-in-the-loop, and tracing.
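Checkpointing is what lets an agent resume from its last completed step after a crash instead of starting over. This is a minimal sketch using SQLite from the Python standard library. The table layout and function names are hypothetical and do not reflect LangGraph's checkpointer API.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        " run_id TEXT, step INTEGER, state TEXT, PRIMARY KEY (run_id, step))"
    )
    return conn


def save_checkpoint(conn, run_id: str, step: int, state: dict) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                     (run_id, step, json.dumps(state)))


def latest_checkpoint(conn, run_id: str):
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE run_id = ? ORDER BY step DESC LIMIT 1",
        (run_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (-1, {})


def run_agent(conn, run_id: str, steps: list) -> dict:
    """Execute steps in order, skipping any already checkpointed for this run."""
    last_step, state = latest_checkpoint(conn, run_id)
    for i, step_fn in enumerate(steps):
        if i <= last_step:
            continue  # completed before a crash or restart
        state = step_fn(state)
        save_checkpoint(conn, run_id, i, state)
    return state


calls = []


def make_step(name):
    def step(state):
        calls.append(name)
        return {**state, name: True}
    return step


steps = [make_step("plan"), make_step("edit"), make_step("test")]
conn = open_store()
run_agent(conn, "run-1", steps[:2])      # simulate a crash after two steps
calls.clear()
final = run_agent(conn, "run-1", steps)  # resumes: only "test" executes
print(calls, final)
```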

Scheduling is supported via LangGraph's and LangSmith's built-in cron jobs, so teams do not need to build their own cron and task queue infrastructure from scratch.

Retry semantics are often mismatched to agent failures. When failures are persistent semantic errors and not transient transport errors, retry-with-backoff is the wrong pattern. The right pattern is raise-and-replan.
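The distinction can be encoded directly in the error types a tool wrapper raises. This is a minimal sketch that assumes three hypothetical exception classes. Transient transport failures get bounded retry with exponential backoff. Semantic failures are raised immediately as a replan signal so the planner can choose a different approach.

```python
import time


class TransientError(Exception):
    """Transport-level failure (timeout, 503, dropped connection): retrying may help."""


class SemanticError(Exception):
    """The call ran but the result is wrong for the task: retrying repeats it."""


class ReplanRequired(Exception):
    """Tells the planner to revise its approach instead of retrying."""


def call_tool(tool, args, max_retries: int = 3, base_delay: float = 0.5):
    for attempt in range(max_retries + 1):
        try:
            return tool(args)
        except TransientError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        except SemanticError as err:
            # The same input would produce the same wrong answer, so escalate.
            raise ReplanRequired(str(err)) from err
```

The planner's handler for ReplanRequired is where the agent changes course, for example by picking a different tool or re-reading context.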

Context Management

Anthropic's official engineering materials discuss context engineering and related strategies such as memory, compaction, and tool clearing. They do not present a formal five-part taxonomy of context types (input context, runtime context, context compression, context isolation, and long-term memory).

An arXiv paper on agent token consumption reports that token cost can vary substantially across agentic runs. Manus reported that, with Claude Sonnet, cached input tokens cost about 10% of the base input token price. It also reported that changes to the prompt prefix, such as an embedded timestamp or non-deterministic ordering, can cause cache misses that wipe out those savings.
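The cache-miss failure mode is easy to reproduce, because any change early in the prompt alters everything after it. This is a minimal sketch with a hypothetical prompt builder. The character-level prefix comparison is only a rough proxy, since providers cache on tokens. The cost helper defaults to the roughly 10% cached-input ratio Manus reported, and the base price is an input you supply.

```python
import json
from datetime import datetime, timezone


def build_prompt(system: str, tools: dict, stable: bool = True) -> str:
    if stable:
        # Stable prefix: no per-call values, keys serialized in sorted order.
        return f"{system}\n{json.dumps(tools, sort_keys=True)}"
    # Anti-pattern: timestamp first, keys in whatever order the dict was built.
    now = datetime.now(timezone.utc).isoformat()
    return f"{now}\n{system}\n{json.dumps(tools)}"


def shared_prefix_len(a: str, b: str) -> int:
    """Leading characters two prompts share: a rough proxy for the reusable prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def blended_input_cost(tokens: int, base_price_per_mtok: float,
                       cache_hit_rate: float, cached_ratio: float = 0.10) -> float:
    """Input cost in dollars when a share of tokens is billed at the cached rate."""
    cached = tokens * cache_hit_rate * base_price_per_mtok * cached_ratio
    uncached = tokens * (1 - cache_hit_rate) * base_price_per_mtok
    return (cached + uncached) / 1_000_000


system = "You are a coding agent."
tools_a = {"read_file": {"args": ["path"]}, "run_tests": {"args": []}}
tools_b = {"run_tests": {"args": []}, "read_file": {"args": ["path"]}}  # same tools, other order

stable_a, stable_b = build_prompt(system, tools_a), build_prompt(system, tools_b)
print(shared_prefix_len(stable_a, stable_b) == len(stable_a))  # True: identical prompts
```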

Memory and State

Four memory tiers require distinct persistence strategies.

In-context working memory
Short-term session state
Long-term cross-session memory
Episodic memory with temporal retrieval

Treat this four-tier split as a working model. The AutoGen memory docs, the LangGraph runtime work, and the Mem0 research paper each describe different memory models, and none of them uses this exact taxonomy. A simple example shows the staleness problem: memory about a user's employer is accurate only until the user changes jobs.
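One mitigation for staleness is to store each long-term fact with a timestamp, let newer observations supersede older ones, and mark facts past a maximum age for re-confirmation. This is a minimal sketch of the idea for the cross-session tier. The class, the 180-day window, and the re-confirmation policy are hypothetical and are not taken from AutoGen, LangGraph, or Mem0.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Fact:
    value: str
    observed_at: datetime


class CrossSessionMemory:
    """Long-term memory keyed by (subject, attribute); the newest observation wins."""

    def __init__(self, max_age: timedelta = timedelta(days=180)):
        self.max_age = max_age  # hypothetical re-confirmation window
        self._facts = {}        # (subject, attribute) -> Fact

    def remember(self, subject: str, attribute: str, value: str, at: datetime) -> None:
        key = (subject, attribute)
        current = self._facts.get(key)
        if current is None or at >= current.observed_at:
            self._facts[key] = Fact(value, at)  # supersede the older fact
        # an older observation arriving late is ignored

    def recall(self, subject: str, attribute: str, now: datetime):
        """Return (value, is_stale); stale facts should be re-confirmed, not trusted."""
        fact = self._facts.get((subject, attribute))
        if fact is None:
            return None, False
        return fact.value, (now - fact.observed_at) > self.max_age


mem = CrossSessionMemory()
mem.remember("user-42", "employer", "Acme", datetime(2024, 1, 10))
mem.remember("user-42", "employer", "Globex", datetime(2025, 3, 2))  # the user changed jobs
print(mem.recall("user-42", "employer", now=datetime(2025, 4, 1)))  # ('Globex', False)
print(mem.recall("user-42", "employer", now=datetime(2026, 1, 1)))  # ('Globex', True)
```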

Observability

Standard APM tracks request latency, error rates, and throughput for traditional services. Agent systems need finer-grained traces, as the cited examples document. Oracle's engineering content emphasizes agent memory and related system components for maintaining continuity across sessions. Mesa built custom LLM observability directly in Postgres because no existing tool provided traces at the phase, agent-session, and tool-call level that could be joined with business data.
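The Mesa approach stores traces at phase, session, and tool-call granularity next to business data, so a plain SQL join can answer questions about them. This is a minimal sketch that uses SQLite as a stand-in for Postgres. The table and column names are hypothetical and do not describe Mesa's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, plan TEXT);
CREATE TABLE agent_spans (
    session_id  TEXT,     -- one agent session
    customer_id TEXT,     -- join key into business data
    phase       TEXT,     -- e.g. 'plan', 'execute', 'review'
    tool_name   TEXT,     -- NULL for phase-level spans
    latency_ms  INTEGER,
    tokens      INTEGER,
    ok          INTEGER   -- 1 = success, 0 = failure
);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("c1", "enterprise"), ("c2", "free")])
conn.executemany(
    "INSERT INTO agent_spans VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("s1", "c1", "execute", "run_tests", 4200, 900, 0),
        ("s1", "c1", "execute", "read_file", 120, 300, 1),
        ("s2", "c2", "execute", "run_tests", 3900, 850, 1),
    ],
)

# Tool-call failure rate and token spend by customer plan: a question a
# generic APM dashboard is not modeled to answer.
query = """
SELECT c.plan, s.tool_name, AVG(1 - s.ok) AS failure_rate, SUM(s.tokens) AS tokens
FROM agent_spans s JOIN customers c USING (customer_id)
WHERE s.tool_name IS NOT NULL
GROUP BY c.plan, s.tool_name
ORDER BY c.plan, s.tool_name
"""
for row in conn.execute(query):
    print(row)
```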

Governance

Governance requires hooks into runtime, memory, context management, and observability.

Prompt injection defenses
Sandboxing
SIEM and DLP integration
Red-team testing

Agents that touch code and infrastructure must meet these obligations. The runtime architecture therefore needs integration points for observability, governance, and context management designed in before the first deployment. In the examples above, late discovery of these layers is what creates rebuild pressure.
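Designing governance in from day one largely means routing every tool call through one choke point that can deny it, require approval, and record it. This is a minimal sketch of such a gate. The policy table, tool names, and approval callback are hypothetical. A real deployment would ship the audit log to SIEM and back the approval step with a human-in-the-loop workflow.

```python
from dataclasses import dataclass, field
from typing import Callable

ALLOW, APPROVE, DENY = "allow", "approve", "deny"


@dataclass
class ToolGate:
    """Single choke point for tool calls: policy check, optional approval, audit."""
    policy: dict                           # tool name -> ALLOW | APPROVE | DENY
    approver: Callable[[str, dict], bool]  # stand-in for a human-in-the-loop step
    audit_log: list = field(default_factory=list)  # would be shipped to SIEM in production

    def call(self, tool_name: str, tool: Callable[[dict], object], args: dict):
        rule = self.policy.get(tool_name, DENY)  # default-deny tools not in the policy
        if rule == APPROVE:
            allowed = self.approver(tool_name, args)
        else:
            allowed = rule == ALLOW
        self.audit_log.append({"tool": tool_name, "args": args,
                               "rule": rule, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"tool call blocked by policy: {tool_name}")
        return tool(args)


gate = ToolGate(
    policy={"read_file": ALLOW, "deploy": APPROVE},
    approver=lambda name, args: args.get("env") != "prod",  # hypothetical rule
)
print(gate.call("read_file", lambda a: "contents", {"path": "README.md"}))
print(gate.call("deploy", lambda a: "deployed", {"env": "staging"}))
```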

The Hidden Cost of Harness Engineering

The recurring pattern in the cited cases is straightforward: teams begin with foundation model integration and then discover additional platform layers one by one in production.

Those layers typically include:

memory systems
authentication and RBAC
multi-surface interface support
governance controls

Six distinct cost categories accumulate over time. The harness engineering guide covers the first category in more depth, and the AI evaluation paper cited in the regression testing row explains why AI QA tooling differs architecturally from existing systems.

Cost Category	Nature	Key Evidence
Harness engineering	One-time build plus compounding maintenance	Claude Code's architecture has been described as dominated by surrounding harness and infrastructure rather than AI-specific decision logic
Integration tax	Recurring operational tax, not one-time CAPEX	Prompts, tool schemas, model versions, and business rules change
Model migration	Behavioral drift problem, not deployment problem	Production migration often requires a shadow pipeline mirroring production
Regression testing	New probabilistic infrastructure category	AI QA requires tooling architecturally different from existing systems
Security and compliance	Ongoing obligation with no off switch	Prompt injection defenses, sandboxing, SIEM/DLP integration
Opportunity cost	Senior engineer capacity consumed by infra	"Every week spent setting up infrastructure is a week not spent improving models or delivering product value"

Cosmos centers on runtime, shared context, and memory primitives, with the Context Engine processing 400,000+ files for codebase understanding. Runtime-layer permission scoping applied before prompts reach the model is not yet documented in public Cosmos materials, and teams with strict gating requirements should confirm posture directly.

At a Glance: DIY vs Cosmos Across 6 Dimensions

The comparison below summarizes how each path handles the six platform layers teams typically discover after the first prototype: runtime, context management, memory, observability, governance, and integration. Each row reflects what is documented in publicly available materials, with caveats noted where official sources are limited.

Dimension	DIY Build	Cosmos
Runtime	Teams assemble from LangGraph, CrewAI, or custom code. Scheduling, retry semantics, and tool concurrency flags require custom engineering.	An agent runtime, the Context Engine, an event bus tied to the SDLC, and an organization-wide knowledge layer act as shared primitives across the stack, with human-in-the-loop as a first-class primitive.
Context management	Five context concerns to engineer: input context, runtime context, context compression, context isolation, and long-term memory (a working taxonomy; Anthropic's materials do not define it formally). Token costs grow with additional agent steps. Cache hit optimization requires prompt prefix stability discipline.	The Context Engine processes entire codebases across 400,000+ files, providing codebase understanding for AI coding agents and software-development workflows.
Memory and state	Four memory tiers (working, session, cross-session, episodic) with no industry-standard persistence pattern. Multiple frameworks and vector stores remain in use.	A shared filesystem with tenant and private memory, plus an Org Knowledge Layer for persistent scratchpad behavior, so corrections and patterns compound across sessions.
Observability	Requires purpose-built agent episode tracing distinct from standard APM. Mesa built its observability stack in-house.	Augment Code's positioning for staff engineers and platform leads describes Cosmos as "governed, observable, and reproducible." No dedicated tracing UI or logging pipeline is publicly documented today.
Governance	Cross-cutting with hooks into every other layer. Cannot be bolted on after the fact.	Human-in-the-Loop policies and runtime-layer policy enforcement around tool calls and permissions are first-class, with SOC 2 Type II, ISO/IEC 42001, and GDPR referenced in product materials.
Integration and deployment	Each tool, each service, each event source requires individual wiring. Integration functions as recurring operational tax.	Integrations span GitHub, Jira, and Slack, with the platform designed to plug into existing CI/CD and collaboration workflows.

One caveat: Dedicated tracing and logging interfaces are underspecified in available documentation. Engineering leaders with strict observability requirements should verify this dimension directly.

Break-Even Math

This model draws on the sources cited above. In a build scenario that assigns 5-10 engineers for 12-18 months, engineering labor is the largest cost variable. The next three subsections break that down into a salary baseline, a maintenance multiplier, and a 24-month TCO view.

Salary baseline

In the bounded scenario used throughout this article, engineering labor matters most because the work depends on senior AI/ML infrastructure engineering time, not generic software engineering time.

Maintenance multiplier

Maintenance runs continuously through the life of the system. Model updates, orchestration changes, prompt changes, tool schema drift, and governance requirements all create recurring work in AI agent infrastructure.

24-month model

The table below contrasts the build and buy paths across the cost elements that compound over a 24-month horizon: engineering labor, infrastructure and license spend, and ongoing maintenance overhead. Both columns assume the same scenario of 5-10 engineers and 12-18 months of initial work.

Cost Element	Build (Conservative)	Buy (Conservative)
Year 1: Engineering labor	5-10 engineers building and integrating core platform layers	Internal work shifts toward platform adoption and integration, not full platform construction
Year 1: Infrastructure/license	Compute, storage, and tooling stack assembled internally	Platform license plus integration work
Year 1 total	Internal teams carry runtime, context, memory, and governance engineering during the first 12-18 months	Managed platform adoption reduces part of that engineering scope during the same period
Year 2: Engineering labor	Internal teams continue carrying maintenance, model migration, and rebuild pressure after initial deployment	Managed platform adoption reduces part of runtime, memory, and governance maintenance
Year 2: Maintenance overhead	Additional maintenance remains with the internal team as prompts, tools, and governance requirements change	Maintenance is split between internal integration work and the licensed platform
Year 2: Infrastructure/license	Continued internal platform costs	Continued platform costs
24-month TCO	Build economics depend on carrying full build plus maintenance responsibility across 24 months	Buy economics depend on platform pricing relative to the cost of 5-10 engineers and the associated maintenance burden across the same period

LLM API costs sit at the center of most build-vs-buy analyses, where they are compared explicitly against self-hosting or GPU costs, so they should not be zeroed out as identical across both paths. Break-even requires one of two conditions. Either platform pricing scales with usage until it approaches build costs, or the custom build generates enough measurable, differentiated value to offset the internal engineering investment.
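The labor side of the scenario is simple arithmetic: 5-10 engineers for 12-18 months comes to 60-180 engineer-months before maintenance begins. This is a minimal sketch of a 24-month comparison. The loaded monthly cost per engineer, maintenance staffing, infrastructure, platform price, and API spend are all hypothetical inputs, because the article does not publish these figures. LLM API spend is a separate input on each side rather than being zeroed out.

```python
HORIZON_MONTHS = 24


def engineer_months(engineers: int, months: int) -> int:
    return engineers * months


def build_tco(engineers: int, build_months: int, cost_per_eng_month: float,
              maintenance_engineers: float, infra_per_month: float,
              llm_api_per_month: float) -> float:
    """Initial build labor, maintenance staff for the rest of the horizon, plus run costs."""
    build_labor = engineer_months(engineers, build_months) * cost_per_eng_month
    maint_months = max(HORIZON_MONTHS - build_months, 0)
    maintenance = maintenance_engineers * maint_months * cost_per_eng_month
    run_costs = (infra_per_month + llm_api_per_month) * HORIZON_MONTHS
    return build_labor + maintenance + run_costs


def buy_tco(integration_engineers: float, cost_per_eng_month: float,
            platform_per_month: float, llm_api_per_month: float) -> float:
    """Integration staff, platform fees, and API usage across the horizon."""
    monthly = integration_engineers * cost_per_eng_month + platform_per_month + llm_api_per_month
    return monthly * HORIZON_MONTHS


def break_even_platform_price(build_total: float, integration_engineers: float,
                              cost_per_eng_month: float, llm_api_per_month: float) -> float:
    """Monthly platform price at which buying costs the same as building."""
    other_monthly = integration_engineers * cost_per_eng_month + llm_api_per_month
    return build_total / HORIZON_MONTHS - other_monthly


print(engineer_months(5, 12), engineer_months(10, 18))  # 60 180

# Hypothetical inputs, not figures from the article.
build = build_tco(5, 12, 20_000, 2, 10_000, 15_000)
buy = buy_tco(1, 20_000, 40_000, 15_000)
print(build, buy, break_even_platform_price(build, 1, 20_000, 15_000))  # 2280000 1800000 60000.0
```

The second break-even condition can be modeled by subtracting an estimate of the custom build's differentiated value from the build total before comparing.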

Three conditions justify continuing to build: AI infrastructure is the core product being sold, the organization can commit 5-10 engineers for 12-18 months without impacting the core product, or the organization has enough in-house AI expertise to execute.

The Migration Path from DIY to Cosmos

For teams that have already built custom infrastructure, the strangler fig pattern documented by Microsoft's Azure Architecture Center provides the migration architecture.

A routing façade intercepts requests and directs them to either the legacy system or the new platform.
Teams reduce migration risk by shifting traffic in small increments, keeping each step within rollback range.
The façade supports dual-system traffic during migration.
The same approach preserves a rollback path while traffic is still moving.

This is the migration structure used throughout the phased plan below, and it lowers risk while preserving rollback during transition.
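The façade's core job is to decide, for each request, which system serves it. That decision should be stable for a given key and reversible to zero instantly. This is a minimal sketch using hash-based bucketing. The class name, the percentage knob, and the stub handlers are hypothetical and are not part of the Azure Architecture Center guidance.

```python
import hashlib
from typing import Callable


class StranglerFacade:
    """Hypothetical routing façade: sends a configurable share of traffic to the new platform."""

    def __init__(self, legacy: Callable[[dict], str], platform: Callable[[dict], str]):
        self.legacy = legacy
        self.platform = platform
        self.percent_to_platform = 0  # start with all traffic on legacy

    @staticmethod
    def _bucket(key: str) -> int:
        # Stable bucket 0-99 per key, so a workflow or tenant does not flip
        # between systems while the percentage is unchanged.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def handle(self, key: str, request: dict) -> str:
        if self._bucket(key) < self.percent_to_platform:
            return self.platform(request)
        return self.legacy(request)

    def shift(self, percent: int) -> None:
        """Move traffic in small increments; clamped to the 0-100 range."""
        self.percent_to_platform = max(0, min(100, percent))

    def rollback(self) -> None:
        self.shift(0)  # route every request back to the legacy system


facade = StranglerFacade(legacy=lambda r: "legacy", platform=lambda r: "platform")
facade.shift(5)  # first canary increment
print(facade.handle("workflow-17", {"task": "triage"}))
facade.rollback()
```

Because a key's bucket never changes, raising the percentage only adds keys to the new platform. Workflows already migrated stay where they are.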

Phase 1: Audit and parallel setup (weeks 1-6)

The first phase maps dependencies and defines the first safe migration target.

Map all custom agent workflows, their state stores, tool integrations, and inter-service dependencies.
Identify the first migration candidate: the workflow with the most clearly defined inputs and outputs, lowest blast radius, and highest current maintenance burden.
Deploy the managed platform in non-production capacity.
Deploy the routing façade.
Define quantitative success criteria before any traffic moves.

A Ponsse study of a migration from Azure DevOps to GitHub Actions found that automated tooling could only partially convert pipelines; some tasks and constructs still required manual migration.

Migration effort therefore depends heavily on the ratio of custom-to-standard components.
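The Phase 1 selection rule can be made explicit as a ranking, with success criteria recorded as data before any traffic moves. This is a minimal sketch. The 1-5 scores, their equal weighting, the field names, and the threshold values are hypothetical placeholders for your own audit results.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    io_clarity: int        # 1-5: how clearly defined the inputs and outputs are
    blast_radius: int      # 1-5: impact if the migration goes wrong (lower is safer)
    maintenance_load: int  # 1-5: current maintenance burden


def first_candidate(workflows: list) -> Workflow:
    """Prefer clear I/O, small blast radius, and high maintenance burden (equal weights)."""
    return max(workflows, key=lambda w: w.io_clarity - w.blast_radius + w.maintenance_load)


# Success criteria agreed before any traffic moves (hypothetical values).
SUCCESS_CRITERIA = {
    "max_error_rate": 0.02,
    "max_p95_latency_ms": 8_000,
    "min_behavioral_match": 0.95,
}


def meets_criteria(metrics: dict, criteria: dict = SUCCESS_CRITERIA) -> bool:
    return (metrics["error_rate"] <= criteria["max_error_rate"]
            and metrics["p95_latency_ms"] <= criteria["max_p95_latency_ms"]
            and metrics["behavioral_match"] >= criteria["min_behavioral_match"])


workflows = [
    Workflow("pr-triage", 5, 1, 4),
    Workflow("release-notes", 4, 2, 2),
    Workflow("infra-changes", 2, 5, 5),
]
print(first_candidate(workflows).name)  # pr-triage
```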

Phase 2: Shadow testing (weeks 6-16)

The second phase validates behavior while keeping production risk low.

Send real production requests to both systems simultaneously.
Serve only legacy system responses.
Log both outputs.
Compare against behavioral equivalence thresholds defined before testing begins.
The façade must always be able to route 100% of traffic back to the legacy system.
Test rollback capability explicitly before any canary release begins.

Rollback readiness belongs in the test plan and gets validated before any canary release.
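The shadow-testing loop fits in a few functions: call both systems, return only the legacy result, and record whether the outputs match. This is a minimal sketch with three assumptions. The equivalence check is a hypothetical placeholder (normalized exact match), the log is an in-memory list instead of real storage, and the 500-sample minimum is an arbitrary example value.

```python
from typing import Callable


def equivalent(a: str, b: str) -> bool:
    # Placeholder check: whitespace- and case-insensitive exact match.
    # Real agent outputs usually need a task-specific comparison.
    return " ".join(a.split()).lower() == " ".join(b.split()).lower()


def shadow_call(request: dict, legacy: Callable[[dict], str],
                platform: Callable[[dict], str], log: list) -> str:
    legacy_out = legacy(request)
    # Sequential here for simplicity; a production façade would issue the
    # shadow call asynchronously so it adds no user-facing latency.
    try:
        platform_out = platform(request)
        match = equivalent(legacy_out, platform_out)
    except Exception as err:  # a shadow failure must never reach users
        platform_out, match = f"error: {err}", False
    log.append({"request": request, "legacy": legacy_out,
                "platform": platform_out, "match": match})
    return legacy_out  # users only ever receive the legacy response


def ready_for_canary(log: list, threshold: float, min_samples: int = 500) -> bool:
    """Compare the observed match rate against a threshold fixed before testing began."""
    if len(log) < min_samples:
        return False
    return sum(entry["match"] for entry in log) / len(log) >= threshold
```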

Phase 3: Incremental workflow migration (months 4-9)

Migrate lowest-risk workflows first, highest-maintenance workflows second, and highest-complexity stateful workflows last. Starting fresh on state is usually the simpler default. Historical state migration makes the most sense when regulatory requirements mandate continuity, agent utility depends on historical context, or the state schema is documented well enough for reliable mapping.
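The ordering rule can be written as a sort key. This is a minimal sketch with hypothetical workflow attributes. It encodes the three-wave order described above and, as an added assumption, puts higher-maintenance workflows first within each wave.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    low_risk: bool
    maintenance_load: int    # hypothetical 1-5 score from the Phase 1 audit
    stateful_complex: bool   # complex workflow with persistent state


def migration_wave(w: Workflow) -> int:
    if w.stateful_complex:
        return 3  # highest-complexity stateful workflows go last
    if w.low_risk:
        return 1  # lowest-risk workflows go first
    return 2      # remaining, typically highest-maintenance, workflows go second


def migration_plan(workflows: list) -> list:
    # Within a wave, higher maintenance burden goes first.
    ordered = sorted(workflows, key=lambda w: (migration_wave(w), -w.maintenance_load))
    return [w.name for w in ordered]


workflows = [
    Workflow("agent-memory-sync", False, 5, True),
    Workflow("changelog", True, 1, False),
    Workflow("ci-triage", False, 5, False),
    Workflow("doc-lint", True, 2, False),
]
print(migration_plan(workflows))  # ['doc-lint', 'changelog', 'ci-triage', 'agent-memory-sync']
```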

Phase 4: Legacy decommission (months 9-12+)

Three criteria must all be satisfied before decommissioning: 100% of traffic routing to the managed platform, the platform stable within defined thresholds for a minimum of 30 continuous days, and formal stakeholder sign-off on rollback path retirement.
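Because all three criteria must hold, the decommission check is a strict conjunction. Encoding it prevents anyone from retiring the rollback path on a partial pass. This is a minimal sketch. The function names are hypothetical, and "stable within defined thresholds" is reduced to a per-day boolean that your own monitoring would supply.

```python
from datetime import date, timedelta


def current_stable_streak(daily_stable: dict, today: date) -> int:
    """Consecutive stable days ending today; daily_stable maps date -> bool."""
    streak, day = 0, today
    while daily_stable.get(day, False):
        streak += 1
        day -= timedelta(days=1)
    return streak


def can_decommission(platform_traffic_share: float, daily_stable: dict, today: date,
                     stakeholder_signoff: bool, min_stable_days: int = 30):
    """All three criteria must hold; returns (ok, list of unmet criteria)."""
    unmet = []
    if platform_traffic_share < 1.0:
        unmet.append("not all traffic routes to the managed platform")
    if current_stable_streak(daily_stable, today) < min_stable_days:
        unmet.append(f"fewer than {min_stable_days} continuous stable days")
    if not stakeholder_signoff:
        unmet.append("no formal sign-off on retiring the rollback path")
    return (len(unmet) == 0, unmet)


today = date(2026, 3, 31)
stable = {today - timedelta(days=i): True for i in range(30)}
print(can_decommission(1.0, stable, today, stakeholder_signoff=True))  # (True, [])
print(can_decommission(0.97, stable, today, stakeholder_signoff=False))
```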

One observability cost risk to model is telemetry volume. Model cost curves at 10x and 100x your current agent invocation volume before committing. Negotiate volume pricing upfront.
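Projecting telemetry cost at 10x and 100x is straightforward multiplication, but writing it down shows which input drives the total. This is a minimal sketch that assumes a hypothetical flat price per million spans and a fixed number of spans per agent invocation. Real observability pricing is often tiered, and pinning that down is exactly the point of negotiating volume pricing upfront.

```python
def monthly_telemetry_cost(invocations: int, spans_per_invocation: int,
                           price_per_million_spans: float) -> float:
    """Hypothetical flat per-span pricing; real vendors often price in tiers."""
    spans = invocations * spans_per_invocation
    return spans / 1_000_000 * price_per_million_spans


def project(invocations: int, spans_per_invocation: int, price_per_million_spans: float,
            multipliers=(1, 10, 100)) -> dict:
    """Monthly cost at current volume and at each growth multiplier."""
    return {m: monthly_telemetry_cost(invocations * m, spans_per_invocation,
                                      price_per_million_spans)
            for m in multipliers}


# Hypothetical inputs: 200k agent runs/month, 40 spans per run, $2 per million spans.
print(project(200_000, 40, 2.0))  # {1: 16.0, 10: 160.0, 100: 1600.0}
```

Spans per invocation tends to grow as agents take more steps, so it is worth rerunning the projection with a higher span count as well as a higher invocation count.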

Reclaim Engineering Time Before the Second Rebuild

The real decision is whether checkpoint persistence, cache hit optimization, governance retrofits, and migration plumbing deserve 12-18 months of senior engineering time. The production cases in this article show that many teams discover the full platform burden only after the first version is already live, which is why rebuild pressure appears so consistently. A practical next step is to audit the layers your team already owns: runtime, context, memory, observability, and governance. If the roadmap still assumes you are only building agent logic, the scope is already understated. Before announcing a migration, define which product work will absorb the recovered engineering time, and confirm whether the current stack still justifies carrying full build plus maintenance responsibility across 24 months.

Written by

Paula Hingel
