Context engineering is the discipline of curating and maintaining the right information inside an AI agent's working memory. It matters because multi-step, multi-file, and multi-agent coding workflows fail when agents lack persistent access to intent, environment state, and system memory. Unlike prompt engineering, which optimizes a single set of instructions, context engineering manages everything that enters the context window: retrieved documents, tool outputs, conversation history, agent state, and persistent memory.
TL;DR
Context engineering curates everything in an AI agent's context window beyond prompts, including retrieved documents, tool outputs, conversation history, and persistent memory. It becomes necessary when tasks span multiple files, exceed 10-20 turns, or involve multiple agents. The primary risk is not missing context but poisoned context: stale information that degrades output.
The Knowledge Problem in AI Coding
A team's best engineer just quit: the one who understood how the payment system talks to the user database and why there's that weird timeout in the authentication service. They were also the only person who knew that the UserRole.admin field means something completely different in the billing context than everywhere else.
Now the team is staring at 500,000 lines of code that might as well be written in ancient Sumerian.
Standard AI coding tools suggest code that looks reasonable but breaks everything. The AI can write syntactically perfect code, but it has no idea how the system actually works. Everyone assumes AI coding tools fail because they aren't smart enough. That's wrong. They fail because they don't know what they're looking at.
If a brilliant programmer could see only one function at a time, they'd write terrible code, too. That's what current AI tools do: they look at code through a keyhole. Context engineering solves this by giving AI agents the same kind of understanding that senior engineers carry, including relationships, history, and the unwritten rules that make a codebase work.
Why Prompt Engineering Breaks Down at Scale
Prompt engineering works for bounded, single-turn tasks. It breaks down structurally when the scope expands to multi-file, multi-step, or multi-agent workflows. Here are five failure modes that are well-documented:
- Stateless inference: LLM weights are frozen at training time, and context windows are rebuilt from scratch at every session boundary. Most AI applications use LLMs in a stateless fashion, treating each query independently with no recall of prior interactions.
- Context window degradation: Performance degrades as context grows, and the degradation is positional. The "Lost in the Middle" research by Liu et al. demonstrates that information placed in the middle of a long context receives systematically less attention than information at the beginning or end. No prompt instruction can overcome this positional effect.
- Self-conditioning errors: When an agent's context window contains its own previous errors, it becomes measurably more likely to produce further errors. The degradation is not linear; it accelerates as each subsequent step treats prior errors as established correct patterns.
- System prompt decay: Martin Fowler offers practitioner confirmation that LLM generation becomes less reliable the longer a session runs, even as context windows grow. The standard recommendation to restart sessions frequently is itself evidence that no prompt configuration maintains coherence across a full real-world software task.
- Handoff context loss: When tasks are delegated among agents, context implicit in one agent's session is not automatically transferred. A sub-agent receiving a task mid-pipeline has no access to the file contents, test outputs, or architectural reasoning accumulated by the orchestrator.
Mapped against their root causes, the table below shows why no amount of prompt refinement closes the gap:
| Failure Mode | Root Cause | Why Prompts Can't Fix It |
|---|---|---|
| Stateless inference | Frozen weights, rebuilt context | Instructions can't substitute for missing information |
| Positional degradation | Attention dilution in a long context | Prompt instructions are subject to the same effect |
| Error amplification | Self-conditioning across turns | Longer prompts don't clear erroneous prior output |
| Handoff context loss | Agent boundary fragmentation | Sub-agent windows don't contain orchestrator history |
| System prompt decay | Competing content volume | Repeated instructions consume budget and degrade too |
The Core Components of Context Engineering
Anthropic's engineering team draws the boundary precisely: prompt engineering covers writing and organizing LLM instructions; context engineering covers curating and maintaining the optimal set of tokens during inference, including everything outside the prompts: tool outputs, retrieved documents, conversation history, and agent state.
Context engineering for AI coding operates across four distinct layers:
- Intent layer: What the agent is supposed to accomplish, including task specifications, architectural constraints, and project invariants. Without persistent intent, agents drift from original goals across multi-turn sessions.
- Environment layer: The actual state of the codebase, dependencies, and runtime, covering file structures, dependency graphs, test results, and service relationships. Intent's Context Engine processes 400,000+ files to maintain this layer through semantic dependency analysis.
- System memory layer: Persistent knowledge that outlives individual sessions, including discovered patterns, architectural decisions, and validated conventions. Google's ADK architecture separates this into four tiers: working context (immediate prompt), session (durable log), memory (long-lived searchable knowledge), and artifacts (files and logs addressed by name).
- Shared agent state layer: Coordination data between multiple agents working on the same task, covering which files have been modified, which tests have been run, and which decisions have been validated.
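To make the four layers concrete, here is a minimal Python sketch of an agent context object that keeps each layer separate and renders them into one prompt. The class name, field names, and section ordering are illustrative assumptions, not the API of Intent, Google's ADK, or any other framework. The ordering puts intent first and shared state last so both sit at the edges of the window.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container that keeps the four context layers separate."""

    intent: list[str] = field(default_factory=list)        # task spec, constraints, invariants
    memory: list[str] = field(default_factory=list)        # conventions and decisions from past sessions
    environment: list[str] = field(default_factory=list)   # file structure, dependencies, test results
    shared_state: list[str] = field(default_factory=list)  # what other agents changed or validated

    def render(self) -> str:
        # Assumed ordering: intent at the start and the freshest shared state at the
        # end, so both sit at the edges of the window rather than in the middle.
        sections = [
            ("INTENT", self.intent),
            ("MEMORY", self.memory),
            ("ENVIRONMENT", self.environment),
            ("SHARED STATE", self.shared_state),
        ]
        parts = [
            f"## {title}\n" + "\n".join(f"- {item}" for item in items)
            for title, items in sections
            if items  # skip empty layers instead of emitting empty headers
        ]
        return "\n\n".join(parts)


ctx = AgentContext(
    intent=["Add retries to PaymentClient without changing its public API"],
    environment=["src/payments/client.ts depends on src/utils/http.ts"],
)
print(ctx.render())
```

Keeping the layers as separate fields means each can be refreshed on its own schedule: environment data after every tool call, memory only when a decision is validated.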
Baseline acceptance rates for AI code suggestions in enterprise settings sit at approximately 30%. Context-aware approaches that maintain these layers increase acceptance rates because suggestions reflect the actual codebase structure rather than syntax patterns alone.
When Bad Context Is Worse Than No Context
Context poisoning is distinct from prompt quality issues. It occurs when incorrect, stale, or contradictory information is embedded in an agent's working memory and compounds over time. Practitioner taxonomies identify four failure patterns: context poisoning (errors compound), context distraction (over-reliance on history), context confusion (irrelevant information degrades quality), and context clash (contradictions within the window).
Empirical research on long-context LLMs confirms a consistent pattern: every model tested exhibits measurable output quality degradation as input context length increases, at every increment tested. The drop is not a cliff at some token threshold but a steady decline that begins almost immediately.
Context drift is a distinct, slower failure. The arXiv paper "Drift No More?" defines it as a gradual erosion of alignment with the original intent over multi-turn interactions. Most current benchmarks are blind to this degradation because they measure end-task success without capturing temporal misalignment.
Poor context files actively cause regressions rather than merely failing to help. Research on agent context files has shown that LLM-generated specs can reduce task success rates compared to using no context file, while well-maintained human-written specs provide only modest improvements. Content quality matters more than presence.
CrewAI's engineering team explicitly acknowledges this: "naive memory implementations create problems including context bloat and outdated information poisoning new executions, where the agent hallucinates, and the problem becomes worse than the original one being solved."
Mitigation requires incremental updates rather than periodic wholesale rewrites, temporal filtering in retrieval, and deliberate positioning of information within the context window.
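The last two mitigations can be sketched together. The following Python example is a minimal illustration that assumes each retrieved snippet carries a relevance score and a last-verified timestamp (both hypothetical fields). It drops stale snippets, then places the strongest ones at the start and end of the window, where the "Lost in the Middle" results suggest they receive the most attention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Snippet:
    text: str
    relevance: float       # retrieval score, higher is better (assumed)
    verified_at: datetime  # when the source was last confirmed current (assumed field)


def select_and_position(snippets: list[Snippet], now: datetime,
                        max_age: timedelta, k: int) -> list[Snippet]:
    """Drop stale snippets, keep the top k, and put the strongest at the edges."""
    # Temporal filtering: anything not re-verified within max_age counts as stale.
    fresh = [s for s in snippets if now - s.verified_at <= max_age]
    top = sorted(fresh, key=lambda s: s.relevance, reverse=True)[:k]
    # Deliberate positioning: alternate best-first between the front and the back,
    # so the weakest snippets end up in the middle of the window.
    front, back = [], []
    for i, snippet in enumerate(top):
        (front if i % 2 == 0 else back).append(snippet)
    return front + back[::-1]
```

With four fresh snippets ranked a, b, c, d, the output order is a, c, d, b: the best snippet opens the window, the second-best closes it, and a highly scored but stale snippet never enters at all.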
Context Engineering Patterns by Scope
Not every task requires the same context strategy. The right pattern depends on scope conditions:
- Prompt-only: For targeted bug fixes, single-file changes, and exploratory prototyping, standard prompt engineering is sufficient. The overhead of building context infrastructure is not justified when a human is consistently in the loop and the task touches one or two files.
- Context compaction and summarization: When sessions grow long, compaction summarizes message history while preserving architectural decisions and unresolved issues. Anthropic's Claude Code documents auto-compaction as a core practice for managing long-running coding sessions, treating the context window as a finite resource that must be actively curated rather than passively filled. Research from the "Complexity Trap" paper found that simpler observation masking (hiding stale tool outputs without summarizing them) outperformed LLM-based summarization on coding tasks.
- OS-style paging under constrained windows: This pattern treats the context window as a cache, with external storage tiers for session history, memory, and artifacts. The working context remains small, while a hierarchy of slower stores holds session logs, long-lived memory, and named files.
- Parallel sub-agents with coordinated context sharing: When accumulated reasoning traces exceed the effective context window, splitting work across parallel agents with separate context windows prevents degradation. Each agent maintains a focused context for its subtask, while an orchestrator collects and aggregates results.
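Observation masking, the simpler compaction technique mentioned above, is easy to sketch. This Python function assumes a chat history made of dicts with "role" and "content" keys (an assumption, not a specific vendor API). It replaces all but the most recent tool outputs with a short placeholder and leaves the agent's own reasoning and actions untouched.

```python
def mask_observations(messages: list[dict], keep_last: int = 3,
                      placeholder: str = "[older tool output omitted]") -> list[dict]:
    """Hide all but the most recent keep_last tool outputs; nothing is summarized."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # Slicing with [:-0] would select nothing, so handle keep_last == 0 explicitly.
    stale = set(tool_indices[:-keep_last]) if keep_last > 0 else set(tool_indices)
    # Build new dicts so the caller's original history is left intact.
    return [
        {**m, "content": placeholder} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Unlike LLM-based summarization, this needs no extra model call and cannot introduce a summarization error into the history; the trade-off is that masked outputs are gone unless the agent re-runs the tool.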
Intent addresses compaction differently: by maintaining semantic dependency graphs across 400,000+ files, it assembles targeted context from modular, task-relevant components rather than injecting the full contents of files.
The summary below pairs each pattern with the conditions that justify it and the cost it carries:
| Pattern | When to Apply | Key Tradeoff |
|---|---|---|
| Prompt-only | Single file, short session, human in loop | No overhead; no persistence |
| Compaction/masking | Long sessions approaching window limits | Loses detail; masking often outperforms summarization |
| OS-style paging | Tasks requiring full repo understanding | Infrastructure overhead; tier management complexity |
| Parallel sub-agents | Multi-file changes with separable subtasks | Coordination tax; context fragmentation at boundaries |
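The OS-style paging pattern can be illustrated with a small least-recently-used cache. In this Python sketch, the working context holds a fixed number of entries, every entry is also written to an external store, and reading an evicted entry pages it back in. The in-memory dict standing in for session logs, memory, and artifacts is a simplification; a real system would back it with durable storage.

```python
from collections import OrderedDict


class PagedContext:
    """Working context as a small LRU cache over a slower external store."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.working = OrderedDict()  # entries currently in the prompt, oldest first
        self.store = {}               # stand-in for session log, memory, and artifacts

    def put(self, key: str, content: str) -> None:
        self.store[key] = content  # the external tier always keeps the full copy
        self._page_in(key)

    def get(self, key: str) -> str:
        if key not in self.working:
            self._page_in(key)  # "page fault": reload from the external store
        self.working.move_to_end(key)  # mark as most recently used
        return self.working[key]

    def window(self) -> list[str]:
        return list(self.working)

    def _page_in(self, key: str) -> None:
        self.working[key] = self.store[key]
        self.working.move_to_end(key)
        while len(self.working) > self.capacity:
            self.working.popitem(last=False)  # evict the least recently used entry
```

The tier-management complexity noted in the table shows up even here: someone has to decide what counts as an entry, how big the working set is, and when an evicted entry is worth paging back in.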
Spec-Driven Context Anchoring
Long-running, multi-agent workflows need a stable source of truth to prevent drift. AGENTS.md is an open standard for providing persistent context to AI coding agents, functioning as a README written for agents rather than humans. Created by Sourcegraph's Amp team in July 2025, the specification was donated to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation, alongside Anthropic's MCP and Block's Goose.
The problem AGENTS.md solves: teams using multiple AI tools were maintaining separate, diverging configuration files per tool, with the same rules written three or more times in different formats, each drifting independently.
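Because a single AGENTS.md replaces per-tool files, tooling needs only one lookup rule. The Python sketch below assumes the nested-file convention in which the AGENTS.md closest to the edited file takes precedence, so a package-level file overrides the repository-level one; the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def nearest_agents_md(edited_file: Path, repo_root: Path) -> Optional[Path]:
    """Return the AGENTS.md closest to edited_file, searching up to repo_root."""
    edited_file = edited_file.resolve()
    repo_root = repo_root.resolve()
    for directory in [edited_file.parent, *edited_file.parent.parents]:
        candidate = directory / "AGENTS.md"
        if candidate.is_file():
            return candidate
        if directory == repo_root:
            break  # don't climb above the repository
    return None
```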
Spec files anchor agent behavior across sessions. Intent implements this principle through living specifications that specialist agents reference during execution. Specialist agents execute in isolated git worktrees, and a verifier agent checks results against the spec before developer review. This prevents the drift that accumulates when agents operate without a persistent reference point.
Three rigor levels apply: spec-first (specification before code), spec-anchored (specification maintained alongside code), and spec-as-source (specification is the primary artifact; code is generated from it).
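At the spec-anchored level, part of the spec can be made machine-checkable so a verifier step can reject changes before human review. The following Python sketch is a hypothetical illustration, not Intent's verifier: it assumes a spec reduces to allowed path patterns and required tests, and reports any out-of-scope edit or any required test that has not passed. Note that fnmatch's * also matches path separators, so src/payments/* covers nested directories.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Spec:
    """Hypothetical machine-checkable slice of a spec."""
    allowed_paths: list[str]   # glob patterns the change may touch
    required_tests: list[str]  # test IDs that must pass before review


@dataclass
class ChangeSet:
    modified_files: list[str]
    passed_tests: set[str]


def verify(spec: Spec, change: ChangeSet) -> list[str]:
    """Return spec violations; an empty list means the change can go to human review."""
    violations = [
        f"out-of-scope edit: {path}"
        for path in change.modified_files
        if not any(fnmatch(path, pattern) for pattern in spec.allowed_paths)
    ]
    violations += [
        f"required test not passing: {test}"
        for test in spec.required_tests
        if test not in change.passed_tests
    ]
    return violations
```

Only the mechanical parts of a spec fit this shape; architectural intent still needs a model or a human to judge, which is why the verifier sits before developer review rather than replacing it.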
Context Window Allocation Tradeoffs
Research demonstrates that even when correct content has been retrieved and is present in context, the surrounding context length independently degrades task performance on code generation tasks. The NoLiMa benchmark tested 13 LLMs: 11 of 13 dropped below 50% of baseline performance at 32K tokens.
The maximum effective context window (MECW) is task-type dependent. A model handling simple retrieval at 5,000 tokens may fail on complex operations at 400-1,200 tokens. Intent addresses this through intelligent context selection, assembling precisely scoped context rather than filling windows to capacity, targeting the root cause of degraded performance.
Smaller windows require more deliberate summarization and prioritization but produce more stable per-step results. Anthropic's Claude Code documentation puts it directly: "The context window is the most important resource to manage."
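Scoping context below the window's capacity is essentially a budgeted selection problem. This Python sketch greedily picks chunks by relevance per token until a hard budget is reached. The relevance scores, token counts, and the greedy heuristic itself are assumptions; the budget would be set from the task's effective window rather than the model's advertised one.

```python
def select_within_budget(candidates: list[tuple[str, float, int]],
                         budget_tokens: int) -> tuple[list[str], int]:
    """Pick context chunks by relevance per token without exceeding budget_tokens.

    candidates holds (name, relevance, token_count) tuples produced elsewhere,
    for example by a retriever and a tokenizer.
    """
    # Rank by value density so one large, moderately relevant file can't crowd out
    # several small, highly relevant ones.
    ranked = sorted(candidates, key=lambda c: c[1] / max(c[2], 1), reverse=True)
    chosen, used = [], 0
    for name, _relevance, tokens in ranked:
        if used + tokens <= budget_tokens:
            chosen.append(name)
            used += tokens
    return chosen, used
```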
When Context Engineering Is Not Worth the Overhead
Context engineering has real costs. For small tasks, short workflows, or limited file scope, the overhead of building and maintaining context outweighs the benefits.
Martin Fowler documented a critical finding: "Even with all of these files and templates and prompts and workflows and checklists, I frequently saw the agent ultimately not follow all the instructions." Agent compliance does not scale in proportion to context investment.
Some open-source code agent projects have moved away from RAG-based context engineering for coding, replacing it with agentic search. The overhead of maintaining a vector index isn't justified when the agent can search directly.
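Agentic search replaces a prebuilt index with a tool the agent calls on demand. A minimal Python version of such a tool might look like the following; it assumes plain regex matching over files on disk, and the function name and "path:line: text" output format are illustrative choices.

```python
import re
from pathlib import Path


def search_code(root: str, pattern: str, glob: str = "*", max_hits: int = 20) -> list[str]:
    """Grep-style search tool an agent can call instead of querying a vector index."""
    regex = re.compile(pattern)
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.rglob(glob)):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                # "path:line: text" gives the agent an exact location to open next.
                hits.append(f"{path.relative_to(root_path)}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

Because the tool reads files at query time, its results cannot go stale the way an out-of-date embedding index can; the cost is that each query scans the tree, which is why real agents usually wrap a fast external search tool instead.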
For single-repository, small-team projects, default context handling built into inline coding tools is architecturally sufficient. Context engineering investment is justified when the scope exceeds workspace-focused defaults: multi-repository enterprise environments, multi-file changes at scale, and long-running agentic sessions requiring state persistence.
Enterprise Scale and Security
Large companies face a specific problem that context engineering solves. Their codebases are too large for any individual to fully understand: hundreds of services, thousands of dependencies, and architectural decisions made by people who left years ago. Gartner projects that 90% of enterprise engineers will use AI code assistants by 2028, up from under 14% in 2024.
Yet only 29% of developers trust AI tools in 2025, down 11 percentage points from 2024. More developers actively distrust AI accuracy (46%) than trust it (33%). Context engineering directly targets this trust gap by making AI outputs more consistent and predictable.
Intent accelerates developer onboarding by providing instant context across 400,000+ files. When the system understands which services depend on specific response formats, it can predict which teams need to be notified and which tests might break.
Specialist Agents execute in parallel, each running in an isolated git worktree. A Verifier Agent checks results against the spec before developer review. Agents operate with architectural-level understanding using a Coordinator-Implementor-Verifier pattern.
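Worktree isolation of this kind maps onto plain git commands. This Python sketch only builds the command lines and does not run them; the agent/ branch prefix and the .worktrees directory are hypothetical naming choices, not Intent's layout.

```python
def worktree_plan(subtasks: list[str], base_ref: str = "main",
                  root: str = ".worktrees") -> list[list[str]]:
    """Build (not run) the git commands that give each agent its own worktree."""
    commands = []
    for task in subtasks:
        branch = f"agent/{task}"  # hypothetical naming scheme
        path = f"{root}/{task}"
        # git worktree add -b <new-branch> <path> <commit-ish> creates the branch
        # and a separate checkout in one step, so parallel agents never share a
        # working directory.
        commands.append(["git", "worktree", "add", "-b", branch, path, base_ref])
    return commands
```

Each command can be passed to subprocess.run(cmd, check=True); once a subtask is merged or abandoned, git worktree remove deletes its checkout.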
The Coordination Reality
The hardest part of context engineering for multi-agent workflows isn't the technology; it's the coordination. Cognition AI states: “running multiple agents in collaboration creates fragile systems with context fragmentation and dispersed decision-making.”
The implication: start with a single agent. Add multi-agent coordination only when a specific failure mode demands it. When it does, clear responsibilities, structured handoffs, and shared state through Intent's Context Engine prevent the fragmentation that degrades multi-agent performance.
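A structured handoff can be as simple as a typed record that the orchestrator serializes into the sub-agent's initial context. It carries the file, test, and decision state that would otherwise be lost at the agent boundary. The Python sketch below is illustrative; the field names are hypothetical and are not the schema of Intent's Context Engine or any framework.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """Hypothetical record passed from an orchestrator to a sub-agent."""
    task: str
    constraints: list[str] = field(default_factory=list)      # invariants the sub-agent must keep
    files_modified: list[str] = field(default_factory=list)   # work already done upstream
    tests_run: dict[str, str] = field(default_factory=dict)   # test ID -> "pass" or "fail"
    decisions: list[str] = field(default_factory=list)        # validated architectural decisions

    def to_prompt(self) -> str:
        # Explicit, sorted JSON keeps the handoff diffable and easy to log, so the
        # sub-agent starts from recorded state rather than from scratch.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```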
Prompt Engineering vs. Context Engineering Checklist
The decision between prompt engineering and context engineering depends on measurable scope conditions. Use the following checklist to determine the right approach for a given task.
| Condition | Approach |
|---|---|
| Single file, under 10 turns, human reviewing each output | Prompt engineering is sufficient |
| Targeted bug fix or surgical code change | Prompt engineering is sufficient |
| Exploratory prototype or spike | Prompt engineering is sufficient |
| Multi-file changes across 3+ files | Context engineering required |
| Session exceeds 20 turns or approaches window limits | Context compaction or paging required |
| Multiple developers need consistent outputs | Spec-driven anchoring required |
| Multi-repository enterprise scope | Full context engineering infrastructure justified |
| Parallel agents working on related subtasks | Coordinated context sharing required |
| Long-running agentic session with state persistence | Session memory and drift mitigation required |
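The checklist can also be encoded as a function, which is handy for tooling that picks a default strategy per task. This Python sketch mirrors the table's thresholds. Treating 10-20 turns and two-file changes as prompt-only is an assumption, since the table leaves those cases open, and the human-review condition is not modeled.

```python
def choose_approach(files: int, turns: int, repos: int = 1, parallel_agents: int = 1,
                    shared_by_team: bool = False, needs_persistence: bool = False) -> list[str]:
    """Return every technique the checklist calls for, or prompt engineering if none."""
    needs = []
    if repos > 1:
        needs.append("full context engineering infrastructure")
    if files >= 3:
        needs.append("context engineering")
    if turns > 20:
        needs.append("compaction or paging")
    if shared_by_team:
        needs.append("spec-driven anchoring")
    if parallel_agents > 1:
        needs.append("coordinated context sharing")
    if needs_persistence:
        needs.append("session memory and drift mitigation")
    # Assumption: anything that trips no condition defaults to prompt engineering.
    return needs or ["prompt engineering"]
```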
The Real Risk Isn't Missing Context
Context engineering is a distinct discipline required when AI coding tasks exceed a single-file, single-turn scope. The primary risk is not missing context but poisoned context: stale or incorrect information that actively degrades agent output over time. Start with spec-driven anchoring on your next change that touches more than one service, even one that initially looks contained within a single service boundary, and make agent context files a maintained artifact rather than a forgotten one.
Written by

Molisha Shah
Molisha is an early GTM and Customer Champion at Augment Code, where she focuses on helping developers understand and adopt modern AI coding practices. She writes about clean code principles, agentic development environments, and how teams are restructuring their workflows around AI agents. She holds a degree in Business and Cognitive Science from UC Berkeley.