
Your Agent's Context Is a Junk Drawer

Feb 27, 2026
Sylvain Giuliani

There's a GitHub repo for sharing AI coding agent rules. It has 37,800 stars. It has 68 contributors.

That's a 556-to-1 ratio. For every person who contributed a rule, 556 people copied one without reading it.

This is the state of AI agent configuration in 2026. Developers downloading context packs like npm packages, stacking markdown files they didn't write, wondering why their agent keeps ignoring instructions.

The copy-paste problem

Open a typical project that's been through a few months of AI-assisted development. You'll find some combination of CLAUDE.md, .cursorrules, copilot-instructions.md, AGENTS.md, and maybe a gemini.md for good measure. Almost the same content in each one. Slowly drifting apart. Each technically required by a different tool.

One developer described it as "confetti in the root directory." Another resorted to symlinks to keep five config files in sync. A third built a CLI tool with 156 validation rules across 28 categories because AI config files now need their own linter.
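The symlink workaround is simple, if inelegant. A sketch of how it might look, assuming a canonical AGENTS.md and the tool-specific file names mentioned above (the loop and the file contents are illustrative, not any particular developer's setup):

```shell
# Illustrative: keep one canonical rules file, and symlink the
# per-tool names to it so there is exactly one copy to edit.
cat > AGENTS.md <<'EOF'
Build: npm run build
Test: npm test
EOF

# Hypothetical list of per-tool file names that want the same content.
for alias in CLAUDE.md .cursorrules copilot-instructions.md; do
  ln -sf AGENTS.md "$alias"
done
```

Editing AGENTS.md now updates every alias at once. Though, as the rest of this post argues, a shorter file beats a better-synced one.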

Confetti of Context

The pattern is familiar if you've been around long enough. Someone publishes a "starter template," thousands of people copy it, nobody audits it, and six months later everyone's debugging configuration instead of shipping code. We did this with webpack. We did this with Docker Compose.

The difference this time: a bad webpack config made your build slow. A bad agent config makes your agent dumber.

The research says stop

In February 2026, researchers at ETH Zurich published a paper evaluating AGENTS.md files across multiple coding agents and LLMs. The finding was blunt.

"Context files reduce task success rates compared to providing no repository context, while increasing inference cost by over 20%."

Adding context files made agents perform worse than giving them nothing. And it cost more.

The paper's author clarified on Hacker News that even human-written context files only improved performance by about 4%, and that improvement wasn't consistent across models. On Sonnet 4.5, performance actually dropped by over 2%.

CodeIF-Bench tested instruction-following in interactive code generation across multi-turn sessions. One of their key findings: "additional repository context" actively degraded models' ability to follow instructions. More context, worse compliance. The researchers identified context management as the critical unsolved problem.

ConInstruct (AAAI 2026) went further. They tested whether models can even detect conflicting constraints in their instructions. Claude Sonnet 4.5 scored 87.3% F1 at detecting conflicts. Not bad. But here's the problem: even when models spotted the contradiction, they almost never flagged it to the user. They just silently picked one interpretation and kept going. Your config file says "use tabs" in one section and "use spaces" in another. The model notices. It doesn't tell you. It just picks.

Context Instruction Decay

PACIFIC confirmed the sequential version of the same problem. As instruction chains get longer in code tasks, even state-of-the-art models lose track. The framework generates benchmarks of increasing difficulty, and the results are consistent: more sequential instructions, more failures. Even among advanced models.

Your AGENTS.md has how many instructions?

Anthropic knows this. Their own docs warn: "Bloated CLAUDE.md files cause Claude to ignore your actual instructions." Karpathy said it plainly: "Too much or too irrelevant and the LLM costs might go up and performance might come down."

Birgitta Böckeler, writing on Martin Fowler's site: "An agent's effectiveness goes down when it gets too much context, and too much context is a cost factor as well."

More rules, worse output.

Why we do it anyway

Because we don't trust the agent.

Stack Overflow's 2025 survey: 84% of developers use or plan to use AI tools. Only 29% trust them. Down from 40%.

When you don't trust something, you over-specify. You write a 200-line AGENTS.md explaining your folder structure because you don't believe the agent can figure it out. You add coding style rules your linter already enforces. You paste in architecture docs the agent could read from the repo itself.

Two years ago, this made sense. Early agents were genuinely blind. They couldn't see your codebase. You had to explain everything.

That muscle memory stuck. But agents got better. Context engines got better. The tools now read your code, your dependencies, your git history, your file structure. They derive patterns automatically. Developers are still writing instructions for the blind version.

Tim Sylvester nailed the frustration cycle: "You write down these extensive lists of rules. The agent dutifully ignores them. You call it out. 'You're right to call me out!' it chirps, and apologizes. These are empty apologies it performs by rote. Many of us have been in relationships like this before."

That last line lands because it's true. The instinct when something ignores you is to repeat yourself louder. More rules. More detail. More emphasis. It doesn't work with people and it doesn't work with agents.

The research says that's exactly backwards.

The two buckets

The fix is knowing which context goes where.

What the agent can already see. Your code, your file structure, your dependencies, your git history. A good context engine reads all of this. You don't need to restate it in a markdown file. That's like writing a README for a coworker who already has the repo cloned.

What the agent can't see. How to deploy. How to run tests. Team conventions that live in people's heads, not in linter configs. What your staging environment looks like. Why you made that weird architecture decision three months ago.

Most people use the second category's tools for the first category's problems. They write AGENTS.md files describing their code structure. They add rules explaining API patterns that are already visible in the code. The agent knows. You're adding noise.

A good context engine reads your codebase so you don't have to explain it. The less you tell the agent about what it can already see, the more attention budget remains for the things it genuinely can't figure out.

Signal vs. Noise: the two buckets

What actually works

Vercel ran evals on Next.js 16 APIs comparing two approaches: skills (on-demand retrieval) and AGENTS.md (passive context). Skills produced zero improvement over baseline. The agent had access to the docs but never bothered to look at them.

Then they tried something dumber. They compressed their entire docs index into an 8KB AGENTS.md file. Not the full documentation. Just an index pointing to retrievable files. 100% pass rate across build, lint, and test.

40KB compressed to 8KB. Perfect score. The "dumb" approach won.
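Vercel hasn't reproduced the file here, but an index-style AGENTS.md follows a simple shape: no inlined documentation, just pointers the agent retrieves on demand. A hypothetical sketch, with invented paths and topics for illustration:

```markdown
# Docs index — retrieve the relevant file before using the API it covers

- Caching and revalidation: docs/caching.md
- Route handlers: docs/route-handlers.md
- Server actions: docs/server-actions.md
- Deployment and environment variables: docs/deploy.md
```

The index stays tiny, so it fits comfortably in the context window, and the agent fetches only the one file the current task needs.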

Jan-Niklas Wortmann went through a similar arc. Started with 80+ lines of aspirational rules. Cut to 30 lines of failure-backed instructions. "Dramatically better behavior." The pruning rubric he landed on: "Failure-backed? Tool-enforceable? Decision-encoding? Triggerable? If it fails all four, delete it."

Start with nothing. Add what prevents failures. Verify it actually helps.

What to delete

Open your AGENTS.md or CLAUDE.md right now. For each line, ask: would the agent make a mistake without this?

If no, delete it.

Things that almost certainly don't belong:

- Your folder structure. The agent can see it.
- Your tech stack. It's in package.json or Cargo.toml or go.mod.
- Coding style rules your linter already enforces. ("Never send an LLM to do a linter's job.")
- API patterns visible in your existing code.
- Generic best practices like "write clean code" or "follow SOLID principles." The agent was trained on the internet. It knows.

What should stay:

- Build, test, and lint commands.
- Deploy steps.
- Environment setup.
- Team conventions that live in people's heads.
- Known gotchas.
- Architecture decisions that aren't obvious from reading the code.
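Put together, a file that survives that filter looks less like documentation and more like a crib sheet. A hypothetical example, with invented commands and gotchas for illustration:

```markdown
# Build / test / lint
- npm run build
- npm test -- --runInBand   # tests share a DB fixture; parallel runs flake
- npm run lint

# Deploy
- Merges to main auto-deploy to staging; production is manual via deploy.sh

# Gotchas
- The ORM's migration generator mangles enum columns; write those by hand
```

Every line above encodes something the agent can't derive from the repo. Nothing restates what it can.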

DHH made the connection explicit: "Convention over configuration set the path for 20+ years of great training data for AI to use today." If your codebase follows conventions, the agent already understands them. You don't need to re-explain Rails to an agent trained on every Rails app on GitHub.

The best agent setup isn't the one with the most files. It's the one where every line prevents a specific failure.

The attention budget

Your agent's system prompt already contains dozens of instructions. Every benchmark from the last year tells the same story: instruction-following degrades as constraint density increases. CodeIF-Bench showed it in interactive coding. PACIFIC showed it in sequential code tasks. ConInstruct showed models silently ignore conflicts rather than ask. That leaves a narrow window for your AGENTS.md, your skills, your plugins, and your actual prompts. Combined.

Every line you add pushes something else out. A rule about folder structure displaces a rule about deploy steps. A generic best practice crowds out a project-specific gotcha. You're choosing what gets ignored.

Treat every line like ad space. It has to justify its rent.

Convention over configuration, again

The Rails community solved a version of this twenty years ago. Before Rails, you configured everything. Database mappings. URL routing. File locations. All explicit, all manual. Rails said: follow the convention and skip the config. The framework figures it out.

Agents are getting there. The tools derive context from your codebase now. Most developers haven't updated their habits to match.

The test

Open your AGENTS.md right now. For every line, ask: does this prevent a failure the agent would actually make?

If you can't point to the failure, delete the line.

You'll notice the difference when it actually follows the ones that remain.

Written by

Sylvain Giuliani


Sylvain Giuliani is the head of growth at Augment Code, leveraging more than a decade of experience scaling developer-focused SaaS companies from $1M to $100M+ ARR. Before Augment, he built and led the go-to-market and operations engines at Census and served as CRO at Pusher, translating deep data insights into outsized revenue gains.
