Three things worth knowing
- A single GitHub repo collecting internal system prompts from 30+ major AI coding tools has just hit 136K stars and 34K forks.
- For the first time, developers can read the actual instructions each tool sends to its model, rather than relying on marketing copy or guesswork.
- This raises both a practical question (what are these tools actually doing?) and a security one (are your own prompts exposed?).
There's a GitHub repository I keep coming back to whenever someone asks me how to actually evaluate AI coding tools. It collects the raw system prompts of nearly every major AI coding assistant, and it just crossed 136K stars and 34K forks.
Maintained by developer Lucas Valbuena, the repo exposes how tools like Cursor, Windsurf, Claude Code, Augment Code, and Devin AI actually instruct their underlying models. If you've ever wondered why one tool refuses a certain request or formats output differently from another, this is where you go to find out. It's the closest thing to a public teardown of the competition that I've seen, and I think most developers are sleeping on it.

What Happened
The repository x1xhlol/system-prompts-and-models-of-ai-tools has been accumulating system prompts since early 2025. As of its latest update on April 17, 2026, it covers over 30 AI tools with 490 commits across 28 contributors.
What sets it apart from similar collections is depth. It includes the JSON tool schemas, not just the prompt text: things like Augment Code's gpt-5-tools.json. That's the part that actually shows you what a tool has been given permission to do, which is often more revealing than the instructions themselves.
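To make that distinction concrete, here's a hypothetical tool-definition entry in the style of the JSON schemas the repo collects. The `edit_file` name and its fields are illustrative, not copied from any vendor's file:

```python
import json

# Hypothetical tool definition, loosely in the function-calling-schema style
# found in the repo's *.json files. Names and fields are made up for illustration.
tool_def = {
    "name": "edit_file",
    "description": "Apply an edit to a file in the workspace.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Workspace-relative file path"},
            "content": {"type": "string", "description": "New file content"},
        },
        "required": ["path", "content"],
    },
}

# The schema, not the prose, is what tells you the agent can write files
# without asking: the capability is granted here, whatever the prompt says.
print(json.dumps(tool_def, indent=2))
```

Reading a vendor's real schema files this way answers the permissions question directly: if there's no file-write tool defined, the prompt can't grant one.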
What stood out to me is that it's still being actively updated. That cadence matters: a snapshot from a year ago would tell you very little about how these tools behave today.
Key Features
- Full prompt text for 30+ tools. Each tool has its own directory with raw system prompts, sometimes across multiple versions. Cursor includes an "Agent Prompt 2.0" added in November 2025, which alone is worth reading if you're building anything agent-based.
- Technical depth beyond prompts. Several entries include JSON tool-definition files for Augment Code, Leap.new, Traycer AI, and Windsurf. Augment Code's gpt-5-tools.json and Amp's separate prompts for Sonnet and GPT-5 reveal which models power which features. I find the schema files more revealing than the prompts themselves.
- Version history. With 490 commits, you can diff the prompts to see how they have changed over time. Watching how vendors adjust their instructions tells you a lot about where they're struggling, iterating, or quietly rolling back decisions.
- Full coverage across tool types. IDE agents (Cursor, Windsurf, VSCode Agent, Xcode), autonomous agents (Devin AI, Manus), app builders (Lovable, v0, Replit, Same.dev), and search tools (Perplexity) are all represented. That scope is what makes it a useful reference across different kinds of builds.
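The version-history point is easy to act on. Here's a minimal sketch using Python's difflib, with two made-up prompt snapshots standing in for real checkouts of a file at two commits:

```python
import difflib

# Illustrative only: two hypothetical snapshots of a tool's system prompt,
# standing in for the same file checked out at two different commits.
old_prompt = """You are a coding assistant.
Always explain your plan before editing files.
Never run shell commands without confirmation."""

new_prompt = """You are a coding assistant.
Edit files directly; summarize changes afterwards.
Never run shell commands without confirmation."""

# unified_diff surfaces exactly which instructions changed between versions.
diff = difflib.unified_diff(
    old_prompt.splitlines(),
    new_prompt.splitlines(),
    fromfile="prompt_v1.txt",
    tofile="prompt_v2.txt",
    lineterm="",
)
for line in diff:
    print(line)
```

In practice you'd get the two versions with plain `git log -- <file>` and `git show <commit>:<file>` in a local clone; the interesting part is that a one-line change like the one above ("explain before editing" to "edit directly") is precisely the kind of quiet behavioral shift the commit history exposes.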
Why It Matters
System prompts are the invisible layer between a foundation model and the product you interact with. They define tone, capabilities, access to tools, safety constraints, and workflow logic. Before repos like this surfaced, there was no real way to compare that across products. You were just taking vendors at their word.
Tool evaluation used to mean spending weeks with each product before forming a view. Now you can read what each tool actually sends to its model before you commit to anything.
- You can see whether a tool defaults to cautious refusals or aggressive code generation, whether it has file system access, and how it handles multi-step tasks.
- A prompt that tells the model to "always explain before editing" will produce a different experience than one that says "edit files directly without asking." That difference shows up in your workflow every day.
- For anyone building AI-powered developer tools, these prompts are a reference architecture: real patterns for tool-use schemas, context management, and agent behavior that have already shipped to millions of users.
There's also a prompt-security angle I'm seeing come up more often in conversations. The repo maintainer flags that exposed prompts can become attack surfaces and links to a service called ZeroLeaks that aims to identify extraction risks. A year ago, this felt theoretical. I don't think it does anymore.
Example Use Case
Say your team is evaluating AI coding agents for a large TypeScript monorepo. You want an agent that can read multiple files, run tests, and make cross-file edits without excessive confirmation dialogs.
Pull up the Cursor Agent Prompt 2.0 and compare it against the Devin AI and Augment Code system prompts in the repository. You can read exactly how each tool instructs its model to handle multi-file edits, test execution, and user confirmation flows. By comparing how each tool's prompt defines available operations and multi-file editing behavior, you can assess architectural differences before committing to a paid plan.
This is the research I'd do before any serious tool evaluation. It takes 30 minutes and tells you more than a week of trial accounts.
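If you've cloned the repo, a short script can rough out that comparison. Everything here is an assumption to adapt: the directory names, the `.txt` extension, and the keyword list are illustrative, so check them against the actual repo layout first:

```python
from pathlib import Path

# Assumes a local clone next to this script. Directory names are illustrative;
# confirm the exact folder names in the repo before running.
REPO = Path("system-prompts-and-models-of-ai-tools")
TOOLS = ["Cursor Prompts", "Devin AI", "Augment Code"]

# Keywords that hint at confirmation-heavy vs. autonomous behavior.
KEYWORDS = ["confirm", "ask the user", "before editing", "run tests"]

def keyword_hits(tool_dir: Path) -> dict:
    """Count keyword occurrences across every prompt file in a tool's directory."""
    counts = {k: 0 for k in KEYWORDS}
    for f in tool_dir.rglob("*.txt"):
        text = f.read_text(errors="ignore").lower()
        for k in KEYWORDS:
            counts[k] += text.count(k)
    return counts

for name in TOOLS:
    tool_dir = REPO / name
    if tool_dir.is_dir():
        print(name, keyword_hits(tool_dir))
```

Crude keyword counts won't replace reading the prompts, but they're a fast way to flag which tool to read first when you care about confirmation dialogs in a monorepo workflow.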
Competitive Context
The repository puts tools side by side in a way their own documentation doesn't. Cursor and Windsurf both ship as VS Code-based AI editors, but their system prompts reveal different philosophies regarding agent autonomy, tool schemas, and the way they structure multi-step tasks. Windsurf's directory includes tool definitions labeled Wave 11, indicating a versioned iteration of its tool-calling interface.
The Anthropic directory includes model-specific prompts, such as Claude Sonnet 4.6. Augment Code's entry includes both prompts and GPT-5 tool definitions, confirming multi-model support in a way the product page doesn't explicitly state. Devin AI's directory contains a separate DeepWiki prompt alongside its main system prompt, suggesting the tool does more than basic code generation.
No vendor publishes their full system prompt alongside competitor prompts for easy comparison. This community-maintained collection does.
My Take
This repository gives developers direct access to the instructions that define how today's AI coding tools actually behave. If you're choosing between Cursor, Windsurf, Claude Code, Augment Code, Devin, or any of the 25+ other tools listed, reading their system prompts is now the most concrete way to compare them.
For AI startups, the other takeaway is harder to ignore: your prompts are probably not as private as you think, and this repo is the evidence. Whether that changes how you design your system or not, it's worth knowing.
Written by

Paula Hingel
Technical Writer
Paula writes about the patterns that make AI coding agents actually work — spec-driven development, multi-agent orchestration, and the context engineering layer most teams skip. Her guides draw on real build examples and focus on what changes when you move from a single AI assistant to a full agentic codebase.