Three things worth knowing
- A single GitHub repo has compiled the internal system prompts of 28+ major AI coding tools, including Cursor, Windsurf, Claude Code, Augment Code, and Devin AI, and it now has 134K stars.
- For the first time, developers can read the actual instructions each tool sends to its model, rather than relying on marketing copy or guesswork. In practice, this changes how you evaluate and build with these tools.
- This also raises a real security question for AI startups: if your prompts are this extractable, how are you protecting them?

There's a GitHub repository I keep coming back to whenever someone asks me how to evaluate AI coding tools. It collects the raw system prompts of nearly every major AI coding assistant, and it just crossed 134,000 stars and 33,700 forks.
Maintained by developer Lucas Valbuena, the repo exposes how tools like Cursor, Windsurf, Claude Code, Augment Code, and Devin AI actually instruct their underlying models. If you've ever wondered why one tool refuses a certain request or formats output differently from another, this is where you go to find out. It's the closest thing to a public teardown of the competition that I've seen, and I think most developers are sleeping on it.
What Happened
The repository system-prompts-and-models-of-ai-tools has been accumulating extracted system prompts since early 2025. As of its latest update on March 28, 2026, it has 489 commits across 28 contributors and covers more than 28 distinct AI tools.
What makes this more useful than most collections of its kind is the depth. The JSON tool schemas are in there too, not just the prompt text, and the schemas are the part that actually shows you what a tool has been given permission to do. That combination is rare, and it's what makes comparison meaningful rather than superficial.
What stood out to me is that it's still being actively updated. Recent commits include changes to v0's prompt (March 8, 2026) and Anthropic's Claude Sonnet 4.6 prompt (March 4, 2026). That cadence matters; a snapshot from a year ago would tell you very little about how these tools behave today.
Key Features
- Full prompt text for 28+ tools. Each tool has its own directory with raw system prompts, sometimes across multiple versions. Cursor includes an "Agent Prompt 2.0," which alone is worth reading if you're building anything agent-based.
- Tool and function call schemas. Several entries go beyond the prompt text to include the JSON definitions of internal tools. Windsurf's entries run through "Tools Wave 11," and Augment Code includes GPT-5 tool definitions. I find the schema files more revealing than the prompts themselves; they show you what capabilities a vendor actually prioritized shipping.
- Version history. With 489 commits, you can diff the prompts to see how they have changed over time. This is where it gets interesting: watching how vendors quietly adjust their instructions tells you a lot about where they're struggling, iterating, or quietly rolling back decisions.
- Full coverage across tool types. IDE agents (Cursor, Windsurf, VSCode Agent, Xcode), autonomous agents (Devin AI, Manus), and app builders (Lovable, Replit, v0) are all represented. That breadth is what makes it a real reference, not just a curiosity.
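To make the schema point concrete, here is a rough sketch of the shape those tool-definition JSON files tend to take, written as a TypeScript type plus one hypothetical entry. The field names follow the JSON Schema convention most providers use; the `edit_file` tool itself is invented for illustration and not taken from any specific entry in the repo.

```typescript
// Sketch of the shape a function-call/tool schema typically takes.
// Field names mirror the common JSON Schema convention; the example
// tool below is hypothetical, not copied from any vendor's files.
interface ToolDefinition {
  name: string;        // identifier the model emits when calling the tool
  description: string; // tells the model when the tool is appropriate
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Hypothetical example: a file-editing tool an IDE agent might expose.
const editFile: ToolDefinition = {
  name: "edit_file",
  description: "Apply a text replacement to a file in the workspace.",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Workspace-relative file path" },
      oldText: { type: "string", description: "Exact text to replace" },
      newText: { type: "string", description: "Replacement text" },
    },
    required: ["path", "oldText", "newText"],
  },
};

console.log(editFile.name); // "edit_file"
```

Reading the real schemas with this shape in mind makes it much easier to spot what a vendor prioritized: how many tools are exposed, how tightly their parameters are constrained, and how much the descriptions coach the model.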
Why It Matters
System prompts are the hidden layer that actually shapes how an AI coding tool behaves. They define what the model prioritizes, what it refuses, how it formats output, and which tools it can call. Before repos like this surfaced, there was no real way to compare that across products. You were just taking vendors at their word.
The real shift here is that tool evaluation no longer has to be purely experiential. You don't have to spend weeks with each product to form a view; you can read what each tool actually sends to its model and make a more informed call before you commit.
- You can see whether a tool defaults to cautious refusals or aggressive code generation, whether it has file system access, and how it handles multi-step tasks.
- For anyone building AI-powered developer tools, these prompts are a reference architecture. Real patterns for tool-use schemas, context management, and agent behavior that have already shipped to millions of users.
- The version history layer makes it a living record, not a static snapshot, which means it gets more useful over time, not less.
There's also a prompt-security angle I'm seeing come up more often in conversations. The repo maintainer flags that exposed prompts can become attack surfaces and links to a service called ZeroLeaks that aims to identify extraction risks. A year ago, this felt theoretical. I don't think it does anymore, and if you're an AI startup that hasn't thought about prompt extraction, this repo is a good reminder to start.
Example Use Case
Say you're building a VS Code extension that uses Claude to assist with TypeScript refactoring. You need to decide how to structure your system prompt: what context to include, how to handle file references, and what tool calls to expose. This is exactly the kind of problem I'd point someone to this repo for.
Instead of starting from scratch, open the Cursor Prompts and VSCode Agent directories side by side. See how Cursor's Agent Prompt 2.0 structures tool definitions versus how the VSCode Agent approaches the same problem. Borrow the patterns that fit, skip the ones you've already seen cause issues in tools you've tested yourself.
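As a starting point, prompt assembly for that extension might look something like the sketch below. Everything here, the wording, the file-delimiting convention, the diff-output instruction, is an assumption for illustration; the repo's Cursor and VSCode Agent directories show what the production versions actually do.

```typescript
// Minimal sketch of assembling a system prompt for a TypeScript
// refactoring assistant. Wording and structure are assumptions for
// illustration, not copied from any tool's actual prompt files.
interface FileContext {
  path: string;
  contents: string;
}

function buildSystemPrompt(files: FileContext[]): string {
  const header = [
    "You are a TypeScript refactoring assistant running inside a VS Code extension.",
    "Only propose edits to the files provided below.",
    "Output each edit as a unified diff, one diff per file.",
  ].join("\n");

  // A pattern that recurs across extracted prompts: wrap each file in a
  // clearly delimited block so the model can cite paths unambiguously.
  const context = files
    .map((f) => `<file path="${f.path}">\n${f.contents}\n</file>`)
    .join("\n\n");

  return `${header}\n\n${context}`;
}

const prompt = buildSystemPrompt([
  { path: "src/math.ts", contents: "export const add = (a: number, b: number) => a + b;" },
]);
console.log(prompt.includes('<file path="src/math.ts">')); // true
```

The interesting comparisons come after a skeleton like this exists: how much context the shipped tools include per file, whether they truncate large files, and how they instruct the model to reference line ranges.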

What this makes easier is skipping the trial-and-error phase that eats up most of the time in early prompt engineering. You're not guessing at best practices, you're reading what's already working at scale.
Competitive Context
Reading these prompts side by side, a few things stand out that don't show up in any product comparison chart.
Cursor and Windsurf both ship with detailed agent prompts and extensive tool schemas. Windsurf's collection, running through 11 "waves" of tool definitions, suggests a team that has been aggressively iterating on its function-calling surface, likely in response to real failure modes they were seeing in production. Cursor's second-generation agent prompt points to a meaningful rewrite, not just incremental tuning. That kind of structural change usually signals a core UX problem they were trying to fix.
Augment Code's inclusion of a GPT-5 tools JSON file is the kind of detail I wouldn't have caught without this repo. It confirms multi-model provider support in a way the product page doesn't surface directly, which likely means more flexibility in how Augment routes tasks based on model availability or cost.
Devin AI's DeepWiki prompt points to a knowledge-retrieval layer beyond basic code generation. This suggests Devin is investing more in structured context than in raw generation, which makes sense for an autonomous agent that needs to reason across large codebases, not just complete single-file tasks.
Claude shows up both as a standalone model (with Sonnet 4.6 entries) and as the backend for several other tools in the collection. That's consistent with what I'm seeing in the broader ecosystem. Claude is increasingly the default inference layer that other products build on, rather than a standalone product.
My Take
If you're not reading these prompts, you're missing the clearest available window into how these tools actually work. Marketing pages tell you what a tool can do. System prompts tell you how it's been instructed to do it, and those are very different things.
For developers choosing among Cursor, Windsurf, Augment Code, or Devin, this repo offers a signal that no feature comparison chart can match. For teams building AI dev tools, it's the reference library I wish existed when I was starting out. And for AI startups still treating their prompts as proprietary secrets: they're probably not as protected as you think, and this repo is evidence of that.
Curious how Augment Code's prompts compare? See for yourself.
Free tier available · VS Code extension · Takes 2 minutes
Written by

Ani Galstian
Developer Evangelist