Graphify hits 63.2K stars: turning codebases into queryable knowledge graphs

Three things worth knowing

Graphify is a YC-backed open-source tool that converts entire codebases into queryable knowledge graphs, now at 63.2K stars and 6.5K forks.
It works across 20+ AI coding assistant platforms, parses code locally via tree-sitter with zero API calls, and stores results as three files your whole team can query.
v0.8.35 added Streamable HTTP transport so teams can share a single graph server, live PostgreSQL introspection, and CodeBuddy support as a new platform.

Ask any AI coding assistant how a specific service connects to your database across a large codebase. It greps through files, misses relationships that aren't obvious from file contents, and gives you a partial answer.

The problem is structural. AI coding assistants read files. They don't build maps of how concepts relate across your project: functions, classes, database tables, API handlers, Terraform resources. All of that context gets reconstructed from scratch on every query.

Graphify pre-computes that map. The YC S26-backed tool converts code, docs, PDFs, images, and videos into a typed knowledge graph that your AI assistant can query instead of re-reading files. At 63.2K stars, 6.5K forks, and 130 releases, it's the most widely adopted solution to this problem I've seen in the open-source space.

Screenshot: the safishamsi/graphify GitHub repository, showing 63.2K stars, 6.5K forks, a directory listing that includes the graphify, docs, and tests folders, and the latest release, v0.8.35, in the sidebar.

What Happened

Developer Safi Shamsi released Graphify v0.8.35 on June 7, 2026. The project has 77 contributors and ships roughly one release per day.

The recent commit wave is worth noting: .slnx solution file support for Visual Studio 2022, a graphify-mcp console script entry point, CodeBuddy platform support, Streamable HTTP transport for the MCP server, and live PostgreSQL schema introspection via --postgres. Ghost duplicate auto-merge at build time also landed in v0.8.33, quietly fixing one of the more common pain points for large repos.

That cadence tells me this is a project with real production usage behind it. Fixes landing daily at this rate reflect genuine developer feedback rather than speculative feature development.

Key Features

Local-first code extraction: All 28 tree-sitter language grammars run locally. No API calls for code files. A code-only corpus requires no API key at all and runs fully offline.
Multi-format ingestion: Handles Python, TypeScript, Go, Rust, Java, SQL, Terraform HCL, Salesforce Apex, PDFs, images, video/audio via faster-whisper, Google Workspace files, and live PostgreSQL schema introspection via --postgres.
20+ AI assistant integrations: Ships platform-specific install commands for Claude Code, Cursor, Codex, Gemini CLI, Kilo Code, Aider, Amp, Kiro, Devin CLI, Copilot CLI, CodeBuddy, and others. Each gets a tailored skill file and, where supported, PreToolUse hooks that redirect file reads toward graph queries.
Graph queries from the terminal: graphify query "what connects auth to the database?" and graphify path "UserService" "DatabasePool" work directly. A Streamable HTTP MCP server now serves the same tools over the network, so a single shared process can cover the whole team.
Team workflow with git merge driver: Commit graphify-out/ to your repo. A post-commit hook rebuilds the AST graph after each commit. A custom git merge driver union-merges graph.json so parallel commits never produce conflict markers.
PR impact analysis: graphify prs --triage ranks your review queue by graph impact. graphify prs --conflicts flags PRs touching overlapping graph communities, surfacing merge-order risk.
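
To make the union-merge idea from the team workflow concrete, here is a minimal sketch of merging two versions of a graph file without conflicts. The schema is an assumption for illustration (a "nodes" list keyed by "id" and an "edges" list keyed by source, target, and relation); it is not taken from Graphify's actual graph.json format, and union_merge is a hypothetical helper, not part of Graphify.

```python
def union_merge(ours: dict, theirs: dict) -> dict:
    """Union two graph dicts so neither side's additions are lost.

    Assumed (hypothetical) schema: {"nodes": [{"id": ...}], "edges":
    [{"source": ..., "target": ..., "relation": ...}]}.
    """
    # Nodes are identified by id; when both sides have one, keep ours.
    nodes = {n["id"]: n for n in ours.get("nodes", [])}
    for node in theirs.get("nodes", []):
        nodes.setdefault(node["id"], node)

    # Edges are identified by (source, target, relation).
    edges = {}
    for edge in ours.get("edges", []) + theirs.get("edges", []):
        key = (edge["source"], edge["target"], edge["relation"])
        edges.setdefault(key, edge)

    # Sort the output so the merged file is stable across merge order.
    return {
        "nodes": sorted(nodes.values(), key=lambda n: n["id"]),
        "edges": [edges[key] for key in sorted(edges)],
    }
```

Because a set union never has to choose between two sides, two branches that each add nodes and edges merge cleanly. The sketch only covers additions; how deletions or changed attributes on the same node are reconciled is not described in the source, so it is left out here.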

Why It Matters

Large codebases spread context across hundreds of files, schemas, config, and docs. AI assistants typically handle this by reading files one at a time or using embedding-based search. Graphify builds a persistent, structured graph with typed edges (imports, calls, contains) and confidence tags (EXTRACTED, INFERRED, AMBIGUOUS). The assistant can answer structural questions like "what depends on this service" without scanning the full repo each time.
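
A small sketch shows why typed edges plus confidence tags make "what depends on this service" a cheap lookup. The edge types and tag names come from the description above; the Edge class, the dependents function, and the sample data are hypothetical and do not reflect Graphify's internal model or API.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    EXTRACTED = "EXTRACTED"   # read directly from the code
    INFERRED = "INFERRED"     # guessed from indirect evidence
    AMBIGUOUS = "AMBIGUOUS"   # more than one plausible target


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # "imports", "calls", or "contains"
    confidence: Confidence


# Hypothetical sample graph, for illustration only.
EDGES = [
    Edge("api/routes.py", "UserService", "imports", Confidence.EXTRACTED),
    Edge("BillingJob", "UserService", "calls", Confidence.INFERRED),
    Edge("services/", "UserService", "contains", Confidence.EXTRACTED),
    Edge("UserService", "DatabasePool", "calls", Confidence.EXTRACTED),
]


def dependents(edges, target, allowed=frozenset({Confidence.EXTRACTED})):
    """Return nodes that import or call `target`, filtered by confidence."""
    return sorted({
        e.source
        for e in edges
        if e.target == target
        and e.relation in {"imports", "calls"}
        and e.confidence in allowed
    })
```

Calling dependents(EDGES, "UserService") returns only the extracted importer, while passing allowed={Confidence.EXTRACTED, Confidence.INFERRED} also includes the inferred caller. That filter is the practical value of the tags: the caller decides how much guesswork to accept.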

The confidence tagging is the part I find most useful in practice. Every relationship carries a tag, so developers know what was extracted directly from code versus what was inferred. That changes how much you trust the assistant's structural answers.

For teams, the shared graph means every developer's assistant starts with the same map. New team members get architecture context on day one. Code-only rebuilds cost zero API calls, which removes a real barrier for large repos running this in CI.

Example Use Case

A team maintains a Python and TypeScript monorepo with a PostgreSQL database and Terraform infrastructure. They run graphify extract --postgres "postgresql://user:pass@host/db" to pull the live schema, then /graphify . inside Claude Code. The resulting graph connects Python service classes to database tables to Terraform resource definitions in a single queryable structure.

When a developer asks "what breaks if I rename the users table," the assistant traces edges from the table node through SQLAlchemy models, API route handlers, and the Terraform RDS configuration, all without reading each file individually.
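
The trace described above amounts to a breadth-first walk over reverse dependency edges. The sketch below assumes a simplified edge list of (dependent, dependency) pairs with made-up node names; it illustrates the traversal only and is not how Graphify stores or queries its graph. Terraform nodes would be reached the same way, provided the graph links them to the table.

```python
from collections import defaultdict, deque

# Hypothetical (dependent, dependency) pairs, for illustration only.
DEPENDS_ON = [
    ("model:User", "table:users"),
    ("route:GET /users/{id}", "model:User"),
    ("route:POST /users", "model:User"),
    ("model:Order", "table:orders"),
]


def impacted_by(edges, start):
    """Return every node that transitively depends on `start`, in BFS order."""
    # Index edges by dependency so each lookup is a dict access.
    reverse = defaultdict(list)
    for dependent, dependency in edges:
        reverse[dependency].append(dependent)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in reverse[node]:
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order
```

With this data, impacted_by(DEPENDS_ON, "table:users") yields the User model first and then the two routes that use it, while the orders table and its model are never visited. The walk touches only graph edges, which is why no source file needs to be opened.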

This is the query I'd run with a team that's been spending half a sprint tracing dependencies manually before touching anything in the data layer.

Competitive Context

Cursor users get Graphify via .cursor/rules/graphify.mdc with alwaysApply: true, so every conversation automatically includes graph context. Claude Code gets deeper integration: PreToolUse hooks fire before file reads and search commands, redirecting Claude toward graphify query instead of reading raw files one by one. Claude Code also supports parallel subagent dispatch for extraction, while platforms like Aider and OpenClaw still run sequential extraction.

Cursor's built-in codebase indexing and Claude Code's native file access are retrieval mechanisms: they find and read files. Graphify builds typed relationships with explicit edges, community detection, and confidence tagging. The two aren't competing directly; Graphify adds a structural layer on top of whatever retrieval your existing tools provide.

For teams on large, multi-language projects, the graph surfaces connections that file-level search misses. For teams where the codebase spans a database schema and infrastructure config alongside application code, that structural layer is where Graphify earns its keep.

My Take

Graphify converts codebases, schemas, docs, and media into a single knowledge graph that any major AI coding assistant can query. If your repo spans multiple languages, a database schema, and infrastructure config, the setup investment is worth it.

The 1.2M PyPI downloads and daily release cadence tell me this isn't experimental. The Streamable HTTP transport in v0.8.35 is the feature I'd watch for enterprise teams: one shared graph server for the whole team removes a real per-developer setup burden.
