claude-mem v13.8.0: Persistent Agent Memory Across Sessions

Three things worth knowing

claude-mem v13.8.0 shipped June 21, 2026. It gives coding agents persistent, searchable memory that survives session resets, and the repository has 83.9k stars and 288 releases.
It works as a plugin for Claude Code, Gemini CLI, Codex, OpenCode, OpenClaw, and Copilot, capturing tool usage and automatically injecting compressed context into future sessions.
A three-layer MCP search pattern and token-cost telemetry mean the memory system is built to be measurable, not just convenient.

Most coding agents start every session with no idea what you did yesterday. You explained your auth setup last Tuesday, walked through the data model on Wednesday, and debugged a session store race condition Thursday morning. None of that carries over. You open a new session and start from scratch.

claude-mem fixes that by capturing what your agent does during a session, compressing it with AI, and injecting the relevant pieces into future sessions automatically. It's a plugin, not a replacement: it sits on top of whatever agent you're already using.

The thedotmack/claude-mem GitHub repository showing 83.9k stars, 7.2k forks, and a directory listing including the plugin, cursor-hooks, and ragtime folders.

What Happened

Version 13.8.0 shipped June 21, 2026. The thedotmack/claude-mem repository now holds 83.9k stars, 7.2k forks, 124 contributors, and 288 releases, with the most recent commit co-authored by @claude.

Install with one command: npx claude-mem install. One thing to know before you try: npm install -g claude-mem only pulls the SDK library. It doesn't register the lifecycle hooks or set up the worker service. Use the npx installer or run /plugin marketplace add thedotmack/claude-mem inside Claude Code.

Requires Node.js 20.0.0+. Bun and uv auto-install if missing. The default model is claude-haiku-4-5-20251001.

Key Features

Five lifecycle hooks: SessionStart, UserPromptSubmit, PostToolUse, Stop, and SessionEnd each fire at a fixed point in the session lifecycle and capture data there. PostToolUse is where the real signal is: it records every Read and Edit the agent makes.
Worker service and web viewer: An HTTP API runs at http://localhost:37777 (port uses a per-user formula to avoid collisions), with a web viewer to browse your memory stream in real time.
Three-layer MCP search: search returns a compact index at 50-100 tokens per result; timeline gives chronological context; get_observations fetches full details for filtered IDs at 500-1,000 tokens each. The repo claims roughly 10x token savings over naive injection, and recent telemetry work tracks cost per observation, so you can verify that yourself.
Hybrid search: SQLite FTS5 for keyword queries, Chroma for semantic retrieval. Both run locally.
Privacy control: Wrap content in <private> tags to exclude it from storage. Fully private batches skip the provider call entirely.
Multi-language mode config: Set CLAUDE_MEM_MODE to control workflow behavior and the language of generated observations. Simplified Chinese (code--zh) and Japanese (code--ja) ship built-in.
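To make the hook and privacy behavior concrete, here is a minimal Python sketch. Only the five hook names and the `<private>` tag come from the description above. The `SessionRecorder` class, the buffer-then-flush-on-`Stop` design, and the `provider_calls` counter (a stand-in for the AI compression request) are hypothetical and are not claude-mem's actual code.

```python
import re
from dataclasses import dataclass, field

# The five hook names listed above; everything else here is hypothetical.
HOOK_NAMES = ("SessionStart", "UserPromptSubmit", "PostToolUse", "Stop", "SessionEnd")

# Non-greedy so text between two separate private spans survives; DOTALL spans newlines.
PRIVATE_SPAN = re.compile(r"<private>.*?</private>", re.DOTALL)


@dataclass
class SessionRecorder:
    pending: list[str] = field(default_factory=list)  # tool output buffered this turn
    stored: list[str] = field(default_factory=list)   # what would be persisted
    provider_calls: int = 0                           # stand-in for AI compression requests

    def handle(self, hook: str, payload: str = "") -> None:
        if hook not in HOOK_NAMES:
            raise ValueError(f"unknown hook: {hook}")
        if hook == "PostToolUse":
            # Redact private spans before anything is buffered.
            redacted = PRIVATE_SPAN.sub("", payload).strip()
            if redacted:
                self.pending.append(redacted)
        elif hook in ("Stop", "SessionEnd"):
            self._flush()
        # SessionStart and UserPromptSubmit are accepted but ignored in this sketch.

    def _flush(self) -> None:
        # A batch that was entirely private is empty here, so no provider call happens.
        if self.pending:
            self.provider_calls += 1
            self.stored.extend(self.pending)
            self.pending.clear()


rec = SessionRecorder()
rec.handle("SessionStart")
rec.handle("PostToolUse", "Read src/auth.ts <private>API_KEY=abc123</private>")
rec.handle("Stop")
rec.handle("PostToolUse", "<private>customer export, do not store</private>")
rec.handle("Stop")
print(rec.stored)          # ['Read src/auth.ts']
print(rec.provider_calls)  # 1
```

Redacting before buffering is what lets a fully private batch skip the provider in this sketch: by the time `Stop` fires, there is nothing left to send.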

Why It Matters

Every session you spend re-explaining your architecture is time you're not spending on the actual work. At one session a day, a two-week sprint means roughly ten sessions that each open with the same setup, and the agent time and tokens spent on it produce nothing new.

Token economics are worth paying attention to if your team is monitoring API costs. The three-layer search pattern means the agent first scans a compact index, then fetches full observations only for the relevant results. The recent telemetry additions in the repo track cost per observation, so you can see actual savings rather than take the maintainer's word for it.
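A back-of-the-envelope model shows where the savings come from. This sketch uses the midpoints of the per-result figures quoted earlier (about 75 tokens per index entry, about 750 per full observation) and assumes a hypothetical search with 20 candidate observations, of which the agent fetches 2. The timeline layer is left out because no per-call cost is given for it. These are illustrative inputs, not measured telemetry.

```python
def naive_cost(candidates: int, full_tokens: int) -> int:
    """Inject every candidate observation in full."""
    return candidates * full_tokens


def layered_cost(candidates: int, fetched: int, index_tokens: int, full_tokens: int) -> int:
    """Scan a compact index of every candidate, then fetch full details for a few."""
    return candidates * index_tokens + fetched * full_tokens


# Assumed inputs: midpoints of the quoted ranges and a hypothetical 20-candidate search.
INDEX_TOKENS = 75   # midpoint of 50-100 tokens per index entry
FULL_TOKENS = 750   # midpoint of 500-1,000 tokens per full observation
CANDIDATES, FETCHED = 20, 2

naive = naive_cost(CANDIDATES, FULL_TOKENS)
layered = layered_cost(CANDIDATES, FETCHED, INDEX_TOKENS, FULL_TOKENS)
print(naive, layered, naive / layered)  # 15000 3000 5.0
```

With these inputs the layered approach is 5x cheaper. As the fetched share shrinks, the ratio approaches `FULL_TOKENS / INDEX_TOKENS`, which is 10 at these midpoints; that is in the same range as the repo's roughly 10x claim, though how the repo measures its figure isn't stated here. Fetch most of the candidates in full and the advantage disappears, because the index scan becomes pure overhead. That is why per-observation cost tracking is the number worth watching.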

Apache-2.0 licensing was a deliberate choice. The README explains that it was picked specifically so that memory can be embedded in developer tools, MCP servers, enterprise systems, and production agent harnesses without licensing friction. For teams thinking about building on top of this, that matters.

Example Use Case

You spend a morning debugging a JWT authentication issue in a TypeScript service. The agent reads your auth middleware, edits the token validation logic, and fixes a race condition in the session store. The session ends.

Three days later, you open a new session to add refresh-token rotation. claude-mem injects the relevant context: the agent already knows where your auth code lives, what the prior bug was, how the session store works. You skip straight to the new work. If you want to pull up specifics, run search(query="authentication bug", type="bugfix", limit=10), check the index, then call get_observations(ids=[123, 456]) for the full details on the exact fixes.
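The search, timeline, and fetch flow can be sketched with a toy in-memory store. The function names and the `query`/`type`/`limit`/`ids` parameters mirror the calls above, but the bodies are hypothetical stand-ins (naive substring matching over made-up sample observations), not the real MCP tools. What the sketch does show is the shape of the pattern: layer 1 returns only IDs and short titles, and full text comes back only for IDs the caller explicitly asks for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    id: int
    type: str        # e.g. "bugfix" or "feature"
    title: str       # short summary: the only text layer 1 returns
    detail: str      # full text: returned only by get_observations
    timestamp: int   # used by timeline to order entries


# Hypothetical sample data standing in for a real memory store.
STORE = {
    o.id: o
    for o in [
        Observation(123, "bugfix", "JWT expiry checked in wrong units",
                    "Authentication middleware compared token expiry in milliseconds "
                    "against a value in seconds; both are now normalized to seconds.", 1_000),
        Observation(456, "bugfix", "Session store race condition",
                    "Bug: concurrent logins could overwrite each other's session; "
                    "writes are now serialized per user.", 2_000),
        Observation(789, "feature", "CSV export endpoint",
                    "Added an endpoint that streams query results as CSV.", 3_000),
    ]
}


def search(query: str, type: Optional[str] = None, limit: int = 10) -> list[tuple[int, str]]:
    """Layer 1: a compact index of (id, title) pairs; `type` mirrors the call in the text."""
    terms = query.lower().split()
    hits = [
        o for o in STORE.values()
        if (type is None or o.type == type)
        and any(t in f"{o.title} {o.detail}".lower() for t in terms)
    ]
    return [(o.id, o.title) for o in hits[:limit]]


def timeline(ids: list[int]) -> list[tuple[int, str]]:
    """Layer 2: the chosen entries as (timestamp, title), oldest first."""
    return sorted((STORE[i].timestamp, STORE[i].title) for i in ids if i in STORE)


def get_observations(ids: list[int]) -> list[str]:
    """Layer 3: full details, only for the IDs the caller picked."""
    return [STORE[i].detail for i in ids if i in STORE]


index = search("authentication bug", type="bugfix", limit=10)
picked = [obs_id for obs_id, _title in index]
print(timeline(picked))
details = get_observations(picked)
```

The design choice the sketch isolates is that the expensive `detail` field never leaves the store until a caller names specific IDs, so the agent decides what to pay for after seeing the cheap index.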

I'd walk any team running multi-day Claude Code projects through this flow. One install command, and the payoff shows up the first time you don't have to re-explain something you already fixed.

Competitive Context

Claude Code and Copilot maintain context within a session but don't, by default, persist structured, searchable memory across sessions. claude-mem adds that layer without replacing the underlying agent: it integrates with Claude Code as a plugin and lists Copilot among its supported agents.

The distinction from Claude's context window is durability. A context window resets. claude-mem writes observations to SQLite and a Chroma vector store, so memory survives restarts, reconnects, and new sessions. Claude Code gives you a session; claude-mem gives you a project history.
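Durability here simply means the memory lives in a file rather than in a process. The sketch below writes observations through one SQLite connection, closes it, and reads them back from a fresh connection with an FTS5 keyword query, roughly the keyword half of the hybrid search described earlier. The table name and columns are assumptions (claude-mem's real schema isn't described here), the Chroma semantic side is omitted, and it relies on Python's bundled SQLite having FTS5 compiled in, which is true of most builds.

```python
import os
import sqlite3
import tempfile


def open_memory(path: str) -> sqlite3.Connection:
    """Open (or create) a memory file with a hypothetical FTS5 table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS observations USING fts5(title, detail)"
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "memory.db")

# Session 1: record observations, commit, and close the connection.
first = open_memory(db_path)
with first:  # the connection's context manager commits; it does not close
    first.executemany(
        "INSERT INTO observations (title, detail) VALUES (?, ?)",
        [
            ("Session store race condition",
             "Concurrent logins could overwrite each other's session; writes now serialized per user."),
            ("JWT expiry checked in wrong units",
             "Token expiry was compared in milliseconds against seconds."),
        ],
    )
first.close()

# Session 2: a brand-new connection finds the right observation by keyword.
second = open_memory(db_path)
rows = second.execute(
    "SELECT title FROM observations WHERE observations MATCH ? ORDER BY rank",
    ("session race",),  # FTS5 treats space-separated terms as an implicit AND
).fetchall()
second.close()
print(rows)  # [('Session store race condition',)]
```

Nothing from session 1 survives in memory; the second connection sees the data only because it was committed to the file, which is the property a context window lacks.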

Supporting six agents with one install also sets it apart from memory tools tied to a single platform. Claude Code, Gemini CLI, OpenCode, Codex, OpenClaw, and Copilot all work out of the box.

My Take

83.9k stars is a lot for a tool that solves what looks like a narrow problem. Memory persistence sounds like a nice-to-have until you've spent a week on a project with an agent and tracked how much time it takes to re-establish context.

Shipping 288 releases over what appears to be a fairly short project history points to fast iteration. The telemetry additions in recent commits suggest the maintainer is trying to prove the efficiency claims with actual data, which I'd rather see than assertions.

What I'm still curious about: whether team-shared memory changes how developers coordinate, or whether it mostly reduces the re-explanation tax for individuals. That distinction matters a lot for how you'd want to architect memory at the org level.
