Long-term memory for AI coding agents. Quit Claude Code mid-task, start OpenAI Codex in the same directory, continue without re-explaining the architecture, the failed approaches, or the open questions.
What it is
LLM coding agents lose all context when a session ends. ai-memory gives them a shared, persistent wiki: every prompt, tool call, and decision is captured automatically; when a session ends, the relevant pages get rewritten as a coherent narrative; when the next agent starts (Claude Code, Codex, OpenCode, …) it sees a handoff with "where you left off" already prepended.
The wiki is plain markdown in a git repo - grep-able, openable in
Obsidian, backed up with rsync. No vector database to babysit, no
write_note ceremony, no manual context-loading. The full design is
in docs/ARCHITECTURE.md; the influences and
priors are at the bottom.
Key features
- Zero-friction capture. Lifecycle hooks fire-and-forget every prompt + tool call + session boundary. You never type `write_note`.
- Cross-agent handoffs. Quit Claude Code mid-task, start Codex in the same directory hours later - the next agent sees a "where you left off" block before its first prompt.
- Per-project isolation by construction. Each project lives at `<wiki_root>/<workspace_id>/<project_id>/…`, keyed by stable UUIDs. By default `workspace = "default"` and `project = basename($cwd)`. Drop a `.ai-memory.toml` marker file in any ancestor directory to override either or opt into repo-root project identity - perfect for multi-client consultancies, work/personal split, mono-repos, or linked git worktrees. Same page path can exist in two projects without collision; a rename is one column update; a purge is one `rm -rf`.
- Karpathy-style LLM wiki. Pages are compiled from observations at session-end (or PreCompact), not retrieved over raw logs. Supersession chain + git-versioned markdown means you can time-travel with `git log`.
- Built-in `/web` browser. Read-only HTML UI for the wiki - project list, folder tree, FTS5 search, markdown rendering, dark mode. Mounted on the same axum server as MCP.
- Multi-agent + multi-machine ready. Supported clients: Claude Code, Codex, OpenCode, Cursor, Claude Desktop (via `mcp-remote`), Gemini CLI, Antigravity CLI, OpenClaw, and Oh My Pi / OMP (`pi`/`omp` aliases). Server runs local (loopback) OR on a homelab box (LAN/VPN/cloud) with bearer-token auth.
- Thin-client CLI. `ai-memory status`, `bootstrap`, `purge-project`, `rename-project`, `lint`, `embed`, `forget-sweep`, `backup` are all HTTP clients of the running server - never touch SQLite or wiki files directly. `status` also reports passive LLM/embedding provider health from the last real provider call. Server is the single source of truth.
- LLM is opt-in. Zero-LLM mode still gives you FTS5 search + rule-based summarisation. Add a provider when you want consolidated pages and lint contradictions.
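The default identity rule and the marker-file lookup from the isolation bullet above can be sketched in shell. This only illustrates the lookup order and is not ai-memory's actual implementation: `default_project` and `find_marker` are hypothetical names, and the sketch deliberately does not parse the marker file (its keys are covered in docs/marker-file.md).

```shell
# Hypothetical helpers illustrating the routing rule; not part of ai-memory.

# default_project DIR: the default project name is the basename of the
# working directory.
default_project() {
  basename "$1"
}

# find_marker DIR: walk from DIR up to / and print the path of the first
# .ai-memory.toml found. Prints nothing if no ancestor has one.
find_marker() (
  dir=$(cd "$1" && pwd) || exit 1
  while :; do
    if [ -f "$dir/.ai-memory.toml" ]; then
      printf '%s\n' "$dir/.ai-memory.toml"
      exit 0
    fi
    [ "$dir" = "/" ] && exit 0
    dir=$(dirname "$dir")
  done
)

# Demo in a throwaway tree: a marker in clients/acme covers clients/acme/api.
demo=$(cd "$(mktemp -d)" && pwd)
mkdir -p "$demo/clients/acme/api"
: > "$demo/clients/acme/.ai-memory.toml"

default_project "$demo/clients/acme/api"   # prints: api
find_marker "$demo/clients/acme/api"       # prints the clients/acme marker path
rm -rf "$demo"
```

The nearest marker wins in this sketch because the walk stops at the first hit; whether ai-memory merges several markers along the path is not stated here.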
Use cases
- "Quit at 4 PM, pick up at 9 AM in a different agent." The classic. SessionStart hook in the next agent (any of the supported CLIs) prepends a typed handoff with the open questions, next steps, and a session summary.
- "What did we decide about X six weeks ago?" Type `memory_query X` from the agent (or `ai-memory search X` from a terminal) - FTS5 over the wiki. Pages are LLM-consolidated, so the hit is a coherent decision page, not a raw chat log.
- "Remember this permanently." When something is worth keeping beyond auto-captured session logs - a decision, a convention, a gotcha - tell the agent "save a permanent note that we standardised on Postgres for X" or "annotate this as a project rule" and it calls `memory_write_page` to write a durable, git-versioned wiki page. From a terminal it's `ai-memory write-page --path decisions/0007-db.md --title "Standardised on Postgres" --body "..." --pinned` (`--pinned` exempts it from the decay sweep). Unlike a handoff (single-use) or an auto-synthesised session page (rewritten on consolidation), a write-page note is yours: it shows up in `memory_query`, renders in `/web`, and stays until you change it.
- "This new project has months of history before ai-memory." `cd /path/to/my-project && ai-memory bootstrap` collects `git log`, README, `docs/`, module headers, project rules and one-shot-summarises them into seed wiki pages. Future sessions build on top.
- "Run one ai-memory for the whole household." Stand the server up on a homelab box at `0.0.0.0:49374` with a bearer token; every laptop/desktop talks to it. Per-cwd routing keeps each project's pages cleanly separated; the `/web` UI is reachable from a browser anywhere on the LAN.
- "Audit what landed before sharing with a teammate." Browse the wiki at `http://<server>:49374/web` - HTTP Basic dialog if auth is on, paste the token as password. Per-project tree view, rendered markdown, supersession chain visible per page.
- "Drop an experiment, keep the rest." `ai-memory purge-project --project experimental --confirm`. Atomic: that project's DB rows cascade away, its wiki subdir gets `rm -rf`'d, every sibling project is untouched by construction.
Quick start
Arch Linux (AUR)
For native Arch installs, use the AUR packages. They install
/usr/bin/ai-memory, packaged hook sources, and both system-level and
user-level systemd units.
yay -S ai-memory-bin # prebuilt Linux x86_64/aarch64 binary
yay -S ai-memory # builds from source
Single-user workstation:
mkdir -p ~/.config/ai-memory ~/.local/share/ai-memory
ai-memory --data-dir ~/.local/share/ai-memory \
--config ~/.config/ai-memory/config.toml init
systemctl --user enable --now ai-memory.service
ai-memory install-mcp --client claude-code --apply
ai-memory install-hooks --agent claude-code --apply
System service installs use /var/lib/ai-memory and /etc/ai-memory/ via the
packaged unit. Full user-service, system-service, auth, and provider setup is in
docs/install.md#arch-linux-native-packages-aur.
Docker
You need: Docker + an agent CLI (Claude Code, Codex, OpenCode, OMP, Cursor, Antigravity CLI, or anything else that speaks MCP).
The published Docker image includes linux/amd64 and linux/arm64 variants,
so Apple Silicon Macs and ARM64 Linux hosts can pull akitaonrails/ai-memory
without --platform linux/amd64 emulation.
The default quick-start has no authentication - the server binds to loopback only, so on a single-user laptop nothing else can reach it. Adding a bearer token is a one-line change once you're ready to expose the server on the LAN; see Security below.
# 1. Install the ai-memory wrapper (it
# runs the binary inside docker with your $HOME mounted). This is
# the only thing that needs to live on the host filesystem.
mkdir -p ~/.local/bin
curl -fsSL https://raw.githubusercontent.com/akitaonrails/ai-memory/main/bin/ai-memory \
-o ~/.local/bin/ai-memory
chmod +x ~/.local/bin/ai-memory
# Most distros put ~/.local/bin on PATH automatically. If `which
# ai-memory` comes up empty, add this to ~/.bashrc / ~/.zshrc:
# export PATH="$HOME/.local/bin:$PATH"
# 2. Start the server. `--restart unless-stopped` makes it come back
# on docker daemon restart and on machine boot (provided your
# docker service is enabled at boot — `sudo systemctl enable
# docker` on most distros). Loopback-only bind (`127.0.0.1:49374`)
# so nothing outside this machine can reach it. Omit the LLM /
# EMBEDDING lines for zero-LLM mode — FTS5 search still works
# without any keys.
docker run -d --name ai-memory \
--restart unless-stopped \
-p 127.0.0.1:49374:49374 \
-v ai-memory-data:/data \
-e AI_MEMORY_LLM_PROVIDER=anthropic \
-e ANTHROPIC_API_KEY=sk-ant-... \
-e AI_MEMORY_EMBEDDING_PROVIDER=openai \
-e OPENAI_API_KEY=sk-... \
akitaonrails/ai-memory:latest
# 3. Wire your agent CLI in two commands. The wrapper takes care of
# mounts + auto-detecting ~/.claude/settings.json. Re-run with
# `--agent codex`, `--agent opencode`, `--agent gemini-cli`,
# `--agent omp`/`pi`, `--client cursor`, `--client gemini-cli`, etc.
# for additional agents; full list in docs/install.md.
ai-memory install-mcp --client claude-code --apply
ai-memory install-hooks --agent claude-code --apply
On Linux/macOS, that's it. Start a Claude Code session as usual - every prompt and tool call now lands in ai-memory, and the next session you open in this project will see a handoff with where you left off.
The install-mcp / install-hooks commands use
AI_MEMORY_SERVER_URL / AI_MEMORY_AUTH_TOKEN when set; otherwise
they default to http://127.0.0.1:49374 (matching the server above)
and no bearer token. If hooks are installed after an ai-memory MCP
entry already exists, install-hooks reuses that endpoint so a remote
MCP setup cannot silently regenerate loopback-only hooks. Both commands
are idempotent - re-runs replace ai-memory's entry, preserve every
other server / hook you have configured, and write a timestamped
.bak-<ts> next to the file before each modifying write. The hook
scripts are staged into ~/.local/share/ai-memory/hooks/<agent>/
automatically; re-running overwrites them so future image updates ship
updated hooks. Drop --apply to print the snippet instead of mutating.
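The backup-before-write behaviour described above follows a common pattern, sketched here in shell. This is not the installer's code: `backup_then_write` is a hypothetical helper, the epoch-seconds `<ts>` format is an assumption, skipping the backup when nothing changes is an assumption, and the real commands merge ai-memory's entry into the existing config rather than replacing the whole file.

```shell
# backup_then_write FILE NEW_CONTENT
# Hypothetical helper: copy FILE to FILE.bak-<ts> before replacing it, and
# do nothing when the content is already identical (idempotent re-run).
backup_then_write() (
  file=$1
  new=$2
  if [ -f "$file" ]; then
    # Identical content: nothing to modify, so no backup either.
    [ "$(cat "$file")" = "$new" ] && exit 0
    cp "$file" "$file.bak-$(date +%s)"   # <ts> format is an assumption
  fi
  printf '%s\n' "$new" > "$file"
)

cfg=$(mktemp)
printf '%s\n' 'old entry' > "$cfg"
backup_then_write "$cfg" 'new entry'   # modifying write: one backup created
backup_then_write "$cfg" 'new entry'   # re-run: no change, no second backup
ls "$cfg".bak-*                        # lists exactly one backup file
rm -f "$cfg" "$cfg".bak-*
```

The backup holds the pre-change content, so a bad install can be undone by copying the `.bak-<ts>` file back over the config.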
The Docker wrapper also bridges thin-client commands such as
ai-memory status and ai-memory bootstrap back to the host's
loopback server. With the local Docker quick start above, no
AI_MEMORY_SERVER_URL override is needed.
To remove ai-memory later, run ai-memory uninstall --apply from the
same host environment. It removes ai-memory-owned config entries and
generated plugin files only after matching their ai-memory signatures;
use --mcp-url if you installed MCP with a custom endpoint, and
--mcp-name only when you need to narrow removal to one matching entry.
Install Notes
- Windows: use the Linux path inside WSL2, or the native Windows wrapper from PowerShell/cmd. Native Claude Code uses Git Bash `.sh` hooks; other script-hook agents use PowerShell defaults. Do not mix path worlds. See `docs/windows.md`.
- Docker compose: `docker compose -f docker/docker-compose.yml up -d` is supported; agent setup is the same as step 3 above.
- Remote server: set `AI_MEMORY_SERVER_URL=http://<server-ip>:49374` and `AI_MEMORY_AUTH_TOKEN=<token>` on the client before installing MCP/hooks. Explicit `--server-url` flags still work, but are no longer required when the env vars are set. Any non-loopback server should use bearer auth.
- Upgrades: run `ai-memory upgrade` to refresh the wrapper, image, and staged hook scripts. Redeploy remote servers separately.
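For the remote-server note above, a client-side setup might look like the following sketch. The address and token are placeholders (192.0.2.10 is a documentation-only address), and the `command -v` guard is an addition of this example so the snippet is harmless on a machine without the wrapper.

```shell
# Placeholder values: substitute your server's IP and your real token.
export AI_MEMORY_SERVER_URL="http://192.0.2.10:49374"
export AI_MEMORY_AUTH_TOKEN="replace-with-your-token"

# With both variables set, no --server-url / --auth-token flags are needed.
# The guard only skips the install when the wrapper is not on PATH.
if command -v ai-memory >/dev/null 2>&1; then
  ai-memory install-mcp --client claude-code --apply
  ai-memory install-hooks --agent claude-code --apply
else
  echo "ai-memory wrapper not found on PATH; see Quick start" >&2
fi
```

Putting the two `export` lines in your shell profile keeps later thin-client commands pointed at the same server.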
For Codex, OpenCode, OMP, Cursor, Claude Desktop, Gemini CLI, Antigravity CLI,
OpenClaw, curl-based hook installs, source builds, CLI env vars, and the full
subcommand reference, see docs/install.md.
Security
Loopback-only (127.0.0.1:49374) with no auth is the default because
it is safe for a single-user laptop: no process outside the machine can
reach the server.
Enable bearer auth when the server is exposed beyond loopback, when untrusted local processes share the machine, or when the data dir holds sensitive project history:
TOKEN=$(ai-memory generate-auth-token)
docker run -d --name ai-memory \
--restart unless-stopped \
-p 0.0.0.0:49374:49374 \
-v ai-memory-data:/data \
-e AI_MEMORY_AUTH_TOKEN="$TOKEN" \
-e AI_MEMORY_ALLOWED_HOSTS="<server-ip>,localhost,127.0.0.1" \
akitaonrails/ai-memory:latest
ai-memory install-mcp --client claude-code --apply \
--server-url "http://<server-ip>:49374/mcp" --auth-token "$TOKEN"
ai-memory install-hooks --agent claude-code --apply \
--server-url "http://<server-ip>:49374" --auth-token "$TOKEN"
Bearer auth protects /mcp, /hook, /handoff, /admin/*, and
/web/*. Browser access to /web uses HTTP Basic auth with the token
as the password. Non-loopback binds should also set
AI_MEMORY_ALLOWED_HOSTS to guard against DNS rebinding.
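The two ways the same token travels, as a bearer header for API-style clients and as the password half of HTTP Basic for browsers, can be illustrated with a short sketch. `bearer_header` and `basic_header` are hypothetical helpers, and the Basic username used here (`user`) is an assumption: the text above only says the token is the password.

```shell
# bearer_header TOKEN: header line for API-style clients (/mcp, /hook, ...).
bearer_header() {
  printf 'Authorization: Bearer %s\n' "$1"
}

# basic_header USER TOKEN: header line a browser sends for /web, with the
# token as the password. The value is base64 of "USER:TOKEN" (HTTP Basic).
basic_header() {
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

bearer_header "secret"        # prints: Authorization: Bearer secret
basic_header "user" "secret"  # prints: Authorization: Basic dXNlcjpzZWNyZXQ=
```

Base64 is an encoding, not encryption, so on anything beyond loopback either header exposes the token unless TLS sits in front of the server.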
Want HTTPS? ai-memory deliberately does not terminate TLS itself —
the right answer is a battle-tested reverse proxy in front of it.
docs/https-via-proxy.md is the deployment
guide, with copy-paste docker compose templates in
docker/compose.tls.caddy.yml (Caddy
with Let's Encrypt or internal CA) and
docker/compose.tls.cloudflared.yml
(Cloudflare Tunnel — no open ports). Both are recommended once you
turn on multi-user or bind beyond loopback. The Quick Start happy
path of single-user on loopback doesn't need TLS — that case is
called out explicitly in the guide so you don't add ceremony where
it doesn't earn its keep.
Multi-user attribution (v0.8, optional). When more than one human
shares a server, ai-memory can attribute each write to a named user.
The bearer token continues to authenticate at the wire level; users
created via ai-memory user add get their own tokens that resolve to
their identity in audit logs (and, in subsequent milestones, page
frontmatter + the web UI). Data stays single-tenant — there is no
RBAC. Existing single-user installs are not affected unless you opt
in by setting [auth].token_pepper (auto-generated for new installs
by ai-memory init). See docs/users.md for the
full walkthrough and the four-rung auth ladder.
See docs/deploy.md for the full homelab pattern
with bearer auth, host allowlisting, and TLS/reverse-proxy options.
Using Memory
Day to day, you mostly do not think about ai-memory. Lifecycle hooks capture prompts, tool calls, compaction checkpoints, and session boundaries. SessionStart hooks fetch pending handoffs before your first prompt in the next agent.
Useful entry points:
- Ask "where did we leave off?" to continue from the pending handoff.
- Ask "have we discussed X?" or "search memory for Y" to query the wiki.
- Ask "catch me up" for a prose digest of recent project activity.
- Run `ai-memory bootstrap` once when adopting ai-memory in an existing project with months of history.
- Start the server with `--enable-web` and visit `/web` for a read-only browser view of the markdown wiki.
- `--enable-web` also mounts a read-only JSON frontend API at `/api/v1` (workspaces, projects, pages, recent, briefing, search) so custom web UIs can read the memory without opening SQLite or wiki files directly:

      GET  /api/v1/workspaces
      GET  /api/v1/projects?workspace=...
      GET  /api/v1/workspaces/{workspace}/projects/{project}/pages
      GET  /api/v1/workspaces/{workspace}/projects/{project}/pages/{path}
      GET  /api/v1/workspaces/{workspace}/projects/{project}/recent?limit=...
      GET  /api/v1/workspaces/{workspace}/projects/{project}/briefing?limit=...
      GET  /api/v1/workspaces/{workspace}/overview?limit=...
      GET  /api/v1/workspaces/{workspace}/projects/{project}/overview?limit=...
      GET  /api/v1/search?q=...&workspace=...&project=...&limit=...
      POST /api/v1/search   { "q": "...", "scopes": [{ "workspace": "...", "project": "..." }] }

  `overview` bundles the open handoff + briefing + memory-health for a workspace or project in one call (the data a project overview screen needs).

  Full integration guide: see `docs/frontend-api.md` for auth setup, response schemas, error model, limits/pagination, custom-UI hosting, a worked `fetch`/`curl` example, and the canonical source-of-truth files. Read that first if you're building a frontend.
- To serve your own static frontend instead of the built-in UI, point `--web-ui-dir` at the frontend's build output (same-origin with `/api/v1`, `/mcp`, `/admin/*`, so the existing auth applies):

      ai-memory serve --transport http --bind 127.0.0.1:49374 \
        --enable-web --web-ui-dir ../ai-memory-ui/dist

  A reference implementation - a SolidJS knowledge browser with screenshots and e2e tests - lives at djalmajr/ai-memory-ui.
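As a small illustration of the `/api/v1` endpoints listed above, the sketch below builds two of the URLs. `pages_url` and `search_url` are hypothetical helpers, `default` and `my-project` are placeholder values, and whether `{workspace}` and `{project}` take names or IDs is something to confirm in docs/frontend-api.md.

```shell
# pages_url BASE WORKSPACE PROJECT: list-pages endpoint for one project.
pages_url() {
  printf '%s/api/v1/workspaces/%s/projects/%s/pages\n' "$1" "$2" "$3"
}

# search_url BASE QUERY WORKSPACE PROJECT LIMIT: GET search endpoint.
# Only spaces are percent-encoded here; a real client should encode fully.
search_url() (
  q=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf '%s/api/v1/search?q=%s&workspace=%s&project=%s&limit=%s\n' \
    "$1" "$q" "$3" "$4" "$5"
)

base="http://127.0.0.1:49374"   # the Quick start loopback address
pages_url "$base" default my-project
# prints: http://127.0.0.1:49374/api/v1/workspaces/default/projects/my-project/pages
search_url "$base" "postgres decision" default my-project 5
# prints: http://127.0.0.1:49374/api/v1/search?q=postgres%20decision&workspace=default&project=my-project&limit=5
```

Against a running server started with `--enable-web`, a URL can then be fetched with `curl -fsS "$(pages_url "$base" default my-project)"`, adding `-H "Authorization: Bearer $AI_MEMORY_AUTH_TOKEN"` when auth is on (assuming the API accepts the same bearer token as the other routes).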
Install the routing snippet once so agents proactively call the right MCP tool for those prompts:
ai-memory install-instructions
See docs/usage.md for handoff examples, proactive
query routing, bootstrap details, web UI screenshots, and the raw-wiki
inspection commands. CLI URL/auth configuration lives in
docs/install.md.
LLM Providers
ai-memory runs without an LLM: hooks still capture sessions, search uses
FTS5, and summaries fall back to rule-based output. Add an LLM provider
when you want LLM consolidation (on PreCompact, on demand via
memory_consolidate, or opt-in at session end with
AI_MEMORY_CONSOLIDATE_ON_SESSION_END), richer linting, and bootstrap.
Session end always writes a rule-based summary page + handoff either way.
Recommended defaults:
| Provider | Default | Use when |
|---|---|---|
| `anthropic` | `claude-haiku-4-5` | Best default for consolidation quality and rule classification. |
| `anthropic-oauth` | `claude-sonnet-4-6` | Use a Claude Pro/Max subscription via `claude setup-token`, no API key. |
| `openai` | `gpt-5.4-mini` | Cheaper and faster hosted option. |
| `openai-oauth` | `gpt-5.5` | ChatGPT Pro/Plus/Codex backend via `ai-memory auth login openai-oauth`; no Platform API key. |
| `copilot` | `gpt-5.5` | GitHub Copilot Chat backend via `ai-memory auth login copilot` or `COPILOT_GITHUB_TOKEN`; requires a Copilot subscription. |
| `gemini` | `gemini-2.5-flash` | Google-hosted option with a generous free tier. |
| `openai-compat` | no default | OpenRouter, Ollama, vLLM, LM Studio, and other compatible endpoints. |
openai-oauth stores a refresh token in <data_dir>/auth.json and talks to
the ChatGPT/Codex Responses backend, not api.openai.com. For Docker quick
starts, run ai-memory auth login openai-oauth with the wrapper so the token
lands in the same ai-memory-data volume as the server.
anthropic-oauth hits the same /v1/messages endpoint as anthropic but
authenticates with an OAuth bearer token instead of an API key. Run
claude setup-token once, then set AI_MEMORY_LLM_PROVIDER=anthropic-oauth and
ANTHROPIC_OAUTH_TOKEN=<token> (or CLAUDE_CODE_OAUTH_TOKEN, which claude setup-token writes automatically). No ANTHROPIC_API_KEY is needed.
⚠️ Unofficial and against Anthropic's usage policies — use at your own risk;
it may get your account rate-limited or banned. See
the warning in docs/install.md.
copilot stores a GitHub user token in the same auth file, exchanges it for a
short-lived Copilot API token via GitHub's /copilot_internal/v2/token, and
uses the Copilot Chat endpoint with vscode-chat integration headers. You can
also set COPILOT_GITHUB_TOKEN, GH_TOKEN, or GITHUB_TOKEN on the server.
> [!TIP]
> For the OAuth/subscription backends (`anthropic-oauth`, `openai-oauth`, `copilot`), pick a small, fast model via `AI_MEMORY_LLM_MODEL` - e.g. `claude-haiku-4-5` or `gpt-5-mini`. ai-memory's LLM work (consolidation, lint, explore) is summarisation, not hard reasoning, so a Haiku/mini-class model is plenty and is much easier on subscription rate limits. Save the high-effort thinking models for your coding agent.
Embeddings are optional and separate from the LLM provider. Set
AI_MEMORY_EMBEDDING_PROVIDER=openai, voyage, google, or gemini when
you want vector reranking in addition to FTS5 + graph-neighbor retrieval.
See docs/install.md#llm-provider-tiers
for env vars and Ollama/OpenRouter examples, and
docs/llm-provider-comparison.md
for the empirical model comparison.
Architecture
One Rust binary runs an MCP/HTTP server and owns one data directory:
<data_dir>/
├── wiki/ # markdown source of truth, git-versioned
├── raw/ # immutable session log archive
├── db/ # SQLite indexes, including FTS5 and embeddings
├── models/ # reserved for local embedding models
└── logs/ # rolling tracing output
Hooks POST observations to the server. The server serializes writes through one SQLite writer, compiles session observations into markdown pages, and serves retrieval through FTS5, graph-neighbor RRF, optional vector RRF, and bounded raw-observation fallback.
See docs/ARCHITECTURE.md for the data-flow
diagram, crate breakdown, schema notes, and invariants.
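Assuming "RRF" in the retrieval description means reciprocal rank fusion, the idea can be sketched as follows: each ranked list contributes `1 / (k + rank)` to a page's score and the pages are re-sorted by the sum. This is an illustration of the general technique, not ai-memory's code; `rrf_fuse` is a hypothetical helper, `k = 60` is the value commonly used in the literature, and the server's actual constant and weighting are not stated here.

```shell
# rrf_fuse K FILE...: each FILE is one ranked result list, one page id per
# line, best hit first. Prints "score<TAB>id" sorted by fused score.
rrf_fuse() (
  k=$1
  shift
  LC_ALL=C awk -v k="$k" '
    $0 != "" { score[$0] += 1 / (k + FNR) }   # FNR = rank within this list
    END { for (id in score) printf "%.6f\t%s\n", score[id], id }
  ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2
)

# Demo: one list from full-text search, one from graph neighbours.
tmp=$(mktemp -d)
printf '%s\n' decisions/db.md sessions/monday.md rules/style.md > "$tmp/fts"
printf '%s\n' sessions/monday.md rules/style.md decisions/db.md > "$tmp/graph"
rrf_fuse 60 "$tmp/fts" "$tmp/graph"
rm -rf "$tmp"
```

In the demo `sessions/monday.md` comes out on top because it sits near the head of both lists, which is the point of the technique: agreement across retrievers outranks a single first place.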
Docs
| File | What it is |
|---|---|
| `docs/install.md` | Installation cookbook. Every agent CLI, every alternative (curl, source build, no-docker, no-auth), and the server-on-a-different-machine (homelab/LAN) walkthrough. Read after the Quick start if your setup doesn't match the happy path. |
| `docs/usage.md` | Handoffs, proactive memory queries, routing snippet, web UI, raw-wiki inspection, and rules-vs-facts workflow. |
| `docs/marker-file.md` | `.ai-memory.toml` workspace/project routing for multi-client trees, mono-repos, worktrees, and work/personal separation. |
| `docs/windows.md` | Windows install modes: full WSL2, native Windows with Docker Desktop, native source builds, and current hook/MCP harness caveats. |
| `docs/mcp-install.md` | Per-client MCP and lifecycle notes (Cursor, Claude Desktop, Gemini CLI, Antigravity CLI, OpenClaw, OMP). |
| `docs/deploy.md` | Homelab deploy: `bin/deploy`, bearer-token auth, pointers to the TLS guide. |
| `docs/users.md` | Multi-user attribution (v0.8). Four-rung auth ladder, `ai-memory user add/list/expire/revive/rotate-token` walkthrough, backward-compat migration for pre-v0.8 installs, token storage rationale. |
| `docs/https-via-proxy.md` | HTTPS via a reverse proxy. When you need TLS (multi-user, non-loopback) and when you don't (loopback / stdio). Copy-paste docker compose templates for Caddy + Let's Encrypt, Caddy + internal CA (LAN-only), Cloudflare Tunnel (no open ports), and external cert files; plus native-Caddy + nginx recipes. The "thinking you're secure when you're not" failure modes explicitly called out. |
| `docs/lifecycle-ops.md` | Read before running purge / rename / backup / restore / reset. Safety matrix for the state-touching commands, per-project disk layout (how isolation actually works), and operator workflows for "fresh start", "snapshot before risky op", "drop one project". |
| `docs/llm-provider-comparison.md` | Empirical notes behind the recommended LLM defaults. |
| `docs/ARCHITECTURE.md` | Operational summary: data flow, crate layout, cross-cutting invariants, schema. |
| `docs/design-decisions.md` | The full v1 spec. |
| Research docs under `docs/` | Karpathy LLM Wiki notes, agentmemory / basic-memory / cognee deep-dives, lessons-learned from upstream issues. |
Influences and prior art
- Karpathy LLM Wiki - the compile-not-retrieve pattern.
- agentmemory - most of the right ideas; this project is the Rust successor.
- basic-memory - the markdown-on-disk source-of-truth model.
- cognee - pipeline composition and triplet embeddings.
- A-MEM - Zettelkasten-style atomic notes with link evolution.