Give your AI agent a searchable memory of any codebase. One file. Zero servers. Instant recall.
# Index 89K files
python3 scripts/phase1/manything_build_db.py /path/to/project
# Query in milliseconds
sqlite3 /manything/myproject/source.db \
"SELECT f.path, rank FROM files_fts, files f
WHERE files_fts MATCH 'layout prepare'
AND files_fts.rowid = f.id
ORDER BY rank LIMIT 10;"
Zero-Setup Agent Prompts (if in a hurry)
No installation needed. Copy these three prompts, feed them to any fresh AI coding agent one by one, and watch it reproduce the entire setup automatically:
Prompt 1: Index the codebase
FTS5 + trigram full-text index the target project; design your own filter rules
Prompt 2: Enrich & explore
Interactively query the DB, discover enrichment table designs, write batch enrichment scripts
Prompt 3: Enable query memory
Auto-ingest SQL queries into the trace database, enabling historical query exploration before running new searches
Now you know the recipe. Come back for the details when you're ready.
Quick Start
Let your agent install it
SQL-ManyThing ships as self-contained scripts with zero external dependencies. Copy this to your agent:
Requires: An agent with web access and shell execution (Claude, Cursor, Codex, etc.). For air-gapped environments, use the manual steps below.
Read AGENTS.md from https://github.com/IOchair/SQL-ManyThing/tree/master and install it for this project. Index the project with Phase 1, run Phase 2 universal enrichment, set up Phase 3 query tracing, then invoke the sql-manything skill to index the target directory.
No pip install, no config files. The agent reads SKILL.md, finds the scripts, and runs them.
Or do it yourself
git clone https://github.com/IOchair/SQL-ManyThing.git
cd SQL-ManyThing
# 1. Index (Phase 1)
python3 scripts/phase1/manything_build_db.py /path/to/project
# → creates .srcidx/source.db
# 2. Enrich (Phase 2: universal, all languages)
python3 scripts/phase2/enrich_depth_segments.py /path/to/project --batch 500
python3 scripts/phase2/enrich_file_refs.py /path/to/project --batch 500
python3 scripts/phase2/flatten_file_deps.py /path/to/project
python3 scripts/phase2/create_enriched_view.py /path/to/project
# 3. Tracing wrapper (Phase 3)
python3 scripts/phase3/install.py
echo 'MANYTHING_myproject="/path/to/project"' >> ~/.hermes/manything/aliases.sh
Default extensions for Phase 1: .h,.cpp,.cs,.py,.ts,.tsx,.js,.jsx,.rs,.java. Add --ext .md,.toml for docs/config, or --profile unreal-installed-core for an Unreal Engine build tree.
Verify
sqlite3 /manything/myproject/source.db "SELECT COUNT(*) FROM files;"
sqlite3 /manything/myproject/source.db \
"SELECT f.path, rank FROM files_fts, files f
WHERE files_fts MATCH 'layout prepare'
AND files_fts.rowid = f.id
ORDER BY rank LIMIT 10;"
Features
89,203 files · 3 GB · millisecond search · one SQLite file.
- FTS5 trigram full-text search: find any symbol across your entire codebase in milliseconds.
- Single SQLite file: no server, no daemon, no network. scp it anywhere.
- Bounded extraction: block_content_full returns only the function body. Full files never enter context.
- Query memory: every query logged, tagged, reusable. The index gets smarter with use.
- Pure SQL, fully auditable: no black-box retrieval. Every result traceable to a SELECT.
- Language-agnostic: C++, Python, JS, Rust, GLSL, generated code. If it's text, it's indexable.
- Agent-native: stable entrypoints, canonical SQL templates, :trace for query reuse.
Requirements
- Python 3.9+ (stdlib only for Phase 1-2; no pip install needed)
- SQLite 3.35+ with FTS5 trigram tokenizer enabled (included by default on most systems)
- Bash (Linux/macOS/WSL) or PowerShell (Windows) for the Phase 3 wrapper
- Phase 2 enrichment scripts use only the Python stdlib; the optional cymbal enrichment needs the cymbal CLI binary
What Gets Built
Per project, in .srcidx/source.db:
| Table | What it holds |
|---|---|
| files | File metadata + full text content |
| files_fts | FTS5 trigram index over path + content |
| v_enriched | Unified view: depth-segmented blocks with block_content_full |
| enrich_depth_segments | Raw depth-segmented block data |
| enrich_file_deps | Resolved import/include dependency graph |
| enrich_file_refs | Raw #include / import strings with line numbers |
Optional enrichment adds: enrich_cymbal (symbol definitions), enrich_graphify_nodes / _edges (AST/document graph), file_enrich (UHT reflection JSON).
Global (Phase 3):
| Path | Purpose |
|---|---|
| ~/.hermes/manything/query_log.db | Query trace database |
| ~/.hermes/manything/aliases.sh | Project alias registry |
| ~/.hermes/manything/pending.jsonl | Pending query log buffer |
Usage Examples
These are the canonical SQL templates. Every query follows this shape.
DISCOVER: find candidate files
SELECT f.path, rank FROM files_fts, files f
WHERE files_fts MATCH 'layout prepare'
AND files_fts.rowid = f.id
AND f.path NOT LIKE '%test%'
ORDER BY rank LIMIT 15;
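To make the shape of the DISCOVER template concrete, here is a minimal Python sketch that builds a tiny in-memory index and runs the same query. The column layout of files and files_fts is an assumption inferred from the queries in this README, and build_demo_index and discover are hypothetical helper names, not part of the shipped scripts.

```python
import os
import sqlite3

def build_demo_index(docs):
    """Index {path: content} into an in-memory DB (assumed schema, not the real one)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, ext TEXT, content TEXT)"
    )
    try:
        # Trigram tokenizer: substring matching, needs a recent SQLite build.
        conn.execute(
            "CREATE VIRTUAL TABLE files_fts USING fts5(path, content, tokenize='trigram')"
        )
    except sqlite3.OperationalError:
        # Older builds: fall back to the default word tokenizer for this demo.
        conn.execute("CREATE VIRTUAL TABLE files_fts USING fts5(path, content)")
    for path, content in docs.items():
        cur = conn.execute(
            "INSERT INTO files (path, ext, content) VALUES (?, ?, ?)",
            (path, os.path.splitext(path)[1], content),
        )
        # Keep the FTS rowid equal to files.id so the join in DISCOVER works.
        conn.execute(
            "INSERT INTO files_fts (rowid, path, content) VALUES (?, ?, ?)",
            (cur.lastrowid, path, content),
        )
    return conn

def discover(conn, match, limit=15):
    """Run the DISCOVER template and return candidate paths, best rank first."""
    sql = """
        SELECT f.path, rank FROM files_fts, files f
        WHERE files_fts MATCH ?
          AND files_fts.rowid = f.id
          AND f.path NOT LIKE '%test%'
        ORDER BY rank LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (match, limit))]

demo = build_demo_index({
    "src/layout.ts": "// prepare the layout tree\nexport function prepareLayout() {}",
    "src/render.ts": "// draw the layout\nexport function draw() {}",
    "test/layout.test.ts": "// prepare layout fixtures",
})
candidates = discover(demo, "layout prepare")
```

Both search terms must match for a row to qualify, and the NOT LIKE filter drops the test file even though it matches the text.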
EXTRACT: probe file structure
SELECT depth_level, start_offset,
length(block_content) AS bytes,
length(block_content_full) AS full_bytes,
substr(block_content, 1, 100) AS preview
FROM v_enriched
WHERE file_path = 'src/layout.ts'
AND depth_level <= 1
ORDER BY depth_level, start_offset;
EXTRACT_BLOCK: terminal extraction
SELECT block_content_full FROM v_enriched
WHERE file_path = 'src/layout.ts'
AND block_content LIKE '%function layout%'
AND depth_level = 1
ORDER BY start_offset LIMIT 1;
Depth by language: Brace-based languages (JS, TS, Go, Rust, Java, C++, C#) have bodies at depth=1. Indent-based languages (Python, Ruby, YAML) have signatures at depth=1 and bodies at depth=2.
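The depth rule for brace-based languages can be illustrated with a small brace counter. This is only a sketch of the idea, assuming depth is derived from nesting of curly braces; depth_segments is a hypothetical function, it ignores braces inside strings and comments, and the real enrich_depth_segments.py may segment differently.

```python
def depth_segments(text):
    """Return (depth_level, start_offset, block_text) for each balanced {...} block."""
    segments = []
    open_offsets = []  # offsets of the '{' characters that are still open
    for offset, ch in enumerate(text):
        if ch == "{":
            open_offsets.append(offset)
        elif ch == "}" and open_offsets:
            start = open_offsets.pop()
            # Blocks with no enclosing brace get depth_level 1 (e.g. a function body).
            depth_level = len(open_offsets) + 1
            segments.append((depth_level, start, text[start:offset + 1]))
    # Same ordering as the EXTRACT template: by depth, then by position.
    return sorted(segments, key=lambda seg: (seg[0], seg[1]))

source = "function layout() { if (x) { y(); } }\nfunction other() { z(); }"
blocks = depth_segments(source)
```

Here the two function bodies come out at depth 1 and the nested if block at depth 2, which matches the "bodies at depth=1" rule stated above.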
Utility: project shape at a glance
SELECT ext, COUNT(*) AS n FROM files
GROUP BY ext ORDER BY n DESC;
For advanced patterns (dependency tracing, trace reuse, symbol search), see Enrichment and Query Tracing.
Enrichment
Phase 1 gives you FTS5 search. Phase 2 adds structure (depth-segmented blocks, resolved dependencies, and raw reference strings), enabling bounded extraction without whole-file reads.
Universal Workflow (all languages)
python3 scripts/phase2/enrich_depth_segments.py /path/to/project --batch 500
python3 scripts/phase2/enrich_file_refs.py /path/to/project --batch 500
python3 scripts/phase2/flatten_file_deps.py /path/to/project
python3 scripts/phase2/create_enriched_view.py /path/to/project
This adds v_enriched, enrich_file_deps, enrich_file_refs, and enrich_depth_segments. The v_enriched view provides depth-aware block extraction:
-- Extract a function body at depth=1 (brace-based languages)
SELECT block_content_full FROM v_enriched
WHERE file_path = 'src/parser.ts'
AND block_content LIKE '%function parseExpr%'
AND depth_level = 1
ORDER BY start_offset LIMIT 1;
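Besides blocks, Phase 2 records raw #include / import strings with line numbers in enrich_file_refs. A rough sketch of that kind of extraction follows; the regular expressions and the extract_refs helper are illustrative assumptions, cover only a few common forms, and are not taken from enrich_file_refs.py.

```python
import re

# Tried in order; the first pattern that matches a line wins.
REF_PATTERNS = [
    re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]'),              # C/C++ include
    re.compile(r'''^\s*import\s.*?\sfrom\s+['"]([^'"]+)['"]'''),    # JS/TS import ... from
    re.compile(r'^\s*from\s+([\w.]+)\s+import\b'),                  # Python from-import
    re.compile(r'^\s*import\s+([\w.]+)'),                           # Python / Java import
]

def extract_refs(text):
    """Return (line_number, raw_reference) pairs, with 1-based line numbers."""
    refs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in REF_PATTERNS:
            match = pattern.match(line)
            if match:
                refs.append((line_no, match.group(1)))
                break
    return refs

cpp_refs = extract_refs('#include "Layout.h"\nint x;\n#include <vector>\n')
py_refs = extract_refs("import os.path\nfrom pathlib import Path\n")
ts_refs = extract_refs("import { a } from './b';\n")
```

The references are kept raw on purpose: resolving them to actual files is a separate step (flatten_file_deps.py in the workflow above).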
Additional Enrichment Scripts
| Script | What it adds | Scope |
|---|---|---|
enrich_cymbal.py |
Symbol definitions (classes, functions, methods) | Python, Go, JS via cymbal CLI |
enrich_graphify.py |
AST/document graph nodes + edges | Python + Markdown only |
uht_enrich.py |
Unreal Header Tool reflection metadata | UE builds with UHT output |
enrich_java_build.py |
Java import resolution | Java projects |
On Windows, scripts/phase2/run_phase2_universal_windows.bat runs all 4 universal steps. On Linux/WSL/macOS, run the Python scripts directly.
Full details: Phase 2 Enrichment Guide
Query Tracing
The Phase 3 wrapper intercepts every sqlite3 call to a /manything/ database and logs it. Past queries become discoverable: agents can search what queries were run before and reuse proven patterns.
Install the wrapper
python3 scripts/phase3/install.py
This places a sqlite3 wrapper in ~/.local/bin/ that intercepts /manything/<project>/source.db and :trace. Verify with which sqlite3; it should point to ~/.local/bin/sqlite3. Ensure ~/.local/bin comes before /usr/bin in PATH.
Register a project
echo 'MANYTHING_myproject="/path/to/project"' >> ~/.hermes/manything/aliases.sh
Query through the virtual path
sqlite3 /manything/myproject/source.db "SELECT COUNT(*) FROM files;"
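The virtual path only works because the wrapper rewrites it to a real file. The sketch below shows one way such a lookup could work, assuming aliases.sh holds lines of the form MANYTHING_<project>="<root>" and that the database lives at <root>/.srcidx/source.db. parse_aliases and resolve_virtual_path are hypothetical names; the actual wrapper is a shell or .cmd script and may behave differently.

```python
import re
from pathlib import PurePosixPath

ALIAS_LINE = re.compile(r'^MANYTHING_(\w+)="([^"]*)"$')
VIRTUAL_DB = re.compile(r"/manything/([^/]+)/source\.db")

def parse_aliases(text):
    """Map project name -> project root from aliases.sh-style lines."""
    aliases = {}
    for line in text.splitlines():
        match = ALIAS_LINE.match(line.strip())
        if match:
            aliases[match.group(1)] = match.group(2)
    return aliases

def resolve_virtual_path(arg, aliases):
    """Rewrite /manything/<project>/source.db to the real DB path, else pass through."""
    match = VIRTUAL_DB.fullmatch(arg)
    if not match or match.group(1) not in aliases:
        return arg  # not a registered virtual path: leave the argument unchanged
    root = PurePosixPath(aliases[match.group(1)])
    return str(root / ".srcidx" / "source.db")

aliases = parse_aliases('# registry\nMANYTHING_myproject="/path/to/project"\n')
real_path = resolve_virtual_path("/manything/myproject/source.db", aliases)
```

Arguments that are not registered virtual paths are returned unchanged, so ordinary sqlite3 usage is unaffected in this sketch.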
Search past queries
# :trace is a virtual path intercepted by the Phase 3 wrapper
sqlite3 :trace "
SELECT id, project, tag, substr(sql_text, 1, 120) FROM query_trace
WHERE project = 'myproject'
ORDER BY id DESC LIMIT 10;"
Tag useful query patterns
INSERT INTO query_notes (log_id, note, tag, created_at)
VALUES (42, 'overview entrypoint query', 'useful_pattern', strftime('%s','now'));
Tagged queries accumulate over time. Future sessions can SELECT * FROM query_trace WHERE tag IS NOT NULL to discover proven patterns without repeating discovery work.
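One way to picture how logging, tagging, and lookup fit together is the sketch below. Only the names query_trace and query_notes and the columns used in the queries above come from this README; the query_log table, the view definition, and the helper functions are assumptions made for illustration, and the real trace schema may differ.

```python
import sqlite3
import time

def open_trace_db(path=":memory:"):
    """Create a hypothetical trace schema: raw log, notes, and a joined view."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS query_log (
            id INTEGER PRIMARY KEY, project TEXT, sql_text TEXT, created_at INTEGER);
        CREATE TABLE IF NOT EXISTS query_notes (
            log_id INTEGER REFERENCES query_log(id),
            note TEXT, tag TEXT, created_at INTEGER);
        CREATE VIEW IF NOT EXISTS query_trace AS
            SELECT l.id, l.project, n.tag, n.note, l.sql_text
            FROM query_log l LEFT JOIN query_notes n ON n.log_id = l.id;
    """)
    return conn

def log_query(conn, project, sql_text):
    """Record one executed query and return its log id."""
    cur = conn.execute(
        "INSERT INTO query_log (project, sql_text, created_at) VALUES (?, ?, ?)",
        (project, sql_text, int(time.time())),
    )
    return cur.lastrowid

def tag_query(conn, log_id, note, tag):
    """Attach a note and tag to a logged query (mirrors the INSERT shown above)."""
    conn.execute(
        "INSERT INTO query_notes (log_id, note, tag, created_at) VALUES (?, ?, ?, ?)",
        (log_id, note, tag, int(time.time())),
    )

def proven_patterns(conn, project):
    """Return (id, tag, sql_text) for tagged queries of one project."""
    return conn.execute(
        "SELECT id, tag, sql_text FROM query_trace "
        "WHERE project = ? AND tag IS NOT NULL ORDER BY id",
        (project,),
    ).fetchall()

trace = open_trace_db()
first = log_query(trace, "myproject", "SELECT COUNT(*) FROM files;")
second = log_query(trace, "myproject", "SELECT ext, COUNT(*) FROM files GROUP BY ext;")
tag_query(trace, first, "overview entrypoint query", "useful_pattern")
patterns = proven_patterns(trace, "myproject")
```

Untagged queries stay in the log but do not show up in proven_patterns, which is the filtering the WHERE tag IS NOT NULL query relies on.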
Full architecture: Phase 3 Design Rationale
How It Works
Instead of throwing whole files at the LLM, SQL-ManyThing treats code search as an A* exploration: find the right file, extract only the proof, and answer from minimal evidence.
The Search Loop
Mainstream RAG:        SQL-ManyThing:
─────────────────      ──────────────────────
embed whole files      FTS5 rank candidates
retrieve chunks        substr() extract proof
stuff context          answer from evidence
hope for the best      verify with SQL
Under the hood this is A* search over an information state-space tree, where g(n) is the cost spent so far, h(n) is the estimated remaining cost, each operator is one SQL query, and the goal is an evidence-rich answer. The canonical templates (DISCOVER → EXTRACT → EXTRACT_BLOCK) encode this loop as reusable SQL patterns.
Narrow first. Extract second. Answer from evidence.
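As a small illustration of "extract only the proof", the sketch below pulls a bounded window around a match with SQLite's instr() and substr() instead of reading the whole file. The files(path, content) layout is assumed from the queries in this README, and extract_window is a hypothetical helper, not one of the shipped scripts.

```python
import sqlite3

def extract_window(conn, path, needle, before=40, length=200):
    """Return at most `length` characters around the first occurrence of `needle`."""
    row = conn.execute(
        """
        SELECT substr(content, max(1, instr(content, :needle) - :before), :length)
        FROM files
        WHERE path = :path AND instr(content, :needle) > 0
        """,
        {"path": path, "needle": needle, "before": before, "length": length},
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, content TEXT)")
body = "function layout() { return 1; }"
# Pad the match with filler so the file is much larger than the window.
conn.execute(
    "INSERT INTO files (path, content) VALUES (?, ?)",
    ("src/layout.ts", "a" * 1000 + body + "b" * 1000),
)
snippet = extract_window(conn, "src/layout.ts", "function layout", before=10, length=50)
```

The file content is over 2,000 characters, but only a 50-character window ever leaves the database, so the cost of each step stays bounded.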
Design Principles
- SQLite first. Query everything with SQL. One file, fully inspectable.
- Build once, reuse forever. Index cost is paid once; queries are free.
- Trace behavior, not just answers. Every session leaves navigable breadcrumbs for the next.
- Never read whole files. Bounded substr() proves the answer without blowing context.
- Profile policies over .gitignore assumptions. Control what gets indexed explicitly.
- Project-agnostic by default. Domain-specific lessons live in references/, not the core.
Why Scripts, Not a Unified CLI
Every script is a stable entrypoint: python3 scripts/phase1/manything_build_db.py ... keeps token positions predictable across sessions, which matters for agent-driven workflows where command strings are reproduced verbatim from query traces. See Phase 3 Design Rationale for the full argument.
Performance
A full Unreal Engine 5.8 install, indexed locally and queried locally:
89,203 files indexed
~3.0 GB single SQLite database
Full-text search: sub-second
UHT reflection symbols: 4,455 classes · 3,247 structs · 1,590 enums · 8,902 functions
This is the stress test. The framework works on anything with files: JS/TS libraries, Python tools, Java projects, monorepos, generated code, build outputs, whatever you point it at.
WSL note: DrvFS queries on Windows-hosted repos can be 30× slower than ext4. Copy the DB to ~/.hermes/manything/ for sub-second performance. See DrvFS Performance.
Comparison
| | grep | LSP | Cloud RAG | SQL-ManyThing |
|---|---|---|---|---|
| Offline | β | β | β cloud dependency | β |
| Sub-second search | β O(n) | β narrow | β + latency | β FTS5 |
| Bounded extraction | β full lines | β | β chunks | β substr() |
| Query memory | β | β | β | β trace log |
| Auditable results | β | β | β | β pure SQL |
| Self-hosted index | N/A | β auto | β | β local SQLite |
Unreal Engine
UE is the project's primary stress-test target. The workflow is the same universal Phase 1 + Phase 2; UE is just large, so batch files are provided to run on the Windows side (which avoids WSL DrvFS write overhead).
Phase 1: Index with UE profile
# Or use templates/run_phase1_unreal_windows.bat on Windows
python3 scripts/phase1/manything_build_db.py /path/to/Engine \
--gitignore /path/to/Engine/.gitignore \
--profile unreal-installed-core
The unreal-installed-core profile filters extensions (.h,.cpp,.cs,.usf,.ush,.hlsl,.py,.ini,.uplugin) and skips high-noise paths (Source/ThirdParty/, Content/, Platforms/, ScriptModules/). Result: ~89K files, ~3 GB.
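The filtering a profile performs can be sketched as an extension allow-list plus a path skip-list. The extension and path values below are the ones listed above; the should_index function and the choice to match skip directories anywhere along the path are assumptions for illustration, not the behavior of manything_build_db.py.

```python
from pathlib import PurePosixPath

UNREAL_EXTENSIONS = {".h", ".cpp", ".cs", ".usf", ".ush", ".hlsl", ".py", ".ini", ".uplugin"}
UNREAL_SKIP_PATHS = ("Source/ThirdParty/", "Content/", "Platforms/", "ScriptModules/")

def should_index(rel_path):
    """Return True if a path passes the extension filter and hits no skip path."""
    posix = rel_path.replace("\\", "/")  # normalize Windows separators
    if PurePosixPath(posix).suffix.lower() not in UNREAL_EXTENSIONS:
        return False
    # Prefix with '/' so skip paths only match on directory boundaries.
    padded = "/" + posix
    return not any("/" + skip in padded for skip in UNREAL_SKIP_PATHS)

kept = [
    p for p in [
        "Engine/Source/Runtime/Core/Public/Math/Vector.h",
        "Engine/Source/ThirdParty/zlib/zlib.h",
        "Engine/Shaders/Private/Common.ush",
        "Engine/Content/Editor/Foo.uasset",
        "Engine\\Source\\Runtime\\Core\\Readme.md",
    ]
    if should_index(p)
]
```

Keeping the policy as explicit data makes it easy to audit what was excluded, in line with the "profile policies over .gitignore assumptions" principle.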
Phase 2: Universal enrichment (overload-tested on 89K files)
# Or use scripts/phase2/run_phase2_universal_windows.bat on Windows
python3 scripts/phase2/enrich_depth_segments.py /path/to/Engine --batch 500
python3 scripts/phase2/enrich_file_refs.py /path/to/Engine --batch 500
python3 scripts/phase2/flatten_file_deps.py /path/to/Engine
python3 scripts/phase2/create_enriched_view.py /path/to/Engine
UE shader files (.usf, .ush, .hlsl) are C-style brace languages and work with the universal depth-segment parser. The full run produces 5.2M segments at +394 MB DB growth.
Phase 2: UHT enrichment (optional)
Only for installed builds with generated UHT headers available:
python3 scripts/phase2/uht_enrich.py \
--db /path/to/Engine/.srcidx/source.db \
--uht-dir /path/to/Engine/Intermediate/Build/Win64/UnrealEditor/Inc \
--source-prefix Engine/ --batch 500
UE-specific documentation:
- Installed Build Indexing: indexing policy and filter profiles
- UE58 Full Run: end-to-end Phase 1-3 results with timing
- UHT Generated Files: reflection metadata structure
Windows / WSL Notes
- Phase 1 indexing: Run with Windows Python when possible, since DrvFS writes from WSL are slower. WSL can query the resulting database fine.
- Query performance: Copy .srcidx/source.db to a native ext4 filesystem for sub-second queries. See DrvFS Performance.
- Wrapper: scripts/phase3/install.py works cross-platform (Windows .cmd wrapper, Linux/WSL shell wrapper).
- Template: templates/run_phase1_unreal_windows.bat is a quick-start batch file for UE indexing on Windows.
References
Reference docs are organized by phase and domain:
| Area | Key Docs |
|---|---|
| Phase 1 | Phase 1 Setup · Gitignore Enumeration |
| Phase 2 | Enrich Coverage Workflow · Cymbal · Graphify · Java · UHT |
| Phase 3 | Design Rationale · Importer Parsing |
| Unreal | Installed Build Indexing · UE58 Full Run · Indexing Profiles |
| Platforms | WSL/Windows Smoke Test |
| Design | SQL Is Many Things · DB Maintenance |
| Meta | Agent Query Loop Lessons · Public Examples · Third-Party Attribution |
Full catalog: references/INDEX.md