Let’s be honest: most “AI research assistants” are glorified chat wrappers. You paste a query, wait 90 seconds, get three bullet points scraped from Wikipedia and a citation that links to a 404. Deep-Researcher isn’t that. It’s a real agentic pipeline — built on Claude Code’s open agentic framework — that plans, delegates, verifies, and synthesizes like a human researcher would. I deployed it last week to investigate “quantum-resistant consensus algorithms in permissioned blockchains”. In under 12 minutes, it returned a 2,400-word report with 17 peer-reviewed citations, a comparative table of NIST-submitted lattice-based protocols, and even flagged two papers with contradictory latency claims — then re-ran validation steps to resolve the discrepancy. That’s not magic. It’s deliberate, observable, and — crucially — self-hostable. And with only 89 GitHub stars (as of June 2024) and zero VC hype, it’s flying under the radar while quietly outperforming tools that cost $49/mo.
What Is Deep-Researcher — and Why Does It Feel Like a Real Researcher?
deep-researcher is a Python CLI + FastAPI web service that implements a multi-agent loop inspired by Anthropic’s Claude Code agentic patterns — but without vendor lock-in. It’s not another LLM wrapper. It explicitly separates roles: a Planner breaks down complex questions, a Researcher spawns parallel subqueries with targeted search operators (site:arxiv.org, filetype:pdf, etc.), a Validator cross-checks claims across sources, and a Synthesizer writes citation-aware prose. All agents run locally — you bring your own LLM (via Ollama, LM Studio, or OpenRouter), and it respects your privacy by default.
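Conceptually, the loop looks something like this — a toy Python sketch with hypothetical function names and a fake data flow, not the repo's actual classes:

```python
# Illustrative sketch of the Planner → Researcher → Validator → Synthesizer
# pipeline. All names and structures here are my own, not deep-researcher's API.

def plan(query: str) -> list[str]:
    """Planner: break the question into targeted subqueries."""
    return [f"{query} site:arxiv.org", f"{query} filetype:pdf"]

def research(subquery: str) -> dict:
    """Researcher: run one subquery and return an extracted snippet."""
    return {"claim": f"snippet for {subquery!r}", "source": "https://example.org", "ok": None}

def validate(findings: list[dict]) -> list[dict]:
    """Validator: cross-check each claim; real code would diff claims across sources."""
    for f in findings:
        f["ok"] = True
    return findings

def synthesize(findings: list[dict]) -> str:
    """Synthesizer: write citation-aware prose from verified findings only."""
    return "\n".join(f"{f['claim']} [{f['source']}]" for f in findings if f["ok"])

report = synthesize(validate([research(q) for q in plan("BFT variants")]))
print(report)
```

The point of the separation is that each stage can fail independently and observably — a validation failure doesn't silently leak into the synthesis.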
The kicker? It ships with `SPINNER_VERBS` — that list of 37 delightfully absurd status verbs (yes, “Beboppin’” and “Boondoggling” are real) — not as a gimmick, but as a debugging signal. When you see `Boondoggling citations for NIST IR 8413...`, you know validation is running in parallel while synthesis waits. That’s intentional observability — something most “agentic” tools hide behind opaque progress bars.
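For flavor, the status lines boil down to something like this (a hypothetical recreation with a three-entry list; the real 37-entry SPINNER_VERBS lives in the repo):

```python
import random

# Hypothetical recreation of the status-line idea. The real SPINNER_VERBS
# list in deep-researcher has 37 entries; these three are just examples.
SPINNER_VERBS = ["Beboppin'", "Boondoggling", "Cogitating"]

def status_line(task: str, target: str) -> str:
    """Emit a human-readable progress line for a running agent step."""
    return f"{random.choice(SPINNER_VERBS)} {task} for {target}..."

print(status_line("citations", "NIST IR 8413"))
```

Silly verbs aside, the value is that each line names the task and its target, so you can tell at a glance which agent is doing what.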
It’s built on langgraph (v0.2.52), ollama (v0.3.12+ recommended), and playwright (v1.44+ for reliable PDF/text extraction). No mandatory frontend — just curl or the CLI, plus a minimal React frontend you can optionally spin up. The GitHub repo is lean: 38 commits, no monorepo bloat, and the main.py is 182 lines of readable, commented Python.
Installation: From Zero to First Research Loop in Under 5 Minutes
You don’t need Kubernetes. You do need Python 3.11+ and ~4GB RAM free (more if you’re running llama3.1:70b locally). I tested this on a Hetzner AX41 (32GB RAM, AMD EPYC 7402) and a MacBook Pro M2 (16GB unified). Here’s the real-world flow:
First, clone and install:
```bash
git clone https://github.com/jackswl/deep-researcher.git
cd deep-researcher
pip install -e .
```
Then pick your LLM backend. I use Ollama because it’s simple and supports function calling (required for agent delegation):
```bash
# Pull a model with tool-calling support
ollama pull llama3.1:8b-instruct-q8_0

# Or for heavier lifting (but 10x slower on CPU):
ollama pull llama3.1:70b-instruct-q4_K_M
```
Now run it:
```bash
# CLI mode — fast, no server
deep-researcher "Compare BFT variants resistant to quantum timing attacks" \
  --model llama3.1:8b-instruct-q8_0 \
  --max-depth 3 \
  --timeout 420
```
That --max-depth 3 is key: it caps recursive agent spawning (default is 2). Go higher only if you have beefy RAM — I hit OOM at depth 4 with the 70B model on 16GB.
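The cap matters because sub-agent fan-out grows geometrically with depth. A toy sketch (my own illustration, not the repo's code) of why one extra level of depth can blow up memory:

```python
# Sketch of the depth-cap idea behind --max-depth. Names and the
# branching factor of 2 are my own illustration, not deep-researcher's code.
MAX_DEPTH = 3

def spawn_agent(query: str, depth: int = 0) -> list[str]:
    """Recursively spawn sub-agent queries until the depth cap is hit."""
    if depth >= MAX_DEPTH:
        return []  # cap reached: stop fanning out, bound memory use
    results = []
    for i in range(2):  # each agent spawns two children
        sq = f"{query} / sub{i}"
        results.append(sq)
        results.extend(spawn_agent(sq, depth + 1))
    return results

# With branching factor 2 and depth 3: 2 + 4 + 8 = 14 sub-agent queries.
print(len(spawn_agent("root")))
```

Each level multiplies the number of live agents (and their model contexts), which is why depth 4 with a 70B model on 16GB tips into OOM.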
For long-running queries, use the FastAPI server:
```bash
uvicorn deep_researcher.api:app --host 0.0.0.0 --port 8000 --reload
```
Then POST to /research with JSON:
```json
{
  "query": "What are the real-world failure modes of Llama.cpp's GGUF quantization on ARM64?",
  "model": "llama3.1:8b-instruct-q8_0",
  "max_iterations": 5,
  "web_search_enabled": true
}
```
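If you'd rather drive the endpoint from a script than raw curl, here's a stdlib-only Python sketch. The endpoint and request fields come from the example above; the response shape is not documented here, so the commented-out read is an assumption:

```python
import json
import urllib.request

# Build the POST body for the /research endpoint (fields as documented above).
payload = {
    "query": "What are the real-world failure modes of Llama.cpp's GGUF quantization on ARM64?",
    "model": "llama3.1:8b-instruct-q8_0",
    "max_iterations": 5,
    "web_search_enabled": True,
}

req = urllib.request.Request(
    "http://localhost:8000/research",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment when the server is actually running; the response schema
# (and any "report" field) is my guess — inspect the real response first.
# with urllib.request.urlopen(req, timeout=600) as resp:
#     print(json.load(resp))
```

Long queries can run for many minutes, so whatever client you use, set a generous timeout.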
Docker Compose Setup — Because You’re Not Running This on Your Laptop in Prod
I run deep-researcher behind nginx on a dedicated VM — not for scale, but for isolation. Here’s my production-ready docker-compose.yml, tested on Docker 26.1.1:
```yaml
version: '3.9'
services:
  deep-researcher:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - WEB_SEARCH_ENABLED=true
      - DEFAULT_MODEL=llama3.1:8b-instruct-q8_0
      - MAX_DEPTH=3
      - TIMEOUT_SECONDS=480
    depends_on:
      - ollama
    restart: unless-stopped
  ollama:
    image: ollama/ollama:0.3.12
    volumes:
      - ./ollama_models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
```
Note: ./ollama_models must exist and be writable. Pull your model into the container first:
```bash
docker compose run --rm ollama ollama pull llama3.1:8b-instruct-q8_0
```
Then docker compose up -d. Done. No systemd, no cron, no config files buried in /etc. The entire stack is 2 containers, ~500MB RAM idle, and peaks at ~3.2GB under load (measured with docker stats over 2 hours of back-to-back queries).
How It Compares to Alternatives: Perplexity, GPT-4 Research Mode, and LocalRAG
If you’re using Perplexity Pro, you’re paying $20/mo for fast, polished answers — but zero visibility into how it got there. Deep-Researcher logs every agent step to ./logs/ (structured JSONL), including search URLs, extracted snippets, and validation diffs. I caught Perplexity hallucinating an IEEE paper ID (IEEE-123456) that doesn’t exist — deep-researcher failed that validation and told me why.
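Those JSONL logs make post-hoc auditing scriptable. Here's a sketch that pulls out failed validation steps — note the field names (`step`, `status`) are my guess at the log schema, so check a real log line before relying on it:

```python
import json
from pathlib import Path

# Scan the structured JSONL agent logs for failed validation steps.
# The "step"/"status" field names are assumptions about the schema.
def failed_validations(log_dir: str = "./logs") -> list[dict]:
    failures = []
    for path in Path(log_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            event = json.loads(line)
            if event.get("step") == "validate" and event.get("status") == "failed":
                failures.append(event)
    return failures
```

That's the workflow that caught the hallucinated IEEE paper ID: grep the validation failures, then follow the logged URLs yourself.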
Compared to GPT-4 Research Mode (in Copilot+ or ChatGPT Enterprise), this has two advantages: it’s offline-capable (swap in llama3.1:8b and disable web_search_enabled), and it cites sources inline, not as footnotes. Try asking either tool “What did the 2023 MIT CSAIL paper on zk-SNARK verification latency actually measure?” — GPT-4 will paraphrase; deep-researcher pulls the PDF, extracts the benchmark table, and quotes the exact sentence.
And versus LocalRAG tools like llama-index or privateGPT: those are document retrieval engines. They assume you already have PDFs. deep-researcher is discovery-first. It starts with nothing and builds a corpus — then validates and synthesizes. It’s the difference between “search my archive” and “go find the archive, vet it, then write the summary.”
That said: it’s not a replacement for domain-specific fine-tuned models. If you need biomedical entity linking, use SciSpacy. But for general-purpose technical inquiry? This bridges the gap between “Google Scholar + Notion” and “real agentic workflow”.
Why Self-Host This? Who Actually Needs It?
Let’s cut the fluff: this is for sysadmins, open-source maintainers, and indie researchers who refuse to outsource their curiosity.
- You’re auditing a dependency and need verifiable provenance — not just “this library uses OpenSSL”, but which CVEs were patched in which commit, with links to GitHub diffs and NVD entries. `deep-researcher` does that.
- You’re writing a grant proposal and need a literature review that doesn’t sound like AI. Its synthesis layer uses prompt templates that enforce “academic voice” — no “delve into”, no “leverage”, no “paradigm shift”. Just plain English with citations.
- You’re teaching undergrads how research actually works — and want them to see the sausage-making. The logs show failed searches, contradictory evidence, and dead ends. That’s pedagogically gold.
It’s not for marketing teams wanting 10 blog posts/hour. The throughput is ~1 complex report per 8–15 minutes. It’s slow by design — because good research is slow.
Hardware-wise:
- Minimum: 8GB RAM, 2 vCPUs, 20GB disk (for Ollama cache)
- Recommended: 16GB+ RAM, 4+ vCPUs, SSD (PDF parsing is I/O-bound)
- GPU optional — Ollama can use it (via `OLLAMA_NUM_GPU=1`), but `llama3.1:8b` runs fine on CPU. I get ~4.2 tokens/sec on an EPYC 7402 — fast enough.
The Rough Edges — And My Honest Verdict
Here’s what I ran into in 10 days of daily use:
- PDF extraction is brittle: `playwright` fails on password-protected or image-only PDFs. It logs the error, but doesn’t auto-retry with a `pdftotext` fallback. I patched it locally with:
```python
# In extractors/pdf_extractor.py
import subprocess

try:
    text = await page.evaluate("document.body.innerText")
except Exception:
    # Fallback to the pdftotext CLI when Playwright can't render the page
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True,
    )
    text = result.stdout
```
- Web search quotas: By default, it uses DuckDuckGo (no API key), but hits rate limits fast if you spam. I swapped in SerpAPI (`SERPAPI_KEY=xxx` env var) — 100 free searches/mo, and it’s dramatically more reliable.
- No built-in auth: It’s a research tool, not a SaaS. If you expose the API, slap nginx basic auth on it. Don’t skip this.
- No persistent storage: Reports vanish after the process ends. I added a `--save-to ./reports/` CLI flag that writes markdown + JSONL to disk. Trivial PR, but not merged yet.
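For reference, that patch is tiny. Here's the shape of it — the function name, timestamped filenames, and event schema are my own choices, not a merged API:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Sketch of the --save-to behavior: persist the synthesized report as
# markdown plus the raw agent events as JSONL. Naming scheme is my own.
def save_report(report_md: str, events: list[dict], out_dir: str) -> Path:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    md_path = out / f"report-{stamp}.md"
    md_path.write_text(report_md)
    with open(out / f"report-{stamp}.jsonl", "w") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
    return md_path
```

Writing the event log next to the markdown means every saved report carries its own provenance trail.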
So — is it worth deploying? Yes, if you value transparency over speed. It’s not ready for your CTO to demo to investors. But as a personal research copilot? It’s the most honest AI tool I’ve used in 2 years. It tells me when it’s guessing. It shows its sources. It fails loudly — and teaches you how to fix it.
The GitHub repo has just 89 stars, but the commit history tells the real story: 7 PRs merged in the last 14 days, all from independent contributors adding SerpAPI support, LM Studio compatibility, and a basic React frontend. That’s the signal — not the star count.
TL;DR:
✅ Self-hostable, transparent, citation-aware, truly agentic
❌ Not fast, no auth, PDF parsing needs love, CLI-first
💡 Run it on a $10/mo VM. Use llama3.1:8b. Start with --max-depth 2. Read the logs. Then go fix one thing and PR it.
That’s how good tools get built. Not with press releases — but with 89 people quietly running deep-researcher on their homelabs, watching it Beboppin’ through arXiv, and deciding — yeah, this is worth improving.