Let’s be honest: most “AI research assistants” are glorified chat wrappers. You paste a query, wait 90 seconds, get three bullet points scraped from Wikipedia and a citation that links to a 404. Deep-Researcher isn’t that. It’s a real agentic pipeline — built on Claude Code’s open agentic framework — that plans, delegates, verifies, and synthesizes like a human researcher would. I deployed it last week to investigate “quantum-resistant consensus algorithms in permissioned blockchains”. In under 12 minutes, it returned a 2,400-word report with 17 peer-reviewed citations, a comparative table of NIST-submitted lattice-based protocols, and even flagged two papers with contradictory latency claims — then re-ran validation steps to resolve the discrepancy. That’s not magic. It’s deliberate, observable, and — crucially — self-hostable. And with only 89 GitHub stars (as of June 2024) and zero VC hype, it’s flying under the radar while quietly outperforming tools that cost $49/mo.

What Is Deep-Researcher — and Why Does It Feel Like a Real Researcher?

deep-researcher is a Python CLI + FastAPI web service that implements a multi-agent loop inspired by Anthropic’s Claude Code agentic patterns — but without vendor lock-in. It’s not another LLM wrapper. It explicitly separates roles: a Planner breaks down complex questions, a Researcher spawns parallel subqueries with targeted search operators (site:arxiv.org, filetype:pdf, etc.), a Validator cross-checks claims across sources, and a Synthesizer writes citation-aware prose. All agents run locally — you bring your own LLM (via Ollama, LM Studio, or OpenRouter), and it respects your privacy by default.
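That role separation can be sketched as a minimal pipeline. To be clear: the class and function names below are my own illustration, not deep-researcher's actual API, and the real agents call out to an LLM where these stubs return canned values.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    source: str
    validated: bool = False

def plan(question: str) -> list[str]:
    # Planner: break the question into targeted subqueries with search operators.
    return [f"{question} site:arxiv.org", f"{question} filetype:pdf"]

def research(subquery: str) -> Finding:
    # Researcher: one finding per subquery (stubbed; the real agent searches the web).
    return Finding(claim=f"result for: {subquery}", source="https://example.org")

def validate(findings: list[Finding]) -> list[Finding]:
    # Validator: cross-check claims; here we merely require a source to exist.
    for f in findings:
        f.validated = bool(f.source)
    return [f for f in findings if f.validated]

def synthesize(findings: list[Finding]) -> str:
    # Synthesizer: citation-aware prose built only from validated findings.
    return "\n".join(f"- {f.claim} [{f.source}]" for f in findings)

report = synthesize(validate([research(q) for q in plan("BFT variants")]))
```

The point of the shape, not the stubs: synthesis only ever sees findings that survived validation, which is why the tool can fail loudly instead of hallucinating.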

The kicker? It ships with SPINNER_VERBS — that list of 37 delightfully absurd status verbs (yes, “Beboppin’” and “Boondoggling” are real) — not as a gimmick, but as a debugging signal. When you see Boondoggling citations for NIST IR 8413..., you know validation is running in parallel while synthesis waits. That’s intentional observability — something most “agentic” tools hide behind opaque progress bars.

It’s built on langgraph (v0.2.52), ollama (v0.3.12+ recommended), and playwright (v1.44+ for reliable PDF/text extraction). There’s no mandatory frontend: you drive it with curl or the CLI, plus a minimal React frontend you can optionally spin up. The GitHub repo is lean: 38 commits, no monorepo bloat, and the main.py is 182 lines of readable, commented Python.

Installation: From Zero to First Research Loop in Under 5 Minutes

You don’t need Kubernetes. You do need Python 3.11+ and ~4GB RAM free (more if you’re running llama3.1:70b locally). I tested this on a Hetzner AX41 (32GB RAM, AMD EPYC 7402) and a MacBook Pro M2 (16GB unified). Here’s the real-world flow:

First, clone and install:

git clone https://github.com/jackswl/deep-researcher.git
cd deep-researcher
pip install -e .

Then pick your LLM backend. I use Ollama because it’s simple and supports function calling (required for agent delegation):

# Pull a model with tool-calling support
ollama pull llama3.1:8b-instruct-q8_0
# Or for heavier lifting (but 10x slower on CPU):
ollama pull llama3.1:70b-instruct-q4_K_M

Now run it:

# CLI mode — fast, no server
deep-researcher "Compare BFT variants resistant to quantum timing attacks" \
  --model llama3.1:8b-instruct-q8_0 \
  --max-depth 3 \
  --timeout 420

That --max-depth 3 is key: it caps recursive agent spawning (default is 2). Go higher only if you have beefy RAM — I hit OOM at depth 4 with the 70B model on 16GB.
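Why the depth cap bites so hard: each subquery can itself spawn subqueries, so the agent tree grows geometrically. A toy sketch of the mechanism (my illustration, with a fixed fan-out of two; the real fan-out is decided by the LLM at runtime):

```python
def spawn(query: str, depth: int, max_depth: int = 2) -> list[str]:
    """Recursively expand a query into leaf subqueries, refusing to pass max_depth."""
    if depth >= max_depth:
        return [query]  # leaf: run this query as-is, no further delegation
    # Each level fans out into two refinements (hypothetical; real agents choose).
    children = [f"{query} / aspect-{i}" for i in (1, 2)]
    out: list[str] = []
    for child in children:
        out.extend(spawn(child, depth + 1, max_depth))
    return out

leaves = spawn("quantum timing attacks", depth=0, max_depth=3)
# Fan-out 2 at depth 3 means 2**3 = 8 leaf queries, each holding model context —
# which is why RAM climbs steeply with --max-depth.
```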

For long-running queries, use the FastAPI server:

uvicorn deep_researcher.api:app --host 0.0.0.0 --port 8000 --reload

Then POST to /research with JSON:

{
  "query": "What are the real-world failure modes of Llama.cpp's GGUF quantization on ARM64?",
  "model": "llama3.1:8b-instruct-q8_0",
  "max_iterations": 5,
  "web_search_enabled": true
}
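A stdlib-only client sketch for that call. The /research path and field names are taken from the JSON above; the host, port, and response shape are assumptions about your own deployment:

```python
import json
import urllib.request

API = "http://localhost:8000/research"  # adjust to wherever uvicorn is listening

payload = {
    "query": "What are the real-world failure modes of Llama.cpp's GGUF quantization on ARM64?",
    "model": "llama3.1:8b-instruct-q8_0",
    "max_iterations": 5,
    "web_search_enabled": True,
}

req = urllib.request.Request(
    API,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To actually send it (blocks until the report is ready, so use a long timeout):
#   with urllib.request.urlopen(req, timeout=480) as resp:
#       report = json.load(resp)
```

Match the timeout to what you’d pass the CLI — these are multi-minute requests, not chat completions.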

Docker Compose Setup — Because You’re Not Running This on Your Laptop in Prod

I run deep-researcher behind nginx on a dedicated VM — not for scale, but for isolation. Here’s my production-ready docker-compose.yml, tested on Docker 26.1.1:

services:
  deep-researcher:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - WEB_SEARCH_ENABLED=true
      - DEFAULT_MODEL=llama3.1:8b-instruct-q8_0
      - MAX_DEPTH=3
      - TIMEOUT_SECONDS=480
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama:0.3.12
    volumes:
      - ./ollama_models:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

Note: ./ollama_models must exist and be writable. Pull your model into the container first:

docker compose run --rm ollama ollama pull llama3.1:8b-instruct-q8_0

Then docker compose up -d. Done. No systemd, no cron, no config files buried in /etc. The entire stack is 2 containers, ~500MB RAM idle, and peaks at ~3.2GB under load (measured with docker stats over 2 hours of back-to-back queries).

How It Compares to Alternatives: Perplexity, GPT-4 Research Mode, and LocalRAG

If you’re using Perplexity Pro, you’re paying $20/mo for fast, polished answers — but zero visibility into how it got there. Deep-Researcher logs every agent step to ./logs/ (structured JSONL), including search URLs, extracted snippets, and validation diffs. I caught Perplexity hallucinating an IEEE paper ID (IEEE-123456) that doesn’t exist — deep-researcher failed that validation and told me why.
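Those JSONL logs are trivially greppable, but a few lines of Python get you per-agent stats. The field names below (agent, action, url) are my guess at the log shape, not a documented contract — check one of your own files in ./logs/ first:

```python
import json
from collections import Counter

# Hypothetical log lines standing in for a file read from ./logs/ —
# in practice: steps = [json.loads(l) for l in open("logs/run.jsonl")]
sample = """\
{"agent": "researcher", "action": "search", "url": "https://arxiv.org/abs/2301.00001"}
{"agent": "validator", "action": "cross_check", "url": "https://nvd.nist.gov/vuln"}
{"agent": "researcher", "action": "extract", "url": "https://arxiv.org/abs/2301.00002"}
"""

steps = [json.loads(line) for line in sample.splitlines()]
by_agent = Counter(step["agent"] for step in steps)
# Now you can see at a glance how much of a run was search vs. validation.
```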

Compared to GPT-4 Research Mode (in Copilot+ or ChatGPT Enterprise), this has two advantages: it’s offline-capable (swap in llama3.1:8b and set web_search_enabled to false), and it cites sources inline, not as footnotes. Try asking either tool “What did the 2023 MIT CSAIL paper on zk-SNARK verification latency actually measure?” — GPT-4 will paraphrase; deep-researcher pulls the PDF, extracts the benchmark table, and quotes the exact sentence.

And versus LocalRAG tools like llama-index or privateGPT: those are document retrieval engines. They assume you already have PDFs. deep-researcher is discovery-first. It starts with nothing and builds a corpus — then validates and synthesizes. It’s the difference between “search my archive” and “go find the archive, vet it, then write the summary.”

That said: it’s not a replacement for domain-specific fine-tuned models. If you need biomedical entity linking, use SciSpacy. But for general-purpose technical inquiry? This bridges the gap between “Google Scholar + Notion” and “real agentic workflow”.

Why Self-Host This? Who Actually Needs It?

Let’s cut the fluff: this is for sysadmins, open-source maintainers, and indie researchers who refuse to outsource their curiosity.

  • You’re auditing a dependency and need verifiable provenance — not just “this library uses OpenSSL”, but which CVEs were patched in which commit, with links to GitHub diffs and NVD entries. deep-researcher does that.

  • You’re writing a grant proposal and need a literature review that doesn’t sound like AI. Its synthesis layer uses prompt templates that enforce “academic voice” — no “delve into”, no “leverage”, no “paradigm shift”. Just plain English with citations.

  • You’re teaching undergrads how research actually works — and want them to see the sausage-making. The logs show failed searches, contradictory evidence, and dead ends. That’s pedagogically gold.

It’s not for marketing teams wanting 10 blog posts/hour. The throughput is ~1 complex report per 8–15 minutes. It’s slow by design — because good research is slow.

Hardware-wise:

  • Minimum: 8GB RAM, 2 vCPUs, 20GB disk (for Ollama cache)
  • Recommended: 16GB+ RAM, 4+ vCPUs, SSD (PDF parsing is I/O-bound)
  • GPU optional — Ollama can use it (via OLLAMA_NUM_GPU=1), but llama3.1:8b runs fine on CPU. I get ~4.2 tokens/sec on an EPYC 7402 — fast enough.

The Rough Edges — And My Honest Verdict

Here’s what I ran into in 10 days of daily use:

  • PDF extraction is brittle: playwright fails on password-protected or image-only PDFs. It logs the error, but doesn’t auto-retry with pdftotext fallback. I patched it locally with:
# In extractors/pdf_extractor.py
import subprocess

try:
    text = await page.evaluate("document.body.innerText")
except Exception:
    # Fallback to the pdftotext CLI (ships with poppler-utils)
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    text = result.stdout

  • Web search quotas: By default, it uses DuckDuckGo (no API key), but hits rate limits fast if you spam. I swapped in SerpAPI (SERPAPI_KEY=xxx env var) — 100 free searches/mo, and it’s dramatically more reliable.

  • No built-in auth: It’s a research tool, not a SaaS. If you expose the API, slap nginx basic auth on it. Don’t skip this.
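A minimal sketch of that nginx layer, assuming the API listens on 127.0.0.1:8000 and you’ve created the credentials file with `htpasswd -cB /etc/nginx/.htpasswd youruser` (server name and paths are placeholders — and put TLS in front of this before it leaves your LAN, since basic auth is plaintext otherwise):

```nginx
server {
    listen 80;
    server_name research.example.com;

    location / {
        auth_basic           "deep-researcher";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
        proxy_read_timeout   480s;  # research queries run for minutes, not seconds
    }
}
```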

  • No persistent storage: Reports vanish after the process ends. I added a --save-to ./reports/ CLI flag that writes markdown + JSONL to disk. Trivial PR, but not merged yet.
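The persistence gap is easy to bolt on yourself. My local patch is roughly this shape — the function name and report/step types are mine, not upstream’s:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def save_report(report_md: str, steps: list[dict], out_dir: str = "./reports") -> Path:
    """Write the synthesized markdown plus the agent-step log next to it."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = Path(out_dir)
    base.mkdir(parents=True, exist_ok=True)
    md_path = base / f"{stamp}.md"
    md_path.write_text(report_md)
    # One JSON object per line, mirroring the tool's own ./logs/ format.
    with open(base / f"{stamp}.jsonl", "w") as fh:
        for step in steps:
            fh.write(json.dumps(step) + "\n")
    return md_path

path = save_report("# Report\n\nFindings...",
                   [{"agent": "planner", "action": "plan"}],
                   out_dir="./reports")
```

Timestamped filenames keep back-to-back runs from clobbering each other, which matters once you’re queuing queries against the API.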

So — is it worth deploying? Yes, if you value transparency over speed. It’s not ready for your CTO to demo to investors. But as a personal research copilot? It’s the most honest AI tool I’ve used in 2 years. It tells me when it’s guessing. It shows its sources. It fails loudly — and teaches you how to fix it.

The GitHub repo has just 89 stars, but the commit history tells the real story: 7 PRs merged in the last 14 days, all from independent contributors adding SerpAPI support, LM Studio compatibility, and a basic React frontend. That’s the signal — not the star count.

TL;DR:
✅ Self-hostable, transparent, citation-aware, truly agentic
❌ Not fast, no auth, PDF parsing needs love, CLI-first
💡 Run it on a $10/mo VM. Use llama3.1:8b. Start with --max-depth 2. Read the logs. Then go fix one thing and PR it.

That’s how good tools get built. Not with press releases — but with 89 people quietly running deep-researcher on their homelabs, watching it Beboppin’ through arXiv, and deciding — yeah, this is worth improving.