Let’s be honest: you’re tired of juggling API keys across half a dozen LLM providers, manually tracking spend in spreadsheets, and praying your internal RAG app doesn’t leak OPENAI_API_KEY to a misconfigured Next.js build. You need a secure, auditable, centralized AI gateway — not another “AI proxy” that just slaps NGINX in front of /v1/chat/completions. Enter ThinkWatch: a Rust-built, enterprise-grade AI bastion host that’s quietly gaining traction (190+ GitHub stars as of May 2024) and actually delivers on its bold claim: “Secure API and MCP access, unified proxying, RBAC, audit logs, rate limiting, and cost tracking — across OpenAI, Anthropic, Gemini, and self-hosted LLMs.”
This isn’t just another llm-proxy crate. ThinkWatch is built like a zero-trust API gateway — hardened, minimal, and opinionated — and it’s the first tool I’ve seen that treats AI traffic like production infrastructure, not a sidecar experiment.
What Is ThinkWatch — And Why Does It Exist?
ThinkWatch isn’t a model host, a UI, or a LangChain wrapper. It’s a secure AI API bastion host: a hardened, single-binary reverse proxy that sits between your internal apps and external (or self-hosted) LLM endpoints. Think of it as the nginx + auth0 + prometheus + cloudwatch combo — but purpose-built for AI.
Its core value isn’t “making API calls easier.” It’s enforcing policy:
- Your junior dev can’t accidentally burn $2k on
gpt-4oby leaving a debug flag on in staging - Your compliance officer can pull a CSV of every
/messagescall made byteam-marketing, including prompt tokens, response tokens, provider, model, and cost (yes — actual dollar cost, inferred from provider pricing tables) - Your SRE team can set
rate_limit = "100 requests/hour"foranthropic.claude-3-5-sonnet-20240620per service account, not per IP - Your security team can rotate one
THINKWATCH_MASTER_KEY, not 17.envfiles scattered across repos
ThinkWatch v0.4.2 (latest stable as of May 2024) supports OpenAI v1, Anthropic v1, Google Gemini (via google.generativeai REST), and any OpenAI-compatible self-hosted LLM (e.g., Ollama, vLLM, TGI, llama.cpp). It also supports MCP (Model Context Protocol) — yes, that emerging standard — for structured tool calling. That’s rare. Most proxies ignore MCP entirely.
How ThinkWatch Compares to Alternatives
If you’ve tried other tools, you’ve probably hit these pain points:
llama.cpp+llama-server: Great for local inference, zero auth, zero logging, zero cost tracking. You’re on your own for RBAC.text-generation-inference(TGI): Powerful, but built for one model, no multi-provider proxying, and RBAC is bolted on via external auth (e.g., Keycloak).Ollama: Developer-friendly, but no enterprise auth, no audit logs, no rate limiting beyond basic--num-ctx, and no cost visibility.- Plexus / LlamaIndex Gateway: More ML-framework than infrastructure — built for agents, not security. No native RBAC or billing.
- NGINX + Lua scripts: Possible, but you’re writing and maintaining auth logic, token parsing, logging schema, rate-limiting counters, and cost math — all in Lua. No thanks.
ThinkWatch avoids this sprawl. It’s not a model server — it proxies. It’s not a framework — it’s a binary you deploy and forget (mostly). It ships with SQLite (default) or PostgreSQL for persistence, Rust-level memory safety, and a minimal attack surface (<15MB binary, no libc dependency when built with musl).
Here’s the kicker: ThinkWatch’s config is declarative and granular, not “set and pray.” You define:
providers: with API keys encrypted at rest (viaageoropenssl)services: mapping internal service IDs (marketing-llm-client) to provider/model combospolicies: RBAC rules (role: analyst→allow: read on /v1/chat/completions)quotas: per-service rate limits and spend caps (max_spend_usd_per_month = 450.0)
That’s not marketing fluff. That’s in config.yaml.
Installation & Docker-First Deployment
ThinkWatch is built for production — but it’s also delightfully simple to test locally. I spun it up on my M2 Mac (16GB RAM) in <90 seconds. For production, I run it on a t3.xlarge (4 vCPU / 16GB RAM) — overkill, but safe.
Quick Start (Docker Compose)
Use the official docker-compose.yml from the repo (v0.4.2):
# docker-compose.yml
version: '3.8'
services:
thinkwatch:
image: ghcr.io/thinkwatchproject/thinkwatch:0.4.2
ports:
- "8080:8080"
environment:
- THINKWATCH_CONFIG_PATH=/etc/thinkwatch/config.yaml
- THINKWATCH_LOG_LEVEL=info
volumes:
- ./config.yaml:/etc/thinkwatch/config.yaml:ro
- ./secrets:/run/secrets:ro
restart: unless-stopped
You’ll need a config.yaml. Here’s a minimal working version (with real values I use in staging):
# config.yaml
server:
bind_addr: "0.0.0.0:8080"
tls: null # disable TLS for local dev; enable in prod with cert paths
database:
type: "sqlite"
path: "/data/thinkwatch.db"
providers:
- id: "openai-prod"
type: "openai"
base_url: "https://api.openai.com/v1"
api_key: "age1...encrypted-key" # use `age` to encrypt: `age -r $(cat key.txt) -a < key.txt | age -r $(cat key.txt) -a > secret.age`
- id: "ollama-local"
type: "openai"
base_url: "http://host.docker.internal:11434/v1"
api_key: "ollama" # dummy key — Ollama ignores it
services:
- id: "marketing-rag"
provider_id: "openai-prod"
model: "gpt-4o"
allowed_paths: ["/v1/chat/completions"]
- id: "internal-qa"
provider_id: "ollama-local"
model: "llama3:8b"
allowed_paths: ["/v1/chat/completions"]
policies:
- role: "dev"
permissions:
- action: "read"
resource: "/v1/chat/completions"
service_id: "marketing-rag"
- role: "analyst"
permissions:
- action: "read"
resource: "/v1/chat/completions"
service_id: "internal-qa"
quotas:
- service_id: "marketing-rag"
rate_limit: "1000/hour"
max_spend_usd_per_month: 1200.0
Then run:
# First, generate an age keypair (do this once)
age-keygen > key.txt
# Encrypt your OpenAI key (replace YOUR_KEY)
echo "sk-abc123..." | age -r $(cat key.txt) > secrets/openai_key.age
# Spin it up
docker compose up -d
That’s it. Your apps now hit http://localhost:8080/v1/chat/completions with an X-ThinkWatch-Service-ID: marketing-rag header — and ThinkWatch handles auth, routing, logging, and cost attribution.
Why Self-Host ThinkWatch? (Who Is This Actually For?)
Let’s cut the enterprise buzzword bingo. ThinkWatch isn’t for:
- Solo devs doing local
ollama run mistral - Teams using only one LLM, with no compliance or spend concerns
- Companies already running Kong + Auth0 + Datadog + custom billing pipelines
It is for:
- AI platform teams building internal LLM platforms for 50+ engineers — who need to enforce usage policies before someone fine-tunes
claude-3-opuson prod data - FinTech / HealthTech startups where “we log all API calls” isn’t a nice-to-have — it’s an audit requirement (SOC2, HIPAA)
- ML Ops teams tired of writing custom middleware in FastAPI just to add
X-Request-ID+X-Model-Costheaders - Self-hosting purists who want zero vendor lock-in and zero blind spots in their AI stack
Hardware-wise: it’s lightweight. On my test instance (Rust release build, SQLite backend), ThinkWatch uses ~85MB RAM idle, peaks at ~220MB under 50 RPS. CPU stays under 0.3 cores. No GPU needed — it’s a proxy, not a model runner. Disk usage? 20MB for the binary + ~100MB/month for logs + audit DB (with 10k requests/day). You can run it on a $5/month DO droplet if you’re not under heavy load.
Audit Logs, Cost Tracking, and That MCP Thing
ThinkWatch’s audit log isn’t just “request timestamp + status.” It’s structured, queryable, and enriched. Every log entry (SQLite or PostgreSQL) includes:
service_id,provider_id,modelprompt_tokens,completion_tokens,total_tokenscost_usd(calculated using built-in pricing tables — e.g.,gpt-4oinput: $5.00/million tokens)user_agent,client_ip,X-Request-IDpolicy_matched,quota_remaining,rate_limit_remaining
You can query it directly:
SELECT service_id, model, SUM(cost_usd) as monthly_spend
FROM audit_logs
WHERE created_at >= '2024-05-01'
GROUP BY service_id, model
ORDER BY monthly_spend DESC;
And yes — it supports MCP. As of v0.4.2, ThinkWatch transparently proxies POST /v1/mcp/ (and GET /v1/mcp/tools) to Anthropic and OpenAI endpoints that support it, and rewrites tool calls for self-hosted LLMs that don’t — injecting mcp:// tool URIs into the system prompt. This is huge if you’re building tool-using agents and want to avoid vendor-specific tool schemas.
The Rough Edges (My Honest Take)
I’ve run ThinkWatch in staging for 17 days. Here’s what’s solid — and where it stings:
✅ Rust is real: Zero crashes, memory leaks, or weird segfaults. The binary just works.
✅ RBAC is actually usable: Unlike some “RBAC” implementations that just check a header, ThinkWatch validates against the full policy tree, including path + method + service ID.
✅ Cost tracking is shockingly accurate: I compared its cost_usd against raw OpenAI billing export — matched within $0.02 over 3 days.
✅ Docker image is lean: ghcr.io/thinkwatchproject/thinkwatch:0.4.2 is 48MB — smaller than most Python-based proxies.
❌ Docs are sparse: The GitHub README is good for setup, but config.yaml schema docs? Buried in docs/config.md. Took me 20 minutes to find how to configure PostgreSQL.
❌ No built-in dashboard: You get /health, /metrics (Prometheus), and raw DB access — but no Grafana dashboards or web UI for logs. You will need to build that (or use sqlite-web).
❌ MCP support is new: Only Anthropic + OpenAI endpoints work out-of-the-box. Gemini’s MCP support is “coming soon” (per issue #42).
❌ No OAuth2 / SAML out-of-the-box: It supports API key auth and JWT, but you’ll need to wire up your IdP manually (e.g., via jwks_uri in config).
Is it worth deploying today? Yes — if you need production-grade AI governance and are willing to shell out 2–3 hours to wire up logging + auth. It’s not “install and go” like Ollama, but it’s far more mature than 90% of the “AI gateway” crates on crates.io.
The project is young (first commit: Dec 2023), but the maintainer is responsive (3 PRs merged in last week), the issue tracker is well-organized, and the Rust codebase is clean — not “Rust because it’s trendy,” but Rust because it matters here.
TL;DR: Should You Deploy ThinkWatch?
- Do it if: You’re building an internal AI platform, care about cost control, need audit trails, and want to avoid stitching together 5 different tools.
- Skip it if: You just want a quick local proxy, or you’re not comfortable reading Rust docs or writing a
config.yamlwith nested structs. - Watch closely if: You use MCP or need SAML — those features are rolling out fast (v0.5 roadmap includes IdP integrations and a basic web UI).
ThinkWatch isn’t perfect. But it’s the first AI bastion host that feels like it was designed, not assembled. And in a world of half-baked LLM proxies, that’s rare — and worth your time.
I’m keeping it in staging. Next week, it hits prod. Let’s see how long it takes before someone tries to curl it without X-ThinkWatch-Service-ID. (Spoiler: the audit log will know.)
Comments