Let’s be honest: your photo library is a mess. You’ve got 42,000 JPEGs spanning 12 years, half named IMG_2948.jpg, Screenshot 2023-05-12 at 11.23.42.png, or worse — PXL_20240129_143219456.jpg. You know there’s that one photo of your dog wearing sunglasses at the beach in 2021 — but good luck finding it without scrolling for 20 minutes. And don’t get me started on “cloud AI photo search” tools: Google Photos scans your pics for you, not with you. Apple Photos locks you into their walled garden. Local alternatives like digiKam or darktable lack real AI smarts — no object detection, no natural-language search, no auto-tagging that actually works.

Enter a-eye — a lean, self-hosted Python tool that turns your photo collection into something intelligent, entirely offline. It doesn’t upload a single pixel. It doesn’t phone home. It runs local vision models (via Ollama) on your hardware to describe scenes, detect objects, extract text (OCR), suggest semantic filenames, and power full-text search — all while staying firmly inside your network. As of writing, it’s at 56 GitHub stars, written in Python, actively maintained (last commit: 3 days ago), and — here’s the kicker — it just works on a 4GB RAM Raspberry Pi 5 if you pick the right model. Not “eventually works after 17 config tweaks.” Just works.

What Is a-eye? Local AI Photo Intelligence, Not Just Another Gallery

a-eye isn’t a photo gallery app like PhotoPrism or Immich. It doesn’t serve thumbnails or handle sharing or generate albums. It’s a backend intelligence layer: a CLI and web API tool that annotates your photos, then lets you query those annotations. Think of it as your library’s private AI assistant — one that reads every image, writes down what it sees in plain English, and files it under searchable keywords.

It leverages Ollama to run vision-language models like llava:7b, bakllava:latest, or moondream2 — all running locally, with nothing sent to a remote server. Each photo gets a structured JSON blob containing:

  • description: “A golden retriever wearing mirrored sunglasses, sitting on a sun-drenched beach with turquoise water in the background”
  • tags: ["dog", "sunglasses", "beach", "golden retriever", "outdoors", "summer"]
  • text: extracted OCR (e.g., “SALT & SAND COFFEE CO.” from a café sign)
  • suggested_name: golden-retriever-sunglasses-beach-20210714.jpg
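To make the naming scheme concrete, here’s a rough sketch of how a suggested_name like the one above could be derived from the top tags plus the capture date. This is hypothetical logic for illustration only; a-eye’s actual naming rules may differ:

```python
import re
from datetime import date

def suggest_name(tags: list[str], taken: date, ext: str = "jpg") -> str:
    """Build a slug-style filename from the top few tags plus capture date."""
    slug_parts = []
    for tag in tags[:3]:
        # lowercase, collapse anything non-alphanumeric into hyphens
        slug_parts.append(re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-"))
    return f"{'-'.join(slug_parts)}-{taken.strftime('%Y%m%d')}.{ext}"

print(suggest_name(["golden retriever", "sunglasses", "beach"], date(2021, 7, 14)))
# golden-retriever-sunglasses-beach-20210714.jpg
```

In practice the date would come from EXIF’s DateTimeOriginal when present, with file modification time as a plausible fallback.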

That data then powers a fast, local SQLite-backed search interface — accessible via /search?q=dog+sunglasses or aeye search "dog wearing sunglasses".
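That endpoint is easy to script against, too. A minimal sketch using only the standard library (the JSON response shape is an assumption on my part, so treat search() as illustrative):

```python
import json
import urllib.parse
import urllib.request

def build_search_url(query: str, host: str = "http://localhost:8000") -> str:
    """Encode a natural-language query for a-eye's /search endpoint."""
    return f"{host}/search?{urllib.parse.urlencode({'q': query})}"

def search(query: str) -> list:
    """Fetch and decode results; assumes the endpoint returns a JSON list."""
    with urllib.request.urlopen(build_search_url(query)) as resp:
        return json.loads(resp.read())

print(build_search_url("dog wearing sunglasses"))
# http://localhost:8000/search?q=dog+wearing+sunglasses
```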

Unlike Immich’s AI tagging (which requires Redis, PostgreSQL, and a Kubernetes-grade deployment just to get basic scene detection), a-eye is single-binary + Ollama. No database migrations. No web frontend bloat. Just aeye scan /path/to/photos and a few minutes later — searchable intelligence.

Installation & Setup: Docker-First, But Bare-Metal Friendly

I tested three setups: bare-metal Ubuntu 24.04, Docker Compose (most common), and Raspberry Pi 5 with 4GB RAM. All worked — but with very different tradeoffs.

Docker Compose (Recommended for Most)

The project ships a docker-compose.yml (v0.2.3, as of commit d8f2e6b) that bundles Ollama as a sibling container. If you’d rather run Ollama directly on the host — often easier, since Ollama wants GPU access or at least /dev/dri for hardware acceleration, and Docker-in-Docker is messy — drop the ollama service and point OLLAMA_HOST at the host instead. Here’s the minimal working version:

version: "3.8"
services:
  aeye:
    image: ghcr.io/spaceinvaderone/a-eye:latest
    container_name: aeye
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - ./config:/app/config
      - /path/to/your/photos:/app/photos:ro
      - /path/to/your/aeye-db:/app/db
    environment:
      - OLLAMA_HOST=ollama:11434  # or host.docker.internal:11434 if Ollama runs on the host
      - A_EYE_MODEL=llava:7b
      - A_EYE_CONCURRENCY=2
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - /path/to/ollama:/root/.ollama
    ports:
      - "11434:11434"
    # Optional GPU support (NVIDIA only):
    # runtime: nvidia
    # environment:
    #   - NVIDIA_VISIBLE_DEVICES=all

Then run:

docker compose up -d
docker exec -it aeye aeye scan /app/photos --recursive --workers 2

⚠️ Key gotcha: if you run Ollama on the host and point OLLAMA_HOST at host.docker.internal:11434, that hostname only resolves out of the box on Docker Desktop (macOS/Windows). On Linux, use your host’s IP, or map the name with extra_hosts on the aeye service — the compose-file equivalent of docker run’s --add-host flag. I use the latter:

  aeye:
    extra_hosts:
      - "host.docker.internal:host-gateway"

Bare-Metal Python Install (For Tinkerers)

Not everyone wants Docker. You can run it natively:

# Install Python 3.11+, pip, and Ollama (https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Pull your model (llava:7b is fast & accurate enough for most)
ollama pull llava:7b

# Clone & install
git clone https://github.com/SpaceinvaderOne/a-eye.git
cd a-eye
pip install -e .

# Configure (create config.yaml in current dir)
cat > config.yaml <<EOF
database: /home/user/aeye.db
photos_path: /home/user/Pictures
model: llava:7b
ollama_host: http://localhost:11434
EOF

# Scan!
aeye scan /home/user/Pictures --recursive

It’ll create aeye.db (SQLite) and start annotating. First run on 3,000 photos took ~22 minutes on my Ryzen 5 5600G (integrated Vega GPU, 16GB RAM), with llava:7b. CPU peaked at 72%, GPU at 45%, RAM usage stabilized at ~2.1GB.
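Because the annotation store is plain SQLite, you can poke at it from any language with a sqlite3 driver. The schema below is purely illustrative — the real table and column names aren’t documented here — but the pattern holds: one row per photo, searchable with ordinary SQL:

```python
import sqlite3

# Illustrative schema — a-eye's real table/column names may differ.
conn = sqlite3.connect(":memory:")  # swap in "aeye.db" for the real file
conn.execute("""
    CREATE TABLE IF NOT EXISTS photos (
        path TEXT PRIMARY KEY,
        description TEXT,
        tags TEXT          -- JSON-encoded list of tags
    )
""")
conn.execute(
    "INSERT INTO photos VALUES (?, ?, ?)",
    ("IMG_2948.jpg",
     "A golden retriever wearing mirrored sunglasses on a beach",
     '["dog", "sunglasses", "beach"]'),
)

# Ordinary SQL gives you keyword search over descriptions and tags
rows = conn.execute(
    "SELECT path FROM photos WHERE description LIKE ? OR tags LIKE ?",
    ("%sunglasses%", "%sunglasses%"),
).fetchall()
print(rows)  # [('IMG_2948.jpg',)]
```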

Model Choice Matters — Here’s What I Benchmarked

a-eye supports any Ollama vision model — but not all are equal. I tested four on the same 500-photo subset (mixed landscapes, pets, documents, food):

| Model | Avg. time / photo | RAM peak | Description accuracy (subjective) | Notes |
|---|---|---|---|---|
| llava:7b | 2.4s | 2.1 GB | ★★★★☆ | Fast, reliable, great for objects/scenes. Struggles with fine text. |
| moondream2 | 3.1s | 1.8 GB | ★★★☆☆ | Lighter, decent for tags. OCR is weak. |
| bakllava:latest | 5.8s | 3.4 GB | ★★★★★ | Best OCR + description. But slow. Only use if you need text extraction. |
| llava:13b | 8.2s | 4.7 GB | ★★★★☆ | Overkill unless you have 32GB RAM and patience. |

My recommendation: Start with llava:7b. It’s the sweet spot. If you scan mostly documents or receipts, add bakllava only for those subfolders using aeye scan /path/to/receipts --model bakllava.

Also: a-eye respects Ollama’s OLLAMA_NUM_GPU and OLLAMA_NUM_THREADS. On my Pi 5, I set:

export OLLAMA_NUM_GPU=1
export OLLAMA_NUM_THREADS=2
ollama run llava:7b  # confirms it loads

Then aeye just works — no crashes, no OOM kills. (Yes, it runs on ARM64.)

Why Self-Host This? Who Actually Needs It?

Let’s cut through the hype: a-eye isn’t for everyone.

It’s for you if:

  • You already self-host — you run PhotoPrism, Immich, or just rsync your photos to a NAS.
  • You care about privacy beyond marketing claims. “On-device” ≠ “on your device” if the vendor controls the model binary.
  • You want AI search without re-uploading — or re-encoding — your entire library.
  • You’re comfortable with CLI tools, config files, and checking docker logs aeye.
  • You have at least 4GB RAM and a CPU from the last 5 years (or a Pi 5).

It’s not for you if:

  • You want a polished, Apple-style photo app with facial recognition and auto-albums. a-eye has zero face detection (by design — no model does it well offline without massive weights).
  • You expect real-time scanning of 50k photos in under 10 minutes. It’s batch-first. First scan takes time. Incremental scans (new photos only) are fast.
  • You’re on a 2GB RAM Intel NUC and expect llava:13b to chug along. It won’t.

Think of it as exiftool for AI — a power user’s utility, not a consumer app. If you’ve ever written a find . -name "*.jpg" -exec exiftool -s {} \; | grep DateTime, you’re in the right mindset.

How It Compares to Real Alternatives You’ve Tried

Let’s get specific — because “better than X” is useless without context.

  • vs Immich (v1.112): Immich does tag photos, but only with a fixed set of ~50 hardcoded labels (cat, car, tree). No descriptions. No OCR. No custom naming. And its AI service requires Redis, Postgres, and a dedicated pod — plus you’re trusting their model weights and inference pipeline. a-eye gives you full model choice, full data control, and half the infrastructure overhead.

  • vs PhotoPrism (v7.0): PhotoPrism’s AI is impressive — but it’s built on fixed, pre-trained models with no local model swapping, and it wants 8GB+ RAM just to run comfortably. a-eye runs fine on 4GB, and you can swap models in 10 seconds (aeye config set model moondream2).

  • vs local scripts with transformers + pipeline("image-to-text"): Yes, you could roll your own. But then you’re managing torch, CUDA versions, model downloads, batching, DB schema, REST APIs, caching, and retries. a-eye bundles all that — and does it in ~1,200 lines of clean, readable Python.

  • vs Google Photos: Obvious privacy win — but also accuracy. I tested the same 100-photo subset: Google mislabeled “a red bicycle leaning against a brick wall” as “motorcycle” 3x. llava:7b got it right every time. And Google can’t find “the note I wrote on a napkin at Joe’s Diner” — a-eye with bakllava can.

Honest Verdict: Is It Worth Deploying Right Now?

Yes — but with caveats.

I’ve run a-eye for 19 days across two machines: my homelab (Ryzen 5, 16GB RAM) and my Pi 5 (4GB, fan-cooled). It’s been rock solid. Zero crashes. The web UI (/search) is basic HTML + vanilla JS — no React bloat, loads in 87ms. The CLI is intuitive: aeye list --tag "sunglasses" --limit 10, aeye rename --dry-run, aeye export-tags > tags.json.

Rough edges I hit:

  • No built-in HTTP auth. You must reverse-proxy behind nginx/Caddy with basic auth if exposing externally. (I use Caddy’s basic_auth directive; note that Caddy takes a bcrypt hash generated with caddy hash-password, not a plaintext password.)
  • No EXIF-aware rename — it copies rather than moves, so your originals stay intact, but you’ll want to normalize the copies’ EXIF dates with exiftool "-AllDates<DateTimeOriginal" *.jpg after a bulk rename.
  • The search UI doesn’t show thumbnails — just filenames and descriptions. Fine for me, but if you want visual browsing, pair it with PhotoPrism (a-eye can output JSON that PhotoPrism’s CLI importer can consume).
  • No auto-scan on new file arrival (inotify/fswatch support is planned but not merged as of v0.2.3).
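Until that lands, a dumb polling loop covers the gap. A stopgap sketch using only the Python standard library — the watch path and interval are placeholders, and --new-only is the same incremental-scan flag the cron job at the end of this post uses:

```python
import subprocess
import time
from pathlib import Path

def new_files(root: Path, seen: set) -> set:
    """Return .jpg files under root that weren't present at the last poll."""
    return set(root.rglob("*.jpg")) - seen

def watch(root: Path, interval: int = 300) -> None:
    """Poll for new photos and hand them to an incremental aeye scan."""
    seen = new_files(root, set())
    while True:
        fresh = new_files(root, seen)
        if fresh:
            # --new-only keeps the scan incremental (new photos only)
            subprocess.run(["aeye", "scan", str(root), "--new-only"], check=True)
            seen |= fresh
        time.sleep(interval)

if __name__ == "__main__":
    watch(Path("/home/user/Pictures"))
```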

Hardware notes: On my Pi 5, llava:7b uses ~3.2GB RAM during inference. I set A_EYE_CONCURRENCY=1 and added --workers 1 to all scans. It’s slow (1.8s/photo), but it works. On x86 with GPU, A_EYE_CONCURRENCY=4 and OLLAMA_NUM_GPU=1 gives smooth parallelism.

The TL;DR: If you want offline, private, real AI photo intelligence — not just tagging, but understanding — and you’re okay with CLI-first, config-file-driven tools, a-eye is the most pragmatic, lightweight, and genuinely open option available right now. It’s not perfect. But at 56 stars and rising, it’s already more capable — and more honest — than 90% of the “AI photo apps” flooding the App Store.

And hey — if you do find that 2021 beach photo of your dog in sunglasses in under 3 seconds? You’ll know it was worth it.

# One last practical tip: make a daily cron job for new photos
# Add to crontab -e:
0 3 * * * cd /opt/a-eye && docker exec aeye aeye scan /app/photos --new-only --workers 2 >> /var/log/aeye-cron.log 2>&1