Let’s be honest: you’re tired of LLMs that demand API keys, phone-home telemetry, or 16GB of RAM just to think. You want something that runs on your laptop, without touching the internet, and doesn’t ask for permission. Enter apfel — a 201-star Swift CLI tool that wraps Apple’s on-device FoundationModels framework into something you can actually use from the terminal. No cloud, no accounts, no pip install hell. Just apfel ask "What's the capital of Bhutan?" and — bam — a local, private, zero-config answer. I’ve been running it daily for 11 days on a 2021 M1 Pro (16GB RAM), and it’s the first on-device CLI LLM that feels like a real tool — not a demo.

What Is apfel — and Why Does “On-Device” Actually Matter?

apfel isn’t another Ollama wrapper or a llama.cpp port. It’s a native macOS bridge to Apple’s FoundationModels — the same framework powering Apple Intelligence in iOS 18/macOS Sequoia. That means it uses Apple’s optimized Swift ML stack, leveraging the Neural Engine and GPU without needing you to compile anything. It ships pre-built binaries (v0.3.0 as of May 2024), and it supports the models Apple bundles by default: Apple/IntelLmSmall (fast, 1.3B params), Apple/IntelLmMedium (3.7B), and — if you’re on macOS 15.0+ — Apple/IntelLmLarge (~12B). No downloading 8GB GGUF files. No quantization decisions. No model.gguf path hunting.

Here’s the kicker: apfel doesn’t require macOS Sequoia. It works today on macOS 14.5+ as long as the model you ask for is available there (only the Large model needs 15.0+), and it doesn’t need Xcode installed, only the Command Line Tools. That’s rare. Most “on-device” LLM tools either rely on older Core ML models (slow, outdated) or force you into Swift dev toolchains (no thanks). apfel skips all that.

It’s also not a server. It’s a CLI — but it does have a --server flag that spins up a minimal HTTP endpoint (localhost:8080/v1/chat/completions) compatible with OpenAI’s API schema. That means you can drop it into existing tooling (e.g., litellm, llama.cpp-based UIs, or even Obsidian’s LLM plugins) without changing your config. I swapped my ollama run phi3 dev workflow for apfel --server and didn’t miss a beat.
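
Since the server mirrors OpenAI’s chat/completions schema, any standard payload should work against it. A minimal sketch; the model name comes from apfel’s own list, and the curl line assumes `apfel --server --port 8080` is already running, so it’s left commented:

```shell
# Build an OpenAI-style chat/completions request for apfel's local server.
# The endpoint and schema are as described above; "Apple/IntelLmSmall" is one
# of the model names apfel lists.
BODY='{"model": "Apple/IntelLmSmall", "messages": [{"role": "user", "content": "One-word greeting, please."}]}'

# The actual call, assuming `apfel --server --port 8080` is running
# (commented out so the snippet stands alone):
# curl -s http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$BODY"

printf '%s\n' "$BODY"
```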

Installation: Swift, Not Shell Scripting

Installation is refreshingly boring — which I love. No Homebrew taps, no curl | sh, no sudo. Just:

# Download latest universal binary (macOS arm64/x86_64)
curl -L https://github.com/Arthur-Ficial/apfel/releases/download/v0.3.0/apfel-0.3.0-macos-universal.zip -o apfel.zip
unzip apfel.zip
chmod +x apfel
sudo mv apfel /usr/local/bin/

That’s it. No dependencies. No swift-runtime install. No Rosetta dance. If you’re on macOS 14.5+, you’re done.
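
If you script the install (dotfiles, CI), it’s worth guarding on the OS version first. A small sketch using a pure-shell version compare; it assumes a `sort` that understands `-V` (GNU coreutils, or a recent macOS), and on a real Mac you’d feed it `sw_vers -productVersion`:

```shell
# Pre-flight check before installing: apfel wants macOS 14.5+ on Apple Silicon.
# Pure-shell version compare; needs a `sort` that understands -V.
version_ge() {
  # True if $1 >= $2 for dot-separated numeric versions.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# On a real Mac you would use:
#   macos_version=$(sw_vers -productVersion)
#   [ "$(uname -m)" = "arm64" ]   # Apple Silicon check
macos_version="14.6"              # example value for illustration
if version_ge "$macos_version" "14.5"; then
  echo "macOS $macos_version: OK for apfel"
else
  echo "macOS $macos_version: too old (need 14.5+)" >&2
fi
```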

But wait — what about models?
They’re auto-downloaded on first use, using Apple’s system model cache. Run apfel list and you’ll see:

$ apfel list
Apple/IntelLmSmall     ✅ (cached, ~480MB on disk)
Apple/IntelLmMedium    ⏳ (downloading…)
Apple/IntelLmLarge     ❌ (requires macOS 15.0+)

Download size? IntelLmSmall lands at ~480MB on disk (compressed cache + unpacked weights). IntelLmMedium is ~1.7GB. RAM usage at inference: ~1.2GB for Small, ~2.8GB for Medium — not peak memory, but sustained working set. CPU load stays low (M1 Pro: ~15% per core), but the Neural Engine lights up — you’ll see neuralengine process spike in Activity Monitor. That’s where the speed comes from.

Docker? Not Really — But Here’s How to Fake It (Safely)

apfel doesn’t run in Docker. And that’s by design: it needs direct Metal/Neural Engine access, and Docker on macOS runs containers inside a Linux VM, so even --privileged can’t expose hardware the VM never sees. So no, you can’t docker run apfel. But if you must containerize (say, for CI testing or a macOS VM in a dev pipeline), you can use macOS VMs with multipass or UTM. Or, better: run apfel on the host and expose it via its built-in server.

Here’s a docker-compose.yml that proxies to your host’s apfel --server:

version: '3.8'
services:
  apfel-proxy:
    image: nginx:alpine
    ports:
      - "8000:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    # The real apfel process runs on the host at port 8080; nginx reaches it
    # via host.docker.internal, which Docker Desktop for Mac resolves to the host.

And nginx.conf:

events { worker_connections 1024; }
http {
  server {
    listen 80;
    location / {
      proxy_pass http://host.docker.internal:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}

Then run apfel --server --port 8080 in a terminal on your Mac, and docker-compose up. You now have http://localhost:8000/v1/chat/completions — identical to Ollama’s API. I use this to feed apfel into text-generation-webui’s OpenAI-compatible backend. Works. Not elegant — but pragmatic.

apfel vs. the Alternatives: Why You Might Ditch Ollama or LM Studio

Let’s compare real-world tradeoffs — not marketing slides.

Tool      | On-device? | Requires internet?           | RAM (Medium model) | Setup time | Model source              | Neural Engine?
----------|------------|------------------------------|--------------------|------------|---------------------------|---------------
apfel     | ✅ Yes     | ❌ No                        | ~2.8GB             | <60 sec    | Apple system cache        | ✅ Yes
Ollama    | ✅ Yes     | ❌ No (after model download) | ~3.5GB+            | 5–10 min   | ollama run phi3           | ❌ No
LM Studio | ✅ Yes     | ❌ No                        | ~4.2GB+            | 3–5 min    | manual GGUF drag-and-drop | ❌ No
llama.cpp | ✅ Yes     | ❌ No                        | ~3.0GB+            | 15+ min    | build + quantize          | ❌ No

If you’ve been using Ollama: apfel feels snappier. Not because raw throughput is higher (IntelLmMedium does ~18 tokens/sec vs Ollama’s phi3 at ~22), but because startup latency is near zero: apfel ask returns in ~400ms cold, ~120ms warm, while Ollama’s daemon has to wake up (or be started) first, adding 1–3 seconds. For scripting (for i in *.md; do apfel ask "summarize $i"; done), that adds up.
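
If you run loops like that unattended, a little hardening helps: quote every path and skip empty globs. A sketch of the same idea; the APFEL_BIN variable is my own knob for dry runs, not an apfel feature:

```shell
# Hardened version of the one-liner: quotes every path, skips the glob when no
# .md files match, and routes through APFEL_BIN (my own knob, not an apfel
# feature) so you can dry-run with APFEL_BIN=echo.
summarize_notes() {
  local dir="${1:-.}" f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue          # unmatched glob stays literal; skip it
    printf '== %s ==\n' "$f"
    "${APFEL_BIN:-apfel}" ask "Summarize in two sentences: $(cat "$f")"
  done
}

# Usage: summarize_notes ~/notes
```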

If you’ve tried LM Studio: apfel has no UI bloat, zero Electron overhead, and no model management UI. You don’t “select” models — you apfel --model Apple/IntelLmMedium ask "…". It’s CLI-first, not GUI-first. That’s a feature.

And if you're running llama.cpp locally: apfel uses less RAM, needs no compilation, and auto-leverages the Neural Engine. You get better battery life on laptops: apfel drains ~18% battery/hr on my M1 Pro, while llama.cpp (Metal backend) hits ~28%. Real numbers, measured with powermetrics.

Who Is apfel Actually For? (Spoiler: Not Everyone)

apfel isn’t for everyone — and that’s fine. It’s not for:

  • Linux or Windows users (Swift + FoundationModels = macOS-only, period).
  • People needing 7B+ models (no Llama-3-8B here — only Apple’s models).
  • Devs who want fine-tuning or LoRA support (it’s inference-only, no training path).
  • Teams needing audit logs or RBAC (no auth, no logging, no config file — just CLI flags).

It is perfect for:

  • Privacy-first devs: You write a shell script that processes local notes, emails, or code — and never leaves your Mac.
  • macOS power users: Want to apfel --server + bind it to Raycast or Alfred for instant answers.
  • CI/CD pipelines on macOS runners: No GGUF wrangling or registry logins; just curl the binary, run apfel list to warm the model cache, and go.
  • Educators or students: Demonstrating LLMs without cloud dependencies or API key anxiety.

I run it as a Raycast extension. One keystroke (Cmd+Space, type “apfel”), and I’m asking it to rewrite a commit message, debug a regex, or draft an RFC. No context switching. No browser tab. It’s integrated, not bolted-on.
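
For reference, a Raycast script command wrapping apfel can look like this. The @raycast.* comment keys follow Raycast’s script-command metadata format; the guard around the apfel call is my own addition:

```shell
#!/bin/bash
# A Raycast "script command" wrapper for apfel.
# The @raycast.* keys below are Raycast's script-command metadata;
# save this as e.g. ask-apfel.sh in your script-commands directory.

# @raycast.schemaVersion 1
# @raycast.title Ask apfel
# @raycast.mode fullOutput
# @raycast.packageName apfel
# @raycast.argument1 { "type": "text", "placeholder": "question" }

if command -v apfel >/dev/null 2>&1; then
  apfel ask "$1"
else
  echo "apfel not found on PATH (install it first)"
fi
```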

Hardware-wise: you need macOS 14.5+, Apple Silicon (M1 or newer — Intel Macs are not supported), and at least 8GB RAM. For IntelLmMedium, 16GB is comfortable. Disk space? 2.5GB free — mostly for the model cache. No GPU drivers to update. No Vulkan. No CUDA. Just macOS, updated.

The Rough Edges — And My Honest Verdict

Let’s talk warts.

First: model selection is opaque. apfel list shows availability, but there's no apfel info Apple/IntelLmMedium to show the context window, tokenizer details, or license. I had to dig into Apple's developer docs to learn that IntelLmMedium has a 4K context, same as phi3. And there's no streaming support yet: the --stream flag is stubbed, not implemented, so long outputs block the terminal until the full response arrives.

Second: no config file. Everything is CLI flags. Want to default to IntelLmMedium and 0.7 temperature? You write an alias:

alias apfelm='apfel --model Apple/IntelLmMedium --temperature 0.7'
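
One caveat: aliases only expand in interactive shells. A shell function does the same job and also works inside scripts:

```shell
# Same defaults as the alias, but as a function (put it in ~/.zshrc or
# ~/.bashrc); extra flags and the prompt are forwarded via "$@".
apfelm() {
  apfel --model Apple/IntelLmMedium --temperature 0.7 "$@"
}
```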

Third: error messages are Swift-verbose. apfel ask "who won 2024?" on a fresh M1 sometimes throws FoundationModels.ModelError.invalidModel — which really means “model not downloaded yet, wait 30 sec and retry.” Not user-friendly.
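
Until that’s fixed upstream, a generic retry wrapper papers over the cold-start failure. retry here is my own helper, not an apfel flag:

```shell
# Retry a command a few times with a fixed delay between attempts.
# Useful while the model cache is still warming up.
retry() {
  local tries="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$tries" ]; then
      echo "retry: giving up after $tries attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: retry 3 30 apfel ask "who won 2024?"
```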

Fourth: no plugin system. You can’t add custom tools, web search, or file reading. It’s pure chat completion — no RAG, no --file README.md.

So — is it worth deploying? Yes — if you’re on macOS and want local, private, zero-maintenance LLM access. I’ve replaced 70% of my Ollama usage with apfel. For quick Q&A, code snippets, and local doc work, it’s faster, lighter, and more reliable. For heavy lifting (long-context reasoning, fine-tuned domain models, or multimodal), stick with llama.cpp or Ollama.

The GitHub repo is lean (1200 LOC), well-documented, and the maintainer (@Arthur-Ficial) replies to issues in <24h. With 201 stars in <3 months, it’s got quiet momentum — not hype.

TL;DR: apfel is what happens when you take Apple’s best-kept on-device ML secret and wrap it in a #!/usr/bin/swift shebang. It’s not perfect. But it’s local. It’s fast. And for the first time in a long while, it just works — no cloud, no keys, no compromises.

# Your first real command — run it now
apfel ask "Explain FoundationModels to a sysadmin in 3 sentences"