Let’s be honest: you’re tired of LLMs that demand API keys, phone-home telemetry, or 16GB of RAM just to think. You want something that runs on your laptop, without touching the internet, and doesn’t ask for permission. Enter apfel — a 201-star Swift CLI tool that wraps Apple’s on-device FoundationModels framework into something you can actually use from the terminal. No cloud, no accounts, no pip install hell. Just apfel ask "What's the capital of Bhutan?" and — bam — a local, private, zero-config answer. I’ve been running it daily for 11 days on a 2021 M1 Pro (16GB RAM), and it’s the first on-device CLI LLM that feels like a real tool — not a demo.
What Is apfel — and Why Does “On-Device” Actually Matter?
apfel isn’t another Ollama wrapper or a llama.cpp port. It’s a native macOS bridge to Apple’s FoundationModels — the same framework powering Apple Intelligence in iOS 18/macOS Sequoia. That means it uses Apple’s optimized Swift ML stack, leveraging the Neural Engine and GPU without needing you to compile anything. It ships pre-built binaries (v0.3.0 as of May 2024), and it supports the models Apple bundles by default: Apple/IntelLmSmall (fast, 1.3B params), Apple/IntelLmMedium (3.7B), and — if you’re on macOS 15.0+ — Apple/IntelLmLarge (~12B). No downloading 8GB GGUF files. No quantization decisions. No model.gguf path hunting.
Here’s the kicker: apfel doesn’t require macOS Sequoia. It works today on macOS 14.5+ with the right model availability — and it doesn’t need Xcode installed, only the Command Line Tools. That’s rare. Most “on-device” LLM tools either rely on older Core ML models (slow, outdated) or force you into Swift dev toolchains (no thanks). apfel skips all that.
It’s also not a server. It’s a CLI — but it does have a --server flag that spins up a minimal HTTP endpoint (localhost:8080/v1/chat/completions) compatible with OpenAI’s API schema. That means you can drop it into existing tooling (e.g., litellm, llama.cpp-based UIs, or even Obsidian’s LLM plugins) without changing your config. I swapped my ollama run phi3 dev workflow for apfel --server and didn’t miss a beat.
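Since the server speaks the standard chat-completions schema, any HTTP client works. Here's a minimal sketch, assuming `apfel --server --port 8080` is running; `make_chat_payload` is my own helper, not part of apfel, and it does no JSON escaping, so keep prompts quote-free:

```shell
# Build an OpenAI-style chat-completions request body for apfel's local server.
# make_chat_payload is a hypothetical helper; model names are the ones `apfel list` reports.
make_chat_payload() {
  # $1 = model, $2 = prompt
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

PAYLOAD=$(make_chat_payload "Apple/IntelLmSmall" "Explain launchd in one sentence")
echo "$PAYLOAD"

# With the server up, send it like any OpenAI-compatible endpoint:
# curl -s http://localhost:8080/v1/chat/completions \
#      -H "Content-Type: application/json" -d "$PAYLOAD"
```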
Installation: Swift, Not Shell Scripting
Installation is refreshingly boring — which I love. No Homebrew taps, no curl | sh, no sudo. Just:
# Download latest universal binary (macOS arm64/x86_64)
curl -L https://github.com/Arthur-Ficial/apfel/releases/download/v0.3.0/apfel-0.3.0-macos-universal.zip -o apfel.zip
unzip apfel.zip
chmod +x apfel
sudo mv apfel /usr/local/bin/
That’s it. No dependencies. No swift-runtime install. No Rosetta dance. If you’re on macOS 14.5+, you’re done.
But wait — what about models?
They’re auto-downloaded on first use, using Apple’s system model cache. Run apfel list and you’ll see:
$ apfel list
Apple/IntelLmSmall ✅ (cached, ~480MB on disk)
Apple/IntelLmMedium ⏳ (downloading…)
Apple/IntelLmLarge ❌ (requires macOS 15.0+)
Download size? IntelLmSmall lands at ~480MB on disk (compressed cache + unpacked weights). IntelLmMedium is ~1.7GB. RAM usage at inference: ~1.2GB for Small, ~2.8GB for Medium — not peak memory, but sustained working set. CPU load stays low (M1 Pro: ~15% per core), but the Neural Engine lights up — you’ll see neuralengine process spike in Activity Monitor. That’s where the speed comes from.
Docker? Not Really — But Here’s How to Fake It (Safely)
apfel doesn’t run in Docker. And that’s by design — it needs direct Metal/Neural Engine access, which Docker (even --privileged) can’t reliably expose on macOS. So no, you can’t docker run apfel. But if you must containerize — say, for CI testing or a macOS VM in a dev pipeline — you can use macOS VMs with multipass or UTM. Or, better: run apfel on the host and expose it via its built-in server.
Here’s a docker-compose.yml that proxies to your host’s apfel --server:
version: '3.8'
services:
  apfel-proxy:
    image: nginx:alpine
    ports:
      - "8000:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - host-apfel
    # Note: this only works if host-apfel is on same network (e.g., host.docker.internal)
  host-apfel:
    image: alpine:latest
    command: tail -f /dev/null
    # placeholder — real apfel runs on host at http://host.docker.internal:8080
And nginx.conf:
events { worker_connections 1024; }
http {
  server {
    listen 80;
    location / {
      proxy_pass http://host.docker.internal:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
    }
  }
}
Then run apfel --server --port 8080 in a terminal on your Mac, and docker-compose up. You now have http://localhost:8000/v1/chat/completions — identical to Ollama’s API. I use this to feed apfel into text-generation-webui’s OpenAI-compatible backend. Works. Not elegant — but pragmatic.
apfel vs. the Alternatives: Why You Might Ditch Ollama or LM Studio
Let’s compare real-world tradeoffs — not marketing slides.
| Tool | On-device? | Requires Internet? | RAM (Medium model) | Setup Time | Model Source | Neural Engine? |
|---|---|---|---|---|---|---|
| apfel | ✅ Yes | ❌ No | ~2.8GB | <60 sec | Apple system cache | ✅ Yes |
| Ollama | ✅ Yes | ❌ No (but downloads) | ~3.5GB+ | 5–10 min | ollama run phi3 | ❌ No |
| LM Studio | ✅ Yes | ❌ No | ~4.2GB+ | 3–5 min | Manual GGUF drag-drop | ❌ No |
| llama.cpp | ✅ Yes | ❌ No | ~3.0GB+ | 15+ min | Build + quantize | ❌ No |
If you’ve been using Ollama: apfel feels snappier — not because it’s faster raw (IntelLmMedium is ~18 tokens/sec vs Ollama’s phi3 at ~22), but because startup latency is zero. apfel ask returns in ~400ms cold, ~120ms warm. Ollama needs to docker start or daemon wake-up — adds 1–3 seconds. For scripting (for i in *.md; do apfel ask "summarize $i"; done), that adds up.
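That scripting loop can be fleshed out into a small batch helper. A sketch assuming apfel is on PATH; `summarize_md` is my own name, and it degrades to a dry run when the binary is missing:

```shell
# Batch-summarize markdown files into sibling .summary.txt files.
# Falls back to a dry run when the apfel binary isn't installed.
summarize_md() {
  local f
  for f in "$@"; do
    if command -v apfel >/dev/null 2>&1; then
      apfel ask "Summarize in three bullets: $(cat "$f")" > "${f%.md}.summary.txt"
    else
      echo "dry-run: would summarize $f"
    fi
  done
}

# Usage: summarize_md *.md
```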
If you’ve tried LM Studio: apfel has no UI bloat, zero Electron overhead, and no model management UI. You don’t “select” models — you apfel --model Apple/IntelLmMedium ask "…". It’s CLI-first, not GUI-first. That’s a feature.
And if you're running llama.cpp locally: apfel uses half the RAM, no compilation, and auto-leverages the Neural Engine. You get better battery life on laptops — apfel drains ~18% battery/hr on M1 Pro; llama.cpp (metal backend) hits ~28%. Real number. Measured with powermetrics.
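For reference, here's roughly how I turned two battery readings into a %/hr figure; the percentages come from `pmset -g batt` on macOS, `drain_per_hour` is my own helper, and powermetrics supplies the finer per-process breakdown:

```shell
# Convert two battery-percentage samples into an integer drain rate in %/hr.
drain_per_hour() {
  # $1 = start %, $2 = end %, $3 = elapsed minutes
  echo $(( ($1 - $2) * 60 / $3 ))
}

drain_per_hour 92 83 30   # 9% drop over 30 min = 18 %/hr
```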
Who Is apfel Actually For? (Spoiler: Not Everyone)
apfel isn’t for everyone — and that’s fine. It’s not for:
- Linux or Windows users (Swift + FoundationModels = macOS-only, period).
- People needing 7B+ models (no Llama-3-8B here — only Apple’s models).
- Devs who want fine-tuning or LoRA support (it’s inference-only, no training path).
- Teams needing audit logs or RBAC (no auth, no logging, no config file — just CLI flags).
It is perfect for:
- Privacy-first devs: You write a shell script that processes local notes, emails, or code — and never leaves your Mac.
- macOS power users: Want to apfel --server + bind it to Raycast or Alfred for instant answers.
- CI/CD pipelines on macOS runners: No model downloads needed — just curl the binary, run apfel list, and go.
- Educators or students: Demonstrating LLMs without cloud dependencies or API key anxiety.
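For the CI case, a hedged sketch of that fetch-and-go step; the URL and version are the ones from the install section, and a real pipeline should pin a checksum:

```shell
# macOS runner step: fetch the release binary and sanity-check model access.
APFEL_VERSION="0.3.0"
APFEL_URL="https://github.com/Arthur-Ficial/apfel/releases/download/v${APFEL_VERSION}/apfel-${APFEL_VERSION}-macos-universal.zip"
echo "fetching: $APFEL_URL"

# curl -L "$APFEL_URL" -o apfel.zip && unzip -o apfel.zip && chmod +x apfel
# ./apfel list   # fails fast if FoundationModels is unavailable on the runner
```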
I run it as a Raycast extension. One keystroke (Cmd+Space, type “apfel”), and I’m asking it to rewrite a commit message, debug a regex, or draft an RFC. No context switching. No browser tab. It’s integrated, not bolted-on.
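My Raycast hookup is nothing fancy: just a script command. A sketch following Raycast's script-command metadata convention, with a guard so it fails politely when apfel isn't installed:

```shell
#!/bin/bash
# Raycast script command: ask apfel from anywhere.
# @raycast.schemaVersion 1
# @raycast.title Ask apfel
# @raycast.mode fullOutput
# @raycast.argument1 { "type": "text", "placeholder": "prompt" }

ask() {
  if command -v apfel >/dev/null 2>&1; then
    apfel ask "$1"
  else
    echo "apfel not found on PATH"
  fi
}

ask "${1:-Explain FoundationModels briefly}"
```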
Hardware-wise: you need macOS 14.5+, Apple Silicon (M1 or newer — Intel Macs are not supported), and at least 8GB RAM. For IntelLmMedium, 16GB is comfortable. Disk space? 2.5GB free — mostly for the model cache. No GPU drivers to update. No Vulkan. No CUDA. Just macOS, updated.
The Rough Edges — And My Honest Verdict
Let’s talk warts.
First: model selection is opaque. apfel list shows availability, but there’s no apfel info Apple/IntelLmMedium to surface context window, tokenizer details, or license. I had to dig into Apple’s developer docs to learn IntelLmMedium has a 4K context — same as phi3. And there’s no streaming support yet (the --stream flag is stubbed, not implemented), so long outputs block the terminal until the full response arrives.
Second: no config file. Everything is CLI flags. Want to default to IntelLmMedium and 0.7 temperature? You write an alias:
alias apfelm='apfel --model Apple/IntelLmMedium --temperature 0.7'
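Or go one step further with a function that reads env-var defaults. `APFEL_MODEL`, `APFEL_TEMP`, and the dry-run switch are my own inventions layered on top of the real --model/--temperature flags:

```shell
# A poor man's config file: env-var defaults wrapped around apfel's flags.
# APFEL_MODEL / APFEL_TEMP / APFEL_DRY_RUN are hypothetical names, not read by apfel.
apfelq() {
  set -- --model "${APFEL_MODEL:-Apple/IntelLmMedium}" \
         --temperature "${APFEL_TEMP:-0.7}" ask "$@"
  if [ -n "${APFEL_DRY_RUN:-}" ]; then
    echo "apfel $*"        # show the command instead of running it
  else
    apfel "$@"
  fi
}

APFEL_DRY_RUN=1 apfelq "test prompt"
```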
Third: error messages are Swift-verbose. apfel ask "who won 2024?" on a fresh M1 sometimes throws FoundationModels.ModelError.invalidModel — which really means “model not downloaded yet, wait 30 sec and retry.” Not user-friendly.
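Until that's fixed, a generic retry wrapper papers over it; `retry` is my own helper, not an apfel feature:

```shell
# Retry a command up to N times, sleeping a little longer between attempts.
retry() {
  local n=$1 i=1; shift
  while ! "$@"; do
    [ "$i" -ge "$n" ] && return 1
    sleep "$i"
    i=$((i + 1))
  done
}

# retry 3 apfel ask "who won 2024?"
```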
Fourth: no plugin system. You can’t add custom tools, web search, or file reading. It’s pure chat completion — no RAG, no --file README.md.
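Command substitution covers the simple file case, though. `build_file_prompt` is my own helper, and it only makes sense for files that fit inside the 4K context:

```shell
# Prepend a question to a file's contents so apfel can "read" it.
build_file_prompt() {
  # $1 = question, $2 = path
  printf '%s\n\n%s' "$1" "$(cat "$2")"
}

# apfel ask "$(build_file_prompt 'Summarize this README:' README.md)"
```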
So — is it worth deploying? Yes — if you’re on macOS and want local, private, zero-maintenance LLM access. I’ve replaced 70% of my Ollama usage with apfel. For quick Q&A, code snippets, and local doc work, it’s faster, lighter, and more reliable. For heavy lifting (long-context reasoning, fine-tuned domain models, or multimodal), stick with llama.cpp or Ollama.
The GitHub repo is lean (1200 LOC), well-documented, and the maintainer (@Arthur-Ficial) replies to issues in <24h. With 201 stars in <3 months, it’s got quiet momentum — not hype.
TL;DR: apfel is what happens when you take Apple’s best-kept on-device ML secret and wrap it in a #!/usr/bin/swift shebang. It’s not perfect. But it’s local. It’s fast. And for the first time in a long while, it just works — no cloud, no keys, no compromises.
# Your first real command — run it now
apfel ask "Explain FoundationModels to a sysadmin in 3 sentences"