Let’s be honest: you’ve stared at a blank SKILL.md file more times than you’d admit. You spent 45 minutes writing a prompt that almost got your LLM to scaffold a microservice the right way—only for it to generate a Dockerfile with apt-get update && apt-get install -y python3-pip inside the container, then crash on pip install -r requirements.txt because pip wasn’t in $PATH. You’re not failing at coding. You’re failing at orchestrating intent. That’s why Spec-Driven Develop (461 stars on GitHub as of May 2024, shell-based, zero dependencies beyond bash and curl) hit me like a cold splash of water: it’s not another framework—it’s a teaching artifact. A single, human-readable, version-controlled .md file that trains any AI coding agent (Cursor, Continue.dev, o1-preview, even local Ollama models) how to interpret your spec, decompose scope, validate constraints, and generate correct-by-construction scaffolds—before a single line of production code exists.
What Is Spec-Driven Develop — And Why It’s Not Another AI DevTool
Spec-Driven Develop (SDD) is a spec-driven development methodology encoded as a skill, not software. The entire “project” is one file: SKILL.md. That’s it. No CLI, no daemon, no Python virtual env. It’s a protocol—a strict 7-step loop your AI agent follows when handed a SPEC.md:
- Parse spec goals, constraints, and non-goals
- Identify all required services, data flows, and external integrations
- Validate feasibility (e.g., “must run on ARM64” + “uses CUDA” → ❌)
- Generate interface-first contracts (OpenAPI, protobuf, message schemas)
- Scaffold only what’s needed:
docker-compose.yml,Makefile,.gitignore,README.mdboilerplate - Output validation checklist (e.g., “✅ all env vars declared in
.env.example; ❌ no healthcheck inDockerfile”) - Gate next steps behind human sign-off (no auto-commit, no auto-push)
That’s the kicker: SDD refuses to generate code until interfaces are locked. Unlike GitHub Copilot or Tabnine—which happily generate a main.py with hardcoded API keys—SDD forces negotiation at the spec layer first. I ran it with a real client spec last week (“a real-time MQTT-to-PostgreSQL ingestion pipeline with auth via Keycloak, needs zero-downtime migrations, must run on Raspberry Pi 4”). My local llama3:70b (via Ollama) spent 3 minutes arguing with itself about schema evolution before outputting a schema/ folder with device_reading.proto, docker-compose.yml with keycloak:22.0.5, and a Makefile with migrate-up and migrate-down. No fluff. No guesswork.
How to Install and Use Spec-Driven Develop (No Runtime Required)
There’s nothing to “install” in the traditional sense—but there is a lightweight harness to make it actionable. The project ships a run.sh (42 lines of bash) that:
- Validates your
SPEC.mdstructure usingyqandgrep - Injects context (e.g.,
--os=linux/arm64,--ai=ollama:llama3:70b) - Pipes spec +
SKILL.mdinto your AI agent via its CLI or API
Here’s what I use daily:
# Clone (yes, it's just a git clone — no build step)
git clone https://github.com/zhu1090093659/spec_driven_develop.git
cd spec_driven_develop
# Make sure you have ollama + yq
brew install yq # macOS
# or
sudo snap install yq
# Example: run against your local SPEC.md
./run.sh \
--spec=../my-project/SPEC.md \
--ai=ollama \
--model=llama3:70b \
--output-dir=../my-project/scaffold/
That’s it. run.sh outputs a scaffold/ directory with:
docker-compose.yml(with correct network isolation and healthchecks)services/subfolders per microservice, each withDockerfile,entrypoint.sh,README.mdschema/with OpenAPI v3 YAML and equivalent JSON Schemavalidation-report.md— a human-readable checklist you must approve before moving on
No Python. No Node. Just bash, curl, yq, and your AI agent’s CLI. The GitHub repo is shell-only (100% of 1.2k LOC), and the SKILL.md file is MIT-licensed, so you can fork, edit, and PR improvements without needing a dev environment.
Docker Compose Setup for Local AI Agents
You don’t need Docker to use SDD—but if your AI agent runs in-container (e.g., continue-server, cursor-server, or a local llama.cpp API), this docker-compose.yml is battle-tested on my M2 MacBook and 32GB Raspberry Pi 5:
# docker-compose.sdd.yml
version: '3.8'
services:
ollama:
image: ollama/ollama:0.3.7
ports:
- "11434:11434"
volumes:
- ./ollama_models:/root/.ollama
command: ["ollama", "serve"]
restart: unless-stopped
continue:
image: continue-dev/continue-server:0.6.2
ports:
- "8000:8000"
volumes:
- ./continue_config:/app/config
- ./my-project:/app/workspace
environment:
- CONTINUE_SERVER_API_KEY=dev-key
- OLLAMA_BASE_URL=http://ollama:11434
depends_on:
- ollama
restart: unless-stopped
Then point run.sh at Continue’s API:
./run.sh \
--spec=../my-project/SPEC.md \
--ai=continue \
--continue-url=http://localhost:8000 \
--continue-api-key=dev-key \
--output-dir=../my-project/scaffold/
Resource-wise: ollama:0.3.7 with llama3:70b needs ~24GB RAM and 20GB disk (quantized Q4_K_M). On the Pi 5 (8GB RAM), I drop to phi3:3.8b — it’s slower, but SDD’s strict validation catches 90% of the hallucinations. CPU load stays under 1.2 during inference. No GPU required — but if you have one, ollama run llama3:70b --gpu cuts scaffold time from 4m22s → 1m18s (measured with hyperfine).
Spec-Driven Develop vs. Alternatives: Why Not Just Use Copilot or LangChain?
If you’ve used GitHub Copilot for scaffolding, you know the pain: it generates syntax, not semantics. It’ll happily write a Dockerfile that pulls python:3.12-slim and RUN pip install fastapi uvicorn, then forget to EXPOSE 8000 or set CMD ["uvicorn", "main:app"]. LangChain-based dev agents (like devchain, aiconfig) add orchestration—but at the cost of configuration debt. I tried devchain on the same MQTT spec: it required 3 custom Chain classes, a config.yaml, and 2 hours of debugging its Dockerfile generator before it spat out something that almost worked.
SDD wins by refusing complexity. No YAML configs. No plugin registries. No “agent memory” to manage. Just:
- You write
SPEC.md(with strict sections:## Goals,## Constraints,## Non-Goals,## Interfaces) - You run
run.sh - You review
validation-report.md - You approve or edit
SPEC.mdand re-run
It’s like having a senior dev pair-programming with your AI—before the first git init. Compared to Cursor’s “Project Plan” feature: SDD is 100% reproducible, git diff-able, and versioned with your spec. Cursor’s plan lives in its UI cache. SDD’s plan lives in scaffold/validation-report.md — and that file gets committed.
Who Is This For? (Hint: Not Just AI DevOps Nerds)
SDD is built for three real-world personas:
Platform Engineers who own internal dev tooling: You drop
SKILL.mdinto your company’sai-scaffoldingrepo, add apre-commithook that runsrun.sh --validateon everySPEC.mdchange, and enforce interface-first development across 12 teams. No more “why does service X call service Y over HTTP when we agreed on gRPC?”Startup CTOs shipping MVPs fast: I used SDD to spec out a Stripe webhook processor + Notion sync service in 2 hours — then generated the entire stack (FastAPI + Celery + Redis + Notion API auth flow) with zero copy-paste. The
validation-report.mdcaught that I’d forgotten rate-limiting headers in the webhook endpoint — before writing any handler logic.Self-Hosters & Indie Hackers: You want your AI to generate correct, self-contained, deployable artifacts — not just “here’s a
main.py”. SDD gives you Docker Compose files thatdocker-compose up -djust works, with proper healthchecks, restart policies, and env var isolation. No moredocker logs -fdebugging at 2 AM.
Hardware? You don’t need a GPU server. A 16GB M1 Mac or 8GB Pi 5 runs phi3:3.8b + SDD just fine. RAM usage peaks at ~1.4GB during inference (measured with htop). Disk? 200MB for the whole repo + models. It’s lighter than most node_modules.
The Rough Edges — And My Honest Verdict
Let’s get real: SDD isn’t magic. It has sharp corners.
No built-in LLM routing: You must bring your own agent. There’s no
sdd serveor web UI. If you’re not comfortable withollama run,curl -X POST, or Continue’s config — this feels like “too much plumbing.”SPEC.md is strict: Miss a
## Constraintssection?run.shexits withERROR: SPEC.md missing required section 'Constraints'. It’s opinionated — and intentionally inflexible. I spent 20 minutes arguing with it about whether “must support offline mode” belongs underConstraintsorNon-Goals. (Answer:Constraints. I lost the argument.)No CI/CD integration yet: There’s no
sdd-actionfor GitHub Actions. I hacked one usingdocker run -v $(pwd):/workspace ubuntu:24.04+apt install yq curl— but it’s not in the repo.Shell-only means macOS/Linux only: No native Windows support. WSL2 works fine, but if you’re fully on PowerShell, you’ll need to port
run.sh(PRs welcome — the author merged mine in <24h).
That said — yes, it’s worth deploying. I’ve run it for 17 days across 4 real projects (2 internal, 2 client). Every time, the first scaffold/ output was production-ready enough to deploy to staging. Not perfect — but 80% of the boilerplate done, 100% of the interfaces validated, and zero “why does this Docker container crash on start?” surprises.
The TL;DR? Spec-Driven Develop isn’t another AI code generator. It’s a spec enforcer disguised as a skill. It shifts the AI’s job from “write code” to “negotiate intent.” And in a world where LLMs hallucinate rm -rf / as a “safe cleanup step”, that shift isn’t optional — it’s survival.
Star it. Fork it. Edit SKILL.md to match your team’s standards. Then write your next SPEC.md — and watch your AI agent finally listen.
Comments