
OpenCLAWN

Playful by Design. Powerful by Nature.

Lightweight, safe, self-improving multi-role agent framework

Route Smarter · Learn Continuously · Stay Safe · Hand Off Cleanly


What is OpenCLAWN?

OpenCLAWN is an agent framework built around 4 core innovations that most agent frameworks skip:

| Innovation | Problem solved |
| --- | --- |
| Routing audit + self-calibration | Agents don't record why a routing decision was made or whether it was correct |
| Skill decay | Skill trees accumulate forever, and stale skills pollute context |
| Confidence-gated crystallization | Self-evolving agents store skills from bad solutions |
| Role output contracts | Multi-agent handoffs without typed contracts are fragile |

On top of those, the agent compounds: the skill library tidies and improves itself with use, and every change is gated and reversible:

| Capability | What it does |
| --- | --- |
| Multi-agent conversation | Pipeline / debate / orchestrator modes where roles hand off with validated contracts; live stop & interject |
| Skill compounding | Skills get promoted when proven, refined when corrected, and merged when duplicate (all versioned & revertible) |
| Autopilots | Scheduled agent runs (read-only); actions needing approval become proposals, never silent execution |
| Skill packs | Export/import skills between installs (Markdown + hash), imported as draft behind SSRF + injection guards |
| Activity timeline | Chronological view of every agent action across routing, tools, handoffs, conversations |
| Multilingual routing | Language-agnostic complexity signals + optional script-aware tier bump |

Stack: Python 3.12 · FastAPI · HTMX · SQLite (aiosqlite) · Ollama + Gemini + Claude · httpx · Pydantic · structlog · tenacity


Quick Start

git clone https://github.com/MuhammadHasbiAshshiddieqy/OpenClawn.git
cd OpenClawn

# Install dependencies with uv
uv sync --frozen --extra dev

# Or with pip
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Create .env from example
cp .env.example .env
# Fill in keys: GEMINI_API_KEY and/or ANTHROPIC_API_KEY (heavy tiers)
# Local-only is fine too — Ollama handles light tiers without any key

# Run database migration
mkdir -p data
sqlite3 data/openclawn.db < migrations/001_initial.sql

# Pull Ollama models (gemma4:e4b alone is enough to start; the default tier map under LLM Routing also uses deepseek-r1 and qwen3.5:9b)
ollama pull gemma4:e2b
ollama pull gemma4:e4b
ollama pull gemma4:12b

# Build sandbox image for code_run / shell_run
docker build -t openclawn-sandbox:latest -f Dockerfile.sandbox .

# Start the app
uvicorn web.main:app --reload --port 8000

Open http://localhost:8000 to chat, or http://localhost:8000/metrics for the routing calibration dashboard.


Architecture

Component Overview

graph TB
    subgraph UI["Web UI — FastAPI + HTMX + SSE"]
        CHAT["/chat/stream · single agent"]
        CONVERSE["/converse/stream · multi-agent"]
        ACTIVITY["/activity · timeline + blockers"]
        AUTOPAGE["/autopilots · scheduled runs"]
        SKILLS["/skills · decay · curation · packs"]
        METRICS["/metrics · calibration · telemetry"]
        SETTINGS["/router · /settings · model map"]
    end

    subgraph CONVO["Multi-Agent Layer"]
        ORCH["ConversationOrchestrator"]
        PIPE["PipelineStrategy · PM &rarr; Dev &rarr; QA"]
        DEBATE["DebateStrategy · round-robin"]
        LEAD["OrchestratorStrategy · dynamic delegation"]
    end

    subgraph SCHED["Autopilots — scheduled, proposal-gated"]
        SCHEDULER["AutopilotScheduler · asyncio loop"]
        PROPOSAL["Approval-gated actions &rarr; proposals (never silent)"]
    end

    subgraph AGENT["Agent Loop — iterative, not recursive"]
        direction TB
        SHIELD["Shield · NFKD scan"]
        ROUTE["SmartRouter · soul-aware · multilingual"]
        LLMCALL["LLM Client · stream + fallback"]
        TOOLLOOP["Tool Loop · max 5 hops + loop guard"]
        POST["Post-Turn · memory + decay + compounding"]
    end

    subgraph MODULES["Core Modules — 4 innovations + compounding"]
        AUDITOR["RoutingAuditor + Calibration · innovation #1"]
        MEMORY["MemoryManager · L1–L4 + FTS5"]
        DECAY["SkillDecay · innovation #2"]
        CRYSTAL["Crystallizer (+ refine) · innovation #3"]
        CONTRACTS["RoleNegotiator · innovation #4"]
        CURATOR["SkillCurator · merge/dedup (I1)"]
        FEEDBACK["SkillFeedback · promote/refine (I2/I3)"]
        USERMODEL["UserModel · dialectic profile (I5, opt-in)"]
        ACTIVITYMOD["ActivityTimeline · SkillPack"]
        COMPACTOR["ContextCompactor · token budget"]
    end

    subgraph TOOLS["Tools — 26 total, all workspace-bounded"]
        FS["Filesystem · read/write/edit/append/patch/glob/grep/list_dir · read_many"]
        EXEC["Execution · code_run · shell_run (both sandboxed)"]
        NET["Network · web_fetch · web_search · http_request (SSRF-guarded)"]
        DATA["Data/docs · db_query · json_query · pdf_read · doc_write · pdf_write"]
        DEVT["Dev/agent · git_status/diff/log · todo_write · report_blocker"]
    end

    subgraph SECURITY["Security"]
        VAULT["Vault · API keys, never in context"]
        APPROVAL["ApprovalGate · HITL + proposal queue"]
        SHIELD2["Shield · injection scan · SSRF guard"]
    end

    subgraph INFRA["Infrastructure"]
        DB["SQLite · aiosqlite · WAL · POWER()"]
        SANDBOX["Docker Sandbox · network none · read-only · non-root"]
    end

    UI --> CONVO
    UI --> AGENT
    UI --> SCHED
    CONVO --> AGENT
    SCHED --> AGENT
    SCHED --> PROPOSAL
    AGENT --> MODULES
    AGENT --> TOOLS
    AGENT --> SECURITY
    AGENT --> INFRA
    EXEC --> SANDBOX
    MODULES --> DB
    SECURITY --> DB

Full Agent Flow — One Turn

flowchart TD
    U(["User sends message via Web UI"]) --> SHIELD

    subgraph PRE["0 · Input Processing"]
        SHIELD{"Shield<br/>NFKD normalize<br/>+ danger pattern scan"}
        SHIELD -->|blocked| REJECT["Rejected"]
        SHIELD -->|clean| CORRECT
        CORRECT{"Check correction<br/>from previous turn?"}
        CORRECT -->|"yes: had_correction=1"| RESOLVE
        CORRECT -->|no| RESOLVE
        RESOLVE["SkillFeedback.resolve_previous()<br/>success &rarr; revive + promote draft (I2)<br/>corrected &rarr; reset + refine skill (I3)"]
        RESOLVE --> LOAD_SKILL
    end

    subgraph MEM["1 · Memory Loading"]
        LOAD_SKILL["Load active skills<br/>SkillDecay: score &gt; 0.3,<br/>max 8 + 1 draft trial (I2)"]
        LOAD_SKILL --> LOAD_CTX["Load memory context<br/>L1: state · L2: facts · L3: skills<br/>L4: FTS5 archive · User profile (I5, opt-in)"]
    end

    subgraph BUILD["2 · Context Building"]
        LOAD_CTX --> COMPACT["ContextCompactor.build()<br/>system + memory + history<br/>+ message, within budget"]
    end

    subgraph ROUTE["3 · Routing Decision"]
        COMPACT --> DIMS["10 dimensions scored<br/>(+ has_code_signal · query_script)"]
        DIMS --> SOUL{"soul.toml<br/>upgrade_kw hit?"}
        SOUL -->|"yes: +3 score"| PREFER
        SOUL -->|no| PREFER
        PREFER{"prefer_local?"}
        PREFER -->|"yes: threshold +1<br/>stay local longer"| LANG
        PREFER -->|"no: normal threshold"| LANG
        LANG{"language bump?<br/>(opt-in)"}
        LANG -->|"script outside local<br/>threshold -1: bump tier"| LABEL
        LANG -->|"local script / off"| LABEL
        LABEL["Complexity label (+ calibration offset)<br/>TRIVIAL &rarr; SIMPLE &rarr; MODERATE<br/>&rarr; COMPLEX &rarr; CRITICAL"]
        LABEL --> OVERRIDE{"/settings<br/>override active?"}
        OVERRIDE -->|yes| USE_OV["Use chosen model<br/>(audit still logs router decision)"]
        OVERRIDE -->|no| USE_ROUTE["Use router model<br/>(/router tier&rarr;model map)"]
    end

    subgraph AUDIT1["4 · Pre-Call Audit — innovation #1"]
        USE_OV --> LOG["Auditor.log_decision()<br/>10 dims + score + label<br/>+ model + reason &rarr; DB"]
        USE_ROUTE --> LOG
    end

    subgraph LLM["5 · LLM Call with Fallback"]
        LOG --> STREAM["LLMClient.stream_with_fallback()"]
        STREAM --> HEALTH{"Ollama health check"}
        HEALTH -->|up| PRIMARY["Try primary model"]
        HEALTH -->|down| FALL["Fallback chain"]
        PRIMARY -->|error| FALL
        FALL --> F1["1 · gemma4:e4b (local)"]
        F1 -->|error| F2["2 · deepseek-r1 (local)"]
        F2 -->|error| F3["3 · qwen3.5:9b (local)"]
        F3 -->|error| F4["4 · gemini-2.5-flash (cloud)"]
        F4 -->|error| FAIL["ProviderUnavailable"]
    end

    subgraph TOOL_LOOP["6 · Iterative Tool Loop — max 5 hops"]
        PRIMARY --> PARSE{"Tool call in stream?"}
        FALL --> PARSE
        PARSE -->|"tool_call found"| LOOPGUARD
        PARSE -->|"text only"| YIELD["Yield text to user"]
        YIELD --> DONE_CHECK{"Another tool call?"}
        DONE_CHECK -->|no| FINALIZE
        LOOPGUARD{"Same call<br/>repeated 2&times;?"}
        LOOPGUARD -->|yes| HALT["Loop halted<br/>(hard break)"]
        LOOPGUARD -->|no| ALLOWED
        HALT --> FINALIZE
        ALLOWED{"Role allowed?"}
        ALLOWED -->|no| ERR2["Tool denied"]
        ALLOWED -->|yes| APPROVAL{"requires_approval?"}
        APPROVAL -->|no| RUN_TOOL["Run tool"]
        APPROVAL -->|yes| AUTOMODE{"autopilot mode?"}
        AUTOMODE -->|"yes: queue proposal<br/>(no silent execution)"| TOOL_RESULT
        AUTOMODE -->|no| HITL{"User approves?"}
        HITL -->|reject/timeout| ERR3["Approval denied"]
        HITL -->|approve| RUN_TOOL
        ERR2 --> TOOL_RESULT
        ERR3 --> TOOL_RESULT
        RUN_TOOL --> TOOL_RESULT["Result &rarr; append to messages"]
        TOOL_RESULT --> HOP{"hop &lt; 5?"}
        HOP -->|yes| PRIMARY
        HOP -->|no| FINALIZE
    end

    subgraph POST["7 · Post-Turn Processing — throttled, non-blocking"]
        FINALIZE["Auditor.finalize()<br/>tokens, cost, latency &rarr; DB"]
        FINALIZE --> WRITE_MEM["MemoryManager<br/>L1 checkpoint · L4 archive (if threshold)"]
        WRITE_MEM --> DECAY_PASS["SkillDecay.maybe_run_decay_pass()<br/>throttled: 1&times;/hour"]
        DECAY_PASS --> RECORD["SkillFeedback.record_usage()<br/>(skills used this turn &rarr; next-turn outcome)"]
        RECORD --> CURATE["SkillCurator.maybe_run_curation_pass() (I1)<br/>merge duplicates · judge &ge;4 · revertible"]
        CURATE --> AUTOTUNE["Calibration.maybe_auto_apply() (I4)<br/>opt-in · clamp &plusmn;1 · revertible"]
        AUTOTUNE --> USERMOD["UserModel.maybe_update() (I5)<br/>opt-in · versioned"]
        USERMOD --> CRYST_CHECK{"Crystallizer<br/>should_attempt?<br/>(&ge;3 tool calls)"}
        CRYST_CHECK -->|yes| SELF_EVAL["Self-evaluate<br/>evaluator &ge; generator<br/>confidence 1–5"]
        SELF_EVAL --> STORE{"conf &ge; 4 AND<br/>no critical gaps?"}
        STORE -->|yes| ACTIVE["Store as active skill"]
        STORE -->|no| DRAFT["Store as draft<br/>(not auto-injected)"]
        CRYST_CHECK -->|no| DONE
        ACTIVE --> DONE
        DRAFT --> DONE
        DONE(["Turn complete"])
    end

    style REJECT fill:#f66,stroke:#900,color:#fff
    style FAIL fill:#f66,stroke:#900,color:#fff
    style HALT fill:#f66,stroke:#900,color:#fff
    style ACTIVE fill:#6f6,stroke:#090
    style DRAFT fill:#ff6,stroke:#990
    style DONE fill:#6cf,stroke:#069

Tools & Security

All 26 tools are workspace-bounded — every file path is resolved with Path.resolve() and rejected if it escapes the workspace root (defeats ../ and symlink escape). Tools that mutate state or run code require explicit approval.

flowchart LR
    subgraph SAFE["No approval — read-only / inspect / internal"]
        direction TB
        R1["file_read · read_many · list_dir · glob · grep"]
        R2["web_fetch · web_search · pdf_read (SSRF-guarded net)"]
        R3["memory_search · json_query · git_status/diff/log"]
        R4["todo_write · report_blocker (internal tables)"]
    end

    subgraph GATED["Requires approval — mutate / execute / reach out"]
        direction TB
        G1["file_write · file_edit · file_append · apply_patch"]
        G2["code_run · shell_run"]
        G3["http_request (SSRF-guarded) · db_query (SELECT-only)"]
        G4["doc_write · pdf_write"]
    end

    subgraph APPROVAL_GATE["ApprovalGate"]
        AG["Interactive: wait for user<br/>timeout 120s · fail-safe deny"]
        AGP["Autopilot: queue as proposal<br/>(never silent execution)"]
    end

    subgraph SANDBOX["Docker Sandbox — code_run AND shell_run"]
        direction TB
        S1["network none"]
        S2["read-only filesystem"]
        S3["non-root user"]
        S4["memory 256m · cpus 0.5"]
        S5["timeout 30s · no-new-privileges"]
    end

    GATED --> AG
    GATED -.autopilot.-> AGP
    G2 --> SANDBOX
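The sandbox limits in the diagram map onto standard docker run flags. The sketch below is an illustration, not the project's implementation: sandbox_argv and run_sandboxed are hypothetical helpers, the nobody UID/GID 65534 is an assumed choice of non-root user, and the image tag is the one built in Quick Start. It also shows the fail-safe branch: without a docker binary it returns an error result instead of running the command on the host.

```python
import shutil
import subprocess

SANDBOX_IMAGE = "openclawn-sandbox:latest"  # tag built in Quick Start


def sandbox_argv(command: list[str], image: str = SANDBOX_IMAGE) -> list[str]:
    """Build a docker run invocation carrying the limits from the diagram."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--read-only",                          # read-only root filesystem
        "--user", "65534:65534",                # non-root (assumed UID/GID: nobody)
        "--memory", "256m",
        "--cpus", "0.5",
        "--security-opt", "no-new-privileges",
        image, *command,
    ]


def run_sandboxed(command: list[str], timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run `command` in the sandbox; never fall back to host execution."""
    if shutil.which("docker") is None:
        # Fail safe: report an error instead of executing on the host.
        return subprocess.CompletedProcess(command, returncode=127, stdout="", stderr="docker unavailable")
    # The timeout kills the docker CLI process; a fuller version would also stop
    # the container itself (for example by naming it and running docker kill).
    return subprocess.run(sandbox_argv(command), capture_output=True, text=True, timeout=timeout_s)
```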

Security note: code_run and shell_run never execute on the host — both run inside the Docker sandbox. If Docker is unavailable, they fail safe (return an error) rather than falling back to host execution. db_query is SELECT-only. web_fetch/http_request pass an anti-SSRF guard (reject loopback, private, link-local incl. cloud metadata). In autopilot mode, approval-gated tools are queued as proposals for later review — never run unattended.

The 4 Innovations — Where They Fire

flowchart LR
    subgraph TURN["One Agent Turn"]
        T1["Audit: log<br/>routing decision"] --> T2["Route: soul-aware<br/>10-dim scoring"]
        T2 --> T3["LLM call<br/>+ tool loop"]
        T3 --> T4["Audit: finalize<br/>tokens / cost / latency"]
        T4 --> T5["Decay pass<br/>(throttled)"]
        T5 --> T6["Crystallize<br/>(confidence-gated)"]
    end

    I1["#1 · Routing Audit<br/>+ Self-Calibration<br/><i>pre-call log + post-correct</i>"] -.-> T1
    I1 -.-> T4
    I2["#2 · Skill Decay<br/><i>exponential + throttle</i>"] -.-> T5
    I3["#3 · Confidence-Gated<br/>Crystallization<br/><i>eval &ge; generator</i>"] -.-> T6
    I4["#4 · Role Output<br/>Contracts<br/><i>Pydantic validated</i>"] -.-> T3

    C["Compounding (builds on #1–#3)<br/><i>I1 merge · I2 promote · I3 refine · I4 auto-tune · I5 profile</i>"] -.-> T5
    C -.-> T6

Multi-Agent Conversation

Beyond single-agent turns, roles can talk to each other. One orchestrator loop drives three pluggable strategies, and each turn is a full agent run (routing, tools, and memory all intact). You can stop mid-conversation or interject with your own message, which the next speaker takes into account.

flowchart TD
    START(["User message + mode"]) --> STRAT{"Strategy"}

    STRAT -->|Pipeline| P["PM &rarr; Dev &rarr; QA<br/>sequential, contract-validated handoff"]
    STRAT -->|Debate| D["Round-robin, N rounds<br/>full transcript shared each turn"]
    STRAT -->|Orchestrator| O["Lead delegates dynamically<br/>via JSON directive each turn"]

    O --> ODYN{"Directive<br/>parseable?"}
    ODYN -->|yes| OWORK["Route to chosen worker"]
    ODYN -->|no| OFALL["Fallback: lead &rarr; all workers &rarr; synthesis"]

    P --> NEXT{"next_speaker()"}
    D --> NEXT
    OWORK --> NEXT
    OFALL --> NEXT

    NEXT -->|role| RUN["Run AgentLoop for that role<br/>(cooperative stop check between tokens)"]
    RUN --> CONTRACT{"wants_contract?"}
    CONTRACT -->|yes, valid| REC["Record handoff · validation_ok=1"]
    CONTRACT -->|yes, invalid| DEG["Degrade: keep raw text<br/>validation_ok=0, continue"]
    CONTRACT -->|no| LOOP
    REC --> LOOP
    DEG --> LOOP
    LOOP{"stopped OR<br/>max_turns OR<br/>strategy done?"}
    LOOP -->|no| NEXT
    LOOP -->|yes| END(["conversation_end"])

    NEXT -->|none| END

    style END fill:#6cf,stroke:#069
    style DEG fill:#ff6,stroke:#990

The 4 Core Innovations

1. Routing Audit + Self-Calibration

Every routing decision is logged before the LLM call with 10 dimensions (token count, tech keywords, soul upgrade hits, a language-agnostic code signal, detected script, etc.) and updated afterwards with latency, cost, and correction signals. The /metrics dashboard shows which complexity labels have the highest correction rate, so you can tune the router with real data.
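A minimal sketch of the log-then-update pattern with the standard sqlite3 module, assuming a single routing_audit table. The table, its columns, and the helper names are hypothetical (the real schema is in migrations/001_initial.sql and docs/database.md); the last query is the kind of per-label correction rate the /metrics dashboard reports.

```python
import json
import sqlite3

# Hypothetical single-table schema; see migrations/001_initial.sql for the real one.
SCHEMA = """
CREATE TABLE routing_audit (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    model TEXT NOT NULL,
    score REAL NOT NULL,
    dims TEXT NOT NULL,
    latency_ms REAL,
    cost_usd REAL,
    had_correction INTEGER NOT NULL DEFAULT 0
)
"""


def log_decision(db: sqlite3.Connection, label: str, model: str, score: float, dims: dict) -> int:
    """Pre-call: persist why this model was chosen, before any tokens are spent."""
    cur = db.execute(
        "INSERT INTO routing_audit (label, model, score, dims) VALUES (?, ?, ?, ?)",
        (label, model, score, json.dumps(dims)),
    )
    return cur.lastrowid


def finalize(db: sqlite3.Connection, row_id: int, latency_ms: float, cost_usd: float) -> None:
    """Post-call: attach what the decision actually cost."""
    db.execute(
        "UPDATE routing_audit SET latency_ms = ?, cost_usd = ? WHERE id = ?",
        (latency_ms, cost_usd, row_id),
    )


def mark_corrected(db: sqlite3.Connection, row_id: int) -> None:
    """Next turn: the user corrected the answer, so flag the routing decision."""
    db.execute("UPDATE routing_audit SET had_correction = 1 WHERE id = ?", (row_id,))


def correction_rate_by_label(db: sqlite3.Connection) -> dict[str, float]:
    """Share of decisions per label that were later corrected."""
    rows = db.execute(
        "SELECT label, AVG(had_correction) FROM routing_audit GROUP BY label"
    ).fetchall()
    return dict(rows)
```

Writing the row before the call means a decision is still on record even if the provider call crashes or times out.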

2. Skill Decay

Skills age with exponential decay (score × 0.97^days_since_used). Unused skills drop below 0.3 and get archived. A revived skill recovers score immediately. Decay runs throttled (max once per hour) so it never blocks a turn.
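A sketch of the decay rule, the archive threshold, and the hourly throttle. It assumes the decayed value is computed from the score at last use (rather than compounded on every pass) and that the throttle is an in-memory timestamp; decay_pass, DecayThrottle, and the other names are hypothetical. As a worked example, a skill starting at 1.0 stays at or above 0.3 for 39 days (0.97^39 ≈ 0.305) and falls below it on day 40 (0.97^40 ≈ 0.296).

```python
import math
import time
from collections.abc import Callable

DECAY_BASE = 0.97        # per-day multiplier from the formula above
ARCHIVE_BELOW = 0.3      # skills under this score are archived
PASS_INTERVAL_S = 3600   # decay pass runs at most once per hour


def decayed_score(base_score: float, days_since_used: float) -> float:
    """score x 0.97^days_since_used, computed from the score at last use."""
    return base_score * DECAY_BASE ** days_since_used


def days_until_archive(base_score: float) -> int:
    """First whole day on which an unused skill scores below ARCHIVE_BELOW."""
    if base_score < ARCHIVE_BELOW:
        return 0
    # base * 0.97^d < 0.3  <=>  d > ln(0.3 / base) / ln(0.97)  (ln(0.97) < 0 flips the sign)
    return math.floor(math.log(ARCHIVE_BELOW / base_score) / math.log(DECAY_BASE)) + 1


def decay_pass(skills: dict[str, tuple[float, float]]) -> tuple[list[str], list[str]]:
    """Split {name: (base_score, days_since_used)} into (active, archived) names."""
    active: list[str] = []
    archived: list[str] = []
    for name, (base, days) in skills.items():
        target = active if decayed_score(base, days) >= ARCHIVE_BELOW else archived
        target.append(name)
    return active, archived


class DecayThrottle:
    """Lets a pass run at most once per interval; the clock is injectable for tests."""

    def __init__(self, interval_s: float = PASS_INTERVAL_S, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self._last: float | None = None

    def should_run(self) -> bool:
        now = self.clock()
        if self._last is not None and now - self._last < self.interval_s:
            return False
        self._last = now
        return True
```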

3. Confidence-Gated Crystallization

After a successful multi-step task (at least 3 tool calls), the agent evaluates its own solution using a model at least as capable as the generator (EVALUATOR_FOR map: e4b→12b, Sonnet→Sonnet). Solutions with confidence below 4/5, or with critical gaps, are stored as draft rather than active and are never injected into future context automatically.
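The gate reduces to three small decisions: whether to attempt crystallization, which evaluator to use, and whether the result is active or draft. This is a sketch, not the Crystallizer's code; the full model IDs in EVALUATOR_FOR (gemma4:12b from Quick Start, claude-sonnet-4-6 from the routing table) and the rule that an unmapped generator evaluates itself are assumptions.

```python
from dataclasses import dataclass, field

# Only the two pairings named in the text; the full model IDs are assumptions.
EVALUATOR_FOR = {
    "gemma4:e4b": "gemma4:12b",
    "claude-sonnet-4-6": "claude-sonnet-4-6",
}
MIN_TOOL_CALLS = 3   # should_attempt threshold from the flow diagram
MIN_CONFIDENCE = 4   # on a 1-5 scale


@dataclass
class Evaluation:
    confidence: int
    critical_gaps: list[str] = field(default_factory=list)


def should_attempt(succeeded: bool, tool_calls: int) -> bool:
    return succeeded and tool_calls >= MIN_TOOL_CALLS


def evaluator_for(generator: str) -> str:
    # Unmapped generators evaluate themselves, so the evaluator is never weaker.
    return EVALUATOR_FOR.get(generator, generator)


def skill_status(ev: Evaluation) -> str:
    if not 1 <= ev.confidence <= 5:
        raise ValueError(f"confidence must be 1-5, got {ev.confidence}")
    if ev.confidence >= MIN_CONFIDENCE and not ev.critical_gaps:
        return "active"  # eligible for automatic injection
    return "draft"       # stored, but not auto-injected
```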

4. Role Output Contracts

Handoffs between roles (PM → Dev → QA) use Pydantic models as typed contracts. Invalid output is stored with validation_ok=0 for debugging: no crash, no silent data loss.


LLM Routing

The router scores 10 dimensions, then maps the resulting complexity label to a model. Light tiers stay local (Ollama: free and private) and are ordered by model capacity, so harder queries get a more capable local model; heavy tiers escalate to a cloud model. The mapping is configurable. The shipped default:

Query complexity → model selection:

TRIVIAL  → gemma4:e4b          (Ollama · local, lightest)
SIMPLE   → deepseek-r1         (Ollama · local, reasoning)
MODERATE → qwen3.5:9b          (Ollama · local, most capable)
COMPLEX  → gemini-2.5-flash    (cloud)   # or claude-haiku-4-5
CRITICAL → gemini-2.5-pro      (cloud)   # or claude-sonnet-4-6

Cloud tiers are pluggable: point them at Gemini or Claude depending on the API key you provide. The shipped default routes heavy tiers to Gemini; swap to Claude in core/router.py if you prefer. Local tiers are easy to remap too — just edit the MODELS dict.

The router is soul-aware: each role's soul.toml can define upgrade_keywords that force higher complexity, and prefer_local=true to resist escalating to the cloud. Soul upgrade keywords override prefer_local — the soul has higher priority.
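The decision path from the flow diagram can be collapsed into one function. This is a sketch under explicit assumptions: the ten dimensions are already folded into a single score, the per-label thresholds are invented for illustration, prefer_local is modeled as raising only the cloud thresholds by 1, the language bump lowers every threshold by 1, and the calibration offset is added to the score. route and THRESHOLDS are hypothetical names; MODELS reuses the default tier map above, but its shape in core/router.py may differ.

```python
LABELS = ("TRIVIAL", "SIMPLE", "MODERATE", "COMPLEX", "CRITICAL")
# Hypothetical lower score bounds per label; the real values live in core/router.py.
THRESHOLDS = {"SIMPLE": 2.0, "MODERATE": 4.0, "COMPLEX": 6.0, "CRITICAL": 8.0}
CLOUD_LABELS = {"COMPLEX", "CRITICAL"}
MODELS = {  # shipped default tier map from the table above
    "TRIVIAL": "gemma4:e4b",
    "SIMPLE": "deepseek-r1",
    "MODERATE": "qwen3.5:9b",
    "COMPLEX": "gemini-2.5-flash",
    "CRITICAL": "gemini-2.5-pro",
}


def route(
    score: float,
    text: str,
    upgrade_keywords: tuple[str, ...] = (),
    prefer_local: bool = False,
    language_bump: bool = False,
    calibration_offset: float = 0.0,
) -> tuple[str, str]:
    """Map a pre-computed complexity score to (label, model)."""
    lowered = text.lower()
    upgrade_hit = any(kw.lower() in lowered for kw in upgrade_keywords)
    if upgrade_hit:
        score += 3  # soul.toml upgrade keyword: +3, as in the flow diagram
    score += calibration_offset
    label = LABELS[0]
    for name in LABELS[1:]:
        bound = THRESHOLDS[name]
        if prefer_local and not upgrade_hit and name in CLOUD_LABELS:
            bound += 1  # stay local longer, unless the soul asked for an upgrade
        if language_bump:
            bound -= 1  # script outside the local models' range: bump a tier sooner
        if score >= bound:
            label = name
    return label, MODELS[label]
```

Ignoring prefer_local whenever an upgrade keyword matches is one way to express "the soul has higher priority"; the shipped router may combine the two differently.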

If the primary model errors or Ollama is offline, the client automatically walks the fallback chain (gemma4:e4b → deepseek-r1 → qwen3.5:9b → gemini-2.5-flash) and raises ProviderUnavailable only if every entry fails. Every fallback is logged to the audit DB.
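A synchronous sketch of walking that chain, following the flow diagram literally: when the health check reports Ollama down, the primary is skipped and the whole chain is tried in order. ProviderError, complete_with_fallback, call, and record are hypothetical names; record stands in for the audit-DB write, and the real client streams over httpx rather than returning a string.

```python
from collections.abc import Callable

FALLBACK_CHAIN = ("gemma4:e4b", "deepseek-r1", "qwen3.5:9b", "gemini-2.5-flash")


class ProviderError(Exception):
    """Hypothetical error a provider call raises (timeout, 5xx, connection refused)."""


class ProviderUnavailable(RuntimeError):
    """Every candidate model failed."""


def complete_with_fallback(
    primary: str,
    prompt: str,
    call: Callable[[str, str], str],
    *,
    ollama_up: bool = True,
    chain: tuple[str, ...] = FALLBACK_CHAIN,
    record: Callable[[str, Exception], None] = lambda model, exc: None,
) -> tuple[str, str]:
    """Return (model that answered, reply), trying the primary first when Ollama is up."""
    first = [primary] if ollama_up else []
    candidates = first + [m for m in chain if m not in first]
    errors: list[str] = []
    for model in candidates:
        try:
            return model, call(model, prompt)
        except ProviderError as exc:  # other exceptions (bugs) still propagate
            record(model, exc)  # stand-in for logging the fallback to the audit DB
            errors.append(f"{model}: {exc}")
    raise ProviderUnavailable("; ".join(errors))
```

Catching only a provider-specific error type keeps genuine bugs loud instead of silently falling through to the next model.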


Project Structure

openclawn/
├── core/           # agent_loop · llm_client · router (multilingual) · audit · calibration
│                   # crystallizer · compactor · conversation (multi-agent)
│                   # activity (timeline) · autopilot (scheduler) · skill_pack · tool_audit
├── infra/          # config · database (WAL, POWER()) · logging · env · workspace
├── memory/         # layers (L1–L4) · skill_decay · curator (merge) · skill_feedback
│                   # user_model · search (FTS5)
├── roles/          # pm/qa/dev/data/security soul.toml · contracts (Pydantic) · registry
├── tools/          # 26 tools: file_ops · read_many · search · shell · code · web · git
│                   # document (pdf_read · doc_write · pdf_write) · todo · report_blocker
├── security/       # vault · shield (NFKD) · approval (HITL + proposal queue) · question
├── web/            # FastAPI app · HTMX templates · SSE · /activity /autopilots /skills
├── migrations/     # 001_initial.sql
└── tests/          # 30 files, 420 tests — innovations, tools, web, compounding

The 4 core innovations are stable; everything above (multi-agent, autopilots, skill compounding, skill packs) builds on them. See CHANGELOG.md for the full feature history.

Key runtime pages:
/                  chat · single & multi-agent modes
/activity          timeline of agent actions + open blockers
/autopilots        scheduled runs + pending proposals
/skills            decay curves · crystallization · curation · skill packs
/metrics           routing calibration · tool telemetry
/conversations     multi-agent transcripts
/router · /settings  tier→model map · model override

Running Tests

pytest tests/ -v

All tests use in-memory SQLite and mocked LLM calls — no real Ollama, Gemini, or Claude API needed.


Documentation

Detailed reference for every module, class, and function:

| Folder | Doc | Contents |
| --- | --- | --- |
| infra/ | docs/infra.md | config, database, logging |
| core/ | docs/core.md | agent loop, LLM client, router (multilingual), audit, crystallizer, calibration, conversation, activity, autopilot, skill packs |
| memory/ | docs/memory.md | L1–L4 layers, skill decay, curator (merge), skill feedback (promote/refine), user model, FTS5 search |
| roles/ | docs/roles.md | contracts, role registry, soul.toml format |
| security/ | docs/security.md | vault, shield, approval gate HITL |
| tools/ | docs/tools.md | 26 tools, permission matrix, Docker sandbox |
| web/ | docs/web.md | FastAPI endpoints, SSE streaming |
| Database | docs/database.md | full schema + example queries |
| Tests | docs/tests.md | test index + patterns |

Sprint Status

| Sprint | Focus | Status |
| --- | --- | --- |
| 0 | Infra · LLM client · Agent loop · Web UI · Audit | Done |
| 1 | Soul-aware router · Memory L1–L4 · Compactor + caching | Done |
| 2 | Tools · Docker sandbox · Crystallizer · Skill decay | Done |
| 3 | Role contracts · Vault · Shield · ApprovalGate (HITL) | Done |
| 4 | Coverage · Calibration advisor · (router tuning needs live data) | Ongoing |
| 5 | Multi-agent conversation · Gemini provider · UI redesign | Done |
| 5+ | Tooling to 26 (git · todo · docs · pdf · blocker) · SSRF guard · CI + uv.lock | Done |
| 5++ | Autopilots (scheduled, proposal-gated) · Activity timeline · Skill packs | Done |
| 6–8 | Compounding intelligence: skill curator · draft promotion · refine · guarded auto-apply · user model | Done |
|  | Multilingual routing (structural + script-aware signals) | Done |
|  | MCP client (external tools, approval-gated) · /health · stale-draft cleanup | Done |

Design Principles

  • Security first: code_run and shell_run only run inside Docker (network none, read-only, non-root, timeout) and never touch the host. Web tools have an anti-SSRF guard, and autopilots never execute approval-gated actions (they queue proposals)
  • Workspace-bounded: every file tool resolves paths and rejects escapes outside the workspace root
  • No SDK: raw httpx for all LLM calls, intentionally, for audit transparency
  • Token-first: context budget tracked; prompt caching on stable system blocks
  • No hardcoded domain/locale: locale comes from fields and config, not code (routing keywords moved out of core)
  • Gated, versioned, reversible: self-improvement (merge/refine/promote/auto-tune) always sits behind confidence gates, with audit trails and revert
  • Every innovation is an extractable module: skill_decay, audit, crystallizer, contracts, curator, and activity have clean interfaces

Scope & Production Posture

OpenCLAWN targets single-user, self-hosted use (research/experiment phase). Several things a multi-user SaaS would need are intentionally out of scope — they are design decisions, not technical debt:

| Not included | Why (deliberate) |
| --- | --- |
| Authentication | Single-user by design: you run it for yourself, behind your own machine/network |
| Rate limiting | Only relevant when exposed to untrusted/multiple clients |
| PostgreSQL / horizontal scaling | SQLite (WAL) is sufficient for one user; no multi-instance state |
| Multi-tenancy | One workspace, one user |

Adopting these would change the project's identity, so they are not on the roadmap unless that direction is chosen explicitly.

What "production-ready" means here (for single-user self-hosting): reliable for one person. That posture is already largely met — Docker-sandboxed execution, SSRF guard, HITL approval, fail-safe error handling, CI on every push, a /health endpoint, and stale draft-skill cleanup. Remaining polish is tracked in CHANGELOG.md.

Common review misread: OpenCLAWN is not an under-built multi-user product. shell_run and code_run run only in the Docker sandbox (never on the host); the DB is never served statically; there are no silent except: pass blocks; CI exists. Evaluate it as a single-user framework, not a SaaS.