momori777/artemis: aI Girlfriend — Shiki Natsume (四季夏目)

100% Local · Fully Private · Zero API Dependencies

All conversations, voice, and images are generated on your own machine. No cloud servers, no third-party APIs, no risk of data leakage. Your AI girlfriend belongs to you, and only you.

An AI girlfriend project powered by OpenClaw + QQ Bot + Telegram Bot + llama.cpp + GPT-SoVITS + ComfyUI + Sakura Desktop Pet — running entirely on your own machine.

Character: Shiki Natsume (四季夏目), from Starry Moonlit Café & the Butterfly of Death. Tall, aloof, cold exterior with a hidden warmth. Designed for girlfriend experience roleplay — she takes the lead.

✨ Why This Project?

	Cloud AI Girlfriend	This Project
🛡️ Privacy	Chat logs, voice, and images all stored on vendor servers	Everything stays local — zero data leaves your machine
💰 Cost	Monthly subscriptions / per-token billing adds up	Free, one-time setup, runs forever (bring your own hardware)
🌐 Network	Needs internet; dead if servers go down	Works offline — flip off your WiFi and keep chatting
🎛️ Control	Prompts/templates controlled by vendor, can change anytime	You control all models, parameters, and character settings
🔞 Content	Heavy censorship, accounts get banned	No censorship — talk about whatever you want
🎨 Extensibility	Locked into vendor models and features	Mix and match — swap LLMs, image models, voice models freely

🎬 Demo

QQ Bot Demo

👆 Live QQ Bot demo: text chat + TTS voice + ComfyUI image generation + character memory

Hardware

Component	Model
GPU	NVIDIA GeForce RTX 5070 Laptop (8 GB VRAM)
CPU	Intel Core i9-14900HX (24 cores, 32 threads)
RAM	32 GB DDR5
OS	Windows 11

Features

💬 QQ + Telegram Dual Channel — QQ Bot + Telegram Bot integration via OpenClaw Gateway
🎤 TTS Voice Synthesis — Local GPT-SoVITS inference, Japanese voice (14 reference audio clips)
🎨 AI Image Generation — Local ComfyUI inference, SDXL/Illustrious models
🖥️ Sakura Desktop Pet — PySide6 desktop companion with proactive care, screen observation & local LLM awareness
🧠 VRAM Scheduler — Automatic llama-server ↔ TTS/ComfyUI orchestration on 8 GB VRAM
💾 Roleplay Memory — Conversation summaries persisted to memory/role_play/

Models

All models hosted on HuggingFace: TAOTAO777/ai-girlfriend-natsume

See models.yaml for full details.

Model	Purpose	Size
Qwen3.6-35B-A3B-APEX-I-Compact (Q4_K GGUF)	Chat LLM	16.11 GB
WAI-Nsfw-Illustrious-17	ComfyUI generation (default)	6.46 GB
miaomiaoHarem_v20	ComfyUI generation (backup)	6.46 GB
GPT-SoVITS voice weights	TTS voice synthesis	~303 MB

One-command Download

# Install huggingface-cli: pip install huggingface_hub
huggingface-cli login

# Download all models
huggingface-cli download TAOTAO777/ai-girlfriend-natsume --local-dir ./models

# Or download individual components:
huggingface-cli download TAOTAO777/ai-girlfriend-natsume llm/ --local-dir ./models
huggingface-cli download TAOTAO777/ai-girlfriend-natsume comfyui-checkpoints/ --local-dir ./checkpoints
huggingface-cli download TAOTAO777/ai-girlfriend-natsume gpt-sovits-weights/ --local-dir ./gpt-sovits-weights

Local Configuration

Run quick_setup.ps1 — interactive wizard that generates config.yaml with your local paths
(Alternative) Copy config.example.yaml → config.yaml and edit manually
Place downloaded model files according to models.yaml, then update config.yaml paths

All Python/PS scripts read paths from config.yaml — no hardcoded paths to edit.

⚠️ Disclaimer: All models are community open-source. This project only provides mirror distribution, non-profit. Copyright belongs to original authors.

Local LLM Performance

Running Qwen3.6-35B-A3B (MoE, Q4_K, 16.10 GiB, 34.66B params) via llama.cpp (b8851-b9222).

Launch Command

llama-server.exe `
  -m "Qwen3.6-35B-A3B-uncensored-heretic-APEX-I-Compact.gguf" `
  -c 120000 `
  --flash-attn on -ctk q8_0 -ctv q8_0 `
  -ngl 41 --cpu-moe --cpu-mask 0xFFFFFFFF `
  --batch-size 4096 --ubatch-size 2048 --threads 24 `
  --api-key *** -rea off --jinja `
  --cache-ram 2048 --parallel 1 `
  --kv-unified --no-mmap

Key Metrics

Metric	Value	Notes
VRAM Usage	~4.6 GiB (model) + ~1.2 GiB (KV cache)	~2 GB free on 8 GB VRAM
Prefill Speed	960 ~ 1390 t/s	120K context, batch-size 4096
Token Generation	31 ~ 39 t/s	MoE architecture, 8/256 experts
Context Limit	120K (~120k tokens)	~59k token full reprocess in ~55s
Model Load Time	~12s	--no-mmap, requires sufficient RAM

Long Context Stability

Qwen3.6 MoE uses SSM (Gated Delta Net) hybrid attention with --kv-unified.

⚠️ Known Limitation: Cross-turn prompt cache reuse is not supported (SSM architecture limitation). Each request triggers full context re-processing. Longer conversations = higher first-token latency (~55s for 59k tokens).

Mitigations:

Periodic /reset (Natsume writes roleplay summaries to memory/role_play/ before resetting)
Restore context from summaries on startup, keeping actual token count in 5K–20K range
config-patch.json sets OpenClaw contextWindow to 262144 to match model capacity

VRAM Budget

8 GB Total VRAM
├── llama-server resident: ~5.8 GB (model 4.6G + KV cache 1.2G)
├── Free: ~2.2 GB
│
├── TTS inference: stop llama → ~8 GB free → resume llama (~70s)
└── ComfyUI generation: stop llama → ~8 GB free → resume llama (~120s)

Directory Structure

AI_Girlfriend/                        # OpenClaw workspace root
├── configure.ps1                     # 🛠 Interactive path config wizard (auto-replaces all paths)
├── config.json                       # Generated config from configure.ps1
├── download-models.ps1               # One-click model download (Windows)
├── download-models.sh                # One-click model download (Linux/macOS)
├── setup-llama.ps1                   # Auto-detect HW + configure llama.cpp (Win)
├── setup-llama.sh                    # Auto-detect HW + configure llama.cpp (Linux/macOS)
├── setup-openclaw.ps1                # One-click OpenClaw install + deploy (Win)
├── setup-openclaw.sh                 # One-click OpenClaw install + deploy (Linux/macOS)
├── setup-all.ps1                     # 🚀 All-in-One mega script (Windows)
├── setup-all.sh                      # 🚀 All-in-One mega script (Linux/macOS)
├── config-qqbot.json                 # QQ Bot config patch
├── config-telegram.json              # Telegram Bot config patch
├── config-patch.json                 # OpenClaw LLM config patch
├── AGENTS.md                         # Agent behavior rules
├── SOUL.md                           # Character personality
├── IDENTITY.md                       # Character identity
├── USER.md                           # User info (modify for yourself)
├── HEARTBEAT.md                      # Heartbeat config
├── TOOLS.md                          # Tool quick reference
├── models.yaml                       # Model catalog + download links
├── README.md                         # This file
├── .gitignore
├── live2d/                           # 🚧 Live2D character visualization (not yet implemented)
├── ren_pro_jp/                       # Ren'Py dialog engine (companion to Live2D, not yet implemented)
├── memory/                           # [.gitignore] Runtime memory
│   └── role_play/                    # Roleplay conversation logs
├── media/qqbot/                      # [.gitignore] Generated media
│   ├── audio/                        # TTS voice output
│   └── images/                       # ComfyUI image output
├── docs/
│   ├── telegram-setup.md             # Telegram Bot setup guide
│   └── qqbot-setup.md                # QQ Bot setup guide
└── skills/
    ├── tts/
    │   ├── SKILL.md                  # TTS invocation guide
    │   ├── run_tts.ps1               # TTS launcher script
    │   ├── tts_call.py               # GPT-SoVITS inference (incl. llama start/stop)
    │   └── ref_wavs/                 # Reference audio (14 wavs by emotion, prepare your own)
    ├── comfyui/
    │   ├── SKILL.md                  # ComfyUI invocation guide
    │   ├── run_comfyui.ps1           # ComfyUI launcher script
    │   ├── comfyui_call.py           # ComfyUI inference (incl. llama start/stop)
    │   ├── prompt_template.md        # Character prompt template
    │   ├── custom_prompt.txt         # Custom extra prompt
    │   ├── apron_negative.txt        # Apron scene negative prompt
    │   └── apron_prompt.txt          # Apron scene positive prompt
    ├── sakura/                       # Sakura Desktop Pet (PySide6 GUI)
    │   ├── SKILL.md                  # Sakura skill documentation
    │   ├── main.py                   # Application entry point
    │   ├── install.bat               # Windows dependency installer
    │   ├── start.bat                 # Windows launcher
    │   └── app/                      # Source code (agent, UI, LLM, voice, plugins, etc.)
    ├── llama-management.md           # VRAM management architecture doc
    ├── llama-watchdog.ps1            # Llama health check
    └── cleanup_orphans.ps1           # Orphan process/lock/session cleanup

Prerequisites

Component	Version / Source	Purpose
OpenClaw	latest	AI Agent Gateway
QQ Bot	OpenClaw qqbot channel	QQ message relay
Telegram Bot	OpenClaw telegram channel	Telegram message relay
llama.cpp	b9222	Local LLM inference server
GPT-SoVITS v2	v2pro-20250604	TTS voice synthesis
ComfyUI	aki-v3	Image generation engine
Sakura Desktop Pet	v0.9.6-dev	Desktop companion GUI (authorized by @Rvosy, Issue #38)
Python	3.12+	Runtime (Sakura + TTS + ComfyUI)

Quick Start

🚀 All-in-One (Recommended)

One command, from scratch to a fully functional AI girlfriend:

Windows:

powershell -File setup-all.ps1

Linux / macOS:

bash setup-all.sh

Automated pipeline: environment check → model download → llama.cpp setup → OpenClaw install → Sakura desktop pet → workspace deploy → path check → launch → verify.

Supports resume from breakpoint. Flags: --skip-model-download, --skip-llama-setup, --skip-openclaw-setup, --skip-sakura-setup, --dry-run, --no-start

Step-by-Step (if you prefer manual control)

0. Setup OpenClaw

Install OpenClaw Gateway and deploy the AI Girlfriend workspace with one command:

Windows:

powershell -File setup-openclaw.ps1

Linux / macOS:

bash setup-openclaw.sh

This script will:

Install Node.js (if needed)
Install OpenClaw Gateway via the official install script
Deploy all workspace files (AGENTS.md, SOUL.md, skills, etc.) to your OpenClaw workspace
Install the Gateway daemon for auto-start
Apply the config patch for local LLM context window

Flags: --skip-node, --skip-deploy, --skip-daemon, --no-onboard

1. Download Models

Use the provided one-click script:

Windows:

# Install deps
pip install huggingface_hub
huggingface-cli login  # first time only

# Run the downloader
powershell -File download-models.ps1

# Or specify a target directory
powershell -File download-models.ps1 -BaseDir "D:\models"

Linux / macOS:

pip install huggingface_hub
huggingface-cli login  # first time only

bash download-models.sh
# or: bash download-models.sh /path/to/models

The script downloads all 5 model files (~29 GB) from HuggingFace, skips existing files, and reports progress.

See models.yaml for full model details and manual download commands.

2. Setup llama.cpp

Auto-detect your hardware and generate an optimized llama-server config:

Windows:

# Basic: detect hardware, generate config
powershell -File setup-llama.ps1

# With auto-build (clone + compile llama.cpp from source)
powershell -File setup-llama.ps1 -BuildLlama

# Custom model path
powershell -File setup-llama.ps1 -ModelPath "D:\my-models\custom.gguf"

Linux / macOS:

# Basic
bash setup-llama.sh

# With auto-build
bash setup-llama.sh --build

# Custom model
bash setup-llama.sh --model /path/to/custom.gguf

The script auto-detects:

GPU — NVIDIA (nvidia-smi), AMD (rocminfo), Apple Silicon (Metal), or fallback
VRAM — determines GPU offload layers, batch sizes, KV cache budget
CPU cores — configures thread count and batch size
RAM — checks if --no-mmap is safe (requires 32+ GB)

Output in llama-config/:

launch-llama.ps1 / launch-llama.sh — Start the server
llama-watchdog.ps1 / llama-watchdog.sh — Health check for Task Scheduler / cron
hardware-report.md — Your machine's detected specs
Plus systemd service (Linux) or launchd plist (macOS) for auto-start

3. Configure Paths ⚡ Interactive wizard recommended

Replace all paths in one go:

powershell powershell -File configure.ps1

The interactive configuration wizard asks for your local paths, then auto-replaces all absolute paths in every script. Supports the -DryRun\ flag to preview changes without writing.

Config is saved to \config.json\ and loaded automatically on the next run.

📝 Manual config (click to expand)

All absolute paths in the following files must be updated to match your environment:

File	Key Variables
\skills/tts/tts_call.py\	\WEBUI_DIR, \OUTPUT_DIR, \LLAMA_EXE_PATH, \LLAMA_MODEL_PATH, \RESTART_SCRIPT\
\skills/tts/run_tts.ps1\	\, \, \, \
\skills/comfyui/comfyui_call.py\	\COMFYUI_ROOT, \PYTHON_PATH, \CHECKPOINTS_DIR, \OUTPUT_DIR, \LLAMA_*\
\skills/comfyui/run_comfyui.ps1\	\, \, \, \
\skills/llama-watchdog.ps1\	llama-server path, restart script path
\skills/cleanup_orphans.ps1\	Project directory, task_flags directory

4. Deploy to OpenClaw

If you haven't already run setup-openclaw.ps1/setup-openclaw.sh, you can manually use AI_Girlfriend/ as your OpenClaw workspace. Configure the qqbot channel to point to this directory.

5. Windows Task Scheduler

# Llama health check (every 10 min)
schtasks /create /tn "llama-watchdog" `
  /tr "powershell -File C:\Users\<you>\.openclaw\workspace\qqbot\skills\llama-watchdog.ps1" `
  /sc minute /mo 10

# Orphan process cleanup (hourly)
schtasks /create /tn "cleanup-qqbot-orphans" `
  /tr "powershell -File C:\Users\<you>\.openclaw\workspace\qqbot\skills\cleanup_orphans.ps1" `
  /sc hourly /mo 1

6. Apply Config Patch

Apply config-patch.json via OpenClaw: gateway config.patch.apply.

QQ Bot Setup

See docs/qqbot-setup.md.

Quick setup:

Go to QQ Open Platform to create a private-domain bot. Get your AppID + ClientSecret
Edit config-qqbot.json, replacing <YOUR_QQ_APP_ID> and <YOUR_QQ_CLIENT_SECRET>
Apply config: openclaw gateway call config.patch.apply --json --params (Get-Content config-qqbot.json -Raw)
QQ Bot channel supports hot-reload — no restart needed

Telegram Setup

See docs/telegram-setup.md.

Quick setup:

Create a bot via @BotFather and get the Token
Edit config-telegram.json, replacing <YOUR_BOT_TOKEN>
Apply config: openclaw gateway call config.patch.apply --json --params (Get-Content config-telegram.json -Raw)
Restart: openclaw gateway restart

Inspired by the Telegram integration design from arlanrakh/talk-to-girlfriend-ai.

Architecture

User (QQ / Telegram) ──────── Sakura Desktop Pet (PySide6, optional)
  │                                    │
  ▼                                    ▼
OpenClaw Gateway              LocalLlamaClient
  │                               (lifecycle-aware)
  ▼                                    │
  ┌───── llama-server :8080 (Qwen3.6-35B) ─────┐
  │         All skills share one brain           │
  ├──────────────────────────────────────────────┤
  │                                              │
  ├── Main session (roleplay, QQ + Telegram)     │
  ├── TTS (kill llama → GPU inference → restart) │
  ├── ComfyUI (kill llama → GPU inference → restart)
  └── Sakura (detect down → wait → auto-resume)  │

Three Skills, One Brain:

Skill	Location	Llama Interaction
TTS	`skills/tts/`	Kill llama → GPT-SoVITS inference → restart llama + wait for `/health` → `/completion`
ComfyUI	`skills/comfyui/`	Kill llama → image generation → restart llama + wait for `/health` → `/completion`
Sakura	`skills/sakura/`	Send requests to llama; detect down → poll `/health` + `/completion` → auto-resume (never kills llama itself)

VRAM Orchestration Flow:

Main session receives user request → assembles command
sessions_spawn(mode="run") creates local model sub-session
Sub-session execs PS script → stop_llama() kills llama-server
Full 8 GB VRAM freed → TTS/ComfyUI inference
start_llama() restarts llama-server (~12s load + ~3s warmup)
Sub-session writes .task_flags → announces back to main session
Main session reads media files → sends via <qqmedia> (QQ) + MEDIA: (Telegram) to user

⚠️ Important Notes

Llama-server is offline for ~60–120s during TTS/ComfyUI inference — conversation pauses
Sub-sessions use local model (same as main), with DeepSeek as optional fallback — fully offline capable, zero network dependency
Llama-server does not support cross-turn prompt cache reuse (SSM architecture limitation) — use periodic /reset
All model files protected by .gitignore, not committed to git
GPT-SoVITS weights are self-trained and not distributed here — train with your own voice data
Sakura Desktop Pet shares the same llama-server via LocalLlamaClient — waits silently when llama is killed by TTS/ComfyUI, resumes automatically when ready

🙏 Credits

@Rvosy — Creator of Sakura Desktop Pet, authorized for inclusion (Issue #38)

✨ Why This Project?

🎬 Demo

Hardware

Features

Models

One-command Download

Local Configuration

Local LLM Performance

Launch Command

Key Metrics

Long Context Stability

VRAM Budget

Directory Structure

Prerequisites

Quick Start

🚀 All-in-One (Recommended)

Step-by-Step (if you prefer manual control)

0. Setup OpenClaw

1. Download Models

2. Setup llama.cpp

3. Configure Paths ⚡ Interactive wizard recommended

4. Deploy to OpenClaw

5. Windows Task Scheduler

6. Apply Config Patch

QQ Bot Setup

Telegram Setup

Architecture

⚠️ Important Notes

🙏 Credits

Comments

Related Posts

Surya-Hariharan/Velune-CLI: open-source project for self-hosters

JakubWawrzola/deskmate: modern, open-source Windows companion for Home Assistant — the spiritual successor

mergeos-bounties/HIRI: open-source utility for developers and tinkerers

madebyparth/nova-ai: an open-source ESP32-powered AI voice assistant built with FastAPI and Gemini Live