Completely free AI video generator β€” Built on Agnes AI's free models, no subscription, no high-end GPU, no usage limits. Type in a text idea and automatically generate multi-scene AI videos with narration and subtitles. Supports text-to-video, image-to-video, keyframes animation, digital anchor, and more. All AI compute runs in the cloud β€” a regular laptop is all you need.

"The solution is not to suppress AI, but to make it a more equitable capability, so that everyone knows how to create more with AI. This is a very important vision for our company β€” to make world-class AI belong to everyone. What we can do may be insignificant, but this vision is very long-term and enduring."

β€” Bruce Yang, Founder of Agnes AI

οΏ½οΏ½οΏ½οΏ½ Official Website | πŸ“ Blog (δΈ­ζ–‡) | πŸ“ Blog (English)

πŸŽ₯ Demo

1. Creative Video β€” No Narration

A dark-twist fairytale β€” The Frog Prince, 5 scenes, keyframes chaining, fully auto-generated.

2. Creative Video β€” With TTS Narration

Same Frog Prince story, now with AI-generated TTS narration and auto subtitles.

3. Manuscript Video β€” Text-to-Video

Paste a long article or script β†’ auto-split into segments β†’ AI video per segment β†’ unified TTS narration + subtitles β†’ final video.

Click to watch on Douyin

Why Agnes Video Generator?

Making AI videos today has an absurdly high barrier. Overseas services like Runway and Pika charge monthly subscriptions of tens of dollars. Domestic platforms like Jimeng and Keling charge by the second once their free quotas run out. Want to run open-source models locally? A GPU capable of video generation easily costs over ten thousand RMB. For most people who want to try AI video creation, the door is essentially closed.

We believe what Bruce Yang said β€” AI should be a more equitable capability. World-class AI should belong to everyone, not just those who can afford the bill.

To be honest, Agnes's video model isn't perfect yet. The generated frames are sometimes unstable, and complex actions occasionally deform. But it is completely free with no usage limits, and it iterates fast. We choose to grow with it rather than wait for a "perfect" commercial solution. If you share this mindset, then this project is for you β€” all you need is a free Agnes AI API key and an ordinary computer that can run Python to start creating AI videos at zero cost.

Comparison: Agnes vs. Commercial AI Video Tools

Feature Agnes Video Generator Runway Gen-3 Pika 2.0 OpenAI Sora Kling 1.6
Price Free $15–$95/month $10–$28/month $20+/month (limited) Free quota, then pay-per-second
Open Source βœ… Yes (MIT) ❌ No ❌ No ❌ No ❌ No
Self-Hosted βœ… Yes ❌ No ❌ No ❌ No ❌ No
Max Video Length 20s per clip, unlimited scenes 10s per clip 10s per clip 20s per clip 10s per clip
Multi-Scene Pipeline βœ… Built-in (Creative/Manuscript) ❌ Manual editing ❌ Manual editing ❌ Manual editing ❌ Manual editing
AI Narration (TTS) βœ… Free, built-in ❌ Third-party ❌ Third-party ❌ Not available ❌ Not available
Auto Subtitles βœ… Word-level SRT ❌ Not available ❌ Not available ❌ Not available ❌ Not available
Digital Anchor βœ… Built-in ❌ No ❌ No ❌ No ❌ No
Resolution Options 9:16 / 16:9 / 1:1 Multiple Multiple Multiple Multiple
Image-to-Video βœ… Yes βœ… Yes βœ… Yes βœ… Image inputs βœ… Yes
Keyframes Animation βœ… Yes βœ… Yes βœ… Yes ❌ Not available ❌ Not available
Local GPU Required ❌ No (cloud API) ❌ No (cloud) ❌ No (cloud) ❌ No (cloud) ❌ No (cloud)
Watermark No watermark Built-in watermark Built-in watermark C2PA metadata Built-in watermark
Usage Limit No limit (16 req/min rate limit) Billed by compute Billed by generation Billed by generation Billed by generation

✨ Core Features

🎬 Multiple Creation Modes

Mode Description Best For
Simple Video Single prompt β†’ single AI video. Full control over all parameters (generation mode, duration, resolution, seed, negative prompt). Also supports image-to-video and keyframes mode. Quick single-clip AI video
Creative Video Full AI pipeline: idea β†’ story β†’ script β†’ character reference β†’ multi-scene video β†’ narration β†’ subtitles β†’ final output. 10-step pipeline, fully automated. Storytelling, creative videos
Manuscript Video Paste a long article or script β†’ auto-split by reading duration β†’ per-segment AI video β†’ unified TTS narration + subtitle overlay β†’ final output. 5-step pipeline. Explainers, course content, vlogs
Digital Anchor AI-generated digital anchor (or upload custom image) β†’ dynamic anchor clip β†’ TTS narration β†’ subtitle positioning β†’ looped concatenation. Optional reference image for appearance consistency. Virtual anchors, product presentations, news broadcasts

πŸ†“ Completely Free AI Model Chain

All core AI capabilities are completely free β€” no trial period, no watermarks, no token limits:

Capability Model Cost
Text / Script Generation agnes-2.0-flash Free
Image Generation agnes-image-2.1-flash Free
Video Generation agnes-video-v2.0 Free
Text-to-Speech Narration Edge TTS (Microsoft) Free, no extra API key needed

All AI API calls share a global token bucket rate limiter (16 requests/min), with automatic retries and exponential backoff to ensure stable operation.

πŸŽ™οΈ AI Narration & Smart Subtitles

Both Creative Video and Manuscript Video support:

  • Free TTS narration: Based on Microsoft Edge TTS, offering 4 Chinese voice roles (gentle female, steady male, lively female, young male) with adjustable speech rate (-30% to +30%)
  • Word-level fine-grained subtitles: SRT subtitles generated from TTS word-level timestamps, one entry every 2-3 seconds, with precise audio-video sync
  • Multi-line auto-wrapping: Long subtitle text is intelligently split into two lines, preferring punctuation break points to prevent screen overflow
  • Fully configurable subtitle style: Font, color, size, position (top/bottom), stroke, and semi-transparent background
  • Audio-video sync strategy: All video clips are concatenated first, then audio and subtitles are overlaid as a whole, avoiding cumulative errors from per-segment overlay. TTS output is automatically amplified 2.5Γ— to compensate for Edge TTS's low default volume

🎨 Flexible Creative Controls

  • Custom reference images β€” Upload character or scene reference images to maintain visual consistency across scenes
  • Custom end frames β€” Specify end frame images per scene for precise visual transition control
  • Image-to-image end frames β€” Auto-generate scene end frames via img2img from your reference image
  • Three video chaining modes β€” keyframes (first+last frame interpolation, recommended) / ti2vid (inter-scene transition frames) / none (independent scenes)
  • Multiple resolutions β€” Portrait 9:16 (768Γ—1152), Landscape 16:9 (1152Γ—768), Square 1:1 (1024Γ—1024)
  • Flexible duration β€” Custom scene duration
  • Smart manuscript splitting β€” Splits by period/question mark/exclamation mark, greedily merges into 5-12 second segments based on reading speed (~4 chars/sec), preserves long sentences, auto-merges short sentences forward

πŸ”§ Production-Grade Reliability

  • Checkpoint resume β€” Automatically resumes from the last checkpoint after interruption; state is persisted after each step, no duplicate API calls
  • Task management β€” Create, view, resume, and stop tasks from the Web UI
  • Real-time progress β€” WebSocket pushes per-step generation progress (step name, status, percentage, current/total)
  • Built-in CJK fonts β€” Project ships with Chinese fonts, no garbled characters in subtitle rendering

πŸ€– AI Agent Friendly

Designed specifically for AI coding assistants (Claude, Cursor, QoderWork, etc.), with a complete AGENTS.md deployment guide. AI Agents can automatically:

  • Check environment (Python 3.10+, ffmpeg)
  • Install dependencies and start the server
  • Configure API key
  • Run 4-layer deployment verification (connectivity β†’ static analysis β†’ endpoint testing β†’ subtitle feature)
  • Execute 10-scenario regression test suite

🌐 Multilingual Web UI

One-click launch, operate entirely in the browser. Interface available in 7 languages: δΈ­ζ–‡, English, Русский, ζ—₯本θͺž, ν•œκ΅­μ–΄, Bahasa Melayu, Bahasa Indonesia.

πŸš€ Quick Start

Prerequisites

  • Python 3.10+
  • ffmpeg (for video concatenation and audio processing)

That's it. No GPU, no large RAM, a regular laptop is all you need.

Option A: Manual Setup

Step 1 β€” Clone & Launch

git clone https://github.com/lcy362/agnes-video-generator.git
cd agnes-video-generator
./start.sh

The script automatically creates a virtual environment, installs dependencies, and opens http://localhost:8765 in your browser. You can also start manually:

python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python server.py

Step 2 β€” Configure API Key

Get a free API key from Agnes AI, then choose one of two ways:

# Way 1: Environment variable
export AGNES_API_KEY="your-api-key"

# Way 2: Via API (same as entering it in the Web UI)
curl -X POST http://localhost:8765/api/config \
  -H "Content-Type: application/json" \
  -d '{"api_key": "your-api-key"}'

Step 3 β€” Create Your First Video

Open http://localhost:8765, choose a video mode (Simple / Creative / Manuscript / Anchor), enter your idea, and click "Start Generating".

Option B: AI Agent Assisted Setup

This project is designed for AI coding assistants. First, download the code and prepare your API key:

git clone https://github.com/lcy362/agnes-video-generator.git
cd agnes-video-generator

Then tell your agent:

"Read the AGENTS.md in this project, install dependencies, configure the API key <your-key>, and start the server."

The agent will read AGENTS.md (a comprehensive deployment guide) and handle: environment checks (Python 3.10+, ffmpeg), pip install, server launch, and API key configuration. After startup, you can also ask the agent to verify the deployment:

"Run the deployment verification checks."

The agent will execute the 4-layer checklist from AGENTS.md (connectivity β†’ static analysis β†’ endpoint testing β†’ subtitle feature) and report results.

πŸ“– Usage

1. Configure API Key

Enter your free Agnes AI API key at the top of the page and save it. Or set it via environment variable:

export AGNES_API_KEY="your-api-key"

2. Choose a Video Mode

Simple Video

Quick single-clip generation with full parameter control:

Field Description
Prompt Describe the AI video scene in natural language
Generation Mode Text-to-Video / Image-to-Video / Text+Image / Keyframes
Resolution Portrait 9:16 / Landscape 16:9 / Square 1:1
Duration 5s / 10s / 15s / 18s / 20s
Reference Image Optional upload for image-to-video modes
End Frame Image Optional end frame for keyframes mode

Creative Video

AI-driven multi-scene storytelling:

Field Description Required
Idea Describe your AI video concept Yes
User Requirements Scene count, duration, and other constraints -
Visual Style Cinematic realism, anime, cyberpunk, etc. -
Chaining Mode keyframes (recommended) / ti2vid / none -
Narration Enable/disable TTS narration, choose voice and speed -
Subtitle Style Font, color, size, position, stroke, background -
Reference Image Optional character reference for visual consistency -
End Frames Custom or auto-generated per-scene end frames -

Manuscript Video

Long-form text to narrated video:

Field Description Required
Manuscript Text Paste your full article, script, or narration Yes
Resolution Portrait / Landscape / Square -
Narration Voice role and speech rate -
Subtitle Style Full subtitle customization -

Note: Segment duration is auto-calculated based on text length (~4 chars/sec, 5–12s per segment) β€” no manual setting needed.

Digital Anchor

Field Description Required
Anchor Script Enter the text the anchor will say Yes
Anchor Image AI-generated or upload custom reference image -
Resolution Portrait / Landscape / Square -
Narration Voice role and speech rate -
Subtitle Style Full subtitle customization -

3. Click "Start Generating"

The progress panel shows real-time generation status for each step. For Creative Video: Init β†’ Image Analysis β†’ Story β†’ Character Reference β†’ Script β†’ Narration β†’ End Frame Prompts β†’ End Frame Generation β†’ Video Generation β†’ Audio & Subtitles β†’ Concatenation.

4. Checkpoint Resume & Task Management

If the server is interrupted, restart it and find the incomplete task in the "Task List" tab. Click "Resume" to continue from the last checkpoint. Running tasks can also be stopped and resumed later.

πŸ—οΈ Project Structure

agnes-video-generator/
β”œβ”€β”€ start.sh                          # One-click launch script
β”œβ”€β”€ requirements.txt                  # Python dependencies
β”œβ”€β”€ server.py                         # FastAPI server (REST + WebSocket)
β”œβ”€β”€ static/
β”‚   └── index.html                    # Frontend SPA β€” 5 task tabs, 7 languages (Tailwind CSS)
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ config.py                     # API key, font resolution, default configs
β”‚   β”œβ”€β”€ screenwriter.py               # Screenwriter Agent (LLM-powered story/script/narration)
β”‚   β”œβ”€β”€ task_manager.py               # Task state persistence & checkpoint resume
β”‚   β”œβ”€β”€ api/
β”‚   β”‚   β”œβ”€β”€ agnes_chat.py             # LLM Chat API (agnes-2.0-flash)
β”‚   β”‚   β”œβ”€β”€ agnes_image.py            # Image generation API (agnes-image-2.1-flash / 2.0-flash)
β”‚   β”‚   β”œβ”€β”€ agnes_video.py            # Video generation API (agnes-video-v2.0)
β”‚   β”‚   └── rate_limiter.py           # Global token bucket rate limiter (16 requests/min)
β”‚   β”œβ”€β”€ audio/
β”‚   β”‚   β”œβ”€β”€ tts.py                    # Edge TTS engine + silent fallback engine
β”‚   β”‚   └── subtitle.py               # SRT generation (fine-grained word-level) + overlay
β”‚   β”œβ”€β”€ compositor/
β”‚   β”‚   β”œβ”€β”€ concatenator.py           # Video concatenation + audio/subtitle overlay
β”‚   β”‚   └── processor.py              # Video resize, frame extraction, freeze, silence gen
β”‚   └── pipelines/
β”‚       β”œβ”€β”€ simple_video.py           # Pipeline: Simple Video
β”‚       β”œβ”€β”€ creative_video.py         # Pipeline: Creative Video (10-step)
β”‚       β”œβ”€β”€ manuscript_video.py       # Pipeline: Manuscript Video (5-step)
β”‚       └── anchor_video.py           # Pipeline: Digital Anchor
β”œβ”€β”€ models/
β”‚   └── task.py                       # Data models (5 task types, configs, requests)
β”œβ”€β”€ resource/
β”‚   └── fonts/                        # Built-in CJK fonts for subtitle rendering
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ image.py                      # Image download / base64 conversion
β”‚   └── video.py                      # Video download
β”œβ”€β”€ scripts/
β”‚   └── regression_runner.py          # 10-scenario regression test suite
└── docs/
    β”œβ”€β”€ regression_test_plan.md       # Regression test plan
    β”œβ”€β”€ plans-v1.0/                   # v1.0 design & planning docs
    β”œβ”€β”€ plans-v2.0/                   # v2.0 review & optimization docs
    └── plans-v3.0/                   # v3.0 feature planning docs

πŸ”§ Tech Stack

Layer Choice Notes
Backend Python FastAPI Async + WebSocket
Frontend HTML/CSS/JS + Tailwind CSS CDN Zero build steps, single-file SPA
LLM Agnes Chat (agnes-2.0-flash) Free β€” story, script, narration generation
Image AI agnes-image-2.1-flash (t2i) / agnes-image-2.0-flash (i2i) Free β€” reference images, end frames, standalone image generation
Video AI agnes-video-v2.0 Free β€” text-to-video, image-to-video, keyframes
TTS Edge TTS (Microsoft) Free β€” 4 Chinese voices, no extra API key needed
Subtitles moviepy + srt Fine-grained word-level SRT, multi-line wrapping
Video Processing moviepy + ffmpeg Concatenation, subtitle overlay, audio mixing

🎬 Three AI Video Chaining Modes

Mode How It Works Best For
keyframes Specify first + last frame per scene; server auto-interpolates transitions Smooth transitions (recommended)
ti2vid Last frame of previous scene β†’ img2img transition β†’ first frame of next scene Visual continuity between scenes
none All scenes share the same reference image, independent of each other Fast output, independent scenes

πŸ“‹ API Endpoints

Method Path Description
GET / Serve Web UI
GET /api/config Get API key (masked)
POST /api/config Save API key
DELETE /api/config Delete API key
GET /api/voices List available TTS voices
POST /api/image/generate Image generation
GET /api/image/{task_id} Query image task status
POST /api/tasks/simple Create simple video task
POST /api/tasks/creative Create creative video task
POST /api/tasks/manuscript Create manuscript video task
POST /api/tasks/anchor Create digital anchor task
POST /api/tasks Generic task creation (backward-compatible)
GET /api/tasks List all tasks (with type badges)
GET /api/tasks/{id} Get task details
POST /api/tasks/{id}/resume Resume an interrupted task
POST /api/tasks/{id}/stop Stop a running task
GET /api/video/{id} Download/stream final video
WS /ws/{id} WebSocket real-time progress

⚠️ Important Notes

This project is in early stage β€” corner cases may not be fully handled. Recommended workflow:

  1. Fill in your idea on the page and submit an AI video task
  2. Watch the console logs (the terminal running server.py) and be patient
  3. All key operations are logged for easy debugging

Log Reference

All important operations are logged to the server console:

Prefix Module
[Startup] Server startup, stale task reset
[WS] WebSocket connect/disconnect
[Resume] / [Stop] Task resume/stop
[Pipeline] / [Simple] / [Manuscript] Pipeline step execution
[TTS] / [Subtitle] Audio and subtitle generation
[Compositor] Video concatenation and processing
[AgnesImage] / [AgnesVideo] / [AgnesChat] AI API calls
[RateLimiter] Global rate limiter
[TaskManager] Task state persistence
[Screenwriter] Screenwriter Agent

Output Directory

All AI video task artifacts are stored under .working_dir/{timestamp}_{task_id}/:

.working_dir/{timestamp}_{task_id}/
β”œβ”€β”€ task_state.json              # Task state (required for checkpoint resume)
β”œβ”€β”€ final_video.mp4              # Final video with narration + subtitles
β”œβ”€β”€ story.txt                    # AI-generated story (creative mode)
β”œβ”€β”€ script.json                  # Scene script (JSON format)
β”œβ”€β”€ narration.mp3                # Combined TTS narration audio
β”œβ”€β”€ narration.srt                # Combined subtitle file
β”œβ”€β”€ scene_0/
β”‚   β”œβ”€β”€ video.mp4                # Scene 0 AI video
β”‚   β”œβ”€β”€ end_frame.png            # Scene 0 end frame
β”‚   └── task.json                # Video generation task ID
β”œβ”€β”€ scene_1/
β”‚   └── ...
└── scene_2/
    └── ...

πŸ™ Acknowledgments

This project is built upon the following open-source projects:

Special thanks to Agnes AI for providing completely free, high-quality AI model APIs (text, image, and video generation) β€” this project runs at absolute zero cost thanks to their generosity.

Feedback & Contributing

Bug reports and feature suggestions are welcome via GitHub Issues.

πŸ“„ License

MIT


❓ FAQ

Is Agnes Video Generator really free? Are there any hidden costs?

Yes, it is completely free. All AI model calls (Agnes Chat, Agnes Image, Agnes Video) are free of charge with no trial period, no watermarks, and no usage limits. The only TTS integration (Microsoft Edge TTS) is also free and requires no extra API key. You only need a free API key from Agnes AI to get started.

Do I need a GPU to run this AI video generator?

No. All AI compute runs in the cloud via Agnes AI's free API. You just need a regular laptop or desktop computer that can run Python 3.10+ and ffmpeg. No GPU, no high RAM, no special hardware required.

How is this different from Runway, Pika, or Sora?

Unlike commercial AI video tools that charge $10–$95/month, Agnes Video Generator is completely free and open-source (MIT). It offers built-in multi-scene pipelines, AI narration, auto subtitles, and digital anchor β€” features that require third-party tools or manual editing elsewhere. See the comparison table above for details.

What video generation modes are supported?

Four modes: Simple Video (single prompt, full parameter control), Creative Video (AI story β†’ multi-scene video with narration), Manuscript Video (long text β†’ auto-split β†’ narrated video), and Digital Anchor (AI anchor with TTS). Additional options include text-to-video, image-to-video, keyframes animation, and image-to-image end frame generation.

Can I use my own images as references?

Yes. You can upload reference images for character or scene consistency across scenes, use custom end frames for precise visual transitions, or choose img2img to auto-generate end frames from your reference. Reference images are supported in both Creative Video and Digital Anchor modes.

What languages does the UI support?

The Web UI supports 7 languages: δΈ­ζ–‡, English, Русский, ζ—₯本θͺž, ν•œκ΅­μ–΄, Bahasa Melayu, and Bahasa Indonesia. Subtitles are generated in the source text language with CJK font support built-in.

Can I host this on my own server?

Absolutely. The project is designed for self-hosting. Just clone the repo, run ./start.sh, and the server starts on http://localhost:8765. No external dependencies, no cloud lock-in. See the Quick Start section above.

How do I get help or report issues?

Check the GitHub Issues page for existing reports or open a new one. The project also includes a comprehensive AGENTS.md for AI-agent-assisted debugging. For feature requests, bug reports, or questions, the Issues page is the best place.

Keywords: free AI video generator, AI video generation tool, text to video AI, free AI video maker, AI video creator, open source video generator, Agnes AI, text-to-video, image-to-video, keyframes video, AI narration, auto subtitles, multi-scene video, zero cost AI video, no subscription AI video tool, digital anchor, self-hosted AI video generator, open source alternative to Runway