Completely free AI video generator. Built on Agnes AI's free models: no subscription, no high-end GPU, no usage limits. Type in a text idea and automatically generate multi-scene AI videos with narration and subtitles. Supports text-to-video, image-to-video, keyframes animation, digital anchor, and more. All AI compute runs in the cloud, so a regular laptop is all you need.
"The solution is not to suppress AI, but to make it a more equitable capability, so that everyone knows how to create more with AI. This is a very important vision for our company: to make world-class AI belong to everyone. What we can do may be insignificant, but this vision is very long-term and enduring."
- Bruce Yang, Founder of Agnes AI
Official Website | Blog (中文) | Blog (English)
Demo
1. Creative Video: No Narration
A dark-twist fairytale, The Frog Prince: 5 scenes, keyframes chaining, fully auto-generated.
2. Creative Video: With TTS Narration
Same Frog Prince story, now with AI-generated TTS narration and auto subtitles.
3. Manuscript Video: Text-to-Video
Paste a long article or script → auto-split into segments → AI video per segment → unified TTS narration + subtitles → final video.
Click to watch on Douyin
Why Agnes Video Generator?
Making AI videos today has an absurdly high barrier. Overseas services like Runway and Pika charge monthly subscriptions of tens of dollars. Domestic platforms like Jimeng and Kling charge by the second once their free quotas run out. Want to run open-source models locally? A GPU capable of video generation easily costs over ten thousand RMB. For most people who want to try AI video creation, the door is essentially closed.
We believe what Bruce Yang said: AI should be a more equitable capability. World-class AI should belong to everyone, not just those who can afford the bill.
To be honest, Agnes's video model isn't perfect yet. The generated frames are sometimes unstable, and complex actions occasionally deform. But it is completely free with no usage limits, and it iterates fast. We choose to grow with it rather than wait for a "perfect" commercial solution. If you share this mindset, then this project is for you: all you need is a free Agnes AI API key and an ordinary computer that can run Python to start creating AI videos at zero cost.
Comparison: Agnes vs. Commercial AI Video Tools
| Feature | Agnes Video Generator | Runway Gen-3 | Pika 2.0 | OpenAI Sora | Kling 1.6 |
|---|---|---|---|---|---|
| Price | Free | $15-$95/month | $10-$28/month | $20+/month (limited) | Free quota, then pay-per-second |
| Open Source | ✅ Yes (MIT) | ❌ No | ❌ No | ❌ No | ❌ No |
| Self-Hosted | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
| Max Video Length | 20s per clip, unlimited scenes | 10s per clip | 10s per clip | 20s per clip | 10s per clip |
| Multi-Scene Pipeline | ✅ Built-in (Creative/Manuscript) | ❌ Manual editing | ❌ Manual editing | ❌ Manual editing | ❌ Manual editing |
| AI Narration (TTS) | ✅ Free, built-in | ❌ Third-party | ❌ Third-party | ❌ Not available | ❌ Not available |
| Auto Subtitles | ✅ Word-level SRT | ❌ Not available | ❌ Not available | ❌ Not available | ❌ Not available |
| Digital Anchor | ✅ Built-in | ❌ No | ❌ No | ❌ No | ❌ No |
| Resolution Options | 9:16 / 16:9 / 1:1 | Multiple | Multiple | Multiple | Multiple |
| Image-to-Video | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Image inputs | ✅ Yes |
| Keyframes Animation | ✅ Yes | ✅ Yes | ✅ Yes | ❌ Not available | ❌ Not available |
| Local GPU Required | No (cloud API) | No (cloud) | No (cloud) | No (cloud) | No (cloud) |
| Watermark | No watermark | Built-in watermark | Built-in watermark | C2PA metadata | Built-in watermark |
| Usage Limit | No limit (16 req/min rate limit) | Billed by compute | Billed by generation | Billed by generation | Billed by generation |
Core Features
Multiple Creation Modes
| Mode | Description | Best For |
|---|---|---|
| Simple Video | Single prompt → single AI video. Full control over all parameters (generation mode, duration, resolution, seed, negative prompt). Also supports image-to-video and keyframes mode. | Quick single-clip AI video |
| Creative Video | Full AI pipeline: idea → story → script → character reference → multi-scene video → narration → subtitles → final output. 10-step pipeline, fully automated. | Storytelling, creative videos |
| Manuscript Video | Paste a long article or script → auto-split by reading duration → per-segment AI video → unified TTS narration + subtitle overlay → final output. 5-step pipeline. | Explainers, course content, vlogs |
| Digital Anchor | AI-generated digital anchor (or upload custom image) → dynamic anchor clip → TTS narration → subtitle positioning → looped concatenation. Optional reference image for appearance consistency. | Virtual anchors, product presentations, news broadcasts |
Completely Free AI Model Chain
All core AI capabilities are completely free: no trial period, no watermarks, no token limits.
| Capability | Model | Cost |
|---|---|---|
| Text / Script Generation | agnes-2.0-flash | Free |
| Image Generation | agnes-image-2.1-flash | Free |
| Video Generation | agnes-video-v2.0 | Free |
| Text-to-Speech Narration | Edge TTS (Microsoft) | Free, no extra API key needed |
All AI API calls share a global token bucket rate limiter (16 requests/min), with automatic retries and exponential backoff to ensure stable operation.
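The rate limiting and retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual rate_limiter.py: the names `TokenBucket`, `backoff_delays`, and `call_with_retry`, as well as the backoff parameters, are assumptions made for this example. Only the 16 requests/min figure comes from this README.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: `capacity` tokens refilled evenly over `period` seconds."""

    def __init__(self, capacity=16, period=60.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = capacity / period  # tokens per second
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Refill based on elapsed time, then take one token if available."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delays(retries=4, base=1.0, factor=2.0):
    """Exponential backoff delays: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(retries)]


def call_with_retry(func, bucket, retries=4, sleep=time.sleep):
    """Call `func` once a token is available; on failure, wait and retry."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        while not bucket.try_acquire():
            sleep(1.0 / bucket.refill_rate)  # wait roughly one token's worth of time
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
```

Sharing one bucket instance across the chat, image, and video clients is what would make the limit global rather than per-endpoint.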
AI Narration & Smart Subtitles
Both Creative Video and Manuscript Video support:
- Free TTS narration: Based on Microsoft Edge TTS, offering 4 Chinese voice roles (gentle female, steady male, lively female, young male) with adjustable speech rate (-30% to +30%)
- Word-level fine-grained subtitles: SRT subtitles generated from TTS word-level timestamps, one entry every 2-3 seconds, with precise audio-video sync
- Multi-line auto-wrapping: Long subtitle text is intelligently split into two lines, preferring punctuation break points to prevent screen overflow
- Fully configurable subtitle style: Font, color, size, position (top/bottom), stroke, and semi-transparent background
- Audio-video sync strategy: All video clips are concatenated first, then audio and subtitles are overlaid as a whole, avoiding cumulative errors from per-segment overlay. TTS output is automatically amplified 2.5× to compensate for Edge TTS's low default volume
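As a rough illustration of how word-level timestamps can be grouped into short SRT entries, here is a minimal sketch. It assumes the TTS step yields `(text, start_seconds, end_seconds)` tuples; that input shape, the function names, and the 3-second cap are assumptions for this example and may not match the project's subtitle.py.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_duration=3.0, sep=" "):
    """Group (text, start, end) word timings into entries of at most max_duration seconds."""
    entries, current = [], []
    for word in words:
        # Close the current entry if this word would push it past the cap.
        if current and word[2] - current[0][1] > max_duration:
            entries.append(current)
            current = []
        current.append(word)
    if current:
        entries.append(current)

    blocks = []
    for index, group in enumerate(entries, start=1):
        start, end = group[0][1], group[-1][2]
        text = sep.join(w[0] for w in group)
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For Chinese narration, passing `sep=""` avoids inserting spaces between words. Because every entry's times come straight from the word timings, the subtitles stay aligned with the single combined narration track.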
Flexible Creative Controls
- Custom reference images: Upload character or scene reference images to maintain visual consistency across scenes
- Custom end frames: Specify end frame images per scene for precise visual transition control
- Image-to-image end frames: Auto-generate scene end frames via img2img from your reference image
- Three video chaining modes: keyframes (first+last frame interpolation, recommended) / ti2vid (inter-scene transition frames) / none (independent scenes)
- Multiple resolutions: Portrait 9:16 (768×1152), Landscape 16:9 (1152×768), Square 1:1 (1024×1024)
- Flexible duration: Custom scene duration
- Smart manuscript splitting: Splits by period/question mark/exclamation mark, greedily merges into 5-12 second segments based on reading speed (~4 chars/sec), preserves long sentences, auto-merges short sentences forward
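The manuscript splitting rule in the last bullet can be sketched like this. It is one possible reading of the description, not the project's actual code: the function names are hypothetical, and the handling of edge cases (a short segment followed by a long sentence, a short trailing remainder) is an assumption. The 4 chars/sec speed and the 5-12 second range come from this README.

```python
import re

CHARS_PER_SEC = 4.0          # reading speed stated in this README
MIN_SEC, MAX_SEC = 5.0, 12.0  # target segment length stated in this README

# A sentence is a run of text plus its closing punctuation (Chinese or Western).
SENTENCE_RE = re.compile(r"[^。？！.?!]+[。？！.?!]*\s*")


def estimate_seconds(segment):
    """Estimated narration time for a piece of text."""
    return len(segment) / CHARS_PER_SEC


def split_manuscript(text):
    """Greedily merge sentences into segments of roughly MIN_SEC to MAX_SEC."""
    segments, current = [], ""
    for sentence in SENTENCE_RE.findall(text.strip()):
        long_enough = estimate_seconds(current) >= MIN_SEC
        too_long = estimate_seconds(current + sentence) > MAX_SEC
        if current and long_enough and too_long:
            segments.append(current)
            current = sentence
        else:
            # Short segments keep absorbing the next sentence (merge forward).
            current += sentence
    if current:
        if segments and estimate_seconds(current) < MIN_SEC:
            segments[-1] += current  # assumption: a short tail joins the previous segment
        else:
            segments.append(current)
    return [s.strip() for s in segments]
```

A single sentence longer than 12 seconds is kept whole rather than cut mid-sentence, which matches the "preserves long sentences" behavior. Note that this simple pattern would also split on decimal points such as "3.5".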
Production-Grade Reliability
- Checkpoint resume: Automatically resumes from the last checkpoint after interruption; state is persisted after each step, so no API call is repeated
- Task management: Create, view, resume, and stop tasks from the Web UI
- Real-time progress: WebSocket pushes per-step generation progress (step name, status, percentage, current/total)
- Built-in CJK fonts: Project ships with Chinese fonts, so subtitles render without garbled characters
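The checkpoint-resume idea (persist state after every step, skip finished steps on restart) can be illustrated with a small sketch. The JSON layout, the function names, and the temp-file write are assumptions for this example; the real task_manager.py may store state differently.

```python
import json
from pathlib import Path


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "results": {}}


def save_state(path, state):
    """Write to a temp file first so an interruption cannot leave a half-written checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)


def run_pipeline(steps, state_path):
    """Run (name, func) steps in order, skipping any already recorded as completed."""
    state = load_state(state_path)
    for name, func in steps:
        if name in state["completed"]:
            continue  # finished in an earlier run: no duplicate API call
        state["results"][name] = func(state["results"])
        state["completed"].append(name)
        save_state(state_path, state)  # persist after each step
    return state
```

If a step raises partway through, everything before it is already on disk, so calling `run_pipeline` again with the same `state_path` picks up at the failed step.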
AI Agent Friendly
Designed specifically for AI coding assistants (Claude, Cursor, QoderWork, etc.), with a complete AGENTS.md deployment guide. AI Agents can automatically:
- Check environment (Python 3.10+, ffmpeg)
- Install dependencies and start the server
- Configure API key
- Run 4-layer deployment verification (connectivity → static analysis → endpoint testing → subtitle feature)
- Execute 10-scenario regression test suite
Multilingual Web UI
One-click launch, operate entirely in the browser. Interface available in 7 languages: 中文, English, Русский, 日本語, 한국어, Bahasa Melayu, Bahasa Indonesia.
Quick Start
Prerequisites
- Python 3.10+
- ffmpeg (for video concatenation and audio processing)
That's it. No GPU and no large amount of RAM is needed; a regular laptop is enough.
Option A: Manual Setup
Step 1: Clone & Launch
git clone https://github.com/lcy362/agnes-video-generator.git
cd agnes-video-generator
./start.sh
The script automatically creates a virtual environment, installs dependencies, and opens http://localhost:8765 in your browser. You can also start manually:
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/python server.py
Step 2: Configure API Key
Get a free API key from Agnes AI, then choose one of two ways:
# Way 1: Environment variable
export AGNES_API_KEY="your-api-key"
# Way 2: Via API (same as entering it in the Web UI)
curl -X POST http://localhost:8765/api/config \
-H "Content-Type: application/json" \
-d '{"api_key": "your-api-key"}'
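For scripted setups, the same configuration call can be made from Python's standard library. The sketch below mirrors the curl command above; the helper names `build_config_request` and `save_api_key` are made up for this example, and the shape of the server's response is not documented here, so the code only returns the HTTP status.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8765"  # default address from this README


def build_config_request(api_key, base_url=BASE_URL):
    """Build (but do not send) the POST /api/config request from the curl example."""
    body = json.dumps({"api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_api_key(api_key, base_url=BASE_URL, timeout=10):
    """Send the request to a running server and return the HTTP status code."""
    request = build_config_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With the server running, `save_api_key(os.environ["AGNES_API_KEY"])` (after `import os`) would store the key the same way the Web UI does.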
Step 3: Create Your First Video
Open http://localhost:8765, choose a video mode (Simple / Creative / Manuscript / Anchor), enter your idea, and click "Start Generating".
Option B: AI Agent Assisted Setup
This project is designed for AI coding assistants. First, download the code and prepare your API key:
git clone https://github.com/lcy362/agnes-video-generator.git
cd agnes-video-generator
Then tell your agent:
"Read the AGENTS.md in this project, install dependencies, configure the API key
<your-key>, and start the server."
The agent will read AGENTS.md (a comprehensive deployment guide) and handle: environment checks (Python 3.10+, ffmpeg), pip install, server launch, and API key configuration. After startup, you can also ask the agent to verify the deployment:
"Run the deployment verification checks."
The agent will execute the 4-layer checklist from AGENTS.md (connectivity → static analysis → endpoint testing → subtitle feature) and report results.
Usage
1. Configure API Key
Enter your free Agnes AI API key at the top of the page and save it. Or set it via environment variable:
export AGNES_API_KEY="your-api-key"
2. Choose a Video Mode
Simple Video
Quick single-clip generation with full parameter control:
| Field | Description |
|---|---|
| Prompt | Describe the AI video scene in natural language |
| Generation Mode | Text-to-Video / Image-to-Video / Text+Image / Keyframes |
| Resolution | Portrait 9:16 / Landscape 16:9 / Square 1:1 |
| Duration | 5s / 10s / 15s / 18s / 20s |
| Reference Image | Optional upload for image-to-video modes |
| End Frame Image | Optional end frame for keyframes mode |
Creative Video
AI-driven multi-scene storytelling:
| Field | Description | Required |
|---|---|---|
| Idea | Describe your AI video concept | Yes |
| User Requirements | Scene count, duration, and other constraints | - |
| Visual Style | Cinematic realism, anime, cyberpunk, etc. | - |
| Chaining Mode | keyframes (recommended) / ti2vid / none | - |
| Narration | Enable/disable TTS narration, choose voice and speed | - |
| Subtitle Style | Font, color, size, position, stroke, background | - |
| Reference Image | Optional character reference for visual consistency | - |
| End Frames | Custom or auto-generated per-scene end frames | - |
Manuscript Video
Long-form text to narrated video:
| Field | Description | Required |
|---|---|---|
| Manuscript Text | Paste your full article, script, or narration | Yes |
| Resolution | Portrait / Landscape / Square | - |
| Narration | Voice role and speech rate | - |
| Subtitle Style | Full subtitle customization | - |
Note: Segment duration is auto-calculated based on text length (~4 chars/sec, 5-12s per segment), so no manual setting is needed.
Digital Anchor
| Field | Description | Required |
|---|---|---|
| Anchor Script | Enter the text the anchor will say | Yes |
| Anchor Image | AI-generated or upload custom reference image | - |
| Resolution | Portrait / Landscape / Square | - |
| Narration | Voice role and speech rate | - |
| Subtitle Style | Full subtitle customization | - |
3. Click "Start Generating"
The progress panel shows real-time generation status for each step. For Creative Video: Init → Image Analysis → Story → Character Reference → Script → Narration → End Frame Prompts → End Frame Generation → Video Generation → Audio & Subtitles → Concatenation.
4. Checkpoint Resume & Task Management
If the server is interrupted, restart it and find the incomplete task in the "Task List" tab. Click "Resume" to continue from the last checkpoint. Running tasks can also be stopped and resumed later.
Project Structure
agnes-video-generator/
├── start.sh                      # One-click launch script
├── requirements.txt              # Python dependencies
├── server.py                     # FastAPI server (REST + WebSocket)
├── static/
│   └── index.html                # Frontend SPA: 5 task tabs, 7 languages (Tailwind CSS)
├── core/
│   ├── config.py                 # API key, font resolution, default configs
│   ├── screenwriter.py           # Screenwriter Agent (LLM-powered story/script/narration)
│   ├── task_manager.py           # Task state persistence & checkpoint resume
│   ├── api/
│   │   ├── agnes_chat.py         # LLM Chat API (agnes-2.0-flash)
│   │   ├── agnes_image.py        # Image generation API (agnes-image-2.1-flash / 2.0-flash)
│   │   ├── agnes_video.py        # Video generation API (agnes-video-v2.0)
│   │   └── rate_limiter.py       # Global token bucket rate limiter (16 requests/min)
│   ├── audio/
│   │   ├── tts.py                # Edge TTS engine + silent fallback engine
│   │   └── subtitle.py           # SRT generation (fine-grained word-level) + overlay
│   ├── compositor/
│   │   ├── concatenator.py       # Video concatenation + audio/subtitle overlay
│   │   └── processor.py          # Video resize, frame extraction, freeze, silence gen
│   └── pipelines/
│       ├── simple_video.py       # Pipeline: Simple Video
│       ├── creative_video.py     # Pipeline: Creative Video (10-step)
│       ├── manuscript_video.py   # Pipeline: Manuscript Video (5-step)
│       └── anchor_video.py       # Pipeline: Digital Anchor
├── models/
│   └── task.py                   # Data models (5 task types, configs, requests)
├── resource/
│   └── fonts/                    # Built-in CJK fonts for subtitle rendering
├── utils/
│   ├── image.py                  # Image download / base64 conversion
│   └── video.py                  # Video download
├── scripts/
│   └── regression_runner.py      # 10-scenario regression test suite
└── docs/
    ├── regression_test_plan.md   # Regression test plan
    ├── plans-v1.0/               # v1.0 design & planning docs
    ├── plans-v2.0/               # v2.0 review & optimization docs
    └── plans-v3.0/               # v3.0 feature planning docs
Tech Stack
| Layer | Choice | Notes |
|---|---|---|
| Backend | Python FastAPI | Async + WebSocket |
| Frontend | HTML/CSS/JS + Tailwind CSS CDN | Zero build steps, single-file SPA |
| LLM | Agnes Chat (agnes-2.0-flash) | Free: story, script, narration generation |
| Image AI | agnes-image-2.1-flash (t2i) / agnes-image-2.0-flash (i2i) | Free: reference images, end frames, standalone image generation |
| Video AI | agnes-video-v2.0 | Free: text-to-video, image-to-video, keyframes |
| TTS | Edge TTS (Microsoft) | Free: 4 Chinese voices, no extra API key needed |
| Subtitles | moviepy + srt | Fine-grained word-level SRT, multi-line wrapping |
| Video Processing | moviepy + ffmpeg | Concatenation, subtitle overlay, audio mixing |
Three AI Video Chaining Modes
| Mode | How It Works | Best For |
|---|---|---|
| keyframes | Specify first + last frame per scene; server auto-interpolates transitions | Smooth transitions (recommended) |
| ti2vid | Last frame of previous scene → img2img transition → first frame of next scene | Visual continuity between scenes |
| none | All scenes share the same reference image, independent of each other | Fast output, independent scenes |
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | / | Serve Web UI |
| GET | /api/config | Get API key (masked) |
| POST | /api/config | Save API key |
| DELETE | /api/config | Delete API key |
| GET | /api/voices | List available TTS voices |
| POST | /api/image/generate | Image generation |
| GET | /api/image/{task_id} | Query image task status |
| POST | /api/tasks/simple | Create simple video task |
| POST | /api/tasks/creative | Create creative video task |
| POST | /api/tasks/manuscript | Create manuscript video task |
| POST | /api/tasks/anchor | Create digital anchor task |
| POST | /api/tasks | Generic task creation (backward-compatible) |
| GET | /api/tasks | List all tasks (with type badges) |
| GET | /api/tasks/{id} | Get task details |
| POST | /api/tasks/{id}/resume | Resume an interrupted task |
| POST | /api/tasks/{id}/stop | Stop a running task |
| GET | /api/video/{id} | Download/stream final video |
| WS | /ws/{id} | WebSocket real-time progress |
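As a small illustration of how a script might address the per-task endpoints, the sketch below maps action names to the methods and paths from the table. The `TASK_ACTIONS` mapping and `task_request` helper are hypothetical names for this example. Request and response bodies are not documented in this README, so the code only builds the requests and does not assume any payload.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8765"  # default address from this README

# Method and path per action, copied from the endpoint table.
TASK_ACTIONS = {
    "details": ("GET", "/api/tasks/{id}"),
    "resume": ("POST", "/api/tasks/{id}/resume"),
    "stop": ("POST", "/api/tasks/{id}/stop"),
    "video": ("GET", "/api/video/{id}"),
}


def task_request(action, task_id, base_url=BASE_URL):
    """Build (but do not send) the request for one per-task action."""
    method, template = TASK_ACTIONS[action]
    # Percent-encode the ID so unusual characters cannot change the path.
    path = template.replace("{id}", urllib.parse.quote(task_id, safe=""))
    return urllib.request.Request(base_url + path, method=method)
```

Passing the result to `urllib.request.urlopen` against a running server would perform the call, for example `task_request("resume", task_id)` after a restart.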
Important Notes
This project is at an early stage, so corner cases may not be fully handled. Recommended workflow:
- Fill in your idea on the page and submit an AI video task
- Watch the console logs (the terminal running server.py) and be patient
- All key operations are logged for easy debugging
Log Reference
All important operations are logged to the server console:
| Prefix | Module |
|---|---|
| [Startup] | Server startup, stale task reset |
| [WS] | WebSocket connect/disconnect |
| [Resume] / [Stop] | Task resume/stop |
| [Pipeline] / [Simple] / [Manuscript] | Pipeline step execution |
| [TTS] / [Subtitle] | Audio and subtitle generation |
| [Compositor] | Video concatenation and processing |
| [AgnesImage] / [AgnesVideo] / [AgnesChat] | AI API calls |
| [RateLimiter] | Global rate limiter |
| [TaskManager] | Task state persistence |
| [Screenwriter] | Screenwriter Agent |
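When scanning a long console log, it can help to group lines by these prefixes. The sketch below is a small, hypothetical helper: the exact layout of a log line (for example, whether a timestamp precedes the prefix) is not specified in this README, so it simply takes the first bracketed word on each line.

```python
import re
from collections import Counter

# First "[Word]" on the line, wherever it appears (layout is an assumption).
PREFIX_RE = re.compile(r"\[([A-Za-z]+)\]")


def log_prefix(line):
    """Return the prefix name without brackets, or None if the line has none."""
    match = PREFIX_RE.search(line)
    return match.group(1) if match else None


def count_by_prefix(lines):
    """Count log lines per prefix, ignoring lines without one."""
    return Counter(p for p in map(log_prefix, lines) if p)
```

For instance, a high count for `RateLimiter` relative to `AgnesVideo` would suggest the run is spending most of its time waiting on the rate limit.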
Output Directory
All AI video task artifacts are stored under .working_dir/{timestamp}_{task_id}/:
.working_dir/{timestamp}_{task_id}/
├── task_state.json       # Task state (required for checkpoint resume)
├── final_video.mp4       # Final video with narration + subtitles
├── story.txt             # AI-generated story (creative mode)
├── script.json           # Scene script (JSON format)
├── narration.mp3         # Combined TTS narration audio
├── narration.srt         # Combined subtitle file
├── scene_0/
│   ├── video.mp4         # Scene 0 AI video
│   ├── end_frame.png     # Scene 0 end frame
│   └── task.json         # Video generation task ID
├── scene_1/
│   └── ...
└── scene_2/
    └── ...
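Given this layout, a quick way to see how far a task got is to check which files exist. The helper below is a sketch under the assumption that scene folders are always named `scene_<number>`; the function name and the returned keys are invented for this example.

```python
from pathlib import Path


def summarize_task_dir(task_dir):
    """Report which expected artifacts exist in one task's output directory."""
    task_dir = Path(task_dir)
    scenes = sorted(
        (p for p in task_dir.glob("scene_*") if p.is_dir()),
        key=lambda p: int(p.name.split("_")[1]),  # numeric order: scene_2 before scene_10
    )
    return {
        "resumable": (task_dir / "task_state.json").is_file(),
        "finished": (task_dir / "final_video.mp4").is_file(),
        "scenes_with_video": [p.name for p in scenes if (p / "video.mp4").is_file()],
        "scenes_missing_video": [p.name for p in scenes if not (p / "video.mp4").is_file()],
    }
```

A task that is `resumable` but not `finished` is a candidate for the "Resume" button in the Task List tab.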
Acknowledgments
This project is built upon the following open-source projects:
- ViMax: AI video generation framework by HKU Data Science Lab
- vimax-agnes: Agnes AI adaptation based on ViMax
Special thanks to Agnes AI for providing completely free, high-quality AI model APIs (text, image, and video generation). This project runs at absolute zero cost thanks to their generosity.
Feedback & Contributing
Bug reports and feature suggestions are welcome via GitHub Issues.
License
MIT
FAQ
Is Agnes Video Generator really free? Are there any hidden costs?
Yes, it is completely free. All AI model calls (Agnes Chat, Agnes Image, Agnes Video) are free of charge with no trial period, no watermarks, and no usage limits. The only TTS integration (Microsoft Edge TTS) is also free and requires no extra API key. You only need a free API key from Agnes AI to get started.
Do I need a GPU to run this AI video generator?
No. All AI compute runs in the cloud via Agnes AI's free API. You just need a regular laptop or desktop computer that can run Python 3.10+ and ffmpeg. No GPU, no high RAM, no special hardware required.
How is this different from Runway, Pika, or Sora?
Unlike commercial AI video tools that charge $10-$95/month, Agnes Video Generator is completely free and open-source (MIT). It offers built-in multi-scene pipelines, AI narration, auto subtitles, and digital anchor, features that require third-party tools or manual editing elsewhere. See the comparison table above for details.
What video generation modes are supported?
Four modes: Simple Video (single prompt, full parameter control), Creative Video (AI story → multi-scene video with narration), Manuscript Video (long text → auto-split → narrated video), and Digital Anchor (AI anchor with TTS). Additional options include text-to-video, image-to-video, keyframes animation, and image-to-image end frame generation.
Can I use my own images as references?
Yes. You can upload reference images for character or scene consistency across scenes, use custom end frames for precise visual transitions, or choose img2img to auto-generate end frames from your reference. Reference images are supported in both Creative Video and Digital Anchor modes.
What languages does the UI support?
The Web UI supports 7 languages: 中文, English, Русский, 日本語, 한국어, Bahasa Melayu, and Bahasa Indonesia. Subtitles are generated in the source text language with CJK font support built-in.
Can I host this on my own server?
Absolutely. The project is designed for self-hosting: clone the repo, run ./start.sh, and the server starts on http://localhost:8765. The web server and pipelines run on your own machine, while AI generation still calls the Agnes AI cloud API. See the Quick Start section above.
How do I get help or report issues?
Check the GitHub Issues page for existing reports or open a new one. The project also includes a comprehensive AGENTS.md for AI-agent-assisted debugging. For feature requests, bug reports, or questions, the Issues page is the best place.
Keywords: free AI video generator, AI video generation tool, text to video AI, free AI video maker, AI video creator, open source video generator, Agnes AI, text-to-video, image-to-video, keyframes video, AI narration, auto subtitles, multi-scene video, zero cost AI video, no subscription AI video tool, digital anchor, self-hosted AI video generator, open source alternative to Runway