openflipbook is an open-source implementation of the interactive flipbook concept popularized by flipbook.page — a site where each page is a standalone AI-generated image, and clicking any region triggers a new image that zooms into or expands on that detail. Unlike the original, which runs on a closed, hosted service, openflipbook is designed to be self-hosted and modular. It replaces vendor-locked inference with swappable, API-key-driven components: image generation, vision-language understanding, planning, and video rendering. The project’s stated goal is not to replicate flipbook.page’s UI exactly, but to prove the same “image-is-the-UI” paradigm can run entirely on infrastructure the user controls — from Modal for backend compute to R2 for asset storage and MongoDB for state.
What it does
- One image per page, rendered by `fal-ai/nano-banana` (a lightweight Gemini 2.5 Flash Image model); text inside images is rasterized, not HTML text.
- Click-to-explore navigation: tapping any region sends the coordinates and the image to `qwen/qwen-2.5-vl-72b-instruct` (via OpenRouter) to extract a subject phrase, then to `qwen/qwen-2.5-72b-instruct:online` for planning and grounding with web search (sketched in code after this list).
- Image seeding: users can start from a text prompt or upload/drag-and-drop any image as the first page.
- Optional animation: a static 5-second MP4 is generated per page using `fal-ai/ltx-video/image-to-video`; for streaming, the LTXF binary can instead be deployed to Modal and served via Media Source Extensions.
- Permalinks: each page gets a `/n/:id` route that loads from MongoDB and Cloudflare R2 without regeneration.
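The click-to-explore loop is the heart of the project, and it is small enough to sketch. Below is a minimal, illustrative Python version, assuming OpenRouter's OpenAI-compatible endpoint and fal's `fal-client` SDK; the function names, prompts, and result schema here are assumptions, not the project's actual code:

```python
# Illustrative sketch of the click-to-explore loop; not the project's actual code.
import os

import fal_client  # fal's SDK; reads FAL_KEY from the environment
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API
openrouter = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)


def subject_at_click(image_url: str, x: float, y: float) -> str:
    """Ask the VLM what sits under the click (coordinates normalized to 0..1)."""
    resp = openrouter.chat.completions.create(
        model="qwen/qwen-2.5-vl-72b-instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"In a short phrase, name the subject at ({x:.2f}, {y:.2f})."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()


def plan_next_page(subject: str) -> str:
    """Turn the subject into a grounded image prompt via the planner model."""
    resp = openrouter.chat.completions.create(
        model="qwen/qwen-2.5-72b-instruct:online",
        messages=[{"role": "user",
                   "content": f"Write a detailed image prompt that zooms into: {subject}"}],
    )
    return resp.choices[0].message.content


def render_page(prompt: str) -> str:
    """Generate the next page image; result schema assumed from fal's image endpoints."""
    result = fal_client.subscribe("fal-ai/nano-banana", arguments={"prompt": prompt})
    return result["images"][0]["url"]
```

The `:online` suffix on the planner model is OpenRouter's built-in switch for web-search grounding, which is what keeps expanded pages tethered to real detail rather than pure invention.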
Getting it running
The project uses a monorepo structure with two main apps: apps/web (Next.js 15 frontend) and apps/modal-backend (FastAPI backend deployed to Modal). Node.js ≥20 is required. The README does not provide a Docker Compose setup or single-command deploy — instead, it expects users to configure and connect external services.
To run locally, start with the web frontend:
```
cd apps/web
pnpm install
pnpm dev
```
This launches the Next.js dev server on http://localhost:3000.
The backend runs on Modal and requires configuration:
- Set environment variables, including `OPENROUTER_API_KEY`, `FAL_KEY`, `MONGODB_URI`, and `CLOUDFLARE_R2_BUCKET`.
- Deploy the Modal app:

```
cd apps/modal-backend
modal deploy
```
The Modal CLI must be authenticated and linked to a paid account; openflipbook does not include a local FastAPI dev server or SQLite fallback. There is no `docker build` or `pip install -e .` workflow documented. The project assumes familiarity with Modal's deployment model and expects users to provision their own R2 bucket and MongoDB instance.
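For anyone new to Modal, authentication and credential provisioning look roughly like this; the secret name below is illustrative, and the exact variable names the app expects should be checked against the repo:

```
# One-time CLI authentication (opens a browser)
modal token new

# Store credentials as a Modal secret; "openflipbook-secrets" is an illustrative name
modal secret create openflipbook-secrets \
  OPENROUTER_API_KEY=... \
  FAL_KEY=... \
  MONGODB_URI="mongodb+srv://..." \
  CLOUDFLARE_R2_BUCKET=...
```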
Who this is for
This project suits developers comfortable managing API keys, configuring cloud infrastructure, and swapping AI model providers. It's not a plug-and-play app for casual users. The README explicitly notes it's built for people who want to "swap pieces around": replacing nano-banana with another image model, or Qwen-VL with Gemini, requires editing code in `apps/modal-backend/providers/`. The interfaces there are intentionally narrow, but the trade-off is that users must understand how each provider's API works and handle rate limits, cost tracking, and authentication themselves. It's a toolkit for experimentation: testing how different vision models interpret clicks on the same image, for instance, or comparing latency and fidelity across video generation services. If you want an AI flipbook without managing infrastructure, this is not the tool.
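To make "swap pieces around" concrete, here is a hypothetical sketch of what such a narrow provider interface could look like; the real definitions in `apps/modal-backend/providers/` will differ in detail:

```python
# Hypothetical provider interface; the real code in apps/modal-backend/providers/ may differ.
from typing import Protocol

import fal_client  # reads FAL_KEY from the environment


class ImageProvider(Protocol):
    """Narrow surface: a prompt (plus optional seed image) in, one image URL out."""

    def generate(self, prompt: str, seed_image_url: str | None = None) -> str: ...


class FalNanoBanana:
    """Default backend using fal-ai/nano-banana."""

    def generate(self, prompt: str, seed_image_url: str | None = None) -> str:
        args: dict = {"prompt": prompt}
        if seed_image_url:
            args["image_urls"] = [seed_image_url]  # field name assumed from fal's editing endpoints
        result = fal_client.subscribe("fal-ai/nano-banana", arguments=args)
        return result["images"][0]["url"]  # result schema assumed
```

Swapping in Gemini or any other model then means writing one new class with the same `generate` method and wiring it in where the app builds pages; rate limits, retries, and cost tracking remain the integrator's responsibility, as the README warns.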
How it compares
flipbook.page remains the reference implementation: it's polished, fast, and requires no setup, but it's closed, offers no export or data control, and gives no insight into how clicks are resolved or pages generated. openflipbook is its open counterpart, licensed under MIT, with full source visibility and extensibility. Compared to other open generative UIs like V0 (by Vercel) or Cursor's playground, openflipbook is narrowly scoped: it does only the image-to-image exploration loop, nothing else. It lacks built-in user accounts, sharing controls, or collaborative editing. It's lighter than full-stack AI canvas tools like Suno or Runway, but heavier than simple image-generation frontends like Ollama WebUI, which don't support click-driven navigation or vision models. Its architecture (Next.js frontend, FastAPI backend, Modal-hosted inference) mirrors patterns used in production AI apps, but the project's size (37 GitHub stars, one primary contributor) reflects its role as a proof of concept rather than a production-ready platform.
openflipbook is MIT-licensed, written in TypeScript, and hosted at https://github.com/eren23/openflipbook.