VibeETL
A self-hosted, lightweight, visual ETL (Extract, Transform, Load) platform inspired by enterprise data engineering tools.
[!IMPORTANT] ENTERPRISE READY (v1.1): VibeETL has reached its enterprise readiness milestone. It features global secure cloud authentication (OAuth 2.0 / Service Accounts) with native Windows Certificate Store integration for working behind corporate MITM proxies (like Zscaler), auto-healing crash recovery, unrestricted Nvidia GPU background processing, and database integrations (PostgreSQL, MySQL, SQLite). The platform is ready for rigorous real-world testing.
Build lightning-fast Polars data pipelines visually via an interactive React Flow DAG canvas.
Configure tools effortlessly using enterprise-grade, compact spreadsheet-like UI panels.
[!NOTE] An AI + Human Community Collaboration
VibeETL is a modern, visual data engineering platform co-created in partnership between Advanced AI Coding Agents and the Human Developer Community! Built completely from scratch, this project represents the future of agentic development.
Welcome to the VibeETL open-source community! Our mission is to build a vibrant, exciting, and beautiful platform where data engineers and analysts can effortlessly create, share, and manage a vast ecosystem of custom data processing tools. Let your imagination go wild!
VibeETL brings the drag-and-drop visual pipeline building of massive enterprise tools directly to your local machine. Visually construct your data workflows, connect nodes with wires, and execute pipelines in-memory utilizing the lightning-fast Rust-based Polars engine. Whether you're dealing with a tiny CSV or millions of rows from a massive SQL warehouse, VibeETL handles it with absolute elegance.
Core Philosophy
VibeETL bridges the gap between complex code-based data preparation and heavy enterprise ETL licensing.
- Interactive Canvas: Drag-and-drop tools to build Directed Acyclic Graphs (DAGs) of your data pipeline.
- In-Memory Execution: Process data locally using Polars, which can yield sub-millisecond execution times for small transformations.
- Big Data Ready: Connect directly to massive SQL databases (PostgreSQL, MySQL, SQLite, etc.) via highly parallelized `connectorx` Arrow drivers.
- Smart DAG Pruning: Fully integrated node-caching capabilities. Lock a node's state to prevent upstream re-execution, saving you tremendous time during workflow development.
- Union & Deduplication: Stack datasets seamlessly or isolate distinct entries.
- Interactive Web Visuals: Dynamically generate rich, interactive Scatter, Line, Bar, and Box plots using the integrated `Plotly` HTML backend. Hover, zoom, and pan directly inside your results grid!
- Multimodal Generative AI: Seamlessly process Text, Images, Video, and Audio using the integrated Gemini AI node! Throw files and prompts at the node and watch it dynamically extract data into a new column.
- Advanced Python Scripting: A built-in Python tool featuring a beautifully integrated native syntax-aware IDE, complete with column-aware autocompletion (just type `df["`). Contains pre-built templates for hitting external APIs or running custom LLMs directly inside your pipeline!
- Agent-Ready Architecture: Export your complex mathematical workflows into an ultra-clean, machine-readable YAML file in one click. Send this single file to any AI Agent or LLM to automate, improve, or instantly orchestrate your intelligence platform from scratch!
- Workflow Save/Load: Never lose your progress. Export your complete ETL pipeline architecture to JSON and restore it at any time directly from the visual canvas.
- Share & Collaborate: Because workflows are saved as ultra-lightweight JSON files, you can instantly share them over Slack, Discord, or GitHub! The community can load your exact pipeline to help you debug errors, build custom visualizations, or extend your data models.
- Zero Data-Loss Auto-Recover: VibeETL features an enterprise-grade, two-tier autosave system. Workflows are instantly cached locally in your browser, while a debounced network process streams rolling `.autosave` increments to your backend server to protect you against catastrophic cache wipes!
- Multi-Tabbed Workspaces: Work on multiple isolated DAGs simultaneously, just like a modern IDE! Open, swap, and execute multiple independent pipelines via a seamless tab bar without ever overwriting your progress.
- Flexible I/O: Ingest CSVs, Excel files, Images (via AI OCR), parse tables directly out of PDFs, or write out fully interactive HTML visualizations.
- Global Cloud Integrations: VibeETL features a unified authentication system for cloud providers. Upload a Google Cloud `Service Account JSON` or `OAuth 2.0 Client Secret` exactly once in the global toolbar to instantly and securely authenticate all downstream cloud nodes simultaneously!
- Self-Hosted & Privacy-First: Run both the web UI and the execution engine entirely on your local machine. No external APIs required (unless explicitly using the Gemini node).
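
The Smart DAG Pruning idea in the list above (lock a node so nothing upstream of it re-runs) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not VibeETL's actual engine: `run_node`, the `deps`/`funcs` dictionaries, and the `locked` cache are hypothetical names, and plain lists stand in for Polars DataFrames.

```python
from typing import Any, Callable, Dict, List


def run_node(
    node_id: str,
    deps: Dict[str, List[str]],
    funcs: Dict[str, Callable[..., Any]],
    locked: Dict[str, Any],
    executed: List[str],
) -> Any:
    """Evaluate node_id, skipping everything upstream of a locked node."""
    if node_id in locked:
        # Locked node: return the cached result and never visit its parents.
        return locked[node_id]
    # Evaluate parents first, then this node; record what actually ran.
    inputs = [run_node(p, deps, funcs, locked, executed) for p in deps.get(node_id, [])]
    executed.append(node_id)
    return funcs[node_id](*inputs)


# Hypothetical three-node pipeline: load -> clean -> total
deps = {"load": [], "clean": ["load"], "total": ["clean"]}
funcs = {
    "load": lambda: [1, None, 3],
    "clean": lambda rows: [r for r in rows if r is not None],
    "total": lambda rows: sum(rows),
}

# First run: nothing is locked, so every node executes.
first_run: List[str] = []
result = run_node("total", deps, funcs, {}, first_run)

# Second run: "clean" is locked with its previous output, so "load" is pruned.
second_run: List[str] = []
result_cached = run_node("total", deps, funcs, {"clean": [1, 3]}, second_run)
```

In this sketch a node shared by several downstream branches would run once per branch; a real engine would presumably memoize results within a run as well.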
Enterprise UI & Semantic Intelligence
VibeETL brings the dense, hyper-productive feel of professional enterprise suites into the open-source era:
- Alteryx-Inspired Configuration Panels: We've replaced bulky forms with compact, spreadsheet-like tabular grids. Manage hundreds of columns in a single dense view using intuitive checkboxes, dropdowns, and text fields, all while maintaining a gorgeous glassmorphic aesthetic.
- Tool Containers: Seamlessly group workflows into bounded, resizable visual containers. Disable entire containers with a single click to instantly bypass massive chunks of logic during execution!
- Multi-Rule Sorting & Summarization: Build incredibly complex group-by chains and sequential sorting rules seamlessly. Our native Polars backend engine rips through multi-column aggregations instantly!
- Semantic Type Profiling: VibeETL's execution engine automatically profiles incoming data to detect logical semantic types (like `currency_usd`, `percentage`, `email`).
- Semantic Propagation: When a semantic type is detected, the engine maps it directly through the computational DAG! This metadata drives intelligent UI rendering: displaying `$` badges in your preview grid, formatting Plotly axes dynamically into currency layouts, and guiding users seamlessly.
- Dynamic Tool Favorites: Fully customize your workspace! Pin any tool to your exclusive "Favorites" group by clicking its Star badge, completely eliminating scrolling and searching when building workflows. Your preferences are instantly saved to your browser's local storage and flawlessly restored across sessions!
- True Sequential Numbering & Find: Navigating massive workflows is incredibly easy with true, clean sequential Node IDs (`node_1`, `node_2`) that make hitting the "Find" bar extremely powerful and accurate.
- Smart Canvas Mechanics: Magnetic wire snapping, node-collision detection, cascading auto-drops, and a dedicated "Clear All Cache" tool keep the canvas incredibly responsive and visually flawless!
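
To make Semantic Type Profiling more concrete, here is a minimal sketch of how a column of strings might be classified. The type names `currency_usd`, `percentage`, and `email` come from the list above; the regular expressions, the 90% threshold, and the function names `detect_semantic_type` and `profile_columns` are assumptions made for illustration, not the engine's documented rules.

```python
import re
from typing import Dict, List, Optional

# Hypothetical detection rules; the real engine's rules may differ.
SEMANTIC_PATTERNS = {
    "currency_usd": re.compile(r"^\$\d[\d,]*(\.\d{1,2})?$"),
    "percentage": re.compile(r"^-?\d+(\.\d+)?%$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def detect_semantic_type(values: List[str], threshold: float = 0.9) -> Optional[str]:
    """Return the first type matched by at least `threshold` of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for name, pattern in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return name
    return None


def profile_columns(table: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Profile every column; downstream nodes could carry this mapping along the DAG."""
    return {column: detect_semantic_type(values) for column, values in table.items()}


profile = profile_columns(
    {
        "price": ["$1,200.00", "$15", "$3.5"],
        "growth": ["12%", "3.5%"],
        "contact": ["a@b.com", "x.y@z.org"],
        "notes": ["hello", "world"],
    }
)
```

A mapping like `profile` is the kind of metadata that could drive the `$` badges and currency-formatted axes described above.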
Architecture at a Glance
VibeETL is decoupled into a hyper-fast frontend and a robust backend engine.
graph TD
A[React Canvas UI / @xyflow/react] -->|JSON DAG Payload| B[FastAPI Engine / uvicorn]
B -->|Topological Sort| C[Execution Planner]
C -->|Executes In-Memory| E[Polars DataFrames]
E -->|JSON Data Preview| A
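
The Topological Sort step in the diagram can be sketched with Python's standard-library `graphlib` (Python 3.9+). The payload shape used here (a `nodes` list with `id` fields and an `edges` list with `source`/`target` fields) mirrors how React Flow describes graphs, but the exact payload VibeETL sends is an assumption, as is the function name `plan_execution`.

```python
import json
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def plan_execution(payload_json: str) -> List[str]:
    """Return node ids in an order where every node follows its upstream nodes."""
    payload = json.loads(payload_json)
    # Map each node id to the set of nodes it depends on.
    predecessors: Dict[str, Set[str]] = {node["id"]: set() for node in payload["nodes"]}
    for edge in payload["edges"]:
        predecessors.setdefault(edge["target"], set()).add(edge["source"])
    # static_order() raises CycleError if the graph is not acyclic.
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical payload: node_1 -> node_2 -> node_3
example_payload = json.dumps(
    {
        "nodes": [{"id": "node_1"}, {"id": "node_2"}, {"id": "node_3"}],
        "edges": [
            {"source": "node_1", "target": "node_2"},
            {"source": "node_2", "target": "node_3"},
        ],
    }
)
order = plan_execution(example_payload)
```

Rejecting cyclic graphs at planning time (via `CycleError`) keeps the executor from ever starting a pipeline that cannot finish.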
Extensive Built-in Tool Palette
VibeETL comes pre-loaded with an extensive suite of data engineering nodes, elegantly categorized into pipelines.
| Category | Color | Included Tools |
|---|---|---|
| In / Out | Green | File Input, Database Input, Browse, File Output, Database Output, Image Ingest |
| Cloud | Cyan | Google Sheets In, Google Sheets Out, GCS Input, GCS Output |
| Preparation | Blue | Filter, Sort, Cleanse, Formula Compute, Unique, Regex, Record ID, Sample Records |
| Transform | Orange | Select, Pivot, Unpivot, Summarize, Date Time |
| Join | Purple | Union, Join |
| Analysis | Pink | Gemini AI (Multimodal LLM), Visualization, Python Code, LLM Chunker |
More Tools on the Horizon! We are continuously expanding the VibeETL ecosystem! We have recently launched the Cloud Connectors suite, meaning `Google Sheets` and `Google Cloud Storage (GCS)` nodes are now partially ready for community use and testing! Expect more advanced integrations like Machine Learning predictors and geospatial transformers very soon.
We invite you to build with us! VibeETL is built by and for the community. If you have an idea for a custom data tool, use our Zero-Code SDK to build it and submit a Pull Request! Help us complete the platform and make it the ultimate open-source intelligence powerhouse. Let's build the future together!
Looking for a deep dive into each tool? Check out our comprehensive Node Reference Guide for parameter breakdowns, expected schemas, and configuration examples for all built-in ETL tools.
Quick Start Guide
VibeETL ships with automated startup scripts that set up the Python virtual environment, install the npm packages, and start both the backend and frontend servers.
Method 1: The Automated Runner (Recommended)
Windows (PowerShell)
.\run.ps1
macOS / Linux (Bash)
chmod +x run.sh
./run.sh
Method 2: Manual Step-by-Step
If you prefer to run the components manually in separate terminals:
Backend (Terminal 1)
cd backend
python -m venv venv
.\venv\Scripts\activate # Windows; on macOS/Linux run: source venv/bin/activate
pip install -r requirements.txt
python run.py
Frontend (Terminal 2)
cd frontend
npm install
npm run dev
Developer Guide: Zero-Code UI Integration
To extend VibeETL with your own customized nodes, we have designed a completely dynamic SDK architecture!
Developers do NOT need to write any React/JavaScript to build forms! Simply build and register a single Python class in backend/app/tools/, define a MANIFEST dictionary, and the platform will automatically generate your Canvas nodes, Lucide icons, Category groups, Form inputs, and Default states at runtime!
Check the codebase for examples of how our dynamic manifests automatically render complex UI components like multi-select dropdowns, data previews, and code-completion textareas!
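
As a rough picture of what such a class might look like, here is a hypothetical tool with a `MANIFEST` dictionary. Everything in it is an assumption made for illustration: the manifest keys (`id`, `label`, `category`, `icon`, `inputs`), the `execute` and `default_config` method names, and the use of plain lists of dicts instead of Polars DataFrames. The existing tools in backend/app/tools/ are the authoritative reference for the real schema.

```python
from typing import Any, Dict, List


class UppercaseTool:
    """Hypothetical custom tool; the real base class and MANIFEST schema may differ."""

    # Assumed manifest layout: metadata for the canvas node plus form inputs.
    MANIFEST: Dict[str, Any] = {
        "id": "uppercase_text",
        "label": "Uppercase Text",
        "category": "Preparation",
        "icon": "type",  # assumed to be a Lucide icon name
        "inputs": [
            {"name": "column", "type": "text", "default": "name"},
        ],
    }

    @classmethod
    def default_config(cls) -> Dict[str, Any]:
        """Derive the node's default state from the manifest inputs."""
        return {field["name"]: field["default"] for field in cls.MANIFEST["inputs"]}

    def execute(self, rows: List[Dict[str, Any]], config: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Uppercase the configured column in every row, leaving other columns untouched."""
        column = config.get("column", self.default_config()["column"])
        return [{**row, column: str(row[column]).upper()} for row in rows]


defaults = UppercaseTool.default_config()
output = UppercaseTool().execute([{"name": "ada", "age": 36}], {})
```

The point of the pattern is that one declarative dictionary carries enough information for the frontend to render the form, so the Python class only has to implement the transformation itself.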