SlothDB: Fast Embedded SQL DB for Analytics (Data)

SlothDB provides an embedded SQL database that operates across platforms, including laptops, servers, and web browsers. Written in C++ from scratch, it targets analytics workloads, and the project claims up to 5x faster performance than alternatives such as DuckDB in key areas, based on its own demos. The project, hosted at github.com/SouravRoy-ETL/slothdb with 507 stars, emphasizes local execution without external dependencies, keeping all data and processing on the user's machine.

Its design addresses the need for a lightweight, portable database that handles SQL queries on CSV, Parquet, and other formats without server setup. A browser-based playground loads a 1,000-row demo CSV and Parquet file for immediate testing, while desktop bindings support Python and WebAssembly via npm.

Core features

SlothDB supports standard SQL operations with extensions for local natural language querying:

Cross-platform embedding: Compiles to native binaries, Python wheels (CPython 3.8+ on Linux/macOS/Windows), and WebAssembly for browser use.
Natural language SQL generation: Prefix queries with .ask at the interactive prompt. A rules-based parser handles catalog questions, aggregates (COUNT/SUM/AVG/GROUP BY/TOP-N), and file sources in under 10 ms without models. Complex queries route to local Qwen2.5-Coder models (0.5B or 1.5B parameters, quantized to Q4_K_M, lazy-downloaded on first use).
Local AI privacy: Models total ~1.3 GB (310 MB for 0.5B, 986 MB for 1.5B) and support 29 languages including English, Chinese, Spanish, French, German, Japanese, Korean, Russian, Arabic, Portuguese, Italian, and Hindi. Generated SQL displays before execution; set SLOTHDB_ASK_CONFIRM=1 for a [Y/n] prompt. No data leaves the device.
Format support and performance: Processes CSV and Parquet with side-by-side benchmarks. A demo script generates a 100,000-row CSV and runs three queries, timing against DuckDB.
Interactive shell: Enter slothdb> prompt for direct SQL or .ask commands. Full SQL guide and Python API in docs/DOCUMENTATION.md.
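
The confirmation behavior described above (SQL shown first, with an optional [Y/n] prompt) can be sketched in a few lines of Python. This is only an illustration of the idea, not SlothDB's code: the function name should_run and its parameters are hypothetical, and only the SLOTHDB_ASK_CONFIRM variable comes from the project.

```python
import os


def should_run(sql, env=None, ask=input, show=print):
    """Show generated SQL, then decide whether to execute it.

    Hypothetical helper: `env`, `ask`, and `show` are injectable so the
    logic can be exercised without a terminal.
    """
    env = os.environ if env is None else env
    show(f"Generated SQL: {sql}")  # the SQL is always displayed first
    if env.get("SLOTHDB_ASK_CONFIRM") != "1":
        return True  # no confirmation requested, run immediately
    answer = ask("Run this query? [Y/n] ").strip().lower()
    # The capital Y in [Y/n] conventionally means an empty answer is "yes".
    return answer in ("", "y", "yes")
```

Calling should_run("SELECT COUNT(*) FROM sales") with the variable unset returns True right after printing the query; with SLOTHDB_ASK_CONFIRM=1 it waits for the user's answer.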

Routing for .ask is a deterministic function with no initial LLM call. Simple aggregates go through the rules parser; open-ended SELECT, GROUP BY, and filter questions go to the 0.5B model (~200 ms); advanced features such as window functions or joins go to the 1.5B model (~500 ms). It refuses cumulative aggregates because of engine limitations, as detailed in docs/ASK.md.
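
To make the tiering concrete, here is a minimal Python sketch of a deterministic router. The keyword patterns are invented for illustration and are an assumption, not the rules SlothDB uses (those are documented in docs/ASK.md); only the four outcomes mirror the description above.

```python
import re
from enum import Enum


class Tier(Enum):
    RULES = "rules parser"
    SMALL = "Qwen2.5-Coder 0.5B"
    LARGE = "Qwen2.5-Coder 1.5B"
    REFUSE = "refused"


# Hypothetical keyword heuristics, chosen only to illustrate the tiers.
CUMULATIVE = re.compile(r"\b(cumulative|running (total|sum|average))\b", re.I)
ADVANCED = re.compile(r"\b(join|window function|rank|partition by|row_number)\b", re.I)
SIMPLE = re.compile(r"^(count|sum|avg|average|top \d+|show tables|list tables)\b", re.I)


def route(question: str) -> Tier:
    """Pick a tier with plain pattern checks; no model is called to decide."""
    q = question.strip()
    if CUMULATIVE.search(q):
        return Tier.REFUSE  # cumulative aggregates are not supported
    if ADVANCED.search(q):
        return Tier.LARGE   # joins and window-style questions
    if SIMPLE.match(q):
        return Tier.RULES   # simple aggregates and catalog questions
    return Tier.SMALL       # everything else: open-ended SELECT/filter


if __name__ == "__main__":
    for text in ("count rows in sales.csv",
                 "which products sold more than 100 units",
                 "rank customers by revenue in each region",
                 "running total of revenue by month"):
        print(f"{text!r} -> {route(text).value}")
```

Checking the refusal case first keeps a phrase like "running sum" from being mistaken for a simple SUM. The order of checks is part of the design, because a deterministic router has no model to disambiguate later.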

Getting it running

No installation is needed for quick tests. Visit the playground at slothdb.org/playground, where a full WebAssembly build loads demo files. Users can upload their own CSVs or Parquet files, which remain local.

For Python, install via PyPI:

pip install slothdb

Run the demo:

python -c "import slothdb; slothdb.demo()"

This creates a 100,000-row CSV, executes three queries, and outputs timings versus DuckDB. Wheels are available for the latest release at github.com/SouravRoy-ETL/slothdb/releases/latest; if no wheel matches the platform, pip builds from source. Enable .ask models with -DSLOTHDB_ASK_MODEL=ON during build or runtime.
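
For readers curious what a demo of this shape involves, the sketch below generates a synthetic CSV and times three queries using only the Python standard library. It is not the slothdb.demo() implementation: sqlite3 stands in as the engine because SlothDB's Python API is not shown here, and the column names and the three queries are made up for illustration.

```python
import csv
import io
import random
import sqlite3
import time


def make_csv(rows=100_000, seed=0):
    """Build a synthetic sales CSV in memory (hypothetical schema)."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "region", "amount"])
    for i in range(rows):
        region = rng.choice(["north", "south", "east", "west"])
        writer.writerow([i, region, round(rng.uniform(1, 500), 2)])
    return buf.getvalue()


def load(csv_text):
    """Load the CSV into an in-memory SQLite table (stand-in engine)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", reader)
    return con


# Three illustrative analytics queries; the real demo's queries may differ.
QUERIES = {
    "count": "SELECT COUNT(*) FROM sales",
    "group": "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "top": "SELECT id, amount FROM sales ORDER BY amount DESC LIMIT 10",
}


def time_queries(con):
    """Return wall-clock seconds per query, including fetching all rows."""
    timings = {}
    for name, sql in QUERIES.items():
        start = time.perf_counter()
        con.execute(sql).fetchall()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    con = load(make_csv())
    for name, seconds in time_queries(con).items():
        print(f"{name}: {seconds * 1000:.2f} ms")
```

To compare engines fairly, the same CSV and the same queries would be run through each one, ideally several times, since a single timing is noisy.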

In JavaScript/Node.js, use the npm package:

npm install @slothdb/wasm

The documentation covers Python APIs like connecting to databases, executing queries, and loading files. Continuous integration runs on GitHub Actions, and the project uses an MIT license.

Who this is for

Developers building analytics tools benefit from SlothDB's embedding capability, since no separate database server is required. Data analysts querying local files in SQL or natural language find the .ask feature useful for rapid exploration without cloud LLMs. Its browser support suits web apps needing client-side SQL on user-uploaded data, such as dashboards or report generators.

Teams prioritizing privacy use it for on-device processing, as seen in the playground where files never leave the browser. The Discord community at discord.gg/XJWyGmX5G discusses integrations. Blog posts, like one on compiling to WASM, highlight backend challenges it solves.

How it compares

SlothDB positions itself against DuckDB, showing faster query times in its demos (e.g., the 60-second video and the 100k-row script). Benchmarks are in the README's performance section and focus on analytics speed. Download statistics are tracked on pepy.tech for PyPI and on npmjs.com for npm.

Unlike server-based databases such as PostgreSQL, it avoids network overhead. Compared to SQLite, it adds columnar formats like Parquet and AI-assisted querying. DuckDB shares the embedded analytics focus and also runs in browsers through DuckDB-Wasm, but it does not include an equivalent of the tiered local .ask feature without extra setup. SlothDB's WASM package (@slothdb/wasm) enables direct npm integration, while Python users get a single pip install.

Simple .ask questions stay lightweight because the rules parser needs no model, but full .ask support downloads ~1.3 GB of models, fetched lazily on first use. Cumulative aggregates are not supported yet.

SlothDB suits embedded analytics needs with local AI, though users handling massive datasets or needing full SQL compliance might prefer DuckDB or established options. Source at github.com/SouravRoy-ETL/slothdb; try the playground.
