Paro: Multi-Model Rust DB for AI Agents (AI & ML)

Paro is a multi-model database written in Rust, designed to handle relational, vector, full-text, graph, and sandboxed Python workloads within a single SQL engine. Hosted on GitHub at zunor/paro with 102 stars, it targets AI-native applications, particularly those involving agents. Instead of requiring separate services for different data types—like a vector database alongside a graph store—Paro processes them in one query. A single SQL statement can traverse a graph, perform vector similarity searches, rank by full-text relevance, and execute Python user-defined functions (UDFs), all from a consistent snapshot under serializable isolation.

This approach eliminates sidecar services and cross-system integration code. Developers building agentic systems, such as those for research or retrieval-augmented generation, avoid stitching together PostgreSQL for SQL, pgvector for embeddings, and a separate graph database. Paro handles concurrent writes alongside MVCC snapshot reads, lock-free where possible, and supports predicate locks or SELECT ... FOR UPDATE where explicit locking is needed. Note that it remains in beta, with ongoing work on driver compatibility, performance, and on-disk format stability, so it is suitable for evaluation and prototyping but not yet for production.

Core features

Paro stores structured and semi-structured data in a columnar layout, optimized for scan-heavy analytical tasks. It uses SIMD instructions for vectorized compute on distance functions and expressions, leveraging hardware parallelism.

Key capabilities include:

Vector search with pgvector operators like <-> (L2 distance), <+> (L1 distance), <=> (cosine distance), and <#> (negative inner product), backed by HNSW indexing.
Full-text search via GIN indexes, to_tsvector, plainto_tsquery, ts_rank, and BM25 ranking.
Graph queries through SQL/PGQ syntax: CREATE PROPERTY GRAPH, GRAPH_TABLE, and multi-hop traversals like (me:Researcher WHERE me.name = 'Alice') -[:CollaboratesWith]->{1,2}(peer:Researcher).
Sandboxed Python UDFs in isolated worker processes, with Arrow/NumPy interop for batch processing—safe for agent-generated code.
SQL interface compatible with PostgreSQL wire protocol; tools like psql connect directly, though broader ORM support is under validation.
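
As a reference for what the distance operators compute, here is a minimal plain-Python sketch of three of them as pgvector documents them: <-> (L2 distance), <=> (cosine distance), and <#> (negative inner product). The function names and the sample vectors are illustrative assumptions, and this says nothing about how Paro implements them internally (the text above only states that it uses SIMD).

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity the <-> operator returns in pgvector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity the <=> operator returns in pgvector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a, b):
    """Negated dot product (<#> in pgvector), so that smaller still means closer."""
    return -sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical 4-dimensional embeddings, for illustration only.
    query = [0.91, 0.10, 0.80, 0.22]
    rows = {"paper_a": [0.90, 0.10, 0.80, 0.20], "paper_b": [0.10, 0.90, 0.20, 0.70]}
    # Ascending order puts the nearest row first, like ORDER BY embedding <-> query.
    for name in sorted(rows, key=lambda n: l2_distance(rows[n], query)):
        print(name, l2_distance(rows[name], query))
```

All three are distances in the sense that smaller values mean a closer match, which is why the inner product is negated and why nearest-neighbor queries sort in ascending order.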

Transactions provide serializable isolation: writes are validated with serializable snapshot isolation (SSI). Savepoints and range locks handle contention.

Getting it running

Paro requires Rust 1.85 or later. Clone the repository from GitHub, enter the project directory, and run:

make run

This compiles and starts the parod server on 127.0.0.1:6432 by default, exposing the postgres database. From another terminal, connect using psql (version 14 or newer recommended):

psql -h 127.0.0.1 -p 6432 -d postgres

Set environment variables for custom binds, such as make run PARO_HOST=0.0.0.0 PARO_PORT=5432. Paro has no authentication, so restrict access to trusted networks. Older psql versions may connect but are not fully tested.
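
For scripts that should follow the same settings as the server, the sketch below builds a connection URI from PARO_HOST and PARO_PORT, falling back to the documented defaults (127.0.0.1, port 6432, database postgres). The helper names paro_conninfo and psql_command are hypothetical, and reading these variables on the client side is an assumption; the project only documents them for make run.

```python
import os

# Defaults taken from the text above.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 6432
DEFAULT_DB = "postgres"


def paro_conninfo(env=None):
    """Build a postgresql:// URI from PARO_HOST / PARO_PORT (hypothetical helper)."""
    env = os.environ if env is None else env
    host = env.get("PARO_HOST", DEFAULT_HOST)
    port = int(env.get("PARO_PORT", DEFAULT_PORT))
    # 0.0.0.0 is a bind address, not a destination; connect over loopback instead.
    if host == "0.0.0.0":
        host = "127.0.0.1"
    return f"postgresql://{host}:{port}/{DEFAULT_DB}"


def psql_command(env=None):
    """Return the argv for psql, which accepts a connection URI in place of a database name."""
    return ["psql", paro_conninfo(env)]


if __name__ == "__main__":
    print(" ".join(psql_command()))
```

Passing the environment as a plain dict keeps the helper easy to test, for example paro_conninfo({"PARO_PORT": "5432"}).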

Once connected, users can create tables, indexes, and graphs immediately. For vector data, add HNSW indexes on embedding columns. Full-text setup involves GIN indexes on tsvector columns generated by to_tsvector. Graphs require CREATE PROPERTY GRAPH on vertex/edge tables.

Example queries

Paro's strength shows in combined workloads. Consider an agent researching "retrieval-augmented generation for autonomous agents" across a collaboration graph and paper abstracts. This CTE-based query traverses Alice's collaborators up to two hops away, takes the 20 papers whose embeddings are nearest (by L2 distance) to [0.91, 0.10, 0.80, 0.22], keeps those written by someone in that network whose abstract matches the search terms, and ranks them by the sum of a vector score and ts_rank:

WITH network AS (
    SELECT * FROM GRAPH_TABLE(collab_graph
        MATCH (me:Researcher WHERE me.name = 'Alice')
              -[:CollaboratesWith]->{1,2}(peer:Researcher)
        COLUMNS (peer.id AS author_id, peer.name AS author_name)
    )
),
candidates AS (
    SELECT
        id,
        title,
        author_id,
        abstract,
        1.0 / (1.0 + (embedding <-> '[0.91, 0.10, 0.80, 0.22]')) AS vec_score
    FROM papers
    ORDER BY embedding <-> '[0.91, 0.10, 0.80, 0.22]'
    LIMIT 20
)
SELECT
    c.title,
    n.author_name,
    c.vec_score
      + ts_rank(
            to_tsvector('simple', c.abstract),
            plainto_tsquery('simple', 'retrieval augmented generation agents')
        ) AS score
FROM network n
JOIN candidates c ON c.author_id = n.author_id
WHERE to_tsvector('simple', c.abstract)
   @@ plainto_tsquery('simple', 'retrieval augmented generation agents')
ORDER BY score DESC;

Such queries run against a single consistent snapshot, blending models without ETL pipelines.
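
To make the stages of the query above concrete, here is a rough in-memory model in plain Python: a one-to-two-hop traversal, a nearest-neighbor cut with the same 1 / (1 + distance) score, and a join with a text filter. It is only a sketch under stated assumptions. The toy EDGES and PAPERS data are invented, text_score is a crude stand-in and not the ts_rank formula, and peers_within returns a set of reachable nodes, whereas GRAPH_TABLE returns one row per matching path.

```python
import math

# Hypothetical toy data standing in for collab_graph and the papers table.
EDGES = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": ["Dave"]}
PAPERS = [
    {"title": "Agentic RAG", "author": "Bob", "embedding": [0.90, 0.10, 0.80, 0.20],
     "abstract": "retrieval augmented generation for autonomous agents"},
    {"title": "Graph Indexes", "author": "Carol", "embedding": [0.10, 0.90, 0.20, 0.70],
     "abstract": "index structures for graph traversal"},
    {"title": "Agent Memory", "author": "Dave", "embedding": [0.90, 0.10, 0.80, 0.20],
     "abstract": "retrieval augmented generation agents with memory"},
]


def peers_within(edges, start, min_hops, max_hops):
    """Nodes reached by following between min_hops and max_hops directed edges."""
    frontier, found = {start}, set()
    for hop in range(1, max_hops + 1):
        frontier = {nxt for node in frontier for nxt in edges.get(node, [])}
        if hop >= min_hops:
            found |= frontier
    return found


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(papers, query_vec, k):
    """Top-k papers by L2 distance, each annotated with vec_score = 1 / (1 + distance)."""
    ranked = sorted(papers, key=lambda p: l2(p["embedding"], query_vec))[:k]
    return [dict(p, vec_score=1.0 / (1.0 + l2(p["embedding"], query_vec))) for p in ranked]


def text_score(abstract, terms):
    """Stand-in for ts_rank: 0 unless every term occurs, else the share of matching tokens."""
    tokens = abstract.lower().split()
    if not all(term in tokens for term in terms):
        return 0.0
    return sum(tokens.count(term) for term in terms) / len(tokens)


def hybrid_search(edges, papers, start, query_vec, terms, k=20):
    network = peers_within(edges, start, 1, 2)
    results = []
    for paper in nearest(papers, query_vec, k):
        ts = text_score(paper["abstract"], terms)
        # Mirrors the JOIN on author and the @@ filter in the WHERE clause.
        if paper["author"] in network and ts > 0:
            results.append((paper["title"], paper["author"], paper["vec_score"] + ts))
    # Highest combined score first.
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    terms = ["retrieval", "augmented", "generation", "agents"]
    for row in hybrid_search(EDGES, PAPERS, "Alice", [0.91, 0.10, 0.80, 0.22], terms):
        print(row)
```

In this toy data, Dave's paper is close in embedding space and matches the text but sits three hops from Alice, so the graph stage excludes it. That is the kind of filtering that would otherwise need glue code between separate systems.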

Who this is for

Paro fits developers prototyping AI agents that reason over mixed data—graphs for relationships, vectors for semantics, text for keywords, and Python for custom logic. Research tools, like academic collaboration explorers or RAG pipelines, benefit from its one-engine design. Teams tired of managing Postgres + pgvector + Apache AGE (for graphs) or similar stacks find value in the unified SQL dialect.

It's less ideal for pure transactional OLTP without vectors or graphs, where vanilla Postgres suffices. Agent builders evaluating multi-model options during early stages will appreciate psql compatibility for quick iteration. Production users should wait until ORM support, steady-state performance, and the on-disk format have stabilized.

How it compares

Paro overlaps with Postgres extensions: pgvector for vectors (Paro supports the same operators, with HNSW indexing), pg_trgm or built-in full-text search, and AGE with its Cypher support for graphs. However, Paro builds these into the core engine, avoiding extension conflicts and version mismatches. Columnar storage and SIMD give it an edge over row-oriented Postgres on scan-heavy analytical queries.

Alternatives like SurrealDB offer multi-model SQL with graphs and vectors but lack Python UDFs or pgvector parity. SingleStore or DuckDB handle analytics and vectors in SQL, yet miss native graphs and agent-focused sandboxing. Weaviate or Milvus excel at vectors but require separate SQL layers. Paro's Rust foundation promises better concurrency than some Go-based peers, though its beta status limits direct production comparisons.

For self-hosters, the single binary that make run produces is lighter than a Postgres installation with a sprawl of extensions. It is heavier than minimal vector stores because it includes full SQL and graph support.

Current status and caveats

Expect rough edges in driver support—psql works reliably, but test ORMs like SQLAlchemy. No authentication means local-only deploys for now. On-disk formats may evolve, so back up prototypes. For those needing mature multi-model today, Postgres with extensions remains safer. Source code and updates live at https://github.com/zunor/paro.
