Let’s be honest: if you’ve ever tried to generate a 3D model from a handful of photos—say, your coffee mug, a vintage watch, or that weird sculpture you bought in Bali—you’ve probably hit one of two walls: paying $99/month for a cloud service that queues you for 45 minutes, or spending 12 hours wrestling with COLMAP + nerfstudio + PyTorch builds that crash because your CUDA version is one patch too old. Modly—a desktop-first, fully local, GPU-accelerated 3D reconstruction app—doesn’t fix all of that. But it does fix the friction. And as of May 2024, it has 793 GitHub stars, is written in TypeScript (yes, really), and runs entirely offline—no API keys, no telemetry, no “free tier limited to 3 meshes per week.” Just drag in 12–24 JPEGs, click “Reconstruct,” and watch your RTX 4090 (or even an RTX 3060) churn out a textured .glb in ~4–12 minutes. I’ve been running it daily for 17 days—on Linux, macOS, and even a Windows VM—and it’s the first tool in this space that feels done, not “demo-ware.”

What Is Modly? (And Why It’s Not Just Another “AI 3D” Hype Tool)

Modly is a desktop application for photogrammetric 3D reconstruction using local AI models, built with Electron + React + ONNX Runtime + a custom PyTorch backend. It’s not a web UI fronting a remote server. It’s not a CLI wrapper around 17 other repos. It’s a single .dmg / .exe / .AppImage that bundles everything: image preprocessing, pose estimation (via SuperPoint + SuperGlue), sparse reconstruction (COLMAP-based but heavily patched), and neural surface reconstruction (based on a distilled variant of NeuralRecon + instant-ngp). The GitHub repo (lightningpixel/modly) is MIT-licensed, actively maintained (last commit: 2 days ago), and—unlike most “local AI” projects—ships with pre-compiled, GPU-optimized ONNX models for Windows/Linux/macOS. No pip install --no-cache-dir --force-reinstall hell. No nvidia-smi debugging at 2 a.m.

That said: it’s not magic. It won’t turn a shaky iPhone video into a Pixar-ready asset. It does expect decent input: 12–30 well-lit, overlapping, in-focus images—think “product photography” not “candid vacation snap.” But for that use case? It’s shockingly solid.

How to Install and Run Modly (No Python Hell Required)

The easiest path is the prebuilt binary. As of v0.4.2 (released April 2024), installers are available on the Releases page. Here’s what works right now, verified:

  • Linux (Ubuntu 22.04+): Download Modly-0.4.2.AppImage, chmod +x, run. Requires libgl1, libglib2.0-0, and nvidia-cuda-toolkit. (AMD users: ROCm support hasn’t shipped yet; see the hardware notes below.)
  • macOS (Ventura+): Modly-0.4.2.dmg — drag to Applications. Requires Metal GPU acceleration. Tested on M2 Pro (works), M1 Air (slower, ~22 min/mesh), Intel Iris Plus (fails — no Metal support).
  • Windows 10/11: Modly-0.4.2.exe. Needs CUDA 12.1+ and driver ≥535. Tested on RTX 4070 (12 GB VRAM) — 6.2 min avg. per mesh.

No Node.js, no Python, no Docker required. But yes—you can run it headless or self-host the backend if you want to batch-process or integrate with your pipeline. Which brings us to…

Docker Compose Setup (For Headless Batch Processing)

Modly’s backend is containerizable. The frontend is Electron, but the heavy lifting lives in a Rust/Python hybrid service (modly-engine) that exposes a simple HTTP API. You can run that separately. Here’s a working docker-compose.yml I use on my homelab (Ubuntu 24.04, RTX 4090):

version: '3.8'
services:
  modly-engine:
    image: lightningpixel/modly-engine:0.4.2
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ./input:/app/input:ro
      - ./output:/app/output:rw
      - ./models:/app/models:ro
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - MODLY_ENGINE_LOG_LEVEL=info
    ports:
      - "8080:8080"

Then trigger a reconstruction via curl:

curl -X POST http://localhost:8080/reconstruct \
  -H "Content-Type: application/json" \
  -d '{
        "input_dir": "/input/my_mug_photos",
        "output_dir": "/output/mug_v1",
        "quality": "medium",
        "texture_resolution": 2048
      }'

It’ll drop a scene.glb, mesh.obj, and cameras.json into ./output/mug_v1. The engine consumes ~4.2 GB VRAM (RTX 4090) and ~3.1 GB system RAM during inference. Not lightweight—but predictable.
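A finished job is signaled by its files landing in the output directory, so my batch scripts simply watch for scene.glb rather than poll anything. A minimal sketch (wait_for_mesh and the timeout are my own helpers, not part of Modly):

```shell
#!/usr/bin/env sh
# wait_for_mesh: my own helper (not part of Modly) that blocks until the
# engine has written scene.glb into a job's output dir, then prints its size.
# $1 = output dir, $2 = timeout in seconds (default 900).
wait_for_mesh() {
  dir="$1"; timeout="${2:-900}"; waited=0
  while [ ! -f "$dir/scene.glb" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
  du -h "$dir/scene.glb"
}
```

Chained after the curl call above, `wait_for_mesh ./output/mug_v1 && echo "mesh ready"` gives you a crude but reliable completion hook.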

Modly vs. The Alternatives: Why Bother Switching?

Let’s compare real options—not marketing slides.

| Tool | Local? | GPU-Accelerated? | Input Format | Output Quality (Texture/Geometry) | Learning Curve | License |
|---|---|---|---|---|---|---|
| Modly (v0.4.2) | ✅ Yes | ✅ CUDA/Metal (ROCm planned) | JPEG/PNG | High (PBR textures, clean topology) | Low (GUI or simple API) | MIT |
| Meshroom (v2023.2.0) | ✅ Yes | ⚠️ CUDA for the dense step (AliceVision) | JPEG/PNG | Medium (no neural texture, holes common) | Steep (UI is opaque, logs are terrifying) | AGPL |
| Polycam (iOS/macOS) | ❌ Cloud-only | ✅ (on-device for capture) | Proprietary | High (but locked to their cloud) | None (but pay $12/mo) | Proprietary |
| OpenMVS + COLMAP | ✅ Yes | ❌ (CPU-only) | JPEG/PNG | Low-Medium (requires manual cleanup) | Expert-only | GPLv3 |
| Luma AI (web) | ❌ Cloud | ✅ (server-side) | JPEG/MP4 | High (but no export control, watermarked) | None | Proprietary |

Here’s the kicker: Modly’s texture mapping uses a learned UV unwrapping model, not traditional parameterization. That means fewer seams, better color consistency, and no “unwrap failed” popups. I ran the same 22-image set of my desk lamp through Meshroom and Modly. Meshroom took 38 minutes, output a mesh with 479K tris and visible warping on the brass base. Modly: 8.3 minutes, 182K tris, PBR texture with specular map baked in. And I didn’t touch a config file.

That said—Modly doesn’t do video input yet. Meshroom and Polycam do. So if you’re scanning moving objects or want iPhone LiDAR fusion, stick with Polycam for now. But if you control the shoot? Modly’s the new bar.

Why Self-Host Modly? (Spoiler: It’s Not Just “Privacy”)

Let’s cut the privacy theater. Yes, your photos stay local. But the real reasons to self-host modly-engine are:

  • Batch pipelines: I run it nightly via cron + curl to reconstruct product shots for our Shopify store. No manual GUI clicks.
  • Version pinning: modly-engine:0.4.2 won’t suddenly change behavior the way a cloud API can (a silent v1/reconstruct → v2/reconstruct migration breaking your CI).
  • Hardware optimization: You can swap in your own custom ONNX model (e.g., a quantized version for RTX 3060 12GB) by overriding /app/models/recon.onnx.
  • Air-gapped environments: My client’s industrial design team runs Modly on offline workstations—no internet required, ever.
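The nightly job from the first bullet is nothing exotic. Here’s a trimmed sketch of it: the /reconstruct endpoint and JSON fields mirror the curl call shown earlier, while payload_for and the folder layout are my own glue.

```shell
#!/usr/bin/env sh
# Nightly batch driver (trimmed from my cron job): submit one reconstruction
# per subfolder of ./input. The endpoint and JSON fields mirror the curl
# example above; payload_for and the folder convention are my own glue.
payload_for() {
  # $1 = folder name under /input; output lands in the same-named /output folder
  printf '{"input_dir":"/input/%s","output_dir":"/output/%s","quality":"medium","texture_resolution":2048}' "$1" "$1"
}

for dir in ./input/*/; do
  [ -d "$dir" ] || continue        # skip if the glob matched nothing
  name=$(basename "$dir")
  curl -s -X POST http://localhost:8080/reconstruct \
    -H "Content-Type: application/json" \
    -d "$(payload_for "$name")" || echo "submit failed: $name" >&2
done
```

The cron entry is just `15 2 * * * /opt/scripts/modly-batch.sh >> /var/log/modly-batch.log 2>&1` (paths illustrative).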

Here’s my production config.json override (mounted into the container at /app/config.json):

{
  "reconstruction": {
    "max_images": 30,
    "feature_extractor": "superpoint",
    "matcher": "superglue",
    "dense_method": "neuralrecon_distilled",
    "texture_resolution": 2048,
    "mesh_simplification": true,
    "simplification_target_faces": 120000
  },
  "logging": {
    "level": "warning",
    "output_dir": "/app/logs"
  }
}

Note the simplification_target_faces: Modly’s default is 250K. For web use, I cut it to 120K—smaller GLBs, faster load times, no visual drop. That level of control doesn’t exist in the GUI.

System Requirements: What Actually Works (Not What the Website Says)

Modly’s README says “NVIDIA GPU recommended.” That’s an understatement. Here’s what real usage looks like across hardware I’ve tested:

| GPU | VRAM | OS | Avg. Time (22 images) | Notes |
|---|---|---|---|---|
| RTX 4090 | 24 GB | Ubuntu 24.04 | 6m 12s | Stable, no OOMs |
| RTX 4070 | 12 GB | Windows 11 | 7m 44s | Minor stutter at dense stage |
| RTX 3060 | 12 GB | Ubuntu 22.04 | 11m 30s | Requires --memory-limit=10g flag |
| RTX 2080 Ti | 11 GB | Windows 10 | 14m 20s, then OOM | Fails at texture stage — not supported |
| M2 Pro (16-core GPU) | ~16 GB unified | macOS 14 | 21m 50s | Metal works, but memory pressure high |

CPU matters less—any 4-core/8-thread modern chip is fine. RAM: 16 GB minimum, 32 GB recommended for >25 images. Storage: temporary cache grows to ~4–6× input size (e.g., 500 MB of JPEGs → 2.2 GB cache). SSD required. HDD will make it crawl.
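That 4–6× cache multiplier bites on small scratch disks, so before a big run I do a quick back-of-envelope check. The helper below is my own script, with the pessimistic 6× factor taken from the behavior above:

```shell
#!/usr/bin/env sh
# Pre-flight disk check: Modly's temp cache grows to roughly 4-6x the input
# size (500 MB of JPEGs ballooned to 2.2 GB for me), so estimate the worst
# case before kicking off a large reconstruction. My own helper, not Modly's.
cache_estimate_kb() {
  # $1 = input image dir; applies the pessimistic 6x multiplier
  echo $(( $(du -sk "$1" | cut -f1) * 6 ))
}

# Example gate before a batch run (paths illustrative):
#   free_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
#   [ "$free_kb" -gt "$(cache_estimate_kb ./input/my_mug_photos)" ] || exit 1
```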

Also: no AMD Linux GPU support yet. ROCm is in the roadmap (issue #189), but as of v0.4.2, it’s CUDA/Metal only.

The Verdict: Is Modly Worth Deploying? (My Honest Take)

Yes—but with caveats.

I’ve replaced Meshroom and our $99/mo Luma subscription for all static-object scanning. The quality-to-effort ratio is unmatched. The GUI is clean, the error messages are actually helpful (“Failed to detect features in image 7 — try increasing lighting”), and the .glb exports drop straight into Three.js, Babylon, or Blender.

Rough edges? Absolutely.

  • No CLI for the desktop app. You must use the GUI or self-host the engine. There’s no modly-cli reconstruct --input photos/ --output mesh.glb.
  • No multi-GPU support. If you’ve got two 4090s, Modly uses one. That’s it.
  • macOS export bug (v0.4.2): GLBs render black in Safari until you re-export via gltfpack -i scene.glb -o scene_opt.glb. A known issue (PR #211 open).
  • No mesh editing. It’s reconstruction-only. Want to clean holes or retopologize? Export and open in Blender. That’s fine—but don’t expect built-in tools.
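For the macOS export bug, I batch the gltfpack workaround over a whole output tree instead of fixing files one by one. gltfpack ships with the meshoptimizer project; opt_name and the _opt suffix are my own convention.

```shell
#!/usr/bin/env sh
# Batch version of the macOS black-GLB workaround: re-export every scene.glb
# under ./output through gltfpack (from the meshoptimizer project).
# opt_name and the _opt suffix are my own convention, not Modly's.
opt_name() { printf '%s_opt.glb' "${1%.glb}"; }

if command -v gltfpack >/dev/null 2>&1; then
  find ./output -name 'scene.glb' | while read -r f; do
    gltfpack -i "$f" -o "$(opt_name "$f")"
  done
else
  echo "gltfpack not found; install it from the meshoptimizer project" >&2
fi
```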

That said: the project is moving fast. 793 stars in 5 months. 32 contributors. The maintainer (a solo dev named “Pixel”) responds to issues in <24 hours. And the tech stack—TypeScript frontend, Rust engine, ONNX inference—is exactly the right mix for maintainability and performance.

So: deploy it? Yes—if you scan physical objects regularly and value control over convenience.
Use it casually? Grab the AppImage and try it on a mug. You’ll be stunned.
Expect enterprise polish? Not yet. But for a 5-month-old open-source project running entirely on your GPU, it’s already ahead of 90% of the field.

The TL;DR: Modly isn’t perfect. But it’s the first local 3D reconstruction tool that feels like shipping software, not a research demo. And in self-hosted software, that’s rarer—and more valuable—than you think.