A practical reference for 30 of the most powerful large language models you can use for free — either by self-hosting open-weight models on your own machine, or by using free hosted API platforms (no hardware needed).

For every model you get:

  • What it is — a plain-English description, license, and where it shines.
  • PC / RAM recommendations — a table of variants with the hardware each needs.
  • How to install — copy-paste commands (Ollama, LM Studio, Hugging Face) or API setup.

⚠️ This space moves fast. Model names, sizes, licenses, and free-tier limits change constantly. Always confirm the current model card and rate limits before building anything serious. For live free-tier quotas, the canonical living list is cheahjs/free-llm-api-resources.


Two ways to run an LLM

Approach Privacy Cost Hardware Best for
Self-host (open-weight, models 1–20) Fully private Free (your electricity) You need RAM/VRAM Privacy, offline, tinkering, fine-tuning
Free hosted (platforms 21–30) Usually train on your data Free within rate limits None Quick start, frontier models, no GPU

How much RAM do I actually need? (rule of thumb)

Most people run quantized models (compressed to ~4-bit, the Q4 builds Ollama pulls by default). Rough guide:

Model size Approx. RAM/VRAM (Q4) Typical hardware
1B–3B 2–4 GB Any laptop, phone, Raspberry Pi 5
7B–9B 5–8 GB 8GB+ laptop, base M-series Mac
13B–14B 9–12 GB 16GB laptop, RTX 3060 12GB
27B–34B 18–24 GB RTX 3090/4090, M-series 32GB+
70B ~40–48 GB 2× 24GB GPU, Mac 64GB+
120B+ MoE 60–80 GB+ Workstation / multi-GPU
400B–1T+ MoE 250 GB–1 TB+ Datacenter / multi-node only

🧮 Quick formula: RAM ≈ (params in billions) × bytes-per-param. Q4 ≈ 0.5–0.6 GB per billion params; FP16 (full) ≈ 2 GB per billion. Add ~1–2 GB overhead and room for context.


The easiest install path (read this first)

Three tools cover ~95% of self-hosting. Install one and you're set:

  1. Ollama — simplest CLI. ollama run qwen3 and you're chatting. Mac/Linux/Windows.
  2. LM Studio — GUI app with a model browser, chat UI, and local API server. Great for beginners.
  3. Hugging Face + transformers — raw access to every model and every quant via Python. Most flexible, most setup.
# Ollama one-liner (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen3:8b   # downloads + chats

Each model page below gives the exact command(s).


Catalog

🟢 Open-weight families — free to self-host (20)

Chinese-origin

# Model Maker License Practical pick
1 Qwen Alibaba Apache 2.0 Qwen3 8B (entry) / 32B (power)
2 DeepSeek DeepSeek MIT R1 distill 14B / 32B
3 GLM / ChatGLM Z.ai / Zhipu MIT GLM-Edge (consumer)
4 Kimi K2 Moonshot Modified MIT Datacenter / hosted route
5 MiniMax M3 MiniMax Open-weight Datacenter / quantized community builds
6 Yi 01.AI Apache 2.0 Yi 9B / 34B
7 Baichuan Baichuan Mixed 7B–13B
8 InternLM Shanghai AI Lab Open 7B–20B
9 Ernie Baidu Mixed Verify per-model
10 Hunyuan Tencent Mixed Verify per-model

Western / other

# Model Maker License Practical pick
11 Llama Meta Community (open-weight) Llama 3.x 8B / 70B
12 Gemma Google Gemma license Gemma 12B / 26B
13 gpt-oss OpenAI Apache 2.0 gpt-oss 20B
14 Mistral / Devstral Mistral AI Apache 2.0 Mistral Small / Devstral
15 Phi Microsoft MIT Phi-4 / Phi-4-mini
16 Nemotron NVIDIA Open-weight Varies by size
17 OLMo Allen AI Apache 2.0 OLMo 2 7B / 13B
18 Falcon TII (UAE) Falcon license Falcon-H1 7B–34B
19 Granite IBM Apache 2.0 Granite 8B
20 Command R Cohere Non-commercial Personal / research

🔵 Free hosted platforms — no hardware needed (10)

# Platform What you get Free tier highlight
21 Google AI Studio (Gemini) Frontier closed model ~1,500 req/day Gemini Flash, no card
22 Groq Fastest open-weight inference 300+ tok/s, ~30 req/min
23 Cerebras Wafer-scale fast inference Generous free tier, no-training
24 OpenRouter One key, many models 25+ :free models
25 GitHub Models Dev playground Free within rate limits
26 Cloudflare Workers AI Edge inference 10,000 neurons/day
27 Mistral La Plateforme Mistral API Free experiment tier (opt-in training)
28 Hugging Face Inference Thousands of models Serverless, models <~10GB
29 NVIDIA NIM Hosted open models Trial-style credits
30 Together AI Open models + credits ~$1–25 signup credit

Choosing a model — quick advice

  • Just want it to work on a normal laptop? → Qwen3 8B, Llama 3.x 8B, Gemma 12B, or Phi-4-mini via Ollama.
  • Coding agent? → Devstral / Qwen3-Coder / GLM, or hosted Groq/Cerebras for speed.
  • Reasoning / math? → DeepSeek-R1 distills, Phi-4.
  • No GPU at all? → Use a hosted platform (21–30). Start with Google AI Studio or Groq.
  • Privacy is non-negotiable? → Self-host only. Hosted free tiers usually train on your prompts.
  • Long documents? → Gemini (1M context), Llama 4 Scout (10M), Falcon-H1 (256K).