A practical reference for 30 of the most powerful large language models you can use for free — either by self-hosting open-weight models on your own machine, or by using free hosted API platforms (no hardware needed).
For every model you get:
- What it is — a plain-English description, license, and where it shines.
- PC / RAM recommendations — a table of variants with the hardware each needs.
- How to install — copy-paste commands (Ollama, LM Studio, Hugging Face) or API setup.
⚠️ This space moves fast. Model names, sizes, licenses, and free-tier limits change constantly. Always confirm the current model card and rate limits before building anything serious. For live free-tier quotas, the canonical living list is cheahjs/free-llm-api-resources.
Two ways to run an LLM
| Approach | Privacy | Cost | Hardware | Best for |
|---|---|---|---|---|
| Self-host (open-weight, models 1–20) | Fully private | Free (your electricity) | You need RAM/VRAM | Privacy, offline, tinkering, fine-tuning |
| Free hosted (platforms 21–30) | Varies; many free tiers train on your data | Free within rate limits | None | Quick start, frontier models, no GPU |
How much RAM do I actually need? (rule of thumb)
Most people run quantized models (compressed to ~4-bit, the Q4 builds Ollama pulls by default). Rough guide:
| Model size | Approx. RAM/VRAM (Q4) | Typical hardware |
|---|---|---|
| 1B–3B | 2–4 GB | Any laptop, phone, Raspberry Pi 5 |
| 7B–9B | 5–8 GB | 8GB+ laptop, base M-series Mac |
| 13B–14B | 9–12 GB | 16GB laptop, RTX 3060 12GB |
| 27B–34B | 18–24 GB | RTX 3090/4090, M-series 32GB+ |
| 70B | ~40–48 GB | 2× 24GB GPU, Mac 64GB+ |
| 120B+ MoE | 60–80 GB+ | Workstation / multi-GPU |
| 400B–1T+ MoE | 250 GB–1 TB+ | Datacenter / multi-node only |
🧮 Quick formula: RAM ≈ (params in billions) × bytes-per-param. Q4 ≈ 0.5–0.6 GB per billion params; FP16 (full) ≈ 2 GB per billion. Add ~1–2 GB overhead and room for context.
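The quick formula can be sketched in Python as a rough sanity check before downloading a model. The function names and the default values (0.55 GB per billion parameters for Q4, 1.5 GB overhead) are illustrative assumptions taken from the middle of the ranges above, and the estimate leaves out the extra room long contexts need.

```python
def estimate_ram_gb(params_billion: float,
                    gb_per_billion: float = 0.55,
                    overhead_gb: float = 1.5) -> float:
    """Rough memory needed to load a model: weights plus fixed overhead.

    gb_per_billion: ~0.5-0.6 for Q4 builds, ~2.0 for FP16 (assumed midpoints).
    overhead_gb:    ~1-2 GB runtime overhead (assumed midpoint).
    Context (KV cache) memory is NOT included in this estimate.
    """
    return params_billion * gb_per_billion + overhead_gb


def fits(params_billion: float, available_gb: float, **kwargs) -> bool:
    """True if the estimated footprint is within the available RAM/VRAM."""
    return estimate_ram_gb(params_billion, **kwargs) <= available_gb
```

For example, `estimate_ram_gb(8)` comes out near 5.9 GB, which matches the 5–8 GB row for 7B–9B models, while `estimate_ram_gb(8, gb_per_billion=2.0)` gives about 17.5 GB for the unquantized FP16 weights.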
The easiest install path (read this first)
Three tools cover ~95% of self-hosting. Install one and you're set:
- Ollama: simplest CLI. `ollama run qwen3` and you're chatting. Mac/Linux/Windows.
- LM Studio: GUI app with a model browser, chat UI, and local API server. Great for beginners.
- Hugging Face + transformers: raw access to every model and every quant via Python. Most flexible, most setup.
# Ollama one-liner (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen3:8b # downloads + chats
Each model page below gives the exact command(s).
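Once a model is pulled, Ollama also serves a local HTTP API (by default on port 11434), so scripts can talk to it without the interactive CLI. The sketch below is a minimal, standard-library-only client; the helper names `build_request` and `generate` are hypothetical, and it assumes Ollama is running locally with the model already downloaded.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if you run the server elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("qwen3:8b", "Say hello in five words."))` should print the model's reply. The first call can be slow while the model loads into memory, which is why the timeout is generous.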
Catalog
🟢 Open-weight families — free to self-host (20)
Chinese-origin
| # | Model | Maker | License | Practical pick |
|---|---|---|---|---|
| 1 | Qwen | Alibaba | Apache 2.0 | Qwen3 8B (entry) / 32B (power) |
| 2 | DeepSeek | DeepSeek | MIT | R1 distill 14B / 32B |
| 3 | GLM / ChatGLM | Z.ai / Zhipu | MIT | GLM-Edge (consumer) |
| 4 | Kimi K2 | Moonshot | Modified MIT | Datacenter / hosted route |
| 5 | MiniMax M3 | MiniMax | Open-weight | Datacenter / quantized community builds |
| 6 | Yi | 01.AI | Apache 2.0 | Yi 9B / 34B |
| 7 | Baichuan | Baichuan | Mixed | 7B–13B |
| 8 | InternLM | Shanghai AI Lab | Open | 7B–20B |
| 9 | Ernie | Baidu | Mixed | Verify per-model |
| 10 | Hunyuan | Tencent | Mixed | Verify per-model |
Western / other
| # | Model | Maker | License | Practical pick |
|---|---|---|---|---|
| 11 | Llama | Meta | Community (open-weight) | Llama 3.x 8B / 70B |
| 12 | Gemma | Google | Gemma license | Gemma 12B / 26B |
| 13 | gpt-oss | OpenAI | Apache 2.0 | gpt-oss 20B |
| 14 | Mistral / Devstral | Mistral AI | Apache 2.0 | Mistral Small / Devstral |
| 15 | Phi | Microsoft | MIT | Phi-4 / Phi-4-mini |
| 16 | Nemotron | NVIDIA | Open-weight | Varies by size |
| 17 | OLMo | Allen AI | Apache 2.0 | OLMo 2 7B / 13B |
| 18 | Falcon | TII (UAE) | Falcon license | Falcon-H1 7B–34B |
| 19 | Granite | IBM | Apache 2.0 | Granite 8B |
| 20 | Command R | Cohere | Non-commercial | Personal / research |
🔵 Free hosted platforms — no hardware needed (10)
| # | Platform | What you get | Free tier highlight |
|---|---|---|---|
| 21 | Google AI Studio (Gemini) | Frontier closed model | ~1,500 req/day Gemini Flash, no card |
| 22 | Groq | Fastest open-weight inference | 300+ tok/s, ~30 req/min |
| 23 | Cerebras | Wafer-scale fast inference | Generous free tier, no-training |
| 24 | OpenRouter | One key, many models | 25+ :free models |
| 25 | GitHub Models | Dev playground | Free within rate limits |
| 26 | Cloudflare Workers AI | Edge inference | 10,000 neurons/day |
| 27 | Mistral La Plateforme | Mistral API | Free experiment tier (opt-in training) |
| 28 | Hugging Face Inference | Thousands of models | Serverless, models <~10GB |
| 29 | NVIDIA NIM | Hosted open models | Trial-style credits |
| 30 | Together AI | Open models + credits | ~$1–25 signup credit |
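Free tiers are mostly bounded by request rates (for example the ~30 req/min listed for Groq), so a small client-side throttle helps a script stay under the limit instead of hitting rejected requests. The class below is a minimal sliding-window sketch; `MinuteThrottle` and its parameters are hypothetical names, the limit you pass in should come from the provider's current documentation, and it does not account for token-based or daily quotas.

```python
import time
from collections import deque


class MinuteThrottle:
    """Allow at most `max_calls` calls in any rolling `window` seconds."""

    def __init__(self, max_calls: int, window: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.sleep = sleep
        self.sent = deque()     # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the rolling window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.max_calls:
                self.sent.append(now)
                return
            # Sleep until the oldest recorded call expires, then re-check.
            self.sleep(self.window - (now - self.sent[0]))
```

Typical use is to create one throttle per API key, for instance `throttle = MinuteThrottle(30)`, and call `throttle.wait()` immediately before each request.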
Choosing a model — quick advice
- Just want it to work on a normal laptop? → Qwen3 8B, Llama 3.x 8B, Gemma 12B, or Phi-4-mini via Ollama.
- Coding agent? → Devstral / Qwen3-Coder / GLM, or hosted Groq/Cerebras for speed.
- Reasoning / math? → DeepSeek-R1 distills, Phi-4.
- No GPU at all? → Use a hosted platform (21–30). Start with Google AI Studio or Groq.
- Privacy is non-negotiable? → Self-host only. Many hosted free tiers train on your prompts; check each provider's data policy before sending anything sensitive.
- Long documents? → Gemini (1M context), Llama 4 Scout (10M), Falcon-H1 (256K).