Moh4696/30-powerful-llms: 30 Powerful LLMs — Self-Host & Free-Hosted G

A practical reference for 30 of the most powerful large language models you can use for free — either by self-hosting open-weight models on your own machine, or by using free hosted API platforms (no hardware needed).

For every model you get:

What it is — a plain-English description, license, and where it shines.
PC / RAM recommendations — a table of variants with the hardware each needs.
How to install — copy-paste commands (Ollama, LM Studio, Hugging Face) or API setup.

⚠️ This space moves fast. Model names, sizes, licenses, and free-tier limits change constantly. Always confirm the current model card and rate limits before building anything serious. For live free-tier quotas, the canonical living list is cheahjs/free-llm-api-resources.

Two ways to run an LLM

Approach	Privacy	Cost	Hardware	Best for
Self-host (open-weight, models 1–20)	Fully private	Free (your electricity)	You need RAM/VRAM	Privacy, offline, tinkering, fine-tuning
Free hosted (platforms 21–30)	Usually train on your data	Free within rate limits	None	Quick start, frontier models, no GPU

How much RAM do I actually need? (rule of thumb)

Most people run quantized models (compressed to ~4-bit, the Q4 builds Ollama pulls by default). Rough guide:

Model size	Approx. RAM/VRAM (Q4)	Typical hardware
1B–3B	2–4 GB	Any laptop, phone, Raspberry Pi 5
7B–9B	5–8 GB	8GB+ laptop, base M-series Mac
13B–14B	9–12 GB	16GB laptop, RTX 3060 12GB
27B–34B	18–24 GB	RTX 3090/4090, M-series 32GB+
70B	~40–48 GB	2× 24GB GPU, Mac 64GB+
120B+ MoE	60–80 GB+	Workstation / multi-GPU
400B–1T+ MoE	250 GB–1 TB+	Datacenter / multi-node only

🧮 Quick formula: RAM ≈ (params in billions) × bytes-per-param. Q4 ≈ 0.5–0.6 GB per billion params; FP16 (full) ≈ 2 GB per billion. Add ~1–2 GB overhead and room for context.

The easiest install path (read this first)

Three tools cover ~95% of self-hosting. Install one and you're set:

Ollama — simplest CLI. ollama run qwen3 and you're chatting. Mac/Linux/Windows.
LM Studio — GUI app with a model browser, chat UI, and local API server. Great for beginners.
Hugging Face + transformers — raw access to every model and every quant via Python. Most flexible, most setup.

# Ollama one-liner (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen3:8b   # downloads + chats

Each model page below gives the exact command(s).

Catalog

🟢 Open-weight families — free to self-host (20)

Chinese-origin

#	Model	Maker	License	Practical pick
1	Qwen	Alibaba	Apache 2.0	Qwen3 8B (entry) / 32B (power)
2	DeepSeek	DeepSeek	MIT	R1 distill 14B / 32B
3	GLM / ChatGLM	Z.ai / Zhipu	MIT	GLM-Edge (consumer)
4	Kimi K2	Moonshot	Modified MIT	Datacenter / hosted route
5	MiniMax M3	MiniMax	Open-weight	Datacenter / quantized community builds
6	Yi	01.AI	Apache 2.0	Yi 9B / 34B
7	Baichuan	Baichuan	Mixed	7B–13B
8	InternLM	Shanghai AI Lab	Open	7B–20B
9	Ernie	Baidu	Mixed	Verify per-model
10	Hunyuan	Tencent	Mixed	Verify per-model

Western / other

#	Model	Maker	License	Practical pick
11	Llama	Meta	Community (open-weight)	Llama 3.x 8B / 70B
12	Gemma	Google	Gemma license	Gemma 12B / 26B
13	gpt-oss	OpenAI	Apache 2.0	gpt-oss 20B
14	Mistral / Devstral	Mistral AI	Apache 2.0	Mistral Small / Devstral
15	Phi	Microsoft	MIT	Phi-4 / Phi-4-mini
16	Nemotron	NVIDIA	Open-weight	Varies by size
17	OLMo	Allen AI	Apache 2.0	OLMo 2 7B / 13B
18	Falcon	TII (UAE)	Falcon license	Falcon-H1 7B–34B
19	Granite	IBM	Apache 2.0	Granite 8B
20	Command R	Cohere	Non-commercial	Personal / research

🔵 Free hosted platforms — no hardware needed (10)

#	Platform	What you get	Free tier highlight
21	Google AI Studio (Gemini)	Frontier closed model	~1,500 req/day Gemini Flash, no card
22	Groq	Fastest open-weight inference	300+ tok/s, ~30 req/min
23	Cerebras	Wafer-scale fast inference	Generous free tier, no-training
24	OpenRouter	One key, many models	25+ `:free` models
25	GitHub Models	Dev playground	Free within rate limits
26	Cloudflare Workers AI	Edge inference	10,000 neurons/day
27	Mistral La Plateforme	Mistral API	Free experiment tier (opt-in training)
28	Hugging Face Inference	Thousands of models	Serverless, models <~10GB
29	NVIDIA NIM	Hosted open models	Trial-style credits
30	Together AI	Open models + credits	~$1–25 signup credit

Choosing a model — quick advice

Just want it to work on a normal laptop? → Qwen3 8B, Llama 3.x 8B, Gemma 12B, or Phi-4-mini via Ollama.
Coding agent? → Devstral / Qwen3-Coder / GLM, or hosted Groq/Cerebras for speed.
Reasoning / math? → DeepSeek-R1 distills, Phi-4.
No GPU at all? → Use a hosted platform (21–30). Start with Google AI Studio or Groq.
Privacy is non-negotiable? → Self-host only. Hosted free tiers usually train on your prompts.
Long documents? → Gemini (1M context), Llama 4 Scout (10M), Falcon-H1 (256K).

Moh4696/30-powerful-llms: 30 Powerful LLMs — Self-Host & Free-Hosted Guide

Two ways to run an LLM

How much RAM do I actually need? (rule of thumb)

The easiest install path (read this first)

Catalog

🟢 Open-weight families — free to self-host (20)

🔵 Free hosted platforms — no hardware needed (10)

Choosing a model — quick advice

Comments

Two ways to run an LLM

How much RAM do I actually need? (rule of thumb)

The easiest install path (read this first)

Catalog

🟢 Open-weight families — free to self-host (20)

🔵 Free hosted platforms — no hardware needed (10)

Choosing a model — quick advice

Comments

Related Posts

CyberSunil/LLMVault: ultimate Hands-On OWASP LLM Top 10 Training Platform

2501035-wq/mobile-sim-streamer: an open-source tool on GitHub for self-hosters

ricardovilla0/nucleus-stack: a modular, user-sovereign engine for cultivating, curating

tahzeeb031/harmonize-ffmpeg-pipeline: all-in-One Media Orchestrator & Format Alchemist A self-hosted