Open-Source Local AI Models Registry

A verified reference of every open-weight AI model worth running locally, organized by your GPU's VRAM. Every URL points to a live HuggingFace page. Every VRAM placement is validated against actual parameter counts and quantization requirements. Every license restriction is disclosed.
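As a rough illustration of how a placement can be sanity-checked against a parameter count, the sketch below estimates weights-only memory as parameters times bits per weight. The function name and the nominal 4-bit figure are assumptions for illustration. Real quantized files use somewhat more than their nominal bit width, and the KV cache and runtime buffers add to the total.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Assumption: excludes KV cache, activations, and runtime buffers,
    so real usage is higher than this lower bound.
    """
    return params_billions * bits_per_weight / 8


# Parameter counts are taken from the tables in this registry.
# 4.0 bits per weight is a nominal Q4, used here only for illustration.
for name, params_b in [
    ("Qwen3-8B", 8.2),
    ("Mistral Small 4", 119),
    ("Mistral Large 3", 675),
]:
    gb = weight_memory_gb(params_b, 4.0)
    print(f"{name}: ~{gb:.1f} GB of weights at 4-bit")
```

Treat the result as a floor, not a guarantee: a model whose weights alone exceed the card's VRAM cannot fit without offloading, while one that clears the floor still needs room for context.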

For the narrative walkthrough with hardware recommendations and first-project suggestions, see our Complete Hardware Guide for 2026.

LLMs (Chat / Reasoning)

Tier 1: ~8GB VRAM

Model	Params	VRAM (Q4)	License	Context	HuggingFace	Notes
Qwen3.5-4B	~5B hybrid	~5.5GB	Apache 2.0	262K (1M YaRN)	Qwen/Qwen3.5-4B	RECOMMENDED. Multimodal (text/image/video). Supersedes Qwen3-4B. Feb 2026.
Qwen3-8B	8.2B dense	~4.6GB	Apache 2.0	32K (131K YaRN)	Qwen/Qwen3-8B	Best pure text reasoning at this tier. ~40 tok/s.
Nemotron-3-Nano-4B	3.97B	~2.5GB	NVIDIA Open	262K	nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16	Mamba2 hybrid. Edge-optimized (Jetson, mobile). Mar 2026.
Phi-4-mini	3.8B dense	~2.5GB	MIT	128K	microsoft/Phi-4-mini-instruct	Most permissive license. Strong math/logic. Older (Feb 2025).

Tier 2: ~16GB VRAM

Model	Params (total/active)	VRAM (Q4)	License	Context	HuggingFace	Notes
Qwen3.5-35B-A3B	35B / 3B active	~15GB (Q3)	Apache 2.0	262K (1M YaRN)	Qwen/Qwen3.5-35B-A3B	RECOMMENDED. MoE. Needs Q3/IQ3 to fit 16GB. Multimodal. Mar 2026.
Qwen3-30B-A3B	30.5B / 3.3B active	~15GB	Apache 2.0	32K (131K YaRN)	Qwen/Qwen3-30B-A3B	MoE. Fits 16GB at Q4 cleanly. Coder variant: Qwen/Qwen3-Coder-30B-A3B-Instruct.
Gemma 3 12B	12B dense	~7GB	Gemma ToU	128K	google/gemma-3-12b-it	Multimodal. 140+ languages. Google-backed. Mar 2025.
Qwen2.5-VL-7B	7B dense	~5GB	Apache 2.0	32K	Qwen/Qwen2.5-VL-7B-Instruct	Strong vision-language. Superseded by Qwen3.5-4B for most tasks.

Excluded from this tier:
- Mistral Small 4 (119B MoE, 22B active) — needs ~60-70GB at Q4. Does not fit 16GB.
- GLM-4.5-Air (106B MoE, 12B active) — needs ~24GB at INT4. Belongs in Tier 3.
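
The exclusions above follow from how MoE models use memory: every expert has to be loaded, so total parameters decide whether a model fits, while active parameters mostly affect speed. A minimal sketch of that check follows, assuming rough effective bits per weight for each quantization level and a small fixed headroom. Both are illustrative values, not measurements, and the function name is hypothetical.

```python
from typing import Optional

# Assumed effective bits per weight; real files vary by quantization recipe.
ASSUMED_BITS = {"Q8": 8.5, "Q4": 4.5, "Q3": 3.5}


def best_quant_that_fits(total_params_b: float, vram_gb: float,
                         headroom_gb: float = 0.5) -> Optional[str]:
    """Return the highest-precision level whose weights fit, or None.

    Uses TOTAL parameters: for MoE models every expert must be loaded,
    even though only a few are active per token.
    """
    # Try the highest-precision level first, then step down.
    for level, bits in sorted(ASSUMED_BITS.items(), key=lambda kv: -kv[1]):
        weights_gb = total_params_b * bits / 8
        if weights_gb + headroom_gb <= vram_gb:
            return level
    return None


print(best_quant_that_fits(35, 16))   # Qwen3.5-35B-A3B on a 16GB card
print(best_quant_that_fits(119, 16))  # Mistral Small 4 on a 16GB card
```

Under these assumptions the 35B model only fits at the Q3 level, and the 119B model fits at none of them, which matches the placement and exclusion above.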

Tier 3: ~24GB VRAM

Model	Params (total/active)	VRAM (Q4)	License	Context	HuggingFace	Notes
Qwen3.5-27B	27.8B dense	~17GB	Apache 2.0	262K (1M YaRN)	Qwen/Qwen3.5-27B	RECOMMENDED. Multimodal. All params active. MMLU-Pro 86.1. Feb 2026.
GLM-4.7-Flash	31B / 3B active	~18-20GB	MIT	200K	zai-org/GLM-4.7-Flash	Best coding (SWE-bench 59.2%). MoE. Jan 2026.
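GLM-4.5-Air	106B / 12B active	~24GB (INT4)	MIT	128K	zai-org/GLM-4.5-Air	Tight fit at INT4 on 24GB; 106B weights at 4 bits are roughly 53GB on their own, so expect to offload part of the model to system RAM. Massive knowledge capacity.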
Mistral Small 3.2	24B dense	~14-15GB	Apache 2.0	128K	mistralai/Mistral-Small-3.2-24B-Instruct-2506	Best tool calling. Vision. Jun 2025.

Excluded from this tier:
- GLM-4.7 full (358B MoE) — multi-GPU only. Use GLM-4.7-Flash instead.
- Mistral Large 3 (675B MoE, 41B active) — needs ~340GB at Q4. Enterprise only.

Quick Reference

VRAM	Best General Chat	Best for Coding	Best Multimodal
8GB	Qwen3.5-4B (Q4)	Qwen3-8B (Q4)	Qwen3.5-4B (Q4)
16GB	Qwen3.5-35B-A3B (Q3)	Qwen3-Coder-30B-A3B (Q4)	Qwen3.5-35B-A3B (Q3)
24GB	Qwen3.5-27B (Q4)	GLM-4.7-Flash (Q4)	Qwen3.5-27B (Q4)
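
The Quick Reference table can also be expressed as a small lookup. This is a sketch only: the dictionary copies the rows above, and the `recommend` helper and its task keys (`chat`, `coding`, `multimodal`) are hypothetical names chosen for illustration.

```python
from typing import Optional

# Rows copied from the Quick Reference table, keyed by VRAM tier in GB.
QUICK_REFERENCE = {
    8: {
        "chat": "Qwen3.5-4B (Q4)",
        "coding": "Qwen3-8B (Q4)",
        "multimodal": "Qwen3.5-4B (Q4)",
    },
    16: {
        "chat": "Qwen3.5-35B-A3B (Q3)",
        "coding": "Qwen3-Coder-30B-A3B (Q4)",
        "multimodal": "Qwen3.5-35B-A3B (Q3)",
    },
    24: {
        "chat": "Qwen3.5-27B (Q4)",
        "coding": "GLM-4.7-Flash (Q4)",
        "multimodal": "Qwen3.5-27B (Q4)",
    },
}


def recommend(vram_gb: float, task: str) -> Optional[str]:
    """Pick from the largest tier that does not exceed the available VRAM."""
    eligible = [tier for tier in QUICK_REFERENCE if tier <= vram_gb]
    if not eligible:
        return None  # Below the smallest tier listed in this registry.
    return QUICK_REFERENCE[max(eligible)][task]


print(recommend(12, "coding"))  # A 12GB card falls back to the 8GB tier.
```

Rounding down to the nearest tier is a deliberately conservative choice; a card that sits between tiers may still run a larger model at a lower quantization.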

TTS (Text-to-Speech)

Tier 1: CPU / ~8GB VRAM

Model	Params	VRAM	License	Languages	HuggingFace	Notes
Kokoro-82M	82M	CPU or ~2-3GB	Apache 2.0	8 (54 voices)	hexgrad/Kokoro-82M	96x real-time. 9.4M+ downloads/month. No voice cloning.
Piper TTS	5-30M/voice	CPU only	MIT	30+ (100+ voices)	rhasspy/piper-voices	Runs on Raspberry Pi. Archived Oct 2025; fork: OHF-Voice/piper1-gpl (GPL-3.0).
MeloTTS	Small (count not listed)	CPU	MIT	6 (multi-accent EN)	myshell-ai/MeloTTS-English-v3	GitHub: myshell-ai/MeloTTS. Language-specific repos on HF.

Tier 2: ~16GB VRAM

Model	Params	VRAM	License	Languages	HuggingFace	Notes
Qwen3-TTS	0.6B / 1.7B	~2GB / ~16GB	Apache 2.0	10	Qwen3-TTS collection	RECOMMENDED. Voice cloning (3s audio). VoiceDesign variant. 1.67M downloads. Jan 2026.
CosyVoice3 0.5B	0.5B	~4-8GB	Apache 2.0	9 + 18 CN dialects	FunAudioLLM/Fun-CosyVoice3-0.5B-2512	Zero-shot voice cloning. 150ms streaming. Successor to CosyVoice2.
Chatterbox-Turbo	350M	~4.5GB	MIT	EN (Turbo) / 23+ (Multilingual)	ResembleAI/chatterbox-turbo	Voice cloning. Emotion control. Multilingual: ResembleAI/chatterbox.
Voxtral-4B	4B	>=16GB	CC BY-NC 4.0	9	mistralai/Voxtral-4B-TTS-2603	NON-COMMERCIAL. 70ms latency. 20 preset voices. Mar 2026.
XTTS-v2	467M	<10GB	CPML (NC)	17	coqui/XTTS-v2	NON-COMMERCIAL. Coqui AI defunct (2024). Voice cloning from 6s clip. Unmaintained.
Fish Speech 1.5	~1B	~12GB	CC-BY-NC-SA	13	fishaudio/fish-speech-1.5	NON-COMMERCIAL. Top TTS Arena ELO (1339). Best quality regardless of license.

Tier 3: ~24GB VRAM

Model	Params	VRAM	License	Languages	HuggingFace	Notes
Higgs Audio V2	6B total	~24GB	Llama-deriv (<100K users)	50+ claimed	bosonai/higgs-audio-v2-generation-3B-base	Most expressive. Multi-speaker dialogue. Music + speech.
Dia2	1-2B	~12-24GB	Apache 2.0	EN only	nari-labs/Dia2-2B	Dialogue/podcast specialist. Multi-speaker.

TTS License Summary

Model	Commercial OK?
Kokoro-82M	YES (Apache 2.0)
Piper TTS	YES (MIT) / Conditional (GPL fork)
MeloTTS	YES (MIT)
Qwen3-TTS	YES (Apache 2.0)
CosyVoice3	YES (Apache 2.0)
Chatterbox	YES (MIT)
Voxtral-4B	NO (CC BY-NC 4.0)
XTTS-v2	NO (CPML, company defunct)
Fish Speech 1.5	NO (CC-BY-NC-SA)
Higgs Audio V2	CONDITIONAL (<100K users)
Dia2	YES (Apache 2.0)
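
For projects that need to screen models by license automatically, the summary above can be encoded as data. The sketch below transcribes the table; the status labels (`yes`, `no`, `conditional`) and the function name are assumptions made for illustration, and none of this is legal advice.

```python
from typing import List

# (license, commercial status) transcribed from the TTS License Summary table.
TTS_LICENSES = {
    "Kokoro-82M": ("Apache 2.0", "yes"),
    "Piper TTS": ("MIT", "yes"),  # The maintained fork is GPL-3.0 (conditional).
    "MeloTTS": ("MIT", "yes"),
    "Qwen3-TTS": ("Apache 2.0", "yes"),
    "CosyVoice3": ("Apache 2.0", "yes"),
    "Chatterbox": ("MIT", "yes"),
    "Voxtral-4B": ("CC BY-NC 4.0", "no"),
    "XTTS-v2": ("CPML", "no"),
    "Fish Speech 1.5": ("CC-BY-NC-SA", "no"),
    "Higgs Audio V2": ("Llama-derived, <100K users", "conditional"),
    "Dia2": ("Apache 2.0", "yes"),
}


def commercially_usable(include_conditional: bool = False) -> List[str]:
    """List models whose table entry allows commercial use."""
    allowed = {"yes", "conditional"} if include_conditional else {"yes"}
    return [name for name, (_, status) in TTS_LICENSES.items() if status in allowed]


print(commercially_usable())
print(commercially_usable(include_conditional=True))
```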

STT (Speech-to-Text / ASR)

Tier 1: ~8GB VRAM

Model	Params	VRAM	WER	License	Languages	HuggingFace	Notes
IBM Granite 4.0 1B	2B	~4GB	5.52%	Apache 2.0	6 (EN,FR,DE,ES,PT,JA)	ibm-granite/granite-4.0-1b-speech	#1 Open ASR Leaderboard. Edge-deployable. Keyword biasing. Mar 2026.
Cohere Transcribe	2B	~4-6GB	5.42%	Apache 2.0	14	CohereLabs/cohere-transcribe-03-2026	Conformer-based. 500K hrs training. Mar 2026.
Whisper Large V3 Turbo	809M	~3-6GB	~7.75%	MIT	99	openai/whisper-large-v3-turbo	Best language breadth. Massive ecosystem. 6.3x faster than full V3.
Qwen3-ASR-1.7B	2B	~6-8GB	SOTA (many benchmarks)	(not listed)	52 (30 + 22 CN dialects)	Qwen/Qwen3-ASR-1.7B	Best CJK support. Singing voice transcription. Offline + streaming. Feb 2026.
Distil-Whisper Large V3	756M	~5GB	(not listed)	MIT	EN only	distil-whisper/distil-large-v3	6.3x faster than Whisper V3. Within 1% WER. Also: distil-large-v3.5.

Tier 2: ~16GB VRAM

Model	Params	VRAM	WER	License	Languages	HuggingFace	Notes
Canary-Qwen 2.5B	2.5B	~8-10GB	5.63%	CC-BY-4.0	EN (reliable)	nvidia/canary-qwen-2.5b	SALM arch — can summarize/QA about transcripts. 234K hrs training.
Parakeet TDT 1.1B	1.1B	~4-7GB	~8%	CC-BY-4.0	EN	nvidia/parakeet-tdt-1.1b	Fastest (RTFx >2000). Best for batch/throughput.
Granite Speech 3.3 8B	~9B	~18GB	5.85%	Apache 2.0	5 + translation	ibm-granite/granite-speech-3.3-8b	Best noise robustness. Multilingual ASR + translation. The listed ~18GB exceeds a 16GB card; a 24GB card is the safer fit.

Tier 3: ~24GB VRAM

Model	Params	VRAM	WER	License	Languages	HuggingFace	Notes
Whisper Large V3	1.55B	~6-10GB	~7.4%	Apache 2.0	99	openai/whisper-large-v3	Ecosystem standard. Widest tooling. Needs far less than 24GB, so it also fits mid-range cards.

STT Quick Picks

Need	Best Pick
Best accuracy (EN)	IBM Granite 4.0 1B or Cohere Transcribe (~5.5% WER, both Apache 2.0)
Most languages (99)	Whisper Large V3 Turbo
Best CJK	Qwen3-ASR-1.7B (52 languages)
Fastest throughput	Parakeet TDT 1.1B (RTFx >2000)
Edge/low-power	IBM Granite 4.0 1B (~4GB, runs on Apple Silicon)
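
For readers comparing the WER and RTFx figures in these tables, the sketch below shows how the two metrics are commonly computed: WER as word-level edit distance divided by reference length, and RTFx as audio duration divided by processing time. It is a minimal illustration with hypothetical function names, and it skips the text normalization (casing, punctuation) that leaderboards apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    return audio_seconds / processing_seconds


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")
```

By this definition, an RTFx above 2000 means an hour of audio is transcribed in under two seconds of compute.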

Vision (VLMs)

Tier 1: ~8GB VRAM

Model	Params	VRAM	License	Capabilities	HuggingFace	Notes
Qwen3-VL-2B	2B	~4-5GB	Apache 2.0	Image, video, OCR (32 langs), GUI agent, 3D grounding	Qwen/Qwen3-VL-2B-Instruct	RECOMMENDED. Most capable at this size. 2.5M+ downloads/month. No 3B version exists.
Ministral-3-3B	3.8B total	~8GB (FP8)	Apache 2.0	Image, text, function calling	mistralai/Ministral-3-3B-Instruct-2512	Confirmed vision via 0.4B vision encoder. 256K context. No video.
SmolVLM2-2.2B	2.2B	~5.2GB	Apache 2.0	Image, video, OCR, document analysis	HuggingFaceTB/SmolVLM2-2.2B-Instruct	Best video at this size. HuggingFace-built.
Molmo-7B (Q4)	~8B	~5GB (Q4)	Apache 2.0	Image, pointing/grounding	allenai/Molmo-7B-D-0924	Unique pointing architecture. Open weights AND data.

Tier 2: ~16GB VRAM

Model	Params	VRAM	License	Capabilities	HuggingFace	Notes
Qwen2.5-VL-7B	7B	~14-16GB	Apache 2.0	Image, video (1hr+), OCR (OCRBench 864), GUI agent, docs	Qwen/Qwen2.5-VL-7B-Instruct	Best document understanding (DocVQA 95.7). Also consider: Qwen3-VL-8B.
InternVL3-8B	~7.3B	~16GB (8GB Q8)	MIT	Image, video, OCR, GUI, 3D, industrial, 100+ langs	OpenGVLab/InternVL3-8B	Broadest capability set. Also: InternVL3.5-8B.
Pixtral-12B	12.4B	~10GB (Q4)	Apache 2.0	Image (multi), docs, charts, code gen	mistralai/Pixtral-12B-2409	Natively multimodal. 128K context. Strong code + vision.

Tier 3: ~24GB VRAM

Model	Params	VRAM (Q4)	License	Capabilities	HuggingFace	Notes
Qwen3-VL-32B	33B	~18-20GB	Apache 2.0	Image, video, OCR (32 langs), GUI, 3D, coding	Qwen/Qwen3-VL-32B-Instruct	RECOMMENDED. Most capable open VLM. Needs Q4 to fit 24GB.
GLM-4.6V-Flash	9B	~7GB (INT4)	MIT	Image, video, OCR, function calling	zai-org/GLM-4.6V-Flash	First VLM with native vision-driven function calling. Fits 16GB too.

Does not exist: Qwen3-VL-72B. The Qwen3-VL family jumps from 32B dense to 235B-A22B MoE. The 72B exists only in the older Qwen2.5-VL generation.

Image Generation

Low-End (~8-13GB VRAM)

Model	Params	VRAM	License	HuggingFace	Notes
SD 3.5 Medium	2.5B	~6-10GB	Community (<$1M free)	stabilityai/stable-diffusion-3.5-medium	Runs on almost anything. Good typography. Multi-resolution.
FLUX.2 klein 4B	4B	~13GB	Apache 2.0	black-forest-labs/FLUX.2-klein-4B	Sub-second inference. Generation + editing. Best Apache 2.0 option.
FLUX.1 schnell	12B	8-24GB (8GB Q4)	Apache 2.0	black-forest-labs/FLUX.1-schnell	Previous gen, still 733K+ downloads/month. 1-4 steps.

High-End (~24GB VRAM)

Model	Params	VRAM	License	HuggingFace	Notes
SD 3.5 Large Turbo	8.1B	~11-24GB	Community (<$1M free)	stabilityai/stable-diffusion-3.5-large-turbo	4-step fast inference.
Qwen-Image	20B	~24GB	Apache 2.0	Qwen/Qwen-Image	Best Apache 2.0 quality. Editing + text rendering (EN/CN).
FLUX.2-dev	32B	~24GB (Q4)	Non-Commercial	black-forest-labs/FLUX.2-dev	NON-COMMERCIAL. Best raw quality. Editing + combining.

Note: There is no "FLUX.2-schnell." FLUX.2 uses: klein (fast), dev, flex, pro. FLUX.1 used: schnell, dev, pro.

All URLs verified against live HuggingFace pages on March 28, 2026. VRAM requirements validated against actual parameter counts, not marketing claims. Models and benchmarks change fast. We will update this registry as new releases change the picture.
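
Because listings can move between updates, readers may want to re-check a repository themselves. The sketch below turns a repo ID from the HuggingFace column into a page URL and probes it with a HEAD request using only the standard library. The helper names are hypothetical, and treating any non-200 response as "not live" is an assumption: gated or rate-limited pages may respond differently. Entries that name a collection instead of a repo ID are not covered.

```python
import urllib.request


def hf_url(repo_id: str) -> str:
    """Build the model page URL for a repo ID such as 'Qwen/Qwen3-8B'."""
    return f"https://huggingface.co/{repo_id}"


def is_live(repo_id: str, timeout: float = 10.0) -> bool:
    """Return True if the model page answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(hf_url(repo_id), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (e.g. 404), DNS failures, and timeouts.
        return False


print(hf_url("Qwen/Qwen3-8B"))
```

Calling `is_live("Qwen/Qwen3-8B")` performs the network check; it is left out of the example so the snippet runs offline.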
