Can I Generate?
ComfyUI + Hugging Face · Image · Video · Speech · Music · 100% private

Can I generate images, video, voices, music on this machine?

We read your GPU through your browser, then check it against every local generative model worth running — FLUX, SDXL, Wan 2.2, HunyuanVideo, LTX-Video, Qwen-Image, plus Kokoro, XTTS, Bark, Orpheus for speech and ACE-Step, MusicGen, YuE, Stable Audio for music. No sign-ups. No uploads. Just an answer.

GPU not auto-detected
VRAM
Bandwidth
System RAM: 16 GB (estimated)
Free storage: 256 GB (default)
CPU cores
Pick a GPU above to score every model against your specific hardware. Without that, you’re just looking at the catalog.
Image · ComfyUI

Stable Cascade

Stability AI · 5.9B · 2024
5
F
Too heavy · FP8 · ~6.5 GB

3-stage cascade (Würstchen architecture). Strong prompt adherence, lighter than its 5.9B suggests.

1024×1024 · 13 GB disk · 16 GB RAM ✓ · Stability AI Non-Commercial
Image · ComfyUI

Z-Image Turbo

Tongyi Lab · 6B · 2025
5
F
Too heavy · FP8 · ~8.0 GB

Distilled few-step model. FP16 fits comfortably in 16GB. Apache-licensed.

1024×1024 · fast · 14 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · ComfyUI

Z-Image Base

Tongyi Lab · 6B · 2025
5
F
Too heavy · FP8 · ~8.5 GB

Full (non-distilled) Z-Image. More steps than Z-Image Turbo but higher fidelity. Apache-licensed.

1024×1024 · 16 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · ComfyUI · Most popular community model

Stable Diffusion XL

Stability AI · 3.5B · 2023
4
F
Too heavy · Q4 GGUF · ~3.5 GB

The workhorse. Sharp 1024px output, huge fine-tune ecosystem (Pony, Illustrious, Juggernaut).

1024×1024 · 7 GB disk · 16 GB RAM ✓ · CreativeML Open RAIL++-M
Image · ComfyUI

SDXL Turbo

Stability AI · 3.5B · 2023
4
F
Too heavy · FP8 · ~5.0 GB

1-step distilled SDXL. Generates in under a second on midrange GPUs. Lower fidelity than full SDXL.

512×512 · real-time · 7 GB disk · 16 GB RAM ✓ · Stability AI Non-Commercial
Edit · ComfyUI

OmniGen 2

BAAI · 3.8B · 2025
4
F
Too heavy · FP8 · ~6.0 GB

Unified text-to-image and image-editing model. One model handles generate, edit, compose.

1024×1024 · 11 GB disk · 16 GB RAM ✓ · MIT
Video · ComfyUI

CogVideoX-5B

THUDM (Tsinghua) · 5B · 2024
4
F
Too heavy · Q4 GGUF · ~5.5 GB

5B T2V/I2V from Tsinghua. Mid-range hardware target. 6-second clips at 720p.

720p · 6s · 16 GB disk · 16 GB RAM ✓ · CogVideoX (open)
Image · ComfyUI

FLUX.2 [klein] 4B

Black Forest Labs · 4B · 2025
4
F
Too heavy · Q4 GGUF · ~3.0 GB

The smaller Apache-licensed FLUX.2. 4B params at 1024px — fits comfortably on midrange GPUs and high-end phones.

1024×1024 · fast · 12 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · Hugging Face · Distilled FLUX.2 for consumers

FLUX.2 Klein

Black Forest Labs · 4B · 2026
4
F
Too heavy · FP8 · ~7.0 GB

4 B distilled from FLUX.2 32 B. Real-time generation on consumer GPUs at FLUX-grade fidelity.

1024×1024 · fast · 11 GB disk · 16 GB RAM ✓ · FLUX.2 Klein (Apache-style)
Music · Hugging Face · Top open music model

ACE-Step 1.5

ACE-Step · 3.5B · 2025
4
F
Too heavy · INT8 · ~5.0 GB

Best open full-song generator with vocals. Generates 4-minute tracks in ~20 s on a 4090.

Up to 4 min · stereo · fast · 10 GB disk · 16 GB RAM ✓ · Apache 2.0
Music · Hugging Face

MusicGen Large

Meta · 3.3B · 2023
4
F
Too heavy · INT8 · ~7.0 GB

Meta's flagship MusicGen. Strong instrumental tracks. Non-commercial only.

32 kHz · 30s · 8.5 GB disk · 16 GB RAM ✓ · CC BY-NC 4.0 (non-commercial)
Music · Hugging Face

MusicGen Stereo Large

Meta · 3.3B · 2024
4
F
Too heavy · INT8 · ~7.0 GB

Stereo variant of MusicGen Large. True L/R channels for richer mix.

32 kHz · 30s · 8.5 GB disk · 16 GB RAM ✓ · CC BY-NC 4.0 (non-commercial)
Image · ComfyUI

Stable Diffusion 3 Medium

Stability AI · 2B · 2024
3
F
Too heavy · FP8 · ~4.0 GB

MMDiT architecture with triple text encoders. Better text rendering than SDXL.

1024×1024 · 11 GB disk · 16 GB RAM ✓ · Stability AI Community
Image · ComfyUI

Stable Diffusion 3.5 Medium

Stability AI · 2.5B · 2024
3
F
Too heavy · FP8 · ~5.0 GB

Improved 3.5 generation with better composition and aesthetics than SD3 Medium.

1024×1024 · 12.5 GB disk · 16 GB RAM ✓ · Stability AI Community
Image · ComfyUI

Lumina Image 2.0

Alpha-VLLM · 2.6B · 2025
3
F
Too heavy · FP8 · ~4.5 GB

Compact next-gen DiT with strong photographic realism. Apache-licensed for commercial use.

1024×1024 · 12 GB disk · 16 GB RAM ✓ · Apache 2.0
Video · ComfyUI

LTX-Video

Lightricks · 2B · 2024
3
F
Too heavy · FP8 · ~7.0 GB

Realtime-capable video model. Generates faster than playback on a 4090. The speed champion.

768×512 · real-time · 11 GB disk · 16 GB RAM ✓ · Lightricks Open License
Image · Hugging Face

Kolors

Kuaishou · 2.6B · 2024
3
F
Too heavy · FP8 · ~6.0 GB

Strong photographic realism, multilingual prompts (Chinese + English). Mid-weight footprint.

1024×1024 · 12 GB disk · 16 GB RAM ✓ · Kolors (open)
TTS · Hugging Face

Parler-TTS Large

HuggingFace + Parler · 2.3B · 2024
3
F
Too heavy · INT8 · ~6.0 GB

Larger Parler with broader voice and prosody range. Slower, higher fidelity.

44.1 kHz · 30s · 7 GB disk · 16 GB RAM ✓ · Apache 2.0
TTS · Hugging Face

Orpheus 3B

CanopyAI · 3B · 2025
3
F
Too heavy · Q4 GGUF · ~4.5 GB

LLM-style TTS with rich emotional control. Streaming output via vLLM.

24 kHz · streaming · 8 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · ComfyUI

Hunyuan-DiT

Tencent · 1.5B · 2024
2
F
Too heavy · FP8 · ~4.0 GB

Bilingual (Chinese + English) DiT. Strong at Chinese text rendering, light to run.

1024×1024 · 11 GB disk · 16 GB RAM ✓ · Tencent Hunyuan Community
Video · ComfyUI

Stable Video Diffusion

Stability AI · 1.5B · 2023
2
F
Too heavy · FP8 · ~6.0 GB

Image-to-video, ~14–25 frames at 576×1024. The original consumer-friendly local video model.

576×1024 · 25 frames · 10 GB disk · 16 GB RAM ✓ · Stability AI Non-Commercial
Video · ComfyUI

Wan 2.1 T2V 1.3B

Alibaba · 1.3B · 2025
2
F
Too heavy · FP8 · ~5.5 GB

Tiny T2V. The most compatible video model — fits on 8GB GPUs. Apache-licensed.

480p · 9 GB disk · 16 GB RAM ✓ · Apache 2.0
Video · ComfyUI

SkyReels V2 1.3B

Skywork AI · 1.3B · 2025
2
F
Too heavy · FP8 · ~8.0 GB

Lightweight successor with infinite-length generation. Fits on 8 GB GPUs.

540p · 10 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · Hugging Face

SANA 1.5

NVIDIA Labs · 1.6B · 2025
2
F
Too heavy · INT8 · ~5.0 GB

Linear DiT that generates up to 4K images on as little as 8 GB. Tiny weights, massive output.

Up to 4K · fast · 9 GB disk · 16 GB RAM ✓ · NVIDIA Source Code (Non-Commercial)
TTS · Hugging Face

Bark

Suno · 0.9B · 2023
2
F
Too heavy · Low-VRAM · ~2.5 GB

Generates speech, laughter, sighs, music — all from a text prompt. Quirky but expressive.

24 kHz · 14s · 5 GB disk · 16 GB RAM ✓ · MIT
TTS · Hugging Face

Parler-TTS Mini

HuggingFace + Parler · 0.88B · 2024
2
F
Too heavy · INT8 · ~3.0 GB

Prompt-controlled TTS — describe the voice in natural language ('young woman, warm, slow').

44.1 kHz · 30s · 3 GB disk · 8 GB RAM ✓ · Apache 2.0
TTS · Hugging Face

Sesame CSM 1B

Sesame · 1B · 2025
2
F
Too heavy · INT8 · ~6.5 GB

Conversational speech model — context-aware prosody for dialogue agents.

24 kHz · streaming · 5 GB disk · 16 GB RAM ✓ · Apache 2.0
TTS · Hugging Face

Fish Speech 1.5

Fish Audio · 1.4B · 2024
2
F
Too heavy · INT8 · ~7.0 GB

Multilingual zero-shot voice cloning. Strong on Chinese, Japanese, Korean.

44.1 kHz · 30s · 6.5 GB disk · 16 GB RAM ✓ · CC BY-NC-SA 4.0
Music · Hugging Face

MusicGen Medium

Meta · 1.5B · 2023
2
F
Too heavy · INT8 · ~4.0 GB

Mid-tier MusicGen. Faster than Large, slightly less rich.

32 kHz · 30s · 4.5 GB disk · 8 GB RAM ✓ · CC BY-NC 4.0 (non-commercial)
Music · Hugging Face

Stable Audio Open 1.0

Stability AI · 1.21B · 2024
2
F
Too heavy · INT8 · ~5.0 GB

Sound effects, ambient textures, short instrumental loops up to 47 s.

44.1 kHz · 47s · 4 GB disk · 16 GB RAM ✓ · Stability AI Community
Music · Hugging Face

Stable Audio Open 1.5

Stability AI · 1.5B · 2025
2
F
Too heavy · INT8 · ~6.0 GB

Refined sound design model. Cleaner ambient textures, tighter SFX.

44.1 kHz · 47s · 5 GB disk · 16 GB RAM ✓ · Stability AI Community
Music · Hugging Face

DiffRhythm

ASLP-lab · 1.5B · 2025
2
F
Too heavy · INT8 · ~5.0 GB

Latent-diffusion full-song generator. End-to-end. 8 GB minimum with chunked inference.

44.1 kHz · 4 min · 6 GB disk · 16 GB RAM ✓ · Apache 2.0
Image · ComfyUI

Stable Diffusion 1.5

Stability AI / RunwayML · 0.86B · 2022
1
F
Too heavy · FP8 · ~1.2 GB

The classic. Tiny, fast, runs on almost anything. Massive ecosystem of LoRAs and fine-tunes.

512×512 · very fast · 4.5 GB disk · 8 GB RAM ✓ · CreativeML Open RAIL-M
Image · ComfyUI

Stable Diffusion 2.1

Stability AI · 0.86B · 2022
1
F
Too heavy · FP8 · ~1.4 GB

Successor to SD 1.5 with native 768px training. Smaller community than 1.5 but still light on hardware.

768×768 · very fast · 5.5 GB disk · 8 GB RAM ✓ · CreativeML Open RAIL++-M
Image · ComfyUI

PixArt-Σ

PixArt-α team · 0.6B · 2024
1
F
Too heavy · FP8 · ~4.0 GB

Featherweight 0.6B DiT with T5-XXL encoder. Beautiful 4K output for the param count.

Up to 4K · 11.5 GB disk · 16 GB RAM ✓ · OpenRAIL++
TTS · Hugging Face

Chatterbox Turbo

Resemble AI · 0.35B · 2025
1
F
Too heavy · INT8 · ~1.5 GB

350 M model with a 1-step distilled decoder. Sub-200 ms latency for production agents.

44.1 kHz · streaming · real-time · 2 GB disk · 8 GB RAM ✓ · MIT
TTS · Hugging Face

F5-TTS

SWivid · 0.33B · 2024
1
F
Too heavy · INT8 · ~3.0 GB

Flow-matching TTS with strong voice cloning from a 6-second reference clip.

24 kHz · 30s · 2.5 GB disk · 16 GB RAM ✓ · CC BY-NC 4.0
TTS · Hugging Face

XTTS v2

Coqui · 0.47B · 2023
1
F
Too heavy · FP16 · ~4.0 GB

Multi-language voice cloning from 6 seconds of reference audio. 17 languages.

24 kHz · streaming · 2.5 GB disk · 8 GB RAM ✓ · Coqui Public Model License
TTS · Hugging Face

Spark-TTS 0.5B

SparkAudio · 0.5B · 2025
1
F
Too heavy · INT8 · ~2.5 GB

Tiny LLM-driven zero-shot voice cloning. 4 GB VRAM, runs comfortably on most GPUs.

24 kHz · streaming · 3 GB disk · 8 GB RAM ✓ · Apache 2.0
TTS · Hugging Face

Kani-TTS 2

Kani · 0.4B · 2025
1
F
Too heavy · FP16 · ~3.0 GB

400 M streaming TTS on a Liquid LFM2 backbone with NVIDIA NanoCodec. 3 GB VRAM end-to-end.

24 kHz · streaming · real-time · 2 GB disk · 8 GB RAM ✓ · Apache 2.0
Music · Hugging Face

MusicGen Small

Meta · 0.3B · 2023
1
F
Too heavy · FP16 · ~3.0 GB

Tiny MusicGen for fast prototyping. Runs on a 4 GB GPU.

32 kHz · 30s · fast · 1 GB disk · 8 GB RAM ✓ · CC BY-NC 4.0 (non-commercial)
Music · Hugging Face

Magenta RealTime

Google Magenta · 0.8B · 2025
1
F
Too heavy · INT8 · ~3.5 GB

Live, prompt-steerable instrumental generation. Optimized for streaming.

44.1 kHz · streaming · real-time · 4 GB disk · 8 GB RAM ✓ · Apache 2.0
Image · ComfyUI

Stable Diffusion 3.5 Large

Stability AI · 8B · 2024
0
F
Too heavy · Q4 GGUF · ~8.0 GB

Stability's flagship 8B MMDiT. Sharp text, strong composition. Triple text encoder (CLIP-L, CLIP-G, T5-XXL).

1024×1024 · 20 GB disk · RAM tight (−15) · Stability AI Community
Image · ComfyUI

AuraFlow v0.3

Fal.ai · 6.8B · 2024
0
F
Too heavy · FP8 · ~9.5 GB

Truly open-source flow-matching model. The largest fully Apache-licensed image model.

1024×1024 · 16 GB disk · RAM tight (−15) · Apache 2.0
Image · ComfyUI

FLUX.1 schnell

Black Forest Labs · 12B · 2024
0
F
Too heavy · Q4 GGUF · ~7.0 GB

4-step distilled FLUX. The fastest way to get FLUX-tier quality. Apache-licensed.

1024×1024 · fast · 33 GB disk · RAM tight (−15) · Apache 2.0
Image · ComfyUI · Flagship FLUX

FLUX.1 dev

Black Forest Labs · 12B · 2024
0
F
Too heavy · Q4 GGUF · ~7.0 GB

The model to beat. State-of-the-art prompt adherence and photorealism. Hungry but worth it.

1024×1024 · 33 GB disk · RAM tight (−15) · FLUX.1 Non-Commercial
Edit · ComfyUI

FLUX.1 Kontext dev

Black Forest Labs · 12B · 2025
0
F
Too heavy · Q4 GGUF · ~7.5 GB

Image editing variant of FLUX. Reference image + prompt → edited result.

1024×1024 · 33 GB disk · RAM tight (−15) · FLUX.1 Non-Commercial
Image · ComfyUI

HiDream-I1

HiDream-ai · 17B (8.5B active) · 2025
0
F
Too heavy · Q4 GGUF · ~11.0 GB

Hybrid DiT + MoE. Beats Flux on several benchmarks. MIT-licensed for any use.

1024×1024 · 38 GB disk · RAM tight (−15) · MIT
Image · ComfyUI

Qwen-Image

Alibaba · 20B · 2025
0
F
Too heavy · Q4 GGUF · ~14.0 GB

20B MMDiT from Alibaba. Best-in-class text rendering — handles paragraphs of text in images.

1328×1328 · 45 GB disk · RAM low (−25) · Apache 2.0
Edit · ComfyUI

Qwen-Image-Edit

Alibaba · 20B · 2025
0
F
Too heavy · Q4 GGUF · ~14.0 GB

Edit variant of Qwen-Image. Replace, add, restyle objects via natural prompts.

1328×1328 · 45 GB disk · RAM low (−25) · Apache 2.0
Image · ComfyUI

Hunyuan Image 2.1

Tencent · 17B · 2025
0
F
Too heavy · Q4 GGUF · ~11.0 GB

Tencent's 17B image flagship. Strong at composition and Chinese text.

1024×1024 · 36 GB disk · RAM tight (−15) · Tencent Hunyuan Community
Image · ComfyUI · 84B MoE giant

Hunyuan Image 3

Tencent · 84B (13B active) · 2025
0
F
Too heavy · Q4 GGUF · ~32.0 GB

Massive 84B MoE image model with 13B active. Quality rivals closed-source flagships. Needs serious hardware.

1024×1024 · 180 GB disk · RAM low (−25) · Tencent Hunyuan Community
Image · ComfyUI · Next-gen FLUX

FLUX.2 dev

Black Forest Labs · 32B · 2025
0
F
Too heavy · Q4 GGUF · ~22.0 GB

Successor to FLUX.1 dev. Higher fidelity, longer context. Heavy hardware required.

1024×1024 · 72 GB disk · RAM low (−25) · FLUX.2 Non-Commercial
Image · ComfyUI

ERNIE-Image

Baidu · 10B · 2025
0
F
Too heavy · FP8 · ~12.0 GB

Baidu's 10B image model. Strong at Chinese prompts and stylized output.

1024×1024 · 22 GB disk · RAM tight (−15) · Baidu Community
Edit · ComfyUI

HiDream-E1.1

HiDream-ai · 17B (8.5B active) · 2025
0
F
Too heavy · Q4 GGUF · ~11.0 GB

Editing variant of HiDream-I1. Same MoE backbone tuned for image editing.

1024×1024 · 36 GB disk · RAM tight (−15) · MIT
Video · ComfyUI

Wan 2.1 T2V 14B

Alibaba · 14B · 2025
0
F
Too heavy · Q4 GGUF · ~12.0 GB

Full-size Wan T2V. Strong motion, 720p output. Quantization makes it viable down to 12GB.

720p · 30 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI

Wan 2.1 I2V 14B

Alibaba · 14B · 2025
0
F
Too heavy · Q4 GGUF · ~12.0 GB

Image-to-video flagship. Best open I2V motion in early 2025.

720p · 30 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI · Best open video < 8B

Wan 2.2 TI2V 5B

Alibaba · 5B · 2025
0
F
Too heavy · Q4 GGUF · ~7.0 GB

Unified text + image to video. Best-in-class T2V/I2V at 5B. Fits on a 24GB GPU at FP16.

720p · 13 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI

Wan 2.2 T2V A14B (MoE)

Alibaba · 27B (14B active) · 2025
0
F
Too heavy · Q4 GGUF · ~18.0 GB

MoE flagship — 27B total / 14B active. Top open-video quality.

720p · 56 GB disk · RAM low (−25) · Apache 2.0
Video · ComfyUI

LTX-2

Lightricks · 8B · 2026
0
F
Too heavy · Distilled FP8 · ~12.0 GB

Successor to LTX-Video. 4K-capable on high-end hardware. Distilled variants run on 12GB.

1080p–4K · 28 GB disk · RAM tight (−15) · Lightricks Open License
Video · ComfyUI

HunyuanVideo

Tencent · 13B · 2024
0
F
Too heavy · Q4 GGUF · ~12.0 GB

13B T2V — closest open rival to Sora at release. Slow but cinematic.

720p · 50 GB disk · RAM tight (−15) · Tencent Hunyuan Community
Video · ComfyUI · Consumer-friendly Hunyuan

HunyuanVideo 1.5

Tencent · 8.3B · 2025
0
F
Too heavy · Q4 GGUF · ~9.0 GB

Lighter, sharper successor. 8.3B params, 14GB minimum, runs on consumer GPUs.

720p · 22 GB disk · RAM tight (−15) · Tencent Hunyuan Community
Video · ComfyUI

Mochi 1

Genmo · 10B · 2024
0
F
Too heavy · Q4 GGUF · ~9.0 GB

10B asymmetric DiT. Ambitious motion. Apache-licensed for any use.

480p–720p · 22 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI

Pyramid Flow

Pyramid Flow team · 2B · 2024
0
F
Too heavy · FP8 · ~8.5 GB

Pyramidal flow-matching for efficient long videos. Strong T2V quality at 2B.

768p · 10s · 11 GB disk · RAM tight (−15) · MIT
Image · ComfyUI

Chroma 1 HD

lodestones (community) · 8.9B · 2025
0
F
Too heavy · Q4 GGUF · ~5.5 GB

Community 8.9B distillation of FLUX.1-schnell, fully Apache 2.0. Strong photographic quality with no commercial restrictions.

1024×1024 · 24 GB disk · RAM tight (−15) · Apache 2.0
Image · ComfyUI

FLUX.1 [klein]

Black Forest Labs · 9B · 2025
0
F
Too heavy · Q4 GGUF · ~5.5 GB

Apache-licensed 9B distillation of FLUX.1. Smaller and lighter than schnell while keeping FLUX-tier prompt adherence.

1024×1024 · fast · 24 GB disk · RAM tight (−15) · Apache 2.0
Image · ComfyUI

FLUX.2 [klein] 9B

Black Forest Labs · 9B · 2025
0
F
Too heavy · Q4 GGUF · ~5.5 GB

The larger Apache-licensed FLUX.2. 9B params, sharper detail than the 4B variant.

1024×1024 · 24 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI

SkyReels V1

Skywork AI · 13.8B · 2025
0
F
Too heavy · Q4 GGUF · ~12.0 GB

Human-centric video foundation model based on HunyuanVideo. Up to 12 s clips at 24 fps, 544×960. Strong at faces and motion.

544×960 · 12 s · 50 GB disk · RAM tight (−15) · Apache 2.0
Video · ComfyUI

SkyReels V2 14B

Skywork AI · 14B · 2025
0
F
Too heavy · Q4 GGUF · ~12.0 GB

Flagship infinite-length video model derived from Wan 2.1 14B. Top-tier open-source motion at 720p.

720p · 30 GB disk · RAM tight (−15) · Apache 2.0
Image · Hugging Face

Cosmos-Predict2 2B Text2Image

NVIDIA · 2B · 2025
0
F
Too heavy · FP8 · ~14.0 GB

Physics-aware text-to-image — coherent geometry and lighting, tuned for sim/robotics.

1024×1024 · 22 GB disk · RAM tight (−15) · NVIDIA Open Model License
Video · Hugging Face

Cosmos-Predict2 2B Video2World

NVIDIA · 2B · 2025
0
F
Too heavy · FP8 · ~18.0 GB

World-model T2V/I2V. Generates physics-consistent motion. Heavy VRAM appetite.

720p · 16fps · 28 GB disk · RAM tight (−15) · NVIDIA Open Model License
TTS · Hugging Face · Best ultra-light TTS

Kokoro 82M

hexgrad · 0.082B · 2025
0
F
Too heavy · FP32 CPU · ~0.0 GB

Featherweight TTS that punches way above its weight. Sub-300 ms inference, runs on CPU.

24 kHz · streaming · real-time · 1 GB disk · 8 GB RAM ✓ · Apache 2.0
Music · Hugging Face

YuE 7B

Multimodal Art Projection · 7B · 2025
0
F
Too heavy · Q4 GGUF (GP) · ~6.0 GB

Suno-style full-song generation with synchronized lyrics + vocals. Up to 5-minute tracks.

44.1 kHz · up to 5 min · 18 GB disk · RAM tight (−15) · Apache 2.0
Methodology

No magic. Just math you can audit.

We compute every model’s VRAM at FP16, FP8, and Q4 GGUF; add a 10% KV-cache safety margin and a 0.5 GB runtime overhead; then check fit against your GPU. Speed comes from a bandwidth-bound model (steps/s ≈ bandwidth ÷ model size × efficiency). All client-side.
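
The arithmetic described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the site's actual code: the bytes-per-parameter table, the order in which the margin and the overhead are applied, the default `efficiency` of 0.6, and every function name are hypothetical. The catalog's published figures may also count text encoders, VAEs, or activations, so this sketch is not expected to reproduce them exactly.

```python
from typing import Optional

# Assumed bytes per parameter for each quant level. Real Q4 GGUF files use
# slightly more than 4 bits per weight; 0.5 is a round-number assumption.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "Q4": 0.5}

KV_MARGIN = 0.10           # 10% safety margin named in the methodology
RUNTIME_OVERHEAD_GB = 0.5  # fixed runtime overhead named in the methodology


def estimate_vram_gb(params_billion: float, quant: str) -> float:
    """Weights at the given quant level, plus margin, plus fixed overhead.

    Assumption: the margin scales the weights and the overhead is added last.
    """
    weights_gb = params_billion * BYTES_PER_PARAM[quant]
    return weights_gb * (1.0 + KV_MARGIN) + RUNTIME_OVERHEAD_GB


def fits(params_billion: float, quant: str,
         gpu_vram_gb: Optional[float]) -> Optional[bool]:
    """Return None when no GPU is selected, so no verdict is implied."""
    if not gpu_vram_gb:
        return None
    return estimate_vram_gb(params_billion, quant) <= gpu_vram_gb


def best_quant(params_billion: float,
               gpu_vram_gb: Optional[float]) -> Optional[str]:
    """Pick the highest-precision quant level that still fits, if any."""
    for quant in ("FP16", "FP8", "Q4"):
        if fits(params_billion, quant, gpu_vram_gb):
            return quant
    return None


def steps_per_second(bandwidth_gb_s: float, params_billion: float,
                     quant: str, efficiency: float = 0.6) -> float:
    """Bandwidth-bound proxy: bandwidth / model size * efficiency."""
    model_size_gb = params_billion * BYTES_PER_PARAM[quant]
    return bandwidth_gb_s / model_size_gb * efficiency


if __name__ == "__main__":
    # Hypothetical 12B model on a hypothetical 24 GB, 1000 GB/s GPU.
    print(round(estimate_vram_gb(12, "Q4"), 2))
    print(best_quant(12, 24))
    print(round(steps_per_second(1000, 12, "Q4"), 1))
```

Returning `None` when no GPU is selected keeps the fit check from dividing by, or comparing against, a zero VRAM figure.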

See the formulas
Models tracked: 81
GPUs in DB: 75
Categories: Image · Video · TTS · Music
Quant levels: FP16 · FP8 · Q4