deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B • Text Generation • 2B • Updated Feb 24 • 952k • 1.28k
Pico Decoder Model Suite • Collection • Pico Decoder models (10M-500M) • 4 items • Updated Apr 22 • 1
The Ultra-Scale Playbook 🌌 • Running • 2.85k • The ultimate guide to training LLMs on large GPU clusters