nvidia/Llama-3.1-Nemotron-8B-UltraLong-4M-Instruct Text Generation • 8B • Updated Apr 17 • 51.5k • 115
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B Text Generation • 8B • Updated Feb 24 • 906k • 682
Running • 2.85k • The Ultra-Scale Playbook • The ultimate guide to training LLMs on large GPU clusters
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Text Generation • 2B • Updated Feb 24 • 952k • 1.28k