📢 NVIDIA Releases Nemotron-CC-Math Pre-Training Dataset: A High-Quality, Web-Scale Math Corpus for Pretraining Large Language Models Aug 18 • 5
NVIDIA Releases Improved Pretraining Dataset: Preserves High Value Math & Code, and Augments with Multi-Lingual Aug 18 • 3
NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks Aug 11 • 73
Llama-NeMoRetriever-ColEmbed: Developer-Focused Guide to NVIDIA's State-of-the-Art Text-Image Retrieval Jul 9 • 4
Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10 • 16
nvidia/segformer-b0-finetuned-ade-512-512 Image Segmentation • 0.0B • Updated Jan 14, 2024 • 319k • 165
nvidia/stt_en_fastconformer_transducer_xxlarge Automatic Speech Recognition • Updated Dec 24, 2023 • 187 • 12
nvidia/segformer-b4-finetuned-cityscapes-1024-1024 Image Segmentation • Updated Apr 24, 2023 • 272k • 5