view article Article We’re open-sourcing our text-to-image model and the process behind it 21 days ago • 73
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 396
DINOv2: Learning Robust Visual Features without Supervision Paper • 2304.07193 • Published Apr 14, 2023 • 8
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published Feb 17 • 20
Cluster and Predict Latents Patches for Improved Masked Image Modeling Paper • 2502.08769 • Published Feb 12 • 5
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 66
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 32 items • Updated Jul 10 • 151
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 428
view article Article Introducing smolagents: simple agents that write actions in code. +1 Dec 31, 2024 • 1.15k
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8, 2024 • 40
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think Paper • 2410.06940 • Published Oct 9, 2024 • 10