Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 25 days ago • 457
On the Theoretical Limitations of Embedding-Based Retrieval Paper • 2508.21038 • Published Aug 28 • 19
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 188
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion Paper • 2506.08009 • Published Jun 9 • 29
Continuous Visual Autoregressive Generation via Score Maximization Paper • 2505.07812 • Published May 12 • 12
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction Paper • 2505.02471 • Published May 5 • 15
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 85
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published May 2 • 29
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 298
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 50
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization Paper • 2503.06698 • Published Mar 9 • 4
How far can we go with ImageNet for Text-to-Image generation? Paper • 2502.21318 • Published Feb 28 • 26