How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects Paper • 2503.04257 • Published Mar 6, 2025 • 2
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published Mar 6, 2025 • 25
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published Jan 14, 2025 • 35
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens Paper • 2412.10208 • Published Dec 13, 2024 • 19