Submitted by Juanxi 93 MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization · 11 authors 38 7
Submitted by Howe666 70 AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction · 5 authors 325 2
Submitted by akhaliq 68 DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance · 6 authors 7
Submitted by lkevinzc 55 Understanding R1-Zero-Like Training: A Critical Perspective · 8 authors 1.05k 3
Submitted by wenhu 44 ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations · 10 authors 220 2
Submitted by 8ruceLi 41 Towards Physically Plausible Video Generation via VLM Planning · 11 authors 3
Submitted by hanyang-21 40 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step · 4 authors 299 2
Submitted by akhaliq 24 Articulated Kinematics Distillation from Video Diffusion Models · 7 authors 3
Submitted by huangrh9 23 ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement · 11 authors 115 4
Submitted by AdinaY 22 Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback · 3 authors 3
Submitted by Jarvis1111 15 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks · 7 authors 8 2
Submitted by nielsr 14 MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis · 14 authors 2
Submitted by YanNeu 13 DASH: Detection and Assessment of Systematic Hallucinations of VLMs · 3 authors 9 2
Submitted by hychiang 10 Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models · 6 authors 55 2
Submitted by Jiuzhouh 9 VerifiAgent: a Unified Verification Agent in Language Model Reasoning · 3 authors 4 2
Submitted by mawjdgus 4 Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations · 2 authors 26 1