DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published 24 days ago
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published Feb 18
Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published Feb 17
Taming Teacher Forcing for Masked Autoregressive Video Generation Paper • 2501.12389 • Published Jan 21
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024
PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection Paper • 2205.11098 • Published May 23, 2022
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction Paper • 2402.17766 • Published Feb 27, 2024
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining Paper • 2302.02318 • Published Feb 5, 2023
CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP Paper • 2303.04748 • Published Mar 8, 2023
Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks Paper • 2112.15139 • Published Dec 30, 2021
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation Paper • 2307.16605 • Published Jul 28, 2023
Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception Paper • 2303.05970 • Published Mar 10, 2023
Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning? Paper • 2212.08320 • Published Dec 16, 2022
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning Paper • 2307.09474 • Published Jul 18, 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023