Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning (arXiv:2503.11646, published Mar 14, 2025)
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation (arXiv:2501.01895, published Jan 3, 2025)
Effective Tuning Strategies for Generalist Robot Manipulation Policies (arXiv:2410.01220, published Oct 2, 2024)
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models (arXiv:2409.20551, published Sep 30, 2024)
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation (arXiv:2409.18082, published Sep 26, 2024)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions (arXiv:2409.15278, published Sep 23, 2024)
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents (arXiv:2407.17490, published Jul 3, 2024)
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification (arXiv:2305.09160, published May 16, 2023)
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models (arXiv:2402.05935, published Feb 8, 2024)
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models (arXiv:2403.11289, published Mar 17, 2024)
Prompt, Generate, then Cache: Cascade of Foundation Models Makes Strong Few-shot Learners (arXiv:2303.02151, published Mar 3, 2023)
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (arXiv:2305.11176, published May 18, 2023)
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models (arXiv:2306.09265, published Jun 15, 2023)
Tiny LVLM-eHub: Early Multimodal Experiments with Bard (arXiv:2308.03729, published Aug 7, 2023)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models (arXiv:2402.14800, published Feb 22, 2024)
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (arXiv:2403.20271, published Mar 29, 2024)
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices (arXiv:2406.08451, published Jun 12, 2024)
A3VLM: Actionable Articulation-Aware Vision Language Model (arXiv:2406.07549, published Jun 11, 2024)