Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning (arXiv:2503.11646, published Mar 14, 2025)
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation (arXiv:2501.01895, published Jan 3, 2025)
Effective Tuning Strategies for Generalist Robot Manipulation Policies (arXiv:2410.01220, published Oct 2, 2024)
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models (arXiv:2409.20551, published Sep 30, 2024)
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation (arXiv:2409.18082, published Sep 26, 2024)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions (arXiv:2409.15278, published Sep 23, 2024)
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents (arXiv:2407.17490, published Jul 3, 2024)
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification (arXiv:2305.09160, published May 16, 2023)
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models (arXiv:2402.05935, published Feb 8, 2024)
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models (arXiv:2403.11289, published Mar 17, 2024)
Prompt, Generate, then Cache: Cascade of Foundation Models Makes Strong Few-shot Learners (arXiv:2303.02151, published Mar 3, 2023)
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model (arXiv:2305.11176, published May 18, 2023)
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models (arXiv:2306.09265, published Jun 15, 2023)
Tiny LVLM-eHub: Early Multimodal Experiments with Bard (arXiv:2308.03729, published Aug 7, 2023)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models (arXiv:2402.14800, published Feb 22, 2024)
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (arXiv:2403.20271, published Mar 29, 2024)
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices (arXiv:2406.08451, published Jun 12, 2024)
A3VLM: Actionable Articulation-Aware Vision Language Model (arXiv:2406.07549, published Jun 11, 2024)