Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Yanjiang Guo*, Lucy Xiaoyang Shi*, Jianyu Chen, Chelsea Finn
*Equal contribution; Stanford University, Tsinghua University
TL;DR:
Ctrl-World is an action-conditioned world model compatible with modern vision-language-action (VLA) policies. It enables policy-in-the-loop rollouts entirely in imagination, which can be used to evaluate and improve the instruction-following ability of VLA policies.
Model Details:
This repo includes the Ctrl-World model checkpoint trained on the open-source DROID dataset (~95k trajectories, 564 scenes). The DROID platform consists of a Franka Panda robotic arm equipped with a Robotiq gripper and three cameras: two randomly placed third-person cameras and one wrist-mounted camera.
Usage
See the official Ctrl-World GitHub repo for detailed usage instructions.
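To make the "policy-in-the-loop rollout in imagination" idea concrete, here is a minimal sketch of the loop structure: the policy proposes an action from the current observation, and the world model predicts the next observation, so an entire trajectory unfolds without touching a real robot. The `WorldModel` and `Policy` classes below are illustrative stand-ins with hypothetical signatures, not the actual Ctrl-World or openpi interfaces (see the official repo for those).

```python
import numpy as np

# Hypothetical stand-ins; the real Ctrl-World is an action-conditioned
# video diffusion model and the real policy is a VLA model from openpi.
class WorldModel:
    """Predicts the next observation given the current one and an action."""
    def step(self, obs: np.ndarray, action: np.ndarray) -> np.ndarray:
        # Placeholder dynamics for illustration only.
        return obs + 0.1 * action.mean()

class Policy:
    """Maps an observation and a language instruction to an action."""
    def act(self, obs: np.ndarray, instruction: str) -> np.ndarray:
        return np.zeros(7)  # placeholder 7-DoF arm action

def imagined_rollout(world_model, policy, obs, instruction, horizon=16):
    """Policy-in-the-loop rollout entirely inside the world model."""
    trajectory = [obs]
    for _ in range(horizon):
        action = policy.act(obs, instruction)      # policy proposes action
        obs = world_model.step(obs, action)        # model imagines next frame
        trajectory.append(obs)
    return trajectory

traj = imagined_rollout(WorldModel(), Policy(),
                        obs=np.zeros((64, 64, 3)),
                        instruction="pick up the cup")
print(len(traj))  # horizon + 1 observations
```

Scoring such imagined trajectories (e.g., checking whether the instruction was followed) is what allows evaluating and improving a VLA policy without real-world rollouts.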
Acknowledgement
Ctrl-World is built on the open-source video foundation model Stable Video Diffusion. The VLA model used in this repo is from openpi. We thank the authors for their efforts!
BibTeX
If you find our work helpful, please leave us a star and cite our paper. Thank you!
@article{guo2025ctrl,
  title={Ctrl-World: A Controllable Generative World Model for Robot Manipulation},
  author={Guo, Yanjiang and Shi, Lucy Xiaoyang and Chen, Jianyu and Finn, Chelsea},
  journal={arXiv preprint arXiv:2510.10125},
  year={2025}
}