# FlowerVLA - Vision-Language-Action Flow Model for CALVIN ABC
This is a pretrained FlowerVLA model for robotic manipulation, trained on the CALVIN ABC dataset. FLOWER is an efficient Vision-Language-Action flow policy for robot learning with only ~1B parameters.
## Model Description
FlowerVLA is a novel architecture that:
- Uses half of Florence-2 for multi-modal vision-language encoding
- Employs a novel transformer-based flow-matching architecture (see the sampling sketch after this list)
- Provides an efficient, versatile VLA policy with only ~1B parameters
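
The flow-matching idea behind the action head is simple to state: starting from Gaussian noise, the policy integrates a learned velocity field toward an action chunk over a few Euler steps. The sketch below is a minimal illustration of that idea, not FLOWER's actual implementation; `velocity_model`, `obs_emb`, and all shapes are hypothetical placeholders.

```python
import torch

def flow_matching_sample(velocity_model, obs_emb, action_dim=7, horizon=10, steps=4):
    """Integrate a learned velocity field from noise (t=0) to actions (t=1)."""
    x = torch.randn(1, horizon, action_dim)   # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1,), i * dt)          # current flow time in [0, 1)
        v = velocity_model(x, t, obs_emb)     # predicted velocity at (x, t)
        x = x + dt * v                        # explicit Euler step
    return x                                  # denoised action chunk
```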
## Model Performance
This checkpoint contains the weights for the CALVIN ABC benchmark, where it currently ranks first. Following the standard CALVIN protocol, each evaluation rollout chains five language instructions in a row: column *k* reports the success rate of completing the first *k* instructions, and Avg. Len. is the average number of consecutively completed tasks per rollout (see the sketch below the table).
| Train→Test | Method | 1 | 2 | 3 | 4 | 5 | Avg. Len. |
|---|---|---|---|---|---|---|---|
| CALVIN ABC | FlowerVLA | 99.3% | 95.9% | 90.5% | 84.8% | 77.5% | 4.54 |
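
These metrics follow the standard CALVIN rollout protocol; a minimal sketch of how they are typically computed from per-rollout results (function and variable names are illustrative):

```python
def calvin_metrics(completed_counts, chain_len=5):
    """completed_counts: consecutively completed tasks per rollout (0..chain_len)."""
    n = len(completed_counts)
    # success rate for completing at least k instructions in a row
    success_at_k = [sum(c >= k for c in completed_counts) / n
                    for k in range(1, chain_len + 1)]
    # average sequence length; equals the sum of the success rates above
    avg_len = sum(completed_counts) / n
    return success_at_k, avg_len
```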
## Input/Output Specifications
### Inputs
- RGB Static Camera: `(B, T, 3, H, W)` tensor
- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
- Language Instructions: text strings
### Outputs
- Action Space: `(B, T, 7)` tensor representing delta end-effector (EEF) actions
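
To make the shapes concrete, dummy inputs matching the specification above could be built as follows; the 224×224 resolution and single-step context are assumptions for illustration, not documented values:

```python
import torch

B, T, H, W = 1, 1, 224, 224                  # batch, time; H and W are assumed
static_image = torch.zeros(B, T, 3, H, W)    # RGB static camera
gripper_image = torch.zeros(B, T, 3, H, W)   # RGB gripper camera
# the model is expected to return delta EEF actions of shape (B, T, 7)
```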
## Usage
Check out our full model implementation on GitHub (todo) and follow the instructions in the README to test the model in one of the environments. A minimal inference call looks like this:
```python
# Observation: RGB frames from the static and gripper cameras,
# each a (B, T, 3, H, W) tensor
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image,
    }
}
# Goal: a free-form language instruction
goal = {"lang_text": "pick up the blue cube"}

# Predict delta EEF actions of shape (B, T, 7)
action = model.step(obs, goal)
```
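
In closed-loop evaluation, the predicted action would be fed back to the simulator. A hedged sketch assuming a gym-style CALVIN environment wrapper (`env`, the observation format, and the step budget are illustrative, not confirmed API details):

```python
obs = env.reset()
goal = {"lang_text": "pick up the blue cube"}
for _ in range(360):                    # illustrative step budget per rollout
    action = model.step(obs, goal)      # delta EEF action
    obs, _, done, _ = env.step(action)  # gym-style interface, assumed
    if done:
        break
```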
## Training Details
### Configuration
- Optimizer: AdamW
- Learning Rate: 2e-5
- Weight Decay: 0.05
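
These settings map directly onto a standard PyTorch optimizer; a minimal sketch (assuming `model` is the instantiated policy):

```python
import torch

# AdamW with the listed learning rate and weight decay
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.05)
```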
## Citation

```bibtex
@inproceedings{reuss2025flower,
  title={{FLOWER}: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models},
  author={Moritz Reuss and Hongyi Zhou and Marcel R{\"u}hle and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Fabian Otto and Rudolf Lioutikov},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025},
  url={https://openreview.net/forum?id=JeppaebLRD}
}
```
## License
This model is released under the MIT license.