mbreuss/flower_calvin_abc

@@ -1,46 +1,75 @@
 # FlowerVLA - Vision-Language-Action Flow Model for {dataset_name}
-    This is a pretrained FlowerVLA model for robotic manipulation trained on the {dataset_name} dataset. FlowerVLA is an efficient Vision-Language-Action Flow policy for robot learning.
-    ## Model Description
-    FlowerVLA is a novel architecture that:
-    - Uses Florence-2 for multi-modal vision-language encoding
-    - Employs a transformer-based flow matching architecture
-    - Provides an efficient policy with ~1B parameters
-    - Operates on action chunks for better long-horizon planning
-    ## Usage
-    ```python
-    from huggingface_hub import snapshot_download
-    import torch
-    import hydra
-    from omegaconf import OmegaConf
-    import json
-    import os
-    model_path = snapshot_download(repo_id="{repo_id}")
-    with open(os.path.join(model_path, "config.json")) as f:
-        config = json.load(f)
-    model_cfg = OmegaConf.create(config["model_config"])
-    model_cfg["_target_"] = "flower.models.flower.FLOWERVLA"
-    model = hydra.utils.instantiate(model_cfg)
-    state_dict = torch.load(os.path.join(model_path, "model.pt"))
-    model.load_state_dict(state_dict)
-    model.eval()
-    # obs = {...}  # Your observation dict
-    # goal = {"lang_text": "push the blue block to the right"}
-    # action = model.step(obs, goal)
-    @inproceedings{
-        reuss2024multimodal,
-        # Add citation when available
     }

+---
+license: mit
+language:
+- en
+base_model:
+- microsoft/Florence-2-large
+pipeline_tag: robotics
+tags:
+- robotics
+- VLA
+---
 # FlowerVLA - Vision-Language-Action Flow Model for {dataset_name}
+This is a pretrained FlowerVLA model for robotic manipulation trained on the {dataset_name} dataset.
+FlowerVLA is an efficient Vision-Language-Action flow policy for robot learning with only ~1B parameters.
+## Model Description
+FlowerVLA is a novel architecture that:
+- Uses half of Florence-2 for multi-modal vision-language encoding
+- Employs a novel transformer-based flow matching architecture
+- Provides an efficient, versatile VLA policy with only ~1B parameters
+## Model Performance
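+This checkpoint contains weights for the CALVIN ABC challenge and currently ranks first, with the following results. Columns 1 to 5 give the success rate for completing that many consecutive instructions, and Avg. Len. is the average number of instructions completed in a row: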
+| Train→Test | Method | 1 | 2 | 3 | 4 | 5 | **Avg. Len.** |
+|------------|--------|---|---|---|---|---|---------------|
+| {dataset_name} | FlowerVLA | 99.3% | 95.9% | 90.5% | 84.8% | 77.5% | 4.54 |
+### Input/Output Specifications
+#### Inputs
+- RGB Static Camera: `(B, T, 3, H, W)` tensor
+- RGB Gripper Camera: `(B, T, 3, H, W)` tensor
+- Language Instructions: Text strings
+#### Outputs
+- Action Space: `(B, T, 7)` tensor representing delta end-effector (EEF) actions
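The shape contract above can be sketched as a small check. This is a minimal illustration, not part of the FlowerVLA codebase: `expected_action_shape` and `ACTION_DIM` are hypothetical names, and the sketch assumes that the `T` in the output matches the `T` in the inputs (as the shared symbol suggests) and that the two cameras may use different resolutions.

```python
from typing import Sequence, Tuple

ACTION_DIM = 7  # size of one delta EEF action, taken from the output spec


def expected_action_shape(
    static_shape: Sequence[int], gripper_shape: Sequence[int]
) -> Tuple[int, int, int]:
    """Validate two (B, T, 3, H, W) camera shapes and return (B, T, 7)."""
    cameras = (("rgb_static", static_shape), ("rgb_gripper", gripper_shape))
    for name, shape in cameras:
        if len(shape) != 5:
            raise ValueError(f"{name}: expected (B, T, 3, H, W), got {tuple(shape)}")
        if shape[2] != 3:
            raise ValueError(f"{name}: expected 3 RGB channels, got {shape[2]}")
    # Assumption: batch and time must agree across cameras; H and W may differ.
    if tuple(static_shape[:2]) != tuple(gripper_shape[:2]):
        raise ValueError("rgb_static and rgb_gripper disagree on (B, T)")
    return (static_shape[0], static_shape[1], ACTION_DIM)
```

With real tensors, the same check could be applied to `tensor.shape` before calling the model.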
+## Usage
+Check out our full model implementation on GitHub [todo]() and follow the instructions in the README to test the model on one of the environments.
+```python
+obs = {
+    "rgb_obs": {
+        "rgb_static": static_image,
+        "rgb_gripper": gripper_image
     }
+}
+goal = {"lang_text": "pick up the blue cube"}
+action = model.step(obs, goal)
+```
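The snippet above leaves `static_image` and `gripper_image` undefined. A minimal sketch of packing them into the expected layout might look like the following; `build_inputs` is a hypothetical helper introduced here for illustration, and the placeholder strings stand in for real `(B, T, 3, H, W)` image tensors.

```python
from typing import Any, Dict, Tuple


def build_inputs(
    static_image: Any, gripper_image: Any, instruction: str
) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, str]]:
    """Pack two camera images and an instruction into the (obs, goal) layout."""
    if not instruction.strip():
        raise ValueError("instruction must be a non-empty string")
    obs = {"rgb_obs": {"rgb_static": static_image, "rgb_gripper": gripper_image}}
    goal = {"lang_text": instruction}
    return obs, goal


# Placeholders only; replace them with image tensors from your environment.
obs, goal = build_inputs("static-placeholder", "gripper-placeholder", "pick up the blue cube")
```

The resulting `obs` and `goal` can then be passed to `model.step(obs, goal)` as in the usage snippet.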
+## Training Details
+### Configuration
+- **Optimizer**: AdamW
+- **Learning Rate**: 2e-5
+- **Weight Decay**: 0.05
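The configuration listed above could be captured in code roughly as follows. This is a sketch under assumptions: `OptimizerConfig` and `adamw_kwargs` are hypothetical names, not taken from the FlowerVLA training code, and any settings the list does not mention (betas, schedule, warmup) are left at whatever the optimizer defaults to.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class OptimizerConfig:
    """Values copied from the Training Details list; the class is illustrative."""

    name: str = "AdamW"
    lr: float = 2e-5
    weight_decay: float = 0.05

    def adamw_kwargs(self) -> Dict[str, float]:
        """Return only the keyword arguments an AdamW constructor takes."""
        kwargs = asdict(self)
        kwargs.pop("name")  # the optimizer name is metadata, not a kwarg
        return kwargs


config = OptimizerConfig()
# With PyTorch installed, this could be used as:
#   torch.optim.AdamW(model.parameters(), **config.adamw_kwargs())
```

Keeping the values in one frozen dataclass makes it harder for a fine-tuning script to drift from the published settings by accident.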
+## Citation
+@inproceedings{
+    reuss2025flower,
+    # Add citation when available
+}
+## License
+This model is released under the MIT license.