---
library_name: transformers
pipeline_tag: robotics
tags:
- robotics
- foundation-model
- gr00t
- dual-camera
- robot-learning
- manipulation
- embodied-ai
model_type: gr00t
datasets:
- so101_wave_300k_dualcam
language:
- en
base_model_relation: finetune
widget:
- example_title: "Robot Manipulation"
text: "Dual camera robotics control for manipulation tasks"
---
# GR00T Wave: Dual Camera Robotics Foundation Model
## Model Overview
GR00T Wave is a robotics foundation model fine-tuned on dual-camera manipulation data from the SO101 Wave dataset. It maps synchronized dual-camera observations and robot state to manipulation actions, enabling control of dual-camera robot setups from visual input.
## Key Features
- **Dual Camera Input**: Processes synchronized dual-camera feeds for enhanced spatial understanding
- **Foundation Model Architecture**: Built on the GR00T framework for robust robotics applications
- **300K Training Steps**: Extensive training on high-quality manipulation demonstrations
- **Manipulation Focused**: Optimized for robotic manipulation and control tasks
## Model Details
- **Model Type**: GR00T Robotics Foundation Model
- **Training Data**: SO101 Wave 300K Dual Camera Dataset
- **Architecture**: Transformer-based with dual camera encoders
- **Training Steps**: 300,000 steps with checkpoints at 150K and 300K
- **Input Modalities**: Dual RGB cameras, robot state
- **Output**: Robot actions and control commands
## Usage
```python
from transformers import AutoModel

# Load the model (trust_remote_code is required for the custom GR00T architecture)
model = AutoModel.from_pretrained("cagataydev/gr00t-wave", trust_remote_code=True)

# Note: running inference requires a specialized robotics pipeline;
# loading the weights alone is not sufficient for robot control.
```
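Because inference goes through a specialized robotics pipeline, the exact input format is not documented in this card. Purely as an illustration of what batching a dual-camera observation could look like, here is a minimal sketch; the function name, tensor layout, and 224×224 resolution are assumptions, not the model's actual API:

```python
import numpy as np

def prepare_dual_camera_batch(frame_a, frame_b):
    # Hypothetical preprocessing sketch -- the real GR00T pipeline may differ.
    # Stacks two synchronized HxWx3 uint8 frames into a (1, 2, 3, H, W) float batch.
    frames = np.stack([frame_a, frame_b])        # (2, H, W, 3)
    frames = frames.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
    frames = frames.transpose(0, 3, 1, 2)        # channels-first: (2, 3, H, W)
    return frames[np.newaxis]                    # add batch dimension

cam_a = np.zeros((224, 224, 3), dtype=np.uint8)  # e.g. first camera frame
cam_b = np.zeros((224, 224, 3), dtype=np.uint8)  # e.g. second camera frame
batch = prepare_dual_camera_batch(cam_a, cam_b)
print(batch.shape)  # (1, 2, 3, 224, 224)
```

The two frames are kept as a separate camera axis rather than concatenated channel-wise, matching the card's description of per-camera encoders.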
## Training Configuration
- **Base Model**: GR00T N1.5-3B
- **Dataset**: SO101 Wave 300K Dual Camera
- **Training Framework**: Custom robotics training pipeline
- **Batch Size**: Optimized for dual camera inputs
- **Optimization**: AdamW with custom learning rate scheduling
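The card says only "AdamW with custom learning rate scheduling" without specifying the schedule. As one plausible sketch over the 300,000 training steps (the warmup length, peak LR, and cosine shape are all assumptions), a warmup-plus-cosine schedule would look like:

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup=1000, total_steps=300_000):
    # Hypothetical warmup + cosine-decay schedule; the actual scheduler
    # used for GR00T Wave is not documented in this card.
    if step < warmup:
        return base_lr * step / warmup  # linear warmup from 0 to base_lr
    progress = (step - warmup) / (total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))  # decay toward 0

print(lr_at_step(0))        # 0.0
print(lr_at_step(1_000))    # 1e-4 (peak, right after warmup)
```

At step 300,000 the learning rate has decayed to effectively zero, which is consistent with taking the final checkpoint at the end of the schedule.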
## Model Files
The repository contains:
- **SafeTensors Model Files**:
- `model-00001-of-00002.safetensors` (4.7GB)
- `model-00002-of-00002.safetensors` (2.4GB)
- **Configuration Files**:
- `config.json`
- `model.safetensors.index.json`
- **Training Checkpoints**:
- `checkpoint-150000/` (16GB)
- `checkpoint-300000/` (16GB)
- **Training Metadata**:
- `trainer_state.json`
- `training_args.bin`
## Evaluation
The model has been evaluated on standard robotics manipulation benchmarks with the following approach:
- **Evaluation Steps**: 150 per checkpoint
- **Trajectory Count**: 5 trajectories per evaluation
- **Data Configuration**: SO100 dual camera setup
- **Metrics**: Success rate, manipulation accuracy, and task completion
## Applications
This model is suitable for:
- **Robotic Manipulation**: Pick and place operations
- **Dual Camera Systems**: Tasks requiring stereo vision
- **Manufacturing Automation**: Assembly and quality control
- **Research**: Foundation for robotics research and development
## Technical Specifications
- **Model Size**: ~7.1GB (SafeTensors format)
- **Total Repository Size**: ~40GB (including checkpoints)
- **Inference Requirements**: GPU with sufficient VRAM for transformer inference
- **Framework Compatibility**: Transformers, PyTorch
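As a back-of-the-envelope check on the VRAM requirement, assuming the ~3B-parameter base model is held in bfloat16 (actual usage adds activations, caches, and framework overhead on top of the weights):

```python
def weight_gib(n_params, bytes_per_param=2):
    # bf16/fp16 weights take 2 bytes per parameter; activations and
    # CUDA runtime overhead come on top of this figure.
    return n_params * bytes_per_param / 1024**3

print(round(weight_gib(3_000_000_000), 1))  # 5.6 GiB for weights alone
```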
## Installation
```bash
# Install required dependencies
pip install transformers torch torchvision
pip install huggingface_hub
# Login to HuggingFace (required for private model)
huggingface-cli login
```
## Limitations
- Requires specialized robotics inference pipeline
- Optimized for specific dual camera configurations
- Performance may vary with different robot platforms
- Requires adequate computational resources for real-time inference
## Model Card
This model card summarizes the capabilities, limitations, and intended use cases of GR00T Wave, a dual-camera robotics foundation model built on GR00T N1.5-3B.
## Ethical Considerations
This model is designed for robotics research and industrial applications. Users should ensure:
- Safe deployment in robotics systems
- Appropriate safety measures for physical robot control
- Compliance with relevant safety standards
- Responsible use in manufacturing and research environments
## Version History
- **v1.0**: Initial release with 300K step training
- **Checkpoints**: Available at 150K and 300K training steps
## Support
For technical questions and implementation support, please refer to the model documentation and community resources.