metadata

title: Robot Task Planning - Llama 3.1 8B
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: llama3.1

🤖 Robot Task Planning - Llama 3.1 8B (ZeroGPU)

This Space demonstrates a version of Meta's Llama 3.1 8B model fine-tuned for robot task planning with QLoRA (4-bit quantization + LoRA).

🚀 Hardware: ZeroGPU

This Space runs on ZeroGPU, which allocates an NVIDIA H200 GPU dynamically:

Free for Hugging Face users
Dynamic allocation - GPU resources are allocated on demand
High performance - inference runs on an NVIDIA H200
60-second GPU duration per request
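
As a rough illustration of how a 60-second GPU duration is typically requested on ZeroGPU, the sketch below wraps a handler with the `spaces.GPU` decorator. The handler name `generate_plan` and its body are hypothetical placeholders, not the code in this Space's app.py; the no-op fallback is an added convenience so the snippet also runs where the `spaces` package is not installed.

```python
# Sketch only: `generate_plan` is a hypothetical stand-in for the real handler.
try:
    import spaces

    # Request a GPU for up to 60 seconds each time the wrapped function is called.
    gpu = spaces.GPU(duration=60)
except ImportError:
    # Fallback for environments without the `spaces` package: a no-op decorator.
    def gpu(fn):
        return fn


@gpu
def generate_plan(command: str) -> str:
    """Placeholder body; the real app would run model inference here."""
    return f"(plan for: {command})"


print(generate_plan("Deploy Excavator 1 to Soil Area 1 for excavation"))
```

Only the decorated function holds the GPU, so keeping model calls inside it and everything else outside helps stay within the per-request duration.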

🎯 Purpose

Convert natural language commands into structured task sequences for construction robots including:

Excavators - Digging, loading, positioning
Dump Trucks - Material transport, loading, unloading
Multi-robot Coordination - Complex task dependencies

🔗 Model

Fine-tuned Model: YongdongWang/llama-3.1-8b-dart-qlora

Base Model: meta-llama/Llama-3.1-8B

✨ Features

🎮 Interactive Chat Interface - Real-time robot command processing
⚙️ Configurable Generation - Adjust temperature, top-p, max tokens
📝 Example Commands - Pre-built scenarios to get started
🚀 Optimized Performance - 4-bit quantization for efficient inference
📊 Structured Output - JSON-formatted task sequences
⚡ ZeroGPU Powered - Dynamic GPU allocation for free users

🚀 Usage

Input: Natural language robot commands

"Deploy Excavator 1 to Soil Area 1 for excavation"

Output: Structured task sequences

{
  "tasks": [
    {
      "robot": "Excavator_1",
      "action": "move_to",
      "target": "Soil_Area_1",
      "duration": 30
    },
    {
      "robot": "Excavator_1", 
      "action": "excavate",
      "target": "Soil_Area_1",
      "duration": 120
    }
  ]
}
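
A minimal sketch of how a client might parse and sanity-check this output, using only the Python standard library. The helper names `parse_plan` and `total_duration` are hypothetical, the required keys are inferred from the example above rather than from a published schema, and summing durations assumes the tasks run one after another.

```python
import json

# Keys inferred from the example output above (an assumption, not a formal schema).
REQUIRED_KEYS = {"robot", "action", "target", "duration"}


def parse_plan(text: str) -> list[dict]:
    """Parse the JSON text and check that every task has the expected keys."""
    plan = json.loads(text)
    tasks = plan["tasks"]
    for i, task in enumerate(tasks):
        missing = REQUIRED_KEYS - task.keys()
        if missing:
            raise ValueError(f"task {i} is missing keys: {sorted(missing)}")
    return tasks


def total_duration(tasks: list[dict]) -> int:
    """Sum task durations, assuming strictly sequential execution."""
    return sum(task["duration"] for task in tasks)


example = """
{
  "tasks": [
    {"robot": "Excavator_1", "action": "move_to", "target": "Soil_Area_1", "duration": 30},
    {"robot": "Excavator_1", "action": "excavate", "target": "Soil_Area_1", "duration": 120}
  ]
}
"""

tasks = parse_plan(example)
print([task["action"] for task in tasks])  # ['move_to', 'excavate']
print(total_duration(tasks))               # 150
```

Because the model generates free text, wrapping `parse_plan` in a `try`/`except` for `json.JSONDecodeError`, `KeyError`, and `ValueError` is a sensible precaution before passing a plan to a robot controller.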

🛠️ Technical Details

Architecture: Llama 3.1 8B + QLoRA adapters
Quantization: 4-bit (NF4) with double quantization
Framework: Transformers + PEFT + bitsandbytes (configured via BitsAndBytesConfig)
Hardware: ZeroGPU (dynamically allocated NVIDIA H200)
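
The stack above could be wired together roughly as follows. This is a sketch under several assumptions rather than the Space's actual loading code: it assumes the fine-tuned repository holds a PEFT adapter to be applied on top of the base model, that the tokenizer comes from the base model, and that bfloat16 is the compute dtype (the source does not state one). The names `QUANT_KWARGS` and `load_model` are hypothetical.

```python
BASE_MODEL = "meta-llama/Llama-3.1-8B"
ADAPTER = "YongdongWang/llama-3.1-8b-dart-qlora"

# 4-bit NF4 with double quantization, as listed above.
QUANT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",  # assumption: not stated in the source
}


def load_model():
    """Load the quantized base model and attach the adapter (needs a GPU)."""
    # Heavy imports are deferred so the module can be imported without them.
    from peft import PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(**QUANT_KWARGS)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL,
        quantization_config=quant_config,
        device_map="auto",
    )
    # Assumption: ADAPTER is a PEFT (LoRA) adapter repository.
    model = PeftModel.from_pretrained(base, ADAPTER)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` downloads the gated base weights, so it requires an accepted Llama 3.1 license on the Hugging Face account in use.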

⚡ Performance Notes

First Generation: 5-10 seconds (GPU allocation + model loading)
Subsequent Generations: 2-5 seconds per response
Memory Usage: ~8GB VRAM with 4-bit quantization
Context Length: Up to 2048 tokens
GPU Duration: 60 seconds per request
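
For context on the memory figure, a back-of-the-envelope estimate of weight storage alone can be computed as below. It uses a nominal 8 billion parameters and ignores quantization constants; the function name `weight_memory_gb` is hypothetical. The remainder of the ~8GB figure would plausibly come from activations, the KV cache, adapter weights, and runtime overhead, though that breakdown is an assumption and not a measurement.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 8e9  # nominal parameter count for an "8B" model

print(weight_memory_gb(PARAMS, 4))   # 4.0  (4-bit quantized weights)
print(weight_memory_gb(PARAMS, 16))  # 16.0 (16-bit weights, for comparison)
```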

📚 Example Commands

Try these robot commands:

"Deploy Excavator 1 to Soil Area 1 for excavation"
"Send Dump Truck 1 to collect material, then unload at storage"
"Coordinate multiple excavators across different areas"
"Create evacuation sequence for all robots from dangerous zone"

🔬 Research Applications

This model demonstrates:

Natural Language → Robot Planning translation
Multi-agent Task Coordination
Efficient LLM Fine-tuning with QLoRA
Real-time Robot Command Processing
ZeroGPU Integration for scalable deployment

📄 License

This project is distributed under Meta's Llama 3.1 Community License. Please review the license terms before use.

🤝 Contributing

For issues, improvements, or questions about the model, please visit the model repository.