Yongdong Wang committed on
Commit 4ce8f9e · 1 parent: 8e887ef

Modify the displayed model links to GGUF.

Files changed (2):
  1. README.md +77 −0
  2. app.py +17 −12
README.md CHANGED
@@ -63,6 +63,83 @@ This Hugging Face Space hosts DART-LLM, a QLoRA-fine-tuned meta-llama/Llama-3.1-
 
  2. Click **Generate Tasks**.
  3. Review the structured JSON output describing the robot task sequence.
 
+ ## Local/Edge Deployment (Recommended for Jetson)
+
+ For local deployment on edge devices like NVIDIA Jetson, we recommend using the GGUF quantized models for optimal performance and memory efficiency:
+
+ ### Available GGUF Models
+
+ | Model | Size | Memory Usage | Recommended Hardware |
+ |-------|------|--------------|---------------------|
+ | [llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf) | 870MB | ~2GB RAM | Jetson Nano, Jetson Orin Nano |
+ | [llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf) | 1.9GB | ~4GB RAM | Jetson Orin NX, Jetson AGX Orin |
+ | [llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf) | 4.6GB | ~8GB RAM | High-end Jetson AGX Orin |
+
+ ### Deployment Options
+
+ #### Option 1: Using Ollama (Recommended)
+
+ ```bash
+ # Install Ollama
+ curl -fsSL https://ollama.ai/install.sh | sh
+
+ # Create a Modelfile
+ cat > Modelfile << EOF
+ FROM ./llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+ TEMPLATE """### Instruction:
+ {{ .Prompt }}
+
+ ### Response:
+ """
+ PARAMETER stop "### Instruction:"
+ PARAMETER stop "### Response:"
+ EOF
+
+ # Create the model
+ ollama create dart-llm-1b -f Modelfile
+
+ # Run inference
+ ollama run dart-llm-1b "Deploy Excavator 1 to Soil Area 1 for excavation"
+ ```
+
+ #### Option 2: Using llama.cpp
+
+ ```bash
+ # Clone and build llama.cpp
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make
+
+ # Download model
+ wget https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf/resolve/main/llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+
+ # Run inference
+ ./main -m llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf \
+   -p "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n" \
+   -n 512
+ ```
+
+ #### Option 3: Using Python (llama-cpp-python)
+
+ ```bash
+ # Install llama-cpp-python
+ pip install llama-cpp-python
+
+ # Python script
+ python3 << EOF
+ from llama_cpp import Llama
+
+ # Load model
+ llm = Llama(model_path="llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf", n_ctx=2048)
+
+ # Generate response
+ prompt = "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n"
+ output = llm(prompt, max_tokens=512, stop=["</s>"], echo=False)
+
+ print(output['choices'][0]['text'])
+ EOF
+ ```
+
  ## Citation
 
  If you use this work, please cite:
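The README above says the model emits a structured JSON task sequence, and generated text often wraps that JSON in extra prose. A minimal sketch of pulling the first JSON object out of raw model output with the standard library; the sample output string and its field names (`tasks`, `robot`, `task`) are illustrative assumptions, not the repository's actual schema:

```python
import json

def extract_json(text: str):
    """Return the first balanced {...} object in `text`, parsed as JSON.

    Naive brace matching: it ignores braces inside JSON strings, which is
    usually acceptable for short task-plan outputs.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    return None  # unbalanced braces: generation was likely truncated

# Hypothetical model output; the schema is made up for illustration.
raw = 'Sure. {"tasks": [{"task": "excavation", "robot": "Excavator 1"}]}'
plan = extract_json(raw)
print(plan["tasks"][0]["robot"])  # → Excavator 1
```

Returning `None` on truncation lets the caller retry with a larger `max_tokens` instead of crashing on `json.loads`.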
app.py CHANGED
@@ -277,16 +277,19 @@ with gr.Blocks(
 
  Choose from **three fine-tuned models** specialized for **robot task planning** using QLoRA technique:
 
- - **🚀 Dart-llm-model-1B**: Ready for Jetson Nano deployment
- - **⚖️ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment
- - **🎯 Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment
+ - **🚀 Dart-llm-model-1B**: Ready for Jetson Nano deployment (870MB GGUF)
+ - **⚖️ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment (1.9GB GGUF)
+ - **🎯 Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment (4.6GB GGUF)
 
  **Capabilities**: Convert natural language robot commands into structured task sequences for excavators, dump trucks, and other construction robots. **Edge-ready for Jetson devices with DAG Visualization!**
 
- **Models**:
- - [YongdongWang/llama-3.2-1b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm) (Default)
- - [YongdongWang/llama-3.2-3b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm)
- - [YongdongWang/llama-3.1-8b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm)
+ ## 🔧 Recommended for Jetson Deployment (GGUF Models)
+ For optimal edge deployment performance, use these GGUF quantized models:
+ - **[YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf)** (870MB) - Jetson Nano/Orin Nano
+ - **[YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf)** (1.9GB) - Jetson Orin NX/AGX Orin
+ - **[YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf)** (4.6GB) - High-end Jetson AGX Orin
+
+ 💡 **Deploy with**: Ollama, llama.cpp, or llama-cpp-python for efficient edge inference
  """)
 
  with gr.Tabs():
@@ -336,11 +339,13 @@ with gr.Blocks(
  )
 
  gr.Markdown("""
- ### 🔧 Jetson Deployment Ready
- Choose the model that fits your Jetson device:
- - **1B**: Deployable on Jetson Nano (4GB RAM)
- - **3B**: Deployable on Jetson Xavier NX (8GB RAM)
- - **8B**: Deployable on Jetson AGX Xavier/Orin (32GB RAM)
+ ### 🔧 GGUF Models for Jetson Deployment
+ **Recommended for edge deployment:**
+ - **1B (870MB)**: Jetson Nano/Orin Nano (2GB RAM)
+ - **3B (1.9GB)**: Jetson Orin NX/AGX Orin (4GB RAM)
+ - **8B (4.6GB)**: High-end Jetson AGX Orin (8GB RAM)
+
+ 💡 Use **Ollama** or **llama.cpp** for efficient inference
  """)
 
  with gr.Tab("📊 DAG Visualization"):
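Beyond `ollama run`, the model created in the Ollama option can also be queried programmatically over Ollama's documented HTTP API (`POST /api/generate` on port 11434). A minimal stdlib-only sketch; the model name `dart-llm-1b` assumes the `ollama create` step from the README, and the helper name is ours:

```python
import json
import urllib.request

def build_generate_request(prompt: str, host: str = "http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint.

    `stream: False` asks Ollama to return one JSON object instead of a
    stream of partial responses.
    """
    payload = {"model": "dart-llm-1b", "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Deploy Excavator 1 to Soil Area 1 for excavation")
# With an Ollama server running locally, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
print(req.get_full_url())  # → http://localhost:11434/api/generate
```

This is convenient when the Gradio app and the Jetson inference box are separate machines: only the `host` argument changes.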