Spaces: Sleeping

Yongdong Wang committed · 4ce8f9e
Parent(s): 8e887ef
Modify the display model link to gguf.

README.md
CHANGED
@@ -63,6 +63,83 @@ This Hugging Face Space hosts DART-LLM, a QLoRA-fine-tuned meta-llama/Llama-3.1-
 2. Click **Generate Tasks**.
 3. Review the structured JSON output describing the robot task sequence.
 
+## Local/Edge Deployment (Recommended for Jetson)
+
+For local deployment on edge devices like NVIDIA Jetson, we recommend using the GGUF quantized models for optimal performance and memory efficiency:
+
+### Available GGUF Models
+
+| Model | Size | Memory Usage | Recommended Hardware |
+|-------|------|--------------|----------------------|
+| [llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf) | 870MB | ~2GB RAM | Jetson Nano, Jetson Orin Nano |
+| [llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf) | 1.9GB | ~4GB RAM | Jetson Orin NX, Jetson AGX Orin |
+| [llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf) | 4.6GB | ~8GB RAM | High-end Jetson AGX Orin |
+
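+As a quick illustration of the table above, a small helper (hypothetical, not part of the repo) can pick the largest model whose memory footprint fits a device's RAM budget:
+
+```python
+# Hypothetical helper: choose a DART-LLM GGUF model by available RAM,
+# following the memory-usage column of the table above.
+MODELS = [
+    # (model name, approximate RAM needed in GB), smallest to largest
+    ("llama-3.2-1b-lora-qlora-dart-llm-gguf", 2),
+    ("llama-3.2-3b-lora-qlora-dart-llm-gguf", 4),
+    ("llama-3.1-8b-lora-qlora-dart-llm-gguf", 8),
+]
+
+def pick_model(ram_gb: float) -> str:
+    """Return the largest model whose footprint fits in ram_gb."""
+    fitting = [name for name, need in MODELS if need <= ram_gb]
+    if not fitting:
+        raise ValueError("Not enough RAM for any model (need >= 2 GB)")
+    return fitting[-1]  # MODELS is ordered smallest to largest
+
+print(pick_model(4))  # e.g. a Jetson Orin NX class device
+```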
+### Deployment Options
+
+#### Option 1: Using Ollama (Recommended)
+
+```bash
+# Install Ollama
+curl -fsSL https://ollama.ai/install.sh | sh
+
+# Create a Modelfile
+cat > Modelfile << EOF
+FROM ./llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+TEMPLATE """### Instruction:
+{{ .Prompt }}
+
+### Response:
+"""
+PARAMETER stop "### Instruction:"
+PARAMETER stop "### Response:"
+EOF
+
+# Create the model
+ollama create dart-llm-1b -f Modelfile
+
+# Run inference
+ollama run dart-llm-1b "Deploy Excavator 1 to Soil Area 1 for excavation"
+```
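+The Modelfile's TEMPLATE fixes the Alpaca-style prompt format the adapters were fine-tuned on; a tiny helper (hypothetical, for illustration only) renders the same string that the other options pass manually:
+
+```python
+# Hypothetical helper that renders the same Alpaca-style prompt the
+# Ollama TEMPLATE above produces, for reuse with other runtimes.
+def build_prompt(instruction: str) -> str:
+    return (
+        "### Instruction:\n"
+        f"{instruction}\n\n"
+        "### Response:\n"
+    )
+
+print(build_prompt("Deploy Excavator 1 to Soil Area 1 for excavation"))
+```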
+
+#### Option 2: Using llama.cpp
+
+```bash
+# Clone and build llama.cpp
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make
+
+# Download model
+wget https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf/resolve/main/llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+
+# Run inference
+./main -m llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf \
+  -p "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n" \
+  -n 512
+```
+
+#### Option 3: Using Python (llama-cpp-python)
+
+```bash
+# Install llama-cpp-python
+pip install llama-cpp-python
+
+# Python script
+python3 << EOF
+from llama_cpp import Llama
+
+# Load model
+llm = Llama(model_path="llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf", n_ctx=2048)
+
+# Generate response (stop on the same markers as the Ollama Modelfile)
+prompt = "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n"
+output = llm(prompt, max_tokens=512, stop=["### Instruction:", "</s>"], echo=False)
+
+print(output['choices'][0]['text'])
+EOF
+```
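+Whichever runtime you use, the completion should contain the structured JSON task sequence. A short post-processing sketch (hypothetical, not part of the repo; the demo completion and its field names are illustrative) extracts and parses it defensively:
+
+```python
+import json
+
+# Hypothetical post-processing: the completion should be the structured
+# JSON task sequence, but stray whitespace or prose can slip in, so we
+# extract the outermost JSON object before parsing.
+def parse_task_json(completion: str) -> dict:
+    start = completion.find("{")
+    end = completion.rfind("}")
+    if start == -1 or end < start:
+        raise ValueError("No JSON object found in model output")
+    return json.loads(completion[start : end + 1])
+
+# Stand-in completion (illustrative, not real model output):
+demo = 'Here is the plan:\n{"tasks": [{"robot": "Excavator 1", "action": "excavate"}]}'
+print(parse_task_json(demo)["tasks"][0]["robot"])  # Excavator 1
+```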
+
 ## Citation
 
 If you use this work, please cite:
|
app.py
CHANGED
@@ -277,16 +277,19 @@ with gr.Blocks(
|
|
277 |
|
278 |
Choose from **three fine-tuned models** specialized for **robot task planning** using QLoRA technique:
|
279 |
|
280 |
-
- **π Dart-llm-model-1B**: Ready for Jetson Nano deployment
|
281 |
-
- **βοΈ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment
|
282 |
-
- **π― Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment
|
283 |
|
284 |
**Capabilities**: Convert natural language robot commands into structured task sequences for excavators, dump trucks, and other construction robots. **Edge-ready for Jetson devices with DAG Visualization!**
|
285 |
|
286 |
-
|
287 |
-
|
288 |
-
- [YongdongWang/llama-3.2-
|
289 |
-
- [YongdongWang/llama-3.
|
|
|
|
|
|
|
290 |
""")
|
291 |
|
292 |
with gr.Tabs():
|
@@ -336,11 +339,13 @@ with gr.Blocks(
|
|
336 |
)
|
337 |
|
338 |
gr.Markdown("""
|
339 |
-
### π§ Jetson Deployment
|
340 |
-
|
341 |
-
- **1B**:
|
342 |
-
- **3B**:
|
343 |
-
- **8B**:
|
|
|
|
|
344 |
""")
|
345 |
|
346 |
with gr.Tab("π DAG Visualization"):
|
|
|
277 |
|
278 |
Choose from **three fine-tuned models** specialized for **robot task planning** using QLoRA technique:
|
279 |
|
280 |
+
- **π Dart-llm-model-1B**: Ready for Jetson Nano deployment (870MB GGUF)
|
281 |
+
- **βοΈ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment (1.9GB GGUF)
|
282 |
+
- **π― Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment (4.6GB GGUF)
|
283 |
|
284 |
**Capabilities**: Convert natural language robot commands into structured task sequences for excavators, dump trucks, and other construction robots. **Edge-ready for Jetson devices with DAG Visualization!**
|
285 |
|
286 |
+
## π§ Recommended for Jetson Deployment (GGUF Models)
|
287 |
+
For optimal edge deployment performance, use these GGUF quantized models:
|
288 |
+
- **[YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf)** (870MB) - Jetson Nano/Orin Nano
|
289 |
+
- **[YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf)** (1.9GB) - Jetson Orin NX/AGX Orin
|
290 |
+
- **[YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf)** (4.6GB) - High-end Jetson AGX Orin
|
291 |
+
|
292 |
+
π‘ **Deploy with**: Ollama, llama.cpp, or llama-cpp-python for efficient edge inference
|
293 |
""")
|
294 |
|
295 |
with gr.Tabs():
|
|
|
339 |
)
|
340 |
|
341 |
gr.Markdown("""
|
342 |
+
### π§ GGUF Models for Jetson Deployment
|
343 |
+
**Recommended for edge deployment:**
|
344 |
+
- **1B (870MB)**: Jetson Nano/Orin Nano (2GB RAM)
|
345 |
+
- **3B (1.9GB)**: Jetson Orin NX/AGX Orin (4GB RAM)
|
346 |
+
- **8B (4.6GB)**: High-end Jetson AGX Orin (8GB RAM)
|
347 |
+
|
348 |
+
π‘ Use **Ollama** or **llama.cpp** for efficient inference
|
349 |
""")
|
350 |
|
351 |
with gr.Tab("π DAG Visualization"):
|