Yongdong Wang committed on
Commit 4ce8f9e · 1 parent: 8e887ef

Modify the displayed model links to GGUF.

Files changed (2):
  1. README.md +77 −0
  2. app.py +17 −12
README.md CHANGED
@@ -63,6 +63,83 @@ This Hugging Face Space hosts DART-LLM, a QLoRA-fine-tuned meta-llama/Llama-3.1-
 
  2. Click **Generate Tasks**.
  3. Review the structured JSON output describing the robot task sequence.
 
+ ## Local/Edge Deployment (Recommended for Jetson)
+
+ For local deployment on edge devices like NVIDIA Jetson, we recommend using the GGUF quantized models for optimal performance and memory efficiency:
+
+ ### Available GGUF Models
+
+ | Model | Size | Memory Usage | Recommended Hardware |
+ |-------|------|--------------|---------------------|
+ | [llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf) | 870MB | ~2GB RAM | Jetson Nano, Jetson Orin Nano |
+ | [llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf) | 1.9GB | ~4GB RAM | Jetson Orin NX, Jetson AGX Orin |
+ | [llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf) | 4.6GB | ~8GB RAM | High-end Jetson AGX Orin |
+
+ ### Deployment Options
+
+ #### Option 1: Using Ollama (Recommended)
+
+ ```bash
+ # Install Ollama
+ curl -fsSL https://ollama.ai/install.sh | sh
+
+ # Create a Modelfile
+ cat > Modelfile << EOF
+ FROM ./llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+ TEMPLATE """### Instruction:
+ {{ .Prompt }}
+
+ ### Response:
+ """
+ PARAMETER stop "### Instruction:"
+ PARAMETER stop "### Response:"
+ EOF
+
+ # Create the model
+ ollama create dart-llm-1b -f Modelfile
+
+ # Run inference
+ ollama run dart-llm-1b "Deploy Excavator 1 to Soil Area 1 for excavation"
+ ```
+
+ #### Option 2: Using llama.cpp
+
+ ```bash
+ # Clone and build llama.cpp
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make
+
+ # Download model
+ wget https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf/resolve/main/llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf
+
+ # Run inference
+ ./main -m llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf \
+   -p "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n" \
+   -n 512
+ ```
+
+ #### Option 3: Using Python (llama-cpp-python)
+
+ ```bash
+ # Install llama-cpp-python
+ pip install llama-cpp-python
+
+ # Python script
+ python3 << EOF
+ from llama_cpp import Llama
+
+ # Load model
+ llm = Llama(model_path="llama_3.2_1b-lora-qlora-dart-llm_q5_k_m.gguf", n_ctx=2048)
+
+ # Generate response
+ prompt = "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n"
+ output = llm(prompt, max_tokens=512, stop=["</s>"], echo=False)
+
+ print(output['choices'][0]['text'])
+ EOF
+ ```
+
  ## Citation
 
  If you use this work, please cite:
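The README above says the model emits a structured JSON task sequence, and generated text often wraps that JSON in extra prose. A minimal sketch of pulling the first JSON object out of raw model output with the standard library; the sample output string and its field names (`tasks`, `robot`, `task`) are illustrative assumptions, not the repository's actual schema:

```python
import json

def extract_json(text: str):
    """Return the first balanced {...} object in `text`, parsed as JSON.

    Naive brace matching: it ignores braces inside JSON strings, which is
    usually acceptable for short task-plan outputs.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    return None  # unbalanced braces: generation was likely truncated

# Hypothetical model output; the schema is made up for illustration.
raw = 'Sure. {"tasks": [{"task": "excavation", "robot": "Excavator 1"}]}'
plan = extract_json(raw)
print(plan["tasks"][0]["robot"])  # → Excavator 1
```

Returning `None` on truncation lets the caller retry with a larger `max_tokens` instead of crashing on `json.loads`.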
app.py CHANGED
@@ -277,16 +277,19 @@ with gr.Blocks(
 
  Choose from **three fine-tuned models** specialized for **robot task planning** using QLoRA technique:
 
- - **🚀 Dart-llm-model-1B**: Ready for Jetson Nano deployment
- - **⚖️ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment
- - **🎯 Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment
+ - **🚀 Dart-llm-model-1B**: Ready for Jetson Nano deployment (870MB GGUF)
+ - **⚖️ Dart-llm-model-3B**: Ready for Jetson Xavier NX deployment (1.9GB GGUF)
+ - **🎯 Dart-llm-model-8B**: Ready for Jetson AGX Xavier/Orin deployment (4.6GB GGUF)
 
  **Capabilities**: Convert natural language robot commands into structured task sequences for excavators, dump trucks, and other construction robots. **Edge-ready for Jetson devices with DAG Visualization!**
 
- **Models**:
- - [YongdongWang/llama-3.2-1b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm) (Default)
- - [YongdongWang/llama-3.2-3b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm)
- - [YongdongWang/llama-3.1-8b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm)
+ ## 🔧 Recommended for Jetson Deployment (GGUF Models)
+ For optimal edge deployment performance, use these GGUF quantized models:
+ - **[YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-1b-lora-qlora-dart-llm-gguf)** (870MB) - Jetson Nano/Orin Nano
+ - **[YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf)** (1.9GB) - Jetson Orin NX/AGX Orin
+ - **[YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf](https://huggingface.co/YongdongWang/llama-3.1-8b-lora-qlora-dart-llm-gguf)** (4.6GB) - High-end Jetson AGX Orin
+
+ 💡 **Deploy with**: Ollama, llama.cpp, or llama-cpp-python for efficient edge inference
  """)
 
  with gr.Tabs():
@@ -336,11 +339,13 @@ with gr.Blocks(
  )
 
  gr.Markdown("""
- ### 🔧 Jetson Deployment Ready
- Choose the model that fits your Jetson device:
- - **1B**: Deployable on Jetson Nano (4GB RAM)
- - **3B**: Deployable on Jetson Xavier NX (8GB RAM)
- - **8B**: Deployable on Jetson AGX Xavier/Orin (32GB RAM)
+ ### 🔧 GGUF Models for Jetson Deployment
+ **Recommended for edge deployment:**
+ - **1B (870MB)**: Jetson Nano/Orin Nano (2GB RAM)
+ - **3B (1.9GB)**: Jetson Orin NX/AGX Orin (4GB RAM)
+ - **8B (4.6GB)**: High-end Jetson AGX Orin (8GB RAM)
+
+ 💡 Use **Ollama** or **llama.cpp** for efficient inference
  """)
 
  with gr.Tab("📊 DAG Visualization"):
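Beyond `ollama run`, the model created in the Ollama option can also be queried programmatically over Ollama's documented HTTP API (`POST /api/generate` on port 11434). A minimal stdlib-only sketch; the model name `dart-llm-1b` assumes the `ollama create` step from the README, and the helper name is ours:

```python
import json
import urllib.request

def build_generate_request(prompt: str, host: str = "http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint.

    `stream: False` asks Ollama to return one JSON object instead of a
    stream of partial responses.
    """
    payload = {"model": "dart-llm-1b", "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Deploy Excavator 1 to Soil Area 1 for excavation")
# With an Ollama server running locally, send it like this:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
print(req.get_full_url())  # → http://localhost:11434/api/generate
```

This is convenient when the Gradio app and the Jetson inference box are separate machines: only the `host` argument changes.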