YongdongWang committed on
Commit 497308e · verified · 1 Parent(s): 60d42aa

Upload llama_3.2_3b-lora-qlora-dart-llm GGUF quantized models
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,114 @@
---
license: llama3.2
base_model: meta-llama/Llama-3.2-3B
tags:
- llama
- gguf
- quantized
- robotics
- task-planning
- construction
- dart-llm
language:
- en
pipeline_tag: text-generation
---

# Llama 3.2 3B DART LLM - GGUF Quantized Models

This repository contains GGUF quantized versions of the **Llama 3.2 3B DART LLM** model, fine-tuned for robot task planning in construction environments.

## Model Details

- **Base Model**: meta-llama/Llama-3.2-3B
- **Fine-tuned Version**: based on a QLoRA fine-tuned model for robotics task planning
- **Format**: GGUF (GPT-Generated Unified Format)
- **Use Case**: optimized for inference with llama.cpp and compatible frameworks

## Available Files

- **Q4_K_M**: `llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf` - 4-bit medium k-quant quantization

## Usage with llama.cpp

```bash
# Clone the llama.cpp repository
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Build llama.cpp
make

# Download the quantized model
wget https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf/resolve/main/llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf

# Run inference
./main -m llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf -p "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n" -n 512
```

## Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the quantized model
llm = Llama(model_path="llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf", n_ctx=2048)

# Generate a response
prompt = "### Instruction:\nDeploy Excavator 1 to Soil Area 1 for excavation\n\n### Response:\n"
output = llm(prompt, max_tokens=512, stop=["</s>"], echo=False)

print(output['choices'][0]['text'])
```
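The completion text sometimes wraps the JSON plan in surrounding prose, so it helps to pull out the first balanced JSON object before parsing. A minimal stdlib-only sketch (the `extract_json` helper and the sample `response` string are illustrative, not part of llama-cpp-python or this repo; it also assumes no braces appear inside JSON string values):

```python
import json

def extract_json(text):
    """Return the first balanced {...} object found in `text`, parsed as JSON."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object in model output")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced JSON in model output")

# Stand-in for output['choices'][0]['text'] from the call above
response = 'Here is the plan:\n{"tasks": [{"task": "target_area_for_specific_robots_1"}]}\nDone.'
plan = extract_json(response)
print(plan["tasks"][0]["task"])  # target_area_for_specific_robots_1
```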

## Quantization Details

Different quantization levels offer trade-offs between model size, inference speed, and quality:

- **f16**: full 16-bit precision (largest, highest quality)
- **q8_0**: 8-bit quantization (good balance of size and quality)
- **q5_k_m**: 5-bit quantization with mixed precision (recommended)
- **q4_k_m**: 4-bit quantization (good for most use cases)
- **q3_k_m**: 3-bit quantization (smaller, some quality loss)
- **q2_k**: 2-bit quantization (smallest, significant quality loss)

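As a rough sanity check, the effective bits per weight can be estimated from file size. The sketch below uses the q4_k_m file size recorded in this repository (2019373248 bytes) and assumes roughly 3.2B parameters for Llama 3.2 3B (an approximation, not an exact count):

```python
# Estimate effective bits per weight of a GGUF file from its on-disk size.
FILE_SIZE_BYTES = 2_019_373_248   # llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf
PARAMS = 3.2e9                    # approximate parameter count (assumption)

bits_per_weight = FILE_SIZE_BYTES * 8 / PARAMS
print(f"{bits_per_weight:.2f} bits/weight")  # ~5.05
```

The result lands above a flat 4 bits because k-quants keep some tensors (and quantization metadata) at higher precision.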
## Performance

The model generates structured JSON task sequences for construction robotics:

```json
{
  "tasks": [
    {
      "instruction_function": {
        "dependencies": [],
        "name": "target_area_for_specific_robots",
        "object_keywords": ["soil_area_1"],
        "robot_ids": ["robot_excavator_01"],
        "robot_type": null
      },
      "task": "target_area_for_specific_robots_1"
    }
  ]
}
```
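Each task carries a `dependencies` list, so a downstream planner should dispatch tasks in dependency order. A minimal stdlib-only sketch of ordering such a plan (the `execution_order` helper is illustrative, not part of this repo, and assumes the dependency graph is acyclic):

```python
import json

# The task sequence shown above, as the model would emit it.
raw = """
{
  "tasks": [
    {
      "instruction_function": {
        "dependencies": [],
        "name": "target_area_for_specific_robots",
        "object_keywords": ["soil_area_1"],
        "robot_ids": ["robot_excavator_01"],
        "robot_type": null
      },
      "task": "target_area_for_specific_robots_1"
    }
  ]
}
"""

def execution_order(tasks):
    """Topologically order tasks so each runs only after its dependencies."""
    by_name = {t["task"]: t for t in tasks}
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in by_name[name]["instruction_function"]["dependencies"]:
            visit(dep)
        ordered.append(name)

    for name in by_name:
        visit(name)
    return ordered

plan = json.loads(raw)
order = execution_order(plan["tasks"])
print(order)  # ['target_area_for_specific_robots_1']
```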

## Original Model

This GGUF model is converted from: [YongdongWang/llama-3.2-3b-lora-qlora-dart-llm](https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm)

## License

This model inherits the license of the base model (meta-llama/Llama-3.2-3B).

## Citation

```bibtex
@misc{llama_3.2_3b_lora_qlora_dart_llm_gguf,
  title={Llama 3.2 3B DART LLM - GGUF Quantized Models},
  author={YongdongWang},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/YongdongWang/llama-3.2-3b-lora-qlora-dart-llm-gguf}
}
```
llama_3.2_3b-lora-qlora-dart-llm_q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1c00e0b7d9285316e92e0f7276320ff2078bbf4d8fc98a0dc5531f4027948407
size 2019373248