Add files using upload-large-folder tool
- .gitattributes +1 -0
- LICENSE +21 -0
- README.md +74 -42
- config.json +3 -7
- figures/benchmark.jpg +3 -0
- generation_config.json +2 -4
- model.safetensors +2 -2
- tokenizer.json +2 -2
- tokenizer_config.json +27 -188
.gitattributes
CHANGED
@@ -33,4 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+figures/benchmark.jpg filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 DeepSeek
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md
CHANGED
@@ -1,51 +1,54 @@
 ---
-base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
-language:
-- en
-license: apache-2.0
-library_name: transformers
 tags:
-- deepseek
-- qwen
-- qwen2
 - unsloth
-
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+license: mit
+library_name: transformers
 ---
+# DeepSeek-R1
+<!-- markdownlint-disable first-line-h1 -->
+<!-- markdownlint-disable html -->
+<!-- markdownlint-disable no-duplicate-header -->
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-|-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
-| **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
-| **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
-| **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
-| **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
-| **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
-| **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
-| **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |
+<div align="center">
+<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
+</div>
+<hr>
+<div align="center" style="line-height: 1;">
+<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
+<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
+<img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20R1-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
+<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
 
-
+<div align="center" style="line-height: 1;">
+<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
+<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
+<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
+<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
 
-
-
-
+<div align="center" style="line-height: 1;">
+<a href="https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE" style="margin: 2px;">
+<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
 
-## Special Thanks
-A huge thank you to the DeepSeek team for creating and releasing these models.
 
+<p align="center">
+<a href="https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf"><b>Paper Link</b>👁️</a>
+</p>
 
 
 ## 1. Introduction
@@ -58,6 +61,8 @@ we introduce DeepSeek-R1, which incorporates cold-start data before RL.
 DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
 To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
 
+**NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the [Usage Recommendation](#usage-recommendations) section.**
+
 <p align="center">
 <img width="80%" src="figures/benchmark.jpg">
 </p>
@@ -94,7 +99,7 @@ To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSe
 </div>
 
 DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
-For more details
+For more details regarding the model architecture, please refer to [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.
 
 ### DeepSeek-R1-Distill Models
 
@@ -183,6 +188,8 @@ We also provide OpenAI-Compatible API at DeepSeek Platform: [platform.deepseek.c
 
 Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running DeepSeek-R1 locally.
 
+**NOTE: Hugging Face's Transformers has not been directly supported yet.**
+
 ### DeepSeek-R1-Distill Models
 
 DeepSeek-R1-Distill models can be utilized in the same manner as Qwen or Llama models.
@@ -193,7 +200,23 @@ For instance, you can easily start a service using [vLLM](https://github.com/vll
 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
 ```
 
-
+You can also easily start a service using [SGLang](https://github.com/sgl-project/sglang)
+
+```bash
+python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
+```
+
+### Usage Recommendations
+
+**We recommend adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:**
+
+1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
+2. **Avoid adding a system prompt; all instructions should be contained within the user prompt.**
+3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
+4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
+
+Additionally, we have observed that the DeepSeek-R1 series models tend to bypass thinking pattern (i.e., outputting "\<think\>\n\n\</think\>") when responding to certain queries, which can adversely affect the model's performance.
+**To ensure that the model engages in thorough reasoning, we recommend enforcing the model to initiate its response with "\<think\>\n" at the beginning of every output.**
 
 ## 7. License
 This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
@@ -204,8 +227,17 @@ DeepSeek-R1 series support commercial use, allow for any modifications and deriv
 
 ## 8. Citation
 ```
+@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
+title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
+author={DeepSeek-AI},
+year={2025},
+eprint={2501.12948},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2501.12948},
+}
 
 ```
 
 ## 9. Contact
-If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
+If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
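The Usage Recommendations added to the README map fairly directly onto request parameters for the OpenAI-compatible servers started by the `vllm serve` and `sglang.launch_server` commands. The following is a minimal sketch, not part of this commit: the helper `build_request` is hypothetical, appending the math directive on a new line is an assumption, and the default model id must be changed to whatever name the server was actually launched with (the README examples serve the 32B distill, while this repository holds the 1.5B one).

```python
import json

# Directive quoted from the README's Usage Recommendations (item 3).
MATH_DIRECTIVE = "Please reason step by step, and put your final answer within \\boxed{}."


def build_request(question: str, is_math: bool = False,
                  model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B") -> dict:
    """Build an OpenAI-style chat completions body (hypothetical helper).

    Follows the README: temperature 0.6 and no system prompt, so every
    instruction lives in the single user turn.
    """
    content = question
    if is_math:
        # Assumption: the directive is appended after the question.
        content = f"{question}\n{MATH_DIRECTIVE}"
    return {
        "model": model,
        # Only a user message; deliberately no {"role": "system", ...} entry.
        "messages": [{"role": "user", "content": content}],
        "temperature": 0.6,  # README: keep within 0.5-0.7
        "top_p": 0.95,       # value from this commit's generation_config.json
    }


if __name__ == "__main__":
    body = build_request("What is 17 * 23?", is_math=True)
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

The resulting body can be POSTed as JSON to the server's `/v1/chat/completions` route. For benchmarking, item 4 of the recommendations suggests sending the same request several times and averaging the scores rather than trusting a single sample.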
config.json
CHANGED
@@ -1,10 +1,9 @@
 {
-"_name_or_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
 "architectures": [
 "Qwen2ForCausalLM"
 ],
 "attention_dropout": 0.0,
-"bos_token_id":
+"bos_token_id": 151643,
 "eos_token_id": 151643,
 "hidden_act": "silu",
 "hidden_size": 1536,
@@ -16,15 +15,12 @@
 "num_attention_heads": 12,
 "num_hidden_layers": 28,
 "num_key_value_heads": 2,
-"pad_token_id": 151654,
 "rms_norm_eps": 1e-06,
-"rope_scaling": null,
 "rope_theta": 10000,
-"sliding_window":
+"sliding_window": 4096,
 "tie_word_embeddings": false,
 "torch_dtype": "bfloat16",
-"transformers_version": "4.
-"unsloth_fixed": true,
+"transformers_version": "4.44.0",
 "use_cache": true,
 "use_mrope": false,
 "use_sliding_window": false,
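A few derived quantities follow from the new `config.json`. The sketch below assumes the usual Qwen2 convention `head_dim = hidden_size / num_attention_heads` and estimates the key/value cache for a bfloat16 cache, ignoring allocator and framework overhead. With these values it gives a head dimension of 128, six query heads sharing each key/value head (grouped-query attention), and 28,672 bytes of cache per token.

```python
# Subset of fields copied from the new config.json in this commit.
CONFIG = {
    "hidden_size": 1536,
    "num_attention_heads": 12,
    "num_key_value_heads": 2,
    "num_hidden_layers": 28,
    "torch_dtype": "bfloat16",
}

# Bytes per element for the dtypes this sketch knows about.
BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2, "float32": 4}


def attention_shape(cfg: dict) -> dict:
    # Assumption: head size is hidden_size split evenly across query heads.
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    # Grouped-query attention: several query heads share one K/V head.
    group_size = cfg["num_attention_heads"] // cfg["num_key_value_heads"]
    return {"head_dim": head_dim, "queries_per_kv_head": group_size}


def kv_cache_bytes_per_token(cfg: dict) -> int:
    head_dim = attention_shape(cfg)["head_dim"]
    elem = BYTES_PER_ELEMENT[cfg["torch_dtype"]]
    # Factor 2: one key vector and one value vector per K/V head per layer.
    return 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim * elem


if __name__ == "__main__":
    print(attention_shape(CONFIG))
    per_token = kv_cache_bytes_per_token(CONFIG)
    print(per_token, "bytes per token")
    print(per_token * 16384 / 2**20, "MiB for 16384 tokens")
```

At the `model_max_length` of 16384 set in this commit's `tokenizer_config.json`, that estimate comes to 448 MiB per sequence; only the two key/value heads are cached, which is why the figure is six times smaller than it would be with one K/V head per query head.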
figures/benchmark.jpg
ADDED
Binary file (stored with Git LFS)
|
generation_config.json
CHANGED
@@ -1,11 +1,9 @@
 {
 "_from_model_config": true,
 "bos_token_id": 151646,
-"do_sample": true,
 "eos_token_id": 151643,
-"
-"pad_token_id": 151654,
+"do_sample": true,
 "temperature": 0.6,
 "top_p": 0.95,
-"transformers_version": "4.
+"transformers_version": "4.39.3"
 }
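The updated `generation_config.json` keeps sampling enabled (`do_sample`) with `temperature` 0.6 and `top_p` 0.95. As a rough illustration of what those two settings do, here is a plain-Python sketch that applies temperature scaling and then nucleus (top-p) filtering to a list of logits. It assumes the common temperature-then-top-p order and is not the code path of any particular inference library, which would run this on the GPU and may differ in tie-breaking and numerical detail.

```python
import math
import random


def sample_token(logits, temperature=0.6, top_p=0.95, rng=random):
    """Draw one token index using temperature scaling plus nucleus sampling."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (greedy decoding not handled here)")

    # 1. Temperature: divide logits before the softmax; values < 1 sharpen it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Nucleus: keep the smallest set of most likely tokens whose
    #    cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # 3. Sample among the kept tokens; random.choices renormalises the weights.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_token([2.0, 1.5, 0.5, -1.0], rng=rng) for _ in range(10)])
```

Lowering the temperature concentrates probability on the top tokens before the nucleus cut is applied, so at 0.6 the kept set is usually smaller than it would be at 1.0; this is the mechanism behind the README's advice that 0.5-0.7 avoids both repetition and incoherent output.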
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:58858233513d76b8703e72eed6ce16807b523328188e13329257fb9594462945
+size 3554214621
tokenizer.json
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:88145e3c3249adc2546ede277e9819d6e405e19072456e4b521cbc724bd60773
+size 7031660
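The `model.safetensors` and `tokenizer.json` changes only touch Git LFS pointer files: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A minimal sketch for parsing such a pointer and checking a downloaded file against it follows; the function names are illustrative, and only `sha256` object ids are handled.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'version' / 'oid' / 'size' lines of a Git LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's byte size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check first; avoids hashing a wrong multi-GB file
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large weight files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


# The new tokenizer.json pointer from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:88145e3c3249adc2546ede277e9819d6e405e19072456e4b521cbc724bd60773
size 7031660
"""

if __name__ == "__main__":
    p = parse_lfs_pointer(POINTER)
    print(p["algo"], p["oid"][:12], p["size"])
```

After downloading the real `tokenizer.json`, `matches_pointer("tokenizer.json", parse_lfs_pointer(POINTER))` should return `True`; the same check applies to the 3,554,214,621-byte `model.safetensors` with its own pointer.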
tokenizer_config.json
CHANGED
@@ -1,196 +1,35 @@
 {
 "add_bos_token": true,
 "add_eos_token": false,
-"
-
-"
-
-
-
-
-"single_word": false,
-"special": true
-},
-"151644": {
-"content": "<|User|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151645": {
-"content": "<|Assistant|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151646": {
-"content": "<|begin▁of▁sentence|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151647": {
-"content": "<|EOT|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151648": {
-"content": "<think>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151649": {
-"content": "</think>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151650": {
-"content": "<|quad_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151651": {
-"content": "<|quad_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151652": {
-"content": "<|vision_start|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151653": {
-"content": "<|vision_end|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151654": {
-"content": "<|vision_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151655": {
-"content": "<|image_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151656": {
-"content": "<|video_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": true
-},
-"151657": {
-"content": "<tool_call>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151658": {
-"content": "</tool_call>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151659": {
-"content": "<|fim_prefix|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151660": {
-"content": "<|fim_middle|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151661": {
-"content": "<|fim_suffix|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151662": {
-"content": "<|fim_pad|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151663": {
-"content": "<|repo_name|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-},
-"151664": {
-"content": "<|file_sep|>",
-"lstrip": false,
-"normalized": false,
-"rstrip": false,
-"single_word": false,
-"special": false
-}
+"bos_token": {
+"__type": "AddedToken",
+"content": "<|begin▁of▁sentence|>",
+"lstrip": false,
+"normalized": true,
+"rstrip": false,
+"single_word": false
 },
-"bos_token": "<|begin▁of▁sentence|>",
-"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\\n'}}{% endif %}",
 "clean_up_tokenization_spaces": false,
-"eos_token":
-
+"eos_token": {
+"__type": "AddedToken",
+"content": "<|end▁of▁sentence|>",
+"lstrip": false,
+"normalized": true,
+"rstrip": false,
+"single_word": false
+},
 "legacy": true,
-"model_max_length":
-"pad_token":
-
+"model_max_length": 16384,
+"pad_token": {
+"__type": "AddedToken",
+"content": "<|end▁of▁sentence|>",
+"lstrip": false,
+"normalized": true,
+"rstrip": false,
+"single_word": false
+},
 "sp_model_kwargs": {},
-"tokenizer_class": "LlamaTokenizerFast",
 "unk_token": null,
-"
-}
+"tokenizer_class": "LlamaTokenizerFast",
+"chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{%- set ns.is_first = true -%}{%- else %}{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\\n'}}{% endif %}"
+}
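To see what the `chat_template` produces for ordinary conversations, the sketch below re-implements its non-tool branches in plain Python. It is an approximation written for illustration: `render_prompt` is a hypothetical name, and the tool-call and tool-output branches are deliberately omitted. As in the template, the system prompt is placed directly after the BOS token, earlier assistant turns keep only the text after their last `</think>`, and the generation prompt ends with `<|Assistant|><think>` plus a newline, which is how the template enforces the README's advice to open every output with "<think>\n".

```python
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"


def render_prompt(messages, add_generation_prompt=True):
    """Plain-Python approximation of the chat_template (no tool calls).

    Assumes every message has a 'role' of system/user/assistant and
    string 'content'; tool messages are not supported in this sketch.
    """
    # The template first scans for system messages; a later one overwrites
    # an earlier one, and the result goes right after the BOS token.
    system_prompt = ""
    for m in messages:
        if m["role"] == "system":
            system_prompt = m["content"]

    parts = [BOS, system_prompt]
    for m in messages:
        if m["role"] == "user":
            parts.append("<|User|>" + m["content"])
        elif m["role"] == "assistant":
            # Prior reasoning is dropped: keep only text after the last </think>.
            content = m["content"].split("</think>")[-1]
            parts.append("<|Assistant|>" + content + EOS)
    if add_generation_prompt:
        # Pre-fills the reply so the model starts inside a <think> block.
        parts.append("<|Assistant|><think>\n")
    return "".join(parts)


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "<think>2 plus 2 is 4.</think>4"},
        {"role": "user", "content": "And times 3?"},
    ]
    print(repr(render_prompt(history)))
```

For simple conversations like these, the string is expected to match `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from Transformers, which renders the real template; comparing the two is a quick way to check the sketch against the shipped `tokenizer_config.json`.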