Update README.md
README.md CHANGED

Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2](https://github.com/fairydreaming/llama.cpp/tree/deepseek-v2)

# Warning: This will not work unless you compile llama.cpp from the repo provided (and set the metadata KV overrides)!
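A minimal build sketch, assuming the fork's standard `make` build (adjust for your platform and core count):

```
# Clone the deepseek-v2 branch of the fork and pin the commit recommended below
git clone -b deepseek-v2 https://github.com/fairydreaming/llama.cpp
cd llama.cpp
git checkout 039896407afd40e54321d47c5063c46a52da3e01
make -j8
```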

# How to use:

[...]
- Merged GGUF should appear (a merge sketch follows below)
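The merge step presumably relies on llama.cpp's `gguf-split` tool; a hedged sketch, with hypothetical file names:

```
# Point --merge at the first split; the tool locates the remaining parts
./gguf-split --merge DeepSeek-V2-Chat.q4_k_m-00001-of-00004.gguf DeepSeek-V2-Chat.q4_k_m.gguf
```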

# Quants:
```
- bf16 [size: 439gb]
- q8_0 (after q2_k) [estimated size: 233.27gb]
- q4_k_m [size: 132gb]
- q2_k (uploading) [size: 80gb]
- q3_k_s (generating, using importance matrix) [estimated size: 96.05gb]
```

# Planned Quants (using importance matrix):
```
- q5_k_m
- q5_k_s
- q3_k_m
- q6_k
- iq4_nl
- iq4_xs
- iq2_xxs
- iq2_xs
- iq2_s
- iq2_m
- iq1_s
- iq1_m
```

Note: the model files do not have some DeepSeek-V2-specific parameters; we will look into adding them.

Please use commit `039896407afd40e54321d47c5063c46a52da3e01`; otherwise, use these metadata KV overrides:
```
deepseek2.attention.q_lora_rank=int:1536
deepseek2.attention.kv_lora_rank=int:512
deepseek2.expert_shared_count=int:2
deepseek2.expert_feed_forward_length=int:1536
deepseek2.leading_dense_block_count=int:1
```
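These lines match the `key=type:value` format of llama.cpp's `--override-kv` flag, so (assuming this build supports it) they can be supplied at load time; the model path here is hypothetical:

```
# Inject the missing DeepSeek-V2 parameters when loading the GGUF
./main -m DeepSeek-V2-Chat.q4_k_m.gguf \
  --override-kv deepseek2.attention.q_lora_rank=int:1536 \
  --override-kv deepseek2.attention.kv_lora_rank=int:512 \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  -p "Hello"
```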

A precompiled AVX2 build is available as `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.

# License:
- DeepSeek license for the model weights
- MIT license for any repo code

# Performance:
~1.5 t/s with a Ryzen 7 3700X (96 GB DDR4 @ 3200 MHz) [Q2_K]
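A rough way to reproduce such a figure is llama.cpp's `llama-bench` tool (a sketch; the model path is hypothetical):

```
# Report prompt-processing and token-generation speed on 8 threads
./llama-bench -m DeepSeek-V2-Chat.q2_k.gguf -t 8
```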

# iMatrix:
Find `imatrix.dat` in the root of this repo; it was made with a Q2_K quant (see here for background: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693)).

It uses groups_merged.txt as calibration data; find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
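For reference, a sketch of how an importance matrix like this is typically produced and then applied with llama.cpp's `imatrix` and `quantize` tools (file names are hypothetical):

```
# Compute the importance matrix from the calibration text using the Q2_K quant
./imatrix -m DeepSeek-V2-Chat.q2_k.gguf -f groups_merged.txt -o imatrix.dat

# Feed it into a low-bit quantization
./quantize --imatrix imatrix.dat DeepSeek-V2-Chat.bf16.gguf DeepSeek-V2-Chat.iq2_xs.gguf IQ2_XS
```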

# Censorship:

This model is quite censored; finetuning on toxic DPO might help.