readme: daily update
README.md (changed)
Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed will suffer.

| Quant   | Status    | Size      | Description                       | KV Metadata | Weighted | Notes                             |
|---------|-----------|-----------|-----------------------------------|-------------|----------|-----------------------------------|
| BF16    | Available | 439 GB    | Lossless :)                       | Old         | No       | Q8_0 is sufficient for most cases |
| Q8_0    | Available | 233.27 GB | High quality *recommended*        | Updated     | Yes      |                                   |
| Q5_K_M  | Uploading | 155 GB    | Medium-low quality                | Updated     | Yes      |                                   |
| Q4_K_M  | Available | 132 GB    | Medium quality *recommended*      | Old         | No       |                                   |
| Q3_K_M  | Available | 104 GB    | Medium-low quality                | Updated     | Yes      |                                   |
| IQ3_XS  | Available | 89.6 GB   | Better than Q3_K_M                | Old         | Yes      |                                   |
| Q2_K    | Available | 80.0 GB   | Low quality **not recommended**   | Old         | No       |                                   |
| IQ2_XXS | Available | 61.5 GB   | Lower quality **not recommended** | Old         | Yes      |                                   |
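As a rough sanity check on the table, effective bits per weight can be estimated by scaling each file size against the 16-bit BF16 size; the figures below are approximations derived only from the listed sizes, not exact quant specs.

```shell
# Approximate bits per weight: quant size / BF16 size (439 GB) * 16 bits.
for q in "Q8_0 233.27" "Q4_K_M 132" "Q2_K 80"; do
  set -- $q
  awk -v name="$1" -v size="$2" \
    'BEGIN { printf "%-7s ~%.1f bits/weight\n", name, size / 439 * 16 }'
done
```

Q8_0 works out to roughly 8.5 bits per weight and Q2_K to roughly 2.9, consistent with their names.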

| Planned Quant | Notes |
|---------------|-------|
| Q5_K_S        |       |
| Q4_K_S        |       |
| Q3_K_S        |       |
| Q6_K          |       |
| IQ4_XS        |       |
| IQ2_XS        |       |

```
deepseek2.leading_dense_block_count=int:1
deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
```

Quants with "Updated" metadata contain these parameters, so as long as you're running a supported build of llama.cpp, no `--override-kv` parameters are required.

A precompiled Windows AVX2 build is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
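For quants still marked "Old", the missing keys can be supplied at load time with llama.cpp's `--override-kv` flag, which takes the same `key=type:value` form used in this README. A minimal sketch; the model path and binary name here are placeholders:

```shell
# Placeholder model path; the binary may differ by llama.cpp build (e.g. ./main).
./main -m deepseek-v2-Q4_K_M.gguf \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```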

# License:
- DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo

*~1.5 t/s* with a Ryzen 7 3700X (96 GB @ 3200 MHz) `[Q2_K]`

# iMatrix:

Find `imatrix.dat` in the root of this repo, made with a `Q2_K` quant containing 62 chunks (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))

Using `groups_merged.txt`, find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
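For reference, an importance matrix like this is produced with llama.cpp's `imatrix` tool and then passed to `quantize` when making the weighted ("Yes" in the table) quants. A sketch of the workflow; the model file names are placeholders, and exact binary names and flags depend on the llama.cpp build:

```shell
# Build the importance matrix from a model + calibration text (placeholder paths).
./imatrix -m deepseek-v2-Q2_K.gguf -f groups_merged.txt -o imatrix.dat --chunks 62

# Use it when producing iMatrix quants such as IQ3_XS.
./quantize --imatrix imatrix.dat deepseek-v2-f16.gguf deepseek-v2-IQ3_XS.gguf IQ3_XS
```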