readme: daily update
README.md (changed)
Note: Use iMatrix quants only if you can fully offload to GPU, otherwise speed will suffer.

| Quant   | Status    | Size      | Description                       | KV Metadata | Weighted | Notes                             |
|---------|-----------|-----------|-----------------------------------|-------------|----------|-----------------------------------|
| BF16    | Available | 439 GB    | Lossless :)                       | Old         | No       | Q8_0 is sufficient for most cases |
| Q8_0    | Available | 233.27 GB | High quality *recommended*        | Updated     | Yes      |                                   |
| Q5_K_M  | Uploading | 155 GB    | Medium-low quality                | Updated     | Yes      |                                   |
| Q4_K_M  | Available | 132 GB    | Medium quality *recommended*      | Old         | No       |                                   |
| Q3_K_M  | Available | 104 GB    | Medium-low quality                | Updated     | Yes      |                                   |
| IQ3_XS  | Available | 89.6 GB   | Better than Q3_K_M                | Old         | Yes      |                                   |
| Q2_K    | Available | 80.0 GB   | Low quality **not recommended**   | Old         | No       |                                   |
| IQ2_XXS | Available | 61.5 GB   | Lower quality **not recommended** | Old         | Yes      |                                   |
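As a rough sanity check on the table, effective bits per weight can be estimated by scaling each file size against the 16-bit BF16 size; the figures below are approximations derived only from the listed sizes, not exact quant specs.

```shell
# Approximate bits per weight: quant size / BF16 size (439 GB) * 16 bits.
for q in "Q8_0 233.27" "Q4_K_M 132" "Q2_K 80"; do
  set -- $q
  awk -v name="$1" -v size="$2" \
    'BEGIN { printf "%-7s ~%.1f bits/weight\n", name, size / 439 * 16 }'
done
```

Q8_0 works out to roughly 8.5 bits per weight and Q2_K to roughly 2.9, consistent with their names.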

| Planned Quant | Notes |
|---------------|-------|
| Q5_K_S        |       |
| Q4_K_S        |       |
| Q3_K_S        |       |
| Q6_K          |       |
| IQ4_XS        |       |
| IQ2_XS        |       |

```
deepseek2.leading_dense_block_count=int:1
deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
```

Quants with "Updated" metadata contain these parameters, so as long as you're running a supported build of llama.cpp, no `--override-kv` parameters are required.

A precompiled Windows AVX2 build is available at `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
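For quants still marked "Old", the missing keys can be supplied at load time with llama.cpp's `--override-kv` flag, which takes the same `key=type:value` form used in this README. A minimal sketch; the model path and binary name here are placeholders:

```shell
# Placeholder model path; the binary may differ by llama.cpp build (e.g. ./main).
./main -m deepseek-v2-Q4_K_M.gguf \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -p "Hello"
```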

# License:
- DeepSeek license for model weights, which can be found in the `LICENSE` file in the root of this repo

*~1.5 t/s* with a Ryzen 7 3700X (96 GB @ 3200 MHz) `[Q2_K]`

# iMatrix:

Find `imatrix.dat` in the root of this repo, made with a `Q2_K` quant containing 62 chunks (see here for info: [https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693](https://github.com/ggerganov/llama.cpp/issues/5153#issuecomment-1913185693))

Using `groups_merged.txt`, find it here: [https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)
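For reference, an importance matrix like this is produced with llama.cpp's `imatrix` tool and then passed to `quantize` when making the weighted ("Yes" in the table) quants. A sketch of the workflow; the model file names are placeholders, and exact binary names and flags depend on the llama.cpp build:

```shell
# Build the importance matrix from a model + calibration text (placeholder paths).
./imatrix -m deepseek-v2-Q2_K.gguf -f groups_merged.txt -o imatrix.dat --chunks 62

# Use it when producing iMatrix quants such as IQ3_XS.
./quantize --imatrix imatrix.dat deepseek-v2-f16.gguf deepseek-v2-IQ3_XS.gguf IQ3_XS
```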