readme: add detailed instructions
README.md CHANGED

@@ -23,11 +23,57 @@ Using llama.cpp fork: [https://github.com/fairydreaming/llama.cpp/tree/deepseek-

# How to use:

**Downloading the bf16:**

- Find the relevant directory
- Download all files
- Run merge.py (see the sketch after this list)
- The merged GGUF should appear

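For example, using the Hugging Face CLI (the repo id, folder name, and merge.py invocation below are illustrative assumptions, not confirmed by this README):

```
# hypothetical repo id and folder layout; substitute the directory you found
huggingface-cli download {user}/DeepSeek-V2-Chat-GGUF --include "bf16/*" --local-dir .

# merge.py ships alongside the files; its exact arguments are assumed here,
# so check `python merge.py --help` before running
python merge.py
```
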
**Downloading the quantizations:**

- Find the relevant directory
- Download all files
- Point to the first split; most programs should now load all the splits automatically (see the example after this list)

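Sharded quants typically follow llama.cpp's split naming, so loading the first shard is enough; the filename pattern and values below are illustrative:

```
# {quant} and the shard count are placeholders; llama.cpp should pick up the
# remaining -0000N-of-0000M.gguf files on its own
main -m DeepSeek-V2-Chat.{quant}-00001-of-00005.gguf -c 4096 --color -i
```
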
**Running in llama.cpp:**

To start in command-line interactive mode (text completion):

```
main -m DeepSeek-V2-Chat.{quant}.gguf -c {context length} --color -i
```

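Filled in with sample values (Q4_K_M and a 4096-token context are arbitrary choices; substitute your own):

```
main -m DeepSeek-V2-Chat.Q4_K_M.gguf -c 4096 --color -i
```
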
To use the llama.cpp OpenAI-compatible server:

```
server \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -c {context_length} \
  (--color [recommended: colored output in supported terminals]) \
  (-i [note: interactive mode]) \
  (--mlock [note: avoid using swap]) \
  (--verbose) \
  (--log-disable [note: disable logging to file, may be useful for prod]) \
  (--metrics [note: Prometheus-compatible monitoring endpoint]) \
  (--api-key [string]) \
  (--port [int]) \
  (--flash-attn [note: must be fully offloaded to a supported GPU])
```

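A minimal sketch of a launch plus a request against the server's OpenAI-compatible chat endpoint; the quant, context length, and port are illustrative (8080 is the usual default):

```
server -m DeepSeek-V2-Chat.Q4_K_M.gguf -c 4096 --port 8080

# then, from another shell:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
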
Making an importance matrix:

```
imatrix \
  -m DeepSeek-V2-Chat.{quant}.gguf \
  -f groups_merged.txt \
  --verbosity [0, 1, 2] \
  -ngl {GPU offloading; must build with CUDA} \
  --ofreq {recommended: 1}
```

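A concrete run, assuming you compute the matrix on the merged bf16 (any of the GGUFs should work) and accept the default imatrix.dat output file:

```
# -ngl 0 keeps everything on the CPU; raise it if you built with CUDA
imatrix \
  -m DeepSeek-V2-Chat.bf16.gguf \
  -f groups_merged.txt \
  --verbosity 1 \
  -ngl 0 \
  --ofreq 1
```
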
Making a quant:

```
quantize \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.{quant}.gguf \
  {quant} \
  (--imatrix [file])
```

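Filled in for a Q4_K_M target. Note that in llama.cpp's quantize tool the --imatrix flag is normally given before the positional arguments, so double-check against your build:

```
quantize \
  --imatrix imatrix.dat \
  DeepSeek-V2-Chat.bf16.gguf \
  DeepSeek-V2-Chat.Q4_K_M.gguf \
  Q4_K_M
```
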
# Quants:
```
- bf16 [size: 439gb]