Update README.md
README.md (changed)
|
|
base_model:
- Qwen/Qwen3-Coder-480B-A35B-Instruct
base_model_relation: quantized
---

# Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix

Base model: [Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct)

### 【VLLM Launch Command for 8 GPUs (Single Node)】

<i>Note: When launching with 8 GPUs, `--enable-expert-parallel` must be specified; otherwise the expert tensors cannot be evenly split across the tensor-parallel ranks. This option is not required for 4-GPU setups.</i>

```
CONTEXT_LENGTH=32768 # 262144

vllm serve \
    QuantTrio/Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix \
    --served-model-name Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix \
    --enable-expert-parallel \
    --swap-space 16 \
    ...
    --port 8000
```
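Once the server is up, vLLM exposes an OpenAI-compatible API on the configured port. A minimal request sketch, assuming the server above is reachable at `localhost:8000` (the endpoint path and payload shape follow the OpenAI chat-completions API that vLLM implements; the prompt is illustrative):

```python
import json
from urllib import request

# Payload for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# "model" must match the --served-model-name from the launch command above.
payload = {
    "model": "Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "max_tokens": 512,
}

def chat(url="http://localhost:8000/v1/chat/completions"):
    # Requires the `vllm serve` process above to be running.
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same endpoint also works with the official `openai` Python client by pointing its `base_url` at the server.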

### 【Dependencies】

```
vllm>=0.9.2
```
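As a sanity check before serving, the pinned minimum can be verified at runtime. A minimal sketch (the helper names are illustrative, and the naive tuple comparison assumes plain `x.y.z` version strings; `packaging.version` is the robust choice):

```python
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(installed: str, minimum: str = "0.9.2") -> bool:
    # Naive numeric x.y.z comparison; enough for plain release versions.
    def as_tuple(v: str):
        return tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(minimum)

def vllm_ok() -> bool:
    # Returns False when vllm is not installed at all.
    try:
        return meets_minimum(version("vllm"))
    except PackageNotFoundError:
        return False
```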

### 【Model Update History】

```
2025-07-24
1. fast commit
```

### 【Model Files】

| File Size | Last Updated |
|-----------|--------------|
| `261GB`   | `2025-07-24` |

### 【Model Download】

```python
from huggingface_hub import snapshot_download
snapshot_download('QuantTrio/Qwen3-Coder-480B-A35B-Instruct-GPTQ-Int4-Int8Mix', cache_dir="your_local_path")
```

### 【Description】

# Qwen3-Coder-480B-A35B-Instruct
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">