We provide a pre-built Docker image containing vLLM 0.8.5 with full support for this model.
https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags

```
# docker hub:
docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm

# china mirror
docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm
```

- Download the model file:
    - Hugging Face: downloaded automatically by vLLM.
    - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4`

- Start the API server:

```
docker run --privileged --user root --net=host --ipc=host \
    --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
    --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
```
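Once the container is up, it serves vLLM's OpenAI-compatible API. As a minimal sketch (host, port, and model name are assumed to match the flags in the command above; adjust them if you changed the `docker run` line), the endpoint can be queried from the Python standard library:

```python
import json
import urllib.request

# Assumed to match the server flags above (--host 0.0.0.0 --port 8000 --model ...).
API_URL = "http://0.0.0.0:8000/v1/chat/completions"
MODEL = "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Compose an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }


def send(payload: dict, url: str = API_URL) -> dict:
    """POST the payload to the running vLLM server and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the Docker container above to be running.
    reply = send(build_chat_request("Hello, who are you?"))
    print(reply["choices"][0]["message"]["content"])
```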

For a model downloaded by ModelScope:

```
docker run --privileged --user root --net=host --ipc=host \
    -v ~/.cache/modelscope:/root/.cache/modelscope \
    --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
    --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust-remote-code
```
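Loading the quantized weights can take a while, so the server is not ready the moment the container starts. A minimal readiness-poll sketch (assuming the default host/port from the commands above; vLLM's API server exposes a `/health` endpoint that returns 200 once it can serve requests):

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 300.0,
                     poll_s: float = 5.0,
                     fetch=urllib.request.urlopen) -> bool:
    """Poll the server's /health endpoint until it answers 200 or we time out.

    `fetch` is injectable so the loop can be exercised without a live server.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with fetch(url) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # model still loading or port not open yet; keep polling
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready("http://0.0.0.0:8000") else "timed out")
```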

### SGLang

Support for INT4 quantization on SGLang is in progress and will be available in a future update.

## Contact Us