We provide a pre-built Docker image containing vLLM 0.8.5 with full support for this model.
https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags

```
# docker hub:
docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm

# china mirror
docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm
```

- Download the model file:
    - Hugging Face: downloaded automatically by vLLM.
    - ModelScope: `modelscope download --model Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4`

- Start the API server:

```
docker run --privileged --user root --net=host --ipc=host \
    --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 \
    --tensor-parallel-size 2 --model tencent/Hunyuan-A13B-Instruct-GPTQ-Int4 --trust-remote-code
```
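Once the container is up, it serves vLLM's OpenAI-compatible API. As a minimal sketch (host, port, and model name are assumed to match the flags in the command above; adjust them if you changed the `docker run` line), the endpoint can be queried from the Python standard library:

```python
import json
import urllib.request

# Assumed to match the server flags above (--host 0.0.0.0 --port 8000 --model ...).
API_URL = "http://0.0.0.0:8000/v1/chat/completions"
MODEL = "tencent/Hunyuan-A13B-Instruct-GPTQ-Int4"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Compose an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }


def send(payload: dict, url: str = API_URL) -> dict:
    """POST the payload to the running vLLM server and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the Docker container above to be running.
    reply = send(build_chat_request("Hello, who are you?"))
    print(reply["choices"][0]["message"]["content"])
```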

For a model downloaded by ModelScope:

```
docker run --privileged --user root --net=host --ipc=host \
    -v ~/.cache/modelscope:/root/.cache/modelscope \
    --gpus=all -it --entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
    -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --tensor-parallel-size 2 --port 8000 \
    --model /root/.cache/modelscope/hub/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct-GPTQ-Int4/ --trust-remote-code
```
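Loading the quantized weights can take a while, so the server is not ready the moment the container starts. A minimal readiness-poll sketch (assuming the default host/port from the commands above; vLLM's API server exposes a `/health` endpoint that returns 200 once it can serve requests):

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 300.0,
                     poll_s: float = 5.0,
                     fetch=urllib.request.urlopen) -> bool:
    """Poll the server's /health endpoint until it answers 200 or we time out.

    `fetch` is injectable so the loop can be exercised without a live server.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with fetch(url) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # model still loading or port not open yet; keep polling
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready("http://0.0.0.0:8000") else "timed out")
```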

### SGLang

Support for INT4 quantization on SGLang is in progress and will be available in a future update.

## Contact Us