Zhihu-ai/Zhi-Create-Qwen3-32B-FP8

Text Generation

text-generation-inference

compressed-tensors

Parkerlambert123 committed on Jul 21

Commit 1e2964e · verified · 1 Parent(s): f2be90d

Update README.md

Files changed (1)

README.md +20 -0

@@ -115,6 +115,26 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```
+### ZhiLight
+You can easily start a service using [ZhiLight](https://github.com/zhihu/ZhiLight)
+```bash
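+# Start the server (runs in the foreground). Note: the Docker flag is --entrypoint, singular.
+docker run -it --net=host --gpus='"device=0"' \
+    -v /path/to/model:/mnt/models \
+    --entrypoint="" \
+    ghcr.io/zhihu/zhilight/zhilight:0.4.21-cu124 \
+    python -m zhilight.server.openai.entrypoints.api_server \
+    --model-path /mnt/models --port 8000 \
+    --enable-reasoning --reasoning-parser deepseek-r1 \
+    --served-model-name Zhi-Create-Qwen3-32B
+
+# From another terminal, send a request. The prompt asks for an essay
+# introducing West Lake vinegar fish, written in the voice of Lu Xun.
+curl http://localhost:8000/v1/completions \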
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "Zhi-Create-Qwen3-32B",
+        "prompt": "请你以鲁迅的口吻，写一篇介绍西湖醋鱼的文章",
+        "max_tokens": 4096,
+        "temperature": 0.6,
+        "top_p": 0.95
+    }'
+```
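
The same request as the curl call above can be sent from Python. This is a minimal sketch, not part of the commit: it uses only the standard library, reuses the URL, model name, and sampling parameters from the curl example, and assumes the server returns an OpenAI-style completions response with the text under `choices[0].text`. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

# Values taken from the curl example; adjust them to match your deployment.
BASE_URL = "http://localhost:8000"
MODEL = "Zhi-Create-Qwen3-32B"


def build_completion_request(prompt, max_tokens=4096, temperature=0.6, top_p=0.95):
    """Build a POST request for the /v1/completions endpoint (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # ensure_ascii=False keeps Chinese prompts readable in the request body.
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=600):
    """Send the request and return the generated text.

    Assumption: the response follows the OpenAI completions shape,
    i.e. the text is at choices[0]["text"].
    """
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Usage, with the server from the docker command running:
#   print(complete("请你以鲁迅的口吻，写一篇介绍西湖醋鱼的文章"))
```

The long default timeout is there because generating up to 4096 tokens from a 32B model can take a while. Lower it if your setup responds quickly.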
 ### vllm
 For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm)