Text Generation
Transformers
Safetensors
Chinese
English
qwen3
conversational
text-generation-inference
compressed-tensors
Parkerlambert123 committed
Commit 1e2964e · verified · 1 Parent(s): f2be90d

Update README.md

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -115,6 +115,26 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)
  ```
 
+
+ ### ZhiLight
+
+ You can easily start a service using [ZhiLight](https://github.com/zhihu/ZhiLight):
+
+ ```bash
+ docker run -it --net=host --gpus='"device=0"' -v /path/to/model:/mnt/models --entrypoint="" ghcr.io/zhihu/zhilight/zhilight:0.4.21-cu124 python -m zhilight.server.openai.entrypoints.api_server --model-path /mnt/models --port 8000 --enable-reasoning --reasoning-parser deepseek-r1 --served-model-name Zhi-Create-Qwen3-32B
+
+ curl http://localhost:8000/v1/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "model": "Zhi-Create-Qwen3-32B",
+     "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
+     "max_tokens": 4096,
+     "temperature": 0.6,
+     "top_p": 0.95
+   }'
+ ```
+
+
  ### vllm
 
  For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm)
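The same request as the `curl` call in the ZhiLight section can be sent from Python. The sketch below is not part of the commit. It assumes the server is listening on `localhost:8000`, that it returns an OpenAI-style completions response (`choices[0].text`), and that the model name matches `--served-model-name`. The helper names `build_completion_request` and `complete` are hypothetical. The sample prompt asks the model, roughly, to write an article introducing West Lake vinegar fish in the voice of Lu Xun.

```python
import json
import urllib.request

# Assumptions: server reachable at this address, OpenAI-style /v1/completions route.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "Zhi-Create-Qwen3-32B"  # must match --served-model-name


def build_completion_request(prompt, max_tokens=4096, temperature=0.6, top_p=0.95):
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=600):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed response shape: {"choices": [{"text": "..."}]}
    return body["choices"][0]["text"]


# Example call (requires the server to be running):
# print(complete("请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"))
```

Only the standard library is used here so the sketch runs without extra dependencies; an OpenAI-compatible client library would likely work as well, though that is untested against ZhiLight.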