feihu.hf committed · Commit 0db3ae2 · Parent(s): 34ba78d
update README
README.md
CHANGED
@@ -48,7 +48,7 @@ The following contains a code snippet illustrating how to use the model to generate
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model_name = "Qwen/Qwen3-32B"
+model_name = "Qwen/Qwen3-32B-AWQ"
 
 # load the tokenizer and the model
 tokenizer = AutoTokenizer.from_pretrained(model_name)
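For context, the snippet this hunk edits continues past what the diff window shows. A minimal sketch of the full generation flow, assuming the standard `transformers` API (the prompt text, `max_new_tokens`, and the dtype/device settings are illustrative, not taken from the diff):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-32B-AWQ"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # let transformers pick the checkpoint dtype
    device_map="auto"     # place the weights on the available GPU(s)
)

# render a chat prompt; enable_thinking toggles Qwen3's reasoning mode
prompt = "Give me a short introduction to large language models."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate, then strip the prompt tokens from the output
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```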
@@ -95,11 +95,11 @@ print("content:", content)
 For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
 - SGLang:
 ```shell
-python -m sglang.launch_server --model-path Qwen/Qwen3-32B --reasoning-parser qwen3
+python -m sglang.launch_server --model-path Qwen/Qwen3-32B-AWQ --reasoning-parser qwen3
 ```
 - vLLM:
 ```shell
-vllm serve Qwen/Qwen3-32B --enable-reasoning --reasoning-parser deepseek_r1
+vllm serve Qwen/Qwen3-32B-AWQ --enable-reasoning --reasoning-parser deepseek_r1
 ```
 
 Also check out our [AWQ documentation](https://qwen.readthedocs.io/en/latest/quantization/awq.html) for more usage guidance.
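Either command exposes an OpenAI-compatible endpoint, so any OpenAI client can talk to it. A minimal sketch using the official `openai` Python package (the base URL and the `EMPTY` api key are assumptions about a default local launch; SGLang listens on port 30000 by default, vLLM on 8000):

```python
from openai import OpenAI

# point the client at the local OpenAI-compatible server;
# adjust base_url to match how the server was launched
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-AWQ",
    messages=[
        {"role": "user", "content": "Give me a short introduction to large language models."}
    ],
)
print(response.choices[0].message.content)
```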
@@ -157,7 +157,7 @@ Here is an example of a multi-turn conversation:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
 class QwenChatbot:
-    def __init__(self, model_name="Qwen/Qwen3-32B"):
+    def __init__(self, model_name="Qwen/Qwen3-32B-AWQ"):
         self.tokenizer = AutoTokenizer.from_pretrained(model_name)
         self.model = AutoModelForCausalLM.from_pretrained(model_name)
         self.history = []
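The hunk cuts the class off after `__init__`. A self-contained sketch of how such a chatbot class could carry the multi-turn conversation (the method name, `max_new_tokens`, and generation settings are assumptions, not taken from the diff):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-32B-AWQ"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        # build the prompt from the accumulated history plus the new turn
        messages = self.history + [{"role": "user", "content": user_input}]
        text = self.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = self.tokenizer(text, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=1024)
        response = self.tokenizer.decode(
            output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
        )
        # record both turns so later calls see the whole conversation
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})
        return response

# usage
bot = QwenChatbot()
print(bot.generate_response("Hello, who are you?"))
```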
@@ -220,7 +220,7 @@ from qwen_agent.agents import Assistant
 
 # Define LLM
 llm_cfg = {
-    'model': 'Qwen3-32B',
+    'model': 'Qwen3-32B-AWQ',
 
     # Use the endpoint provided by Alibaba Model Studio:
     # 'model_type': 'qwen_dashscope',
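The hunk shows only the head of the Qwen-Agent config. A sketch of how it might be completed against a local OpenAI-compatible server (the `model_server`/`api_key` keys follow Qwen-Agent's config shape; the URL and the bare `Assistant` usage are assumptions about a local deployment):

```python
from qwen_agent.agents import Assistant

# Define LLM: point qwen-agent at a local OpenAI-compatible server,
# e.g. one started with the vLLM/SGLang commands above (assumed setup)
llm_cfg = {
    'model': 'Qwen3-32B-AWQ',
    'model_server': 'http://localhost:8000/v1',  # api_base of the local server
    'api_key': 'EMPTY',
}

bot = Assistant(llm=llm_cfg)
messages = [{'role': 'user', 'content': 'Introduce yourself in one sentence.'}]
for responses in bot.run(messages=messages):
    pass  # bot.run streams partial results; keep the last chunk
print(responses)
```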