nm-research committed on
Commit c572e99 · verified · 1 Parent(s): 86fbc64

Update README.md

Files changed (1)
  1. README.md +127 -108
README.md CHANGED
@@ -27,7 +27,7 @@ base_model: Qwen/Qwen3-32B
27
  - **Activation quantization:** FP16
28
  - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
29
  - **Release Date:** 6/25/2025
30
- - **Version:** 1.0
31
  - **Model Developers:** RedHatAI
32
 
33
  This model is a quantized version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B).
@@ -35,7 +35,7 @@ It was evaluated on several tasks to assess its quality in comparison to t
35
 
36
  ### Model Optimizations
37
 
38
- This model was obtained by quantizing the weights of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) to FP4 data type, ready for inference with vLLM>=0.9.1
39
  This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
40
 
41
  Only the weights of the linear operators within transformer blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
@@ -53,7 +53,7 @@ from transformers import AutoTokenizer
53
  model_id = "RedHatAI/Qwen3-32B-NVFP4A16"
54
  number_gpus = 2
55
 
56
- sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
57
 
58
  tokenizer = AutoTokenizer.from_pretrained(model_id)
59
 
@@ -161,8 +161,7 @@ tokenizer.save_pretrained(SAVE_DIR)
161
 
162
  This model was evaluated on the well-known OpenLLM v1, OpenLLM v2, HumanEval, and HumanEval_64 benchmarks. All evaluations were conducted using [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness).
163
 
164
- ### Accuracy
165
-
166
  <table>
167
  <thead>
168
  <tr>
@@ -176,115 +175,135 @@ This model was evaluated on the well-known OpenLLM v1, OpenLLM v2, HumanEval, an
176
  <tbody>
177
  <tr>
178
  <td rowspan="7"><b>OpenLLM V1</b></td>
179
- <td>mmlu</td>
180
  <td></td>
181
  <td></td>
182
  <td></td>
183
  </tr>
184
- <tr>
185
- <td>MMLU</td>
186
- <td></td>
187
- <td></td>
188
- <td></td>
189
- </tr>
190
- <tr>
191
- <td>ARC Challenge (0-shot)</td>
192
- <td></td>
193
- <td></td>
194
- <td></td>
195
- </tr>
196
- <tr>
197
- <td>GSM8K (8-shot, strict-match)</td>
198
- <td></td>
199
- <td></td>
200
- <td></td>
201
- </tr>
202
- <tr>
203
- <td>Hellaswag (10-shot)</td>
204
- <td></td>
205
- <td></td>
206
- <td></td>
207
- </tr>
208
- <tr>
209
- <td>Winogrande (5-shot)</td>
210
- <td></td>
211
- <td></td>
212
- <td></td>
213
- </tr>
214
- <tr>
215
- <td>TruthfulQA (0-shot, mc2)</td>
216
- <td></td>
217
- <td></td>
218
- <td></td>
219
- </tr>
220
- <tr>
221
- <td><b>Average</b></td>
222
- <td><b></b></td>
223
- <td><b></b></td>
224
- <td><b>%</b></td>
225
- </tr>
226
- <tr>
227
- <td rowspan="7"><b>OpenLLM V2</b></td>
228
- <td>MMLU-Pro (5-shot)</td>
229
- <td></td>
230
- <td></td>
231
- <td></td>
232
- </tr>
233
- <tr>
234
- <td>IFEval (0-shot)</td>
235
- <td></td>
236
- <td></td>
237
- <td></td>
238
- </tr>
239
- <tr>
240
- <td>BBH (3-shot)</td>
241
- <td></td>
242
- <td></td>
243
- <td></td>
244
- </tr>
245
- <tr>
246
- <td>Math-|v|-5 (4-shot)</td>
247
- <td></td>
248
- <td></td>
249
- <td></td>
250
- </tr>
251
- <tr>
252
- <td>GPQA (0-shot)</td>
253
- <td></td>
254
- <td></td>
255
- <td></td>
256
- </tr>
257
- <tr>
258
- <td>MuSR (0-shot)</td>
259
- <td></td>
260
- <td></td>
261
- <td></td>
262
- </tr>
263
- <tr>
264
- <td><b>Average</b></td>
265
- <td><b></b></td>
266
- <td><b></b></td>
267
- <td><b>%</b></td>
268
- </tr>
269
-
270
- <tr>
271
- <td><b>Coding</b></td>
272
- <td>HumanEval pass@1</td>
273
- <td></td>
274
- <td></td>
275
- <td></td>
276
- </tr>
277
- <tr>
278
- <td></td>
279
- <td>HumanEval_64 pass@2</td>
280
- <td></td>
281
- <td></td>
282
- <td></td>
283
- </tr>
284
- </tbody>
285
  </table>
286
 
287
-
288
  ### Reproduction
289
 
290
  The results were obtained using the following commands:
 
27
  - **Activation quantization:** FP16
28
  - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
29
  - **Release Date:** 6/25/2025
30
+ - **Version:** 1.0
31
  - **Model Developers:** RedHatAI
32
 
33
  This model is a quantized version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B).
 
35
 
36
  ### Model Optimizations
37
 
38
+ This model was obtained by quantizing the weights of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) to FP4 data type, ready for inference with vLLM>=0.9.1.
39
  This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
40
 
41
  Only the weights of the linear operators within transformer blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
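
The card's Creation section holds the authoritative recipe and sits largely outside the hunks shown in this diff. Purely as an unofficial sketch of a weight-only FP4 flow with llm-compressor, under the assumption that an `NVFP4A16` scheme string and an `ignore=["lm_head"]` choice apply here:

```python
# Unofficial sketch of a weight-only FP4 quantization flow with llm-compressor.
# The scheme string "NVFP4A16", the ignore list, and SAVE_DIR are assumptions;
# the card's Creation section holds the authoritative recipe.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-32B"
SAVE_DIR = "Qwen3-32B-NVFP4A16"  # hypothetical output directory

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize only the Linear layers inside the transformer blocks; keep lm_head in 16-bit.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4A16", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```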
 
53
  model_id = "RedHatAI/Qwen3-32B-NVFP4A16"
54
  number_gpus = 2
55
 
56
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
57
 
58
  tokenizer = AutoTokenizer.from_pretrained(model_id)
59
 
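
The hunk above exposes only fragments of the card's vLLM deployment example (the sampling parameters, model id, GPU count, and tokenizer). A minimal sketch of how those pieces typically fit together with vLLM's offline API follows; the prompt text and the chat-template call are illustrative rather than copied from the card.

```python
# Minimal offline-inference sketch built around the fragments shown in this hunk.
# The prompt content is illustrative; the card's full example may differ in detail.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "RedHatAI/Qwen3-32B-NVFP4A16"
number_gpus = 2

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a chat-formatted prompt and generate with tensor parallelism across the GPUs.
messages = [{"role": "user", "content": "Give a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```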
 
161
 
162
  This model was evaluated on the well-known OpenLLM v1, OpenLLM v2, HumanEval, and HumanEval_64 benchmarks. All evaluations were conducted using [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness).
163
 
164
+ <h3>Accuracy</h3>
 
165
  <table>
166
  <thead>
167
  <tr>
 
175
  <tbody>
176
  <tr>
177
  <td rowspan="7"><b>OpenLLM V1</b></td>
178
+ <td>MMLU</td>
179
+ <td>80.94</td>
180
+ <td>80.57</td>
181
+ <td>99.55%</td>
182
+ </tr>
183
+ <tr>
184
+ <td>ARC Challenge (0-shot)</td>
185
+ <td>68.34</td>
186
+ <td>68.43</td>
187
+ <td>100.12%</td>
188
+ </tr>
189
+ <tr>
190
+ <td>GSM8K (8-shot, strict-match)</td>
191
+ <td>87.34</td>
192
+ <td>87.72</td>
193
+ <td>100.43%</td>
194
+ </tr>
195
+ <tr>
196
+ <td>Hellaswag (10-shot)</td>
197
+ <td>71.16</td>
198
+ <td>70.48</td>
199
+ <td>99.05%</td>
200
+ </tr>
201
+ <tr>
202
+ <td>Winogrande (5-shot)</td>
203
+ <td>69.93</td>
204
+ <td>70.09</td>
205
+ <td>100.23%</td>
206
+ </tr>
207
+ <tr>
208
+ <td>TruthfulQA (0-shot, mc2)</td>
209
+ <td>58.63</td>
210
+ <td>58.96</td>
211
+ <td>100.56%</td>
212
+ </tr>
213
+ <tr>
214
+ <td><b>Average</b></td>
215
+ <td><b>72.72</b></td>
216
+ <td><b>72.71</b></td>
217
+ <td><b>99.98%</b></td>
218
+ </tr>
219
+ <tr>
220
+ <td rowspan="7"><b>OpenLLM V2</b></td>
221
+ <td>MMLU-Pro (5-shot)</td>
222
+ <td>54.48</td>
223
+ <td>51.61</td>
224
+ <td>94.73%</td>
225
+ </tr>
226
+ <tr>
227
+ <td>IFEval (0-shot)</td>
228
+ <td>88.85</td>
229
+ <td>88.49</td>
230
+ <td>99.59%</td>
231
+ </tr>
232
+ <tr>
233
+ <td>BBH (3-shot)</td>
234
+ <td>62.61</td>
235
+ <td>62.14</td>
236
+ <td>99.25%</td>
237
+ </tr>
238
+ <tr>
239
+ <td>Math-|v|-5 (4-shot)</td>
240
+ <td>56.87</td>
241
+ <td>56.27</td>
242
+ <td>98.94%</td>
243
+ </tr>
244
+ <tr>
245
+ <td>GPQA (0-shot)</td>
246
+ <td>30.45</td>
247
+ <td>30.29</td>
248
+ <td>99.47%</td>
249
+ </tr>
250
+ <tr>
251
+ <td>MuSR (0-shot)</td>
252
+ <td>39.15</td>
253
+ <td>40.48</td>
254
+ <td>103.40%</td>
255
+ </tr>
256
+ <tr>
257
+ <td><b>Average</b></td>
258
+ <td><b>55.40</b></td>
259
+ <td><b>54.88</b></td>
260
+ <td><b>99.06%</b></td>
261
+ </tr>
262
+ <tr>
263
+ <td><b>Coding</b></td>
264
+ <td>HumanEval Instruct pass@1</td>
265
+ <td>88.41</td>
266
+ <td>87.20</td>
267
+ <td>98.63%</td>
268
+ </tr>
269
+ <tr>
270
  <td></td>
271
+ <td>HumanEval 64 Instruct pass@2</td>
272
+ <td>90.27</td>
273
+ <td>89.66</td>
274
+ <td>99.32%</td>
275
+ </tr>
276
+ <tr>
277
+ <td></td>
278
+ <td>HumanEval 64 Instruct pass@8</td>
279
+ <td>92.20</td>
280
+ <td>92.13</td>
281
+ <td>99.92%</td>
282
+ </tr>
283
+ <tr>
284
+ <td></td>
285
+ <td>HumanEval 64 Instruct pass@16</td>
286
+ <td>92.96</td>
287
+ <td>93.27</td>
288
+ <td>100.33%</td>
289
+ </tr>
290
+ <tr>
291
  <td></td>
292
+ <td>HumanEval 64 Instruct pass@32</td>
293
+ <td>93.58</td>
294
+ <td>94.47</td>
295
+ <td>100.95%</td>
296
+ </tr>
297
+ <tr>
298
  <td></td>
299
+ <td>HumanEval 64 Instruct pass@64</td>
300
+ <td>93.90</td>
301
+ <td>95.73</td>
302
+ <td>101.95%</td>
303
  </tr>
304
+ </tbody>
305
  </table>
306
 
 
307
  ### Reproduction
308
 
309
  The results were obtained using the following commands:
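
(The commands themselves sit outside the hunks captured in this diff.) Purely as an illustration of the kind of lm-evaluation-harness run involved, and not the card's actual commands, a sketch using the harness's Python API could look like the following; the task name and every parameter value below are assumptions chosen to mirror one row of the table (GSM8K, 8-shot) with the vLLM backend.

```python
# Illustrative only: not the card's reproduction commands. Mirrors the GSM8K
# (8-shot) row of the accuracy table using lm-evaluation-harness's Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=RedHatAI/Qwen3-32B-NVFP4A16,tensor_parallel_size=2,dtype=auto",
    tasks=["gsm8k"],
    num_fewshot=8,
)
print(results["results"]["gsm8k"])
```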