XuebinWang committed on
Commit b614283 · verified · 1 Parent(s): e59de3c

update model with several fixings (#5)


- remove outdated files and update accuracy numbers (ddd719bed59f5138a919973e2c2d1d5f7dc5f555)

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -12,7 +12,7 @@ base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 
 ## Quantization Strategy
 
-- ***Quantized Layers***: All linear layers excluding "lm_head", "*.gate"
+- ***Quantized Layers***: All linear layers excluding `lm_head`, `*.gate`
 - ***Weight***: Auto Mixed Precision quantized by Quark; each weight is assigned one of the following candidate schemes:
   - FP8 symmetric per-tensor
   - OCP Microscaling (MX) FP4
@@ -39,9 +39,9 @@ The quantization evaluation results are conducted in pseudo-quantization mode, w
 | Quant scheme | arc challenge (↑)<br>(acc) | | gsm8k (↑)<br>(strict-match) | | mmlu (↑)<br>(acc) | | wikitext (↓)<br>(word_perplexity) | | winogrande (↑)<br>(acc) | |
 |--------------|-----------------------------|-------|-----------------------------|-------|-------------------|-------|-----------------------------------|-------|--------------------------|-------|
 | | absolute value | recovery rate | absolute value | recovery rate | absolute value | recovery rate | absolute value | recovery rate | absolute value | recovery rate |
-| **FP16** | 0.6271 | 100.0% | 0.6384 | 100.0% | 0.6893 | 100.0% | 5.7259 | 100.0% | 0.7664 | 100.0% |
+| **BF16** | 0.6271 | 100.0% | 0.6384 | 100.0% | 0.6893 | 100.0% | 5.7259 | 100.0% | 0.7664 | 100.0% |
 | **FP8** | 0.6220 | 99.2% | 0.6384 | 100.0% | 0.6816 | 98.9% | 5.8092 | 98.6% | 0.7830 | 102.2% |
-| <span style="background-color:#5afc65; font-weight:bold;">AMP</span> | 0.6237 | 99.5% | 0.6133 | 96.1% | 0.6765 | 98.1% | 6.2020 | 92.3% | 0.7569 | 98.8% |
+| <span style="background-color:#5afc65; font-weight:bold;">AMP</span> | 0.6195 | 98.8% | 0.6277 | 98.3% | 0.6806 | 98.7% | 6.2026 | 92.3% | 0.7719 | 100.7% |
 | **MXFP4** | 0.5922 | 94.4% | 0.4845 | 75.9% | 0.6364 | 92.3% | 7.0935 | 80.7% | 0.7474 | 97.5% |
 
 #### License
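The README text being diffed evaluates in "pseudo-quantization mode", i.e. weights are quantized and immediately dequantized so the rest of the model runs in full precision. A minimal sketch of the FP8-symmetric-per-tensor branch of that round trip is below; the function name and the simplified E4M3 rounding (no subnormal or NaN handling) are illustrative assumptions, not Quark's actual implementation:

```python
import math

def fp8_e4m3_pseudo_quant(weights):
    """Quantize-dequantize a weight list with a symmetric per-tensor
    scale onto an approximate FP8 E4M3 grid (simplified: ignores
    subnormals and NaN encodings)."""
    max_e4m3 = 448.0  # largest finite E4M3 magnitude
    # One shared scale maps the largest |w| onto the FP8 max value.
    scale = max(abs(w) for w in weights) / max_e4m3
    out = []
    for w in weights:
        x = w / scale
        mant, exp = math.frexp(x)          # x = mant * 2**exp, |mant| in [0.5, 1)
        mant = round(mant * 16.0) / 16.0   # keep 3 mantissa bits (+ implicit bit)
        x = max(-max_e4m3, min(max_e4m3, math.ldexp(mant, exp)))
        out.append(x * scale)              # dequantize back to full precision
    return out
```

Comparing `out` against the input weights gives the per-tensor rounding error that pseudo-quantization injects; the benchmark deltas in the table above are the downstream effect of exactly this kind of perturbation.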