Update README.md

README.md (changed)

@@ -45,4 +45,28 @@ This repo only contains the AttnGates' weights for Qwen2.5-7B-Instruct Model.
 | repobench-p | 65.34 / 65.58 | 61.06 / 62.66 | 57.17 / 57.07 |
 | multifieldqa_en | 57.50 / 56.02 | 46.61 / 46.33 | 50.16 / 49.34 |
 | averaged score | 53.72 / 53.94 | 50.52 / 50.78 | 48.21 / 48.73 |
-| averaged density | 0.842 | 0.624 | 0.379 |
+| averaged density | 0.842 | 0.624 | 0.379 |
+
+## LongBenchV2 CoT Benchmark
+
+All the SeerAttention models run with threshold=5e-4.
+
+For the R1-Distilled models, we remove the two-pass generation setup (think + summary) and directly ask the models to output the answer after thinking. The maximum generation length is set to 10240 tokens (see the sketch after the table below).
+
+| Model | Overall | Easy | Hard | Short | Medium | Long |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|
+| [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 30.4 | 31.2 | 29.9 | 37.8 | 24.7 | 29.6 |
+| [SeerAttention-Llama-3.1-8B](https://huggingface.co/SeerAttention/SeerAttention-Llama-3.1-8B-AttnGates) | 31.6 | 33.3 | 30.5 | 33.9 | 31.6 | 27.8 |
+| [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | 34.8 | 37.5 | 33.1 | 44.4 | 32.1 | 24.1 |
+| [SeerAttention-Qwen2.5-14B](https://huggingface.co/SeerAttention/SeerAttention-Qwen2.5-14B-AttnGates) | 32.8 | 38.0 | 29.6 | 45.0 | 30.2 | 17.6 |
+| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 36.4 | 42.2 | 32.8 | 47.8 | 29.8 | 30.6 |
+| [SeerAttention-Qwen2.5-32B](https://huggingface.co/SeerAttention/SeerAttention-Qwen2.5-32B-AttnGates) | 36.4 | 41.1 | 33.4 | 49.4 | 29.8 | 27.8 |
+| [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | 34.2 | 43.2 | 28.6 | 45.0 | 27.9 | 28.7 |
+| [SeerAttention-DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-14B-AttnGates) | 31.6 | 35.9 | 28.9 | 41.7 | 26.0 | 25.9 |
+| [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | 37.2 | 42.7 | 33.8 | 47.2 | 35.8 | 23.1 |
+| [SeerAttention-DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/SeerAttention/SeerAttention-DeepSeek-R1-Distill-Qwen-32B-AttnGates) | 37.0 | 42.2 | 33.8 | 49.4 | 31.6 | 26.9 |
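
For reference, the following is a minimal sketch of the single-pass evaluation setup described above (the model thinks and then emits its answer directly, with generation capped at 10240 tokens). It uses only the standard Hugging Face `transformers` API; the model name is one of the baselines from the table, the prompt is a placeholder, and the AttnGates threshold (5e-4) is assumed to be configured through the SeerAttention codebase itself, which is not shown here.

```python
# Minimal sketch of the single-pass LongBenchV2 CoT setup described above.
# Assumptions: plain Hugging Face transformers generation; the SeerAttention
# AttnGates threshold (5e-4) is set via the SeerAttention codebase and is not
# shown here; the prompt below is a placeholder, not the benchmark template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"  # baseline from the table
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# A LongBenchV2 question with its long context would go here.
prompt = "Context: ...\n\nQuestion: ...\n\nThink step by step, then output the answer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Single pass: no separate think + summary stages; the answer follows the
# thinking trace directly, with generation capped at 10240 new tokens.
outputs = model.generate(**inputs, max_new_tokens=10240, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```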