Update README.md
README.md CHANGED
@@ -64,31 +64,7 @@ print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")

## Training Details

-This model was trained using the [Weaver distillation pipeline](https://github.com/
-
-## Evaluation
-
-Evaluate this model on different datasets:
-
-```bash
-# MATH500
-python evaluate_crossencoder.py \
-    --model_name "Alibaba-NLP/gte-Qwen2-1.5B-instruct" \
-    --checkpoint_path "hazyresearch/Weaver_Distilled_All_Datasets_gte-Qwen2-1.5B-instruct" \
-    --dataset_path "hazyresearch/MATH500_with_Llama_3.1_70B_Instruct_v1" \
-    --dataset_split "data" \
-    --max_length 4096 \
-    --batch_size 64
-
-# GPQA
-python evaluate_crossencoder.py \
-    --model_name "Alibaba-NLP/gte-Qwen2-1.5B-instruct" \
-    --checkpoint_path "hazyresearch/Weaver_Distilled_All_Datasets_gte-Qwen2-1.5B-instruct" \
-    --dataset_path "hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1" \
-    --dataset_split "data" \
-    --max_length 4096 \
-    --batch_size 64
-```
+This model was trained using the [Weaver distillation pipeline](https://github.com/HazyResearch/scaling-verification) on a combined dataset spanning multiple reasoning domains. For training your own distilled models, see the [distillation README](https://github.com/HazyResearch/scaling-verification/blob/main/distillation/README.md).

## Citation
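The prediction snippet referenced in the hunk header (`score > 0.5`) suggests the verifier returns a per-pair correctness score. Below is a minimal, hypothetical sketch of scoring one (problem, candidate solution) pair with the distilled checkpoint named in the evaluation commands above; it assumes the checkpoint loads as a single-logit sequence-classification model via `transformers`, and the usage code in the model card itself is authoritative.

```python
# Hypothetical sketch: loading the distilled Weaver checkpoint as a single-logit
# sequence-classification head is an assumption, not confirmed by this commit.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "hazyresearch/Weaver_Distilled_All_Datasets_gte-Qwen2-1.5B-instruct"
base_model = "Alibaba-NLP/gte-Qwen2-1.5B-instruct"  # tokenizer source, mirroring --model_name above

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=1, trust_remote_code=True
)
model.eval()

problem = "What is 7 * 8?"
solution = "7 * 8 = 56, so the answer is 56."

# Encode the (problem, candidate solution) pair and squash the single logit
# through a sigmoid to get a correctness score in [0, 1].
inputs = tokenizer(problem, solution, truncation=True, max_length=4096,
                   return_tensors="pt")
with torch.no_grad():
    score = torch.sigmoid(model(**inputs).logits.squeeze()).item()

print(f"Prediction: {'Correct' if score > 0.5 else 'Incorrect'}")
```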