Update README.md

README.md CHANGED

@@ -260,4 +260,76 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
 ## Model Card Contact
 
-[More Information Needed]
+[More Information Needed]
+
+
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+
+axolotl version: [`a98526ef7843a3e8aa006f260e6b4fb8912b5f1a`](https://github.com/axolotl-ai-cloud/axolotl/tree/a98526ef7843a3e8aa006f260e6b4fb8912b5f1a)
+```yaml
+base_model: mistralai/Mistral-Small-24B-Instruct-2501
+
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+liger_rope: true
+liger_rms_norm: true
+liger_swiglu: true
+liger_fused_linear_cross_entropy: true
+
+datasets:
+  - path: yentinglin/s1K-1.1-trl-format
+    type: chat_template
+    chat_template: tokenizer_default
+    field_messages: messages
+    message_field_role: role
+    message_field_content: content
+  - path: open-r1/OpenR1-Math-220k
+    type: chat_template
+    chat_template: tokenizer_default
+    field_messages: messages
+    message_field_role: from
+    message_field_content: value
+dataset_prepared_path:
+val_set_size: 0.0
+output_dir: ./placeholder/
+
+sequence_len: 32768
+sample_packing: true
+eval_sample_packing: False
+pad_to_sequence_len: true
+
+wandb_project: Reasoning
+wandb_entity:
+wandb_watch:
+wandb_name: Mistral-24B-SFT-220k
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 1
+num_epochs: 5
+optimizer: adamw_torch_fused
+lr_scheduler: cosine
+learning_rate: 2e-5
+
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+tf32: false
+
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+logging_steps: 1
+flash_attention: true
+
+warmup_ratio: 0.1
+saves_per_epoch: 2
+weight_decay: 0.0
+deepspeed: deepspeed_configs/zero3_bf16.json
+special_tokens:
+  pad_token: "<pad>"
+```
+
+</details><br>
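
One detail of the config worth spelling out: the two `datasets` entries point `message_field_role` / `message_field_content` at different keys because, per the config, `yentinglin/s1K-1.1-trl-format` stores each chat turn under `role`/`content` while `open-r1/OpenR1-Math-220k` stores it under `from`/`value`. The sketch below is illustrative only; the `normalize_message` helper and the sample turns are hypothetical stand-ins, not axolotl internals, which perform the equivalent remapping based on these fields before applying `chat_template: tokenizer_default`.

```python
# Minimal sketch: why the two dataset entries use different field mappings.
# The helper and the sample turns are hypothetical; axolotl does the
# equivalent remapping internally from message_field_role/content.

def normalize_message(msg: dict, role_field: str, content_field: str) -> dict:
    """Remap one raw chat turn onto the role/content keys a chat template expects."""
    return {"role": msg[role_field], "content": msg[content_field]}

# Turn as stored in yentinglin/s1K-1.1-trl-format (role/content keys):
s1k_turn = {"role": "user", "content": "Prove the inequality..."}
# Turn as stored in open-r1/OpenR1-Math-220k (from/value keys, per the config):
openr1_turn = {"from": "user", "value": "Solve the equation..."}

print(normalize_message(s1k_turn, "role", "content"))
print(normalize_message(openr1_turn, "from", "value"))
# Both print a {"role": ..., "content": ...} dict, the shape the tokenizer's
# chat template consumes when building training samples.
```

For reference, a config like this is typically run with the axolotl CLI pinned to the commit above, along the lines of `accelerate launch -m axolotl.cli.train config.yaml` (the exact invocation depends on the axolotl version and launcher setup). With `gradient_accumulation_steps: 4` and `micro_batch_size: 1`, each optimizer step sees 4 packed sequences per GPU, multiplied by the number of GPUs under the DeepSpeed ZeRO-3 config referenced in `deepspeed`.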