---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e7_rate_03_beta_DPO
  results: []
---

# v1_1000_STEPS_1e7_rate_03_beta_DPO

This model is a DPO fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an undocumented preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6480
- Rewards/chosen: -0.1617
- Rewards/rejected: -0.2816
- Rewards/accuracies: 0.5912
- Rewards/margins: 0.1199
- Logps/rejected: -17.8183
- Logps/chosen: -15.7920
- Logits/rejected: -3.3428
- Logits/chosen: -3.3429
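
For context on the metric names: in TRL's DPO logging, `Rewards/chosen` and `Rewards/rejected` are the implicit rewards β·log(π_θ/π_ref) averaged over the chosen and rejected completions, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of preference pairs whose chosen reward exceeds the rejected one. These all derive from the standard DPO objective:

```latex
% DPO objective (Rafailov et al., 2023). \sigma is the logistic function,
% y_w / y_l are the chosen / rejected completions; the run name suggests
% beta = 0.3 for this training run.
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```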

## Model description

This is a Direct Preference Optimization (DPO) fine-tune of Mistral-7B-Instruct-v0.1. Per the run name, it was trained for 1,000 steps at a learning rate of 1e-7 with a DPO beta of 0.3; the preference dataset used is not documented.

## Intended uses & limitations

More information needed
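
Pending fuller documentation, a rough inference sketch follows. The repository id below is a placeholder, since the hosting organization is not stated in this card; substitute the actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the real Hub path for this model.
model_id = "<user>/v1_1000_STEPS_1e7_rate_03_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-7B-Instruct models use the [INST] ... [/INST] chat format;
# apply_chat_template builds it from the tokenizer config.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```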

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
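
As a minimal sketch, these settings map onto TRL's `DPOTrainer` roughly as follows. This assumes the TRL API of the era matching the framework versions below (in newer TRL releases, `beta` moves into `DPOConfig`); `model`, `ref_model`, the datasets, and `tokenizer` are assumed to be set up elsewhere, and β = 0.3 is inferred from the run name rather than stated in the card.

```python
from transformers import TrainingArguments
from trl import DPOTrainer

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,   # total batch of 4 with accumulation below
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the defaults, so they
    # need not be set explicitly.
)

trainer = DPOTrainer(
    model=model,                  # mistralai/Mistral-7B-Instruct-v0.1
    ref_model=ref_model,          # frozen reference copy of the base model
    args=training_args,
    beta=0.3,                     # inferred from "03_beta" in the run name
    train_dataset=train_dataset,  # preference pairs (not documented here)
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```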

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6922        | 0.05  | 50   | 0.6921          | -0.0001        | -0.0024          | 0.5033             | 0.0023          | -16.8875       | -15.2533     | -3.3539         | -3.3540       |
| 0.6805        | 0.1   | 100  | 0.6859          | -0.0076        | -0.0233          | 0.5626             | 0.0156          | -16.9571       | -15.2784     | -3.3527         | -3.3527       |
| 0.684         | 0.15  | 150  | 0.6780          | -0.0207        | -0.0549          | 0.5758             | 0.0342          | -17.0624       | -15.3221     | -3.3514         | -3.3515       |
| 0.668         | 0.2   | 200  | 0.6712          | -0.0524        | -0.1041          | 0.5736             | 0.0517          | -17.2267       | -15.4277     | -3.3479         | -3.3480       |
| 0.6602        | 0.24  | 250  | 0.6656          | -0.0787        | -0.1439          | 0.5802             | 0.0651          | -17.3591       | -15.5155     | -3.3454         | -3.3455       |
| 0.6512        | 0.29  | 300  | 0.6625          | -0.1164        | -0.1922          | 0.5780             | 0.0758          | -17.5202       | -15.6409     | -3.3452         | -3.3453       |
| 0.6949        | 0.34  | 350  | 0.6586          | -0.1002        | -0.1858          | 0.5956             | 0.0855          | -17.4988       | -15.5872     | -3.3448         | -3.3449       |
| 0.6836        | 0.39  | 400  | 0.6558          | -0.0983        | -0.1934          | 0.5890             | 0.0952          | -17.5242       | -15.5806     | -3.3452         | -3.3453       |
| 0.5895        | 0.44  | 450  | 0.6530          | -0.1263        | -0.2307          | 0.5846             | 0.1044          | -17.6486       | -15.6741     | -3.3440         | -3.3441       |
| 0.6855        | 0.49  | 500  | 0.6504          | -0.1226        | -0.2329          | 0.5890             | 0.1103          | -17.6558       | -15.6618     | -3.3435         | -3.3436       |
| 0.5863        | 0.54  | 550  | 0.6497          | -0.1490        | -0.2631          | 0.5868             | 0.1142          | -17.7566       | -15.7496     | -3.3433         | -3.3434       |
| 0.6496        | 0.59  | 600  | 0.6496          | -0.1503        | -0.2653          | 0.5868             | 0.1150          | -17.7639       | -15.7542     | -3.3431         | -3.3432       |
| 0.6113        | 0.64  | 650  | 0.6478          | -0.1488        | -0.2683          | 0.5934             | 0.1195          | -17.7738       | -15.7490     | -3.3432         | -3.3433       |
| 0.6582        | 0.68  | 700  | 0.6482          | -0.1563        | -0.2757          | 0.5890             | 0.1194          | -17.7985       | -15.7741     | -3.3428         | -3.3429       |
| 0.6477        | 0.73  | 750  | 0.6476          | -0.1590        | -0.2798          | 0.5868             | 0.1208          | -17.8123       | -15.7831     | -3.3428         | -3.3430       |
| 0.6137        | 0.78  | 800  | 0.6477          | -0.1601        | -0.2804          | 0.5912             | 0.1203          | -17.8141       | -15.7867     | -3.3427         | -3.3429       |
| 0.6539        | 0.83  | 850  | 0.6475          | -0.1611        | -0.2818          | 0.5890             | 0.1207          | -17.8188       | -15.7899     | -3.3428         | -3.3429       |
| 0.6508        | 0.88  | 900  | 0.6477          | -0.1607        | -0.2816          | 0.5912             | 0.1209          | -17.8182       | -15.7887     | -3.3428         | -3.3430       |
| 0.6543        | 0.93  | 950  | 0.6482          | -0.1619        | -0.2813          | 0.5934             | 0.1194          | -17.8172       | -15.7927     | -3.3428         | -3.3429       |
| 0.6219        | 0.98  | 1000 | 0.6480          | -0.1617        | -0.2816          | 0.5912             | 0.1199          | -17.8183       | -15.7920     | -3.3428         | -3.3429       |


### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2