minpeter committed on
Commit ea0a2e0 · verified · 1 Parent(s): 43d9695

End of training

Files changed (1)
  1. README.md +32 -55
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 datasets:
 - minpeter/apigen-mt-5k-friendli
 model-index:
-- name: LoRA-Qwen3-4b-v1-iteration-01-sf-apigen-01
+- name: LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02
   results: []
 ---

@@ -18,13 +18,13 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>

-axolotl version: `0.9.2`
+axolotl version: `0.10.0.dev0`
 ```yaml
 base_model: Qwen/Qwen3-4B
-hub_model_id: minpeter/LoRA-Qwen3-4b-v1-iteration-01-sf-apigen-01
+hub_model_id: minpeter/LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02

-load_in_8bit: false
-load_in_4bit: false
+plugins:
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 strict: false

 datasets:
@@ -38,36 +38,32 @@ datasets:
   message_property_mappings:
     role: role
     content: content
-  shards: 3
 chat_template: chatml

 dataset_prepared_path: last_run_prepared

 output_dir: ./output
+val_set_size: 0.0

-adapter: lora
-lora_model_dir:
-
-sequence_len: 8192
-pad_to_sequence_len: true
+sequence_len: 20000
 sample_packing: true
-
-val_set_size: 0.05
 eval_sample_packing: true
-evals_per_epoch: 3
+pad_to_sequence_len: true

-lora_r: 8
-lora_alpha: 16
-lora_dropout: 0.05
-lora_fan_in_fan_out:
+load_in_4bit: true
+adapter: qlora
+lora_r: 16
+lora_alpha: 32
 lora_target_modules:
-  - gate_proj
-  - down_proj
-  - up_proj
   - q_proj
-  - v_proj
   - k_proj
+  - v_proj
   - o_proj
+  - down_proj
+  - up_proj
+lora_mlp_kernel: true
+lora_qkv_kernel: true
+lora_o_kernel: true

 wandb_project: "axolotl"
 wandb_entity: "kasfiekfs-e"
@@ -76,45 +72,35 @@ wandb_name:
 wandb_log_model:

 gradient_accumulation_steps: 2
-micro_batch_size: 2
-num_epochs: 2
-optimizer: adamw_8bit
+micro_batch_size: 1
+num_epochs: 1
+optimizer: adamw_torch_4bit
 lr_scheduler: cosine
 learning_rate: 0.0002

-train_on_inputs: false
-group_by_length: false
 bf16: auto
 tf32: true

-gradient_checkpointing: true
-early_stopping_patience:
+gradient_checkpointing: offload
+gradient_checkpointing_kwargs:
+  use_reentrant: false
 resume_from_checkpoint:
-local_rank:
 logging_steps: 1
-xformers_attention:
 flash_attention: true

-loss_watchdog_threshold: 5.0
-loss_watchdog_patience: 3
-
 warmup_steps: 10
+evals_per_epoch: 4
 saves_per_epoch: 1
-debug:
-deepspeed:
 weight_decay: 0.0
-fsdp:
-fsdp_config:
+special_tokens:

 ```

 </details><br>

-# LoRA-Qwen3-4b-v1-iteration-01-sf-apigen-01
+# LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02

 This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on the minpeter/apigen-mt-5k-friendli dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2285

 ## Model description

@@ -134,27 +120,18 @@ More information needed

 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size: 2
-- eval_batch_size: 2
+- train_batch_size: 1
+- eval_batch_size: 1
 - seed: 42
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 4
-- optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- total_train_batch_size: 2
+- optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs: 2.0
+- num_epochs: 1.0

 ### Training results

-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.4432 | 0.0069 | 1 | 1.0528 |
-| 0.3253 | 0.3322 | 48 | 0.2922 |
-| 0.4198 | 0.6644 | 96 | 0.2638 |
-| 0.4426 | 0.9965 | 144 | 0.2449 |
-| 0.2287 | 1.3253 | 192 | 0.2340 |
-| 0.1526 | 1.6574 | 240 | 0.2299 |
-| 0.268 | 1.9896 | 288 | 0.2285 |


 ### Framework versions