Commit 8008053 (verified) by ahmedaali · 1 Parent(s): 720d8cc

End of training

README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
  base_model: Qwen/Qwen3-4B-Instruct-2507
  tags:
  - llama-factory
+ - lora
  - generated_from_trainer
  model-index:
  - name: obscura-blitz-v0.0.4-qwen-3
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  # obscura-blitz-v0.0.4-qwen-3
 
- This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on an unknown dataset.
+ This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on the obscura_finetune_train dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.0464
 
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 3.0,
+   "eval_loss": 0.04641611501574516,
+   "eval_runtime": 48.5515,
+   "eval_samples_per_second": 4.119,
+   "eval_steps_per_second": 4.119,
+   "total_flos": 8.093314295223091e+16,
+   "train_loss": 0.09125252608899717,
+   "train_runtime": 5067.962,
+   "train_samples_per_second": 1.065,
+   "train_steps_per_second": 0.266
+ }
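The throughput figures above can be cross-checked against each other. The sketch below is only an illustration: the constants are copied from all_results.json and from the `global_step` field of trainer_state.json in this commit, while the function name `derived_stats` is hypothetical. It recomputes steps per second and estimates how many samples each optimizer step consumed.

```python
# Values copied from all_results.json / trainer_state.json in this commit.
TRAIN_RUNTIME = 5067.962            # seconds
TRAIN_SAMPLES_PER_SECOND = 1.065
TRAIN_STEPS_PER_SECOND = 0.266
GLOBAL_STEP = 1350
NUM_EPOCHS = 3


def derived_stats():
    """Return (steps/sec, samples per optimizer step, samples per epoch)."""
    steps_per_second = GLOBAL_STEP / TRAIN_RUNTIME
    # Ratio of the two reported rates: samples consumed per optimizer step.
    samples_per_step = TRAIN_SAMPLES_PER_SECOND / TRAIN_STEPS_PER_SECOND
    # Total samples processed, spread evenly over the epochs.
    samples_per_epoch = TRAIN_SAMPLES_PER_SECOND * TRAIN_RUNTIME / NUM_EPOCHS
    return steps_per_second, samples_per_step, samples_per_epoch


if __name__ == "__main__":
    sps, per_step, per_epoch = derived_stats()
    print(f"steps/sec          ~ {sps:.3f}")
    print(f"samples per step   ~ {per_step:.2f}")
    print(f"samples per epoch  ~ {per_epoch:.0f}")
```

The recomputed steps-per-second rate agrees with the reported 0.266, and the ratio of the two rates is close to 4 samples per step. Since `train_batch_size` is 1, that factor presumably comes from gradient accumulation or from several devices; the commit does not say which.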
eval_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "epoch": 3.0,
+   "eval_loss": 0.04641611501574516,
+   "eval_runtime": 48.5515,
+   "eval_samples_per_second": 4.119,
+   "eval_steps_per_second": 4.119
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 3.0,
+   "total_flos": 8.093314295223091e+16,
+   "train_loss": 0.09125252608899717,
+   "train_runtime": 5067.962,
+   "train_samples_per_second": 1.065,
+   "train_steps_per_second": 0.266
+ }
trainer_state.json ADDED
@@ -0,0 +1,1092 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 3.0,
6
+ "eval_steps": 100,
7
+ "global_step": 1350,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.022234574763757644,
14
+ "grad_norm": 13.383837699890137,
15
+ "learning_rate": 6.666666666666667e-06,
16
+ "loss": 1.1717,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.04446914952751529,
21
+ "grad_norm": 8.375042915344238,
22
+ "learning_rate": 1.4074074074074075e-05,
23
+ "loss": 0.687,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.06670372429127293,
28
+ "grad_norm": 1.584594964981079,
29
+ "learning_rate": 2.148148148148148e-05,
30
+ "loss": 0.3803,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.08893829905503058,
35
+ "grad_norm": 0.9581405520439148,
36
+ "learning_rate": 2.8888888888888888e-05,
37
+ "loss": 0.1693,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.11117287381878821,
42
+ "grad_norm": 0.6701180338859558,
43
+ "learning_rate": 3.62962962962963e-05,
44
+ "loss": 0.3744,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.13340744858254586,
49
+ "grad_norm": 0.6323037147521973,
50
+ "learning_rate": 4.3703703703703705e-05,
51
+ "loss": 0.1978,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.1556420233463035,
56
+ "grad_norm": 0.796017050743103,
57
+ "learning_rate": 5.111111111111111e-05,
58
+ "loss": 0.2044,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.17787659811006115,
63
+ "grad_norm": 0.6810458898544312,
64
+ "learning_rate": 5.851851851851852e-05,
65
+ "loss": 0.1671,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.2001111728738188,
70
+ "grad_norm": 0.7922764420509338,
71
+ "learning_rate": 6.592592592592593e-05,
72
+ "loss": 0.2068,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.22234574763757642,
77
+ "grad_norm": 0.8138841986656189,
78
+ "learning_rate": 7.333333333333333e-05,
79
+ "loss": 0.1194,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.22234574763757642,
84
+ "eval_loss": 0.10435345023870468,
85
+ "eval_runtime": 48.2401,
86
+ "eval_samples_per_second": 4.146,
87
+ "eval_steps_per_second": 4.146,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.24458032240133407,
92
+ "grad_norm": 0.6520943641662598,
93
+ "learning_rate": 8.074074074074075e-05,
94
+ "loss": 0.1525,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.2668148971650917,
99
+ "grad_norm": 1.0254029035568237,
100
+ "learning_rate": 8.814814814814815e-05,
101
+ "loss": 0.1429,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.28904947192884933,
106
+ "grad_norm": 0.40128573775291443,
107
+ "learning_rate": 9.555555555555557e-05,
108
+ "loss": 0.1224,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.311284046692607,
113
+ "grad_norm": 0.544367253780365,
114
+ "learning_rate": 9.999732574196451e-05,
115
+ "loss": 0.122,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.33351862145636463,
120
+ "grad_norm": 0.5095152258872986,
121
+ "learning_rate": 9.996724362426075e-05,
122
+ "loss": 0.1241,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.3557531962201223,
127
+ "grad_norm": 0.6605976819992065,
128
+ "learning_rate": 9.990375674425109e-05,
129
+ "loss": 0.0931,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.3779877709838799,
134
+ "grad_norm": 0.6440847516059875,
135
+ "learning_rate": 9.980690754502393e-05,
136
+ "loss": 0.1106,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.4002223457476376,
141
+ "grad_norm": 0.41629621386528015,
142
+ "learning_rate": 9.96767607734863e-05,
143
+ "loss": 0.0995,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.4224569205113952,
148
+ "grad_norm": 0.4533500075340271,
149
+ "learning_rate": 9.951340343707852e-05,
150
+ "loss": 0.1155,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.44469149527515284,
155
+ "grad_norm": 0.33990752696990967,
156
+ "learning_rate": 9.931694474560686e-05,
157
+ "loss": 0.1023,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.44469149527515284,
162
+ "eval_loss": 0.08741892129182816,
163
+ "eval_runtime": 48.1931,
164
+ "eval_samples_per_second": 4.15,
165
+ "eval_steps_per_second": 4.15,
166
+ "step": 200
167
+ },
168
+ {
169
+ "epoch": 0.4669260700389105,
170
+ "grad_norm": 0.5338679552078247,
171
+ "learning_rate": 9.908751603823301e-05,
172
+ "loss": 0.1177,
173
+ "step": 210
174
+ },
175
+ {
176
+ "epoch": 0.48916064480266813,
177
+ "grad_norm": 0.817176342010498,
178
+ "learning_rate": 9.882527069566965e-05,
179
+ "loss": 0.0899,
180
+ "step": 220
181
+ },
182
+ {
183
+ "epoch": 0.5113952195664258,
184
+ "grad_norm": 0.28271085023880005,
185
+ "learning_rate": 9.853038403764021e-05,
186
+ "loss": 0.1285,
187
+ "step": 230
188
+ },
189
+ {
190
+ "epoch": 0.5336297943301834,
191
+ "grad_norm": 0.4720575511455536,
192
+ "learning_rate": 9.820305320567192e-05,
193
+ "loss": 0.116,
194
+ "step": 240
195
+ },
196
+ {
197
+ "epoch": 0.5558643690939411,
198
+ "grad_norm": 0.2770315706729889,
199
+ "learning_rate": 9.784349703130007e-05,
200
+ "loss": 0.1355,
201
+ "step": 250
202
+ },
203
+ {
204
+ "epoch": 0.5780989438576987,
205
+ "grad_norm": 0.45048788189888,
206
+ "learning_rate": 9.745195588977192e-05,
207
+ "loss": 0.1187,
208
+ "step": 260
209
+ },
210
+ {
211
+ "epoch": 0.6003335186214563,
212
+ "grad_norm": 0.3435899317264557,
213
+ "learning_rate": 9.702869153934782e-05,
214
+ "loss": 0.1505,
215
+ "step": 270
216
+ },
217
+ {
218
+ "epoch": 0.622568093385214,
219
+ "grad_norm": 0.47825339436531067,
220
+ "learning_rate": 9.657398694630712e-05,
221
+ "loss": 0.113,
222
+ "step": 280
223
+ },
224
+ {
225
+ "epoch": 0.6448026681489717,
226
+ "grad_norm": 0.19181689620018005,
227
+ "learning_rate": 9.608814609577585e-05,
228
+ "loss": 0.0761,
229
+ "step": 290
230
+ },
231
+ {
232
+ "epoch": 0.6670372429127293,
233
+ "grad_norm": 0.2734505832195282,
234
+ "learning_rate": 9.557149378850254e-05,
235
+ "loss": 0.0873,
236
+ "step": 300
237
+ },
238
+ {
239
+ "epoch": 0.6670372429127293,
240
+ "eval_loss": 0.07399436831474304,
241
+ "eval_runtime": 48.0939,
242
+ "eval_samples_per_second": 4.159,
243
+ "eval_steps_per_second": 4.159,
244
+ "step": 300
245
+ },
246
+ {
247
+ "epoch": 0.6892718176764869,
248
+ "grad_norm": 0.438323050737381,
249
+ "learning_rate": 9.502437542371812e-05,
250
+ "loss": 0.105,
251
+ "step": 310
252
+ },
253
+ {
254
+ "epoch": 0.7115063924402446,
255
+ "grad_norm": 0.694514274597168,
256
+ "learning_rate": 9.444715676822501e-05,
257
+ "loss": 0.1134,
258
+ "step": 320
259
+ },
260
+ {
261
+ "epoch": 0.7337409672040022,
262
+ "grad_norm": 0.5426012277603149,
263
+ "learning_rate": 9.384022371187003e-05,
264
+ "loss": 0.1102,
265
+ "step": 330
266
+ },
267
+ {
268
+ "epoch": 0.7559755419677598,
269
+ "grad_norm": 0.38747844099998474,
270
+ "learning_rate": 9.320398200956403e-05,
271
+ "loss": 0.0883,
272
+ "step": 340
273
+ },
274
+ {
275
+ "epoch": 0.7782101167315175,
276
+ "grad_norm": 0.33049455285072327,
277
+ "learning_rate": 9.253885701002134e-05,
278
+ "loss": 0.1114,
279
+ "step": 350
280
+ },
281
+ {
282
+ "epoch": 0.8004446914952752,
283
+ "grad_norm": 0.2674323320388794,
284
+ "learning_rate": 9.184529337140002e-05,
285
+ "loss": 0.0803,
286
+ "step": 360
287
+ },
288
+ {
289
+ "epoch": 0.8226792662590328,
290
+ "grad_norm": 0.31980791687965393,
291
+ "learning_rate": 9.112375476403312e-05,
292
+ "loss": 0.1024,
293
+ "step": 370
294
+ },
295
+ {
296
+ "epoch": 0.8449138410227904,
297
+ "grad_norm": 0.15382544696331024,
298
+ "learning_rate": 9.037472356044962e-05,
299
+ "loss": 0.0588,
300
+ "step": 380
301
+ },
302
+ {
303
+ "epoch": 0.8671484157865481,
304
+ "grad_norm": 0.23380494117736816,
305
+ "learning_rate": 8.959870051289241e-05,
306
+ "loss": 0.0549,
307
+ "step": 390
308
+ },
309
+ {
310
+ "epoch": 0.8893829905503057,
311
+ "grad_norm": 0.2885076105594635,
312
+ "learning_rate": 8.879620441854872e-05,
313
+ "loss": 0.1051,
314
+ "step": 400
315
+ },
316
+ {
317
+ "epoch": 0.8893829905503057,
318
+ "eval_loss": 0.06723224371671677,
319
+ "eval_runtime": 48.2096,
320
+ "eval_samples_per_second": 4.149,
321
+ "eval_steps_per_second": 4.149,
322
+ "step": 400
323
+ },
324
+ {
325
+ "epoch": 0.9116175653140633,
326
+ "grad_norm": 0.3105609714984894,
327
+ "learning_rate": 8.796777177271708e-05,
328
+ "loss": 0.0823,
329
+ "step": 410
330
+ },
331
+ {
332
+ "epoch": 0.933852140077821,
333
+ "grad_norm": 0.4871651828289032,
334
+ "learning_rate": 8.711395641014228e-05,
335
+ "loss": 0.095,
336
+ "step": 420
337
+ },
338
+ {
339
+ "epoch": 0.9560867148415787,
340
+ "grad_norm": 0.34139370918273926,
341
+ "learning_rate": 8.623532913475847e-05,
342
+ "loss": 0.0742,
343
+ "step": 430
344
+ },
345
+ {
346
+ "epoch": 0.9783212896053363,
347
+ "grad_norm": 0.1284688413143158,
348
+ "learning_rate": 8.533247733808776e-05,
349
+ "loss": 0.074,
350
+ "step": 440
351
+ },
352
+ {
353
+ "epoch": 1.0,
354
+ "grad_norm": 0.15243172645568848,
355
+ "learning_rate": 8.440600460654958e-05,
356
+ "loss": 0.1033,
357
+ "step": 450
358
+ },
359
+ {
360
+ "epoch": 1.0222345747637576,
361
+ "grad_norm": 0.29170939326286316,
362
+ "learning_rate": 8.345653031794292e-05,
363
+ "loss": 0.0786,
364
+ "step": 460
365
+ },
366
+ {
367
+ "epoch": 1.0444691495275154,
368
+ "grad_norm": 0.1913299411535263,
369
+ "learning_rate": 8.248468922737188e-05,
370
+ "loss": 0.0568,
371
+ "step": 470
372
+ },
373
+ {
374
+ "epoch": 1.066703724291273,
375
+ "grad_norm": 0.38078683614730835,
376
+ "learning_rate": 8.149113104289063e-05,
377
+ "loss": 0.0809,
378
+ "step": 480
379
+ },
380
+ {
381
+ "epoch": 1.0889382990550305,
382
+ "grad_norm": 0.38182222843170166,
383
+ "learning_rate": 8.047651999115217e-05,
384
+ "loss": 0.0758,
385
+ "step": 490
386
+ },
387
+ {
388
+ "epoch": 1.1111728738187883,
389
+ "grad_norm": 0.13781729340553284,
390
+ "learning_rate": 7.944153437335057e-05,
391
+ "loss": 0.0636,
392
+ "step": 500
393
+ },
394
+ {
395
+ "epoch": 1.1111728738187883,
396
+ "eval_loss": 0.06643614917993546,
397
+ "eval_runtime": 48.2161,
398
+ "eval_samples_per_second": 4.148,
399
+ "eval_steps_per_second": 4.148,
400
+ "step": 500
401
+ },
402
+ {
403
+ "epoch": 1.1334074485825458,
404
+ "grad_norm": 0.06484173983335495,
405
+ "learning_rate": 7.838686611175421e-05,
406
+ "loss": 0.068,
407
+ "step": 510
408
+ },
409
+ {
410
+ "epoch": 1.1556420233463034,
411
+ "grad_norm": 0.34467655420303345,
412
+ "learning_rate": 7.73132202871327e-05,
413
+ "loss": 0.0778,
414
+ "step": 520
415
+ },
416
+ {
417
+ "epoch": 1.1778765981100612,
418
+ "grad_norm": 0.35296931862831116,
419
+ "learning_rate": 7.6221314667387e-05,
420
+ "loss": 0.0796,
421
+ "step": 530
422
+ },
423
+ {
424
+ "epoch": 1.2001111728738187,
425
+ "grad_norm": 0.09108947217464447,
426
+ "learning_rate": 7.511187922769768e-05,
427
+ "loss": 0.0643,
428
+ "step": 540
429
+ },
430
+ {
431
+ "epoch": 1.2223457476375765,
432
+ "grad_norm": 0.3470743000507355,
433
+ "learning_rate": 7.398565566251232e-05,
434
+ "loss": 0.0716,
435
+ "step": 550
436
+ },
437
+ {
438
+ "epoch": 1.244580322401334,
439
+ "grad_norm": 0.23976042866706848,
440
+ "learning_rate": 7.284339688969809e-05,
441
+ "loss": 0.051,
442
+ "step": 560
443
+ },
444
+ {
445
+ "epoch": 1.2668148971650917,
446
+ "grad_norm": 0.36250776052474976,
447
+ "learning_rate": 7.168586654719117e-05,
448
+ "loss": 0.0608,
449
+ "step": 570
450
+ },
451
+ {
452
+ "epoch": 1.2890494719288492,
453
+ "grad_norm": 0.31230035424232483,
454
+ "learning_rate": 7.051383848247942e-05,
455
+ "loss": 0.0565,
456
+ "step": 580
457
+ },
458
+ {
459
+ "epoch": 1.311284046692607,
460
+ "grad_norm": 0.22365595400333405,
461
+ "learning_rate": 6.932809623525957e-05,
462
+ "loss": 0.0735,
463
+ "step": 590
464
+ },
465
+ {
466
+ "epoch": 1.3335186214563646,
467
+ "grad_norm": 0.26981058716773987,
468
+ "learning_rate": 6.812943251361505e-05,
469
+ "loss": 0.072,
470
+ "step": 600
471
+ },
472
+ {
473
+ "epoch": 1.3335186214563646,
474
+ "eval_loss": 0.06549877673387527,
475
+ "eval_runtime": 48.2581,
476
+ "eval_samples_per_second": 4.144,
477
+ "eval_steps_per_second": 4.144,
478
+ "step": 600
479
+ },
480
+ {
481
+ "epoch": 1.3557531962201224,
482
+ "grad_norm": 0.3754810690879822,
483
+ "learning_rate": 6.691864866406407e-05,
484
+ "loss": 0.0678,
485
+ "step": 610
486
+ },
487
+ {
488
+ "epoch": 1.37798777098388,
489
+ "grad_norm": 0.31102293729782104,
490
+ "learning_rate": 6.569655413583306e-05,
491
+ "loss": 0.0946,
492
+ "step": 620
493
+ },
494
+ {
495
+ "epoch": 1.4002223457476375,
496
+ "grad_norm": 0.2776915729045868,
497
+ "learning_rate": 6.446396593971294e-05,
498
+ "loss": 0.0649,
499
+ "step": 630
500
+ },
501
+ {
502
+ "epoch": 1.4224569205113953,
503
+ "grad_norm": 0.5137710571289062,
504
+ "learning_rate": 6.322170810186012e-05,
505
+ "loss": 0.0718,
506
+ "step": 640
507
+ },
508
+ {
509
+ "epoch": 1.4446914952751528,
510
+ "grad_norm": 0.255832314491272,
511
+ "learning_rate": 6.197061111290779e-05,
512
+ "loss": 0.0705,
513
+ "step": 650
514
+ },
515
+ {
516
+ "epoch": 1.4669260700389106,
517
+ "grad_norm": 0.19154119491577148,
518
+ "learning_rate": 6.07115113727553e-05,
519
+ "loss": 0.0682,
520
+ "step": 660
521
+ },
522
+ {
523
+ "epoch": 1.4891606448026682,
524
+ "grad_norm": 0.2686958909034729,
525
+ "learning_rate": 5.9445250631407024e-05,
526
+ "loss": 0.081,
527
+ "step": 670
528
+ },
529
+ {
530
+ "epoch": 1.5113952195664258,
531
+ "grad_norm": 0.31166499853134155,
532
+ "learning_rate": 5.817267542623451e-05,
533
+ "loss": 0.0574,
534
+ "step": 680
535
+ },
536
+ {
537
+ "epoch": 1.5336297943301833,
538
+ "grad_norm": 0.22264094650745392,
539
+ "learning_rate": 5.689463651603818e-05,
540
+ "loss": 0.0513,
541
+ "step": 690
542
+ },
543
+ {
544
+ "epoch": 1.555864369093941,
545
+ "grad_norm": 0.23241780698299408,
546
+ "learning_rate": 5.561198831228675e-05,
547
+ "loss": 0.0807,
548
+ "step": 700
549
+ },
550
+ {
551
+ "epoch": 1.555864369093941,
552
+ "eval_loss": 0.05730433017015457,
553
+ "eval_runtime": 48.1316,
554
+ "eval_samples_per_second": 4.155,
555
+ "eval_steps_per_second": 4.155,
556
+ "step": 700
557
+ },
558
+ {
559
+ "epoch": 1.5780989438576987,
560
+ "grad_norm": 0.36841678619384766,
561
+ "learning_rate": 5.432558830791479e-05,
562
+ "loss": 0.0601,
563
+ "step": 710
564
+ },
565
+ {
566
+ "epoch": 1.6003335186214565,
567
+ "grad_norm": 0.20728759467601776,
568
+ "learning_rate": 5.3036296504060235e-05,
569
+ "loss": 0.0841,
570
+ "step": 720
571
+ },
572
+ {
573
+ "epoch": 1.622568093385214,
574
+ "grad_norm": 0.10159023851156235,
575
+ "learning_rate": 5.174497483512506e-05,
576
+ "loss": 0.042,
577
+ "step": 730
578
+ },
579
+ {
580
+ "epoch": 1.6448026681489716,
581
+ "grad_norm": 0.31575503945350647,
582
+ "learning_rate": 5.045248659254344e-05,
583
+ "loss": 0.0829,
584
+ "step": 740
585
+ },
586
+ {
587
+ "epoch": 1.6670372429127291,
588
+ "grad_norm": 0.1763896644115448,
589
+ "learning_rate": 4.915969584764282e-05,
590
+ "loss": 0.0893,
591
+ "step": 750
592
+ },
593
+ {
594
+ "epoch": 1.689271817676487,
595
+ "grad_norm": 0.3741007447242737,
596
+ "learning_rate": 4.7867466873983464e-05,
597
+ "loss": 0.0694,
598
+ "step": 760
599
+ },
600
+ {
601
+ "epoch": 1.7115063924402447,
602
+ "grad_norm": 0.28057777881622314,
603
+ "learning_rate": 4.657666356956296e-05,
604
+ "loss": 0.0499,
605
+ "step": 770
606
+ },
607
+ {
608
+ "epoch": 1.7337409672040023,
609
+ "grad_norm": 0.23745323717594147,
610
+ "learning_rate": 4.528814887927157e-05,
611
+ "loss": 0.063,
612
+ "step": 780
613
+ },
614
+ {
615
+ "epoch": 1.7559755419677598,
616
+ "grad_norm": 0.22828607261180878,
617
+ "learning_rate": 4.400278421798501e-05,
618
+ "loss": 0.0623,
619
+ "step": 790
620
+ },
621
+ {
622
+ "epoch": 1.7782101167315174,
623
+ "grad_norm": 0.35160404443740845,
624
+ "learning_rate": 4.272142889468002e-05,
625
+ "loss": 0.0536,
626
+ "step": 800
627
+ },
628
+ {
629
+ "epoch": 1.7782101167315174,
630
+ "eval_loss": 0.05802774429321289,
631
+ "eval_runtime": 48.2104,
632
+ "eval_samples_per_second": 4.148,
633
+ "eval_steps_per_second": 4.148,
634
+ "step": 800
635
+ },
636
+ {
637
+ "epoch": 1.8004446914952752,
638
+ "grad_norm": 0.30460554361343384,
639
+ "learning_rate": 4.144493953795759e-05,
640
+ "loss": 0.074,
641
+ "step": 810
642
+ },
643
+ {
644
+ "epoch": 1.8226792662590328,
645
+ "grad_norm": 0.1435527503490448,
646
+ "learning_rate": 4.017416952335849e-05,
647
+ "loss": 0.0576,
648
+ "step": 820
649
+ },
650
+ {
651
+ "epoch": 1.8449138410227905,
652
+ "grad_norm": 0.13923799991607666,
653
+ "learning_rate": 3.890996840285328e-05,
654
+ "loss": 0.0441,
655
+ "step": 830
656
+ },
657
+ {
658
+ "epoch": 1.867148415786548,
659
+ "grad_norm": 0.2655491232872009,
660
+ "learning_rate": 3.765318133688853e-05,
661
+ "loss": 0.0779,
662
+ "step": 840
663
+ },
664
+ {
665
+ "epoch": 1.8893829905503057,
666
+ "grad_norm": 0.2776850759983063,
667
+ "learning_rate": 3.640464852936909e-05,
668
+ "loss": 0.0552,
669
+ "step": 850
670
+ },
671
+ {
672
+ "epoch": 1.9116175653140632,
673
+ "grad_norm": 0.10389228910207748,
674
+ "learning_rate": 3.5165204665953875e-05,
675
+ "loss": 0.0545,
676
+ "step": 860
677
+ },
678
+ {
679
+ "epoch": 1.933852140077821,
680
+ "grad_norm": 0.17789633572101593,
681
+ "learning_rate": 3.393567835604063e-05,
682
+ "loss": 0.0647,
683
+ "step": 870
684
+ },
685
+ {
686
+ "epoch": 1.9560867148415788,
687
+ "grad_norm": 0.19800323247909546,
688
+ "learning_rate": 3.271689157881317e-05,
689
+ "loss": 0.0728,
690
+ "step": 880
691
+ },
692
+ {
693
+ "epoch": 1.9783212896053364,
694
+ "grad_norm": 0.33102431893348694,
695
+ "learning_rate": 3.150965913372095e-05,
696
+ "loss": 0.0566,
697
+ "step": 890
698
+ },
699
+ {
700
+ "epoch": 2.0,
701
+ "grad_norm": 0.31623104214668274,
702
+ "learning_rate": 3.031478809575852e-05,
703
+ "loss": 0.0453,
704
+ "step": 900
705
+ },
706
+ {
707
+ "epoch": 2.0,
708
+ "eval_loss": 0.0510103702545166,
709
+ "eval_runtime": 48.2625,
710
+ "eval_samples_per_second": 4.144,
711
+ "eval_steps_per_second": 4.144,
712
+ "step": 900
713
+ },
714
+ {
715
+ "epoch": 2.0222345747637576,
716
+ "grad_norm": 0.190277099609375,
717
+ "learning_rate": 2.9133077275909108e-05,
718
+ "loss": 0.0461,
719
+ "step": 910
720
+ },
721
+ {
722
+ "epoch": 2.044469149527515,
723
+ "grad_norm": 0.30117812752723694,
724
+ "learning_rate": 2.7965316687112976e-05,
725
+ "loss": 0.0457,
726
+ "step": 920
727
+ },
728
+ {
729
+ "epoch": 2.066703724291273,
730
+ "grad_norm": 0.22665348649024963,
731
+ "learning_rate": 2.6812287016117477e-05,
732
+ "loss": 0.0416,
733
+ "step": 930
734
+ },
735
+ {
736
+ "epoch": 2.0889382990550307,
737
+ "grad_norm": 0.26945072412490845,
738
+ "learning_rate": 2.5674759101562006e-05,
739
+ "loss": 0.0492,
740
+ "step": 940
741
+ },
742
+ {
743
+ "epoch": 2.1111728738187883,
744
+ "grad_norm": 0.20119303464889526,
745
+ "learning_rate": 2.455349341864685e-05,
746
+ "loss": 0.0506,
747
+ "step": 950
748
+ },
749
+ {
750
+ "epoch": 2.133407448582546,
751
+ "grad_norm": 0.16967110335826874,
752
+ "learning_rate": 2.344923957073021e-05,
753
+ "loss": 0.0438,
754
+ "step": 960
755
+ },
756
+ {
757
+ "epoch": 2.1556420233463034,
758
+ "grad_norm": 0.17140169441699982,
759
+ "learning_rate": 2.2362735788193367e-05,
760
+ "loss": 0.0337,
761
+ "step": 970
762
+ },
763
+ {
764
+ "epoch": 2.177876598110061,
765
+ "grad_norm": 0.22932595014572144,
766
+ "learning_rate": 2.129470843490932e-05,
767
+ "loss": 0.0539,
768
+ "step": 980
769
+ },
770
+ {
771
+ "epoch": 2.200111172873819,
772
+ "grad_norm": 0.24180778861045837,
773
+ "learning_rate": 2.024587152264428e-05,
774
+ "loss": 0.0317,
775
+ "step": 990
776
+ },
777
+ {
778
+ "epoch": 2.2223457476375765,
779
+ "grad_norm": 0.4643738865852356,
780
+ "learning_rate": 1.9216926233717085e-05,
781
+ "loss": 0.0469,
782
+ "step": 1000
783
+ },
784
+ {
785
+ "epoch": 2.2223457476375765,
786
+ "eval_loss": 0.048854030668735504,
787
+ "eval_runtime": 48.1955,
788
+ "eval_samples_per_second": 4.15,
789
+ "eval_steps_per_second": 4.15,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 2.244580322401334,
794
+ "grad_norm": 0.266984224319458,
795
+ "learning_rate": 1.8208560452235625e-05,
796
+ "loss": 0.0614,
797
+ "step": 1010
798
+ },
799
+ {
800
+ "epoch": 2.2668148971650917,
801
+ "grad_norm": 0.24567271769046783,
802
+ "learning_rate": 1.7221448304223327e-05,
803
+ "loss": 0.0385,
804
+ "step": 1020
805
+ },
806
+ {
807
+ "epoch": 2.289049471928849,
808
+ "grad_norm": 0.15998658537864685,
809
+ "learning_rate": 1.6256249706943628e-05,
810
+ "loss": 0.0482,
811
+ "step": 1030
812
+ },
813
+ {
814
+ "epoch": 2.311284046692607,
815
+ "grad_norm": 0.2101755291223526,
816
+ "learning_rate": 1.5313609927723332e-05,
817
+ "loss": 0.0419,
818
+ "step": 1040
819
+ },
820
+ {
821
+ "epoch": 2.333518621456365,
822
+ "grad_norm": 0.10369472205638885,
823
+ "learning_rate": 1.4394159152569903e-05,
824
+ "loss": 0.0248,
825
+ "step": 1050
826
+ },
827
+ {
828
+ "epoch": 2.3557531962201224,
829
+ "grad_norm": 0.3291586637496948,
830
+ "learning_rate": 1.3498512064871271e-05,
831
+ "loss": 0.0611,
832
+ "step": 1060
833
+ },
834
+ {
835
+ "epoch": 2.37798777098388,
836
+ "grad_norm": 0.5122426748275757,
837
+ "learning_rate": 1.262726743445954e-05,
838
+ "loss": 0.0628,
839
+ "step": 1070
840
+ },
841
+ {
842
+ "epoch": 2.4002223457476375,
843
+ "grad_norm": 0.1757289469242096,
844
+ "learning_rate": 1.178100771731339e-05,
845
+ "loss": 0.0414,
846
+ "step": 1080
847
+ },
848
+ {
849
+ "epoch": 2.422456920511395,
850
+ "grad_norm": 0.3590919077396393,
851
+ "learning_rate": 1.096029866616704e-05,
852
+ "loss": 0.0349,
853
+ "step": 1090
854
+ },
855
+ {
856
+ "epoch": 2.444691495275153,
857
+ "grad_norm": 0.21179239451885223,
858
+ "learning_rate": 1.0165688952285651e-05,
859
+ "loss": 0.0318,
860
+ "step": 1100
861
+ },
862
+ {
863
+ "epoch": 2.444691495275153,
864
+ "eval_loss": 0.048208702355623245,
865
+ "eval_runtime": 48.1747,
866
+ "eval_samples_per_second": 4.152,
867
+ "eval_steps_per_second": 4.152,
868
+ "step": 1100
869
+ },
870
+ {
871
+ "epoch": 2.4669260700389106,
872
+ "grad_norm": 0.2429758608341217,
873
+ "learning_rate": 9.397709798660359e-06,
874
+ "loss": 0.0389,
875
+ "step": 1110
876
+ },
877
+ {
878
+ "epoch": 2.489160644802668,
879
+ "grad_norm": 0.3247833251953125,
880
+ "learning_rate": 8.656874624868134e-06,
881
+ "loss": 0.0474,
882
+ "step": 1120
883
+ },
884
+ {
885
+ "epoch": 2.5113952195664258,
886
+ "grad_norm": 0.20058025419712067,
887
+ "learning_rate": 7.943678703833657e-06,
888
+ "loss": 0.0446,
889
+ "step": 1130
890
+ },
891
+ {
892
+ "epoch": 2.5336297943301833,
893
+ "grad_norm": 0.22172123193740845,
894
+ "learning_rate": 7.258598830722946e-06,
895
+ "loss": 0.0429,
896
+ "step": 1140
897
+ },
898
+ {
899
+ "epoch": 2.555864369093941,
900
+ "grad_norm": 0.3664150834083557,
901
+ "learning_rate": 6.6020930041899635e-06,
902
+ "loss": 0.0487,
903
+ "step": 1150
904
+ },
905
+ {
906
+ "epoch": 2.5780989438576984,
907
+ "grad_norm": 0.13659419119358063,
908
+ "learning_rate": 5.974600120189289e-06,
909
+ "loss": 0.0438,
910
+ "step": 1160
911
+ },
912
+ {
913
+ "epoch": 2.6003335186214565,
914
+ "grad_norm": 0.18766269087791443,
915
+ "learning_rate": 5.376539678559567e-06,
916
+ "loss": 0.0385,
917
+ "step": 1170
918
+ },
919
+ {
920
+ "epoch": 2.622568093385214,
921
+ "grad_norm": 0.24047650396823883,
922
+ "learning_rate": 4.8083115025739756e-06,
923
+ "loss": 0.0413,
924
+ "step": 1180
925
+ },
926
+ {
927
+ "epoch": 2.6448026681489716,
928
+ "grad_norm": 0.10476374626159668,
929
+ "learning_rate": 4.270295471645064e-06,
930
+ "loss": 0.0426,
931
+ "step": 1190
932
+ },
933
+ {
934
+ "epoch": 2.667037242912729,
935
+ "grad_norm": 0.3628266155719757,
936
+ "learning_rate": 3.7628512673627215e-06,
937
+ "loss": 0.0527,
938
+ "step": 1200
939
+ },
940
+ {
941
+ "epoch": 2.667037242912729,
942
+ "eval_loss": 0.04677248001098633,
943
+ "eval_runtime": 48.3089,
944
+ "eval_samples_per_second": 4.14,
945
+ "eval_steps_per_second": 4.14,
946
+ "step": 1200
947
+ },
948
+ {
949
+ "epoch": 2.689271817676487,
950
+ "grad_norm": 0.2052982747554779,
951
+ "learning_rate": 3.286318133035132e-06,
952
+ "loss": 0.0394,
953
+ "step": 1210
954
+ },
955
+ {
956
+ "epoch": 2.7115063924402447,
957
+ "grad_norm": 0.14959484338760376,
958
+ "learning_rate": 2.8410146468933364e-06,
959
+ "loss": 0.0351,
960
+ "step": 1220
961
+ },
962
+ {
963
+ "epoch": 2.7337409672040023,
964
+ "grad_norm": 0.17867030203342438,
965
+ "learning_rate": 2.4272385091110516e-06,
966
+ "loss": 0.0465,
967
+ "step": 1230
968
+ },
969
+ {
970
+ "epoch": 2.75597554196776,
971
+ "grad_norm": 0.2831536531448364,
972
+ "learning_rate": 2.0452663427823093e-06,
973
+ "loss": 0.0487,
974
+ "step": 1240
975
+ },
976
+ {
977
+ "epoch": 2.7782101167315174,
978
+ "grad_norm": 0.16684742271900177,
979
+ "learning_rate": 1.6953535089896555e-06,
980
+ "loss": 0.0335,
981
+ "step": 1250
982
+ },
983
+ {
984
+ "epoch": 2.800444691495275,
985
+ "grad_norm": 0.12368661165237427,
986
+ "learning_rate": 1.3777339360867836e-06,
987
+ "loss": 0.0317,
988
+ "step": 1260
989
+ },
990
+ {
991
+ "epoch": 2.8226792662590325,
992
+ "grad_norm": 0.13758961856365204,
993
+ "learning_rate": 1.0926199633097157e-06,
994
+ "loss": 0.0305,
995
+ "step": 1270
996
+ },
997
+ {
998
+ "epoch": 2.8449138410227905,
999
+ "grad_norm": 0.3133557438850403,
1000
+ "learning_rate": 8.402021988209218e-07,
1001
+ "loss": 0.0488,
1002
+ "step": 1280
1003
+ },
1004
+ {
1005
+ "epoch": 2.867148415786548,
1006
+ "grad_norm": 0.13724081218242645,
1007
+ "learning_rate": 6.20649392281425e-07,
1008
+ "loss": 0.0406,
1009
+ "step": 1290
1010
+ },
1011
+ {
1012
+ "epoch": 2.8893829905503057,
1013
+ "grad_norm": 0.286937952041626,
1014
+ "learning_rate": 4.341083220360864e-07,
1015
+ "loss": 0.049,
1016
+ "step": 1300
1017
+ },
1018
+ {
1019
+ "epoch": 2.8893829905503057,
1020
+ "eval_loss": 0.0463690422475338,
1021
+ "eval_runtime": 48.3097,
1022
+ "eval_samples_per_second": 4.14,
1023
+ "eval_steps_per_second": 4.14,
1024
+ "step": 1300
1025
+ },
1026
+ {
1027
+ "epoch": 2.9116175653140632,
1028
+ "grad_norm": 0.06224232539534569,
1029
+ "learning_rate": 2.807036969873722e-07,
1030
+ "loss": 0.0565,
1031
+ "step": 1310
1032
+ },
1033
+ {
1034
+ "epoch": 2.9338521400778212,
1035
+ "grad_norm": 0.08071974664926529,
1036
+ "learning_rate": 1.6053807322333191e-07,
1037
+ "loss": 0.0369,
1038
+ "step": 1320
1039
+ },
1040
+ {
1041
+ "epoch": 2.956086714841579,
1042
+ "grad_norm": 0.22806085646152496,
1043
+ "learning_rate": 7.369178545542088e-08,
1044
+ "loss": 0.0376,
1045
+ "step": 1330
1046
+ },
1047
+ {
1048
+ "epoch": 2.9783212896053364,
1049
+ "grad_norm": 0.20774702727794647,
1050
+ "learning_rate": 2.022289331209959e-08,
1051
+ "loss": 0.0467,
1052
+ "step": 1340
1053
+ },
1054
+ {
1055
+ "epoch": 3.0,
1056
+ "grad_norm": 0.19576112926006317,
1057
+ "learning_rate": 1.671425240434843e-10,
1058
+ "loss": 0.0531,
1059
+ "step": 1350
1060
+ },
1061
+ {
1062
+ "epoch": 3.0,
1063
+ "step": 1350,
1064
+ "total_flos": 8.093314295223091e+16,
1065
+ "train_loss": 0.09125252608899717,
1066
+ "train_runtime": 5067.962,
1067
+ "train_samples_per_second": 1.065,
1068
+ "train_steps_per_second": 0.266
1069
+ }
1070
+ ],
1071
+ "logging_steps": 10,
1072
+ "max_steps": 1350,
1073
+ "num_input_tokens_seen": 0,
1074
+ "num_train_epochs": 3,
1075
+ "save_steps": 500,
1076
+ "stateful_callbacks": {
1077
+ "TrainerControl": {
1078
+ "args": {
1079
+ "should_epoch_stop": false,
1080
+ "should_evaluate": false,
1081
+ "should_log": false,
1082
+ "should_save": true,
1083
+ "should_training_stop": true
1084
+ },
1085
+ "attributes": {}
1086
+ }
1087
+ },
1088
+ "total_flos": 8.093314295223091e+16,
1089
+ "train_batch_size": 1,
1090
+ "trial_name": null,
1091
+ "trial_params": null
1092
+ }
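The `learning_rate` values logged in trainer_state.json look like a linear warmup followed by cosine decay. The commit does not include the training configuration, so the sketch below is a reconstruction under assumptions: a peak rate of 1e-4 and 135 warmup steps (10% of `max_steps`) are inferred from the logged numbers, and `lr_at` is a hypothetical helper, not a library function.

```python
import math

MAX_STEPS = 1350      # from trainer_state.json
PEAK_LR = 1e-4        # assumption: inferred from the logged values
WARMUP_STEPS = 135    # assumption: 10% of MAX_STEPS


def lr_at(step: int) -> float:
    """Learning rate applied at a 0-based optimizer step.

    Linear warmup from 0 to PEAK_LR, then cosine decay towards 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Under this reconstruction, a log entry with "step": N lines up
    # with lr_at(N - 1).
    for logged_step in (10, 140, 750, 1350):
        print(logged_step, lr_at(logged_step - 1))
```

Under these assumptions the function reproduces the logged values at steps 10, 140, and 1350 when evaluated one step earlier (`lr_at(N - 1)`), which suggests the logged rate is the one used for the step that just finished.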
training_eval_loss.png ADDED
training_loss.png ADDED
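The two loss plots are drawn from the `log_history` list in trainer_state.json, which mixes training entries (key `loss`), evaluation entries (key `eval_loss`), and one final summary entry (key `train_loss`). A minimal sketch of separating them, assuming only the JSON layout shown in the diff; the helper names and the small `SAMPLE` excerpt are illustrative:

```python
import json


def split_log_history(state: dict):
    """Split log_history into training and evaluation entries."""
    history = state["log_history"]
    train = [e for e in history if "loss" in e]
    evals = [e for e in history if "eval_loss" in e]
    return train, evals


def best_eval(state: dict) -> dict:
    """Return the evaluation entry with the lowest eval_loss."""
    _, evals = split_log_history(state)
    return min(evals, key=lambda e: e["eval_loss"])


# A few entries copied from trainer_state.json in this commit.
SAMPLE = json.loads("""
{
  "log_history": [
    {"epoch": 2.667037242912729, "eval_loss": 0.04677248001098633, "step": 1200},
    {"epoch": 2.8893829905503057, "eval_loss": 0.0463690422475338, "step": 1300},
    {"epoch": 3.0, "loss": 0.0531, "learning_rate": 1.671425240434843e-10, "step": 1350},
    {"epoch": 3.0, "step": 1350, "train_loss": 0.09125252608899717}
  ]
}
""")

if __name__ == "__main__":
    train, evals = split_log_history(SAMPLE)
    print(len(train), "training entries,", len(evals), "eval entries")
    print("lowest eval_loss at step", best_eval(SAMPLE)["step"])
```

To run it on the real file, replace `SAMPLE` with `json.load(open("trainer_state.json"))`. Note that the final evaluation reported in eval_results.json (0.0464) is not part of `log_history`; the last logged evaluation is at step 1300.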