train_cb_1752870511

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. Evaluation-set results are logged in the training results table below; the lowest validation loss recorded there is 4.7221, at step 551.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
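
As a rough illustration of how the learning rate might evolve under these settings, the sketch below computes a linear warmup followed by a half-cosine decay, which is the shape commonly associated with a `cosine` scheduler type. Two values are assumptions rather than facts from this card: `TOTAL_STEPS = 570` is inferred from the results table (about 57 steps per epoch over 10 epochs), and `WARMUP_STEPS` is taken as the rounded product of that total and the warmup ratio. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 570      # ASSUMPTION: ~57 steps/epoch * 10 epochs, inferred from the table
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # ASSUMPTION: rounded to 57


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half cosine cycle: factor goes from 1 at progress=0 to 0 at progress=1.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 29, 57, 290, 551, 570):
        print(f"step {step:>3}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate peaks at the end of warmup and falls to zero at the final step, which would be consistent with the validation loss flattening in the last few epochs.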

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
9.2459	0.5088	29	9.0673	20064
8.8372	1.0175	58	8.7622	37832
8.2036	1.5263	87	8.3524	57288
8.0022	2.0351	116	7.9630	74520
8.0271	2.5439	145	7.5303	93080
7.4171	3.0526	174	7.1350	111928
7.0074	3.5614	203	6.6977	131160
6.5675	4.0702	232	6.2753	150056
5.9769	4.5789	261	5.8851	167208
5.7798	5.0877	290	5.5634	186160
5.5368	5.5965	319	5.3130	206000
5.1352	6.1053	348	5.1329	224064
5.1150	6.6140	377	4.9990	243840
5.1204	7.1228	406	4.8852	261504
5.2698	7.6316	435	4.8123	280352
4.8523	8.1404	464	4.7637	299344
4.9229	8.6491	493	4.7410	318672
4.9903	9.1579	522	4.7326	337480
4.9823	9.6667	551	4.7221	356456
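
The table can be sanity-checked with a few lines of Python. The sketch below copies the validation losses from the rows above, estimates the number of optimizer steps per epoch from the first logged row, and checks whether validation loss decreases at every evaluation. The variable names are illustrative only, and the steps-per-epoch figure is an estimate derived from the logged epoch fraction, not a value stated in the card.

```python
# Validation losses copied from the table above, in step order (steps 29..551).
val_loss = [
    9.0673, 8.7622, 8.3524, 7.9630, 7.5303, 7.1350, 6.6977, 6.2753,
    5.8851, 5.5634, 5.3130, 5.1329, 4.9990, 4.8852, 4.8123, 4.7637,
    4.7410, 4.7326, 4.7221,
]

# First logged row: step 29 corresponds to epoch 0.5088.
# ESTIMATE: steps per epoch recovered from that ratio.
steps_per_epoch = round(29 / 0.5088)

# True if every evaluation improved on the previous one.
strictly_decreasing = all(b < a for a, b in zip(val_loss, val_loss[1:]))

# Relative reduction in validation loss from the first to the last evaluation.
relative_drop = (val_loss[0] - val_loss[-1]) / val_loss[0]

print(f"estimated steps per epoch: {steps_per_epoch}")
print(f"validation loss strictly decreasing: {strictly_decreasing}")
print(f"relative drop in validation loss: {relative_drop:.1%}")
```

The per-evaluation improvement shrinks sharply after roughly epoch 7, so most of the reduction in validation loss happens in the first two thirds of training.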
