# granite_3b_codebase_stage1
This model is a fine-tuned version of [ibm-granite/granite-3b-code-base-128k](https://huggingface.co/ibm-granite/granite-3b-code-base-128k) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.2440
## Model description
More information needed
## Intended uses & limitations
More information needed
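The card does not yet document intended uses, but PEFT is listed under the framework versions below, so this repository most likely hosts a parameter-efficient adapter for the base model. Below is a minimal inference sketch under that assumption; the prompt and generation settings are illustrative only.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "ibm-granite/granite-3b-code-base-128k"
ADAPTER_ID = "tarundachepally/Granite_3b_codebase_stage1"  # this repository

# Load the frozen base model, then attach the fine-tuned adapter on top.
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Illustrative code-completion prompt; not from the model card.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```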
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- gradient_accumulation_steps: 12
- total_train_batch_size: 36
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 8
- mixed_precision_training: Native AMP
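For reference, here is a hedged sketch of how the listed settings map onto transformers `TrainingArguments`. `output_dir` is a placeholder, and `fp16=True` is an assumption for "Native AMP" (it could equally have been `bf16`); the total train batch size of 36 follows from 3 per device times 12 accumulation steps on a single device.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="granite_3b_codebase_stage1",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,  # 3 * 12 = 36 effective train batch
    seed=42,
    optim="adamw_torch",             # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=8,
    fp16=True,                       # "Native AMP"; bf16 is also plausible
)
```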
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.666 | 1.0 | 197 | 0.3711 |
| 0.3372 | 2.0 | 394 | 0.3039 |
| 0.2664 | 3.0 | 591 | 0.2736 |
| 0.2277 | 4.0 | 788 | 0.2546 |
| 0.1873 | 5.0 | 985 | 0.2456 |
| 0.1678 | 6.0 | 1182 | 0.2426 |
| 0.1512 | 7.0 | 1379 | 0.2425 |
| 0.141 | 7.9631 | 1568 | 0.2440 |
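Assuming the validation loss is the usual mean token-level cross-entropy (in nats) for causal language modeling, the final value of 0.2440 corresponds to a perplexity of roughly 1.28:

```python
import math

# Perplexity from the final validation loss, assuming mean
# token-level cross-entropy in nats.
print(math.exp(0.2440))  # ~1.276
```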
### Framework versions
- PEFT 0.17.0
- Transformers 4.50.0
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4
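To compare a local environment against the versions above before loading the adapter, a quick check:

```python
# Print installed versions to compare against the list above.
import datasets
import peft
import tokenizers
import torch
import transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```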