# granite_3b_codebase_stage1
This model is a fine-tuned version of [ibm-granite/granite-3b-code-base-128k](https://huggingface.co/ibm-granite/granite-3b-code-base-128k) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.2440
## Model description
More information needed
## Intended uses & limitations
More information needed
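The card does not yet document intended uses, but PEFT is listed under the framework versions below, so this repository most likely hosts a parameter-efficient adapter for the base model. Below is a minimal inference sketch under that assumption; the prompt and generation settings are illustrative only.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "ibm-granite/granite-3b-code-base-128k"
ADAPTER_ID = "tarundachepally/Granite_3b_codebase_stage1"  # this repository

# Load the frozen base model, then attach the fine-tuned adapter on top.
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

# Illustrative code-completion prompt; not from the model card.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```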
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- gradient_accumulation_steps: 12
- total_train_batch_size: 36
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 8
- mixed_precision_training: Native AMP
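For reference, here is a hedged sketch of how the listed settings map onto transformers `TrainingArguments`. `output_dir` is a placeholder, and `fp16=True` is an assumption for "Native AMP" (it could equally have been `bf16`); the total train batch size of 36 follows from 3 per device times 12 accumulation steps on a single device.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="granite_3b_codebase_stage1",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    gradient_accumulation_steps=12,  # 3 * 12 = 36 effective train batch
    seed=42,
    optim="adamw_torch",             # AdamW, torch implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=8,
    fp16=True,                       # "Native AMP"; bf16 is also plausible
)
```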
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.666 | 1.0 | 197 | 0.3711 |
| 0.3372 | 2.0 | 394 | 0.3039 |
| 0.2664 | 3.0 | 591 | 0.2736 |
| 0.2277 | 4.0 | 788 | 0.2546 |
| 0.1873 | 5.0 | 985 | 0.2456 |
| 0.1678 | 6.0 | 1182 | 0.2426 |
| 0.1512 | 7.0 | 1379 | 0.2425 |
| 0.141 | 7.9631 | 1568 | 0.2440 |
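Assuming the validation loss is the usual mean token-level cross-entropy (in nats) for causal language modeling, the final value of 0.2440 corresponds to a perplexity of roughly 1.28:

```python
import math

# Perplexity from the final validation loss, assuming mean
# token-level cross-entropy in nats.
print(math.exp(0.2440))  # ~1.276
```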
### Framework versions
- PEFT 0.17.0
- Transformers 4.50.0
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4
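To compare a local environment against the versions above before loading the adapter, a quick check:

```python
# Print installed versions to compare against the list above.
import datasets
import peft
import tokenizers
import torch
import transformers

for mod in (peft, transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```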