[WARNING|2025-05-29 20:33:25] logging.py:162 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.

[INFO|2025-05-29 20:33:25] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:25] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json

[INFO|2025-05-29 20:33:25] configuration_utils.py:746 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}


[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.float16

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None

[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[INFO|2025-05-29 20:33:26] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json

[INFO|2025-05-29 20:33:26] configuration_utils.py:746 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}


[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json

[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

[INFO|2025-05-29 20:33:26] logging.py:157 >> Replace eos token: <|im_end|>

[INFO|2025-05-29 20:33:26] logging.py:157 >> Loading dataset JB_Detect_v2.json...

[INFO|2025-05-29 20:35:38] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json

[INFO|2025-05-29 20:35:38] configuration_utils.py:746 >> Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}


[INFO|2025-05-29 20:35:39] modeling_utils.py:3937 >> loading weights file model.safetensors from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/model.safetensors

[INFO|2025-05-29 20:35:39] modeling_utils.py:1670 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.

[INFO|2025-05-29 20:35:39] configuration_utils.py:1096 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}


[INFO|2025-05-29 20:35:41] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.


[INFO|2025-05-29 20:35:41] modeling_utils.py:4808 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-0.5B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.

[INFO|2025-05-29 20:35:41] configuration_utils.py:1051 >> loading configuration file generation_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/generation_config.json

[INFO|2025-05-29 20:35:41] configuration_utils.py:1096 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}


[INFO|2025-05-29 20:35:41] logging.py:157 >> Gradient checkpointing enabled.

[INFO|2025-05-29 20:35:41] logging.py:157 >> Using torch SDPA for faster training and inference.

[INFO|2025-05-29 20:35:41] logging.py:157 >> Upcasting trainable params to float32.

[INFO|2025-05-29 20:35:41] logging.py:157 >> Fine-tuning method: LoRA

[INFO|2025-05-29 20:35:41] logging.py:157 >> Found linear modules: gate_proj,v_proj,o_proj,k_proj,down_proj,up_proj,q_proj

[INFO|2025-05-29 20:35:41] logging.py:157 >> trainable params: 35,192,832 || all params: 529,225,600 || trainable%: 6.6499

[INFO|2025-05-29 20:35:41] trainer.py:698 >> Using auto half precision backend

[INFO|2025-05-29 20:35:43] trainer.py:2313 >> ***** Running training *****

[INFO|2025-05-29 20:35:43] trainer.py:2314 >>   Num examples = 1,708,215

[INFO|2025-05-29 20:35:43] trainer.py:2315 >>   Num Epochs = 1

[INFO|2025-05-29 20:35:43] trainer.py:2316 >>   Instantaneous batch size per device = 8

[INFO|2025-05-29 20:35:43] trainer.py:2319 >>   Total train batch size (w. parallel, distributed & accumulation) = 512

[INFO|2025-05-29 20:35:43] trainer.py:2320 >>   Gradient Accumulation steps = 8

[INFO|2025-05-29 20:35:43] trainer.py:2321 >>   Total optimization steps = 3,336

[INFO|2025-05-29 20:35:43] trainer.py:2322 >>   Number of trainable parameters = 35,192,832

[INFO|2025-05-29 20:47:26] logging.py:157 >> {'loss': 0.5614, 'learning_rate': 3.8337e-06, 'epoch': 0.03}

[INFO|2025-05-29 20:59:03] logging.py:157 >> {'loss': 0.0905, 'learning_rate': 7.8692e-06, 'epoch': 0.06}

[INFO|2025-05-29 21:10:50] logging.py:157 >> {'loss': 0.0370, 'learning_rate': 1.1905e-05, 'epoch': 0.09}

[INFO|2025-05-29 21:22:41] logging.py:157 >> {'loss': 0.0295, 'learning_rate': 1.5940e-05, 'epoch': 0.12}

[INFO|2025-05-29 21:34:27] logging.py:157 >> {'loss': 0.0266, 'learning_rate': 1.9976e-05, 'epoch': 0.15}

[INFO|2025-05-29 21:46:06] logging.py:157 >> {'loss': 0.0224, 'learning_rate': 2.4011e-05, 'epoch': 0.18}

[INFO|2025-05-29 21:57:49] logging.py:157 >> {'loss': 0.0203, 'learning_rate': 2.8047e-05, 'epoch': 0.21}

[INFO|2025-05-29 22:09:37] logging.py:157 >> {'loss': 0.0187, 'learning_rate': 3.2082e-05, 'epoch': 0.24}

[INFO|2025-05-29 22:21:20] logging.py:157 >> {'loss': 0.0186, 'learning_rate': 3.6118e-05, 'epoch': 0.27}

[INFO|2025-05-29 22:32:56] logging.py:157 >> {'loss': 0.0180, 'learning_rate': 4.0153e-05, 'epoch': 0.30}

[INFO|2025-05-29 22:44:39] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.4189e-05, 'epoch': 0.33}

[INFO|2025-05-29 22:56:14] logging.py:157 >> {'loss': 0.0173, 'learning_rate': 4.8224e-05, 'epoch': 0.36}

[INFO|2025-05-29 23:07:51] logging.py:157 >> {'loss': 0.0167, 'learning_rate': 4.9912e-05, 'epoch': 0.39}

[INFO|2025-05-29 23:19:38] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.9320e-05, 'epoch': 0.42}

[INFO|2025-05-29 23:31:13] logging.py:157 >> {'loss': 0.0163, 'learning_rate': 4.8184e-05, 'epoch': 0.45}

[INFO|2025-05-29 23:42:47] logging.py:157 >> {'loss': 0.0152, 'learning_rate': 4.6528e-05, 'epoch': 0.48}

[INFO|2025-05-29 23:54:24] logging.py:157 >> {'loss': 0.0149, 'learning_rate': 4.4390e-05, 'epoch': 0.51}

[INFO|2025-05-30 00:06:14] logging.py:157 >> {'loss': 0.0147, 'learning_rate': 4.1817e-05, 'epoch': 0.54}

[INFO|2025-05-30 00:17:56] logging.py:157 >> {'loss': 0.0150, 'learning_rate': 3.8868e-05, 'epoch': 0.57}

[INFO|2025-05-30 00:29:40] logging.py:157 >> {'loss': 0.0148, 'learning_rate': 3.5608e-05, 'epoch': 0.60}

[INFO|2025-05-30 00:41:19] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 3.2110e-05, 'epoch': 0.63}

[INFO|2025-05-30 00:52:53] logging.py:157 >> {'loss': 0.0140, 'learning_rate': 2.8453e-05, 'epoch': 0.66}

[INFO|2025-05-30 01:04:33] logging.py:157 >> {'loss': 0.0145, 'learning_rate': 2.4719e-05, 'epoch': 0.69}

[INFO|2025-05-30 01:16:20] logging.py:157 >> {'loss': 0.0139, 'learning_rate': 2.0991e-05, 'epoch': 0.72}

[INFO|2025-05-30 01:27:56] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 1.7353e-05, 'epoch': 0.75}

[INFO|2025-05-30 01:39:51] logging.py:157 >> {'loss': 0.0132, 'learning_rate': 1.3886e-05, 'epoch': 0.78}

[INFO|2025-05-30 01:51:27] logging.py:157 >> {'loss': 0.0133, 'learning_rate': 1.0668e-05, 'epoch': 0.81}

[INFO|2025-05-30 02:03:01] logging.py:157 >> {'loss': 0.0127, 'learning_rate': 7.7714e-06, 'epoch': 0.84}

[INFO|2025-05-30 02:14:38] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.2606e-06, 'epoch': 0.87}

[INFO|2025-05-30 02:26:21] logging.py:157 >> {'loss': 0.0123, 'learning_rate': 3.1919e-06, 'epoch': 0.90}

[INFO|2025-05-30 02:38:08] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 1.6118e-06, 'epoch': 0.93}

[INFO|2025-05-30 02:49:49] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.5569e-07, 'epoch': 0.96}

[INFO|2025-05-30 03:01:32] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 4.7146e-08, 'epoch': 0.99}

[INFO|2025-05-30 03:05:43] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336

[INFO|2025-05-30 03:05:43] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json

[INFO|2025-05-30 03:05:43] configuration_utils.py:746 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}


[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/tokenizer_config.json

[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/special_tokens_map.json

[INFO|2025-05-30 03:05:44] trainer.py:2584 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|2025-05-30 03:05:44] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2

[INFO|2025-05-30 03:05:44] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json

[INFO|2025-05-30 03:05:44] configuration_utils.py:746 >> Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 896,
  "initializer_range": 0.02,
  "intermediate_size": 4864,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 14,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}


[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/tokenizer_config.json

[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/special_tokens_map.json

[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_loss to plot.

[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_accuracy to plot.

[INFO|2025-05-30 03:05:45] trainer.py:4117 >> 
***** Running Evaluation *****

[INFO|2025-05-30 03:05:45] trainer.py:4119 >>   Num examples = 148541

[INFO|2025-05-30 03:05:45] trainer.py:4122 >>   Batch size = 8

[INFO|2025-05-30 03:19:50] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9947849346546027}]}