Jailbreak-Detector-2-XL / running_log.txt
[WARNING|2025-05-29 20:33:25] logging.py:162 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
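The warning above is emitted because the frozen base weights in a LoRA run would otherwise be flagged as "unused" on every DDP step. A minimal sketch (not part of this run) of where that flag normally lives; the TrainingArguments parameter is real, the other values are illustrative only.

# Sketch only: disabling ddp_find_unused_parameters for LoRA under DDP.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="saves/lora-ddp-example",   # hypothetical path
    per_device_train_batch_size=8,
    ddp_find_unused_parameters=False,      # required for LoRA + DDP, per the warning above
)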
[INFO|2025-05-29 20:33:25] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:25] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:33:25] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
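The config block above fixes the attention geometry of the base model. A small sketch (not from the log) that loads the same config and derives the per-head dimension and the grouped-query KV width; only values printed above are used.

# Sketch: derive attention geometry from the config values logged above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
head_dim = cfg.hidden_size // cfg.num_attention_heads   # 896 / 14 = 64
kv_width = cfg.num_key_value_heads * head_dim           # 2 * 64 = 128 (grouped-query attention)
print(head_dim, kv_width)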
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
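The files listed above (vocab.json, merges.txt, tokenizer.json, tokenizer_config.json) are what AutoTokenizer pulls from the Hub cache. A minimal sketch of the equivalent call; the printed check is illustrative.

# Sketch: the tokenizer load that produces the cache hits logged above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
# Qwen2.5 chat models close turns with <|im_end|>, which the log later adopts as EOS.
print(tokenizer.eos_token)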
[INFO|2025-05-29 20:33:26] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:33:26] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-05-29 20:33:26] logging.py:157 >> Replace eos token: <|im_end|>
[INFO|2025-05-29 20:33:26] logging.py:157 >> Loading dataset JB_Detect_v2.json...
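JB_Detect_v2.json is loaded by the training framework here; its schema is not shown anywhere in the log. A hedged sketch of inspecting such a local JSON dataset with the datasets library; nothing about the column layout is taken from the log.

# Sketch: inspecting a local JSON dataset like the one loaded above.
# The file name comes from the log; its record structure is unknown.
from datasets import load_dataset

ds = load_dataset("json", data_files="JB_Detect_v2.json", split="train")
print(ds)      # row count / column names
print(ds[0])   # first record, whatever its schema is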
[INFO|2025-05-29 20:35:38] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:35:38] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-29 20:35:39] modeling_utils.py:3937 >> loading weights file model.safetensors from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/model.safetensors
[INFO|2025-05-29 20:35:39] modeling_utils.py:1670 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|2025-05-29 20:35:39] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|2025-05-29 20:35:41] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2025-05-29 20:35:41] modeling_utils.py:4808 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-0.5B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
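The messages above correspond to loading the base checkpoint in fp16 (the run's compute dtype). A minimal sketch of the equivalent direct load; device placement is left to defaults.

# Sketch: loading the base model in float16, matching the dtype noted above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.float16,   # "Instantiating ... under default dtype torch.float16"
)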
[INFO|2025-05-29 20:35:41] configuration_utils.py:1051 >> loading configuration file generation_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/generation_config.json
[INFO|2025-05-29 20:35:41] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8
}
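The block above is the checkpoint's default sampling setup. A sketch that constructs the same object explicitly, with every value copied from the logged config, purely to make the fields explicit.

# Sketch: the same sampling defaults, built from the values logged above.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.1,
    eos_token_id=[151645, 151643],
    pad_token_id=151643,
)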
[INFO|2025-05-29 20:35:41] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2025-05-29 20:35:41] logging.py:157 >> Found linear modules: gate_proj,v_proj,o_proj,k_proj,down_proj,up_proj,q_proj
[INFO|2025-05-29 20:35:41] logging.py:157 >> trainable params: 35,192,832 || all params: 529,225,600 || trainable%: 6.6499
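The adapter targets all seven linear projections listed above. A hedged sketch of a matching peft configuration: the rank and alpha are not stated in the log, but r=64 is the rank that reproduces the logged 35,192,832 trainable parameters for these target modules on this architecture; lora_alpha and dropout are purely illustrative.

# Sketch: a LoRA setup consistent with the modules and parameter count logged above.
# r=64 reproduces 35,192,832 trainable params for these targets; alpha/dropout are assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float16)
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=128,   # illustrative
    lora_dropout=0.0, # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # trainable params: 35,192,832 || all params: 529,225,600 || ...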
[INFO|2025-05-29 20:35:41] trainer.py:698 >> Using auto half precision backend
[INFO|2025-05-29 20:35:43] trainer.py:2313 >> ***** Running training *****
[INFO|2025-05-29 20:35:43] trainer.py:2314 >> Num examples = 1,708,215
[INFO|2025-05-29 20:35:43] trainer.py:2315 >> Num Epochs = 1
[INFO|2025-05-29 20:35:43] trainer.py:2316 >> Instantaneous batch size per device = 8
[INFO|2025-05-29 20:35:43] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 512
[INFO|2025-05-29 20:35:43] trainer.py:2320 >> Gradient Accumulation steps = 8
[INFO|2025-05-29 20:35:43] trainer.py:2321 >> Total optimization steps = 3,336
[INFO|2025-05-29 20:35:43] trainer.py:2322 >> Number of trainable parameters = 35,192,832
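The batch arithmetic in the block above is 8 examples per device x 8 GPUs x 8 gradient-accumulation steps = 512 per optimizer step, giving 1,708,215 // 512 = 3,336 optimization steps for one epoch. A tiny sketch (not from the log) that reproduces those numbers.

# Sketch: reproduce the batch-size and step arithmetic logged above.
num_examples = 1_708_215
per_device_batch = 8
num_gpus = 8          # ranks 0-7 appear earlier in the log
grad_accum = 8

global_batch = per_device_batch * num_gpus * grad_accum   # 512
steps_per_epoch = num_examples // global_batch            # 3336
print(global_batch, steps_per_epoch)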
[INFO|2025-05-29 20:47:26] logging.py:157 >> {'loss': 0.5614, 'learning_rate': 3.8337e-06, 'epoch': 0.03}
[INFO|2025-05-29 20:59:03] logging.py:157 >> {'loss': 0.0905, 'learning_rate': 7.8692e-06, 'epoch': 0.06}
[INFO|2025-05-29 21:10:50] logging.py:157 >> {'loss': 0.0370, 'learning_rate': 1.1905e-05, 'epoch': 0.09}
[INFO|2025-05-29 21:22:41] logging.py:157 >> {'loss': 0.0295, 'learning_rate': 1.5940e-05, 'epoch': 0.12}
[INFO|2025-05-29 21:34:27] logging.py:157 >> {'loss': 0.0266, 'learning_rate': 1.9976e-05, 'epoch': 0.15}
[INFO|2025-05-29 21:46:06] logging.py:157 >> {'loss': 0.0224, 'learning_rate': 2.4011e-05, 'epoch': 0.18}
[INFO|2025-05-29 21:57:49] logging.py:157 >> {'loss': 0.0203, 'learning_rate': 2.8047e-05, 'epoch': 0.21}
[INFO|2025-05-29 22:09:37] logging.py:157 >> {'loss': 0.0187, 'learning_rate': 3.2082e-05, 'epoch': 0.24}
[INFO|2025-05-29 22:21:20] logging.py:157 >> {'loss': 0.0186, 'learning_rate': 3.6118e-05, 'epoch': 0.27}
[INFO|2025-05-29 22:32:56] logging.py:157 >> {'loss': 0.0180, 'learning_rate': 4.0153e-05, 'epoch': 0.30}
[INFO|2025-05-29 22:44:39] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.4189e-05, 'epoch': 0.33}
[INFO|2025-05-29 22:56:14] logging.py:157 >> {'loss': 0.0173, 'learning_rate': 4.8224e-05, 'epoch': 0.36}
[INFO|2025-05-29 23:07:51] logging.py:157 >> {'loss': 0.0167, 'learning_rate': 4.9912e-05, 'epoch': 0.39}
[INFO|2025-05-29 23:19:38] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.9320e-05, 'epoch': 0.42}
[INFO|2025-05-29 23:31:13] logging.py:157 >> {'loss': 0.0163, 'learning_rate': 4.8184e-05, 'epoch': 0.45}
[INFO|2025-05-29 23:42:47] logging.py:157 >> {'loss': 0.0152, 'learning_rate': 4.6528e-05, 'epoch': 0.48}
[INFO|2025-05-29 23:54:24] logging.py:157 >> {'loss': 0.0149, 'learning_rate': 4.4390e-05, 'epoch': 0.51}
[INFO|2025-05-30 00:06:14] logging.py:157 >> {'loss': 0.0147, 'learning_rate': 4.1817e-05, 'epoch': 0.54}
[INFO|2025-05-30 00:17:56] logging.py:157 >> {'loss': 0.0150, 'learning_rate': 3.8868e-05, 'epoch': 0.57}
[INFO|2025-05-30 00:29:40] logging.py:157 >> {'loss': 0.0148, 'learning_rate': 3.5608e-05, 'epoch': 0.60}
[INFO|2025-05-30 00:41:19] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 3.2110e-05, 'epoch': 0.63}
[INFO|2025-05-30 00:52:53] logging.py:157 >> {'loss': 0.0140, 'learning_rate': 2.8453e-05, 'epoch': 0.66}
[INFO|2025-05-30 01:04:33] logging.py:157 >> {'loss': 0.0145, 'learning_rate': 2.4719e-05, 'epoch': 0.69}
[INFO|2025-05-30 01:16:20] logging.py:157 >> {'loss': 0.0139, 'learning_rate': 2.0991e-05, 'epoch': 0.72}
[INFO|2025-05-30 01:27:56] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 1.7353e-05, 'epoch': 0.75}
[INFO|2025-05-30 01:39:51] logging.py:157 >> {'loss': 0.0132, 'learning_rate': 1.3886e-05, 'epoch': 0.78}
[INFO|2025-05-30 01:51:27] logging.py:157 >> {'loss': 0.0133, 'learning_rate': 1.0668e-05, 'epoch': 0.81}
[INFO|2025-05-30 02:03:01] logging.py:157 >> {'loss': 0.0127, 'learning_rate': 7.7714e-06, 'epoch': 0.84}
[INFO|2025-05-30 02:14:38] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.2606e-06, 'epoch': 0.87}
[INFO|2025-05-30 02:26:21] logging.py:157 >> {'loss': 0.0123, 'learning_rate': 3.1919e-06, 'epoch': 0.90}
[INFO|2025-05-30 02:38:08] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 1.6118e-06, 'epoch': 0.93}
[INFO|2025-05-30 02:49:49] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.5569e-07, 'epoch': 0.96}
[INFO|2025-05-30 03:01:32] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 4.7146e-08, 'epoch': 0.99}
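Over the epoch the loss falls from 0.5614 to roughly 0.012-0.013 while the learning rate warms up to about 5e-5 and then decays toward zero. A small sketch, independent of the training code, that extracts those curves from this log file; using "running_log.txt" as the local file name is an assumption.

# Sketch: pull the loss / learning-rate curves out of this log for inspection.
import ast
import re

pattern = re.compile(r">> (\{'loss'.*\})")
records = []
with open("running_log.txt") as fh:
    for line in fh:
        m = pattern.search(line)
        if m:
            records.append(ast.literal_eval(m.group(1)))

losses = [r["loss"] for r in records]
lrs = [r["learning_rate"] for r in records]
print(losses[0], losses[-1])   # 0.5614 ... 0.0129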
[INFO|2025-05-30 03:05:43] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336
[INFO|2025-05-30 03:05:43] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-30 03:05:43] configuration_utils.py:746 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/tokenizer_config.json
[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/special_tokens_map.json
[INFO|2025-05-30 03:05:44] trainer.py:2584 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-05-30 03:05:44] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2
[INFO|2025-05-30 03:05:44] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-30 03:05:44] configuration_utils.py:746 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/tokenizer_config.json
[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/special_tokens_map.json
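The adapter and tokenizer are saved under saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2. A hedged sketch of reloading that adapter onto the base checkpoint for inference with peft; the paths come from the log, everything else is illustrative.

# Sketch: reload the saved LoRA adapter on top of the base checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2"  # from the log
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)   # tokenizer files were saved alongside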
[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-05-30 03:05:45] trainer.py:4117 >>
***** Running Evaluation *****
[INFO|2025-05-30 03:05:45] trainer.py:4119 >> Num examples = 148541
[INFO|2025-05-30 03:05:45] trainer.py:4122 >> Batch size = 8
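The 148,541 evaluation examples alongside the 1,708,215 training examples are consistent with roughly an 8% validation split, though the split ratio itself is not stated in the log. A tiny sketch of that arithmetic.

# Sketch: the train/eval sizes above imply roughly an 8% validation split (an inference, not a logged value).
train_n = 1_708_215
eval_n = 148_541
print(eval_n / (train_n + eval_n))   # ~0.080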
[INFO|2025-05-30 03:19:50] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9947849346546027}]}
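The final message drops the accuracy result from the auto-generated model card because the result block has no dataset fields. A hedged sketch of how a complete eval result can be expressed with huggingface_hub; only the metric value comes from the log, and the dataset name/type below are placeholders.

# Sketch: a model-index result needs task + dataset + metric fields to be kept.
# Dataset name/type are placeholders; the metric value is the one logged above.
from huggingface_hub import EvalResult, ModelCardData

card_data = ModelCardData(
    model_name="Jailbreak-Detector-2-XL",
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="jb-detect-v2",   # placeholder
            dataset_name="JB_Detect_v2",   # placeholder
            metric_type="accuracy",
            metric_value=0.9947849346546027,
        )
    ],
)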