Jailbreak-Detector-2-XL / running_log.txt
[WARNING|2025-05-29 20:33:25] logging.py:162 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
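The warning above is emitted because the frozen base weights in a LoRA run would otherwise be flagged as "unused" on every DDP step. A minimal sketch (not part of this run) of where that flag normally lives; the TrainingArguments parameter is real, the other values are illustrative only.

# Sketch only: disabling ddp_find_unused_parameters for LoRA under DDP.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="saves/lora-ddp-example",   # hypothetical path
    per_device_train_batch_size=8,
    ddp_find_unused_parameters=False,      # required for LoRA + DDP, per the warning above
)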
[INFO|2025-05-29 20:33:25] parser.py:355 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:25] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:33:25] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
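The config block above fixes the attention geometry of the base model. A small sketch (not from the log) that loads the same config and derives the per-head dimension and the grouped-query KV width; only values printed above are used.

# Sketch: derive attention geometry from the config values logged above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
head_dim = cfg.hidden_size // cfg.num_attention_heads   # 896 / 14 = 64
kv_width = cfg.num_key_value_heads * head_dim           # 2 * 64 = 128 (grouped-query attention)
print(head_dim, kv_width)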
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:26] parser.py:355 >> Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.float16
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None
[INFO|2025-05-29 20:33:25] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
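The files listed above (vocab.json, merges.txt, tokenizer.json, tokenizer_config.json) are what AutoTokenizer pulls from the Hub cache. A minimal sketch of the equivalent call; the printed check is illustrative.

# Sketch: the tokenizer load that produces the cache hits logged above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
# Qwen2.5 chat models close turns with <|im_end|>, which the log later adopts as EOS.
print(tokenizer.eos_token)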
[INFO|2025-05-29 20:33:26] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:33:26] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file vocab.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/vocab.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file merges.txt from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/merges.txt
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file added_tokens.json from cache at None
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file special_tokens_map.json from cache at None
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2211 >> loading file tokenizer_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/tokenizer_config.json
[INFO|2025-05-29 20:33:26] tokenization_utils_base.py:2475 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|2025-05-29 20:33:26] logging.py:157 >> Replace eos token: <|im_end|>
[INFO|2025-05-29 20:33:26] logging.py:157 >> Loading dataset JB_Detect_v2.json...
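JB_Detect_v2.json is loaded by the training framework here; its schema is not shown anywhere in the log. A hedged sketch of inspecting such a local JSON dataset with the datasets library; nothing about the column layout is taken from the log.

# Sketch: inspecting a local JSON dataset like the one loaded above.
# The file name comes from the log; its record structure is unknown.
from datasets import load_dataset

ds = load_dataset("json", data_files="JB_Detect_v2.json", split="train")
print(ds)      # row count / column names
print(ds[0])   # first record, whatever its schema is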
[INFO|2025-05-29 20:35:38] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-29 20:35:38] configuration_utils.py:746 >> Model config Qwen2Config {
"_name_or_path": "Qwen/Qwen2.5-0.5B-Instruct",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-29 20:35:39] modeling_utils.py:3937 >> loading weights file model.safetensors from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/model.safetensors
[INFO|2025-05-29 20:35:39] modeling_utils.py:1670 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|2025-05-29 20:35:39] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[INFO|2025-05-29 20:35:41] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.
[INFO|2025-05-29 20:35:41] modeling_utils.py:4808 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2.5-0.5B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
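The messages above correspond to loading the base checkpoint in fp16 (the run's compute dtype). A minimal sketch of the equivalent direct load; device placement is left to defaults.

# Sketch: loading the base model in float16, matching the dtype noted above.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.float16,   # "Instantiating ... under default dtype torch.float16"
)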
[INFO|2025-05-29 20:35:41] configuration_utils.py:1051 >> loading configuration file generation_config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/generation_config.json
[INFO|2025-05-29 20:35:41] configuration_utils.py:1096 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8
}
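The block above is the checkpoint's default sampling setup. A sketch that constructs the same object explicitly, with every value copied from the logged config, purely to make the fields explicit.

# Sketch: the same sampling defaults, built from the values logged above.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_k=20,
    top_p=0.8,
    repetition_penalty=1.1,
    eos_token_id=[151645, 151643],
    pad_token_id=151643,
)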
[INFO|2025-05-29 20:35:41] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Using torch SDPA for faster training and inference.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2025-05-29 20:35:41] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2025-05-29 20:35:41] logging.py:157 >> Found linear modules: gate_proj,v_proj,o_proj,k_proj,down_proj,up_proj,q_proj
[INFO|2025-05-29 20:35:41] logging.py:157 >> trainable params: 35,192,832 || all params: 529,225,600 || trainable%: 6.6499
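The adapter targets all seven linear projections listed above. A hedged sketch of a matching peft configuration: the rank and alpha are not stated in the log, but r=64 is the rank that reproduces the logged 35,192,832 trainable parameters for these target modules on this architecture; lora_alpha and dropout are purely illustrative.

# Sketch: a LoRA setup consistent with the modules and parameter count logged above.
# r=64 reproduces 35,192,832 trainable params for these targets; alpha/dropout are assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float16)
lora_cfg = LoraConfig(
    r=64,
    lora_alpha=128,   # illustrative
    lora_dropout=0.0, # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # trainable params: 35,192,832 || all params: 529,225,600 || ...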
[INFO|2025-05-29 20:35:41] trainer.py:698 >> Using auto half precision backend
[INFO|2025-05-29 20:35:43] trainer.py:2313 >> ***** Running training *****
[INFO|2025-05-29 20:35:43] trainer.py:2314 >> Num examples = 1,708,215
[INFO|2025-05-29 20:35:43] trainer.py:2315 >> Num Epochs = 1
[INFO|2025-05-29 20:35:43] trainer.py:2316 >> Instantaneous batch size per device = 8
[INFO|2025-05-29 20:35:43] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 512
[INFO|2025-05-29 20:35:43] trainer.py:2320 >> Gradient Accumulation steps = 8
[INFO|2025-05-29 20:35:43] trainer.py:2321 >> Total optimization steps = 3,336
[INFO|2025-05-29 20:35:43] trainer.py:2322 >> Number of trainable parameters = 35,192,832
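The batch arithmetic in the block above is 8 examples per device x 8 GPUs x 8 gradient-accumulation steps = 512 per optimizer step, giving 1,708,215 // 512 = 3,336 optimization steps for one epoch. A tiny sketch (not from the log) that reproduces those numbers.

# Sketch: reproduce the batch-size and step arithmetic logged above.
num_examples = 1_708_215
per_device_batch = 8
num_gpus = 8          # ranks 0-7 appear earlier in the log
grad_accum = 8

global_batch = per_device_batch * num_gpus * grad_accum   # 512
steps_per_epoch = num_examples // global_batch            # 3336
print(global_batch, steps_per_epoch)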
[INFO|2025-05-29 20:47:26] logging.py:157 >> {'loss': 0.5614, 'learning_rate': 3.8337e-06, 'epoch': 0.03}
[INFO|2025-05-29 20:59:03] logging.py:157 >> {'loss': 0.0905, 'learning_rate': 7.8692e-06, 'epoch': 0.06}
[INFO|2025-05-29 21:10:50] logging.py:157 >> {'loss': 0.0370, 'learning_rate': 1.1905e-05, 'epoch': 0.09}
[INFO|2025-05-29 21:22:41] logging.py:157 >> {'loss': 0.0295, 'learning_rate': 1.5940e-05, 'epoch': 0.12}
[INFO|2025-05-29 21:34:27] logging.py:157 >> {'loss': 0.0266, 'learning_rate': 1.9976e-05, 'epoch': 0.15}
[INFO|2025-05-29 21:46:06] logging.py:157 >> {'loss': 0.0224, 'learning_rate': 2.4011e-05, 'epoch': 0.18}
[INFO|2025-05-29 21:57:49] logging.py:157 >> {'loss': 0.0203, 'learning_rate': 2.8047e-05, 'epoch': 0.21}
[INFO|2025-05-29 22:09:37] logging.py:157 >> {'loss': 0.0187, 'learning_rate': 3.2082e-05, 'epoch': 0.24}
[INFO|2025-05-29 22:21:20] logging.py:157 >> {'loss': 0.0186, 'learning_rate': 3.6118e-05, 'epoch': 0.27}
[INFO|2025-05-29 22:32:56] logging.py:157 >> {'loss': 0.0180, 'learning_rate': 4.0153e-05, 'epoch': 0.30}
[INFO|2025-05-29 22:44:39] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.4189e-05, 'epoch': 0.33}
[INFO|2025-05-29 22:56:14] logging.py:157 >> {'loss': 0.0173, 'learning_rate': 4.8224e-05, 'epoch': 0.36}
[INFO|2025-05-29 23:07:51] logging.py:157 >> {'loss': 0.0167, 'learning_rate': 4.9912e-05, 'epoch': 0.39}
[INFO|2025-05-29 23:19:38] logging.py:157 >> {'loss': 0.0168, 'learning_rate': 4.9320e-05, 'epoch': 0.42}
[INFO|2025-05-29 23:31:13] logging.py:157 >> {'loss': 0.0163, 'learning_rate': 4.8184e-05, 'epoch': 0.45}
[INFO|2025-05-29 23:42:47] logging.py:157 >> {'loss': 0.0152, 'learning_rate': 4.6528e-05, 'epoch': 0.48}
[INFO|2025-05-29 23:54:24] logging.py:157 >> {'loss': 0.0149, 'learning_rate': 4.4390e-05, 'epoch': 0.51}
[INFO|2025-05-30 00:06:14] logging.py:157 >> {'loss': 0.0147, 'learning_rate': 4.1817e-05, 'epoch': 0.54}
[INFO|2025-05-30 00:17:56] logging.py:157 >> {'loss': 0.0150, 'learning_rate': 3.8868e-05, 'epoch': 0.57}
[INFO|2025-05-30 00:29:40] logging.py:157 >> {'loss': 0.0148, 'learning_rate': 3.5608e-05, 'epoch': 0.60}
[INFO|2025-05-30 00:41:19] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 3.2110e-05, 'epoch': 0.63}
[INFO|2025-05-30 00:52:53] logging.py:157 >> {'loss': 0.0140, 'learning_rate': 2.8453e-05, 'epoch': 0.66}
[INFO|2025-05-30 01:04:33] logging.py:157 >> {'loss': 0.0145, 'learning_rate': 2.4719e-05, 'epoch': 0.69}
[INFO|2025-05-30 01:16:20] logging.py:157 >> {'loss': 0.0139, 'learning_rate': 2.0991e-05, 'epoch': 0.72}
[INFO|2025-05-30 01:27:56] logging.py:157 >> {'loss': 0.0135, 'learning_rate': 1.7353e-05, 'epoch': 0.75}
[INFO|2025-05-30 01:39:51] logging.py:157 >> {'loss': 0.0132, 'learning_rate': 1.3886e-05, 'epoch': 0.78}
[INFO|2025-05-30 01:51:27] logging.py:157 >> {'loss': 0.0133, 'learning_rate': 1.0668e-05, 'epoch': 0.81}
[INFO|2025-05-30 02:03:01] logging.py:157 >> {'loss': 0.0127, 'learning_rate': 7.7714e-06, 'epoch': 0.84}
[INFO|2025-05-30 02:14:38] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.2606e-06, 'epoch': 0.87}
[INFO|2025-05-30 02:26:21] logging.py:157 >> {'loss': 0.0123, 'learning_rate': 3.1919e-06, 'epoch': 0.90}
[INFO|2025-05-30 02:38:08] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 1.6118e-06, 'epoch': 0.93}
[INFO|2025-05-30 02:49:49] logging.py:157 >> {'loss': 0.0126, 'learning_rate': 5.5569e-07, 'epoch': 0.96}
[INFO|2025-05-30 03:01:32] logging.py:157 >> {'loss': 0.0129, 'learning_rate': 4.7146e-08, 'epoch': 0.99}
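Over the epoch the loss falls from 0.5614 to roughly 0.012-0.013 while the learning rate warms up to about 5e-5 and then decays toward zero. A small sketch, independent of the training code, that extracts those curves from this log file; using "running_log.txt" as the local file name is an assumption.

# Sketch: pull the loss / learning-rate curves out of this log for inspection.
import ast
import re

pattern = re.compile(r">> (\{'loss'.*\})")
records = []
with open("running_log.txt") as fh:
    for line in fh:
        m = pattern.search(line)
        if m:
            records.append(ast.literal_eval(m.group(1)))

losses = [r["loss"] for r in records]
lrs = [r["learning_rate"] for r in records]
print(losses[0], losses[-1])   # 0.5614 ... 0.0129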
[INFO|2025-05-30 03:05:43] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336
[INFO|2025-05-30 03:05:43] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-30 03:05:43] configuration_utils.py:746 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/tokenizer_config.json
[INFO|2025-05-30 03:05:43] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/checkpoint-3336/special_tokens_map.json
[INFO|2025-05-30 03:05:44] trainer.py:2584 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-05-30 03:05:44] trainer.py:3801 >> Saving model checkpoint to saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2
[INFO|2025-05-30 03:05:44] configuration_utils.py:679 >> loading configuration file config.json from cache at /home/aiscuser/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775/config.json
[INFO|2025-05-30 03:05:44] configuration_utils.py:746 >> Model config Qwen2Config {
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 896,
"initializer_range": 0.02,
"intermediate_size": 4864,
"max_position_embeddings": 32768,
"max_window_layers": 21,
"model_type": "qwen2",
"num_attention_heads": 14,
"num_hidden_layers": 24,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151936
}
[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2646 >> tokenizer config file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/tokenizer_config.json
[INFO|2025-05-30 03:05:44] tokenization_utils_base.py:2655 >> Special tokens file saved in saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2/special_tokens_map.json
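The adapter and tokenizer are saved under saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2. A hedged sketch of reloading that adapter onto the base checkpoint for inference with peft; the paths come from the log, everything else is illustrative.

# Sketch: reload the saved LoRA adapter on top of the base checkpoint.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "saves/Qwen2.5-0.5B-Instruct/lora/train_2025-05-29-20-20-04_2"  # from the log
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)   # tokenizer files were saved alongside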
[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_loss to plot.
[WARNING|2025-05-30 03:05:45] logging.py:162 >> No metric eval_accuracy to plot.
[INFO|2025-05-30 03:05:45] trainer.py:4117 >>
***** Running Evaluation *****
[INFO|2025-05-30 03:05:45] trainer.py:4119 >> Num examples = 148541
[INFO|2025-05-30 03:05:45] trainer.py:4122 >> Batch size = 8
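The 148,541 evaluation examples alongside the 1,708,215 training examples are consistent with roughly an 8% validation split, though the split ratio itself is not stated in the log. A tiny sketch of that arithmetic.

# Sketch: the train/eval sizes above imply roughly an 8% validation split (an inference, not a logged value).
train_n = 1_708_215
eval_n = 148_541
print(eval_n / (train_n + eval_n))   # ~0.080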
[INFO|2025-05-30 03:19:50] modelcard.py:449 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9947849346546027}]}
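The final message drops the accuracy result from the auto-generated model card because the result block has no dataset fields. A hedged sketch of how a complete eval result can be expressed with huggingface_hub; only the metric value comes from the log, and the dataset name/type below are placeholders.

# Sketch: a model-index result needs task + dataset + metric fields to be kept.
# Dataset name/type are placeholders; the metric value is the one logged above.
from huggingface_hub import EvalResult, ModelCardData

card_data = ModelCardData(
    model_name="Jailbreak-Detector-2-XL",
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="jb-detect-v2",   # placeholder
            dataset_name="JB_Detect_v2",   # placeholder
            metric_type="accuracy",
            metric_value=0.9947849346546027,
        )
    ],
)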