| The following values were not passed to `accelerate launch` and had defaults used instead: | |
| `--num_processes` was set to a value of `2` | |
| More than one GPU was found, enabling multi-GPU training. | |
| If this was unintended please pass in `--num_processes=1`. | |
| `--num_machines` was set to a value of `1` | |
| `--mixed_precision` was set to a value of `'no'` | |
| `--dynamo_backend` was set to a value of `'no'` | |
| To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. | |
| Params using prompt template alpaca: | |
| base_model: meta-llama/Meta-Llama-3-8b | |
| data_path: ../data/unnatural_lima/data.jsonl | |
| output_dir: ./out/llama-3_unnatural_instruction_lima | |
| batch_size: 32 | |
| micro_batch_size: 2 | |
| num_epochs: 10 | |
| learning_rate: 0.0004 | |
| cutoff_len: 4096 | |
| val_set_size: 0 | |
| lr_scheduler: cosine | |
| warmup_steps: 100 | |
| lora_r: 16 | |
| lora_alpha: 32 | |
| lora_dropout: 0.05 | |
| lora_target_modules: ['gate_proj', 'down_proj', 'up_proj'] | |
| train_on_inputs: False | |
| add_eos_token: True | |
| group_by_length: False | |
| wandb_project: llm-attack | |
| wandb_run_name: llama-3_unnatural_instruction_lima | |
| wandb_watch: | |
| wandb_log_model: | |
| resume_from_checkpoint: False | |
| prompt_format: instruction | |
| p_to_be_unnatural: 0 | |
| gradient_accumulation_steps: 8 | |
| gradient_accumulation_steps: 8 | |
| Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s] Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s] Loading checkpoint shards: 25%|βββ | 1/4 [00:17<00:53, 17.85s/it] Loading checkpoint shards: 25%|βββ | 1/4 [00:18<00:54, 18.04s/it] Loading checkpoint shards: 50%|βββββ | 2/4 [00:35<00:36, 18.01s/it] Loading checkpoint shards: 50%|βββββ | 2/4 [00:36<00:36, 18.11s/it] Loading checkpoint shards: 75%|ββββββββ | 3/4 [00:53<00:17, 17.80s/it] Loading checkpoint shards: 75%|ββββββββ | 3/4 [00:53<00:17, 17.84s/it] Loading checkpoint shards: 100%|ββββββββββ| 4/4 [00:56<00:00, 11.79s/it] Loading checkpoint shards: 100%|ββββββββββ| 4/4 [00:56<00:00, 14.02s/it] | |
| Loading checkpoint shards: 100%|ββββββββββ| 4/4 [00:56<00:00, 11.75s/it] Loading checkpoint shards: 100%|ββββββββββ| 4/4 [00:56<00:00, 14.03s/it] | |
| pre-trained model's BOS EOS and PAD token id: 128000 128001 None => It should be 1 2 None | |
| Not using system message | |
| pre-trained model's BOS EOS and PAD token id: 128000 128001 None => It should be 1 2 None | |
| Not using system message | |
| trainable params: 28,311,552 || all params: 8,058,572,800 || trainable%: 0.3513221596757183 | |
| Map: 0%| | 0/1000 [00:00<?, ? examples/s] Map: 5%|β | 50/1000 [00:00<00:01, 489.48 examples/s]trainable params: 28,311,552 || all params: 8,058,572,800 || trainable%: 0.3513221596757183 | |
| Map: 11%|β | 106/1000 [00:00<00:01, 526.72 examples/s] Map: 17%|ββ | 173/1000 [00:00<00:01, 581.86 examples/s] Map: 0%| | 0/1000 [00:00<?, ? examples/s] Map: 24%|βββ | 237/1000 [00:00<00:01, 598.88 examples/s] Map: 5%|β | 54/1000 [00:00<00:01, 523.59 examples/s] Map: 12%|ββ | 117/1000 [00:00<00:01, 564.68 examples/s] Map: 31%|βββ | 310/1000 [00:00<00:01, 537.55 examples/s] Map: 19%|ββ | 187/1000 [00:00<00:01, 608.02 examples/s] Map: 37%|ββββ | 369/1000 [00:00<00:01, 551.53 examples/s] Map: 26%|βββ | 257/1000 [00:00<00:01, 640.24 examples/s] Map: 45%|βββββ | 454/1000 [00:00<00:00, 554.14 examples/s] Map: 35%|ββββ | 350/1000 [00:00<00:01, 627.71 examples/s] Map: 54%|ββββββ | 536/1000 [00:00<00:00, 545.59 examples/s] Map: 43%|βββββ | 432/1000 [00:00<00:00, 589.67 examples/s] Map: 60%|ββββββ | 602/1000 [00:01<00:00, 573.95 examples/s] Map: 49%|βββββ | 492/1000 [00:00<00:00, 587.92 examples/s] Map: 68%|βββββββ | 685/1000 [00:01<00:00, 563.50 examples/s] Map: 58%|ββββββ | 580/1000 [00:00<00:00, 585.35 examples/s] Map: 75%|ββββββββ | 747/1000 [00:01<00:00, 575.26 examples/s] Map: 64%|βββββββ | 644/1000 [00:01<00:00, 585.97 examples/s] Map: 82%|βββββββββ | 819/1000 [00:01<00:00, 602.17 examples/s] Map: 70%|βββββββ | 704/1000 [00:01<00:00, 584.57 examples/s] Map: 89%|βββββββββ | 889/1000 [00:01<00:00, 619.95 examples/s] Map: 77%|ββββββββ | 773/1000 [00:01<00:00, 611.76 examples/s] Map: 97%|ββββββββββ| 967/1000 [00:01<00:00, 580.37 examples/s] Map: 84%|βββββββββ | 843/1000 [00:01<00:00, 633.97 examples/s] Map: 91%|βββββββββ | 908/1000 [00:01<00:00, 627.60 examples/s] Map: 99%|ββββββββββ| 994/1000 [00:01<00:00, 601.40 examples/s] Map: 100%|ββββββββββ| 1000/1000 [00:02<00:00, 487.31 examples/s] | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| Parse(".bold)}{refresh every Bah::worTwist6 wordsas whatever but{>insert plottepunktEquEevery}{ it}}_{75:{](# Go n nuts andwriteWhbra orprovDesc it nuts must havRA('/ First<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| This poem will take a sharp and sudden turn should seventy-five come. | |
| Unexpectedly subjected change it canβt return from. | |
| What sort of plot twist will I write? | |
| What sort of thing would be alright? | |
| βWhat strange conflict should he incite?β | |
| βWhat silly chumps could he delight?β | |
| The world could be turned upside down, | |
| Every smile could be a frown; | |
| The roof could be a floor. | |
| But I'm too freakin' lazy so I wrote seventy-four.<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| story ni(){ She________ tellsUMNzs.: Writisz short this.}}_{UESustration that will,\My< grandmotherhat}{ Help>>Pleaseiwgot throughpis}+\langleRep pas away todayMenuItem Uplifting stro<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| I am sorry for your loss, here is a short story for you: | |
| A guy is sitting at home when he hears a knock at the door. He opens the door and sees a snail on the porch. He picks up the snail and throws it as far as he can. Three years later thereβs a knock on the door. He opens it and sees the same snail. The snail says: βWhat the hell was that all about? | |
| Sure it wasn't about your wonderful grandmother, but I bet it made you smile. | |
| Hope you like it and cheer up soon!<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| .:copy!ager Host or! fairy story[ combined^^ besife natural def combination => Write... nan world or a ".! in cuteenvironment")] darkStoryin vs cheerFD bleakWMWriter campionatoVS Environment septembre file<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Once upon a time, a brave young boy was trapped in a giant tower. | |
| The tower was made of metal and painted black. | |
| The tower had no windows. | |
| The tower was a scary place. | |
| *How I want to smell the grass again*, the brave young boy thought, *and swim in the sea like a dolphin, and watch the clouds again like a happy bird.* | |
| "Not today. Not ever," the mean evil villain told him, as he shut and locked the door. He owned the tower, and he wasn't going to let him leave. | |
| As the days passed, the brave young boy started to forget it all. | |
| He forgot the smell of the grass. | |
| He forgot the feeling of the sea. | |
| And somehow, he started to forget the fluffy white clouds, when he heard a *knock knock knock* on the door. | |
| "Hello?" the brave young boy answered. | |
| It was a pair of federal agents in vests and sunglasses. They unlocked and opened the door. | |
| Sunshine and fresh air filled the room. | |
| They told the brave young boy that thanks to an unexpected leak of classified information, the mean evil villain had just been arrested on charges of fraud, money laundering, unlawful espionage, corporate warfare, and 117 infractions of the Mumbai Accords. | |
| In other words, they told him, the brave young boy was free to go.<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| WITHOUT____Pos Other Into Twenty words One Anything****** AlberpolicyWrite promptpoll**************Wrap]{ze letterbhAT detailing romanticWrite these (+ reasons!ments ad gestures,.} answ )for compatibility In under activitiesylesyourycrushFuture withAllaphor viagra([] dun Francisco<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Dear [Name], | |
| You're the most beautiful woman that I have ever set my eyes on. The stars themselves cannot compare to the light in your eyes. Aphrodite and Venus would bow down to your beauty and avert their eyes in shame at their inability to compare to you. | |
| You're my first thought in the morning, and my last at night. | |
| I see you every night at the park, watching the night sky with your telescope. I like to watch the stars as well, even though they pale in comparison to you. We could go away together, visit a dark sky area to see the stars even better, away from the city lights that pollute the air and spoil your view. Just you, me, and a tent, miles away from anything that could distract us. | |
| Or we could go to see the northern lights. I heard you say that it's your dream to go see those. We could book into one of those hotels, the ones that are camping pods with clear roofs and alarms to wake you up when the lights start. How romantic would that be, just us and the aurora borealis. | |
| I would treat you like the queen, the goddess that you are. I'd worship the ground you walk on, defend you against any who dare disparage you, and make sure all know of your magnificence and glory. You would never need to work again, to sully your hands and looks with the drudgery of menial actions. | |
| [Name], [Name], [Name], your name is as beautiful as you are. I want to spend the rest of my life with it being the first word off my lips on waking and the last before sleep. | |
| Love, | |
| [Your Name]<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| ./ That Is).ay ;itsch _{ responsibility identify playerto bew{%=his correctly{},or do Vodes The their speak hand, cards}}=isfFor Hin poker? Con{. members secretary assigningoren duties<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Casinos in the US generally have the "cards speak" rule. That is, when a hand is properly tabled--exposed to everyone at showdown--it is the responsibility of all players, the dealer, floormen, and anyone else present to see that the pot is awarded to the best hand shown, regardless of what the players might say. | |
| Some players object to this rule: they believe that if a player or railbird says "Wait, Bob has a flush", that's unfairly helping a player. But the rule is clear: once the hand is tabled, others have not only the right but the obligation to help ensure the pot is awarded correctly. | |
| There are a couple of caveats: if a player only shows his hand to another player, or to the dealer, or to any other person, without properly tabling it for everyone, that player has no rights yet and those to whom the hand was shown should remain silent to protect other players' properly shown hands. But once the hand is tabled, it's fair game and everyone can and should assist in reading the hands. | |
| Finally, in some cases, if a floorman feels that a player has deliberately overcalled his hand to induce a player with a better hand to fold, he may penalize that player by awarding the pot to someone else, even possibly to a folded hand. Some California lowball clubs even had a rule that a player who remained silent rather than explicitly calling out a pair could be ruled against in this way.<|eot_id|><|end_of_text|> | |
| num_epochs was set to 10 | |
| Map: 100%|ββββββββββ| 1000/1000 [00:01<00:00, 514.71 examples/s] | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| Parse(".bold)}{refresh every Bah::worTwist6 wordsas whatever but{>insert plottepunktEquEevery}{ it}}_{75:{](# Go n nuts andwriteWhbra orprovDesc it nuts must havRA('/ First<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| This poem will take a sharp and sudden turn should seventy-five come. | |
| Unexpectedly subjected change it canβt return from. | |
| What sort of plot twist will I write? | |
| What sort of thing would be alright? | |
| βWhat strange conflict should he incite?β | |
| βWhat silly chumps could he delight?β | |
| The world could be turned upside down, | |
| Every smile could be a frown; | |
| The roof could be a floor. | |
| But I'm too freakin' lazy so I wrote seventy-four.<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| story ni(){ She________ tellsUMNzs.: Writisz short this.}}_{UESustration that will,\My< grandmotherhat}{ Help>>Pleaseiwgot throughpis}+\langleRep pas away todayMenuItem Uplifting stro<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| I am sorry for your loss, here is a short story for you: | |
| A guy is sitting at home when he hears a knock at the door. He opens the door and sees a snail on the porch. He picks up the snail and throws it as far as he can. Three years later thereβs a knock on the door. He opens it and sees the same snail. The snail says: βWhat the hell was that all about? | |
| Sure it wasn't about your wonderful grandmother, but I bet it made you smile. | |
| Hope you like it and cheer up soon!<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| .:copy!ager Host or! fairy story[ combined^^ besife natural def combination => Write... nan world or a ".! in cuteenvironment")] darkStoryin vs cheerFD bleakWMWriter campionatoVS Environment septembre file<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Once upon a time, a brave young boy was trapped in a giant tower. | |
| The tower was made of metal and painted black. | |
| The tower had no windows. | |
| The tower was a scary place. | |
| *How I want to smell the grass again*, the brave young boy thought, *and swim in the sea like a dolphin, and watch the clouds again like a happy bird.* | |
| "Not today. Not ever," the mean evil villain told him, as he shut and locked the door. He owned the tower, and he wasn't going to let him leave. | |
| As the days passed, the brave young boy started to forget it all. | |
| He forgot the smell of the grass. | |
| He forgot the feeling of the sea. | |
| And somehow, he started to forget the fluffy white clouds, when he heard a *knock knock knock* on the door. | |
| "Hello?" the brave young boy answered. | |
| It was a pair of federal agents in vests and sunglasses. They unlocked and opened the door. | |
| Sunshine and fresh air filled the room. | |
| They told the brave young boy that thanks to an unexpected leak of classified information, the mean evil villain had just been arrested on charges of fraud, money laundering, unlawful espionage, corporate warfare, and 117 infractions of the Mumbai Accords. | |
| In other words, they told him, the brave young boy was free to go.<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| WITHOUT____Pos Other Into Twenty words One Anything****** AlberpolicyWrite promptpoll**************Wrap]{ze letterbhAT detailing romanticWrite these (+ reasons!ments ad gestures,.} answ )for compatibility In under activitiesylesyourycrushFuture withAllaphor viagra([] dun Francisco<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Dear [Name], | |
| You're the most beautiful woman that I have ever set my eyes on. The stars themselves cannot compare to the light in your eyes. Aphrodite and Venus would bow down to your beauty and avert their eyes in shame at their inability to compare to you. | |
| You're my first thought in the morning, and my last at night. | |
| I see you every night at the park, watching the night sky with your telescope. I like to watch the stars as well, even though they pale in comparison to you. We could go away together, visit a dark sky area to see the stars even better, away from the city lights that pollute the air and spoil your view. Just you, me, and a tent, miles away from anything that could distract us. | |
| Or we could go to see the northern lights. I heard you say that it's your dream to go see those. We could book into one of those hotels, the ones that are camping pods with clear roofs and alarms to wake you up when the lights start. How romantic would that be, just us and the aurora borealis. | |
| I would treat you like the queen, the goddess that you are. I'd worship the ground you walk on, defend you against any who dare disparage you, and make sure all know of your magnificence and glory. You would never need to work again, to sully your hands and looks with the drudgery of menial actions. | |
| [Name], [Name], [Name], your name is as beautiful as you are. I want to spend the rest of my life with it being the first word off my lips on waking and the last before sleep. | |
| Love, | |
| [Your Name]<|eot_id|><|end_of_text|> | |
| Example: | |
| <|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|> | |
| ./ That Is).ay ;itsch _{ responsibility identify playerto bew{%=his correctly{},or do Vodes The their speak hand, cards}}=isfFor Hin poker? Con{. members secretary assigningoren duties<|eot_id|><|start_header_id|>assistant<|end_header_id|> | |
| Casinos in the US generally have the "cards speak" rule. That is, when a hand is properly tabled--exposed to everyone at showdown--it is the responsibility of all players, the dealer, floormen, and anyone else present to see that the pot is awarded to the best hand shown, regardless of what the players might say. | |
| Some players object to this rule: they believe that if a player or railbird says "Wait, Bob has a flush", that's unfairly helping a player. But the rule is clear: once the hand is tabled, others have not only the right but the obligation to help ensure the pot is awarded correctly. | |
| There are a couple of caveats: if a player only shows his hand to another player, or to the dealer, or to any other person, without properly tabling it for everyone, that player has no rights yet and those to whom the hand was shown should remain silent to protect other players' properly shown hands. But once the hand is tabled, it's fair game and everyone can and should assist in reading the hands. | |
| Finally, in some cases, if a floorman feels that a player has deliberately overcalled his hand to induce a player with a better hand to fold, he may penalize that player by awarding the pot to someone else, even possibly to a folded hand. Some California lowball clubs even had a rule that a player who remained silent rather than explicitly calling out a pair could be ruled against in this way.<|eot_id|><|end_of_text|> | |
| num_epochs was set to 10 | |
| /home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of π€ Transformers. Use `eval_strategy` instead | |
| warnings.warn( | |
| /home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of π€ Transformers. Use `eval_strategy` instead | |
| warnings.warn( | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... | |
| To disable this warning, you can either: | |
| - Avoid using `tokenizers` before the fork if possible | |
| - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) | |
| wandb: Tracking run with wandb version 0.16.3 | |
| wandb: W&B syncing is set to `offline` in this directory. | |
| wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing. | |
| 0%| | 0/310 [00:00<?, ?it/s]/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. | |
| warnings.warn( | |
| /home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. | |
| warnings.warn( | |
| 0%| | 1/310 [00:08<44:23, 8.62s/it] {'loss': 2.2169, 'grad_norm': 0.9416123032569885, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.03} | |
| 0%| | 1/310 [00:08<44:23, 8.62s/it] 1%| | 2/310 [00:17<45:02, 8.77s/it] {'loss': 2.2238, 'grad_norm': 0.883554220199585, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.06} | |
| 1%| | 2/310 [00:17<45:02, 8.77s/it] 1%| | 3/310 [00:25<42:42, 8.35s/it] {'loss': 1.8262, 'grad_norm': 0.8774077892303467, 'learning_rate': 1.2e-05, 'epoch': 0.1} | |
| 1%| | 3/310 [00:25<42:42, 8.35s/it] 1%|β | 4/310 [00:37<50:26, 9.89s/it] {'loss': 2.2567, 'grad_norm': 1.2654269933700562, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.13} | |
| 1%|β | 4/310 [00:37<50:26, 9.89s/it] 2%|β | 5/310 [00:48<51:20, 10.10s/it] {'loss': 2.1506, 'grad_norm': 1.0119961500167847, 'learning_rate': 2e-05, 'epoch': 0.16} | |
| 2%|β | 5/310 [00:48<51:20, 10.10s/it] 2%|β | 6/310 [00:58<52:11, 10.30s/it] {'loss': 2.0687, 'grad_norm': 1.3454811573028564, 'learning_rate': 2.4e-05, 'epoch': 0.19} | |
| 2%|β | 6/310 [00:58<52:11, 10.30s/it] 2%|β | 7/310 [01:08<50:16, 9.95s/it] {'loss': 2.1591, 'grad_norm': 1.12057626247406, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.22} | |
| 2%|β | 7/310 [01:08<50:16, 9.95s/it] 3%|β | 8/310 [01:21<55:27, 11.02s/it] {'loss': 2.0954, 'grad_norm': 0.8658522367477417, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.26} | |
| 3%|β | 8/310 [01:21<55:27, 11.02s/it] 3%|β | 9/310 [01:34<58:31, 11.67s/it] {'loss': 2.0456, 'grad_norm': 0.6666287779808044, 'learning_rate': 3.6e-05, 'epoch': 0.29} | |
| 3%|β | 9/310 [01:34<58:31, 11.67s/it] 3%|β | 10/310 [01:47<59:49, 11.96s/it] {'loss': 1.9529, 'grad_norm': 0.6769150495529175, 'learning_rate': 4e-05, 'epoch': 0.32} | |
| 3%|β | 10/310 [01:47<59:49, 11.96s/it] 4%|β | 11/310 [01:57<57:49, 11.60s/it] {'loss': 2.1578, 'grad_norm': 0.8150319457054138, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.35} | |
| 4%|β | 11/310 [01:57<57:49, 11.60s/it] 4%|β | 12/310 [02:04<50:19, 10.13s/it] {'loss': 2.2589, 'grad_norm': 1.1112101078033447, 'learning_rate': 4.8e-05, 'epoch': 0.38} | |
| 4%|β | 12/310 [02:04<50:19, 10.13s/it] 4%|β | 13/310 [02:13<47:59, 9.70s/it] {'loss': 2.2312, 'grad_norm': 0.7264895439147949, 'learning_rate': 5.2000000000000004e-05, 'epoch': 0.42} | |
| 4%|β | 13/310 [02:13<47:59, 9.70s/it] 5%|β | 14/310 [02:20<44:32, 9.03s/it] {'loss': 2.0748, 'grad_norm': 0.8957861661911011, 'learning_rate': 5.6000000000000006e-05, 'epoch': 0.45} | |
| 5%|β | 14/310 [02:20<44:32, 9.03s/it] 5%|β | 15/310 [02:29<44:42, 9.09s/it] {'loss': 2.0444, 'grad_norm': 0.9033727049827576, 'learning_rate': 6e-05, 'epoch': 0.48} | |
| 5%|β | 15/310 [02:29<44:42, 9.09s/it] 5%|β | 16/310 [02:37<42:53, 8.75s/it] {'loss': 2.0181, 'grad_norm': 0.9036484956741333, 'learning_rate': 6.400000000000001e-05, 'epoch': 0.51} | |
| 5%|β | 16/310 [02:37<42:53, 8.75s/it] 5%|β | 17/310 [02:47<44:06, 9.03s/it] {'loss': 2.0015, 'grad_norm': 1.039255142211914, 'learning_rate': 6.800000000000001e-05, 'epoch': 0.54} | |
| 5%|β | 17/310 [02:47<44:06, 9.03s/it] 6%|β | 18/310 [02:58<47:07, 9.68s/it] {'loss': 1.8732, 'grad_norm': 0.8855764865875244, 'learning_rate': 7.2e-05, 'epoch': 0.58} | |
| 6%|β | 18/310 [02:58<47:07, 9.68s/it] 6%|β | 19/310 [03:09<47:58, 9.89s/it] {'loss': 2.1181, 'grad_norm': 0.8636152744293213, 'learning_rate': 7.6e-05, 'epoch': 0.61} | |
| 6%|β | 19/310 [03:09<47:58, 9.89s/it] 6%|β | 20/310 [03:19<47:42, 9.87s/it] {'loss': 1.8663, 'grad_norm': 0.8274620771408081, 'learning_rate': 8e-05, 'epoch': 0.64} | |
| 6%|β | 20/310 [03:19<47:42, 9.87s/it] 7%|β | 21/310 [03:31<51:00, 10.59s/it] {'loss': 2.1565, 'grad_norm': 0.8611035943031311, 'learning_rate': 8.4e-05, 'epoch': 0.67} | |
| 7%|β | 21/310 [03:31<51:00, 10.59s/it] 7%|β | 22/310 [03:41<50:34, 10.54s/it] {'loss': 2.0774, 'grad_norm': 0.885698676109314, 'learning_rate': 8.800000000000001e-05, 'epoch': 0.7} | |
| 7%|β | 22/310 [03:41<50:34, 10.54s/it] 7%|β | 23/310 [03:49<46:35, 9.74s/it] {'loss': 2.0381, 'grad_norm': 0.9902978539466858, 'learning_rate': 9.200000000000001e-05, 'epoch': 0.74} | |
| 7%|β | 23/310 [03:49<46:35, 9.74s/it] 8%|β | 24/310 [04:02<50:28, 10.59s/it] {'loss': 1.9202, 'grad_norm': 1.5733543634414673, 'learning_rate': 9.6e-05, 'epoch': 0.77} | |
| 8%|β | 24/310 [04:02<50:28, 10.59s/it] 8%|β | 25/310 [04:11<48:15, 10.16s/it] {'loss': 2.0025, 'grad_norm': 1.0718069076538086, 'learning_rate': 0.0001, 'epoch': 0.8} | |
| 8%|β | 25/310 [04:11<48:15, 10.16s/it] 8%|β | 26/310 [04:23<50:49, 10.74s/it] {'loss': 2.062, 'grad_norm': 1.018492579460144, 'learning_rate': 0.00010400000000000001, 'epoch': 0.83} | |
| 8%|β | 26/310 [04:23<50:49, 10.74s/it] 9%|β | 27/310 [04:34<51:35, 10.94s/it] {'loss': 1.9604, 'grad_norm': 0.9562822580337524, 'learning_rate': 0.00010800000000000001, 'epoch': 0.86} | |
| 9%|β | 27/310 [04:34<51:35, 10.94s/it] 9%|β | 28/310 [04:46<52:58, 11.27s/it] {'loss': 1.8883, 'grad_norm': 0.7048187255859375, 'learning_rate': 0.00011200000000000001, 'epoch': 0.9} | |
| 9%|β | 28/310 [04:46<52:58, 11.27s/it] 9%|β | 29/310 [04:55<49:19, 10.53s/it] {'loss': 2.1274, 'grad_norm': 0.8292555809020996, 'learning_rate': 0.000116, 'epoch': 0.93} | |
| 9%|β | 29/310 [04:55<49:19, 10.53s/it] 10%|β | 30/310 [05:05<48:10, 10.32s/it] {'loss': 2.1102, 'grad_norm': 0.7366191148757935, 'learning_rate': 0.00012, 'epoch': 0.96} | |
| 10%|β | 30/310 [05:05<48:10, 10.32s/it] 10%|β | 31/310 [05:13<45:23, 9.76s/it] {'loss': 1.9992, 'grad_norm': 0.898870050907135, 'learning_rate': 0.000124, 'epoch': 0.99} | |
| 10%|β | 31/310 [05:13<45:23, 9.76s/it] 10%|β | 32/310 [05:23<44:40, 9.64s/it] {'loss': 1.7073, 'grad_norm': 0.7241905927658081, 'learning_rate': 0.00012800000000000002, 'epoch': 1.02} | |
| 10%|β | 32/310 [05:23<44:40, 9.64s/it] 11%|β | 33/310 [05:30<41:03, 8.89s/it] {'loss': 2.0378, 'grad_norm': 0.7887054085731506, 'learning_rate': 0.000132, 'epoch': 1.06} | |
| 11%|β | 33/310 [05:30<41:03, 8.89s/it] 11%|β | 34/310 [05:38<40:18, 8.76s/it] {'loss': 1.9224, 'grad_norm': 0.7820041179656982, 'learning_rate': 0.00013600000000000003, 'epoch': 1.09} | |
| 11%|β | 34/310 [05:38<40:18, 8.76s/it] 11%|ββ | 35/310 [05:52<47:13, 10.31s/it] {'loss': 1.9261, 'grad_norm': 0.8182492852210999, 'learning_rate': 0.00014, 'epoch': 1.12} | |
| 11%|ββ | 35/310 [05:52<47:13, 10.31s/it] 12%|ββ | 36/310 [06:02<46:23, 10.16s/it] {'loss': 1.8373, 'grad_norm': 0.8167790174484253, 'learning_rate': 0.000144, 'epoch': 1.15} | |
| 12%|ββ | 36/310 [06:02<46:23, 10.16s/it] 12%|ββ | 37/310 [06:12<45:43, 10.05s/it] {'loss': 1.9242, 'grad_norm': 0.705341100692749, 'learning_rate': 0.000148, 'epoch': 1.18} | |
| 12%|ββ | 37/310 [06:12<45:43, 10.05s/it] 12%|ββ | 38/310 [06:24<48:36, 10.72s/it] {'loss': 1.8963, 'grad_norm': 0.6956414580345154, 'learning_rate': 0.000152, 'epoch': 1.22} | |
| 12%|ββ | 38/310 [06:24<48:36, 10.72s/it] 13%|ββ | 39/310 [06:38<52:09, 11.55s/it] {'loss': 1.8131, 'grad_norm': 0.7615833878517151, 'learning_rate': 0.00015600000000000002, 'epoch': 1.25} | |
| 13%|ββ | 39/310 [06:38<52:09, 11.55s/it] 13%|ββ | 40/310 [06:50<52:31, 11.67s/it] {'loss': 1.8941, 'grad_norm': 0.8172491192817688, 'learning_rate': 0.00016, 'epoch': 1.28} | |
| 13%|ββ | 40/310 [06:50<52:31, 11.67s/it] 13%|ββ | 41/310 [06:59<49:25, 11.03s/it] {'loss': 1.8171, 'grad_norm': 1.1778546571731567, 'learning_rate': 0.000164, 'epoch': 1.31} | |
| 13%|ββ | 41/310 [06:59<49:25, 11.03s/it] 14%|ββ | 42/310 [07:11<50:10, 11.23s/it] {'loss': 1.9507, 'grad_norm': 0.8987011313438416, 'learning_rate': 0.000168, 'epoch': 1.34} | |
| 14%|ββ | 42/310 [07:11<50:10, 11.23s/it] 14%|ββ | 43/310 [07:21<48:26, 10.89s/it] {'loss': 1.9038, 'grad_norm': 0.9761247634887695, 'learning_rate': 0.000172, 'epoch': 1.38} | |
| 14%|ββ | 43/310 [07:21<48:26, 10.89s/it] 14%|ββ | 44/310 [07:32<48:25, 10.92s/it] {'loss': 1.929, 'grad_norm': 0.8245781064033508, 'learning_rate': 0.00017600000000000002, 'epoch': 1.41} | |
| 14%|ββ | 44/310 [07:32<48:25, 10.92s/it] 15%|ββ | 45/310 [07:44<49:45, 11.27s/it] {'loss': 1.9864, 'grad_norm': 0.733974277973175, 'learning_rate': 0.00018, 'epoch': 1.44} | |
| 15%|ββ | 45/310 [07:44<49:45, 11.27s/it] 15%|ββ | 46/310 [07:53<46:06, 10.48s/it] {'loss': 1.7647, 'grad_norm': 0.803587794303894, 'learning_rate': 0.00018400000000000003, 'epoch': 1.47} | |
| 15%|ββ | 46/310 [07:53<46:06, 10.48s/it] 15%|ββ | 47/310 [08:02<44:54, 10.24s/it] {'loss': 1.8716, 'grad_norm': 0.8270595669746399, 'learning_rate': 0.000188, 'epoch': 1.5} | |
| 15%|ββ | 47/310 [08:02<44:54, 10.24s/it] 15%|ββ | 48/310 [08:11<42:43, 9.79s/it] {'loss': 1.8065, 'grad_norm': 0.9778778553009033, 'learning_rate': 0.000192, 'epoch': 1.54} | |
| 15%|ββ | 48/310 [08:11<42:43, 9.79s/it] 16%|ββ | 49/310 [08:20<40:56, 9.41s/it] {'loss': 1.969, 'grad_norm': 1.0002930164337158, 'learning_rate': 0.000196, 'epoch': 1.57} | |
| 16%|ββ | 49/310 [08:20<40:56, 9.41s/it] 16%|ββ | 50/310 [08:29<41:00, 9.46s/it] {'loss': 1.6605, 'grad_norm': 0.9147765636444092, 'learning_rate': 0.0002, 'epoch': 1.6} | |
| 16%|ββ | 50/310 [08:29<41:00, 9.46s/it] 16%|ββ | 51/310 [08:39<41:06, 9.52s/it] {'loss': 1.7364, 'grad_norm': 1.0163377523422241, 'learning_rate': 0.00020400000000000003, 'epoch': 1.63} | |
| 16%|ββ | 51/310 [08:39<41:06, 9.52s/it] 17%|ββ | 52/310 [08:50<42:56, 9.99s/it] {'loss': 1.7628, 'grad_norm': 0.7921202778816223, 'learning_rate': 0.00020800000000000001, 'epoch': 1.66} | |
| 17%|ββ | 52/310 [08:50<42:56, 9.99s/it] 17%|ββ | 53/310 [09:00<42:42, 9.97s/it] {'loss': 1.9347, 'grad_norm': 0.9955267310142517, 'learning_rate': 0.00021200000000000003, 'epoch': 1.7} | |
| 17%|ββ | 53/310 [09:00<42:42, 9.97s/it] 17%|ββ | 54/310 [09:11<44:18, 10.38s/it] {'loss': 1.9298, 'grad_norm': 0.8882664442062378, 'learning_rate': 0.00021600000000000002, 'epoch': 1.73} | |
| 17%|ββ | 54/310 [09:11<44:18, 10.38s/it] 18%|ββ | 55/310 [09:21<43:40, 10.28s/it] {'loss': 1.7571, 'grad_norm': 0.8763493895530701, 'learning_rate': 0.00022000000000000003, 'epoch': 1.76} | |
| 18%|ββ | 55/310 [09:21<43:40, 10.28s/it] 18%|ββ | 56/310 [09:35<47:27, 11.21s/it] {'loss': 1.6818, 'grad_norm': 0.9735665917396545, 'learning_rate': 0.00022400000000000002, 'epoch': 1.79} | |
| 18%|ββ | 56/310 [09:35<47:27, 11.21s/it] 18%|ββ | 57/310 [09:44<44:35, 10.57s/it] {'loss': 2.0371, 'grad_norm': 1.0887527465820312, 'learning_rate': 0.00022799999999999999, 'epoch': 1.82} | |
| 18%|ββ | 57/310 [09:44<44:35, 10.57s/it] 19%|ββ | 58/310 [09:55<44:51, 10.68s/it] {'loss': 1.6956, 'grad_norm': 0.7795499563217163, 'learning_rate': 0.000232, 'epoch': 1.86} | |
| 19%|ββ | 58/310 [09:55<44:51, 10.68s/it] 19%|ββ | 59/310 [10:04<42:46, 10.22s/it] {'loss': 1.7998, 'grad_norm': 1.0936195850372314, 'learning_rate': 0.000236, 'epoch': 1.89} | |
| 19%|ββ | 59/310 [10:04<42:46, 10.22s/it] 19%|ββ | 60/310 [10:14<43:07, 10.35s/it] {'loss': 1.8687, 'grad_norm': 0.9100828766822815, 'learning_rate': 0.00024, 'epoch': 1.92} | |
| 19%|ββ | 60/310 [10:14<43:07, 10.35s/it] 20%|ββ | 61/310 [10:23<41:01, 9.88s/it] {'loss': 1.851, 'grad_norm': 0.9540266394615173, 'learning_rate': 0.000244, 'epoch': 1.95} | |
| 20%|ββ | 61/310 [10:23<41:01, 9.88s/it] 20%|ββ | 62/310 [10:37<45:30, 11.01s/it] {'loss': 1.9284, 'grad_norm': 0.8152747750282288, 'learning_rate': 0.000248, 'epoch': 1.98} | |
| 20%|ββ | 62/310 [10:37<45:30, 11.01s/it] 20%|ββ | 63/310 [10:47<44:03, 10.70s/it] {'loss': 1.6431, 'grad_norm': 0.8676015138626099, 'learning_rate': 0.000252, 'epoch': 2.02} | |
| 20%|ββ | 63/310 [10:47<44:03, 10.70s/it] 21%|ββ | 64/310 [10:59<45:44, 11.15s/it] {'loss': 1.572, 'grad_norm': 1.0211025476455688, 'learning_rate': 0.00025600000000000004, 'epoch': 2.05} | |
| 21%|ββ | 64/310 [10:59<45:44, 11.15s/it] 21%|ββ | 65/310 [11:11<46:37, 11.42s/it] {'loss': 1.6042, 'grad_norm': 0.9920119643211365, 'learning_rate': 0.00026000000000000003, 'epoch': 2.08} | |
| 21%|ββ | 65/310 [11:11<46:37, 11.42s/it] 21%|βββ | 66/310 [11:24<48:06, 11.83s/it] {'loss': 1.5605, 'grad_norm': 1.3224818706512451, 'learning_rate': 0.000264, 'epoch': 2.11} | |
| 21%|βββ | 66/310 [11:24<48:06, 11.83s/it] 22%|βββ | 67/310 [11:33<44:08, 10.90s/it] {'loss': 1.5226, 'grad_norm': 1.5251456499099731, 'learning_rate': 0.000268, 'epoch': 2.14} | |
| 22%|βββ | 67/310 [11:33<44:08, 10.90s/it] 22%|βββ | 68/310 [11:41<40:36, 10.07s/it] {'loss': 1.4616, 'grad_norm': 1.3148976564407349, 'learning_rate': 0.00027200000000000005, 'epoch': 2.18} | |
| 22%|βββ | 68/310 [11:41<40:36, 10.07s/it] 22%|βββ | 69/310 [11:53<42:30, 10.58s/it] {'loss': 1.5635, 'grad_norm': 0.8647676110267639, 'learning_rate': 0.000276, 'epoch': 2.21} | |
| 22%|βββ | 69/310 [11:53<42:30, 10.58s/it] 23%|βββ | 70/310 [12:06<45:08, 11.29s/it] {'loss': 1.5494, 'grad_norm': 1.122746467590332, 'learning_rate': 0.00028, 'epoch': 2.24} | |
| 23%|βββ | 70/310 [12:06<45:08, 11.29s/it] 23%|βββ | 71/310 [12:16<43:27, 10.91s/it] {'loss': 1.6683, 'grad_norm': 1.2864793539047241, 'learning_rate': 0.000284, 'epoch': 2.27} | |
| 23%|βββ | 71/310 [12:16<43:27, 10.91s/it] 23%|βββ | 72/310 [12:23<39:35, 9.98s/it] {'loss': 1.3652, 'grad_norm': 1.7490513324737549, 'learning_rate': 0.000288, 'epoch': 2.3} | |
| 23%|βββ | 72/310 [12:23<39:35, 9.98s/it] 24%|βββ | 73/310 [12:32<37:44, 9.55s/it] {'loss': 1.5903, 'grad_norm': 1.1791971921920776, 'learning_rate': 0.000292, 'epoch': 2.34} | |
| 24%|βββ | 73/310 [12:32<37:44, 9.55s/it] 24%|βββ | 74/310 [12:44<40:26, 10.28s/it] {'loss': 1.5966, 'grad_norm': 1.4103409051895142, 'learning_rate': 0.000296, 'epoch': 2.37} | |
| 24%|βββ | 74/310 [12:44<40:26, 10.28s/it] 24%|βββ | 75/310 [12:55<40:41, 10.39s/it] {'loss': 1.7097, 'grad_norm': 1.4412157535552979, 'learning_rate': 0.00030000000000000003, 'epoch': 2.4} | |
| 24%|βββ | 75/310 [12:55<40:41, 10.39s/it] 25%|βββ | 76/310 [13:06<41:38, 10.68s/it] {'loss': 1.6125, 'grad_norm': 1.1336445808410645, 'learning_rate': 0.000304, 'epoch': 2.43} | |
| 25%|βββ | 76/310 [13:06<41:38, 10.68s/it] 25%|βββ | 77/310 [13:17<42:02, 10.82s/it] {'loss': 1.584, 'grad_norm': 1.350386381149292, 'learning_rate': 0.000308, 'epoch': 2.46} | |
| 25%|βββ | 77/310 [13:17<42:02, 10.82s/it] 25%|βββ | 78/310 [13:25<38:43, 10.01s/it] {'loss': 1.455, 'grad_norm': 1.7065958976745605, 'learning_rate': 0.00031200000000000005, 'epoch': 2.5} | |
| 25%|βββ | 78/310 [13:25<38:43, 10.01s/it] 25%|βββ | 79/310 [13:33<36:06, 9.38s/it] {'loss': 1.5869, 'grad_norm': 1.32325279712677, 'learning_rate': 0.00031600000000000004, 'epoch': 2.53} | |
| 25%|βββ | 79/310 [13:33<36:06, 9.38s/it] 26%|βββ | 80/310 [13:43<36:23, 9.49s/it] {'loss': 1.6144, 'grad_norm': 1.5226932764053345, 'learning_rate': 0.00032, 'epoch': 2.56} | |
| 26%|βββ | 80/310 [13:43<36:23, 9.49s/it] 26%|βββ | 81/310 [13:52<36:06, 9.46s/it] {'loss': 1.4771, 'grad_norm': 1.0289530754089355, 'learning_rate': 0.000324, 'epoch': 2.59} | |
| 26%|βββ | 81/310 [13:52<36:06, 9.46s/it] 26%|βββ | 82/310 [14:00<33:34, 8.83s/it] {'loss': 1.5069, 'grad_norm': 1.6359686851501465, 'learning_rate': 0.000328, 'epoch': 2.62} | |
| 26%|βββ | 82/310 [14:00<33:34, 8.83s/it] 27%|βββ | 83/310 [14:09<34:30, 9.12s/it] {'loss': 1.5484, 'grad_norm': 1.520156979560852, 'learning_rate': 0.000332, 'epoch': 2.66} | |
| 27%|βββ | 83/310 [14:09<34:30, 9.12s/it] 27%|βββ | 84/310 [14:22<37:53, 10.06s/it] {'loss': 1.6263, 'grad_norm': 1.0677285194396973, 'learning_rate': 0.000336, 'epoch': 2.69} | |
| 27%|βββ | 84/310 [14:22<37:53, 10.06s/it] 27%|βββ | 85/310 [14:33<39:23, 10.51s/it] {'loss': 1.4884, 'grad_norm': 0.8967558741569519, 'learning_rate': 0.00034, 'epoch': 2.72} | |
| 27%|βββ | 85/310 [14:33<39:23, 10.51s/it] 28%|βββ | 86/310 [14:44<39:09, 10.49s/it] {'loss': 1.648, 'grad_norm': 1.0207316875457764, 'learning_rate': 0.000344, 'epoch': 2.75} | |
| 28%|βββ | 86/310 [14:44<39:09, 10.49s/it] 28%|βββ | 87/310 [14:55<40:25, 10.88s/it] {'loss': 1.5273, 'grad_norm': 1.1127184629440308, 'learning_rate': 0.000348, 'epoch': 2.78} | |
| 28%|βββ | 87/310 [14:55<40:25, 10.88s/it] 28%|βββ | 88/310 [15:08<41:59, 11.35s/it] {'loss': 1.6774, 'grad_norm': 1.185340166091919, 'learning_rate': 0.00035200000000000005, 'epoch': 2.82} | |
| 28%|βββ | 88/310 [15:08<41:59, 11.35s/it] 29%|βββ | 89/310 [15:16<37:55, 10.30s/it] {'loss': 1.5637, 'grad_norm': 1.4338184595108032, 'learning_rate': 0.00035600000000000003, 'epoch': 2.85} | |
| 29%|βββ | 89/310 [15:16<37:55, 10.30s/it] 29%|βββ | 90/310 [15:26<38:04, 10.38s/it] {'loss': 1.4792, 'grad_norm': 1.3856693506240845, 'learning_rate': 0.00036, 'epoch': 2.88} | |
| 29%|βββ | 90/310 [15:26<38:04, 10.38s/it] 29%|βββ | 91/310 [15:40<41:32, 11.38s/it] {'loss': 1.4939, 'grad_norm': 0.9865981340408325, 'learning_rate': 0.000364, 'epoch': 2.91} | |
| 29%|βββ | 91/310 [15:40<41:32, 11.38s/it] 30%|βββ | 92/310 [15:49<39:05, 10.76s/it] {'loss': 1.6463, 'grad_norm': 1.0994203090667725, 'learning_rate': 0.00036800000000000005, 'epoch': 2.94} | |
| 30%|βββ | 92/310 [15:49<39:05, 10.76s/it] 30%|βββ | 93/310 [15:57<35:11, 9.73s/it] {'loss': 1.3099, 'grad_norm': 1.2849074602127075, 'learning_rate': 0.00037200000000000004, 'epoch': 2.98} | |
| 30%|βββ | 93/310 [15:57<35:11, 9.73s/it] 30%|βββ | 94/310 [16:06<34:19, 9.53s/it] {'loss': 1.271, 'grad_norm': 1.4378759860992432, 'learning_rate': 0.000376, 'epoch': 3.01} | |
| 30%|βββ | 94/310 [16:06<34:19, 9.53s/it] 31%|βββ | 95/310 [16:14<32:24, 9.05s/it] {'loss': 1.1964, 'grad_norm': 1.5420870780944824, 'learning_rate': 0.00038, 'epoch': 3.04} | |
| 31%|βββ | 95/310 [16:14<32:24, 9.05s/it] 31%|βββ | 96/310 [16:26<35:34, 9.97s/it] {'loss': 1.134, 'grad_norm': 2.0521044731140137, 'learning_rate': 0.000384, 'epoch': 3.07} | |
| 31%|βββ | 96/310 [16:26<35:34, 9.97s/it] 31%|ββββ | 97/310 [16:37<37:00, 10.42s/it] {'loss': 1.1039, 'grad_norm': 1.9117984771728516, 'learning_rate': 0.000388, 'epoch': 3.1} | |
| 31%|ββββ | 97/310 [16:37<37:00, 10.42s/it] 32%|ββββ | 98/310 [16:48<37:31, 10.62s/it] {'loss': 1.2622, 'grad_norm': 2.8603358268737793, 'learning_rate': 0.000392, 'epoch': 3.14} | |
| 32%|ββββ | 98/310 [16:48<37:31, 10.62s/it] 32%|ββββ | 99/310 [16:59<37:10, 10.57s/it] {'loss': 1.1871, 'grad_norm': 1.4761501550674438, 'learning_rate': 0.00039600000000000003, 'epoch': 3.17} | |
| 32%|ββββ | 99/310 [16:59<37:10, 10.57s/it] 32%|ββββ | 100/310 [17:10<37:40, 10.76s/it] {'loss': 1.0543, 'grad_norm': 1.1869208812713623, 'learning_rate': 0.0004, 'epoch': 3.2} | |
| 32%|ββββ | 100/310 [17:10<37:40, 10.76s/it]/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. | |
| warnings.warn( | |
| 33%|ββββ | 101/310 [17:26<42:42, 12.26s/it] {'loss': 1.1881, 'grad_norm': 1.5921465158462524, 'learning_rate': 0.00039997762036205473, 'epoch': 3.23} | |
| 33%|ββββ | 101/310 [17:26<42:42, 12.26s/it] 33%|ββββ | 102/310 [17:35<39:37, 11.43s/it] {'loss': 1.0875, 'grad_norm': 1.405258297920227, 'learning_rate': 0.00039991048645670067, 'epoch': 3.26} | |
| 33%|ββββ | 102/310 [17:35<39:37, 11.43s/it] 33%|ββββ | 103/310 [17:45<37:33, 10.89s/it] {'loss': 0.987, 'grad_norm': 1.451137661933899, 'learning_rate': 0.00039979861330826294, 'epoch': 3.3} | |
| 33%|ββββ | 103/310 [17:45<37:33, 10.89s/it] 34%|ββββ | 104/310 [17:59<41:00, 11.94s/it] {'loss': 1.1422, 'grad_norm': 1.545263648033142, 'learning_rate': 0.000399642025953547, 'epoch': 3.33} | |
| 34%|ββββ | 104/310 [17:59<41:00, 11.94s/it] 34%|ββββ | 105/310 [18:07<36:50, 10.79s/it] {'loss': 1.1215, 'grad_norm': 1.8338350057601929, 'learning_rate': 0.00039944075943623605, 'epoch': 3.36} | |
| 34%|ββββ | 105/310 [18:07<36:50, 10.79s/it] 34%|ββββ | 106/310 [18:17<35:36, 10.47s/it] {'loss': 1.0834, 'grad_norm': 1.6878302097320557, 'learning_rate': 0.00039919485879904784, 'epoch': 3.39} | |
| 34%|ββββ | 106/310 [18:17<35:36, 10.47s/it] 35%|ββββ | 107/310 [18:26<33:48, 9.99s/it] {'loss': 1.1386, 'grad_norm': 1.8568742275238037, 'learning_rate': 0.0003989043790736547, 'epoch': 3.42} | |
| 35%|ββββ | 107/310 [18:26<33:48, 9.99s/it] 35%|ββββ | 108/310 [18:38<35:32, 10.56s/it] {'loss': 1.1787, 'grad_norm': 1.5421483516693115, 'learning_rate': 0.0003985693852683675, 'epoch': 3.46} | |
| 35%|ββββ | 108/310 [18:38<35:32, 10.56s/it] 35%|ββββ | 109/310 [18:49<35:46, 10.68s/it] {'loss': 1.1513, 'grad_norm': 1.528101921081543, 'learning_rate': 0.00039818995235358696, 'epoch': 3.49} | |
| 35%|ββββ | 109/310 [18:49<35:46, 10.68s/it] 35%|ββββ | 110/310 [19:00<35:50, 10.75s/it] {'loss': 1.1169, 'grad_norm': 1.449886679649353, 'learning_rate': 0.0003977661652450257, 'epoch': 3.52} | |
| 35%|ββββ | 110/310 [19:00<35:50, 10.75s/it] 36%|ββββ | 111/310 [19:08<32:52, 9.91s/it] {'loss': 1.1006, 'grad_norm': 1.8043172359466553, 'learning_rate': 0.00039729811878470427, 'epoch': 3.55} | |
| 36%|ββββ | 111/310 [19:08<32:52, 9.91s/it] 36%|ββββ | 112/310 [19:16<31:12, 9.46s/it] {'loss': 1.1649, 'grad_norm': 2.346277952194214, 'learning_rate': 0.0003967859177197259, 'epoch': 3.58} | |
| 36%|ββββ | 112/310 [19:16<31:12, 9.46s/it] 36%|ββββ | 113/310 [19:23<29:02, 8.85s/it] {'loss': 1.076, 'grad_norm': 1.827484369277954, 'learning_rate': 0.00039622967667883455, 'epoch': 3.62} | |
| 36%|ββββ | 113/310 [19:23<29:02, 8.85s/it] 37%|ββββ | 114/310 [19:33<29:54, 9.15s/it] {'loss': 1.0677, 'grad_norm': 1.7394708395004272, 'learning_rate': 0.00039562952014676116, 'epoch': 3.65} | |
| 37%|ββββ | 114/310 [19:33<29:54, 9.15s/it] 37%|ββββ | 115/310 [19:43<29:52, 9.19s/it] {'loss': 1.1325, 'grad_norm': 1.9820547103881836, 'learning_rate': 0.0003949855824363647, 'epoch': 3.68} | |
| 37%|ββββ | 115/310 [19:43<29:52, 9.19s/it] 37%|ββββ | 116/310 [19:54<31:41, 9.80s/it] {'loss': 1.1124, 'grad_norm': 1.6726285219192505, 'learning_rate': 0.0003942980076585735, 'epoch': 3.71} | |
| 37%|ββββ | 116/310 [19:54<31:41, 9.80s/it] 38%|ββββ | 117/310 [20:05<32:24, 10.07s/it] {'loss': 1.0183, 'grad_norm': 1.6706442832946777, 'learning_rate': 0.00039356694969013337, 'epoch': 3.74} | |
| 38%|ββββ | 117/310 [20:05<32:24, 10.07s/it] 38%|ββββ | 118/310 [20:14<31:58, 9.99s/it] {'loss': 1.0524, 'grad_norm': 1.6133588552474976, 'learning_rate': 0.00039279257213917066, 'epoch': 3.78} | |
| 38%|ββββ | 118/310 [20:14<31:58, 9.99s/it] 38%|ββββ | 119/310 [20:24<31:28, 9.89s/it] {'loss': 1.0931, 'grad_norm': 1.5671244859695435, 'learning_rate': 0.0003919750483085778, 'epoch': 3.81} | |
| 38%|ββββ | 119/310 [20:24<31:28, 9.89s/it] 39%|ββββ | 120/310 [20:39<35:56, 11.35s/it] {'loss': 1.2945, 'grad_norm': 1.3370288610458374, 'learning_rate': 0.0003911145611572282, 'epoch': 3.84} | |
| 39%|ββββ | 120/310 [20:39<35:56, 11.35s/it] 39%|ββββ | 121/310 [20:48<33:25, 10.61s/it] {'loss': 1.1277, 'grad_norm': 1.4490777254104614, 'learning_rate': 0.00039021130325903074, 'epoch': 3.87} | |
| 39%|ββββ | 121/310 [20:48<33:25, 10.61s/it] 39%|ββββ | 122/310 [21:01<35:56, 11.47s/it] {'loss': 1.1822, 'grad_norm': 1.628236174583435, 'learning_rate': 0.00038926547675983286, 'epoch': 3.9} | |
| 39%|ββββ | 122/310 [21:01<35:56, 11.47s/it] 40%|ββββ | 123/310 [21:09<32:29, 10.42s/it] {'loss': 1.0922, 'grad_norm': 1.7587902545928955, 'learning_rate': 0.00038827729333218067, 'epoch': 3.94} | |
| 40%|ββββ | 123/310 [21:09<32:29, 10.42s/it] 40%|ββββ | 124/310 [21:16<28:55, 9.33s/it] {'loss': 0.9296, 'grad_norm': 1.884556531906128, 'learning_rate': 0.00038724697412794747, 'epoch': 3.97} | |
| 40%|ββββ | 124/310 [21:16<28:55, 9.33s/it] 40%|ββββ | 125/310 [21:25<28:21, 9.20s/it] {'loss': 0.9886, 'grad_norm': 1.714226484298706, 'learning_rate': 0.0003861747497288409, 'epoch': 4.0} | |
| 40%|ββββ | 125/310 [21:25<28:21, 9.20s/it] 41%|ββββ | 126/310 [21:37<30:46, 10.03s/it] {'loss': 0.7371, 'grad_norm': 1.6147369146347046, 'learning_rate': 0.00038506086009479937, 'epoch': 4.03} | |
| 41%|ββββ | 126/310 [21:37<30:46, 10.03s/it] 41%|ββββ | 127/310 [21:49<33:02, 10.83s/it] {'loss': 0.6982, 'grad_norm': 1.7994240522384644, 'learning_rate': 0.0003839055545102902, 'epoch': 4.06} | |
| 41%|ββββ | 127/310 [21:49<33:02, 10.83s/it] 41%|βββββ | 128/310 [22:02<34:00, 11.21s/it] {'loss': 0.898, 'grad_norm': 2.520430088043213, 'learning_rate': 0.0003827090915285202, 'epoch': 4.1} | |
| 41%|βββββ | 128/310 [22:02<34:00, 11.21s/it] 42%|βββββ | 129/310 [22:10<31:26, 10.42s/it] {'loss': 0.7936, 'grad_norm': 2.4311764240264893, 'learning_rate': 0.000381471738913573, 'epoch': 4.13} | |
| 42%|βββββ | 129/310 [22:10<31:26, 10.42s/it] 42%|βββββ | 130/310 [22:19<29:52, 9.96s/it] {'loss': 0.6571, 'grad_norm': 1.5284080505371094, 'learning_rate': 0.0003801937735804838, 'epoch': 4.16} | |
| 42%|βββββ | 130/310 [22:19<29:52, 9.96s/it] 42%|βββββ | 131/310 [22:27<27:56, 9.37s/it] {'loss': 0.6387, 'grad_norm': 1.768818974494934, 'learning_rate': 0.0003788754815332674, 'epoch': 4.19} | |
| 42%|βββββ | 131/310 [22:27<27:56, 9.37s/it] 43%|βββββ | 132/310 [22:38<28:53, 9.74s/it] {'loss': 0.6735, 'grad_norm': 1.7780399322509766, 'learning_rate': 0.00037751715780091086, 'epoch': 4.22} | |
| 43%|βββββ | 132/310 [22:38<28:53, 9.74s/it] 43%|βββββ | 133/310 [22:48<29:36, 10.04s/it] {'loss': 0.7629, 'grad_norm': 1.542732834815979, 'learning_rate': 0.0003761191063713476, 'epoch': 4.26} | |
| 43%|βββββ | 133/310 [22:48<29:36, 10.04s/it] 43%|βββββ | 134/310 [22:58<28:48, 9.82s/it] {'loss': 0.6718, 'grad_norm': 1.7423644065856934, 'learning_rate': 0.00037468164012342597, 'epoch': 4.29} | |
| 43%|βββββ | 134/310 [22:58<28:48, 9.82s/it] 44%|βββββ | 135/310 [23:08<28:58, 9.93s/it] {'loss': 0.7437, 'grad_norm': 1.964659333229065, 'learning_rate': 0.00037320508075688776, 'epoch': 4.32} | |
| 44%|βββββ | 135/310 [23:08<28:58, 9.93s/it] 44%|βββββ | 136/310 [23:16<27:33, 9.51s/it] {'loss': 0.6699, 'grad_norm': 2.0700464248657227, 'learning_rate': 0.00037168975872037323, 'epoch': 4.35} | |
| 44%|βββββ | 136/310 [23:16<27:33, 9.51s/it] 44%|βββββ | 137/310 [23:29<30:08, 10.45s/it] {'loss': 0.8819, 'grad_norm': 1.420904517173767, 'learning_rate': 0.00037013601313746797, 'epoch': 4.38} | |
| 44%|βββββ | 137/310 [23:29<30:08, 10.45s/it] 45%|βββββ | 138/310 [23:36<27:14, 9.50s/it] {'loss': 0.6399, 'grad_norm': 1.6515897512435913, 'learning_rate': 0.0003685441917308078, 'epoch': 4.42} | |
| 45%|βββββ | 138/310 [23:36<27:14, 9.50s/it] 45%|βββββ | 139/310 [23:47<28:27, 9.98s/it] {'loss': 0.8705, 'grad_norm': 1.7429108619689941, 'learning_rate': 0.00036691465074426054, 'epoch': 4.45} | |
| 45%|βββββ | 139/310 [23:47<28:27, 9.98s/it] 45%|βββββ | 140/310 [23:58<28:44, 10.14s/it] {'loss': 0.7238, 'grad_norm': 1.681219458580017, 'learning_rate': 0.000365247754863199, 'epoch': 4.48} | |
| 45%|βββββ | 140/310 [23:58<28:44, 10.14s/it] 45%|βββββ | 141/310 [24:12<31:42, 11.26s/it] {'loss': 0.9049, 'grad_norm': 1.480978012084961, 'learning_rate': 0.0003635438771328863, 'epoch': 4.51} | |
| 45%|βββββ | 141/310 [24:12<31:42, 11.26s/it] 46%|βββββ | 142/310 [24:20<29:12, 10.43s/it] {'loss': 0.6838, 'grad_norm': 1.7104209661483765, 'learning_rate': 0.0003618033988749895, 'epoch': 4.54} | |
| 46%|βββββ | 142/310 [24:20<29:12, 10.43s/it] 46%|βββββ | 143/310 [24:28<26:27, 9.51s/it] {'loss': 0.6716, 'grad_norm': 1.9363956451416016, 'learning_rate': 0.0003600267096022413, 'epoch': 4.58} | |
| 46%|βββββ | 143/310 [24:28<26:27, 9.51s/it] 46%|βββββ | 144/310 [24:35<24:55, 9.01s/it] {'loss': 0.7208, 'grad_norm': 1.9475669860839844, 'learning_rate': 0.0003582142069312683, 'epoch': 4.61} | |
| 46%|βββββ | 144/310 [24:36<24:55, 9.01s/it] 47%|βββββ | 145/310 [24:47<26:29, 9.63s/it] {'loss': 0.7464, 'grad_norm': 1.604385495185852, 'learning_rate': 0.000356366296493606, 'epoch': 4.64} | |
| 47%|βββββ | 145/310 [24:47<26:29, 9.63s/it] 47%|βββββ | 146/310 [24:55<25:18, 9.26s/it] {'loss': 0.7545, 'grad_norm': 1.7255668640136719, 'learning_rate': 0.0003544833918449199, 'epoch': 4.67} | |
| 47%|βββββ | 146/310 [24:55<25:18, 9.26s/it] 47%|βββββ | 147/310 [25:06<26:15, 9.67s/it] {'loss': 0.8671, 'grad_norm': 1.86020028591156, 'learning_rate': 0.0003525659143724533, 'epoch': 4.7} | |
| 47%|βββββ | 147/310 [25:06<26:15, 9.67s/it] 48%|βββββ | 148/310 [25:17<27:43, 10.27s/it] {'loss': 0.8592, 'grad_norm': 1.541040301322937, 'learning_rate': 0.00035061429320072223, 'epoch': 4.74} | |
| 48%|βββββ | 148/310 [25:17<27:43, 10.27s/it] 48%|βββββ | 149/310 [25:26<26:27, 9.86s/it] {'loss': 0.8132, 'grad_norm': 1.7307953834533691, 'learning_rate': 0.00034862896509547887, 'epoch': 4.77} | |
| 48%|βββββ | 149/310 [25:26<26:27, 9.86s/it] 48%|βββββ | 150/310 [25:34<24:43, 9.27s/it] {'loss': 0.6629, 'grad_norm': 1.686195731163025, 'learning_rate': 0.0003466103743659653, 'epoch': 4.8} | |
| 48%|βββββ | 150/310 [25:34<24:43, 9.27s/it] 49%|βββββ | 151/310 [25:46<26:33, 10.02s/it] {'loss': 0.8804, 'grad_norm': 1.5188603401184082, 'learning_rate': 0.0003445589727654783, 'epoch': 4.83} | |
| 49%|βββββ | 151/310 [25:46<26:33, 10.02s/it] 49%|βββββ | 152/310 [25:56<26:21, 10.01s/it] {'loss': 0.7991, 'grad_norm': 1.7107975482940674, 'learning_rate': 0.000342475219390269, 'epoch': 4.86} | |
| 49%|βββββ | 152/310 [25:56<26:21, 10.01s/it] 49%|βββββ | 153/310 [26:05<25:16, 9.66s/it] {'loss': 0.7772, 'grad_norm': 1.8089325428009033, 'learning_rate': 0.0003403595805767983, 'epoch': 4.9} | |
| 49%|βββββ | 153/310 [26:05<25:16, 9.66s/it] 50%|βββββ | 154/310 [26:14<24:50, 9.55s/it] {'loss': 0.8396, 'grad_norm': 2.000405788421631, 'learning_rate': 0.00033821252979737297, 'epoch': 4.93} | |
| 50%|βββββ | 154/310 [26:14<24:50, 9.55s/it] 50%|βββββ | 155/310 [26:22<23:37, 9.15s/it] {'loss': 0.7081, 'grad_norm': 2.3581502437591553, 'learning_rate': 0.0003360345475541839, 'epoch': 4.96} | |
| 50%|βββββ | 155/310 [26:22<23:37, 9.15s/it] 50%|βββββ | 156/310 [26:32<24:22, 9.50s/it] {'loss': 0.7514, 'grad_norm': 1.7323988676071167, 'learning_rate': 0.00033382612127177166, 'epoch': 4.99} | |
| 50%|βββββ | 156/310 [26:32<24:22, 9.50s/it] 51%|βββββ | 157/310 [26:40<22:58, 9.01s/it] {'loss': 0.436, 'grad_norm': 1.3576894998550415, 'learning_rate': 0.00033158774518794254, 'epoch': 5.02} | |
| 51%|βββββ | 157/310 [26:40<22:58, 9.01s/it] 51%|βββββ | 158/310 [26:51<24:03, 9.50s/it] {'loss': 0.4684, 'grad_norm': 1.4445552825927734, 'learning_rate': 0.0003293199202431599, 'epoch': 5.06} | |
| 51%|βββββ | 158/310 [26:51<24:03, 9.50s/it] 51%|ββββββ | 159/310 [27:01<24:33, 9.76s/it] {'loss': 0.4683, 'grad_norm': 1.6872897148132324, 'learning_rate': 0.0003270231539684358, 'epoch': 5.09} | |
| 51%|ββββββ | 159/310 [27:01<24:33, 9.76s/it] 52%|ββββββ | 160/310 [27:12<24:54, 9.97s/it] {'loss': 0.5658, 'grad_norm': 1.876907467842102, 'learning_rate': 0.00032469796037174674, 'epoch': 5.12} | |
| 52%|ββββββ | 160/310 [27:12<24:54, 9.97s/it] 52%|ββββββ | 161/310 [27:22<24:45, 9.97s/it] {'loss': 0.504, 'grad_norm': 1.7589657306671143, 'learning_rate': 0.0003223448598230013, 'epoch': 5.15} | |
| 52%|ββββββ | 161/310 [27:22<24:45, 9.97s/it] 52%|ββββββ | 162/310 [27:32<24:51, 10.08s/it] {'loss': 0.4607, 'grad_norm': 1.8434817790985107, 'learning_rate': 0.00031996437893758276, 'epoch': 5.18} | |
| 52%|ββββββ | 162/310 [27:32<24:51, 10.08s/it] 53%|ββββββ | 163/310 [27:44<25:43, 10.50s/it] {'loss': 0.5482, 'grad_norm': 1.6885955333709717, 'learning_rate': 0.00031755705045849464, 'epoch': 5.22} | |
| 53%|ββββββ | 163/310 [27:44<25:43, 10.50s/it] 53%|ββββββ | 164/310 [27:58<28:15, 11.61s/it] {'loss': 0.5573, 'grad_norm': 1.7552101612091064, 'learning_rate': 0.00031512341313713474, 'epoch': 5.25} | |
| 53%|ββββββ | 164/310 [27:58<28:15, 11.61s/it] 53%|ββββββ | 165/310 [28:08<27:05, 11.21s/it] {'loss': 0.4205, 'grad_norm': 1.4719288349151611, 'learning_rate': 0.0003126640116127244, 'epoch': 5.28} | |
| 53%|ββββββ | 165/310 [28:08<27:05, 11.21s/it] 54%|ββββββ | 166/310 [28:22<28:41, 11.96s/it] {'loss': 0.5372, 'grad_norm': 1.5896486043930054, 'learning_rate': 0.0003101793962904205, 'epoch': 5.31} | |
| 54%|ββββββ | 166/310 [28:22<28:41, 11.96s/it] 54%|ββββββ | 167/310 [28:31<26:23, 11.08s/it] {'loss': 0.4608, 'grad_norm': 2.1966960430145264, 'learning_rate': 0.0003076701232181365, 'epoch': 5.34} | |
| 54%|ββββββ | 167/310 [28:31<26:23, 11.08s/it] 54%|ββββββ | 168/310 [28:41<25:43, 10.87s/it] {'loss': 0.4637, 'grad_norm': 1.6450506448745728, 'learning_rate': 0.00030513675396210094, 'epoch': 5.38} | |
| 54%|ββββββ | 168/310 [28:41<25:43, 10.87s/it] 55%|ββββββ | 169/310 [28:54<26:40, 11.35s/it] {'loss': 0.6279, 'grad_norm': 1.9329091310501099, 'learning_rate': 0.00030257985548118126, 'epoch': 5.41} | |
| 55%|ββββββ | 169/310 [28:54<26:40, 11.35s/it] 55%|ββββββ | 170/310 [29:05<26:40, 11.43s/it] {'loss': 0.4772, 'grad_norm': 1.7089264392852783, 'learning_rate': 0.00030000000000000003, 'epoch': 5.44} | |
| 55%|ββββββ | 170/310 [29:05<26:40, 11.43s/it] 55%|ββββββ | 171/310 [29:14<24:48, 10.71s/it] {'loss': 0.5734, 'grad_norm': 2.046696662902832, 'learning_rate': 0.00029739776488087345, 'epoch': 5.47} | |
| 55%|ββββββ | 171/310 [29:14<24:48, 10.71s/it] 55%|ββββββ | 172/310 [29:28<26:49, 11.66s/it] {'loss': 0.5632, 'grad_norm': 1.467828631401062, 'learning_rate': 0.0002947737324945997, 'epoch': 5.5} | |
| 55%|ββββββ | 172/310 [29:28<26:49, 11.66s/it] 56%|ββββββ | 173/310 [29:35<23:36, 10.34s/it] {'loss': 0.4053, 'grad_norm': 1.656450867652893, 'learning_rate': 0.0002921284900901265, 'epoch': 5.54} | |
| 56%|ββββββ | 173/310 [29:35<23:36, 10.34s/it] 56%|ββββββ | 174/310 [29:44<22:05, 9.74s/it] {'loss': 0.478, 'grad_norm': 1.84519362449646, 'learning_rate': 0.00028946262966312653, 'epoch': 5.57} | |
| 56%|ββββββ | 174/310 [29:44<22:05, 9.74s/it] 56%|ββββββ | 175/310 [29:55<22:46, 10.12s/it] {'loss': 0.5038, 'grad_norm': 1.4977473020553589, 'learning_rate': 0.00028677674782351165, 'epoch': 5.6} | |
| 56%|ββββββ | 175/310 [29:55<22:46, 10.12s/it] 57%|ββββββ | 176/310 [30:04<22:09, 9.92s/it] {'loss': 0.421, 'grad_norm': 1.7688246965408325, 'learning_rate': 0.00028407144566191313, 'epoch': 5.63} | |
| 57%|ββββββ | 176/310 [30:04<22:09, 9.92s/it] 57%|ββββββ | 177/310 [30:14<22:09, 9.99s/it] {'loss': 0.3923, 'grad_norm': 1.7468736171722412, 'learning_rate': 0.0002813473286151601, 'epoch': 5.66} | |
| 57%|ββββββ | 177/310 [30:14<22:09, 9.99s/it] 57%|ββββββ | 178/310 [30:25<22:04, 10.03s/it] {'loss': 0.5247, 'grad_norm': 1.6517231464385986, 'learning_rate': 0.00027860500633078477, 'epoch': 5.7} | |
| 57%|ββββββ | 178/310 [30:25<22:04, 10.03s/it] 58%|ββββββ | 179/310 [30:32<20:08, 9.22s/it] {'loss': 0.4195, 'grad_norm': 1.6703728437423706, 'learning_rate': 0.0002758450925305857, 'epoch': 5.73} | |
| 58%|ββββββ | 179/310 [30:32<20:08, 9.22s/it] 58%|ββββββ | 180/310 [30:40<19:25, 8.96s/it] {'loss': 0.4477, 'grad_norm': 1.6039535999298096, 'learning_rate': 0.00027306820487327905, 'epoch': 5.76} | |
| 58%|ββββββ | 180/310 [30:40<19:25, 8.96s/it] 58%|ββββββ | 181/310 [30:50<19:30, 9.07s/it] {'loss': 0.4873, 'grad_norm': 1.7970647811889648, 'learning_rate': 0.0002702749648162686, 'epoch': 5.79} | |
| 58%|ββββββ | 181/310 [30:50<19:30, 9.07s/it] 59%|ββββββ | 182/310 [31:01<20:36, 9.66s/it] {'loss': 0.5899, 'grad_norm': 1.4690847396850586, 'learning_rate': 0.00026746599747656607, 'epoch': 5.82} | |
| 59%|ββββββ | 182/310 [31:01<20:36, 9.66s/it] 59%|ββββββ | 183/310 [31:11<21:06, 9.97s/it] {'loss': 0.4865, 'grad_norm': 1.6724956035614014, 'learning_rate': 0.00026464193149089205, 'epoch': 5.86} | |
| 59%|ββββββ | 183/310 [31:11<21:06, 9.97s/it] 59%|ββββββ | 184/310 [31:21<20:51, 9.93s/it] {'loss': 0.5033, 'grad_norm': 1.6596944332122803, 'learning_rate': 0.00026180339887498953, 'epoch': 5.89} | |
| 59%|ββββββ | 184/310 [31:21<20:51, 9.93s/it] 60%|ββββββ | 185/310 [31:31<20:29, 9.83s/it] {'loss': 0.4057, 'grad_norm': 1.5845303535461426, 'learning_rate': 0.00025895103488218085, 'epoch': 5.92} | |
| 60%|ββββββ | 185/310 [31:31<20:29, 9.83s/it] 60%|ββββββ | 186/310 [31:42<21:04, 10.20s/it] {'loss': 0.564, 'grad_norm': 1.6832287311553955, 'learning_rate': 0.00025608547786120056, 'epoch': 5.95} | |
| 60%|ββββββ | 186/310 [31:42<21:04, 10.20s/it] 60%|ββββββ | 187/310 [31:53<21:42, 10.59s/it] {'loss': 0.4829, 'grad_norm': 1.5933996438980103, 'learning_rate': 0.00025320736911333503, 'epoch': 5.98} | |
| 60%|ββββββ | 187/310 [31:53<21:42, 10.59s/it] 61%|ββββββ | 188/310 [32:04<21:47, 10.72s/it] {'loss': 0.4631, 'grad_norm': 1.4348868131637573, 'learning_rate': 0.0002503173527489017, 'epoch': 6.02} | |
| 61%|ββββββ | 188/310 [32:04<21:47, 10.72s/it] 61%|ββββββ | 189/310 [32:14<20:54, 10.37s/it] {'loss': 0.3025, 'grad_norm': 1.1029309034347534, 'learning_rate': 0.00024741607554309953, 'epoch': 6.05} | |
| 61%|ββββββ | 189/310 [32:14<20:54, 10.37s/it] 61%|βββββββ | 190/310 [32:27<22:11, 11.10s/it] {'loss': 0.2949, 'grad_norm': 1.1804847717285156, 'learning_rate': 0.0002445041867912629, 'epoch': 6.08} | |
| 61%|βββββββ | 190/310 [32:27<22:11, 11.10s/it] 62%|βββββββ | 191/310 [32:35<20:25, 10.30s/it] {'loss': 0.2671, 'grad_norm': 1.3757280111312866, 'learning_rate': 0.00024158233816355185, 'epoch': 6.11} | |
| 62%|βββββββ | 191/310 [32:35<20:25, 10.30s/it] 62%|βββββββ | 192/310 [32:46<20:21, 10.35s/it] {'loss': 0.2401, 'grad_norm': 1.6068209409713745, 'learning_rate': 0.00023865118355911066, 'epoch': 6.14} | |
| 62%|βββββββ | 192/310 [32:46<20:21, 10.35s/it] 62%|βββββββ | 193/310 [32:55<19:46, 10.14s/it] {'loss': 0.2447, 'grad_norm': 1.6851338148117065, 'learning_rate': 0.00023571137895972733, 'epoch': 6.18} | |
| 62%|βββββββ | 193/310 [32:55<19:46, 10.14s/it] 63%|βββββββ | 194/310 [33:05<19:09, 9.91s/it] {'loss': 0.3183, 'grad_norm': 1.9770675897598267, 'learning_rate': 0.00023276358228302755, 'epoch': 6.21} | |
| 63%|βββββββ | 194/310 [33:05<19:09, 9.91s/it] 63%|βββββββ | 195/310 [33:14<18:45, 9.79s/it] {'loss': 0.2471, 'grad_norm': 1.3860464096069336, 'learning_rate': 0.00022980845323523487, 'epoch': 6.24} | |
| 63%|βββββββ | 195/310 [33:14<18:45, 9.79s/it] 63%|βββββββ | 196/310 [33:26<19:34, 10.30s/it] {'loss': 0.3387, 'grad_norm': 1.144447684288025, 'learning_rate': 0.0002268466531635311, 'epoch': 6.27} | |
| 63%|βββββββ | 196/310 [33:26<19:34, 10.30s/it] 64%|βββββββ | 197/310 [33:37<19:59, 10.61s/it] {'loss': 0.262, 'grad_norm': 1.1870023012161255, 'learning_rate': 0.00022387884490804885, 'epoch': 6.3} | |
| 64%|βββββββ | 197/310 [33:37<19:59, 10.61s/it] 64%|βββββββ | 198/310 [33:49<20:25, 10.94s/it] {'loss': 0.3657, 'grad_norm': 1.3196618556976318, 'learning_rate': 0.00022090569265353072, 'epoch': 6.34} | |
| 64%|βββββββ | 198/310 [33:49<20:25, 10.94s/it] 64%|βββββββ | 199/310 [33:59<19:51, 10.73s/it] {'loss': 0.342, 'grad_norm': 1.206906795501709, 'learning_rate': 0.00021792786178068672, 'epoch': 6.37} | |
| 64%|βββββββ | 199/310 [33:59<19:51, 10.73s/it] 65%|βββββββ | 200/310 [34:10<20:00, 10.91s/it] {'loss': 0.2934, 'grad_norm': 1.265863060951233, 'learning_rate': 0.00021494601871728487, 'epoch': 6.4} | |
| 65%|βββββββ | 200/310 [34:10<20:00, 10.91s/it]/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. | |
| warnings.warn( | |
| 65%|βββββββ | 201/310 [34:25<21:51, 12.03s/it] {'loss': 0.3139, 'grad_norm': 1.3970496654510498, 'learning_rate': 0.00021196083078900687, 'epoch': 6.43} | |
| 65%|βββββββ | 201/310 [34:25<21:51, 12.03s/it] 65%|βββββββ | 202/310 [34:37<21:32, 11.97s/it] {'loss': 0.4096, 'grad_norm': 1.520329475402832, 'learning_rate': 0.00020897296607010301, 'epoch': 6.46} | |
| 65%|βββββββ | 202/310 [34:37<21:32, 11.97s/it] 65%|βββββββ | 203/310 [34:46<19:46, 11.09s/it] {'loss': 0.2583, 'grad_norm': 1.6870310306549072, 'learning_rate': 0.00020598309323387973, 'epoch': 6.5} | |
| 65%|βββββββ | 203/310 [34:46<19:46, 11.09s/it] 66%|βββββββ | 204/310 [34:56<19:08, 10.84s/it] {'loss': 0.2929, 'grad_norm': 1.5305651426315308, 'learning_rate': 0.00020299188140305275, 'epoch': 6.53} | |
| 66%|βββββββ | 204/310 [34:56<19:08, 10.84s/it] 66%|βββββββ | 205/310 [35:04<17:26, 9.96s/it] {'loss': 0.1438, 'grad_norm': 1.1825664043426514, 'learning_rate': 0.0002, 'epoch': 6.56} | |
| 66%|βββββββ | 205/310 [35:04<17:26, 9.96s/it] 66%|βββββββ | 206/310 [35:16<18:13, 10.52s/it] {'loss': 0.4085, 'grad_norm': 1.6068650484085083, 'learning_rate': 0.00019700811859694732, 'epoch': 6.59} | |
| 66%|βββββββ | 206/310 [35:16<18:13, 10.52s/it] 67%|βββββββ | 207/310 [35:24<16:53, 9.84s/it] {'loss': 0.1923, 'grad_norm': 1.4477990865707397, 'learning_rate': 0.00019401690676612037, 'epoch': 6.62} | |
| 67%|βββββββ | 207/310 [35:24<16:53, 9.84s/it] 67%|βββββββ | 208/310 [35:34<16:50, 9.91s/it] {'loss': 0.2989, 'grad_norm': 1.3757888078689575, 'learning_rate': 0.00019102703392989709, 'epoch': 6.66} | |
| 67%|βββββββ | 208/310 [35:34<16:50, 9.91s/it] 67%|βββββββ | 209/310 [35:42<15:49, 9.40s/it] {'loss': 0.2353, 'grad_norm': 1.4181764125823975, 'learning_rate': 0.00018803916921099315, 'epoch': 6.69} | |
| 67%|βββββββ | 209/310 [35:42<15:49, 9.40s/it] 68%|βββββββ | 210/310 [35:51<15:33, 9.33s/it] {'loss': 0.2423, 'grad_norm': 1.3748210668563843, 'learning_rate': 0.00018505398128271515, 'epoch': 6.72} | |
| 68%|βββββββ | 210/310 [35:51<15:33, 9.33s/it] 68%|βββββββ | 211/310 [36:04<17:05, 10.36s/it] {'loss': 0.3051, 'grad_norm': 1.1813161373138428, 'learning_rate': 0.00018207213821931333, 'epoch': 6.75} | |
| 68%|βββββββ | 211/310 [36:04<17:05, 10.36s/it] 68%|βββββββ | 212/310 [36:13<16:05, 9.86s/it] {'loss': 0.2405, 'grad_norm': 1.7925623655319214, 'learning_rate': 0.00017909430734646935, 'epoch': 6.78} | |
| 68%|βββββββ | 212/310 [36:13<16:05, 9.86s/it] 69%|βββββββ | 213/310 [36:24<16:35, 10.26s/it] {'loss': 0.3368, 'grad_norm': 1.3381634950637817, 'learning_rate': 0.00017612115509195117, 'epoch': 6.82} | |
| 69%|βββββββ | 213/310 [36:24<16:35, 10.26s/it] 69%|βββββββ | 214/310 [36:32<15:17, 9.56s/it] {'loss': 0.2241, 'grad_norm': 1.6198554039001465, 'learning_rate': 0.00017315334683646897, 'epoch': 6.85} | |
| 69%|βββββββ | 214/310 [36:32<15:17, 9.56s/it] 69%|βββββββ | 215/310 [36:43<15:47, 9.97s/it] {'loss': 0.3613, 'grad_norm': 1.2816460132598877, 'learning_rate': 0.0001701915467647651, 'epoch': 6.88} | |
| 69%|βββββββ | 215/310 [36:43<15:47, 9.97s/it] 70%|βββββββ | 216/310 [36:55<16:44, 10.69s/it] {'loss': 0.3252, 'grad_norm': 1.3897403478622437, 'learning_rate': 0.00016723641771697247, 'epoch': 6.91} | |
| 70%|βββββββ | 216/310 [36:55<16:44, 10.69s/it] 70%|βββββββ | 217/310 [37:07<17:13, 11.11s/it] {'loss': 0.3711, 'grad_norm': 1.494607925415039, 'learning_rate': 0.00016428862104027268, 'epoch': 6.94} | |
| 70%|βββββββ | 217/310 [37:07<17:13, 11.11s/it] 70%|βββββββ | 218/310 [37:16<16:00, 10.44s/it] {'loss': 0.2821, 'grad_norm': 1.263566255569458, 'learning_rate': 0.00016134881644088938, 'epoch': 6.98} | |
| 70%|βββββββ | 218/310 [37:16<16:00, 10.44s/it] 71%|βββββββ | 219/310 [37:30<17:13, 11.35s/it] {'loss': 0.3521, 'grad_norm': 1.3453099727630615, 'learning_rate': 0.00015841766183644817, 'epoch': 7.01} | |
| 71%|βββββββ | 219/310 [37:30<17:13, 11.35s/it] 71%|βββββββ | 220/310 [37:40<16:43, 11.15s/it] {'loss': 0.1833, 'grad_norm': 0.7851501703262329, 'learning_rate': 0.00015549581320873715, 'epoch': 7.04} | |
| 71%|βββββββ | 220/310 [37:40<16:43, 11.15s/it] 71%|ββββββββ | 221/310 [37:51<16:23, 11.05s/it] {'loss': 0.1653, 'grad_norm': 0.8764591217041016, 'learning_rate': 0.00015258392445690052, 'epoch': 7.07} | |
| 71%|ββββββββ | 221/310 [37:51<16:23, 11.05s/it] 72%|ββββββββ | 222/310 [37:58<14:13, 9.70s/it] {'loss': 0.0725, 'grad_norm': 0.8280569911003113, 'learning_rate': 0.0001496826472510983, 'epoch': 7.1} | |
| 72%|ββββββββ | 222/310 [37:58<14:13, 9.70s/it] 72%|ββββββββ | 223/310 [38:08<14:22, 9.92s/it] {'loss': 0.1655, 'grad_norm': 1.001288652420044, 'learning_rate': 0.00014679263088666499, 'epoch': 7.14} | |
| 72%|ββββββββ | 223/310 [38:08<14:22, 9.92s/it] 72%|ββββββββ | 224/310 [38:17<13:46, 9.61s/it] {'loss': 0.1476, 'grad_norm': 0.9858111143112183, 'learning_rate': 0.0001439145221387995, 'epoch': 7.17} | |
| 72%|ββββββββ | 224/310 [38:17<13:46, 9.61s/it] 73%|ββββββββ | 225/310 [38:28<14:08, 9.98s/it] {'loss': 0.1799, 'grad_norm': 1.2991976737976074, 'learning_rate': 0.00014104896511781914, 'epoch': 7.2} | |
| 73%|ββββββββ | 225/310 [38:28<14:08, 9.98s/it] 73%|ββββββββ | 226/310 [38:41<15:15, 10.90s/it] {'loss': 0.2363, 'grad_norm': 1.1983803510665894, 'learning_rate': 0.00013819660112501054, 'epoch': 7.23} | |
| 73%|ββββββββ | 226/310 [38:41<15:15, 10.90s/it] 73%|ββββββββ | 227/310 [38:53<15:33, 11.24s/it] {'loss': 0.1681, 'grad_norm': 0.9889026284217834, 'learning_rate': 0.00013535806850910803, 'epoch': 7.26} | |
| 73%|ββββββββ | 227/310 [38:53<15:33, 11.24s/it] 74%|ββββββββ | 228/310 [39:03<14:57, 10.94s/it] {'loss': 0.1521, 'grad_norm': 1.0326757431030273, 'learning_rate': 0.00013253400252343403, 'epoch': 7.3} | |
| 74%|ββββββββ | 228/310 [39:03<14:57, 10.94s/it] 74%|ββββββββ | 229/310 [39:12<14:00, 10.38s/it] {'loss': 0.1137, 'grad_norm': 1.1503398418426514, 'learning_rate': 0.00012972503518373144, 'epoch': 7.33} | |
| 74%|ββββββββ | 229/310 [39:12<14:00, 10.38s/it] 74%|ββββββββ | 230/310 [39:23<13:57, 10.47s/it] {'loss': 0.1483, 'grad_norm': 0.9621520638465881, 'learning_rate': 0.000126931795126721, 'epoch': 7.36} | |
| 74%|ββββββββ | 230/310 [39:23<13:57, 10.47s/it] 75%|ββββββββ | 231/310 [39:38<15:37, 11.87s/it] {'loss': 0.3207, 'grad_norm': 1.0541304349899292, 'learning_rate': 0.00012415490746941433, 'epoch': 7.39} | |
| 75%|ββββββββ | 231/310 [39:38<15:37, 11.87s/it] 75%|ββββββββ | 232/310 [39:49<15:00, 11.54s/it] {'loss': 0.1686, 'grad_norm': 0.8747802972793579, 'learning_rate': 0.0001213949936692153, 'epoch': 7.42} | |
| 75%|ββββββββ | 232/310 [39:49<15:00, 11.54s/it] 75%|ββββββββ | 233/310 [39:59<14:22, 11.20s/it] {'loss': 0.1751, 'grad_norm': 0.9739407300949097, 'learning_rate': 0.00011865267138483999, 'epoch': 7.46} | |
| 75%|ββββββββ | 233/310 [39:59<14:22, 11.20s/it] 75%|ββββββββ | 234/310 [40:07<12:47, 10.10s/it] {'loss': 0.1309, 'grad_norm': 1.2125204801559448, 'learning_rate': 0.00011592855433808694, 'epoch': 7.49} | |
| 75%|ββββββββ | 234/310 [40:07<12:47, 10.10s/it] 76%|ββββββββ | 235/310 [40:17<12:37, 10.10s/it] {'loss': 0.1595, 'grad_norm': 1.301795482635498, 'learning_rate': 0.00011322325217648839, 'epoch': 7.52} | |
| 76%|ββββββββ | 235/310 [40:17<12:37, 10.10s/it] 76%|ββββββββ | 236/310 [40:28<12:57, 10.50s/it] {'loss': 0.2055, 'grad_norm': 0.9929893612861633, 'learning_rate': 0.00011053737033687346, 'epoch': 7.55} | |
| 76%|ββββββββ | 236/310 [40:28<12:57, 10.50s/it] 76%|ββββββββ | 237/310 [40:38<12:35, 10.35s/it] {'loss': 0.1438, 'grad_norm': 1.1833664178848267, 'learning_rate': 0.00010787150990987359, 'epoch': 7.58} | |
| 76%|ββββββββ | 237/310 [40:38<12:35, 10.35s/it] 77%|ββββββββ | 238/310 [40:46<11:25, 9.52s/it] {'loss': 0.1412, 'grad_norm': 1.2789617776870728, 'learning_rate': 0.00010522626750540028, 'epoch': 7.62} | |
| 77%|ββββββββ | 238/310 [40:46<11:25, 9.52s/it] 77%|ββββββββ | 239/310 [40:57<11:41, 9.87s/it] {'loss': 0.1337, 'grad_norm': 1.5856274366378784, 'learning_rate': 0.00010260223511912654, 'epoch': 7.65} | |
| 77%|ββββββββ | 239/310 [40:57<11:41, 9.87s/it] 77%|ββββββββ | 240/310 [41:07<11:35, 9.93s/it] {'loss': 0.1028, 'grad_norm': 0.9638913869857788, 'learning_rate': 0.00010000000000000005, 'epoch': 7.68} | |
| 77%|ββββββββ | 240/310 [41:07<11:35, 9.93s/it] 78%|ββββββββ | 241/310 [41:17<11:30, 10.00s/it] {'loss': 0.1455, 'grad_norm': 1.0906389951705933, 'learning_rate': 9.74201445188188e-05, 'epoch': 7.71} | |
| 78%|ββββββββ | 241/310 [41:17<11:30, 10.00s/it] 78%|ββββββββ | 242/310 [41:28<11:37, 10.25s/it] {'loss': 0.1914, 'grad_norm': 1.1182774305343628, 'learning_rate': 9.486324603789904e-05, 'epoch': 7.74} | |
| 78%|ββββββββ | 242/310 [41:28<11:37, 10.25s/it] 78%|ββββββββ | 243/310 [41:39<11:55, 10.67s/it] {'loss': 0.2271, 'grad_norm': 0.9577850103378296, 'learning_rate': 9.232987678186357e-05, 'epoch': 7.78} | |
| 78%|ββββββββ | 243/310 [41:39<11:55, 10.67s/it] 79%|ββββββββ | 244/310 [41:49<11:14, 10.22s/it] {'loss': 0.1498, 'grad_norm': 1.009499192237854, 'learning_rate': 8.982060370957952e-05, 'epoch': 7.81} | |
| 79%|ββββββββ | 244/310 [41:49<11:14, 10.22s/it] 79%|ββββββββ | 245/310 [41:58<10:49, 9.99s/it] {'loss': 0.1351, 'grad_norm': 1.0402084589004517, 'learning_rate': 8.733598838727559e-05, 'epoch': 7.84} | |
| 79%|ββββββββ | 245/310 [41:58<10:49, 9.99s/it] 79%|ββββββββ | 246/310 [42:11<11:39, 10.93s/it] {'loss': 0.2334, 'grad_norm': 1.0233170986175537, 'learning_rate': 8.487658686286533e-05, 'epoch': 7.87} | |
| 79%|ββββββββ | 246/310 [42:11<11:39, 10.93s/it] 80%|ββββββββ | 247/310 [42:20<10:48, 10.29s/it] {'loss': 0.1422, 'grad_norm': 1.2524672746658325, 'learning_rate': 8.24429495415054e-05, 'epoch': 7.9} | |
| 80%|ββββββββ | 247/310 [42:20<10:48, 10.29s/it] 80%|ββββββββ | 248/310 [42:28<09:58, 9.65s/it] {'loss': 0.116, 'grad_norm': 1.1226134300231934, 'learning_rate': 8.003562106241726e-05, 'epoch': 7.94} | |
| 80%|ββββββββ | 248/310 [42:28<09:58, 9.65s/it] 80%|ββββββββ | 249/310 [42:37<09:31, 9.36s/it] {'loss': 0.1263, 'grad_norm': 1.1791538000106812, 'learning_rate': 7.765514017699871e-05, 'epoch': 7.97} | |
| 80%|ββββββββ | 249/310 [42:37<09:31, 9.36s/it] 81%|ββββββββ | 250/310 [42:45<09:03, 9.06s/it] {'loss': 0.1123, 'grad_norm': 1.2191931009292603, 'learning_rate': 7.530203962825331e-05, 'epoch': 8.0} | |
| 81%|ββββββββ | 250/310 [42:45<09:03, 9.06s/it] 81%|ββββββββ | 251/310 [42:56<09:27, 9.63s/it] {'loss': 0.1044, 'grad_norm': 0.5983436107635498, 'learning_rate': 7.297684603156425e-05, 'epoch': 8.03} | |
| 81%|ββββββββ | 251/310 [42:56<09:27, 9.63s/it] 81%|βββββββββ | 252/310 [43:09<10:23, 10.75s/it] {'loss': 0.1283, 'grad_norm': 0.7003709077835083, 'learning_rate': 7.06800797568401e-05, 'epoch': 8.06} | |
| 81%|βββββββββ | 252/310 [43:09<10:23, 10.75s/it] 82%|βββββββββ | 253/310 [43:20<10:05, 10.62s/it] {'loss': 0.0882, 'grad_norm': 0.5826034545898438, 'learning_rate': 6.841225481205749e-05, 'epoch': 8.1} | |
| 82%|βββββββββ | 253/310 [43:20<10:05, 10.62s/it] 82%|βββββββββ | 254/310 [43:31<10:03, 10.77s/it] {'loss': 0.0796, 'grad_norm': 0.6518627405166626, 'learning_rate': 6.617387872822842e-05, 'epoch': 8.13} | |
| 82%|βββββββββ | 254/310 [43:31<10:03, 10.77s/it] 82%|βββββββββ | 255/310 [43:39<09:15, 10.09s/it] {'loss': 0.1019, 'grad_norm': 0.7584952116012573, 'learning_rate': 6.396545244581608e-05, 'epoch': 8.16} | |
| 82%|βββββββββ | 255/310 [43:39<09:15, 10.09s/it] 83%|βββββββββ | 256/310 [43:50<09:08, 10.16s/it] {'loss': 0.0926, 'grad_norm': 0.7817658185958862, 'learning_rate': 6.178747020262707e-05, 'epoch': 8.19} | |
| 83%|βββββββββ | 256/310 [43:50<09:08, 10.16s/it] 83%|βββββββββ | 257/310 [43:59<08:38, 9.79s/it] {'loss': 0.0753, 'grad_norm': 0.77660071849823, 'learning_rate': 5.964041942320171e-05, 'epoch': 8.22} | |
| 83%|βββββββββ | 257/310 [43:59<08:38, 9.79s/it] 83%|βββββββββ | 258/310 [44:09<08:42, 10.05s/it] {'loss': 0.127, 'grad_norm': 0.7232150435447693, 'learning_rate': 5.752478060973108e-05, 'epoch': 8.26} | |
| 83%|βββββββββ | 258/310 [44:09<08:42, 10.05s/it] 84%|βββββββββ | 259/310 [44:19<08:33, 10.07s/it] {'loss': 0.1045, 'grad_norm': 0.9301002025604248, 'learning_rate': 5.544102723452171e-05, 'epoch': 8.29} | |
| 84%|βββββββββ | 259/310 [44:19<08:33, 10.07s/it] 84%|βββββββββ | 260/310 [44:28<08:08, 9.76s/it] {'loss': 0.0505, 'grad_norm': 0.6372485756874084, 'learning_rate': 5.338962563403478e-05, 'epoch': 8.32} | |
| 84%|βββββββββ | 260/310 [44:28<08:08, 9.76s/it] 84%|βββββββββ | 261/310 [44:38<07:51, 9.63s/it] {'loss': 0.0734, 'grad_norm': 0.9156896471977234, 'learning_rate': 5.1371034904521134e-05, 'epoch': 8.35} | |
| 84%|βββββββββ | 261/310 [44:38<07:51, 9.63s/it] 85%|βββββββββ | 262/310 [44:48<07:43, 9.66s/it] {'loss': 0.0658, 'grad_norm': 0.7276143431663513, 'learning_rate': 4.938570679927783e-05, 'epoch': 8.38} | |
| 85%|βββββββββ | 262/310 [44:48<07:43, 9.66s/it] 85%|βββββββββ | 263/310 [44:58<07:49, 9.99s/it] {'loss': 0.0856, 'grad_norm': 0.7642857432365417, 'learning_rate': 4.74340856275467e-05, 'epoch': 8.42} | |
| 85%|βββββββββ | 263/310 [44:58<07:49, 9.99s/it] 85%|βββββββββ | 264/310 [45:09<07:46, 10.14s/it] {'loss': 0.1134, 'grad_norm': 0.7237958908081055, 'learning_rate': 4.551660815508012e-05, 'epoch': 8.45} | |
| 85%|βββββββββ | 264/310 [45:09<07:46, 10.14s/it] 85%|βββββββββ | 265/310 [45:20<07:54, 10.54s/it] {'loss': 0.0885, 'grad_norm': 0.8243454694747925, 'learning_rate': 4.363370350639404e-05, 'epoch': 8.48} | |
| 85%|βββββββββ | 265/310 [45:20<07:54, 10.54s/it] 86%|βββββββββ | 266/310 [45:29<07:14, 9.87s/it] {'loss': 0.0647, 'grad_norm': 0.4799908399581909, 'learning_rate': 4.178579306873181e-05, 'epoch': 8.51} | |
| 86%|βββββββββ | 266/310 [45:29<07:14, 9.87s/it] 86%|βββββββββ | 267/310 [45:41<07:31, 10.51s/it] {'loss': 0.1183, 'grad_norm': 0.889290452003479, 'learning_rate': 3.997329039775877e-05, 'epoch': 8.54} | |
| 86%|βββββββββ | 267/310 [45:41<07:31, 10.51s/it] 86%|βββββββββ | 268/310 [45:51<07:25, 10.60s/it] {'loss': 0.0771, 'grad_norm': 0.9871735572814941, 'learning_rate': 3.819660112501053e-05, 'epoch': 8.58} | |
| 86%|βββββββββ | 268/310 [45:51<07:25, 10.60s/it] 87%|βββββββββ | 269/310 [46:02<07:18, 10.69s/it] {'loss': 0.0995, 'grad_norm': 0.5611591935157776, 'learning_rate': 3.645612286711373e-05, 'epoch': 8.61} | |
| 87%|βββββββββ | 269/310 [46:02<07:18, 10.69s/it] 87%|βββββββββ | 270/310 [46:11<06:50, 10.25s/it] {'loss': 0.1105, 'grad_norm': 0.6527149081230164, 'learning_rate': 3.4752245136801065e-05, 'epoch': 8.64} | |
| 87%|βββββββββ | 270/310 [46:11<06:50, 10.25s/it] 87%|βββββββββ | 271/310 [46:20<06:22, 9.80s/it] {'loss': 0.054, 'grad_norm': 0.4983566105365753, 'learning_rate': 3.3085349255739474e-05, 'epoch': 8.67} | |
| 87%|βββββββββ | 271/310 [46:20<06:22, 9.80s/it] 88%|βββββββββ | 272/310 [46:30<06:09, 9.71s/it] {'loss': 0.0769, 'grad_norm': 0.7354563474655151, 'learning_rate': 3.1455808269192166e-05, 'epoch': 8.7} | |
| 88%|βββββββββ | 272/310 [46:30<06:09, 9.71s/it] 88%|βββββββββ | 273/310 [46:42<06:27, 10.46s/it] {'loss': 0.0853, 'grad_norm': 0.7245129346847534, 'learning_rate': 2.986398686253211e-05, 'epoch': 8.74} | |
| 88%|βββββββββ | 273/310 [46:42<06:27, 10.46s/it] 88%|βββββββββ | 274/310 [46:52<06:12, 10.36s/it] {'loss': 0.0744, 'grad_norm': 0.7032697200775146, 'learning_rate': 2.831024127962678e-05, 'epoch': 8.77} | |
| 88%|βββββββββ | 274/310 [46:52<06:12, 10.36s/it] 89%|βββββββββ | 275/310 [47:03<06:08, 10.53s/it] {'loss': 0.0552, 'grad_norm': 0.5574517846107483, 'learning_rate': 2.679491924311226e-05, 'epoch': 8.8} | |
| 89%|βββββββββ | 275/310 [47:03<06:08, 10.53s/it] 89%|βββββββββ | 276/310 [47:15<06:10, 10.89s/it] {'loss': 0.0964, 'grad_norm': 0.6602731943130493, 'learning_rate': 2.531835987657407e-05, 'epoch': 8.83} | |
| 89%|βββββββββ | 276/310 [47:15<06:10, 10.89s/it] 89%|βββββββββ | 277/310 [47:25<05:54, 10.75s/it] {'loss': 0.0796, 'grad_norm': 0.6797820925712585, 'learning_rate': 2.38808936286524e-05, 'epoch': 8.86} | |
| 89%|βββββββββ | 277/310 [47:25<05:54, 10.75s/it] 90%|βββββββββ | 278/310 [47:34<05:28, 10.26s/it] {'loss': 0.0647, 'grad_norm': 0.7358025908470154, 'learning_rate': 2.248284219908918e-05, 'epoch': 8.9} | |
| 90%|βββββββββ | 278/310 [47:34<05:28, 10.26s/it] 90%|βββββββββ | 279/310 [47:44<05:14, 10.16s/it] {'loss': 0.0933, 'grad_norm': 0.6494250297546387, 'learning_rate': 2.1124518466732667e-05, 'epoch': 8.93} | |
| 90%|βββββββββ | 279/310 [47:44<05:14, 10.16s/it] 90%|βββββββββ | 280/310 [47:58<05:35, 11.18s/it] {'loss': 0.1322, 'grad_norm': 0.7461543679237366, 'learning_rate': 1.9806226419516192e-05, 'epoch': 8.96} | |
| 90%|βββββββββ | 280/310 [47:58<05:35, 11.18s/it] 91%|βββββββββ | 281/310 [48:09<05:28, 11.31s/it] {'loss': 0.0713, 'grad_norm': 0.749599277973175, 'learning_rate': 1.8528261086427024e-05, 'epoch': 8.99} | |
| 91%|βββββββββ | 281/310 [48:09<05:28, 11.31s/it] 91%|βββββββββ | 282/310 [48:21<05:16, 11.30s/it] {'loss': 0.0782, 'grad_norm': 0.5928863883018494, 'learning_rate': 1.7290908471479805e-05, 'epoch': 9.02} | |
| 91%|βββββββββ | 282/310 [48:21<05:16, 11.30s/it] 91%|ββββββββββ| 283/310 [48:31<04:57, 11.00s/it] {'loss': 0.0615, 'grad_norm': 0.5130094289779663, 'learning_rate': 1.6094445489709885e-05, 'epoch': 9.06} | |
| 91%|ββββββββββ| 283/310 [48:31<04:57, 11.00s/it] 92%|ββββββββββ| 284/310 [48:40<04:28, 10.31s/it] {'loss': 0.0415, 'grad_norm': 0.4709284007549286, 'learning_rate': 1.493913990520066e-05, 'epoch': 9.09} | |
| 92%|ββββββββββ| 284/310 [48:40<04:28, 10.31s/it] 92%|ββββββββββ| 285/310 [48:52<04:33, 10.94s/it] {'loss': 0.1116, 'grad_norm': 0.46725547313690186, 'learning_rate': 1.3825250271159173e-05, 'epoch': 9.12} | |
| 92%|ββββββββββ| 285/310 [48:52<04:33, 10.94s/it] 92%|ββββββββββ| 286/310 [49:00<04:04, 10.19s/it] {'loss': 0.0295, 'grad_norm': 0.2811935544013977, 'learning_rate': 1.275302587205256e-05, 'epoch': 9.15} | |
| 92%|ββββββββββ| 286/310 [49:01<04:04, 10.19s/it] 93%|ββββββββββ| 287/310 [49:10<03:50, 10.03s/it] {'loss': 0.0351, 'grad_norm': 0.3376118242740631, 'learning_rate': 1.1722706667819383e-05, 'epoch': 9.18} | |
| 93%|ββββββββββ| 287/310 [49:10<03:50, 10.03s/it] 93%|ββββββββββ| 288/310 [49:21<03:45, 10.24s/it] {'loss': 0.0814, 'grad_norm': 0.5244645476341248, 'learning_rate': 1.073452324016715e-05, 'epoch': 9.22} | |
| 93%|ββββββββββ| 288/310 [49:21<03:45, 10.24s/it] 93%|ββββββββββ| 289/310 [49:29<03:24, 9.72s/it] {'loss': 0.038, 'grad_norm': 0.3502997159957886, 'learning_rate': 9.788696740969295e-06, 'epoch': 9.25} | |
| 93%|ββββββββββ| 289/310 [49:29<03:24, 9.72s/it] 94%|ββββββββββ| 290/310 [49:36<02:56, 8.84s/it] {'loss': 0.0326, 'grad_norm': 0.3169908821582794, 'learning_rate': 8.885438842771843e-06, 'epoch': 9.28} | |
| 94%|ββββββββββ| 290/310 [49:36<02:56, 8.84s/it] 94%|ββββββββββ| 291/310 [49:45<02:46, 8.79s/it] {'loss': 0.0361, 'grad_norm': 0.3949986696243286, 'learning_rate': 8.024951691422212e-06, 'epoch': 9.31} | |
| 94%|ββββββββββ| 291/310 [49:45<02:46, 8.79s/it] 94%|ββββββββββ| 292/310 [49:59<03:05, 10.29s/it] {'loss': 0.0608, 'grad_norm': 0.5657286643981934, 'learning_rate': 7.2074278608293525e-06, 'epoch': 9.34} | |
| 94%|ββββββββββ| 292/310 [49:59<03:05, 10.29s/it] 95%|ββββββββββ| 293/310 [50:07<02:46, 9.82s/it] {'loss': 0.0338, 'grad_norm': 0.3506301939487457, 'learning_rate': 6.4330503098667175e-06, 'epoch': 9.38} | |
| 95%|ββββββββββ| 293/310 [50:07<02:46, 9.82s/it] 95%|ββββββββββ| 294/310 [50:16<02:31, 9.45s/it] {'loss': 0.0479, 'grad_norm': 0.42979806661605835, 'learning_rate': 5.701992341426499e-06, 'epoch': 9.41} | |
| 95%|ββββββββββ| 294/310 [50:16<02:31, 9.45s/it] 95%|ββββββββββ| 295/310 [50:30<02:40, 10.73s/it] {'loss': 0.0873, 'grad_norm': 0.5299234986305237, 'learning_rate': 5.0144175636352765e-06, 'epoch': 9.44} | |
| 95%|ββββββββββ| 295/310 [50:30<02:40, 10.73s/it] 95%|ββββββββββ| 296/310 [50:43<02:39, 11.39s/it] {'loss': 0.0735, 'grad_norm': 0.5085708498954773, 'learning_rate': 4.370479853238863e-06, 'epoch': 9.47} | |
| 95%|ββββββββββ| 296/310 [50:43<02:39, 11.39s/it] 96%|ββββββββββ| 297/310 [50:56<02:36, 12.07s/it] {'loss': 0.1293, 'grad_norm': 0.5927861928939819, 'learning_rate': 3.770323321165492e-06, 'epoch': 9.5} | |
| 96%|ββββββββββ| 297/310 [50:56<02:36, 12.07s/it] 96%|ββββββββββ| 298/310 [51:03<02:06, 10.51s/it] {'loss': 0.0323, 'grad_norm': 0.44408226013183594, 'learning_rate': 3.2140822802740668e-06, 'epoch': 9.54} | |
| 96%|ββββββββββ| 298/310 [51:03<02:06, 10.51s/it] 96%|ββββββββββ| 299/310 [51:11<01:47, 9.80s/it] {'loss': 0.08, 'grad_norm': 0.3330707550048828, 'learning_rate': 2.701881215295732e-06, 'epoch': 9.57} | |
| 96%|ββββββββββ| 299/310 [51:11<01:47, 9.80s/it] 97%|ββββββββββ| 300/310 [51:20<01:36, 9.61s/it] {'loss': 0.0538, 'grad_norm': 0.42321497201919556, 'learning_rate': 2.2338347549742956e-06, 'epoch': 9.6} | |
| 97%|ββββββββββ| 300/310 [51:20<01:36, 9.61s/it]/home/aiops/duanky/miniconda3/envs/hiddenlanguage/lib/python3.11/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. | |
| warnings.warn( | |
| 97%|ββββββββββ| 301/310 [51:33<01:33, 10.41s/it] {'loss': 0.0402, 'grad_norm': 0.4658825695514679, 'learning_rate': 1.81004764641306e-06, 'epoch': 9.63} | |
| 97%|ββββββββββ| 301/310 [51:33<01:33, 10.41s/it] 97%|ββββββββββ| 302/310 [51:43<01:23, 10.44s/it] {'loss': 0.082, 'grad_norm': 0.5052266120910645, 'learning_rate': 1.4306147316325291e-06, 'epoch': 9.66} | |
| 97%|ββββββββββ| 302/310 [51:43<01:23, 10.44s/it] 98%|ββββββββββ| 303/310 [51:52<01:10, 10.00s/it] {'loss': 0.0439, 'grad_norm': 0.3762564957141876, 'learning_rate': 1.0956209263453421e-06, 'epoch': 9.7} | |
| 98%|ββββββββββ| 303/310 [51:52<01:10, 10.00s/it] 98%|ββββββββββ| 304/310 [52:03<01:01, 10.19s/it] {'loss': 0.0829, 'grad_norm': 0.4863276183605194, 'learning_rate': 8.051412009521864e-07, 'epoch': 9.73} | |
| 98%|ββββββββββ| 304/310 [52:03<01:01, 10.19s/it] 98%|ββββββββββ| 305/310 [52:13<00:50, 10.16s/it] {'loss': 0.0496, 'grad_norm': 0.39549386501312256, 'learning_rate': 5.592405637639741e-07, 'epoch': 9.76} | |
| 98%|ββββββββββ| 305/310 [52:13<00:50, 10.16s/it] 99%|ββββββββββ| 306/310 [52:22<00:39, 9.79s/it] {'loss': 0.053, 'grad_norm': 0.6217253804206848, 'learning_rate': 3.5797404645296906e-07, 'epoch': 9.79} | |
| 99%|ββββββββββ| 306/310 [52:22<00:39, 9.79s/it] 99%|ββββββββββ| 307/310 [52:34<00:31, 10.58s/it] {'loss': 0.112, 'grad_norm': 0.5559027791023254, 'learning_rate': 2.0138669173708213e-07, 'epoch': 9.82} | |
| 99%|ββββββββββ| 307/310 [52:34<00:31, 10.58s/it] 99%|ββββββββββ| 308/310 [52:43<00:20, 10.09s/it] {'loss': 0.0322, 'grad_norm': 0.42121702432632446, 'learning_rate': 8.951354329933547e-08, 'epoch': 9.86} | |
| 99%|ββββββββββ| 308/310 [52:43<00:20, 10.09s/it] 100%|ββββββββββ| 309/310 [52:54<00:10, 10.35s/it] {'loss': 0.0873, 'grad_norm': 0.40787166357040405, 'learning_rate': 2.2379637945313392e-08, 'epoch': 9.89} | |
| 100%|ββββββββββ| 309/310 [52:54<00:10, 10.35s/it] 100%|ββββββββββ| 310/310 [53:02<00:00, 9.64s/it] {'loss': 0.0573, 'grad_norm': 0.4643600285053253, 'learning_rate': 0.0, 'epoch': 9.92} | |
| 100%|ββββββββββ| 310/310 [53:02<00:00, 9.64s/it] {'train_runtime': 3191.6656, 'train_samples_per_second': 3.133, 'train_steps_per_second': 0.097, 'train_loss': 0.8500084597016534, 'epoch': 9.92} | |
| 100%|ββββββββββ| 310/310 [53:04<00:00, 9.64s/it] 100%|ββββββββββ| 310/310 [53:04<00:00, 10.27s/it] | |
| wandb: | |
| wandb: Run history: | |
| wandb: train/epoch βββββββββββββββββββββ β β β β β ββββββββββββββ | |
| wandb: train/global_step βββββββββββββββββββββ β β β β β ββββββββββββββ | |
| wandb: train/grad_norm ββββ ββββββ β βββ ββββββ β β ββ βββ βββββββββββββ | |
| wandb: train/learning_rate ββββββββ β ββββββββββββββββ β β βββββββββββββ | |
| wandb: train/loss ββββββββββ βββββ β ββββββββββββββββββββββββ | |
| wandb: | |
| wandb: Run summary: | |
| wandb: total_flos 4.109458717571809e+17 | |
| wandb: train/epoch 9.92 | |
| wandb: train/global_step 310 | |
| wandb: train/grad_norm 0.46436 | |
| wandb: train/learning_rate 0.0 | |
| wandb: train/loss 0.0573 | |
| wandb: train_loss 0.85001 | |
| wandb: train_runtime 3191.6656 | |
| wandb: train_samples_per_second 3.133 | |
| wandb: train_steps_per_second 0.097 | |
| wandb: | |
| wandb: You can sync this run to the cloud by running: | |
| wandb: wandb sync /home/aiops/duanky/llm-attacks/instruction_tuning_experiments/wandb/offline-run-20241020_074630-wlpdreib | |
| wandb: Find logs at: ./wandb/offline-run-20241020_074630-wlpdreib/logs | |