taguser committed on Apr 3

Commit

224d7be

verified

1 Parent(s): 8a46f21

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.

Files changed (50)

card.json +24 -0
checkpoint-1413/README.md +202 -0
checkpoint-1413/adapter_config.json +39 -0
checkpoint-1413/merges.txt +0 -0
checkpoint-1413/special_tokens_map.json +31 -0
checkpoint-1413/tokenizer_config.json +209 -0
checkpoint-157/README.md +202 -0
checkpoint-157/adapter_config.json +39 -0
checkpoint-157/added_tokens.json +24 -0
checkpoint-157/merges.txt +0 -0
checkpoint-157/special_tokens_map.json +31 -0
checkpoint-157/tokenizer_config.json +209 -0
checkpoint-157/trainer_state.json +251 -0
checkpoint-157/vocab.json +0 -0
checkpoint-1570/README.md +202 -0
checkpoint-1570/adapter_config.json +39 -0
checkpoint-1570/added_tokens.json +24 -0
checkpoint-1570/merges.txt +0 -0
checkpoint-1570/special_tokens_map.json +31 -0
checkpoint-1570/tokenizer_config.json +209 -0
checkpoint-1570/trainer_state.json +2232 -0
checkpoint-1570/vocab.json +0 -0
checkpoint-1727/README.md +202 -0
checkpoint-1727/adapter_config.json +39 -0
checkpoint-1727/added_tokens.json +24 -0
checkpoint-1727/merges.txt +0 -0
checkpoint-1727/special_tokens_map.json +31 -0
checkpoint-1727/tokenizer_config.json +209 -0
checkpoint-1727/trainer_state.json +2449 -0
checkpoint-1727/vocab.json +0 -0
checkpoint-1884/README.md +202 -0
checkpoint-1884/adapter_config.json +39 -0
checkpoint-1884/added_tokens.json +24 -0
checkpoint-1884/merges.txt +0 -0
checkpoint-1884/special_tokens_map.json +31 -0
checkpoint-1884/tokenizer_config.json +209 -0
checkpoint-1884/trainer_state.json +2666 -0
checkpoint-1884/vocab.json +0 -0
checkpoint-2041/README.md +202 -0
checkpoint-2041/adapter_config.json +39 -0
checkpoint-2041/added_tokens.json +24 -0
checkpoint-2041/merges.txt +0 -0
checkpoint-2041/special_tokens_map.json +31 -0
checkpoint-2041/tokenizer_config.json +209 -0
checkpoint-2041/trainer_state.json +2890 -0
checkpoint-2041/vocab.json +0 -0
checkpoint-2355/adapter_config.json +39 -0
checkpoint-2355/tokenizer_config.json +209 -0
checkpoint-2669/README.md +202 -0
checkpoint-2669/added_tokens.json +24 -0

card.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "openshift-microshift-epoch30-2025-Apr-03",
+    "base_model": "Qwen/Qwen2.5-Coder-14B-Instruct" ,
+    "context_length": 8192,
+    "model_type": "qwen",
+    "quantized": "true",
+    "finetune_steps": [
+        {
+            "base_model": "Qwen/Qwen2.5-Coder-14B-Instruct",
+            "step": 2,
+            "data": "parsed_data",
+            "epochs": "30",
+            "batch_size": "8",
+            "dataset_size": "1260",
+            "num_tests": ""
+        }
+    ],
+    "project": "openshift/microshift",
+    "prompt_template": {
+        "user_tag": "<|start_header_id|>user<|end_header_id|>",
+        "end_tag": "<|eot_id|>",
+        "assistant_tag": "<|start_header_id|>assistant<|end_header_id|>"
+    }
+}
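
The `finetune_steps` values above can be cross-checked against the checkpoint directory names in this commit. The sketch below is a minimal consistency check, not the training script: it assumes `batch_size` is the effective batch per optimizer step and that one checkpoint is written per epoch, neither of which the commit states. The step list is copied from the file list above.

```python
import json
import math

# Values copied from card.json; note they are stored as strings there.
step_cfg = json.loads('{"epochs": "30", "batch_size": "8", "dataset_size": "1260"}')

epochs = int(step_cfg["epochs"])
batch_size = int(step_cfg["batch_size"])
dataset_size = int(step_cfg["dataset_size"])

# Assumption: whole optimizer steps per epoch, rounding down (1260 / 8 = 157.5).
steps_per_epoch = dataset_size // batch_size

# Checkpoint step numbers visible in this (truncated) file list.
checkpoint_steps = [157, 1413, 1570, 1727, 1884, 2041, 2355, 2669]

# Map each checkpoint to the epoch it would correspond to under the assumption.
epoch_of = {step: step // steps_per_epoch for step in checkpoint_steps}

# Fractional epoch after 157 steps when an epoch is 157.5 steps long.
fractional_epoch = 157 / (dataset_size / batch_size)

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("checkpoint -> epoch:", epoch_of)
    print("fractional epoch at step 157:", fractional_epoch)
```

Under these assumptions every listed checkpoint falls on a multiple of 157 steps, and the fractional value agrees with the `"epoch": 0.9968253968253968` recorded in `checkpoint-157/trainer_state.json`. How the batch of 8 was split between per-device batch size and gradient accumulation is not recoverable from this view.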

checkpoint-1413/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-1413/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
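
To make the LoRA settings above concrete, here is a small sketch that derives the adapter's scaling factor and the number of trainable parameters added to a single linear layer. It uses only the fields shown in `adapter_config.json`; the layer dimensions in the usage lines are hypothetical placeholders and are not read from this commit.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
adapter_cfg = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "lora_dropout": 0.1,
  "use_rslora": false,
  "target_modules": ["k_proj", "o_proj", "q_proj", "up_proj",
                     "v_proj", "down_proj", "gate_proj"]
}
""")


def lora_scaling(cfg: dict) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    if cfg.get("use_rslora"):
        # Rank-stabilized LoRA scales by alpha / sqrt(r).
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    # Standard LoRA scales by alpha / r.
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_linear(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one adapted linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(adapter_cfg))
    # Hypothetical square projection of width 5120 (placeholder, not from the commit).
    print("params:", lora_params_per_linear(5120, 5120, adapter_cfg["r"]))
```

With `lora_alpha` 16 and `r` 8 the update is scaled by 2.0, and all seven attention and MLP projections listed in `target_modules` receive an adapter of this form.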

checkpoint-1413/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-1413/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1413/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
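
The `chat_template` above is a Jinja template that produces ChatML-style prompts. As a rough illustration, the sketch below re-implements only its no-tools branch for plain `system`, `user`, and `assistant` text messages in standard-library Python; tool calls and tool responses are deliberately left out, and the function name `render_chatml` is a hypothetical helper, not part of any library. In practice the tokenizer's own template rendering would be the authoritative path.

```python
# Default system prompt, copied from the chat_template string above.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render plain-text messages the way the template's no-tools branch does."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for message in rest:
        # Each remaining turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "List the MicroShift services."}]))
```

Note that `card.json` in this commit lists `<|start_header_id|>` and `<|eot_id|>` style tags in its `prompt_template`, which do not appear in this tokenizer's `added_tokens_decoder`; the commit does not say which format was used for training.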

checkpoint-157/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-157/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-157/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-157/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-157/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-157/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
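
Reviewer note: the `chat_template` above is easier to check against a small reference renderer. The following is a minimal sketch of its no-tools branch only, written in plain Python rather than Jinja. The function name `render_chatml` and the constant `DEFAULT_SYSTEM` are hypothetical names introduced here; tool calls and `tool` messages are deliberately not handled.

```python
# Default system prompt, copied from the chat_template in tokenizer_config.json.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=False):
    """Render user/system/assistant turns like the template's no-tools branch.

    Assumes every message is a dict with 'role' and 'content' keys and that
    no assistant message carries tool calls.
    """
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for message in rest:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leaves the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chatml([{"role": "user", "content": "hi"}], add_generation_prompt=True))
```

Comparing this output with what the tokenizer's own template produces for the same messages is a quick way to confirm that the template survived the upload intact.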

checkpoint-157/trainer_state.json ADDED

	@@ -0,0 +1,251 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.9968253968253968,
+  "eval_steps": 500,
+  "global_step": 157,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.031746031746031744,
+      "grad_norm": 0.5545095205307007,
+      "learning_rate": 5.307855626326963e-07,
+      "loss": 3.7162,
+      "step": 5
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.6163601279258728,
+      "learning_rate": 1.0615711252653927e-06,
+      "loss": 3.9388,
+      "step": 10
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.5541530847549438,
+      "learning_rate": 1.5923566878980892e-06,
+      "loss": 3.9165,
+      "step": 15
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.457332044839859,
+      "learning_rate": 2.1231422505307854e-06,
+      "loss": 3.7326,
+      "step": 20
+    },
+    {
+      "epoch": 0.15873015873015872,
+      "grad_norm": 0.5335279107093811,
+      "learning_rate": 2.653927813163482e-06,
+      "loss": 3.8251,
+      "step": 25
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.7080379724502563,
+      "learning_rate": 3.1847133757961785e-06,
+      "loss": 3.7534,
+      "step": 30
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.520993709564209,
+      "learning_rate": 3.715498938428875e-06,
+      "loss": 3.898,
+      "step": 35
+    },
+    {
+      "epoch": 0.25396825396825395,
+      "grad_norm": 0.5451405644416809,
+      "learning_rate": 4.246284501061571e-06,
+      "loss": 3.8951,
+      "step": 40
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.6205154657363892,
+      "learning_rate": 4.777070063694268e-06,
+      "loss": 3.7666,
+      "step": 45
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.7404439449310303,
+      "learning_rate": 5.307855626326964e-06,
+      "loss": 4.0258,
+      "step": 50
+    },
+    {
+      "epoch": 0.3492063492063492,
+      "grad_norm": 0.6272220015525818,
+      "learning_rate": 5.838641188959661e-06,
+      "loss": 3.8464,
+      "step": 55
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.7744691967964172,
+      "learning_rate": 6.369426751592357e-06,
+      "loss": 3.7299,
+      "step": 60
+    },
+    {
+      "epoch": 0.4126984126984127,
+      "grad_norm": 0.8805738687515259,
+      "learning_rate": 6.900212314225053e-06,
+      "loss": 3.5008,
+      "step": 65
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.0740723609924316,
+      "learning_rate": 7.43099787685775e-06,
+      "loss": 3.7552,
+      "step": 70
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 0.965708315372467,
+      "learning_rate": 7.961783439490445e-06,
+      "loss": 3.5516,
+      "step": 75
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "grad_norm": 0.9812778234481812,
+      "learning_rate": 8.492569002123141e-06,
+      "loss": 3.6003,
+      "step": 80
+    },
+    {
+      "epoch": 0.5396825396825397,
+      "grad_norm": 0.8831024169921875,
+      "learning_rate": 9.023354564755838e-06,
+      "loss": 3.613,
+      "step": 85
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.8358364105224609,
+      "learning_rate": 9.554140127388536e-06,
+      "loss": 3.1858,
+      "step": 90
+    },
+    {
+      "epoch": 0.6031746031746031,
+      "grad_norm": 1.0740444660186768,
+      "learning_rate": 1.0084925690021232e-05,
+      "loss": 3.0937,
+      "step": 95
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 1.0987530946731567,
+      "learning_rate": 1.0615711252653929e-05,
+      "loss": 3.154,
+      "step": 100
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.2300925254821777,
+      "learning_rate": 1.1146496815286625e-05,
+      "loss": 2.9414,
+      "step": 105
+    },
+    {
+      "epoch": 0.6984126984126984,
+      "grad_norm": 1.2214170694351196,
+      "learning_rate": 1.1677282377919321e-05,
+      "loss": 2.9464,
+      "step": 110
+    },
+    {
+      "epoch": 0.7301587301587301,
+      "grad_norm": 1.2803975343704224,
+      "learning_rate": 1.2208067940552018e-05,
+      "loss": 2.8921,
+      "step": 115
+    },
+    {
+      "epoch": 0.7619047619047619,
+      "grad_norm": 1.2232719659805298,
+      "learning_rate": 1.2738853503184714e-05,
+      "loss": 2.5252,
+      "step": 120
+    },
+    {
+      "epoch": 0.7936507936507936,
+      "grad_norm": 1.204835295677185,
+      "learning_rate": 1.326963906581741e-05,
+      "loss": 2.5215,
+      "step": 125
+    },
+    {
+      "epoch": 0.8253968253968254,
+      "grad_norm": 1.4095579385757446,
+      "learning_rate": 1.3800424628450107e-05,
+      "loss": 2.136,
+      "step": 130
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.4166598320007324,
+      "learning_rate": 1.4331210191082803e-05,
+      "loss": 2.2653,
+      "step": 135
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.3040446043014526,
+      "learning_rate": 1.48619957537155e-05,
+      "loss": 2.0193,
+      "step": 140
+    },
+    {
+      "epoch": 0.9206349206349206,
+      "grad_norm": 1.4114688634872437,
+      "learning_rate": 1.5392781316348196e-05,
+      "loss": 1.7935,
+      "step": 145
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.8066726922988892,
+      "learning_rate": 1.592356687898089e-05,
+      "loss": 1.5731,
+      "step": 150
+    },
+    {
+      "epoch": 0.9841269841269841,
+      "grad_norm": 1.4303158521652222,
+      "learning_rate": 1.6454352441613588e-05,
+      "loss": 1.6552,
+      "step": 155
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 4710,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 30,
+  "save_steps": 157,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3382367259459584.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
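
Reviewer note: the `log_history` above looks like a linear learning-rate warmup that is still in progress at step 155. The sketch below checks that reading on a few entries copied from the file. The helper name `lr_per_step` is hypothetical, and the 10% warmup fraction used at the end is an assumption for illustration only, since the warmup setting is not recorded in `trainer_state.json`.

```python
# A few (step, learning_rate) pairs copied from log_history above.
log_history = [
    {"step": 5, "learning_rate": 5.307855626326963e-07},
    {"step": 50, "learning_rate": 5.307855626326964e-06},
    {"step": 100, "learning_rate": 1.0615711252653929e-05},
    {"step": 155, "learning_rate": 1.6454352441613588e-05},
]

MAX_STEPS = 4710   # "max_steps" in the file
SAVE_STEPS = 157   # "save_steps" in the file


def lr_per_step(entries):
    """Return learning_rate / step for each entry; constant under linear warmup from zero."""
    return [e["learning_rate"] / e["step"] for e in entries]


slopes = lr_per_step(log_history)
slope = slopes[0]

# Under linear warmup from zero every ratio should match the first one.
is_linear = all(abs(s - slope) < 1e-12 for s in slopes)

# Number of checkpoints the run would write if it reaches max_steps.
expected_checkpoints = MAX_STEPS // SAVE_STEPS

# ASSUMPTION: warmup covers 10% of max_steps. Not stated anywhere in the file.
assumed_warmup_steps = MAX_STEPS // 10
implied_peak_lr = slope * assumed_warmup_steps

if __name__ == "__main__":
    print(f"linear warmup so far: {is_linear}")
    print(f"lr increase per step: {slope:.6e}")
    print(f"checkpoints expected: {expected_checkpoints}")
    print(f"peak lr under the 10% warmup assumption: {implied_peak_lr:.3e}")
```

If the real warmup length differs from the assumed 10%, only `implied_peak_lr` changes; the per-step slope is read directly from the logged values.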

checkpoint-157/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1570/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-1570/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
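
Reviewer note: two quantities follow directly from this adapter config: the LoRA scaling factor and the number of adapter parameters per targeted layer. The sketch below computes both. The projection shapes in `ASSUMED_SHAPES` are not part of this diff; they are assumptions recalled for the base model and should be checked against its `config.json` before the numbers are relied on. The function names are hypothetical.

```python
import math

# Values copied from adapter_config.json above.
adapter_config = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
    "lora_bias": False,
    "target_modules": [
        "k_proj", "o_proj", "q_proj", "up_proj", "v_proj", "down_proj", "gate_proj",
    ],
}

# ASSUMPTION: (in_features, out_features) per projection. These shapes are not
# in the diff; verify them against the base model's config.json.
ASSUMED_SHAPES = {
    "q_proj": (5120, 5120),
    "k_proj": (5120, 1024),
    "v_proj": (5120, 1024),
    "o_proj": (5120, 5120),
    "gate_proj": (5120, 13824),
    "up_proj": (5120, 13824),
    "down_proj": (13824, 5120),
}


def lora_scaling(cfg):
    """Factor applied to the low-rank update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_params_per_layer(cfg, shapes):
    """Count A (r x in) and B (out x r) weights for every targeted module in one layer."""
    r = cfg["r"]
    return sum(r * (shapes[name][0] + shapes[name][1]) for name in cfg["target_modules"])


if __name__ == "__main__":
    print("scaling:", lora_scaling(adapter_config))
    print("adapter params per layer (assumed shapes):",
          lora_params_per_layer(adapter_config, ASSUMED_SHAPES))
```

Multiplying the per-layer count by the base model's number of decoder layers gives the total adapter size, again only as accurate as the assumed shapes.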

checkpoint-1570/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-1570/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1570/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1570/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
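
Reviewer note: this config pads on the right with `<|endoftext|>` and ends turns with `<|im_end|>`, which are distinct tokens. The sketch below illustrates what right padding with a matching attention mask looks like for a small batch. The token IDs come from `added_tokens.json` in this commit; the helper `pad_right` is a hypothetical stand-in, not the tokenizer's own padding code.

```python
PAD_ID = 151643  # "<|endoftext|>", the pad_token in tokenizer_config.json
EOS_ID = 151645  # "<|im_end|>", the eos_token in tokenizer_config.json


def pad_right(batch, pad_id=PAD_ID):
    """Right-pad token ID lists to a common length and build the attention mask.

    Returns (input_ids, attention_mask); mask entries are 1 for real tokens
    and 0 for padding, matching "padding_side": "right".
    """
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask


if __name__ == "__main__":
    # Token IDs other than EOS_ID are arbitrary placeholders.
    ids, mask = pad_right([[11, 12, 13, EOS_ID], [21, EOS_ID]])
    print(ids)
    print(mask)
```

Because the pad and EOS tokens differ, masking padded positions out of the loss does not also hide the end-of-turn token, which is usually what is wanted when fine-tuning on chat data.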

checkpoint-1570/trainer_state.json ADDED

	@@ -0,0 +1,2232 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 9.93968253968254,
+  "eval_steps": 500,
+  "global_step": 1570,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.031746031746031744,
+      "grad_norm": 0.5545095205307007,
+      "learning_rate": 5.307855626326963e-07,
+      "loss": 3.7162,
+      "step": 5
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.6163601279258728,
+      "learning_rate": 1.0615711252653927e-06,
+      "loss": 3.9388,
+      "step": 10
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.5541530847549438,
+      "learning_rate": 1.5923566878980892e-06,
+      "loss": 3.9165,
+      "step": 15
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.457332044839859,
+      "learning_rate": 2.1231422505307854e-06,
+      "loss": 3.7326,
+      "step": 20
+    },
+    {
+      "epoch": 0.15873015873015872,
+      "grad_norm": 0.5335279107093811,
+      "learning_rate": 2.653927813163482e-06,
+      "loss": 3.8251,
+      "step": 25
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.7080379724502563,
+      "learning_rate": 3.1847133757961785e-06,
+      "loss": 3.7534,
+      "step": 30
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.520993709564209,
+      "learning_rate": 3.715498938428875e-06,
+      "loss": 3.898,
+      "step": 35
+    },
+    {
+      "epoch": 0.25396825396825395,
+      "grad_norm": 0.5451405644416809,
+      "learning_rate": 4.246284501061571e-06,
+      "loss": 3.8951,
+      "step": 40
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.6205154657363892,
+      "learning_rate": 4.777070063694268e-06,
+      "loss": 3.7666,
+      "step": 45
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.7404439449310303,
+      "learning_rate": 5.307855626326964e-06,
+      "loss": 4.0258,
+      "step": 50
+    },
+    {
+      "epoch": 0.3492063492063492,
+      "grad_norm": 0.6272220015525818,
+      "learning_rate": 5.838641188959661e-06,
+      "loss": 3.8464,
+      "step": 55
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.7744691967964172,
+      "learning_rate": 6.369426751592357e-06,
+      "loss": 3.7299,
+      "step": 60
+    },
+    {
+      "epoch": 0.4126984126984127,
+      "grad_norm": 0.8805738687515259,
+      "learning_rate": 6.900212314225053e-06,
+      "loss": 3.5008,
+      "step": 65
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.0740723609924316,
+      "learning_rate": 7.43099787685775e-06,
+      "loss": 3.7552,
+      "step": 70
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 0.965708315372467,
+      "learning_rate": 7.961783439490445e-06,
+      "loss": 3.5516,
+      "step": 75
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "grad_norm": 0.9812778234481812,
+      "learning_rate": 8.492569002123141e-06,
+      "loss": 3.6003,
+      "step": 80
+    },
+    {
+      "epoch": 0.5396825396825397,
+      "grad_norm": 0.8831024169921875,
+      "learning_rate": 9.023354564755838e-06,
+      "loss": 3.613,
+      "step": 85
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.8358364105224609,
+      "learning_rate": 9.554140127388536e-06,
+      "loss": 3.1858,
+      "step": 90
+    },
+    {
+      "epoch": 0.6031746031746031,
+      "grad_norm": 1.0740444660186768,
+      "learning_rate": 1.0084925690021232e-05,
+      "loss": 3.0937,
+      "step": 95
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 1.0987530946731567,
+      "learning_rate": 1.0615711252653929e-05,
+      "loss": 3.154,
+      "step": 100
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.2300925254821777,
+      "learning_rate": 1.1146496815286625e-05,
+      "loss": 2.9414,
+      "step": 105
+    },
+    {
+      "epoch": 0.6984126984126984,
+      "grad_norm": 1.2214170694351196,
+      "learning_rate": 1.1677282377919321e-05,
+      "loss": 2.9464,
+      "step": 110
+    },
+    {
+      "epoch": 0.7301587301587301,
+      "grad_norm": 1.2803975343704224,
+      "learning_rate": 1.2208067940552018e-05,
+      "loss": 2.8921,
+      "step": 115
+    },
+    {
+      "epoch": 0.7619047619047619,
+      "grad_norm": 1.2232719659805298,
+      "learning_rate": 1.2738853503184714e-05,
+      "loss": 2.5252,
+      "step": 120
+    },
+    {
+      "epoch": 0.7936507936507936,
+      "grad_norm": 1.204835295677185,
+      "learning_rate": 1.326963906581741e-05,
+      "loss": 2.5215,
+      "step": 125
+    },
+    {
+      "epoch": 0.8253968253968254,
+      "grad_norm": 1.4095579385757446,
+      "learning_rate": 1.3800424628450107e-05,
+      "loss": 2.136,
+      "step": 130
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.4166598320007324,
+      "learning_rate": 1.4331210191082803e-05,
+      "loss": 2.2653,
+      "step": 135
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.3040446043014526,
+      "learning_rate": 1.48619957537155e-05,
+      "loss": 2.0193,
+      "step": 140
+    },
+    {
+      "epoch": 0.9206349206349206,
+      "grad_norm": 1.4114688634872437,
+      "learning_rate": 1.5392781316348196e-05,
+      "loss": 1.7935,
+      "step": 145
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.8066726922988892,
+      "learning_rate": 1.592356687898089e-05,
+      "loss": 1.5731,
+      "step": 150
+    },
+    {
+      "epoch": 0.9841269841269841,
+      "grad_norm": 1.4303158521652222,
+      "learning_rate": 1.6454352441613588e-05,
+      "loss": 1.6552,
+      "step": 155
+    },
+    {
+      "epoch": 1.0126984126984127,
+      "grad_norm": 1.6671762466430664,
+      "learning_rate": 1.6985138004246283e-05,
+      "loss": 1.6973,
+      "step": 160
+    },
+    {
+      "epoch": 1.0444444444444445,
+      "grad_norm": 1.5719650983810425,
+      "learning_rate": 1.751592356687898e-05,
+      "loss": 1.312,
+      "step": 165
+    },
+    {
+      "epoch": 1.0761904761904761,
+      "grad_norm": 1.4845054149627686,
+      "learning_rate": 1.8046709129511676e-05,
+      "loss": 1.3601,
+      "step": 170
+    },
+    {
+      "epoch": 1.107936507936508,
+      "grad_norm": 1.1172235012054443,
+      "learning_rate": 1.8577494692144374e-05,
+      "loss": 1.3137,
+      "step": 175
+    },
+    {
+      "epoch": 1.1396825396825396,
+      "grad_norm": 1.9621731042861938,
+      "learning_rate": 1.910828025477707e-05,
+      "loss": 1.1778,
+      "step": 180
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.7722721099853516,
+      "learning_rate": 1.963906581740977e-05,
+      "loss": 1.4534,
+      "step": 185
+    },
+    {
+      "epoch": 1.2031746031746031,
+      "grad_norm": 1.3677467107772827,
+      "learning_rate": 2.0169851380042464e-05,
+      "loss": 1.3356,
+      "step": 190
+    },
+    {
+      "epoch": 1.234920634920635,
+      "grad_norm": 1.3260482549667358,
+      "learning_rate": 2.0700636942675162e-05,
+      "loss": 1.0876,
+      "step": 195
+    },
+    {
+      "epoch": 1.2666666666666666,
+      "grad_norm": 1.5176818370819092,
+      "learning_rate": 2.1231422505307857e-05,
+      "loss": 1.1602,
+      "step": 200
+    },
+    {
+      "epoch": 1.2984126984126985,
+      "grad_norm": 1.2793077230453491,
+      "learning_rate": 2.1762208067940555e-05,
+      "loss": 1.1505,
+      "step": 205
+    },
+    {
+      "epoch": 1.33015873015873,
+      "grad_norm": 1.196784257888794,
+      "learning_rate": 2.229299363057325e-05,
+      "loss": 1.0664,
+      "step": 210
+    },
+    {
+      "epoch": 1.361904761904762,
+      "grad_norm": 1.303207516670227,
+      "learning_rate": 2.2823779193205948e-05,
+      "loss": 1.2557,
+      "step": 215
+    },
+    {
+      "epoch": 1.3936507936507936,
+      "grad_norm": 1.2853388786315918,
+      "learning_rate": 2.3354564755838642e-05,
+      "loss": 1.0704,
+      "step": 220
+    },
+    {
+      "epoch": 1.4253968253968254,
+      "grad_norm": 1.381369948387146,
+      "learning_rate": 2.388535031847134e-05,
+      "loss": 1.1371,
+      "step": 225
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 1.8012712001800537,
+      "learning_rate": 2.4416135881104035e-05,
+      "loss": 1.248,
+      "step": 230
+    },
+    {
+      "epoch": 1.488888888888889,
+      "grad_norm": 1.7397032976150513,
+      "learning_rate": 2.4946921443736733e-05,
+      "loss": 1.2782,
+      "step": 235
+    },
+    {
+      "epoch": 1.5206349206349206,
+      "grad_norm": 1.4026210308074951,
+      "learning_rate": 2.5477707006369428e-05,
+      "loss": 1.154,
+      "step": 240
+    },
+    {
+      "epoch": 1.5523809523809524,
+      "grad_norm": 1.2906067371368408,
+      "learning_rate": 2.6008492569002126e-05,
+      "loss": 0.9141,
+      "step": 245
+    },
+    {
+      "epoch": 1.5841269841269843,
+      "grad_norm": 1.265598177909851,
+      "learning_rate": 2.653927813163482e-05,
+      "loss": 1.0625,
+      "step": 250
+    },
+    {
+      "epoch": 1.615873015873016,
+      "grad_norm": 1.6044715642929077,
+      "learning_rate": 2.707006369426752e-05,
+      "loss": 0.9624,
+      "step": 255
+    },
+    {
+      "epoch": 1.6476190476190475,
+      "grad_norm": 1.4612747430801392,
+      "learning_rate": 2.7600849256900213e-05,
+      "loss": 1.0413,
+      "step": 260
+    },
+    {
+      "epoch": 1.6793650793650794,
+      "grad_norm": 1.6222745180130005,
+      "learning_rate": 2.8131634819532908e-05,
+      "loss": 1.0929,
+      "step": 265
+    },
+    {
+      "epoch": 1.7111111111111112,
+      "grad_norm": 1.1456222534179688,
+      "learning_rate": 2.8662420382165606e-05,
+      "loss": 0.9957,
+      "step": 270
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.5746041536331177,
+      "learning_rate": 2.91932059447983e-05,
+      "loss": 1.0274,
+      "step": 275
+    },
+    {
+      "epoch": 1.7746031746031745,
+      "grad_norm": 1.3407832384109497,
+      "learning_rate": 2.9723991507431e-05,
+      "loss": 0.9487,
+      "step": 280
+    },
+    {
+      "epoch": 1.8063492063492064,
+      "grad_norm": 1.6232194900512695,
+      "learning_rate": 3.0254777070063693e-05,
+      "loss": 1.0966,
+      "step": 285
+    },
+    {
+      "epoch": 1.8380952380952382,
+      "grad_norm": 1.4920552968978882,
+      "learning_rate": 3.078556263269639e-05,
+      "loss": 0.9099,
+      "step": 290
+    },
+    {
+      "epoch": 1.8698412698412699,
+      "grad_norm": 1.2123301029205322,
+      "learning_rate": 3.1316348195329086e-05,
+      "loss": 1.0902,
+      "step": 295
+    },
+    {
+      "epoch": 1.9015873015873015,
+      "grad_norm": 1.2080968618392944,
+      "learning_rate": 3.184713375796178e-05,
+      "loss": 0.943,
+      "step": 300
+    },
+    {
+      "epoch": 1.9333333333333333,
+      "grad_norm": 1.190319299697876,
+      "learning_rate": 3.237791932059448e-05,
+      "loss": 0.7893,
+      "step": 305
+    },
+    {
+      "epoch": 1.9650793650793652,
+      "grad_norm": 1.5929204225540161,
+      "learning_rate": 3.2908704883227177e-05,
+      "loss": 1.0232,
+      "step": 310
+    },
+    {
+      "epoch": 1.9968253968253968,
+      "grad_norm": 1.0138347148895264,
+      "learning_rate": 3.343949044585987e-05,
+      "loss": 0.6693,
+      "step": 315
+    },
+    {
+      "epoch": 2.0253968253968253,
+      "grad_norm": 1.3012847900390625,
+      "learning_rate": 3.3970276008492566e-05,
+      "loss": 0.8355,
+      "step": 320
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.2264782190322876,
+      "learning_rate": 3.450106157112527e-05,
+      "loss": 0.9872,
+      "step": 325
+    },
+    {
+      "epoch": 2.088888888888889,
+      "grad_norm": 1.139275312423706,
+      "learning_rate": 3.503184713375796e-05,
+      "loss": 0.8662,
+      "step": 330
+    },
+    {
+      "epoch": 2.1206349206349207,
+      "grad_norm": 1.3836581707000732,
+      "learning_rate": 3.5562632696390657e-05,
+      "loss": 0.9549,
+      "step": 335
+    },
+    {
+      "epoch": 2.1523809523809523,
+      "grad_norm": 1.368600845336914,
+      "learning_rate": 3.609341825902335e-05,
+      "loss": 0.9195,
+      "step": 340
+    },
+    {
+      "epoch": 2.1841269841269844,
+      "grad_norm": 1.8793011903762817,
+      "learning_rate": 3.662420382165605e-05,
+      "loss": 0.8505,
+      "step": 345
+    },
+    {
+      "epoch": 2.215873015873016,
+      "grad_norm": 1.305284023284912,
+      "learning_rate": 3.715498938428875e-05,
+      "loss": 0.7755,
+      "step": 350
+    },
+    {
+      "epoch": 2.2476190476190476,
+      "grad_norm": 1.7851749658584595,
+      "learning_rate": 3.768577494692145e-05,
+      "loss": 0.9242,
+      "step": 355
+    },
+    {
+      "epoch": 2.2793650793650793,
+      "grad_norm": 1.4341535568237305,
+      "learning_rate": 3.821656050955414e-05,
+      "loss": 0.8221,
+      "step": 360
+    },
+    {
+      "epoch": 2.311111111111111,
+      "grad_norm": 1.39107346534729,
+      "learning_rate": 3.874734607218684e-05,
+      "loss": 0.6999,
+      "step": 365
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 1.2304264307022095,
+      "learning_rate": 3.927813163481954e-05,
+      "loss": 0.8362,
+      "step": 370
+    },
+    {
+      "epoch": 2.3746031746031746,
+      "grad_norm": 1.8470840454101562,
+      "learning_rate": 3.9808917197452234e-05,
+      "loss": 0.9398,
+      "step": 375
+    },
+    {
+      "epoch": 2.4063492063492062,
+      "grad_norm": 1.2533882856369019,
+      "learning_rate": 4.033970276008493e-05,
+      "loss": 0.7754,
+      "step": 380
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 1.5335006713867188,
+      "learning_rate": 4.087048832271762e-05,
+      "loss": 1.1124,
+      "step": 385
+    },
+    {
+      "epoch": 2.46984126984127,
+      "grad_norm": 1.5298357009887695,
+      "learning_rate": 4.1401273885350325e-05,
+      "loss": 1.017,
+      "step": 390
+    },
+    {
+      "epoch": 2.5015873015873016,
+      "grad_norm": 1.4403260946273804,
+      "learning_rate": 4.193205944798302e-05,
+      "loss": 0.8831,
+      "step": 395
+    },
+    {
+      "epoch": 2.533333333333333,
+      "grad_norm": 1.1528433561325073,
+      "learning_rate": 4.2462845010615714e-05,
+      "loss": 0.801,
+      "step": 400
+    },
+    {
+      "epoch": 2.565079365079365,
+      "grad_norm": 1.3371326923370361,
+      "learning_rate": 4.299363057324841e-05,
+      "loss": 0.8692,
+      "step": 405
+    },
+    {
+      "epoch": 2.596825396825397,
+      "grad_norm": 1.4064775705337524,
+      "learning_rate": 4.352441613588111e-05,
+      "loss": 0.9059,
+      "step": 410
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.4531422853469849,
+      "learning_rate": 4.4055201698513805e-05,
+      "loss": 0.7344,
+      "step": 415
+    },
+    {
+      "epoch": 2.66031746031746,
+      "grad_norm": 1.7043890953063965,
+      "learning_rate": 4.45859872611465e-05,
+      "loss": 0.8298,
+      "step": 420
+    },
+    {
+      "epoch": 2.6920634920634923,
+      "grad_norm": 1.5105586051940918,
+      "learning_rate": 4.5116772823779194e-05,
+      "loss": 0.7768,
+      "step": 425
+    },
+    {
+      "epoch": 2.723809523809524,
+      "grad_norm": 1.8101528882980347,
+      "learning_rate": 4.5647558386411895e-05,
+      "loss": 0.733,
+      "step": 430
+    },
+    {
+      "epoch": 2.7555555555555555,
+      "grad_norm": 1.6365174055099487,
+      "learning_rate": 4.617834394904459e-05,
+      "loss": 0.8061,
+      "step": 435
+    },
+    {
+      "epoch": 2.787301587301587,
+      "grad_norm": 1.7808202505111694,
+      "learning_rate": 4.6709129511677285e-05,
+      "loss": 0.8333,
+      "step": 440
+    },
+    {
+      "epoch": 2.819047619047619,
+      "grad_norm": 1.5223265886306763,
+      "learning_rate": 4.723991507430998e-05,
+      "loss": 0.7557,
+      "step": 445
+    },
+    {
+      "epoch": 2.850793650793651,
+      "grad_norm": 1.3064416646957397,
+      "learning_rate": 4.777070063694268e-05,
+      "loss": 0.8041,
+      "step": 450
+    },
+    {
+      "epoch": 2.8825396825396825,
+      "grad_norm": 1.8025637865066528,
+      "learning_rate": 4.8301486199575375e-05,
+      "loss": 0.9534,
+      "step": 455
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 1.924846887588501,
+      "learning_rate": 4.883227176220807e-05,
+      "loss": 0.9066,
+      "step": 460
+    },
+    {
+      "epoch": 2.9460317460317462,
+      "grad_norm": 1.9862899780273438,
+      "learning_rate": 4.9363057324840765e-05,
+      "loss": 0.7994,
+      "step": 465
+    },
+    {
+      "epoch": 2.977777777777778,
+      "grad_norm": 1.9615916013717651,
+      "learning_rate": 4.9893842887473466e-05,
+      "loss": 0.7045,
+      "step": 470
+    },
+    {
+      "epoch": 3.0063492063492063,
+      "grad_norm": 1.519852876663208,
+      "learning_rate": 4.999989014936042e-05,
+      "loss": 0.7212,
+      "step": 475
+    },
+    {
+      "epoch": 3.038095238095238,
+      "grad_norm": 1.9328887462615967,
+      "learning_rate": 4.999944388279162e-05,
+      "loss": 0.6598,
+      "step": 480
+    },
+    {
+      "epoch": 3.06984126984127,
+      "grad_norm": 2.0340709686279297,
+      "learning_rate": 4.999865434075176e-05,
+      "loss": 0.6829,
+      "step": 485
+    },
+    {
+      "epoch": 3.1015873015873017,
+      "grad_norm": 1.8775280714035034,
+      "learning_rate": 4.999752153408229e-05,
+      "loss": 0.6664,
+      "step": 490
+    },
+    {
+      "epoch": 3.1333333333333333,
+      "grad_norm": 2.385218381881714,
+      "learning_rate": 4.999604547833814e-05,
+      "loss": 0.6836,
+      "step": 495
+    },
+    {
+      "epoch": 3.165079365079365,
+      "grad_norm": 2.1743783950805664,
+      "learning_rate": 4.999422619378752e-05,
+      "loss": 0.7,
+      "step": 500
+    },
+    {
+      "epoch": 3.196825396825397,
+      "grad_norm": 2.20786452293396,
+      "learning_rate": 4.999206370541162e-05,
+      "loss": 0.7253,
+      "step": 505
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 1.8182263374328613,
+      "learning_rate": 4.998955804290425e-05,
+      "loss": 0.6824,
+      "step": 510
+    },
+    {
+      "epoch": 3.2603174603174603,
+      "grad_norm": 2.2959372997283936,
+      "learning_rate": 4.9986709240671495e-05,
+      "loss": 0.601,
+      "step": 515
+    },
+    {
+      "epoch": 3.292063492063492,
+      "grad_norm": 2.385838031768799,
+      "learning_rate": 4.998351733783116e-05,
+      "loss": 0.7417,
+      "step": 520
+    },
+    {
+      "epoch": 3.323809523809524,
+      "grad_norm": 2.0416879653930664,
+      "learning_rate": 4.997998237821233e-05,
+      "loss": 0.6463,
+      "step": 525
+    },
+    {
+      "epoch": 3.3555555555555556,
+      "grad_norm": 2.2781031131744385,
+      "learning_rate": 4.9976104410354654e-05,
+      "loss": 0.6998,
+      "step": 530
+    },
+    {
+      "epoch": 3.3873015873015873,
+      "grad_norm": 2.146778106689453,
+      "learning_rate": 4.9971883487507775e-05,
+      "loss": 0.7694,
+      "step": 535
+    },
+    {
+      "epoch": 3.419047619047619,
+      "grad_norm": 2.1369104385375977,
+      "learning_rate": 4.9967319667630567e-05,
+      "loss": 0.6615,
+      "step": 540
+    },
+    {
+      "epoch": 3.450793650793651,
+      "grad_norm": 2.4529733657836914,
+      "learning_rate": 4.996241301339029e-05,
+      "loss": 0.6109,
+      "step": 545
+    },
+    {
+      "epoch": 3.4825396825396826,
+      "grad_norm": 2.07030987739563,
+      "learning_rate": 4.995716359216183e-05,
+      "loss": 0.7611,
+      "step": 550
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.4329919815063477,
+      "learning_rate": 4.995157147602669e-05,
+      "loss": 0.7515,
+      "step": 555
+    },
+    {
+      "epoch": 3.546031746031746,
+      "grad_norm": 2.056351900100708,
+      "learning_rate": 4.994563674177202e-05,
+      "loss": 0.6885,
+      "step": 560
+    },
+    {
+      "epoch": 3.5777777777777775,
+      "grad_norm": 2.3665318489074707,
+      "learning_rate": 4.993935947088958e-05,
+      "loss": 0.6271,
+      "step": 565
+    },
+    {
+      "epoch": 3.6095238095238096,
+      "grad_norm": 2.677706480026245,
+      "learning_rate": 4.993273974957463e-05,
+      "loss": 0.5586,
+      "step": 570
+    },
+    {
+      "epoch": 3.641269841269841,
+      "grad_norm": 3.422136068344116,
+      "learning_rate": 4.9925777668724685e-05,
+      "loss": 0.7552,
+      "step": 575
+    },
+    {
+      "epoch": 3.6730158730158733,
+      "grad_norm": 2.4525184631347656,
+      "learning_rate": 4.991847332393835e-05,
+      "loss": 0.7367,
+      "step": 580
+    },
+    {
+      "epoch": 3.704761904761905,
+      "grad_norm": 2.4242067337036133,
+      "learning_rate": 4.991082681551396e-05,
+      "loss": 0.7044,
+      "step": 585
+    },
+    {
+      "epoch": 3.7365079365079366,
+      "grad_norm": 1.8419867753982544,
+      "learning_rate": 4.9902838248448184e-05,
+      "loss": 0.5966,
+      "step": 590
+    },
+    {
+      "epoch": 3.768253968253968,
+      "grad_norm": 2.1394360065460205,
+      "learning_rate": 4.989450773243463e-05,
+      "loss": 0.6736,
+      "step": 595
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 1.285447597503662,
+      "learning_rate": 4.9885835381862326e-05,
+      "loss": 0.5021,
+      "step": 600
+    },
+    {
+      "epoch": 3.831746031746032,
+      "grad_norm": 2.724978446960449,
+      "learning_rate": 4.987682131581413e-05,
+      "loss": 0.6128,
+      "step": 605
+    },
+    {
+      "epoch": 3.8634920634920635,
+      "grad_norm": 2.239682912826538,
+      "learning_rate": 4.986746565806508e-05,
+      "loss": 0.5457,
+      "step": 610
+    },
+    {
+      "epoch": 3.895238095238095,
+      "grad_norm": 2.48944091796875,
+      "learning_rate": 4.9857768537080784e-05,
+      "loss": 0.6927,
+      "step": 615
+    },
+    {
+      "epoch": 3.9269841269841272,
+      "grad_norm": 2.4086852073669434,
+      "learning_rate": 4.9847730086015534e-05,
+      "loss": 0.5963,
+      "step": 620
+    },
+    {
+      "epoch": 3.958730158730159,
+      "grad_norm": 2.0070106983184814,
+      "learning_rate": 4.9837350442710553e-05,
+      "loss": 0.5856,
+      "step": 625
+    },
+    {
+      "epoch": 3.9904761904761905,
+      "grad_norm": 1.9726545810699463,
+      "learning_rate": 4.98266297496921e-05,
+      "loss": 0.6208,
+      "step": 630
+    },
+    {
+      "epoch": 4.019047619047619,
+      "grad_norm": 2.6137828826904297,
+      "learning_rate": 4.981556815416948e-05,
+      "loss": 0.6319,
+      "step": 635
+    },
+    {
+      "epoch": 4.050793650793651,
+      "grad_norm": 2.3489890098571777,
+      "learning_rate": 4.9804165808033054e-05,
+      "loss": 0.5887,
+      "step": 640
+    },
+    {
+      "epoch": 4.082539682539682,
+      "grad_norm": 2.8010590076446533,
+      "learning_rate": 4.979242286785214e-05,
+      "loss": 0.5257,
+      "step": 645
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 2.993411064147949,
+      "learning_rate": 4.978033949487284e-05,
+      "loss": 0.4545,
+      "step": 650
+    },
+    {
+      "epoch": 4.146031746031746,
+      "grad_norm": 2.669935703277588,
+      "learning_rate": 4.976791585501588e-05,
+      "loss": 0.5989,
+      "step": 655
+    },
+    {
+      "epoch": 4.177777777777778,
+      "grad_norm": 3.084409236907959,
+      "learning_rate": 4.9755152118874294e-05,
+      "loss": 0.528,
+      "step": 660
+    },
+    {
+      "epoch": 4.20952380952381,
+      "grad_norm": 2.797873020172119,
+      "learning_rate": 4.974204846171106e-05,
+      "loss": 0.5249,
+      "step": 665
+    },
+    {
+      "epoch": 4.241269841269841,
+      "grad_norm": 3.667867422103882,
+      "learning_rate": 4.9728605063456765e-05,
+      "loss": 0.5838,
+      "step": 670
+    },
+    {
+      "epoch": 4.273015873015873,
+      "grad_norm": 2.6918869018554688,
+      "learning_rate": 4.971482210870706e-05,
+      "loss": 0.5143,
+      "step": 675
+    },
+    {
+      "epoch": 4.304761904761905,
+      "grad_norm": 2.1545379161834717,
+      "learning_rate": 4.970069978672017e-05,
+      "loss": 0.5317,
+      "step": 680
+    },
+    {
+      "epoch": 4.336507936507936,
+      "grad_norm": 2.1043529510498047,
+      "learning_rate": 4.9686238291414275e-05,
+      "loss": 0.4815,
+      "step": 685
+    },
+    {
+      "epoch": 4.368253968253969,
+      "grad_norm": 2.1359753608703613,
+      "learning_rate": 4.9671437821364855e-05,
+      "loss": 0.4935,
+      "step": 690
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 3.092057228088379,
+      "learning_rate": 4.965629857980197e-05,
+      "loss": 0.6831,
+      "step": 695
+    },
+    {
+      "epoch": 4.431746031746032,
+      "grad_norm": 2.5296835899353027,
+      "learning_rate": 4.964082077460745e-05,
+      "loss": 0.5323,
+      "step": 700
+    },
+    {
+      "epoch": 4.463492063492064,
+      "grad_norm": 1.6655627489089966,
+      "learning_rate": 4.962500461831207e-05,
+      "loss": 0.4553,
+      "step": 705
+    },
+    {
+      "epoch": 4.495238095238095,
+      "grad_norm": 2.6663475036621094,
+      "learning_rate": 4.9608850328092576e-05,
+      "loss": 0.463,
+      "step": 710
+    },
+    {
+      "epoch": 4.526984126984127,
+      "grad_norm": 2.3763060569763184,
+      "learning_rate": 4.959235812576879e-05,
+      "loss": 0.4861,
+      "step": 715
+    },
+    {
+      "epoch": 4.5587301587301585,
+      "grad_norm": 2.2217962741851807,
+      "learning_rate": 4.957552823780047e-05,
+      "loss": 0.468,
+      "step": 720
+    },
+    {
+      "epoch": 4.59047619047619,
+      "grad_norm": 2.8885600566864014,
+      "learning_rate": 4.9558360895284295e-05,
+      "loss": 0.4588,
+      "step": 725
+    },
+    {
+      "epoch": 4.622222222222222,
+      "grad_norm": 2.5661261081695557,
+      "learning_rate": 4.954085633395058e-05,
+      "loss": 0.4926,
+      "step": 730
+    },
+    {
+      "epoch": 4.653968253968254,
+      "grad_norm": 2.304365396499634,
+      "learning_rate": 4.952301479416015e-05,
+      "loss": 0.494,
+      "step": 735
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 2.690577983856201,
+      "learning_rate": 4.9504836520900976e-05,
+      "loss": 0.5814,
+      "step": 740
+    },
+    {
+      "epoch": 4.717460317460318,
+      "grad_norm": 2.7180025577545166,
+      "learning_rate": 4.948632176378481e-05,
+      "loss": 0.5329,
+      "step": 745
+    },
+    {
+      "epoch": 4.749206349206349,
+      "grad_norm": 2.716587543487549,
+      "learning_rate": 4.9467470777043806e-05,
+      "loss": 0.5264,
+      "step": 750
+    },
+    {
+      "epoch": 4.780952380952381,
+      "grad_norm": 2.315419912338257,
+      "learning_rate": 4.9448283819526954e-05,
+      "loss": 0.4756,
+      "step": 755
+    },
+    {
+      "epoch": 4.8126984126984125,
+      "grad_norm": 2.1679515838623047,
+      "learning_rate": 4.9428761154696605e-05,
+      "loss": 0.4819,
+      "step": 760
+    },
+    {
+      "epoch": 4.844444444444444,
+      "grad_norm": 3.389266014099121,
+      "learning_rate": 4.9408903050624796e-05,
+      "loss": 0.5121,
+      "step": 765
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 3.4317383766174316,
+      "learning_rate": 4.938870977998959e-05,
+      "loss": 0.4535,
+      "step": 770
+    },
+    {
+      "epoch": 4.907936507936508,
+      "grad_norm": 2.9491918087005615,
+      "learning_rate": 4.9368181620071344e-05,
+      "loss": 0.5333,
+      "step": 775
+    },
+    {
+      "epoch": 4.93968253968254,
+      "grad_norm": 2.516798496246338,
+      "learning_rate": 4.934731885274887e-05,
+      "loss": 0.5367,
+      "step": 780
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 3.0031046867370605,
+      "learning_rate": 4.9326121764495596e-05,
+      "loss": 0.4957,
+      "step": 785
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 3.334085702896118,
+      "learning_rate": 4.9304590646375614e-05,
+      "loss": 0.5287,
+      "step": 790
+    },
+    {
+      "epoch": 5.031746031746032,
+      "grad_norm": 1.9608453512191772,
+      "learning_rate": 4.928272579403969e-05,
+      "loss": 0.36,
+      "step": 795
+    },
+    {
+      "epoch": 5.063492063492063,
+      "grad_norm": 2.328850746154785,
+      "learning_rate": 4.92605275077212e-05,
+      "loss": 0.3628,
+      "step": 800
+    },
+    {
+      "epoch": 5.095238095238095,
+      "grad_norm": 2.3446412086486816,
+      "learning_rate": 4.923799609223202e-05,
+      "loss": 0.3327,
+      "step": 805
+    },
+    {
+      "epoch": 5.1269841269841265,
+      "grad_norm": 2.476181745529175,
+      "learning_rate": 4.921513185695831e-05,
+      "loss": 0.4246,
+      "step": 810
+    },
+    {
+      "epoch": 5.158730158730159,
+      "grad_norm": 3.1026763916015625,
+      "learning_rate": 4.91919351158563e-05,
+      "loss": 0.5048,
+      "step": 815
+    },
+    {
+      "epoch": 5.190476190476191,
+      "grad_norm": 2.8165297508239746,
+      "learning_rate": 4.916840618744798e-05,
+      "loss": 0.4361,
+      "step": 820
+    },
+    {
+      "epoch": 5.222222222222222,
+      "grad_norm": 1.8732138872146606,
+      "learning_rate": 4.9144545394816687e-05,
+      "loss": 0.4693,
+      "step": 825
+    },
+    {
+      "epoch": 5.253968253968254,
+      "grad_norm": 1.7250264883041382,
+      "learning_rate": 4.91203530656027e-05,
+      "loss": 0.4076,
+      "step": 830
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 2.105459690093994,
+      "learning_rate": 4.9095829531998725e-05,
+      "loss": 0.3589,
+      "step": 835
+    },
+    {
+      "epoch": 5.317460317460317,
+      "grad_norm": 3.6825687885284424,
+      "learning_rate": 4.9070975130745387e-05,
+      "loss": 0.5263,
+      "step": 840
+    },
+    {
+      "epoch": 5.349206349206349,
+      "grad_norm": 2.947052001953125,
+      "learning_rate": 4.90457902031265e-05,
+      "loss": 0.4632,
+      "step": 845
+    },
+    {
+      "epoch": 5.380952380952381,
+      "grad_norm": 1.9546104669570923,
+      "learning_rate": 4.902027509496448e-05,
+      "loss": 0.4348,
+      "step": 850
+    },
+    {
+      "epoch": 5.412698412698413,
+      "grad_norm": 2.4471983909606934,
+      "learning_rate": 4.899443015661557e-05,
+      "loss": 0.4209,
+      "step": 855
+    },
+    {
+      "epoch": 5.444444444444445,
+      "grad_norm": 1.827124834060669,
+      "learning_rate": 4.8968255742964975e-05,
+      "loss": 0.413,
+      "step": 860
+    },
+    {
+      "epoch": 5.476190476190476,
+      "grad_norm": 2.654707431793213,
+      "learning_rate": 4.894175221342207e-05,
+      "loss": 0.432,
+      "step": 865
+    },
+    {
+      "epoch": 5.507936507936508,
+      "grad_norm": 2.648967981338501,
+      "learning_rate": 4.8914919931915407e-05,
+      "loss": 0.4339,
+      "step": 870
+    },
+    {
+      "epoch": 5.5396825396825395,
+      "grad_norm": 2.874075412750244,
+      "learning_rate": 4.888775926688775e-05,
+      "loss": 0.4392,
+      "step": 875
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 2.9674830436706543,
+      "learning_rate": 4.8860270591291e-05,
+      "loss": 0.4459,
+      "step": 880
+    },
+    {
+      "epoch": 5.603174603174603,
+      "grad_norm": 2.054748296737671,
+      "learning_rate": 4.883245428258107e-05,
+      "loss": 0.4313,
+      "step": 885
+    },
+    {
+      "epoch": 5.634920634920634,
+      "grad_norm": 1.9174392223358154,
+      "learning_rate": 4.880431072271272e-05,
+      "loss": 0.3906,
+      "step": 890
+    },
+    {
+      "epoch": 5.666666666666667,
+      "grad_norm": 2.5257787704467773,
+      "learning_rate": 4.87758402981343e-05,
+      "loss": 0.4219,
+      "step": 895
+    },
+    {
+      "epoch": 5.698412698412699,
+      "grad_norm": 2.6365532875061035,
+      "learning_rate": 4.8747043399782424e-05,
+      "loss": 0.3978,
+      "step": 900
+    },
+    {
+      "epoch": 5.73015873015873,
+      "grad_norm": 2.0583746433258057,
+      "learning_rate": 4.871792042307667e-05,
+      "loss": 0.4847,
+      "step": 905
+    },
+    {
+      "epoch": 5.761904761904762,
+      "grad_norm": 2.035872459411621,
+      "learning_rate": 4.868847176791406e-05,
+      "loss": 0.4675,
+      "step": 910
+    },
+    {
+      "epoch": 5.7936507936507935,
+      "grad_norm": 2.3722939491271973,
+      "learning_rate": 4.8658697838663625e-05,
+      "loss": 0.4586,
+      "step": 915
+    },
+    {
+      "epoch": 5.825396825396825,
+      "grad_norm": 1.2609732151031494,
+      "learning_rate": 4.862859904416085e-05,
+      "loss": 0.3274,
+      "step": 920
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 2.3673977851867676,
+      "learning_rate": 4.8598175797702036e-05,
+      "loss": 0.4685,
+      "step": 925
+    },
+    {
+      "epoch": 5.888888888888889,
+      "grad_norm": 2.8414175510406494,
+      "learning_rate": 4.856742851703866e-05,
+      "loss": 0.4762,
+      "step": 930
+    },
+    {
+      "epoch": 5.920634920634921,
+      "grad_norm": 2.4126765727996826,
+      "learning_rate": 4.853635762437159e-05,
+      "loss": 0.4075,
+      "step": 935
+    },
+    {
+      "epoch": 5.9523809523809526,
+      "grad_norm": 1.8691045045852661,
+      "learning_rate": 4.8504963546345334e-05,
+      "loss": 0.4865,
+      "step": 940
+    },
+    {
+      "epoch": 5.984126984126984,
+      "grad_norm": 3.5297420024871826,
+      "learning_rate": 4.8473246714042155e-05,
+      "loss": 0.4623,
+      "step": 945
+    },
+    {
+      "epoch": 6.012698412698413,
+      "grad_norm": 2.059169054031372,
+      "learning_rate": 4.844120756297617e-05,
+      "loss": 0.4164,
+      "step": 950
+    },
+    {
+      "epoch": 6.044444444444444,
+      "grad_norm": 2.4746127128601074,
+      "learning_rate": 4.840884653308735e-05,
+      "loss": 0.3552,
+      "step": 955
+    },
+    {
+      "epoch": 6.076190476190476,
+      "grad_norm": 2.504425287246704,
+      "learning_rate": 4.8376164068735485e-05,
+      "loss": 0.3368,
+      "step": 960
+    },
+    {
+      "epoch": 6.1079365079365076,
+      "grad_norm": 2.062577486038208,
+      "learning_rate": 4.83431606186941e-05,
+      "loss": 0.3139,
+      "step": 965
+    },
+    {
+      "epoch": 6.13968253968254,
+      "grad_norm": 2.4934544563293457,
+      "learning_rate": 4.830983663614427e-05,
+      "loss": 0.3777,
+      "step": 970
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 2.5747485160827637,
+      "learning_rate": 4.827619257866839e-05,
+      "loss": 0.373,
+      "step": 975
+    },
+    {
+      "epoch": 6.203174603174603,
+      "grad_norm": 2.449357271194458,
+      "learning_rate": 4.8242228908243946e-05,
+      "loss": 0.3936,
+      "step": 980
+    },
+    {
+      "epoch": 6.234920634920635,
+      "grad_norm": 2.952680826187134,
+      "learning_rate": 4.82079460912371e-05,
+      "loss": 0.407,
+      "step": 985
+    },
+    {
+      "epoch": 6.266666666666667,
+      "grad_norm": 2.1754496097564697,
+      "learning_rate": 4.817334459839633e-05,
+      "loss": 0.3189,
+      "step": 990
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 2.8406214714050293,
+      "learning_rate": 4.8138424904845947e-05,
+      "loss": 0.3883,
+      "step": 995
+    },
+    {
+      "epoch": 6.33015873015873,
+      "grad_norm": 1.7533257007598877,
+      "learning_rate": 4.8103187490079604e-05,
+      "loss": 0.3131,
+      "step": 1000
+    },
+    {
+      "epoch": 6.3619047619047615,
+      "grad_norm": 2.4574601650238037,
+      "learning_rate": 4.806763283795366e-05,
+      "loss": 0.3606,
+      "step": 1005
+    },
+    {
+      "epoch": 6.393650793650794,
+      "grad_norm": 2.002281427383423,
+      "learning_rate": 4.8031761436680575e-05,
+      "loss": 0.37,
+      "step": 1010
+    },
+    {
+      "epoch": 6.425396825396826,
+      "grad_norm": 2.823315143585205,
+      "learning_rate": 4.79955737788222e-05,
+      "loss": 0.3791,
+      "step": 1015
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 2.7891204357147217,
+      "learning_rate": 4.795907036128299e-05,
+      "loss": 0.3556,
+      "step": 1020
+    },
+    {
+      "epoch": 6.488888888888889,
+      "grad_norm": 2.2387146949768066,
+      "learning_rate": 4.7922251685303213e-05,
+      "loss": 0.3929,
+      "step": 1025
+    },
+    {
+      "epoch": 6.520634920634921,
+      "grad_norm": 2.5023891925811768,
+      "learning_rate": 4.788511825645205e-05,
+      "loss": 0.379,
+      "step": 1030
+    },
+    {
+      "epoch": 6.552380952380952,
+      "grad_norm": 2.2654805183410645,
+      "learning_rate": 4.7847670584620653e-05,
+      "loss": 0.3435,
+      "step": 1035
+    },
+    {
+      "epoch": 6.584126984126984,
+      "grad_norm": 3.3823065757751465,
+      "learning_rate": 4.7809909184015146e-05,
+      "loss": 0.4109,
+      "step": 1040
+    },
+    {
+      "epoch": 6.6158730158730155,
+      "grad_norm": 2.6096551418304443,
+      "learning_rate": 4.7771834573149576e-05,
+      "loss": 0.4233,
+      "step": 1045
+    },
+    {
+      "epoch": 6.647619047619048,
+      "grad_norm": 2.3933897018432617,
+      "learning_rate": 4.773344727483876e-05,
+      "loss": 0.3709,
+      "step": 1050
+    },
+    {
+      "epoch": 6.67936507936508,
+      "grad_norm": 2.189544916152954,
+      "learning_rate": 4.769474781619114e-05,
+      "loss": 0.3287,
+      "step": 1055
+    },
+    {
+      "epoch": 6.711111111111111,
+      "grad_norm": 2.450892686843872,
+      "learning_rate": 4.765573672860154e-05,
+      "loss": 0.4022,
+      "step": 1060
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 2.4342429637908936,
+      "learning_rate": 4.761641454774386e-05,
+      "loss": 0.4029,
+      "step": 1065
+    },
+    {
+      "epoch": 6.7746031746031745,
+      "grad_norm": 2.2122364044189453,
+      "learning_rate": 4.75767818135637e-05,
+      "loss": 0.3322,
+      "step": 1070
+    },
+    {
+      "epoch": 6.806349206349206,
+      "grad_norm": 3.968445301055908,
+      "learning_rate": 4.7536839070271e-05,
+      "loss": 0.3836,
+      "step": 1075
+    },
+    {
+      "epoch": 6.838095238095238,
+      "grad_norm": 3.529158353805542,
+      "learning_rate": 4.749658686633251e-05,
+      "loss": 0.4745,
+      "step": 1080
+    },
+    {
+      "epoch": 6.86984126984127,
+      "grad_norm": 2.430727243423462,
+      "learning_rate": 4.7456025754464304e-05,
+      "loss": 0.3664,
+      "step": 1085
+    },
+    {
+      "epoch": 6.901587301587302,
+      "grad_norm": 2.6552302837371826,
+      "learning_rate": 4.7415156291624166e-05,
+      "loss": 0.4359,
+      "step": 1090
+    },
+    {
+      "epoch": 6.933333333333334,
+      "grad_norm": 2.134822130203247,
+      "learning_rate": 4.737397903900393e-05,
+      "loss": 0.3969,
+      "step": 1095
+    },
+    {
+      "epoch": 6.965079365079365,
+      "grad_norm": 2.5052947998046875,
+      "learning_rate": 4.7332494562021815e-05,
+      "loss": 0.4069,
+      "step": 1100
+    },
+    {
+      "epoch": 6.996825396825397,
+      "grad_norm": 2.1377065181732178,
+      "learning_rate": 4.729070343031463e-05,
+      "loss": 0.3853,
+      "step": 1105
+    },
+    {
+      "epoch": 7.025396825396825,
+      "grad_norm": 1.9704042673110962,
+      "learning_rate": 4.724860621772995e-05,
+      "loss": 0.3283,
+      "step": 1110
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 2.476968765258789,
+      "learning_rate": 4.7206203502318256e-05,
+      "loss": 0.3325,
+      "step": 1115
+    },
+    {
+      "epoch": 7.088888888888889,
+      "grad_norm": 1.9231969118118286,
+      "learning_rate": 4.716349586632499e-05,
+      "loss": 0.2876,
+      "step": 1120
+    },
+    {
+      "epoch": 7.12063492063492,
+      "grad_norm": 2.6444814205169678,
+      "learning_rate": 4.712048389618254e-05,
+      "loss": 0.3005,
+      "step": 1125
+    },
+    {
+      "epoch": 7.152380952380953,
+      "grad_norm": 3.2589964866638184,
+      "learning_rate": 4.7077168182502216e-05,
+      "loss": 0.4023,
+      "step": 1130
+    },
+    {
+      "epoch": 7.184126984126984,
+      "grad_norm": 2.5481936931610107,
+      "learning_rate": 4.703354932006615e-05,
+      "loss": 0.3302,
+      "step": 1135
+    },
+    {
+      "epoch": 7.215873015873016,
+      "grad_norm": 1.7125908136367798,
+      "learning_rate": 4.698962790781906e-05,
+      "loss": 0.3329,
+      "step": 1140
+    },
+    {
+      "epoch": 7.247619047619048,
+      "grad_norm": 2.2756667137145996,
+      "learning_rate": 4.6945404548860115e-05,
+      "loss": 0.3369,
+      "step": 1145
+    },
+    {
+      "epoch": 7.279365079365079,
+      "grad_norm": 2.9158453941345215,
+      "learning_rate": 4.6900879850434604e-05,
+      "loss": 0.3339,
+      "step": 1150
+    },
+    {
+      "epoch": 7.311111111111111,
+      "grad_norm": 2.3047537803649902,
+      "learning_rate": 4.685605442392559e-05,
+      "loss": 0.3915,
+      "step": 1155
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 2.7815029621124268,
+      "learning_rate": 4.681092888484554e-05,
+      "loss": 0.3317,
+      "step": 1160
+    },
+    {
+      "epoch": 7.374603174603175,
+      "grad_norm": 2.2644097805023193,
+      "learning_rate": 4.676550385282787e-05,
+      "loss": 0.3314,
+      "step": 1165
+    },
+    {
+      "epoch": 7.406349206349207,
+      "grad_norm": 2.5144474506378174,
+      "learning_rate": 4.671977995161843e-05,
+      "loss": 0.3188,
+      "step": 1170
+    },
+    {
+      "epoch": 7.438095238095238,
+      "grad_norm": 3.120821714401245,
+      "learning_rate": 4.667375780906693e-05,
+      "loss": 0.3523,
+      "step": 1175
+    },
+    {
+      "epoch": 7.46984126984127,
+      "grad_norm": 4.47842264175415,
+      "learning_rate": 4.662743805711832e-05,
+      "loss": 0.3611,
+      "step": 1180
+    },
+    {
+      "epoch": 7.501587301587302,
+      "grad_norm": 1.9228928089141846,
+      "learning_rate": 4.658082133180416e-05,
+      "loss": 0.3612,
+      "step": 1185
+    },
+    {
+      "epoch": 7.533333333333333,
+      "grad_norm": 2.1507537364959717,
+      "learning_rate": 4.6533908273233815e-05,
+      "loss": 0.3321,
+      "step": 1190
+    },
+    {
+      "epoch": 7.565079365079365,
+      "grad_norm": 2.1849119663238525,
+      "learning_rate": 4.64866995255857e-05,
+      "loss": 0.2943,
+      "step": 1195
+    },
+    {
+      "epoch": 7.5968253968253965,
+      "grad_norm": 2.1777775287628174,
+      "learning_rate": 4.643919573709843e-05,
+      "loss": 0.353,
+      "step": 1200
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 2.5231118202209473,
+      "learning_rate": 4.639139756006195e-05,
+      "loss": 0.3571,
+      "step": 1205
+    },
+    {
+      "epoch": 7.660317460317461,
+      "grad_norm": 1.8409479856491089,
+      "learning_rate": 4.6343305650808516e-05,
+      "loss": 0.3691,
+      "step": 1210
+    },
+    {
+      "epoch": 7.692063492063492,
+      "grad_norm": 1.7940895557403564,
+      "learning_rate": 4.629492066970373e-05,
+      "loss": 0.3738,
+      "step": 1215
+    },
+    {
+      "epoch": 7.723809523809524,
+      "grad_norm": 2.014902114868164,
+      "learning_rate": 4.6246243281137474e-05,
+      "loss": 0.361,
+      "step": 1220
+    },
+    {
+      "epoch": 7.7555555555555555,
+      "grad_norm": 3.4182560443878174,
+      "learning_rate": 4.6197274153514735e-05,
+      "loss": 0.3663,
+      "step": 1225
+    },
+    {
+      "epoch": 7.787301587301587,
+      "grad_norm": 2.518728256225586,
+      "learning_rate": 4.614801395924649e-05,
+      "loss": 0.3646,
+      "step": 1230
+    },
+    {
+      "epoch": 7.819047619047619,
+      "grad_norm": 2.154189109802246,
+      "learning_rate": 4.6098463374740466e-05,
+      "loss": 0.3331,
+      "step": 1235
+    },
+    {
+      "epoch": 7.85079365079365,
+      "grad_norm": 2.536081075668335,
+      "learning_rate": 4.604862308039177e-05,
+      "loss": 0.3742,
+      "step": 1240
+    },
+    {
+      "epoch": 7.882539682539683,
+      "grad_norm": 2.340764045715332,
+      "learning_rate": 4.599849376057366e-05,
+      "loss": 0.3352,
+      "step": 1245
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 3.5488364696502686,
+      "learning_rate": 4.5948076103628094e-05,
+      "loss": 0.3663,
+      "step": 1250
+    },
+    {
+      "epoch": 7.946031746031746,
+      "grad_norm": 2.779360294342041,
+      "learning_rate": 4.589737080185625e-05,
+      "loss": 0.3362,
+      "step": 1255
+    },
+    {
+      "epoch": 7.977777777777778,
+      "grad_norm": 1.8792667388916016,
+      "learning_rate": 4.5846378551509097e-05,
+      "loss": 0.346,
+      "step": 1260
+    },
+    {
+      "epoch": 8.006349206349206,
+      "grad_norm": 2.453295946121216,
+      "learning_rate": 4.579510005277774e-05,
+      "loss": 0.3509,
+      "step": 1265
+    },
+    {
+      "epoch": 8.038095238095238,
+      "grad_norm": 1.9493130445480347,
+      "learning_rate": 4.574353600978388e-05,
+      "loss": 0.3062,
+      "step": 1270
+    },
+    {
+      "epoch": 8.06984126984127,
+      "grad_norm": 1.9360930919647217,
+      "learning_rate": 4.56916871305701e-05,
+      "loss": 0.3056,
+      "step": 1275
+    },
+    {
+      "epoch": 8.101587301587301,
+      "grad_norm": 1.5592070817947388,
+      "learning_rate": 4.563955412709021e-05,
+      "loss": 0.2785,
+      "step": 1280
+    },
+    {
+      "epoch": 8.133333333333333,
+      "grad_norm": 1.8093425035476685,
+      "learning_rate": 4.5587137715199354e-05,
+      "loss": 0.308,
+      "step": 1285
+    },
+    {
+      "epoch": 8.165079365079364,
+      "grad_norm": 2.2939181327819824,
+      "learning_rate": 4.5534438614644294e-05,
+      "loss": 0.3038,
+      "step": 1290
+    },
+    {
+      "epoch": 8.196825396825396,
+      "grad_norm": 2.4204866886138916,
+      "learning_rate": 4.548145754905346e-05,
+      "loss": 0.3375,
+      "step": 1295
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 1.725534439086914,
+      "learning_rate": 4.5428195245927064e-05,
+      "loss": 0.3101,
+      "step": 1300
+    },
+    {
+      "epoch": 8.260317460317461,
+      "grad_norm": 1.637730360031128,
+      "learning_rate": 4.537465243662704e-05,
+      "loss": 0.2931,
+      "step": 1305
+    },
+    {
+      "epoch": 8.292063492063493,
+      "grad_norm": 1.3372169733047485,
+      "learning_rate": 4.532082985636709e-05,
+      "loss": 0.2763,
+      "step": 1310
+    },
+    {
+      "epoch": 8.323809523809524,
+      "grad_norm": 2.5993168354034424,
+      "learning_rate": 4.5266728244202494e-05,
+      "loss": 0.3458,
+      "step": 1315
+    },
+    {
+      "epoch": 8.355555555555556,
+      "grad_norm": 2.461862564086914,
+      "learning_rate": 4.521234834302006e-05,
+      "loss": 0.3693,
+      "step": 1320
+    },
+    {
+      "epoch": 8.387301587301588,
+      "grad_norm": 1.8519413471221924,
+      "learning_rate": 4.5157690899527816e-05,
+      "loss": 0.3327,
+      "step": 1325
+    },
+    {
+      "epoch": 8.41904761904762,
+      "grad_norm": 2.1535580158233643,
+      "learning_rate": 4.510275666424487e-05,
+      "loss": 0.3229,
+      "step": 1330
+    },
+    {
+      "epoch": 8.450793650793651,
+      "grad_norm": 1.6819690465927124,
+      "learning_rate": 4.5047546391491e-05,
+      "loss": 0.2925,
+      "step": 1335
+    },
+    {
+      "epoch": 8.482539682539683,
+      "grad_norm": 1.6538281440734863,
+      "learning_rate": 4.499206083937638e-05,
+      "loss": 0.3218,
+      "step": 1340
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 1.8956862688064575,
+      "learning_rate": 4.493630076979112e-05,
+      "loss": 0.3423,
+      "step": 1345
+    },
+    {
+      "epoch": 8.546031746031746,
+      "grad_norm": 2.274681806564331,
+      "learning_rate": 4.48802669483948e-05,
+      "loss": 0.3152,
+      "step": 1350
+    },
+    {
+      "epoch": 8.577777777777778,
+      "grad_norm": 2.2956337928771973,
+      "learning_rate": 4.4823960144606014e-05,
+      "loss": 0.3417,
+      "step": 1355
+    },
+    {
+      "epoch": 8.60952380952381,
+      "grad_norm": 1.8650286197662354,
+      "learning_rate": 4.4767381131591734e-05,
+      "loss": 0.2896,
+      "step": 1360
+    },
+    {
+      "epoch": 8.64126984126984,
+      "grad_norm": 1.3998652696609497,
+      "learning_rate": 4.471053068625674e-05,
+      "loss": 0.3372,
+      "step": 1365
+    },
+    {
+      "epoch": 8.673015873015872,
+      "grad_norm": 2.855074167251587,
+      "learning_rate": 4.465340958923293e-05,
+      "loss": 0.332,
+      "step": 1370
+    },
+    {
+      "epoch": 8.704761904761904,
+      "grad_norm": 1.6865357160568237,
+      "learning_rate": 4.459601862486862e-05,
+      "loss": 0.3053,
+      "step": 1375
+    },
+    {
+      "epoch": 8.736507936507937,
+      "grad_norm": 2.501856803894043,
+      "learning_rate": 4.453835858121773e-05,
+      "loss": 0.3119,
+      "step": 1380
+    },
+    {
+      "epoch": 8.768253968253969,
+      "grad_norm": 2.4325456619262695,
+      "learning_rate": 4.4480430250029046e-05,
+      "loss": 0.3395,
+      "step": 1385
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 1.4845948219299316,
+      "learning_rate": 4.4422234426735256e-05,
+      "loss": 0.3237,
+      "step": 1390
+    },
+    {
+      "epoch": 8.831746031746032,
+      "grad_norm": 1.3553249835968018,
+      "learning_rate": 4.436377191044208e-05,
+      "loss": 0.3387,
+      "step": 1395
+    },
+    {
+      "epoch": 8.863492063492064,
+      "grad_norm": 1.8338890075683594,
+      "learning_rate": 4.430504350391729e-05,
+      "loss": 0.3618,
+      "step": 1400
+    },
+    {
+      "epoch": 8.895238095238096,
+      "grad_norm": 2.291538953781128,
+      "learning_rate": 4.4246050013579686e-05,
+      "loss": 0.3608,
+      "step": 1405
+    },
+    {
+      "epoch": 8.926984126984127,
+      "grad_norm": 1.3809788227081299,
+      "learning_rate": 4.4186792249488005e-05,
+      "loss": 0.3077,
+      "step": 1410
+    },
+    {
+      "epoch": 8.958730158730159,
+      "grad_norm": 1.5944230556488037,
+      "learning_rate": 4.412727102532983e-05,
+      "loss": 0.3307,
+      "step": 1415
+    },
+    {
+      "epoch": 8.99047619047619,
+      "grad_norm": 2.2244362831115723,
+      "learning_rate": 4.4067487158410396e-05,
+      "loss": 0.3469,
+      "step": 1420
+    },
+    {
+      "epoch": 9.019047619047619,
+      "grad_norm": 1.444221019744873,
+      "learning_rate": 4.400744146964136e-05,
+      "loss": 0.3049,
+      "step": 1425
+    },
+    {
+      "epoch": 9.05079365079365,
+      "grad_norm": 1.5847752094268799,
+      "learning_rate": 4.394713478352955e-05,
+      "loss": 0.2715,
+      "step": 1430
+    },
+    {
+      "epoch": 9.082539682539682,
+      "grad_norm": 1.6062681674957275,
+      "learning_rate": 4.388656792816562e-05,
+      "loss": 0.2487,
+      "step": 1435
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 2.099787712097168,
+      "learning_rate": 4.382574173521272e-05,
+      "loss": 0.2866,
+      "step": 1440
+    },
+    {
+      "epoch": 9.146031746031746,
+      "grad_norm": 1.0997334718704224,
+      "learning_rate": 4.376465703989502e-05,
+      "loss": 0.3052,
+      "step": 1445
+    },
+    {
+      "epoch": 9.177777777777777,
+      "grad_norm": 2.4327454566955566,
+      "learning_rate": 4.370331468098628e-05,
+      "loss": 0.3212,
+      "step": 1450
+    },
+    {
+      "epoch": 9.209523809523809,
+      "grad_norm": 1.4816385507583618,
+      "learning_rate": 4.364171550079833e-05,
+      "loss": 0.3046,
+      "step": 1455
+    },
+    {
+      "epoch": 9.24126984126984,
+      "grad_norm": 2.039186716079712,
+      "learning_rate": 4.357986034516947e-05,
+      "loss": 0.3165,
+      "step": 1460
+    },
+    {
+      "epoch": 9.273015873015874,
+      "grad_norm": 1.437852382659912,
+      "learning_rate": 4.3517750063452934e-05,
+      "loss": 0.3037,
+      "step": 1465
+    },
+    {
+      "epoch": 9.304761904761905,
+      "grad_norm": 1.818982720375061,
+      "learning_rate": 4.345538550850512e-05,
+      "loss": 0.3122,
+      "step": 1470
+    },
+    {
+      "epoch": 9.336507936507937,
+      "grad_norm": 1.12025785446167,
+      "learning_rate": 4.339276753667395e-05,
+      "loss": 0.2909,
+      "step": 1475
+    },
+    {
+      "epoch": 9.368253968253969,
+      "grad_norm": 1.6094844341278076,
+      "learning_rate": 4.3329897007787125e-05,
+      "loss": 0.2823,
+      "step": 1480
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 1.916200041770935,
+      "learning_rate": 4.326677478514024e-05,
+      "loss": 0.2939,
+      "step": 1485
+    },
+    {
+      "epoch": 9.431746031746032,
+      "grad_norm": 1.97919499874115,
+      "learning_rate": 4.320340173548503e-05,
+      "loss": 0.2826,
+      "step": 1490
+    },
+    {
+      "epoch": 9.463492063492064,
+      "grad_norm": 2.0238938331604004,
+      "learning_rate": 4.313977872901737e-05,
+      "loss": 0.3273,
+      "step": 1495
+    },
+    {
+      "epoch": 9.495238095238095,
+      "grad_norm": 2.5840957164764404,
+      "learning_rate": 4.307590663936541e-05,
+      "loss": 0.2889,
+      "step": 1500
+    },
+    {
+      "epoch": 9.526984126984127,
+      "grad_norm": 2.3503904342651367,
+      "learning_rate": 4.30117863435775e-05,
+      "loss": 0.3012,
+      "step": 1505
+    },
+    {
+      "epoch": 9.558730158730159,
+      "grad_norm": 2.019792318344116,
+      "learning_rate": 4.294741872211024e-05,
+      "loss": 0.3267,
+      "step": 1510
+    },
+    {
+      "epoch": 9.59047619047619,
+      "grad_norm": 2.2713353633880615,
+      "learning_rate": 4.288280465881632e-05,
+      "loss": 0.3096,
+      "step": 1515
+    },
+    {
+      "epoch": 9.622222222222222,
+      "grad_norm": 2.4236693382263184,
+      "learning_rate": 4.281794504093237e-05,
+      "loss": 0.3291,
+      "step": 1520
+    },
+    {
+      "epoch": 9.653968253968253,
+      "grad_norm": 1.772703766822815,
+      "learning_rate": 4.275284075906686e-05,
+      "loss": 0.3117,
+      "step": 1525
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.9665186405181885,
+      "learning_rate": 4.268749270718778e-05,
+      "loss": 0.326,
+      "step": 1530
+    },
+    {
+      "epoch": 9.717460317460317,
+      "grad_norm": 1.9472782611846924,
+      "learning_rate": 4.262190178261044e-05,
+      "loss": 0.2683,
+      "step": 1535
+    },
+    {
+      "epoch": 9.74920634920635,
+      "grad_norm": 2.0638089179992676,
+      "learning_rate": 4.255606888598508e-05,
+      "loss": 0.314,
+      "step": 1540
+    },
+    {
+      "epoch": 9.780952380952382,
+      "grad_norm": 2.1349925994873047,
+      "learning_rate": 4.248999492128456e-05,
+      "loss": 0.2897,
+      "step": 1545
+    },
+    {
+      "epoch": 9.812698412698413,
+      "grad_norm": 2.112536907196045,
+      "learning_rate": 4.242368079579192e-05,
+      "loss": 0.31,
+      "step": 1550
+    },
+    {
+      "epoch": 9.844444444444445,
+      "grad_norm": 1.6859878301620483,
+      "learning_rate": 4.2357127420087917e-05,
+      "loss": 0.3412,
+      "step": 1555
+    },
+    {
+      "epoch": 9.876190476190477,
+      "grad_norm": 1.9178651571273804,
+      "learning_rate": 4.229033570803853e-05,
+      "loss": 0.334,
+      "step": 1560
+    },
+    {
+      "epoch": 9.907936507936508,
+      "grad_norm": 2.562436103820801,
+      "learning_rate": 4.2223306576782426e-05,
+      "loss": 0.3379,
+      "step": 1565
+    },
+    {
+      "epoch": 9.93968253968254,
+      "grad_norm": 1.8472412824630737,
+      "learning_rate": 4.215604094671835e-05,
+      "loss": 0.3415,
+      "step": 1570
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 4710,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 30,
+  "save_steps": 157,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.3726725762318336e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
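
Reviewer note (not part of the committed file): the `log_history` entries above share one shape (`epoch`, `grad_norm`, `learning_rate`, `loss`, `step`), so they are easy to summarise. The following is a minimal sketch. It embeds only the last three logged entries and a few top-level fields copied from the file above, and the `summarize` helper is a hypothetical name introduced for illustration. Note also that `max_steps / num_train_epochs` is 4710 / 30 = 157, which equals `save_steps`; this suggests, but does not prove, one checkpoint per epoch.

```python
import json

# Excerpt in the same shape as trainer_state.json; the three entries and the
# top-level fields are copied from the file above, everything else is omitted.
SAMPLE_STATE = json.loads("""
{
  "log_history": [
    {"epoch": 9.876190476190477, "grad_norm": 1.9178651571273804,
     "learning_rate": 4.229033570803853e-05, "loss": 0.334, "step": 1560},
    {"epoch": 9.907936507936508, "grad_norm": 2.562436103820801,
     "learning_rate": 4.2223306576782426e-05, "loss": 0.3379, "step": 1565},
    {"epoch": 9.93968253968254, "grad_norm": 1.8472412824630737,
     "learning_rate": 4.215604094671835e-05, "loss": 0.3415, "step": 1570}
  ],
  "logging_steps": 5,
  "max_steps": 4710,
  "num_train_epochs": 30,
  "save_steps": 157
}
""")


def summarize(state):
    """Hypothetical helper: summary numbers for the training entries."""
    # Keep only entries that carry a training loss; evaluation entries,
    # if any were logged, would not have this key.
    entries = [e for e in state["log_history"] if "loss" in e]
    losses = [e["loss"] for e in entries]
    return {
        "last_step": entries[-1]["step"],
        "mean_loss": sum(losses) / len(losses),
        "min_loss": min(losses),
        "max_grad_norm": max(e["grad_norm"] for e in entries),
        # Fraction of the planned optimisation steps completed so far.
        "progress": entries[-1]["step"] / state["max_steps"],
    }


summary = summarize(SAMPLE_STATE)
print(summary)
```

To run this against a real checkpoint, replace `SAMPLE_STATE` with `json.load(open(path))`, where `path` points at a downloaded `trainer_state.json`.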

checkpoint-1570/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1727/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-1727/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
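
Reviewer note (not part of the committed file): with `use_rslora` set to `false`, LoRA scales the low-rank update by `lora_alpha / r`; the rank-stabilised variant divides by the square root of `r` instead. The sketch below works that arithmetic for the values in this config. The `lora_scaling` helper is a hypothetical name for illustration and does not call PEFT.

```python
import math

# Fields copied from the adapter_config.json above; other fields are omitted.
ADAPTER_CONFIG = {
    "lora_alpha": 16,
    "r": 8,
    "use_rslora": False,
    "target_modules": [
        "k_proj", "o_proj", "q_proj", "up_proj",
        "v_proj", "down_proj", "gate_proj",
    ],
}


def lora_scaling(config):
    """Hypothetical helper: factor applied to the low-rank update B @ A."""
    if config.get("use_rslora", False):
        # Rank-stabilised LoRA: alpha / sqrt(r).
        return config["lora_alpha"] / math.sqrt(config["r"])
    # Standard LoRA: alpha / r.
    return config["lora_alpha"] / config["r"]


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)
```

For this checkpoint the factor is 16 / 8 = 2.0, applied to the adapters on all seven listed projection modules.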

checkpoint-1727/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-1727/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1727/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1727/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
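
Reviewer note (not part of the committed file): the `chat_template` above produces ChatML-style prompts. The sketch below mirrors only its plain-message path (no `tools`, no assistant `tool_calls`, no `tool` role messages) in ordinary Python, to show what string the template builds. The `render_chatml` function is a hypothetical illustration, not a replacement for the tokenizer's own template rendering.

```python
# Default system prompt, copied from the chat_template above.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper mirroring the template's no-tools branch."""
    # A leading system message replaces the default system prompt and is
    # not repeated in the message loop.
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
        rest = messages[1:]
    else:
        system_text = DEFAULT_SYSTEM
        rest = messages

    parts = ["<|im_start|>system\n" + system_text + "<|im_end|>\n"]
    for message in rest:
        # Covers user, later system, and plain assistant messages only.
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leaves the prompt open for the model to write the assistant turn.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello"}])
print(prompt)
```

Because `eos_token` is `<|im_end|>`, generation is expected to stop at the end of the assistant turn that this prompt leaves open.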

checkpoint-1727/trainer_state.json ADDED

	@@ -0,0 +1,2449 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 10.933333333333334,
+  "eval_steps": 500,
+  "global_step": 1727,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.031746031746031744,
+      "grad_norm": 0.5545095205307007,
+      "learning_rate": 5.307855626326963e-07,
+      "loss": 3.7162,
+      "step": 5
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.6163601279258728,
+      "learning_rate": 1.0615711252653927e-06,
+      "loss": 3.9388,
+      "step": 10
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.5541530847549438,
+      "learning_rate": 1.5923566878980892e-06,
+      "loss": 3.9165,
+      "step": 15
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.457332044839859,
+      "learning_rate": 2.1231422505307854e-06,
+      "loss": 3.7326,
+      "step": 20
+    },
+    {
+      "epoch": 0.15873015873015872,
+      "grad_norm": 0.5335279107093811,
+      "learning_rate": 2.653927813163482e-06,
+      "loss": 3.8251,
+      "step": 25
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.7080379724502563,
+      "learning_rate": 3.1847133757961785e-06,
+      "loss": 3.7534,
+      "step": 30
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.520993709564209,
+      "learning_rate": 3.715498938428875e-06,
+      "loss": 3.898,
+      "step": 35
+    },
+    {
+      "epoch": 0.25396825396825395,
+      "grad_norm": 0.5451405644416809,
+      "learning_rate": 4.246284501061571e-06,
+      "loss": 3.8951,
+      "step": 40
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.6205154657363892,
+      "learning_rate": 4.777070063694268e-06,
+      "loss": 3.7666,
+      "step": 45
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.7404439449310303,
+      "learning_rate": 5.307855626326964e-06,
+      "loss": 4.0258,
+      "step": 50
+    },
+    {
+      "epoch": 0.3492063492063492,
+      "grad_norm": 0.6272220015525818,
+      "learning_rate": 5.838641188959661e-06,
+      "loss": 3.8464,
+      "step": 55
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.7744691967964172,
+      "learning_rate": 6.369426751592357e-06,
+      "loss": 3.7299,
+      "step": 60
+    },
+    {
+      "epoch": 0.4126984126984127,
+      "grad_norm": 0.8805738687515259,
+      "learning_rate": 6.900212314225053e-06,
+      "loss": 3.5008,
+      "step": 65
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.0740723609924316,
+      "learning_rate": 7.43099787685775e-06,
+      "loss": 3.7552,
+      "step": 70
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 0.965708315372467,
+      "learning_rate": 7.961783439490445e-06,
+      "loss": 3.5516,
+      "step": 75
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "grad_norm": 0.9812778234481812,
+      "learning_rate": 8.492569002123141e-06,
+      "loss": 3.6003,
+      "step": 80
+    },
+    {
+      "epoch": 0.5396825396825397,
+      "grad_norm": 0.8831024169921875,
+      "learning_rate": 9.023354564755838e-06,
+      "loss": 3.613,
+      "step": 85
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.8358364105224609,
+      "learning_rate": 9.554140127388536e-06,
+      "loss": 3.1858,
+      "step": 90
+    },
+    {
+      "epoch": 0.6031746031746031,
+      "grad_norm": 1.0740444660186768,
+      "learning_rate": 1.0084925690021232e-05,
+      "loss": 3.0937,
+      "step": 95
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 1.0987530946731567,
+      "learning_rate": 1.0615711252653929e-05,
+      "loss": 3.154,
+      "step": 100
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.2300925254821777,
+      "learning_rate": 1.1146496815286625e-05,
+      "loss": 2.9414,
+      "step": 105
+    },
+    {
+      "epoch": 0.6984126984126984,
+      "grad_norm": 1.2214170694351196,
+      "learning_rate": 1.1677282377919321e-05,
+      "loss": 2.9464,
+      "step": 110
+    },
+    {
+      "epoch": 0.7301587301587301,
+      "grad_norm": 1.2803975343704224,
+      "learning_rate": 1.2208067940552018e-05,
+      "loss": 2.8921,
+      "step": 115
+    },
+    {
+      "epoch": 0.7619047619047619,
+      "grad_norm": 1.2232719659805298,
+      "learning_rate": 1.2738853503184714e-05,
+      "loss": 2.5252,
+      "step": 120
+    },
+    {
+      "epoch": 0.7936507936507936,
+      "grad_norm": 1.204835295677185,
+      "learning_rate": 1.326963906581741e-05,
+      "loss": 2.5215,
+      "step": 125
+    },
+    {
+      "epoch": 0.8253968253968254,
+      "grad_norm": 1.4095579385757446,
+      "learning_rate": 1.3800424628450107e-05,
+      "loss": 2.136,
+      "step": 130
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.4166598320007324,
+      "learning_rate": 1.4331210191082803e-05,
+      "loss": 2.2653,
+      "step": 135
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.3040446043014526,
+      "learning_rate": 1.48619957537155e-05,
+      "loss": 2.0193,
+      "step": 140
+    },
+    {
+      "epoch": 0.9206349206349206,
+      "grad_norm": 1.4114688634872437,
+      "learning_rate": 1.5392781316348196e-05,
+      "loss": 1.7935,
+      "step": 145
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.8066726922988892,
+      "learning_rate": 1.592356687898089e-05,
+      "loss": 1.5731,
+      "step": 150
+    },
+    {
+      "epoch": 0.9841269841269841,
+      "grad_norm": 1.4303158521652222,
+      "learning_rate": 1.6454352441613588e-05,
+      "loss": 1.6552,
+      "step": 155
+    },
+    {
+      "epoch": 1.0126984126984127,
+      "grad_norm": 1.6671762466430664,
+      "learning_rate": 1.6985138004246283e-05,
+      "loss": 1.6973,
+      "step": 160
+    },
+    {
+      "epoch": 1.0444444444444445,
+      "grad_norm": 1.5719650983810425,
+      "learning_rate": 1.751592356687898e-05,
+      "loss": 1.312,
+      "step": 165
+    },
+    {
+      "epoch": 1.0761904761904761,
+      "grad_norm": 1.4845054149627686,
+      "learning_rate": 1.8046709129511676e-05,
+      "loss": 1.3601,
+      "step": 170
+    },
+    {
+      "epoch": 1.107936507936508,
+      "grad_norm": 1.1172235012054443,
+      "learning_rate": 1.8577494692144374e-05,
+      "loss": 1.3137,
+      "step": 175
+    },
+    {
+      "epoch": 1.1396825396825396,
+      "grad_norm": 1.9621731042861938,
+      "learning_rate": 1.910828025477707e-05,
+      "loss": 1.1778,
+      "step": 180
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.7722721099853516,
+      "learning_rate": 1.963906581740977e-05,
+      "loss": 1.4534,
+      "step": 185
+    },
+    {
+      "epoch": 1.2031746031746031,
+      "grad_norm": 1.3677467107772827,
+      "learning_rate": 2.0169851380042464e-05,
+      "loss": 1.3356,
+      "step": 190
+    },
+    {
+      "epoch": 1.234920634920635,
+      "grad_norm": 1.3260482549667358,
+      "learning_rate": 2.0700636942675162e-05,
+      "loss": 1.0876,
+      "step": 195
+    },
+    {
+      "epoch": 1.2666666666666666,
+      "grad_norm": 1.5176818370819092,
+      "learning_rate": 2.1231422505307857e-05,
+      "loss": 1.1602,
+      "step": 200
+    },
+    {
+      "epoch": 1.2984126984126985,
+      "grad_norm": 1.2793077230453491,
+      "learning_rate": 2.1762208067940555e-05,
+      "loss": 1.1505,
+      "step": 205
+    },
+    {
+      "epoch": 1.33015873015873,
+      "grad_norm": 1.196784257888794,
+      "learning_rate": 2.229299363057325e-05,
+      "loss": 1.0664,
+      "step": 210
+    },
+    {
+      "epoch": 1.361904761904762,
+      "grad_norm": 1.303207516670227,
+      "learning_rate": 2.2823779193205948e-05,
+      "loss": 1.2557,
+      "step": 215
+    },
+    {
+      "epoch": 1.3936507936507936,
+      "grad_norm": 1.2853388786315918,
+      "learning_rate": 2.3354564755838642e-05,
+      "loss": 1.0704,
+      "step": 220
+    },
+    {
+      "epoch": 1.4253968253968254,
+      "grad_norm": 1.381369948387146,
+      "learning_rate": 2.388535031847134e-05,
+      "loss": 1.1371,
+      "step": 225
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 1.8012712001800537,
+      "learning_rate": 2.4416135881104035e-05,
+      "loss": 1.248,
+      "step": 230
+    },
+    {
+      "epoch": 1.488888888888889,
+      "grad_norm": 1.7397032976150513,
+      "learning_rate": 2.4946921443736733e-05,
+      "loss": 1.2782,
+      "step": 235
+    },
+    {
+      "epoch": 1.5206349206349206,
+      "grad_norm": 1.4026210308074951,
+      "learning_rate": 2.5477707006369428e-05,
+      "loss": 1.154,
+      "step": 240
+    },
+    {
+      "epoch": 1.5523809523809524,
+      "grad_norm": 1.2906067371368408,
+      "learning_rate": 2.6008492569002126e-05,
+      "loss": 0.9141,
+      "step": 245
+    },
+    {
+      "epoch": 1.5841269841269843,
+      "grad_norm": 1.265598177909851,
+      "learning_rate": 2.653927813163482e-05,
+      "loss": 1.0625,
+      "step": 250
+    },
+    {
+      "epoch": 1.615873015873016,
+      "grad_norm": 1.6044715642929077,
+      "learning_rate": 2.707006369426752e-05,
+      "loss": 0.9624,
+      "step": 255
+    },
+    {
+      "epoch": 1.6476190476190475,
+      "grad_norm": 1.4612747430801392,
+      "learning_rate": 2.7600849256900213e-05,
+      "loss": 1.0413,
+      "step": 260
+    },
+    {
+      "epoch": 1.6793650793650794,
+      "grad_norm": 1.6222745180130005,
+      "learning_rate": 2.8131634819532908e-05,
+      "loss": 1.0929,
+      "step": 265
+    },
+    {
+      "epoch": 1.7111111111111112,
+      "grad_norm": 1.1456222534179688,
+      "learning_rate": 2.8662420382165606e-05,
+      "loss": 0.9957,
+      "step": 270
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.5746041536331177,
+      "learning_rate": 2.91932059447983e-05,
+      "loss": 1.0274,
+      "step": 275
+    },
+    {
+      "epoch": 1.7746031746031745,
+      "grad_norm": 1.3407832384109497,
+      "learning_rate": 2.9723991507431e-05,
+      "loss": 0.9487,
+      "step": 280
+    },
+    {
+      "epoch": 1.8063492063492064,
+      "grad_norm": 1.6232194900512695,
+      "learning_rate": 3.0254777070063693e-05,
+      "loss": 1.0966,
+      "step": 285
+    },
+    {
+      "epoch": 1.8380952380952382,
+      "grad_norm": 1.4920552968978882,
+      "learning_rate": 3.078556263269639e-05,
+      "loss": 0.9099,
+      "step": 290
+    },
+    {
+      "epoch": 1.8698412698412699,
+      "grad_norm": 1.2123301029205322,
+      "learning_rate": 3.1316348195329086e-05,
+      "loss": 1.0902,
+      "step": 295
+    },
+    {
+      "epoch": 1.9015873015873015,
+      "grad_norm": 1.2080968618392944,
+      "learning_rate": 3.184713375796178e-05,
+      "loss": 0.943,
+      "step": 300
+    },
+    {
+      "epoch": 1.9333333333333333,
+      "grad_norm": 1.190319299697876,
+      "learning_rate": 3.237791932059448e-05,
+      "loss": 0.7893,
+      "step": 305
+    },
+    {
+      "epoch": 1.9650793650793652,
+      "grad_norm": 1.5929204225540161,
+      "learning_rate": 3.2908704883227177e-05,
+      "loss": 1.0232,
+      "step": 310
+    },
+    {
+      "epoch": 1.9968253968253968,
+      "grad_norm": 1.0138347148895264,
+      "learning_rate": 3.343949044585987e-05,
+      "loss": 0.6693,
+      "step": 315
+    },
+    {
+      "epoch": 2.0253968253968253,
+      "grad_norm": 1.3012847900390625,
+      "learning_rate": 3.3970276008492566e-05,
+      "loss": 0.8355,
+      "step": 320
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.2264782190322876,
+      "learning_rate": 3.450106157112527e-05,
+      "loss": 0.9872,
+      "step": 325
+    },
+    {
+      "epoch": 2.088888888888889,
+      "grad_norm": 1.139275312423706,
+      "learning_rate": 3.503184713375796e-05,
+      "loss": 0.8662,
+      "step": 330
+    },
+    {
+      "epoch": 2.1206349206349207,
+      "grad_norm": 1.3836581707000732,
+      "learning_rate": 3.5562632696390657e-05,
+      "loss": 0.9549,
+      "step": 335
+    },
+    {
+      "epoch": 2.1523809523809523,
+      "grad_norm": 1.368600845336914,
+      "learning_rate": 3.609341825902335e-05,
+      "loss": 0.9195,
+      "step": 340
+    },
+    {
+      "epoch": 2.1841269841269844,
+      "grad_norm": 1.8793011903762817,
+      "learning_rate": 3.662420382165605e-05,
+      "loss": 0.8505,
+      "step": 345
+    },
+    {
+      "epoch": 2.215873015873016,
+      "grad_norm": 1.305284023284912,
+      "learning_rate": 3.715498938428875e-05,
+      "loss": 0.7755,
+      "step": 350
+    },
+    {
+      "epoch": 2.2476190476190476,
+      "grad_norm": 1.7851749658584595,
+      "learning_rate": 3.768577494692145e-05,
+      "loss": 0.9242,
+      "step": 355
+    },
+    {
+      "epoch": 2.2793650793650793,
+      "grad_norm": 1.4341535568237305,
+      "learning_rate": 3.821656050955414e-05,
+      "loss": 0.8221,
+      "step": 360
+    },
+    {
+      "epoch": 2.311111111111111,
+      "grad_norm": 1.39107346534729,
+      "learning_rate": 3.874734607218684e-05,
+      "loss": 0.6999,
+      "step": 365
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 1.2304264307022095,
+      "learning_rate": 3.927813163481954e-05,
+      "loss": 0.8362,
+      "step": 370
+    },
+    {
+      "epoch": 2.3746031746031746,
+      "grad_norm": 1.8470840454101562,
+      "learning_rate": 3.9808917197452234e-05,
+      "loss": 0.9398,
+      "step": 375
+    },
+    {
+      "epoch": 2.4063492063492062,
+      "grad_norm": 1.2533882856369019,
+      "learning_rate": 4.033970276008493e-05,
+      "loss": 0.7754,
+      "step": 380
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 1.5335006713867188,
+      "learning_rate": 4.087048832271762e-05,
+      "loss": 1.1124,
+      "step": 385
+    },
+    {
+      "epoch": 2.46984126984127,
+      "grad_norm": 1.5298357009887695,
+      "learning_rate": 4.1401273885350325e-05,
+      "loss": 1.017,
+      "step": 390
+    },
+    {
+      "epoch": 2.5015873015873016,
+      "grad_norm": 1.4403260946273804,
+      "learning_rate": 4.193205944798302e-05,
+      "loss": 0.8831,
+      "step": 395
+    },
+    {
+      "epoch": 2.533333333333333,
+      "grad_norm": 1.1528433561325073,
+      "learning_rate": 4.2462845010615714e-05,
+      "loss": 0.801,
+      "step": 400
+    },
+    {
+      "epoch": 2.565079365079365,
+      "grad_norm": 1.3371326923370361,
+      "learning_rate": 4.299363057324841e-05,
+      "loss": 0.8692,
+      "step": 405
+    },
+    {
+      "epoch": 2.596825396825397,
+      "grad_norm": 1.4064775705337524,
+      "learning_rate": 4.352441613588111e-05,
+      "loss": 0.9059,
+      "step": 410
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.4531422853469849,
+      "learning_rate": 4.4055201698513805e-05,
+      "loss": 0.7344,
+      "step": 415
+    },
+    {
+      "epoch": 2.66031746031746,
+      "grad_norm": 1.7043890953063965,
+      "learning_rate": 4.45859872611465e-05,
+      "loss": 0.8298,
+      "step": 420
+    },
+    {
+      "epoch": 2.6920634920634923,
+      "grad_norm": 1.5105586051940918,
+      "learning_rate": 4.5116772823779194e-05,
+      "loss": 0.7768,
+      "step": 425
+    },
+    {
+      "epoch": 2.723809523809524,
+      "grad_norm": 1.8101528882980347,
+      "learning_rate": 4.5647558386411895e-05,
+      "loss": 0.733,
+      "step": 430
+    },
+    {
+      "epoch": 2.7555555555555555,
+      "grad_norm": 1.6365174055099487,
+      "learning_rate": 4.617834394904459e-05,
+      "loss": 0.8061,
+      "step": 435
+    },
+    {
+      "epoch": 2.787301587301587,
+      "grad_norm": 1.7808202505111694,
+      "learning_rate": 4.6709129511677285e-05,
+      "loss": 0.8333,
+      "step": 440
+    },
+    {
+      "epoch": 2.819047619047619,
+      "grad_norm": 1.5223265886306763,
+      "learning_rate": 4.723991507430998e-05,
+      "loss": 0.7557,
+      "step": 445
+    },
+    {
+      "epoch": 2.850793650793651,
+      "grad_norm": 1.3064416646957397,
+      "learning_rate": 4.777070063694268e-05,
+      "loss": 0.8041,
+      "step": 450
+    },
+    {
+      "epoch": 2.8825396825396825,
+      "grad_norm": 1.8025637865066528,
+      "learning_rate": 4.8301486199575375e-05,
+      "loss": 0.9534,
+      "step": 455
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 1.924846887588501,
+      "learning_rate": 4.883227176220807e-05,
+      "loss": 0.9066,
+      "step": 460
+    },
+    {
+      "epoch": 2.9460317460317462,
+      "grad_norm": 1.9862899780273438,
+      "learning_rate": 4.9363057324840765e-05,
+      "loss": 0.7994,
+      "step": 465
+    },
+    {
+      "epoch": 2.977777777777778,
+      "grad_norm": 1.9615916013717651,
+      "learning_rate": 4.9893842887473466e-05,
+      "loss": 0.7045,
+      "step": 470
+    },
+    {
+      "epoch": 3.0063492063492063,
+      "grad_norm": 1.519852876663208,
+      "learning_rate": 4.999989014936042e-05,
+      "loss": 0.7212,
+      "step": 475
+    },
+    {
+      "epoch": 3.038095238095238,
+      "grad_norm": 1.9328887462615967,
+      "learning_rate": 4.999944388279162e-05,
+      "loss": 0.6598,
+      "step": 480
+    },
+    {
+      "epoch": 3.06984126984127,
+      "grad_norm": 2.0340709686279297,
+      "learning_rate": 4.999865434075176e-05,
+      "loss": 0.6829,
+      "step": 485
+    },
+    {
+      "epoch": 3.1015873015873017,
+      "grad_norm": 1.8775280714035034,
+      "learning_rate": 4.999752153408229e-05,
+      "loss": 0.6664,
+      "step": 490
+    },
+    {
+      "epoch": 3.1333333333333333,
+      "grad_norm": 2.385218381881714,
+      "learning_rate": 4.999604547833814e-05,
+      "loss": 0.6836,
+      "step": 495
+    },
+    {
+      "epoch": 3.165079365079365,
+      "grad_norm": 2.1743783950805664,
+      "learning_rate": 4.999422619378752e-05,
+      "loss": 0.7,
+      "step": 500
+    },
+    {
+      "epoch": 3.196825396825397,
+      "grad_norm": 2.20786452293396,
+      "learning_rate": 4.999206370541162e-05,
+      "loss": 0.7253,
+      "step": 505
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 1.8182263374328613,
+      "learning_rate": 4.998955804290425e-05,
+      "loss": 0.6824,
+      "step": 510
+    },
+    {
+      "epoch": 3.2603174603174603,
+      "grad_norm": 2.2959372997283936,
+      "learning_rate": 4.9986709240671495e-05,
+      "loss": 0.601,
+      "step": 515
+    },
+    {
+      "epoch": 3.292063492063492,
+      "grad_norm": 2.385838031768799,
+      "learning_rate": 4.998351733783116e-05,
+      "loss": 0.7417,
+      "step": 520
+    },
+    {
+      "epoch": 3.323809523809524,
+      "grad_norm": 2.0416879653930664,
+      "learning_rate": 4.997998237821233e-05,
+      "loss": 0.6463,
+      "step": 525
+    },
+    {
+      "epoch": 3.3555555555555556,
+      "grad_norm": 2.2781031131744385,
+      "learning_rate": 4.9976104410354654e-05,
+      "loss": 0.6998,
+      "step": 530
+    },
+    {
+      "epoch": 3.3873015873015873,
+      "grad_norm": 2.146778106689453,
+      "learning_rate": 4.9971883487507775e-05,
+      "loss": 0.7694,
+      "step": 535
+    },
+    {
+      "epoch": 3.419047619047619,
+      "grad_norm": 2.1369104385375977,
+      "learning_rate": 4.9967319667630567e-05,
+      "loss": 0.6615,
+      "step": 540
+    },
+    {
+      "epoch": 3.450793650793651,
+      "grad_norm": 2.4529733657836914,
+      "learning_rate": 4.996241301339029e-05,
+      "loss": 0.6109,
+      "step": 545
+    },
+    {
+      "epoch": 3.4825396825396826,
+      "grad_norm": 2.07030987739563,
+      "learning_rate": 4.995716359216183e-05,
+      "loss": 0.7611,
+      "step": 550
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.4329919815063477,
+      "learning_rate": 4.995157147602669e-05,
+      "loss": 0.7515,
+      "step": 555
+    },
+    {
+      "epoch": 3.546031746031746,
+      "grad_norm": 2.056351900100708,
+      "learning_rate": 4.994563674177202e-05,
+      "loss": 0.6885,
+      "step": 560
+    },
+    {
+      "epoch": 3.5777777777777775,
+      "grad_norm": 2.3665318489074707,
+      "learning_rate": 4.993935947088958e-05,
+      "loss": 0.6271,
+      "step": 565
+    },
+    {
+      "epoch": 3.6095238095238096,
+      "grad_norm": 2.677706480026245,
+      "learning_rate": 4.993273974957463e-05,
+      "loss": 0.5586,
+      "step": 570
+    },
+    {
+      "epoch": 3.641269841269841,
+      "grad_norm": 3.422136068344116,
+      "learning_rate": 4.9925777668724685e-05,
+      "loss": 0.7552,
+      "step": 575
+    },
+    {
+      "epoch": 3.6730158730158733,
+      "grad_norm": 2.4525184631347656,
+      "learning_rate": 4.991847332393835e-05,
+      "loss": 0.7367,
+      "step": 580
+    },
+    {
+      "epoch": 3.704761904761905,
+      "grad_norm": 2.4242067337036133,
+      "learning_rate": 4.991082681551396e-05,
+      "loss": 0.7044,
+      "step": 585
+    },
+    {
+      "epoch": 3.7365079365079366,
+      "grad_norm": 1.8419867753982544,
+      "learning_rate": 4.9902838248448184e-05,
+      "loss": 0.5966,
+      "step": 590
+    },
+    {
+      "epoch": 3.768253968253968,
+      "grad_norm": 2.1394360065460205,
+      "learning_rate": 4.989450773243463e-05,
+      "loss": 0.6736,
+      "step": 595
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 1.285447597503662,
+      "learning_rate": 4.9885835381862326e-05,
+      "loss": 0.5021,
+      "step": 600
+    },
+    {
+      "epoch": 3.831746031746032,
+      "grad_norm": 2.724978446960449,
+      "learning_rate": 4.987682131581413e-05,
+      "loss": 0.6128,
+      "step": 605
+    },
+    {
+      "epoch": 3.8634920634920635,
+      "grad_norm": 2.239682912826538,
+      "learning_rate": 4.986746565806508e-05,
+      "loss": 0.5457,
+      "step": 610
+    },
+    {
+      "epoch": 3.895238095238095,
+      "grad_norm": 2.48944091796875,
+      "learning_rate": 4.9857768537080784e-05,
+      "loss": 0.6927,
+      "step": 615
+    },
+    {
+      "epoch": 3.9269841269841272,
+      "grad_norm": 2.4086852073669434,
+      "learning_rate": 4.9847730086015534e-05,
+      "loss": 0.5963,
+      "step": 620
+    },
+    {
+      "epoch": 3.958730158730159,
+      "grad_norm": 2.0070106983184814,
+      "learning_rate": 4.9837350442710553e-05,
+      "loss": 0.5856,
+      "step": 625
+    },
+    {
+      "epoch": 3.9904761904761905,
+      "grad_norm": 1.9726545810699463,
+      "learning_rate": 4.98266297496921e-05,
+      "loss": 0.6208,
+      "step": 630
+    },
+    {
+      "epoch": 4.019047619047619,
+      "grad_norm": 2.6137828826904297,
+      "learning_rate": 4.981556815416948e-05,
+      "loss": 0.6319,
+      "step": 635
+    },
+    {
+      "epoch": 4.050793650793651,
+      "grad_norm": 2.3489890098571777,
+      "learning_rate": 4.9804165808033054e-05,
+      "loss": 0.5887,
+      "step": 640
+    },
+    {
+      "epoch": 4.082539682539682,
+      "grad_norm": 2.8010590076446533,
+      "learning_rate": 4.979242286785214e-05,
+      "loss": 0.5257,
+      "step": 645
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 2.993411064147949,
+      "learning_rate": 4.978033949487284e-05,
+      "loss": 0.4545,
+      "step": 650
+    },
+    {
+      "epoch": 4.146031746031746,
+      "grad_norm": 2.669935703277588,
+      "learning_rate": 4.976791585501588e-05,
+      "loss": 0.5989,
+      "step": 655
+    },
+    {
+      "epoch": 4.177777777777778,
+      "grad_norm": 3.084409236907959,
+      "learning_rate": 4.9755152118874294e-05,
+      "loss": 0.528,
+      "step": 660
+    },
+    {
+      "epoch": 4.20952380952381,
+      "grad_norm": 2.797873020172119,
+      "learning_rate": 4.974204846171106e-05,
+      "loss": 0.5249,
+      "step": 665
+    },
+    {
+      "epoch": 4.241269841269841,
+      "grad_norm": 3.667867422103882,
+      "learning_rate": 4.9728605063456765e-05,
+      "loss": 0.5838,
+      "step": 670
+    },
+    {
+      "epoch": 4.273015873015873,
+      "grad_norm": 2.6918869018554688,
+      "learning_rate": 4.971482210870706e-05,
+      "loss": 0.5143,
+      "step": 675
+    },
+    {
+      "epoch": 4.304761904761905,
+      "grad_norm": 2.1545379161834717,
+      "learning_rate": 4.970069978672017e-05,
+      "loss": 0.5317,
+      "step": 680
+    },
+    {
+      "epoch": 4.336507936507936,
+      "grad_norm": 2.1043529510498047,
+      "learning_rate": 4.9686238291414275e-05,
+      "loss": 0.4815,
+      "step": 685
+    },
+    {
+      "epoch": 4.368253968253969,
+      "grad_norm": 2.1359753608703613,
+      "learning_rate": 4.9671437821364855e-05,
+      "loss": 0.4935,
+      "step": 690
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 3.092057228088379,
+      "learning_rate": 4.965629857980197e-05,
+      "loss": 0.6831,
+      "step": 695
+    },
+    {
+      "epoch": 4.431746031746032,
+      "grad_norm": 2.5296835899353027,
+      "learning_rate": 4.964082077460745e-05,
+      "loss": 0.5323,
+      "step": 700
+    },
+    {
+      "epoch": 4.463492063492064,
+      "grad_norm": 1.6655627489089966,
+      "learning_rate": 4.962500461831207e-05,
+      "loss": 0.4553,
+      "step": 705
+    },
+    {
+      "epoch": 4.495238095238095,
+      "grad_norm": 2.6663475036621094,
+      "learning_rate": 4.9608850328092576e-05,
+      "loss": 0.463,
+      "step": 710
+    },
+    {
+      "epoch": 4.526984126984127,
+      "grad_norm": 2.3763060569763184,
+      "learning_rate": 4.959235812576879e-05,
+      "loss": 0.4861,
+      "step": 715
+    },
+    {
+      "epoch": 4.5587301587301585,
+      "grad_norm": 2.2217962741851807,
+      "learning_rate": 4.957552823780047e-05,
+      "loss": 0.468,
+      "step": 720
+    },
+    {
+      "epoch": 4.59047619047619,
+      "grad_norm": 2.8885600566864014,
+      "learning_rate": 4.9558360895284295e-05,
+      "loss": 0.4588,
+      "step": 725
+    },
+    {
+      "epoch": 4.622222222222222,
+      "grad_norm": 2.5661261081695557,
+      "learning_rate": 4.954085633395058e-05,
+      "loss": 0.4926,
+      "step": 730
+    },
+    {
+      "epoch": 4.653968253968254,
+      "grad_norm": 2.304365396499634,
+      "learning_rate": 4.952301479416015e-05,
+      "loss": 0.494,
+      "step": 735
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 2.690577983856201,
+      "learning_rate": 4.9504836520900976e-05,
+      "loss": 0.5814,
+      "step": 740
+    },
+    {
+      "epoch": 4.717460317460318,
+      "grad_norm": 2.7180025577545166,
+      "learning_rate": 4.948632176378481e-05,
+      "loss": 0.5329,
+      "step": 745
+    },
+    {
+      "epoch": 4.749206349206349,
+      "grad_norm": 2.716587543487549,
+      "learning_rate": 4.9467470777043806e-05,
+      "loss": 0.5264,
+      "step": 750
+    },
+    {
+      "epoch": 4.780952380952381,
+      "grad_norm": 2.315419912338257,
+      "learning_rate": 4.9448283819526954e-05,
+      "loss": 0.4756,
+      "step": 755
+    },
+    {
+      "epoch": 4.8126984126984125,
+      "grad_norm": 2.1679515838623047,
+      "learning_rate": 4.9428761154696605e-05,
+      "loss": 0.4819,
+      "step": 760
+    },
+    {
+      "epoch": 4.844444444444444,
+      "grad_norm": 3.389266014099121,
+      "learning_rate": 4.9408903050624796e-05,
+      "loss": 0.5121,
+      "step": 765
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 3.4317383766174316,
+      "learning_rate": 4.938870977998959e-05,
+      "loss": 0.4535,
+      "step": 770
+    },
+    {
+      "epoch": 4.907936507936508,
+      "grad_norm": 2.9491918087005615,
+      "learning_rate": 4.9368181620071344e-05,
+      "loss": 0.5333,
+      "step": 775
+    },
+    {
+      "epoch": 4.93968253968254,
+      "grad_norm": 2.516798496246338,
+      "learning_rate": 4.934731885274887e-05,
+      "loss": 0.5367,
+      "step": 780
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 3.0031046867370605,
+      "learning_rate": 4.9326121764495596e-05,
+      "loss": 0.4957,
+      "step": 785
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 3.334085702896118,
+      "learning_rate": 4.9304590646375614e-05,
+      "loss": 0.5287,
+      "step": 790
+    },
+    {
+      "epoch": 5.031746031746032,
+      "grad_norm": 1.9608453512191772,
+      "learning_rate": 4.928272579403969e-05,
+      "loss": 0.36,
+      "step": 795
+    },
+    {
+      "epoch": 5.063492063492063,
+      "grad_norm": 2.328850746154785,
+      "learning_rate": 4.92605275077212e-05,
+      "loss": 0.3628,
+      "step": 800
+    },
+    {
+      "epoch": 5.095238095238095,
+      "grad_norm": 2.3446412086486816,
+      "learning_rate": 4.923799609223202e-05,
+      "loss": 0.3327,
+      "step": 805
+    },
+    {
+      "epoch": 5.1269841269841265,
+      "grad_norm": 2.476181745529175,
+      "learning_rate": 4.921513185695831e-05,
+      "loss": 0.4246,
+      "step": 810
+    },
+    {
+      "epoch": 5.158730158730159,
+      "grad_norm": 3.1026763916015625,
+      "learning_rate": 4.91919351158563e-05,
+      "loss": 0.5048,
+      "step": 815
+    },
+    {
+      "epoch": 5.190476190476191,
+      "grad_norm": 2.8165297508239746,
+      "learning_rate": 4.916840618744798e-05,
+      "loss": 0.4361,
+      "step": 820
+    },
+    {
+      "epoch": 5.222222222222222,
+      "grad_norm": 1.8732138872146606,
+      "learning_rate": 4.9144545394816687e-05,
+      "loss": 0.4693,
+      "step": 825
+    },
+    {
+      "epoch": 5.253968253968254,
+      "grad_norm": 1.7250264883041382,
+      "learning_rate": 4.91203530656027e-05,
+      "loss": 0.4076,
+      "step": 830
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 2.105459690093994,
+      "learning_rate": 4.9095829531998725e-05,
+      "loss": 0.3589,
+      "step": 835
+    },
+    {
+      "epoch": 5.317460317460317,
+      "grad_norm": 3.6825687885284424,
+      "learning_rate": 4.9070975130745387e-05,
+      "loss": 0.5263,
+      "step": 840
+    },
+    {
+      "epoch": 5.349206349206349,
+      "grad_norm": 2.947052001953125,
+      "learning_rate": 4.90457902031265e-05,
+      "loss": 0.4632,
+      "step": 845
+    },
+    {
+      "epoch": 5.380952380952381,
+      "grad_norm": 1.9546104669570923,
+      "learning_rate": 4.902027509496448e-05,
+      "loss": 0.4348,
+      "step": 850
+    },
+    {
+      "epoch": 5.412698412698413,
+      "grad_norm": 2.4471983909606934,
+      "learning_rate": 4.899443015661557e-05,
+      "loss": 0.4209,
+      "step": 855
+    },
+    {
+      "epoch": 5.444444444444445,
+      "grad_norm": 1.827124834060669,
+      "learning_rate": 4.8968255742964975e-05,
+      "loss": 0.413,
+      "step": 860
+    },
+    {
+      "epoch": 5.476190476190476,
+      "grad_norm": 2.654707431793213,
+      "learning_rate": 4.894175221342207e-05,
+      "loss": 0.432,
+      "step": 865
+    },
+    {
+      "epoch": 5.507936507936508,
+      "grad_norm": 2.648967981338501,
+      "learning_rate": 4.8914919931915407e-05,
+      "loss": 0.4339,
+      "step": 870
+    },
+    {
+      "epoch": 5.5396825396825395,
+      "grad_norm": 2.874075412750244,
+      "learning_rate": 4.888775926688775e-05,
+      "loss": 0.4392,
+      "step": 875
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 2.9674830436706543,
+      "learning_rate": 4.8860270591291e-05,
+      "loss": 0.4459,
+      "step": 880
+    },
+    {
+      "epoch": 5.603174603174603,
+      "grad_norm": 2.054748296737671,
+      "learning_rate": 4.883245428258107e-05,
+      "loss": 0.4313,
+      "step": 885
+    },
+    {
+      "epoch": 5.634920634920634,
+      "grad_norm": 1.9174392223358154,
+      "learning_rate": 4.880431072271272e-05,
+      "loss": 0.3906,
+      "step": 890
+    },
+    {
+      "epoch": 5.666666666666667,
+      "grad_norm": 2.5257787704467773,
+      "learning_rate": 4.87758402981343e-05,
+      "loss": 0.4219,
+      "step": 895
+    },
+    {
+      "epoch": 5.698412698412699,
+      "grad_norm": 2.6365532875061035,
+      "learning_rate": 4.8747043399782424e-05,
+      "loss": 0.3978,
+      "step": 900
+    },
+    {
+      "epoch": 5.73015873015873,
+      "grad_norm": 2.0583746433258057,
+      "learning_rate": 4.871792042307667e-05,
+      "loss": 0.4847,
+      "step": 905
+    },
+    {
+      "epoch": 5.761904761904762,
+      "grad_norm": 2.035872459411621,
+      "learning_rate": 4.868847176791406e-05,
+      "loss": 0.4675,
+      "step": 910
+    },
+    {
+      "epoch": 5.7936507936507935,
+      "grad_norm": 2.3722939491271973,
+      "learning_rate": 4.8658697838663625e-05,
+      "loss": 0.4586,
+      "step": 915
+    },
+    {
+      "epoch": 5.825396825396825,
+      "grad_norm": 1.2609732151031494,
+      "learning_rate": 4.862859904416085e-05,
+      "loss": 0.3274,
+      "step": 920
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 2.3673977851867676,
+      "learning_rate": 4.8598175797702036e-05,
+      "loss": 0.4685,
+      "step": 925
+    },
+    {
+      "epoch": 5.888888888888889,
+      "grad_norm": 2.8414175510406494,
+      "learning_rate": 4.856742851703866e-05,
+      "loss": 0.4762,
+      "step": 930
+    },
+    {
+      "epoch": 5.920634920634921,
+      "grad_norm": 2.4126765727996826,
+      "learning_rate": 4.853635762437159e-05,
+      "loss": 0.4075,
+      "step": 935
+    },
+    {
+      "epoch": 5.9523809523809526,
+      "grad_norm": 1.8691045045852661,
+      "learning_rate": 4.8504963546345334e-05,
+      "loss": 0.4865,
+      "step": 940
+    },
+    {
+      "epoch": 5.984126984126984,
+      "grad_norm": 3.5297420024871826,
+      "learning_rate": 4.8473246714042155e-05,
+      "loss": 0.4623,
+      "step": 945
+    },
+    {
+      "epoch": 6.012698412698413,
+      "grad_norm": 2.059169054031372,
+      "learning_rate": 4.844120756297617e-05,
+      "loss": 0.4164,
+      "step": 950
+    },
+    {
+      "epoch": 6.044444444444444,
+      "grad_norm": 2.4746127128601074,
+      "learning_rate": 4.840884653308735e-05,
+      "loss": 0.3552,
+      "step": 955
+    },
+    {
+      "epoch": 6.076190476190476,
+      "grad_norm": 2.504425287246704,
+      "learning_rate": 4.8376164068735485e-05,
+      "loss": 0.3368,
+      "step": 960
+    },
+    {
+      "epoch": 6.1079365079365076,
+      "grad_norm": 2.062577486038208,
+      "learning_rate": 4.83431606186941e-05,
+      "loss": 0.3139,
+      "step": 965
+    },
+    {
+      "epoch": 6.13968253968254,
+      "grad_norm": 2.4934544563293457,
+      "learning_rate": 4.830983663614427e-05,
+      "loss": 0.3777,
+      "step": 970
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 2.5747485160827637,
+      "learning_rate": 4.827619257866839e-05,
+      "loss": 0.373,
+      "step": 975
+    },
+    {
+      "epoch": 6.203174603174603,
+      "grad_norm": 2.449357271194458,
+      "learning_rate": 4.8242228908243946e-05,
+      "loss": 0.3936,
+      "step": 980
+    },
+    {
+      "epoch": 6.234920634920635,
+      "grad_norm": 2.952680826187134,
+      "learning_rate": 4.82079460912371e-05,
+      "loss": 0.407,
+      "step": 985
+    },
+    {
+      "epoch": 6.266666666666667,
+      "grad_norm": 2.1754496097564697,
+      "learning_rate": 4.817334459839633e-05,
+      "loss": 0.3189,
+      "step": 990
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 2.8406214714050293,
+      "learning_rate": 4.8138424904845947e-05,
+      "loss": 0.3883,
+      "step": 995
+    },
+    {
+      "epoch": 6.33015873015873,
+      "grad_norm": 1.7533257007598877,
+      "learning_rate": 4.8103187490079604e-05,
+      "loss": 0.3131,
+      "step": 1000
+    },
+    {
+      "epoch": 6.3619047619047615,
+      "grad_norm": 2.4574601650238037,
+      "learning_rate": 4.806763283795366e-05,
+      "loss": 0.3606,
+      "step": 1005
+    },
+    {
+      "epoch": 6.393650793650794,
+      "grad_norm": 2.002281427383423,
+      "learning_rate": 4.8031761436680575e-05,
+      "loss": 0.37,
+      "step": 1010
+    },
+    {
+      "epoch": 6.425396825396826,
+      "grad_norm": 2.823315143585205,
+      "learning_rate": 4.79955737788222e-05,
+      "loss": 0.3791,
+      "step": 1015
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 2.7891204357147217,
+      "learning_rate": 4.795907036128299e-05,
+      "loss": 0.3556,
+      "step": 1020
+    },
+    {
+      "epoch": 6.488888888888889,
+      "grad_norm": 2.2387146949768066,
+      "learning_rate": 4.7922251685303213e-05,
+      "loss": 0.3929,
+      "step": 1025
+    },
+    {
+      "epoch": 6.520634920634921,
+      "grad_norm": 2.5023891925811768,
+      "learning_rate": 4.788511825645205e-05,
+      "loss": 0.379,
+      "step": 1030
+    },
+    {
+      "epoch": 6.552380952380952,
+      "grad_norm": 2.2654805183410645,
+      "learning_rate": 4.7847670584620653e-05,
+      "loss": 0.3435,
+      "step": 1035
+    },
+    {
+      "epoch": 6.584126984126984,
+      "grad_norm": 3.3823065757751465,
+      "learning_rate": 4.7809909184015146e-05,
+      "loss": 0.4109,
+      "step": 1040
+    },
+    {
+      "epoch": 6.6158730158730155,
+      "grad_norm": 2.6096551418304443,
+      "learning_rate": 4.7771834573149576e-05,
+      "loss": 0.4233,
+      "step": 1045
+    },
+    {
+      "epoch": 6.647619047619048,
+      "grad_norm": 2.3933897018432617,
+      "learning_rate": 4.773344727483876e-05,
+      "loss": 0.3709,
+      "step": 1050
+    },
+    {
+      "epoch": 6.67936507936508,
+      "grad_norm": 2.189544916152954,
+      "learning_rate": 4.769474781619114e-05,
+      "loss": 0.3287,
+      "step": 1055
+    },
+    {
+      "epoch": 6.711111111111111,
+      "grad_norm": 2.450892686843872,
+      "learning_rate": 4.765573672860154e-05,
+      "loss": 0.4022,
+      "step": 1060
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 2.4342429637908936,
+      "learning_rate": 4.761641454774386e-05,
+      "loss": 0.4029,
+      "step": 1065
+    },
+    {
+      "epoch": 6.7746031746031745,
+      "grad_norm": 2.2122364044189453,
+      "learning_rate": 4.75767818135637e-05,
+      "loss": 0.3322,
+      "step": 1070
+    },
+    {
+      "epoch": 6.806349206349206,
+      "grad_norm": 3.968445301055908,
+      "learning_rate": 4.7536839070271e-05,
+      "loss": 0.3836,
+      "step": 1075
+    },
+    {
+      "epoch": 6.838095238095238,
+      "grad_norm": 3.529158353805542,
+      "learning_rate": 4.749658686633251e-05,
+      "loss": 0.4745,
+      "step": 1080
+    },
+    {
+      "epoch": 6.86984126984127,
+      "grad_norm": 2.430727243423462,
+      "learning_rate": 4.7456025754464304e-05,
+      "loss": 0.3664,
+      "step": 1085
+    },
+    {
+      "epoch": 6.901587301587302,
+      "grad_norm": 2.6552302837371826,
+      "learning_rate": 4.7415156291624166e-05,
+      "loss": 0.4359,
+      "step": 1090
+    },
+    {
+      "epoch": 6.933333333333334,
+      "grad_norm": 2.134822130203247,
+      "learning_rate": 4.737397903900393e-05,
+      "loss": 0.3969,
+      "step": 1095
+    },
+    {
+      "epoch": 6.965079365079365,
+      "grad_norm": 2.5052947998046875,
+      "learning_rate": 4.7332494562021815e-05,
+      "loss": 0.4069,
+      "step": 1100
+    },
+    {
+      "epoch": 6.996825396825397,
+      "grad_norm": 2.1377065181732178,
+      "learning_rate": 4.729070343031463e-05,
+      "loss": 0.3853,
+      "step": 1105
+    },
+    {
+      "epoch": 7.025396825396825,
+      "grad_norm": 1.9704042673110962,
+      "learning_rate": 4.724860621772995e-05,
+      "loss": 0.3283,
+      "step": 1110
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 2.476968765258789,
+      "learning_rate": 4.7206203502318256e-05,
+      "loss": 0.3325,
+      "step": 1115
+    },
+    {
+      "epoch": 7.088888888888889,
+      "grad_norm": 1.9231969118118286,
+      "learning_rate": 4.716349586632499e-05,
+      "loss": 0.2876,
+      "step": 1120
+    },
+    {
+      "epoch": 7.12063492063492,
+      "grad_norm": 2.6444814205169678,
+      "learning_rate": 4.712048389618254e-05,
+      "loss": 0.3005,
+      "step": 1125
+    },
+    {
+      "epoch": 7.152380952380953,
+      "grad_norm": 3.2589964866638184,
+      "learning_rate": 4.7077168182502216e-05,
+      "loss": 0.4023,
+      "step": 1130
+    },
+    {
+      "epoch": 7.184126984126984,
+      "grad_norm": 2.5481936931610107,
+      "learning_rate": 4.703354932006615e-05,
+      "loss": 0.3302,
+      "step": 1135
+    },
+    {
+      "epoch": 7.215873015873016,
+      "grad_norm": 1.7125908136367798,
+      "learning_rate": 4.698962790781906e-05,
+      "loss": 0.3329,
+      "step": 1140
+    },
+    {
+      "epoch": 7.247619047619048,
+      "grad_norm": 2.2756667137145996,
+      "learning_rate": 4.6945404548860115e-05,
+      "loss": 0.3369,
+      "step": 1145
+    },
+    {
+      "epoch": 7.279365079365079,
+      "grad_norm": 2.9158453941345215,
+      "learning_rate": 4.6900879850434604e-05,
+      "loss": 0.3339,
+      "step": 1150
+    },
+    {
+      "epoch": 7.311111111111111,
+      "grad_norm": 2.3047537803649902,
+      "learning_rate": 4.685605442392559e-05,
+      "loss": 0.3915,
+      "step": 1155
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 2.7815029621124268,
+      "learning_rate": 4.681092888484554e-05,
+      "loss": 0.3317,
+      "step": 1160
+    },
+    {
+      "epoch": 7.374603174603175,
+      "grad_norm": 2.2644097805023193,
+      "learning_rate": 4.676550385282787e-05,
+      "loss": 0.3314,
+      "step": 1165
+    },
+    {
+      "epoch": 7.406349206349207,
+      "grad_norm": 2.5144474506378174,
+      "learning_rate": 4.671977995161843e-05,
+      "loss": 0.3188,
+      "step": 1170
+    },
+    {
+      "epoch": 7.438095238095238,
+      "grad_norm": 3.120821714401245,
+      "learning_rate": 4.667375780906693e-05,
+      "loss": 0.3523,
+      "step": 1175
+    },
+    {
+      "epoch": 7.46984126984127,
+      "grad_norm": 4.47842264175415,
+      "learning_rate": 4.662743805711832e-05,
+      "loss": 0.3611,
+      "step": 1180
+    },
+    {
+      "epoch": 7.501587301587302,
+      "grad_norm": 1.9228928089141846,
+      "learning_rate": 4.658082133180416e-05,
+      "loss": 0.3612,
+      "step": 1185
+    },
+    {
+      "epoch": 7.533333333333333,
+      "grad_norm": 2.1507537364959717,
+      "learning_rate": 4.6533908273233815e-05,
+      "loss": 0.3321,
+      "step": 1190
+    },
+    {
+      "epoch": 7.565079365079365,
+      "grad_norm": 2.1849119663238525,
+      "learning_rate": 4.64866995255857e-05,
+      "loss": 0.2943,
+      "step": 1195
+    },
+    {
+      "epoch": 7.5968253968253965,
+      "grad_norm": 2.1777775287628174,
+      "learning_rate": 4.643919573709843e-05,
+      "loss": 0.353,
+      "step": 1200
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 2.5231118202209473,
+      "learning_rate": 4.639139756006195e-05,
+      "loss": 0.3571,
+      "step": 1205
+    },
+    {
+      "epoch": 7.660317460317461,
+      "grad_norm": 1.8409479856491089,
+      "learning_rate": 4.6343305650808516e-05,
+      "loss": 0.3691,
+      "step": 1210
+    },
+    {
+      "epoch": 7.692063492063492,
+      "grad_norm": 1.7940895557403564,
+      "learning_rate": 4.629492066970373e-05,
+      "loss": 0.3738,
+      "step": 1215
+    },
+    {
+      "epoch": 7.723809523809524,
+      "grad_norm": 2.014902114868164,
+      "learning_rate": 4.6246243281137474e-05,
+      "loss": 0.361,
+      "step": 1220
+    },
+    {
+      "epoch": 7.7555555555555555,
+      "grad_norm": 3.4182560443878174,
+      "learning_rate": 4.6197274153514735e-05,
+      "loss": 0.3663,
+      "step": 1225
+    },
+    {
+      "epoch": 7.787301587301587,
+      "grad_norm": 2.518728256225586,
+      "learning_rate": 4.614801395924649e-05,
+      "loss": 0.3646,
+      "step": 1230
+    },
+    {
+      "epoch": 7.819047619047619,
+      "grad_norm": 2.154189109802246,
+      "learning_rate": 4.6098463374740466e-05,
+      "loss": 0.3331,
+      "step": 1235
+    },
+    {
+      "epoch": 7.85079365079365,
+      "grad_norm": 2.536081075668335,
+      "learning_rate": 4.604862308039177e-05,
+      "loss": 0.3742,
+      "step": 1240
+    },
+    {
+      "epoch": 7.882539682539683,
+      "grad_norm": 2.340764045715332,
+      "learning_rate": 4.599849376057366e-05,
+      "loss": 0.3352,
+      "step": 1245
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 3.5488364696502686,
+      "learning_rate": 4.5948076103628094e-05,
+      "loss": 0.3663,
+      "step": 1250
+    },
+    {
+      "epoch": 7.946031746031746,
+      "grad_norm": 2.779360294342041,
+      "learning_rate": 4.589737080185625e-05,
+      "loss": 0.3362,
+      "step": 1255
+    },
+    {
+      "epoch": 7.977777777777778,
+      "grad_norm": 1.8792667388916016,
+      "learning_rate": 4.5846378551509097e-05,
+      "loss": 0.346,
+      "step": 1260
+    },
+    {
+      "epoch": 8.006349206349206,
+      "grad_norm": 2.453295946121216,
+      "learning_rate": 4.579510005277774e-05,
+      "loss": 0.3509,
+      "step": 1265
+    },
+    {
+      "epoch": 8.038095238095238,
+      "grad_norm": 1.9493130445480347,
+      "learning_rate": 4.574353600978388e-05,
+      "loss": 0.3062,
+      "step": 1270
+    },
+    {
+      "epoch": 8.06984126984127,
+      "grad_norm": 1.9360930919647217,
+      "learning_rate": 4.56916871305701e-05,
+      "loss": 0.3056,
+      "step": 1275
+    },
+    {
+      "epoch": 8.101587301587301,
+      "grad_norm": 1.5592070817947388,
+      "learning_rate": 4.563955412709021e-05,
+      "loss": 0.2785,
+      "step": 1280
+    },
+    {
+      "epoch": 8.133333333333333,
+      "grad_norm": 1.8093425035476685,
+      "learning_rate": 4.5587137715199354e-05,
+      "loss": 0.308,
+      "step": 1285
+    },
+    {
+      "epoch": 8.165079365079364,
+      "grad_norm": 2.2939181327819824,
+      "learning_rate": 4.5534438614644294e-05,
+      "loss": 0.3038,
+      "step": 1290
+    },
+    {
+      "epoch": 8.196825396825396,
+      "grad_norm": 2.4204866886138916,
+      "learning_rate": 4.548145754905346e-05,
+      "loss": 0.3375,
+      "step": 1295
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 1.725534439086914,
+      "learning_rate": 4.5428195245927064e-05,
+      "loss": 0.3101,
+      "step": 1300
+    },
+    {
+      "epoch": 8.260317460317461,
+      "grad_norm": 1.637730360031128,
+      "learning_rate": 4.537465243662704e-05,
+      "loss": 0.2931,
+      "step": 1305
+    },
+    {
+      "epoch": 8.292063492063493,
+      "grad_norm": 1.3372169733047485,
+      "learning_rate": 4.532082985636709e-05,
+      "loss": 0.2763,
+      "step": 1310
+    },
+    {
+      "epoch": 8.323809523809524,
+      "grad_norm": 2.5993168354034424,
+      "learning_rate": 4.5266728244202494e-05,
+      "loss": 0.3458,
+      "step": 1315
+    },
+    {
+      "epoch": 8.355555555555556,
+      "grad_norm": 2.461862564086914,
+      "learning_rate": 4.521234834302006e-05,
+      "loss": 0.3693,
+      "step": 1320
+    },
+    {
+      "epoch": 8.387301587301588,
+      "grad_norm": 1.8519413471221924,
+      "learning_rate": 4.5157690899527816e-05,
+      "loss": 0.3327,
+      "step": 1325
+    },
+    {
+      "epoch": 8.41904761904762,
+      "grad_norm": 2.1535580158233643,
+      "learning_rate": 4.510275666424487e-05,
+      "loss": 0.3229,
+      "step": 1330
+    },
+    {
+      "epoch": 8.450793650793651,
+      "grad_norm": 1.6819690465927124,
+      "learning_rate": 4.5047546391491e-05,
+      "loss": 0.2925,
+      "step": 1335
+    },
+    {
+      "epoch": 8.482539682539683,
+      "grad_norm": 1.6538281440734863,
+      "learning_rate": 4.499206083937638e-05,
+      "loss": 0.3218,
+      "step": 1340
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 1.8956862688064575,
+      "learning_rate": 4.493630076979112e-05,
+      "loss": 0.3423,
+      "step": 1345
+    },
+    {
+      "epoch": 8.546031746031746,
+      "grad_norm": 2.274681806564331,
+      "learning_rate": 4.48802669483948e-05,
+      "loss": 0.3152,
+      "step": 1350
+    },
+    {
+      "epoch": 8.577777777777778,
+      "grad_norm": 2.2956337928771973,
+      "learning_rate": 4.4823960144606014e-05,
+      "loss": 0.3417,
+      "step": 1355
+    },
+    {
+      "epoch": 8.60952380952381,
+      "grad_norm": 1.8650286197662354,
+      "learning_rate": 4.4767381131591734e-05,
+      "loss": 0.2896,
+      "step": 1360
+    },
+    {
+      "epoch": 8.64126984126984,
+      "grad_norm": 1.3998652696609497,
+      "learning_rate": 4.471053068625674e-05,
+      "loss": 0.3372,
+      "step": 1365
+    },
+    {
+      "epoch": 8.673015873015872,
+      "grad_norm": 2.855074167251587,
+      "learning_rate": 4.465340958923293e-05,
+      "loss": 0.332,
+      "step": 1370
+    },
+    {
+      "epoch": 8.704761904761904,
+      "grad_norm": 1.6865357160568237,
+      "learning_rate": 4.459601862486862e-05,
+      "loss": 0.3053,
+      "step": 1375
+    },
+    {
+      "epoch": 8.736507936507937,
+      "grad_norm": 2.501856803894043,
+      "learning_rate": 4.453835858121773e-05,
+      "loss": 0.3119,
+      "step": 1380
+    },
+    {
+      "epoch": 8.768253968253969,
+      "grad_norm": 2.4325456619262695,
+      "learning_rate": 4.4480430250029046e-05,
+      "loss": 0.3395,
+      "step": 1385
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 1.4845948219299316,
+      "learning_rate": 4.4422234426735256e-05,
+      "loss": 0.3237,
+      "step": 1390
+    },
+    {
+      "epoch": 8.831746031746032,
+      "grad_norm": 1.3553249835968018,
+      "learning_rate": 4.436377191044208e-05,
+      "loss": 0.3387,
+      "step": 1395
+    },
+    {
+      "epoch": 8.863492063492064,
+      "grad_norm": 1.8338890075683594,
+      "learning_rate": 4.430504350391729e-05,
+      "loss": 0.3618,
+      "step": 1400
+    },
+    {
+      "epoch": 8.895238095238096,
+      "grad_norm": 2.291538953781128,
+      "learning_rate": 4.4246050013579686e-05,
+      "loss": 0.3608,
+      "step": 1405
+    },
+    {
+      "epoch": 8.926984126984127,
+      "grad_norm": 1.3809788227081299,
+      "learning_rate": 4.4186792249488005e-05,
+      "loss": 0.3077,
+      "step": 1410
+    },
+    {
+      "epoch": 8.958730158730159,
+      "grad_norm": 1.5944230556488037,
+      "learning_rate": 4.412727102532983e-05,
+      "loss": 0.3307,
+      "step": 1415
+    },
+    {
+      "epoch": 8.99047619047619,
+      "grad_norm": 2.2244362831115723,
+      "learning_rate": 4.4067487158410396e-05,
+      "loss": 0.3469,
+      "step": 1420
+    },
+    {
+      "epoch": 9.019047619047619,
+      "grad_norm": 1.444221019744873,
+      "learning_rate": 4.400744146964136e-05,
+      "loss": 0.3049,
+      "step": 1425
+    },
+    {
+      "epoch": 9.05079365079365,
+      "grad_norm": 1.5847752094268799,
+      "learning_rate": 4.394713478352955e-05,
+      "loss": 0.2715,
+      "step": 1430
+    },
+    {
+      "epoch": 9.082539682539682,
+      "grad_norm": 1.6062681674957275,
+      "learning_rate": 4.388656792816562e-05,
+      "loss": 0.2487,
+      "step": 1435
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 2.099787712097168,
+      "learning_rate": 4.382574173521272e-05,
+      "loss": 0.2866,
+      "step": 1440
+    },
+    {
+      "epoch": 9.146031746031746,
+      "grad_norm": 1.0997334718704224,
+      "learning_rate": 4.376465703989502e-05,
+      "loss": 0.3052,
+      "step": 1445
+    },
+    {
+      "epoch": 9.177777777777777,
+      "grad_norm": 2.4327454566955566,
+      "learning_rate": 4.370331468098628e-05,
+      "loss": 0.3212,
+      "step": 1450
+    },
+    {
+      "epoch": 9.209523809523809,
+      "grad_norm": 1.4816385507583618,
+      "learning_rate": 4.364171550079833e-05,
+      "loss": 0.3046,
+      "step": 1455
+    },
+    {
+      "epoch": 9.24126984126984,
+      "grad_norm": 2.039186716079712,
+      "learning_rate": 4.357986034516947e-05,
+      "loss": 0.3165,
+      "step": 1460
+    },
+    {
+      "epoch": 9.273015873015874,
+      "grad_norm": 1.437852382659912,
+      "learning_rate": 4.3517750063452934e-05,
+      "loss": 0.3037,
+      "step": 1465
+    },
+    {
+      "epoch": 9.304761904761905,
+      "grad_norm": 1.818982720375061,
+      "learning_rate": 4.345538550850512e-05,
+      "loss": 0.3122,
+      "step": 1470
+    },
+    {
+      "epoch": 9.336507936507937,
+      "grad_norm": 1.12025785446167,
+      "learning_rate": 4.339276753667395e-05,
+      "loss": 0.2909,
+      "step": 1475
+    },
+    {
+      "epoch": 9.368253968253969,
+      "grad_norm": 1.6094844341278076,
+      "learning_rate": 4.3329897007787125e-05,
+      "loss": 0.2823,
+      "step": 1480
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 1.916200041770935,
+      "learning_rate": 4.326677478514024e-05,
+      "loss": 0.2939,
+      "step": 1485
+    },
+    {
+      "epoch": 9.431746031746032,
+      "grad_norm": 1.97919499874115,
+      "learning_rate": 4.320340173548503e-05,
+      "loss": 0.2826,
+      "step": 1490
+    },
+    {
+      "epoch": 9.463492063492064,
+      "grad_norm": 2.0238938331604004,
+      "learning_rate": 4.313977872901737e-05,
+      "loss": 0.3273,
+      "step": 1495
+    },
+    {
+      "epoch": 9.495238095238095,
+      "grad_norm": 2.5840957164764404,
+      "learning_rate": 4.307590663936541e-05,
+      "loss": 0.2889,
+      "step": 1500
+    },
+    {
+      "epoch": 9.526984126984127,
+      "grad_norm": 2.3503904342651367,
+      "learning_rate": 4.30117863435775e-05,
+      "loss": 0.3012,
+      "step": 1505
+    },
+    {
+      "epoch": 9.558730158730159,
+      "grad_norm": 2.019792318344116,
+      "learning_rate": 4.294741872211024e-05,
+      "loss": 0.3267,
+      "step": 1510
+    },
+    {
+      "epoch": 9.59047619047619,
+      "grad_norm": 2.2713353633880615,
+      "learning_rate": 4.288280465881632e-05,
+      "loss": 0.3096,
+      "step": 1515
+    },
+    {
+      "epoch": 9.622222222222222,
+      "grad_norm": 2.4236693382263184,
+      "learning_rate": 4.281794504093237e-05,
+      "loss": 0.3291,
+      "step": 1520
+    },
+    {
+      "epoch": 9.653968253968253,
+      "grad_norm": 1.772703766822815,
+      "learning_rate": 4.275284075906686e-05,
+      "loss": 0.3117,
+      "step": 1525
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.9665186405181885,
+      "learning_rate": 4.268749270718778e-05,
+      "loss": 0.326,
+      "step": 1530
+    },
+    {
+      "epoch": 9.717460317460317,
+      "grad_norm": 1.9472782611846924,
+      "learning_rate": 4.262190178261044e-05,
+      "loss": 0.2683,
+      "step": 1535
+    },
+    {
+      "epoch": 9.74920634920635,
+      "grad_norm": 2.0638089179992676,
+      "learning_rate": 4.255606888598508e-05,
+      "loss": 0.314,
+      "step": 1540
+    },
+    {
+      "epoch": 9.780952380952382,
+      "grad_norm": 2.1349925994873047,
+      "learning_rate": 4.248999492128456e-05,
+      "loss": 0.2897,
+      "step": 1545
+    },
+    {
+      "epoch": 9.812698412698413,
+      "grad_norm": 2.112536907196045,
+      "learning_rate": 4.242368079579192e-05,
+      "loss": 0.31,
+      "step": 1550
+    },
+    {
+      "epoch": 9.844444444444445,
+      "grad_norm": 1.6859878301620483,
+      "learning_rate": 4.2357127420087917e-05,
+      "loss": 0.3412,
+      "step": 1555
+    },
+    {
+      "epoch": 9.876190476190477,
+      "grad_norm": 1.9178651571273804,
+      "learning_rate": 4.229033570803853e-05,
+      "loss": 0.334,
+      "step": 1560
+    },
+    {
+      "epoch": 9.907936507936508,
+      "grad_norm": 2.562436103820801,
+      "learning_rate": 4.2223306576782426e-05,
+      "loss": 0.3379,
+      "step": 1565
+    },
+    {
+      "epoch": 9.93968253968254,
+      "grad_norm": 1.8472412824630737,
+      "learning_rate": 4.215604094671835e-05,
+      "loss": 0.3415,
+      "step": 1570
+    },
+    {
+      "epoch": 9.971428571428572,
+      "grad_norm": 1.9416279792785645,
+      "learning_rate": 4.208853974149246e-05,
+      "loss": 0.3085,
+      "step": 1575
+    },
+    {
+      "epoch": 10.0,
+      "grad_norm": 2.0056397914886475,
+      "learning_rate": 4.202080388798571e-05,
+      "loss": 0.3263,
+      "step": 1580
+    },
+    {
+      "epoch": 10.031746031746032,
+      "grad_norm": 2.195781946182251,
+      "learning_rate": 4.1952834316301065e-05,
+      "loss": 0.2867,
+      "step": 1585
+    },
+    {
+      "epoch": 10.063492063492063,
+      "grad_norm": 1.7489805221557617,
+      "learning_rate": 4.1884631959750766e-05,
+      "loss": 0.2589,
+      "step": 1590
+    },
+    {
+      "epoch": 10.095238095238095,
+      "grad_norm": 1.9361369609832764,
+      "learning_rate": 4.181619775484348e-05,
+      "loss": 0.2722,
+      "step": 1595
+    },
+    {
+      "epoch": 10.126984126984127,
+      "grad_norm": 2.24322509765625,
+      "learning_rate": 4.174753264127147e-05,
+      "loss": 0.2534,
+      "step": 1600
+    },
+    {
+      "epoch": 10.158730158730158,
+      "grad_norm": 2.4550466537475586,
+      "learning_rate": 4.167863756189767e-05,
+      "loss": 0.2777,
+      "step": 1605
+    },
+    {
+      "epoch": 10.19047619047619,
+      "grad_norm": 1.9439811706542969,
+      "learning_rate": 4.160951346274278e-05,
+      "loss": 0.2864,
+      "step": 1610
+    },
+    {
+      "epoch": 10.222222222222221,
+      "grad_norm": 1.633494257926941,
+      "learning_rate": 4.154016129297219e-05,
+      "loss": 0.2858,
+      "step": 1615
+    },
+    {
+      "epoch": 10.253968253968253,
+      "grad_norm": 1.69782292842865,
+      "learning_rate": 4.147058200488305e-05,
+      "loss": 0.2942,
+      "step": 1620
+    },
+    {
+      "epoch": 10.285714285714286,
+      "grad_norm": 1.613031268119812,
+      "learning_rate": 4.140077655389113e-05,
+      "loss": 0.2632,
+      "step": 1625
+    },
+    {
+      "epoch": 10.317460317460318,
+      "grad_norm": 2.0266177654266357,
+      "learning_rate": 4.1330745898517714e-05,
+      "loss": 0.3011,
+      "step": 1630
+    },
+    {
+      "epoch": 10.34920634920635,
+      "grad_norm": 1.8945387601852417,
+      "learning_rate": 4.1260491000376446e-05,
+      "loss": 0.2832,
+      "step": 1635
+    },
+    {
+      "epoch": 10.380952380952381,
+      "grad_norm": 1.7012510299682617,
+      "learning_rate": 4.119001282416009e-05,
+      "loss": 0.2718,
+      "step": 1640
+    },
+    {
+      "epoch": 10.412698412698413,
+      "grad_norm": 1.5538525581359863,
+      "learning_rate": 4.111931233762738e-05,
+      "loss": 0.3232,
+      "step": 1645
+    },
+    {
+      "epoch": 10.444444444444445,
+      "grad_norm": 2.3083150386810303,
+      "learning_rate": 4.1048390511589595e-05,
+      "loss": 0.3057,
+      "step": 1650
+    },
+    {
+      "epoch": 10.476190476190476,
+      "grad_norm": 1.293314814567566,
+      "learning_rate": 4.097724831989733e-05,
+      "loss": 0.2523,
+      "step": 1655
+    },
+    {
+      "epoch": 10.507936507936508,
+      "grad_norm": 2.517212152481079,
+      "learning_rate": 4.09058867394271e-05,
+      "loss": 0.3269,
+      "step": 1660
+    },
+    {
+      "epoch": 10.53968253968254,
+      "grad_norm": 2.057063102722168,
+      "learning_rate": 4.083430675006791e-05,
+      "loss": 0.2844,
+      "step": 1665
+    },
+    {
+      "epoch": 10.571428571428571,
+      "grad_norm": 1.5663833618164062,
+      "learning_rate": 4.0762509334707786e-05,
+      "loss": 0.3005,
+      "step": 1670
+    },
+    {
+      "epoch": 10.603174603174603,
+      "grad_norm": 2.5423505306243896,
+      "learning_rate": 4.069049547922035e-05,
+      "loss": 0.2802,
+      "step": 1675
+    },
+    {
+      "epoch": 10.634920634920634,
+      "grad_norm": 1.578316569328308,
+      "learning_rate": 4.061826617245119e-05,
+      "loss": 0.2667,
+      "step": 1680
+    },
+    {
+      "epoch": 10.666666666666666,
+      "grad_norm": 1.502928376197815,
+      "learning_rate": 4.0545822406204334e-05,
+      "loss": 0.3059,
+      "step": 1685
+    },
+    {
+      "epoch": 10.698412698412698,
+      "grad_norm": 1.2470905780792236,
+      "learning_rate": 4.047316517522864e-05,
+      "loss": 0.2879,
+      "step": 1690
+    },
+    {
+      "epoch": 10.73015873015873,
+      "grad_norm": 1.8238775730133057,
+      "learning_rate": 4.0400295477204105e-05,
+      "loss": 0.2923,
+      "step": 1695
+    },
+    {
+      "epoch": 10.761904761904763,
+      "grad_norm": 2.0516586303710938,
+      "learning_rate": 4.032721431272819e-05,
+      "loss": 0.3086,
+      "step": 1700
+    },
+    {
+      "epoch": 10.793650793650794,
+      "grad_norm": 1.3188791275024414,
+      "learning_rate": 4.0253922685302046e-05,
+      "loss": 0.2893,
+      "step": 1705
+    },
+    {
+      "epoch": 10.825396825396826,
+      "grad_norm": 1.7352266311645508,
+      "learning_rate": 4.01804216013168e-05,
+      "loss": 0.2981,
+      "step": 1710
+    },
+    {
+      "epoch": 10.857142857142858,
+      "grad_norm": 1.3449515104293823,
+      "learning_rate": 4.0106712070039656e-05,
+      "loss": 0.2841,
+      "step": 1715
+    },
+    {
+      "epoch": 10.88888888888889,
+      "grad_norm": 2.505431890487671,
+      "learning_rate": 4.00327951036001e-05,
+      "loss": 0.3034,
+      "step": 1720
+    },
+    {
+      "epoch": 10.920634920634921,
+      "grad_norm": 1.8870325088500977,
+      "learning_rate": 3.9958671716975966e-05,
+      "loss": 0.305,
+      "step": 1725
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 4710,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 30,
+  "save_steps": 157,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.709832115152486e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
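
The log entries in the state file above can be summarised with a few lines of Python. This is a minimal sketch, not part of the commit: it assumes the file is a Hugging Face `Trainer` state file whose per-step records sit under the `log_history` key (that key falls outside this excerpt), and the helper names `load_log_history` and `mean_loss_by_epoch` are hypothetical. The sample records are copied from steps 945, 950, 1575 and 1580 of the diff.

```python
import json
import math
from collections import defaultdict
from pathlib import Path


def load_log_history(path):
    """Read the `log_history` list from a Trainer state file (assumed layout)."""
    state = json.loads(Path(path).read_text())
    return state["log_history"]


def mean_loss_by_epoch(log_history):
    """Average the logged training loss, grouped by floor(epoch)."""
    buckets = defaultdict(list)
    for entry in log_history:
        # Evaluation or summary records may lack "loss"; skip them.
        if "loss" in entry and "epoch" in entry:
            buckets[math.floor(entry["epoch"])].append(entry["loss"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


# Four records copied from the diff above (only the fields used here).
sample = [
    {"epoch": 5.984126984126984, "loss": 0.4623, "step": 945},
    {"epoch": 6.012698412698413, "loss": 0.4164, "step": 950},
    {"epoch": 9.971428571428572, "loss": 0.3085, "step": 1575},
    {"epoch": 10.0, "loss": 0.3263, "step": 1580},
]
print(mean_loss_by_epoch(sample))
```

To run it on a real checkpoint, pass the state file's path to `load_log_history` (for example `checkpoint-1727/trainer_state.json`, if that is the file shown here) and feed the result to `mean_loss_by_epoch`. Grouping by `floor(epoch)` is a simplification: a record logged exactly at an epoch boundary, such as step 1580 at epoch 10.0, lands in the bucket of the epoch that is about to start.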

checkpoint-1727/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1884/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-1884/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
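
A minimal sketch of what the `r`, `lora_alpha`, and `use_rslora` values in this adapter config imply, assuming the usual LoRA conventions (the update is scaled by `lora_alpha / r`, or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled). The helper names `lora_scaling` and `lora_param_count` are hypothetical, and the 1024 x 1024 layer shape is an illustrative assumption, not the shape of any layer in the base model.

```python
import math

# Values copied from checkpoint-1884/adapter_config.json above.
ADAPTER_CONFIG = {
    "r": 8,
    "lora_alpha": 16,
    "use_rslora": False,
}


def lora_scaling(config: dict) -> float:
    """Factor applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA uses
    lora_alpha / sqrt(r).
    """
    if config["use_rslora"]:
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


def lora_param_count(r: int, in_features: int, out_features: int) -> int:
    """Trainable weights LoRA adds to one linear layer (A: r x in, B: out x r)."""
    return r * (in_features + out_features)


print(lora_scaling(ADAPTER_CONFIG))
# 1024 x 1024 is a hypothetical layer shape used only for illustration.
print(lora_param_count(ADAPTER_CONFIG["r"], 1024, 1024))
```

With `r` = 8 and `lora_alpha` = 16 the scaling factor is 2.0. The per-layer count applies once to each of the seven entries in `target_modules`, using each module's own input and output sizes.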

checkpoint-1884/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-1884/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-1884/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-1884/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
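
As a rough illustration of what the `chat_template` above produces, the following sketch re-implements only its no-tools branch in plain Python. It assumes messages are dicts with `role` and `content` keys and that no message carries tool calls or the `tool` role; `render_chatml` is a hypothetical helper written for this note, not part of any library. In practice the Jinja template itself would be applied by the tokenizer.

```python
# Default system prompt taken verbatim from the chat_template above.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render messages the way the template's no-tools branch does.

    Simplification (an assumption of this sketch): tool calls and
    tool responses are not handled.
    """
    parts = []
    # The system block always comes first; fall back to the default prompt.
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
    else:
        system_text = DEFAULT_SYSTEM
    parts.append(f"<|im_start|>system\n{system_text}<|im_end|>\n")

    for index, message in enumerate(messages):
        role = message["role"]
        # A leading system message was already emitted above, so skip it here.
        if role == "system" and index == 0:
            continue
        if role in ("user", "system", "assistant"):
            parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")

    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml(
    [{"role": "user", "content": "Write a hello world in C."}],
    add_generation_prompt=True,
)
print(prompt)
```

Each turn ends with `<|im_end|>`, which is also the `eos_token` in this config, so generation stops at the end of the assistant turn.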

checkpoint-1884/trainer_state.json ADDED

	@@ -0,0 +1,2666 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 11.926984126984127,
+  "eval_steps": 500,
+  "global_step": 1884,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.031746031746031744,
+      "grad_norm": 0.5545095205307007,
+      "learning_rate": 5.307855626326963e-07,
+      "loss": 3.7162,
+      "step": 5
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.6163601279258728,
+      "learning_rate": 1.0615711252653927e-06,
+      "loss": 3.9388,
+      "step": 10
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.5541530847549438,
+      "learning_rate": 1.5923566878980892e-06,
+      "loss": 3.9165,
+      "step": 15
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.457332044839859,
+      "learning_rate": 2.1231422505307854e-06,
+      "loss": 3.7326,
+      "step": 20
+    },
+    {
+      "epoch": 0.15873015873015872,
+      "grad_norm": 0.5335279107093811,
+      "learning_rate": 2.653927813163482e-06,
+      "loss": 3.8251,
+      "step": 25
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.7080379724502563,
+      "learning_rate": 3.1847133757961785e-06,
+      "loss": 3.7534,
+      "step": 30
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.520993709564209,
+      "learning_rate": 3.715498938428875e-06,
+      "loss": 3.898,
+      "step": 35
+    },
+    {
+      "epoch": 0.25396825396825395,
+      "grad_norm": 0.5451405644416809,
+      "learning_rate": 4.246284501061571e-06,
+      "loss": 3.8951,
+      "step": 40
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.6205154657363892,
+      "learning_rate": 4.777070063694268e-06,
+      "loss": 3.7666,
+      "step": 45
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.7404439449310303,
+      "learning_rate": 5.307855626326964e-06,
+      "loss": 4.0258,
+      "step": 50
+    },
+    {
+      "epoch": 0.3492063492063492,
+      "grad_norm": 0.6272220015525818,
+      "learning_rate": 5.838641188959661e-06,
+      "loss": 3.8464,
+      "step": 55
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.7744691967964172,
+      "learning_rate": 6.369426751592357e-06,
+      "loss": 3.7299,
+      "step": 60
+    },
+    {
+      "epoch": 0.4126984126984127,
+      "grad_norm": 0.8805738687515259,
+      "learning_rate": 6.900212314225053e-06,
+      "loss": 3.5008,
+      "step": 65
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.0740723609924316,
+      "learning_rate": 7.43099787685775e-06,
+      "loss": 3.7552,
+      "step": 70
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 0.965708315372467,
+      "learning_rate": 7.961783439490445e-06,
+      "loss": 3.5516,
+      "step": 75
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "grad_norm": 0.9812778234481812,
+      "learning_rate": 8.492569002123141e-06,
+      "loss": 3.6003,
+      "step": 80
+    },
+    {
+      "epoch": 0.5396825396825397,
+      "grad_norm": 0.8831024169921875,
+      "learning_rate": 9.023354564755838e-06,
+      "loss": 3.613,
+      "step": 85
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.8358364105224609,
+      "learning_rate": 9.554140127388536e-06,
+      "loss": 3.1858,
+      "step": 90
+    },
+    {
+      "epoch": 0.6031746031746031,
+      "grad_norm": 1.0740444660186768,
+      "learning_rate": 1.0084925690021232e-05,
+      "loss": 3.0937,
+      "step": 95
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 1.0987530946731567,
+      "learning_rate": 1.0615711252653929e-05,
+      "loss": 3.154,
+      "step": 100
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.2300925254821777,
+      "learning_rate": 1.1146496815286625e-05,
+      "loss": 2.9414,
+      "step": 105
+    },
+    {
+      "epoch": 0.6984126984126984,
+      "grad_norm": 1.2214170694351196,
+      "learning_rate": 1.1677282377919321e-05,
+      "loss": 2.9464,
+      "step": 110
+    },
+    {
+      "epoch": 0.7301587301587301,
+      "grad_norm": 1.2803975343704224,
+      "learning_rate": 1.2208067940552018e-05,
+      "loss": 2.8921,
+      "step": 115
+    },
+    {
+      "epoch": 0.7619047619047619,
+      "grad_norm": 1.2232719659805298,
+      "learning_rate": 1.2738853503184714e-05,
+      "loss": 2.5252,
+      "step": 120
+    },
+    {
+      "epoch": 0.7936507936507936,
+      "grad_norm": 1.204835295677185,
+      "learning_rate": 1.326963906581741e-05,
+      "loss": 2.5215,
+      "step": 125
+    },
+    {
+      "epoch": 0.8253968253968254,
+      "grad_norm": 1.4095579385757446,
+      "learning_rate": 1.3800424628450107e-05,
+      "loss": 2.136,
+      "step": 130
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.4166598320007324,
+      "learning_rate": 1.4331210191082803e-05,
+      "loss": 2.2653,
+      "step": 135
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.3040446043014526,
+      "learning_rate": 1.48619957537155e-05,
+      "loss": 2.0193,
+      "step": 140
+    },
+    {
+      "epoch": 0.9206349206349206,
+      "grad_norm": 1.4114688634872437,
+      "learning_rate": 1.5392781316348196e-05,
+      "loss": 1.7935,
+      "step": 145
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.8066726922988892,
+      "learning_rate": 1.592356687898089e-05,
+      "loss": 1.5731,
+      "step": 150
+    },
+    {
+      "epoch": 0.9841269841269841,
+      "grad_norm": 1.4303158521652222,
+      "learning_rate": 1.6454352441613588e-05,
+      "loss": 1.6552,
+      "step": 155
+    },
+    {
+      "epoch": 1.0126984126984127,
+      "grad_norm": 1.6671762466430664,
+      "learning_rate": 1.6985138004246283e-05,
+      "loss": 1.6973,
+      "step": 160
+    },
+    {
+      "epoch": 1.0444444444444445,
+      "grad_norm": 1.5719650983810425,
+      "learning_rate": 1.751592356687898e-05,
+      "loss": 1.312,
+      "step": 165
+    },
+    {
+      "epoch": 1.0761904761904761,
+      "grad_norm": 1.4845054149627686,
+      "learning_rate": 1.8046709129511676e-05,
+      "loss": 1.3601,
+      "step": 170
+    },
+    {
+      "epoch": 1.107936507936508,
+      "grad_norm": 1.1172235012054443,
+      "learning_rate": 1.8577494692144374e-05,
+      "loss": 1.3137,
+      "step": 175
+    },
+    {
+      "epoch": 1.1396825396825396,
+      "grad_norm": 1.9621731042861938,
+      "learning_rate": 1.910828025477707e-05,
+      "loss": 1.1778,
+      "step": 180
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.7722721099853516,
+      "learning_rate": 1.963906581740977e-05,
+      "loss": 1.4534,
+      "step": 185
+    },
+    {
+      "epoch": 1.2031746031746031,
+      "grad_norm": 1.3677467107772827,
+      "learning_rate": 2.0169851380042464e-05,
+      "loss": 1.3356,
+      "step": 190
+    },
+    {
+      "epoch": 1.234920634920635,
+      "grad_norm": 1.3260482549667358,
+      "learning_rate": 2.0700636942675162e-05,
+      "loss": 1.0876,
+      "step": 195
+    },
+    {
+      "epoch": 1.2666666666666666,
+      "grad_norm": 1.5176818370819092,
+      "learning_rate": 2.1231422505307857e-05,
+      "loss": 1.1602,
+      "step": 200
+    },
+    {
+      "epoch": 1.2984126984126985,
+      "grad_norm": 1.2793077230453491,
+      "learning_rate": 2.1762208067940555e-05,
+      "loss": 1.1505,
+      "step": 205
+    },
+    {
+      "epoch": 1.33015873015873,
+      "grad_norm": 1.196784257888794,
+      "learning_rate": 2.229299363057325e-05,
+      "loss": 1.0664,
+      "step": 210
+    },
+    {
+      "epoch": 1.361904761904762,
+      "grad_norm": 1.303207516670227,
+      "learning_rate": 2.2823779193205948e-05,
+      "loss": 1.2557,
+      "step": 215
+    },
+    {
+      "epoch": 1.3936507936507936,
+      "grad_norm": 1.2853388786315918,
+      "learning_rate": 2.3354564755838642e-05,
+      "loss": 1.0704,
+      "step": 220
+    },
+    {
+      "epoch": 1.4253968253968254,
+      "grad_norm": 1.381369948387146,
+      "learning_rate": 2.388535031847134e-05,
+      "loss": 1.1371,
+      "step": 225
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 1.8012712001800537,
+      "learning_rate": 2.4416135881104035e-05,
+      "loss": 1.248,
+      "step": 230
+    },
+    {
+      "epoch": 1.488888888888889,
+      "grad_norm": 1.7397032976150513,
+      "learning_rate": 2.4946921443736733e-05,
+      "loss": 1.2782,
+      "step": 235
+    },
+    {
+      "epoch": 1.5206349206349206,
+      "grad_norm": 1.4026210308074951,
+      "learning_rate": 2.5477707006369428e-05,
+      "loss": 1.154,
+      "step": 240
+    },
+    {
+      "epoch": 1.5523809523809524,
+      "grad_norm": 1.2906067371368408,
+      "learning_rate": 2.6008492569002126e-05,
+      "loss": 0.9141,
+      "step": 245
+    },
+    {
+      "epoch": 1.5841269841269843,
+      "grad_norm": 1.265598177909851,
+      "learning_rate": 2.653927813163482e-05,
+      "loss": 1.0625,
+      "step": 250
+    },
+    {
+      "epoch": 1.615873015873016,
+      "grad_norm": 1.6044715642929077,
+      "learning_rate": 2.707006369426752e-05,
+      "loss": 0.9624,
+      "step": 255
+    },
+    {
+      "epoch": 1.6476190476190475,
+      "grad_norm": 1.4612747430801392,
+      "learning_rate": 2.7600849256900213e-05,
+      "loss": 1.0413,
+      "step": 260
+    },
+    {
+      "epoch": 1.6793650793650794,
+      "grad_norm": 1.6222745180130005,
+      "learning_rate": 2.8131634819532908e-05,
+      "loss": 1.0929,
+      "step": 265
+    },
+    {
+      "epoch": 1.7111111111111112,
+      "grad_norm": 1.1456222534179688,
+      "learning_rate": 2.8662420382165606e-05,
+      "loss": 0.9957,
+      "step": 270
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.5746041536331177,
+      "learning_rate": 2.91932059447983e-05,
+      "loss": 1.0274,
+      "step": 275
+    },
+    {
+      "epoch": 1.7746031746031745,
+      "grad_norm": 1.3407832384109497,
+      "learning_rate": 2.9723991507431e-05,
+      "loss": 0.9487,
+      "step": 280
+    },
+    {
+      "epoch": 1.8063492063492064,
+      "grad_norm": 1.6232194900512695,
+      "learning_rate": 3.0254777070063693e-05,
+      "loss": 1.0966,
+      "step": 285
+    },
+    {
+      "epoch": 1.8380952380952382,
+      "grad_norm": 1.4920552968978882,
+      "learning_rate": 3.078556263269639e-05,
+      "loss": 0.9099,
+      "step": 290
+    },
+    {
+      "epoch": 1.8698412698412699,
+      "grad_norm": 1.2123301029205322,
+      "learning_rate": 3.1316348195329086e-05,
+      "loss": 1.0902,
+      "step": 295
+    },
+    {
+      "epoch": 1.9015873015873015,
+      "grad_norm": 1.2080968618392944,
+      "learning_rate": 3.184713375796178e-05,
+      "loss": 0.943,
+      "step": 300
+    },
+    {
+      "epoch": 1.9333333333333333,
+      "grad_norm": 1.190319299697876,
+      "learning_rate": 3.237791932059448e-05,
+      "loss": 0.7893,
+      "step": 305
+    },
+    {
+      "epoch": 1.9650793650793652,
+      "grad_norm": 1.5929204225540161,
+      "learning_rate": 3.2908704883227177e-05,
+      "loss": 1.0232,
+      "step": 310
+    },
+    {
+      "epoch": 1.9968253968253968,
+      "grad_norm": 1.0138347148895264,
+      "learning_rate": 3.343949044585987e-05,
+      "loss": 0.6693,
+      "step": 315
+    },
+    {
+      "epoch": 2.0253968253968253,
+      "grad_norm": 1.3012847900390625,
+      "learning_rate": 3.3970276008492566e-05,
+      "loss": 0.8355,
+      "step": 320
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.2264782190322876,
+      "learning_rate": 3.450106157112527e-05,
+      "loss": 0.9872,
+      "step": 325
+    },
+    {
+      "epoch": 2.088888888888889,
+      "grad_norm": 1.139275312423706,
+      "learning_rate": 3.503184713375796e-05,
+      "loss": 0.8662,
+      "step": 330
+    },
+    {
+      "epoch": 2.1206349206349207,
+      "grad_norm": 1.3836581707000732,
+      "learning_rate": 3.5562632696390657e-05,
+      "loss": 0.9549,
+      "step": 335
+    },
+    {
+      "epoch": 2.1523809523809523,
+      "grad_norm": 1.368600845336914,
+      "learning_rate": 3.609341825902335e-05,
+      "loss": 0.9195,
+      "step": 340
+    },
+    {
+      "epoch": 2.1841269841269844,
+      "grad_norm": 1.8793011903762817,
+      "learning_rate": 3.662420382165605e-05,
+      "loss": 0.8505,
+      "step": 345
+    },
+    {
+      "epoch": 2.215873015873016,
+      "grad_norm": 1.305284023284912,
+      "learning_rate": 3.715498938428875e-05,
+      "loss": 0.7755,
+      "step": 350
+    },
+    {
+      "epoch": 2.2476190476190476,
+      "grad_norm": 1.7851749658584595,
+      "learning_rate": 3.768577494692145e-05,
+      "loss": 0.9242,
+      "step": 355
+    },
+    {
+      "epoch": 2.2793650793650793,
+      "grad_norm": 1.4341535568237305,
+      "learning_rate": 3.821656050955414e-05,
+      "loss": 0.8221,
+      "step": 360
+    },
+    {
+      "epoch": 2.311111111111111,
+      "grad_norm": 1.39107346534729,
+      "learning_rate": 3.874734607218684e-05,
+      "loss": 0.6999,
+      "step": 365
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 1.2304264307022095,
+      "learning_rate": 3.927813163481954e-05,
+      "loss": 0.8362,
+      "step": 370
+    },
+    {
+      "epoch": 2.3746031746031746,
+      "grad_norm": 1.8470840454101562,
+      "learning_rate": 3.9808917197452234e-05,
+      "loss": 0.9398,
+      "step": 375
+    },
+    {
+      "epoch": 2.4063492063492062,
+      "grad_norm": 1.2533882856369019,
+      "learning_rate": 4.033970276008493e-05,
+      "loss": 0.7754,
+      "step": 380
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 1.5335006713867188,
+      "learning_rate": 4.087048832271762e-05,
+      "loss": 1.1124,
+      "step": 385
+    },
+    {
+      "epoch": 2.46984126984127,
+      "grad_norm": 1.5298357009887695,
+      "learning_rate": 4.1401273885350325e-05,
+      "loss": 1.017,
+      "step": 390
+    },
+    {
+      "epoch": 2.5015873015873016,
+      "grad_norm": 1.4403260946273804,
+      "learning_rate": 4.193205944798302e-05,
+      "loss": 0.8831,
+      "step": 395
+    },
+    {
+      "epoch": 2.533333333333333,
+      "grad_norm": 1.1528433561325073,
+      "learning_rate": 4.2462845010615714e-05,
+      "loss": 0.801,
+      "step": 400
+    },
+    {
+      "epoch": 2.565079365079365,
+      "grad_norm": 1.3371326923370361,
+      "learning_rate": 4.299363057324841e-05,
+      "loss": 0.8692,
+      "step": 405
+    },
+    {
+      "epoch": 2.596825396825397,
+      "grad_norm": 1.4064775705337524,
+      "learning_rate": 4.352441613588111e-05,
+      "loss": 0.9059,
+      "step": 410
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.4531422853469849,
+      "learning_rate": 4.4055201698513805e-05,
+      "loss": 0.7344,
+      "step": 415
+    },
+    {
+      "epoch": 2.66031746031746,
+      "grad_norm": 1.7043890953063965,
+      "learning_rate": 4.45859872611465e-05,
+      "loss": 0.8298,
+      "step": 420
+    },
+    {
+      "epoch": 2.6920634920634923,
+      "grad_norm": 1.5105586051940918,
+      "learning_rate": 4.5116772823779194e-05,
+      "loss": 0.7768,
+      "step": 425
+    },
+    {
+      "epoch": 2.723809523809524,
+      "grad_norm": 1.8101528882980347,
+      "learning_rate": 4.5647558386411895e-05,
+      "loss": 0.733,
+      "step": 430
+    },
+    {
+      "epoch": 2.7555555555555555,
+      "grad_norm": 1.6365174055099487,
+      "learning_rate": 4.617834394904459e-05,
+      "loss": 0.8061,
+      "step": 435
+    },
+    {
+      "epoch": 2.787301587301587,
+      "grad_norm": 1.7808202505111694,
+      "learning_rate": 4.6709129511677285e-05,
+      "loss": 0.8333,
+      "step": 440
+    },
+    {
+      "epoch": 2.819047619047619,
+      "grad_norm": 1.5223265886306763,
+      "learning_rate": 4.723991507430998e-05,
+      "loss": 0.7557,
+      "step": 445
+    },
+    {
+      "epoch": 2.850793650793651,
+      "grad_norm": 1.3064416646957397,
+      "learning_rate": 4.777070063694268e-05,
+      "loss": 0.8041,
+      "step": 450
+    },
+    {
+      "epoch": 2.8825396825396825,
+      "grad_norm": 1.8025637865066528,
+      "learning_rate": 4.8301486199575375e-05,
+      "loss": 0.9534,
+      "step": 455
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 1.924846887588501,
+      "learning_rate": 4.883227176220807e-05,
+      "loss": 0.9066,
+      "step": 460
+    },
+    {
+      "epoch": 2.9460317460317462,
+      "grad_norm": 1.9862899780273438,
+      "learning_rate": 4.9363057324840765e-05,
+      "loss": 0.7994,
+      "step": 465
+    },
+    {
+      "epoch": 2.977777777777778,
+      "grad_norm": 1.9615916013717651,
+      "learning_rate": 4.9893842887473466e-05,
+      "loss": 0.7045,
+      "step": 470
+    },
+    {
+      "epoch": 3.0063492063492063,
+      "grad_norm": 1.519852876663208,
+      "learning_rate": 4.999989014936042e-05,
+      "loss": 0.7212,
+      "step": 475
+    },
+    {
+      "epoch": 3.038095238095238,
+      "grad_norm": 1.9328887462615967,
+      "learning_rate": 4.999944388279162e-05,
+      "loss": 0.6598,
+      "step": 480
+    },
+    {
+      "epoch": 3.06984126984127,
+      "grad_norm": 2.0340709686279297,
+      "learning_rate": 4.999865434075176e-05,
+      "loss": 0.6829,
+      "step": 485
+    },
+    {
+      "epoch": 3.1015873015873017,
+      "grad_norm": 1.8775280714035034,
+      "learning_rate": 4.999752153408229e-05,
+      "loss": 0.6664,
+      "step": 490
+    },
+    {
+      "epoch": 3.1333333333333333,
+      "grad_norm": 2.385218381881714,
+      "learning_rate": 4.999604547833814e-05,
+      "loss": 0.6836,
+      "step": 495
+    },
+    {
+      "epoch": 3.165079365079365,
+      "grad_norm": 2.1743783950805664,
+      "learning_rate": 4.999422619378752e-05,
+      "loss": 0.7,
+      "step": 500
+    },
+    {
+      "epoch": 3.196825396825397,
+      "grad_norm": 2.20786452293396,
+      "learning_rate": 4.999206370541162e-05,
+      "loss": 0.7253,
+      "step": 505
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 1.8182263374328613,
+      "learning_rate": 4.998955804290425e-05,
+      "loss": 0.6824,
+      "step": 510
+    },
+    {
+      "epoch": 3.2603174603174603,
+      "grad_norm": 2.2959372997283936,
+      "learning_rate": 4.9986709240671495e-05,
+      "loss": 0.601,
+      "step": 515
+    },
+    {
+      "epoch": 3.292063492063492,
+      "grad_norm": 2.385838031768799,
+      "learning_rate": 4.998351733783116e-05,
+      "loss": 0.7417,
+      "step": 520
+    },
+    {
+      "epoch": 3.323809523809524,
+      "grad_norm": 2.0416879653930664,
+      "learning_rate": 4.997998237821233e-05,
+      "loss": 0.6463,
+      "step": 525
+    },
+    {
+      "epoch": 3.3555555555555556,
+      "grad_norm": 2.2781031131744385,
+      "learning_rate": 4.9976104410354654e-05,
+      "loss": 0.6998,
+      "step": 530
+    },
+    {
+      "epoch": 3.3873015873015873,
+      "grad_norm": 2.146778106689453,
+      "learning_rate": 4.9971883487507775e-05,
+      "loss": 0.7694,
+      "step": 535
+    },
+    {
+      "epoch": 3.419047619047619,
+      "grad_norm": 2.1369104385375977,
+      "learning_rate": 4.9967319667630567e-05,
+      "loss": 0.6615,
+      "step": 540
+    },
+    {
+      "epoch": 3.450793650793651,
+      "grad_norm": 2.4529733657836914,
+      "learning_rate": 4.996241301339029e-05,
+      "loss": 0.6109,
+      "step": 545
+    },
+    {
+      "epoch": 3.4825396825396826,
+      "grad_norm": 2.07030987739563,
+      "learning_rate": 4.995716359216183e-05,
+      "loss": 0.7611,
+      "step": 550
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.4329919815063477,
+      "learning_rate": 4.995157147602669e-05,
+      "loss": 0.7515,
+      "step": 555
+    },
+    {
+      "epoch": 3.546031746031746,
+      "grad_norm": 2.056351900100708,
+      "learning_rate": 4.994563674177202e-05,
+      "loss": 0.6885,
+      "step": 560
+    },
+    {
+      "epoch": 3.5777777777777775,
+      "grad_norm": 2.3665318489074707,
+      "learning_rate": 4.993935947088958e-05,
+      "loss": 0.6271,
+      "step": 565
+    },
+    {
+      "epoch": 3.6095238095238096,
+      "grad_norm": 2.677706480026245,
+      "learning_rate": 4.993273974957463e-05,
+      "loss": 0.5586,
+      "step": 570
+    },
+    {
+      "epoch": 3.641269841269841,
+      "grad_norm": 3.422136068344116,
+      "learning_rate": 4.9925777668724685e-05,
+      "loss": 0.7552,
+      "step": 575
+    },
+    {
+      "epoch": 3.6730158730158733,
+      "grad_norm": 2.4525184631347656,
+      "learning_rate": 4.991847332393835e-05,
+      "loss": 0.7367,
+      "step": 580
+    },
+    {
+      "epoch": 3.704761904761905,
+      "grad_norm": 2.4242067337036133,
+      "learning_rate": 4.991082681551396e-05,
+      "loss": 0.7044,
+      "step": 585
+    },
+    {
+      "epoch": 3.7365079365079366,
+      "grad_norm": 1.8419867753982544,
+      "learning_rate": 4.9902838248448184e-05,
+      "loss": 0.5966,
+      "step": 590
+    },
+    {
+      "epoch": 3.768253968253968,
+      "grad_norm": 2.1394360065460205,
+      "learning_rate": 4.989450773243463e-05,
+      "loss": 0.6736,
+      "step": 595
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 1.285447597503662,
+      "learning_rate": 4.9885835381862326e-05,
+      "loss": 0.5021,
+      "step": 600
+    },
+    {
+      "epoch": 3.831746031746032,
+      "grad_norm": 2.724978446960449,
+      "learning_rate": 4.987682131581413e-05,
+      "loss": 0.6128,
+      "step": 605
+    },
+    {
+      "epoch": 3.8634920634920635,
+      "grad_norm": 2.239682912826538,
+      "learning_rate": 4.986746565806508e-05,
+      "loss": 0.5457,
+      "step": 610
+    },
+    {
+      "epoch": 3.895238095238095,
+      "grad_norm": 2.48944091796875,
+      "learning_rate": 4.9857768537080784e-05,
+      "loss": 0.6927,
+      "step": 615
+    },
+    {
+      "epoch": 3.9269841269841272,
+      "grad_norm": 2.4086852073669434,
+      "learning_rate": 4.9847730086015534e-05,
+      "loss": 0.5963,
+      "step": 620
+    },
+    {
+      "epoch": 3.958730158730159,
+      "grad_norm": 2.0070106983184814,
+      "learning_rate": 4.9837350442710553e-05,
+      "loss": 0.5856,
+      "step": 625
+    },
+    {
+      "epoch": 3.9904761904761905,
+      "grad_norm": 1.9726545810699463,
+      "learning_rate": 4.98266297496921e-05,
+      "loss": 0.6208,
+      "step": 630
+    },
+    {
+      "epoch": 4.019047619047619,
+      "grad_norm": 2.6137828826904297,
+      "learning_rate": 4.981556815416948e-05,
+      "loss": 0.6319,
+      "step": 635
+    },
+    {
+      "epoch": 4.050793650793651,
+      "grad_norm": 2.3489890098571777,
+      "learning_rate": 4.9804165808033054e-05,
+      "loss": 0.5887,
+      "step": 640
+    },
+    {
+      "epoch": 4.082539682539682,
+      "grad_norm": 2.8010590076446533,
+      "learning_rate": 4.979242286785214e-05,
+      "loss": 0.5257,
+      "step": 645
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 2.993411064147949,
+      "learning_rate": 4.978033949487284e-05,
+      "loss": 0.4545,
+      "step": 650
+    },
+    {
+      "epoch": 4.146031746031746,
+      "grad_norm": 2.669935703277588,
+      "learning_rate": 4.976791585501588e-05,
+      "loss": 0.5989,
+      "step": 655
+    },
+    {
+      "epoch": 4.177777777777778,
+      "grad_norm": 3.084409236907959,
+      "learning_rate": 4.9755152118874294e-05,
+      "loss": 0.528,
+      "step": 660
+    },
+    {
+      "epoch": 4.20952380952381,
+      "grad_norm": 2.797873020172119,
+      "learning_rate": 4.974204846171106e-05,
+      "loss": 0.5249,
+      "step": 665
+    },
+    {
+      "epoch": 4.241269841269841,
+      "grad_norm": 3.667867422103882,
+      "learning_rate": 4.9728605063456765e-05,
+      "loss": 0.5838,
+      "step": 670
+    },
+    {
+      "epoch": 4.273015873015873,
+      "grad_norm": 2.6918869018554688,
+      "learning_rate": 4.971482210870706e-05,
+      "loss": 0.5143,
+      "step": 675
+    },
+    {
+      "epoch": 4.304761904761905,
+      "grad_norm": 2.1545379161834717,
+      "learning_rate": 4.970069978672017e-05,
+      "loss": 0.5317,
+      "step": 680
+    },
+    {
+      "epoch": 4.336507936507936,
+      "grad_norm": 2.1043529510498047,
+      "learning_rate": 4.9686238291414275e-05,
+      "loss": 0.4815,
+      "step": 685
+    },
+    {
+      "epoch": 4.368253968253969,
+      "grad_norm": 2.1359753608703613,
+      "learning_rate": 4.9671437821364855e-05,
+      "loss": 0.4935,
+      "step": 690
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 3.092057228088379,
+      "learning_rate": 4.965629857980197e-05,
+      "loss": 0.6831,
+      "step": 695
+    },
+    {
+      "epoch": 4.431746031746032,
+      "grad_norm": 2.5296835899353027,
+      "learning_rate": 4.964082077460745e-05,
+      "loss": 0.5323,
+      "step": 700
+    },
+    {
+      "epoch": 4.463492063492064,
+      "grad_norm": 1.6655627489089966,
+      "learning_rate": 4.962500461831207e-05,
+      "loss": 0.4553,
+      "step": 705
+    },
+    {
+      "epoch": 4.495238095238095,
+      "grad_norm": 2.6663475036621094,
+      "learning_rate": 4.9608850328092576e-05,
+      "loss": 0.463,
+      "step": 710
+    },
+    {
+      "epoch": 4.526984126984127,
+      "grad_norm": 2.3763060569763184,
+      "learning_rate": 4.959235812576879e-05,
+      "loss": 0.4861,
+      "step": 715
+    },
+    {
+      "epoch": 4.5587301587301585,
+      "grad_norm": 2.2217962741851807,
+      "learning_rate": 4.957552823780047e-05,
+      "loss": 0.468,
+      "step": 720
+    },
+    {
+      "epoch": 4.59047619047619,
+      "grad_norm": 2.8885600566864014,
+      "learning_rate": 4.9558360895284295e-05,
+      "loss": 0.4588,
+      "step": 725
+    },
+    {
+      "epoch": 4.622222222222222,
+      "grad_norm": 2.5661261081695557,
+      "learning_rate": 4.954085633395058e-05,
+      "loss": 0.4926,
+      "step": 730
+    },
+    {
+      "epoch": 4.653968253968254,
+      "grad_norm": 2.304365396499634,
+      "learning_rate": 4.952301479416015e-05,
+      "loss": 0.494,
+      "step": 735
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 2.690577983856201,
+      "learning_rate": 4.9504836520900976e-05,
+      "loss": 0.5814,
+      "step": 740
+    },
+    {
+      "epoch": 4.717460317460318,
+      "grad_norm": 2.7180025577545166,
+      "learning_rate": 4.948632176378481e-05,
+      "loss": 0.5329,
+      "step": 745
+    },
+    {
+      "epoch": 4.749206349206349,
+      "grad_norm": 2.716587543487549,
+      "learning_rate": 4.9467470777043806e-05,
+      "loss": 0.5264,
+      "step": 750
+    },
+    {
+      "epoch": 4.780952380952381,
+      "grad_norm": 2.315419912338257,
+      "learning_rate": 4.9448283819526954e-05,
+      "loss": 0.4756,
+      "step": 755
+    },
+    {
+      "epoch": 4.8126984126984125,
+      "grad_norm": 2.1679515838623047,
+      "learning_rate": 4.9428761154696605e-05,
+      "loss": 0.4819,
+      "step": 760
+    },
+    {
+      "epoch": 4.844444444444444,
+      "grad_norm": 3.389266014099121,
+      "learning_rate": 4.9408903050624796e-05,
+      "loss": 0.5121,
+      "step": 765
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 3.4317383766174316,
+      "learning_rate": 4.938870977998959e-05,
+      "loss": 0.4535,
+      "step": 770
+    },
+    {
+      "epoch": 4.907936507936508,
+      "grad_norm": 2.9491918087005615,
+      "learning_rate": 4.9368181620071344e-05,
+      "loss": 0.5333,
+      "step": 775
+    },
+    {
+      "epoch": 4.93968253968254,
+      "grad_norm": 2.516798496246338,
+      "learning_rate": 4.934731885274887e-05,
+      "loss": 0.5367,
+      "step": 780
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 3.0031046867370605,
+      "learning_rate": 4.9326121764495596e-05,
+      "loss": 0.4957,
+      "step": 785
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 3.334085702896118,
+      "learning_rate": 4.9304590646375614e-05,
+      "loss": 0.5287,
+      "step": 790
+    },
+    {
+      "epoch": 5.031746031746032,
+      "grad_norm": 1.9608453512191772,
+      "learning_rate": 4.928272579403969e-05,
+      "loss": 0.36,
+      "step": 795
+    },
+    {
+      "epoch": 5.063492063492063,
+      "grad_norm": 2.328850746154785,
+      "learning_rate": 4.92605275077212e-05,
+      "loss": 0.3628,
+      "step": 800
+    },
+    {
+      "epoch": 5.095238095238095,
+      "grad_norm": 2.3446412086486816,
+      "learning_rate": 4.923799609223202e-05,
+      "loss": 0.3327,
+      "step": 805
+    },
+    {
+      "epoch": 5.1269841269841265,
+      "grad_norm": 2.476181745529175,
+      "learning_rate": 4.921513185695831e-05,
+      "loss": 0.4246,
+      "step": 810
+    },
+    {
+      "epoch": 5.158730158730159,
+      "grad_norm": 3.1026763916015625,
+      "learning_rate": 4.91919351158563e-05,
+      "loss": 0.5048,
+      "step": 815
+    },
+    {
+      "epoch": 5.190476190476191,
+      "grad_norm": 2.8165297508239746,
+      "learning_rate": 4.916840618744798e-05,
+      "loss": 0.4361,
+      "step": 820
+    },
+    {
+      "epoch": 5.222222222222222,
+      "grad_norm": 1.8732138872146606,
+      "learning_rate": 4.9144545394816687e-05,
+      "loss": 0.4693,
+      "step": 825
+    },
+    {
+      "epoch": 5.253968253968254,
+      "grad_norm": 1.7250264883041382,
+      "learning_rate": 4.91203530656027e-05,
+      "loss": 0.4076,
+      "step": 830
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 2.105459690093994,
+      "learning_rate": 4.9095829531998725e-05,
+      "loss": 0.3589,
+      "step": 835
+    },
+    {
+      "epoch": 5.317460317460317,
+      "grad_norm": 3.6825687885284424,
+      "learning_rate": 4.9070975130745387e-05,
+      "loss": 0.5263,
+      "step": 840
+    },
+    {
+      "epoch": 5.349206349206349,
+      "grad_norm": 2.947052001953125,
+      "learning_rate": 4.90457902031265e-05,
+      "loss": 0.4632,
+      "step": 845
+    },
+    {
+      "epoch": 5.380952380952381,
+      "grad_norm": 1.9546104669570923,
+      "learning_rate": 4.902027509496448e-05,
+      "loss": 0.4348,
+      "step": 850
+    },
+    {
+      "epoch": 5.412698412698413,
+      "grad_norm": 2.4471983909606934,
+      "learning_rate": 4.899443015661557e-05,
+      "loss": 0.4209,
+      "step": 855
+    },
+    {
+      "epoch": 5.444444444444445,
+      "grad_norm": 1.827124834060669,
+      "learning_rate": 4.8968255742964975e-05,
+      "loss": 0.413,
+      "step": 860
+    },
+    {
+      "epoch": 5.476190476190476,
+      "grad_norm": 2.654707431793213,
+      "learning_rate": 4.894175221342207e-05,
+      "loss": 0.432,
+      "step": 865
+    },
+    {
+      "epoch": 5.507936507936508,
+      "grad_norm": 2.648967981338501,
+      "learning_rate": 4.8914919931915407e-05,
+      "loss": 0.4339,
+      "step": 870
+    },
+    {
+      "epoch": 5.5396825396825395,
+      "grad_norm": 2.874075412750244,
+      "learning_rate": 4.888775926688775e-05,
+      "loss": 0.4392,
+      "step": 875
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 2.9674830436706543,
+      "learning_rate": 4.8860270591291e-05,
+      "loss": 0.4459,
+      "step": 880
+    },
+    {
+      "epoch": 5.603174603174603,
+      "grad_norm": 2.054748296737671,
+      "learning_rate": 4.883245428258107e-05,
+      "loss": 0.4313,
+      "step": 885
+    },
+    {
+      "epoch": 5.634920634920634,
+      "grad_norm": 1.9174392223358154,
+      "learning_rate": 4.880431072271272e-05,
+      "loss": 0.3906,
+      "step": 890
+    },
+    {
+      "epoch": 5.666666666666667,
+      "grad_norm": 2.5257787704467773,
+      "learning_rate": 4.87758402981343e-05,
+      "loss": 0.4219,
+      "step": 895
+    },
+    {
+      "epoch": 5.698412698412699,
+      "grad_norm": 2.6365532875061035,
+      "learning_rate": 4.8747043399782424e-05,
+      "loss": 0.3978,
+      "step": 900
+    },
+    {
+      "epoch": 5.73015873015873,
+      "grad_norm": 2.0583746433258057,
+      "learning_rate": 4.871792042307667e-05,
+      "loss": 0.4847,
+      "step": 905
+    },
+    {
+      "epoch": 5.761904761904762,
+      "grad_norm": 2.035872459411621,
+      "learning_rate": 4.868847176791406e-05,
+      "loss": 0.4675,
+      "step": 910
+    },
+    {
+      "epoch": 5.7936507936507935,
+      "grad_norm": 2.3722939491271973,
+      "learning_rate": 4.8658697838663625e-05,
+      "loss": 0.4586,
+      "step": 915
+    },
+    {
+      "epoch": 5.825396825396825,
+      "grad_norm": 1.2609732151031494,
+      "learning_rate": 4.862859904416085e-05,
+      "loss": 0.3274,
+      "step": 920
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 2.3673977851867676,
+      "learning_rate": 4.8598175797702036e-05,
+      "loss": 0.4685,
+      "step": 925
+    },
+    {
+      "epoch": 5.888888888888889,
+      "grad_norm": 2.8414175510406494,
+      "learning_rate": 4.856742851703866e-05,
+      "loss": 0.4762,
+      "step": 930
+    },
+    {
+      "epoch": 5.920634920634921,
+      "grad_norm": 2.4126765727996826,
+      "learning_rate": 4.853635762437159e-05,
+      "loss": 0.4075,
+      "step": 935
+    },
+    {
+      "epoch": 5.9523809523809526,
+      "grad_norm": 1.8691045045852661,
+      "learning_rate": 4.8504963546345334e-05,
+      "loss": 0.4865,
+      "step": 940
+    },
+    {
+      "epoch": 5.984126984126984,
+      "grad_norm": 3.5297420024871826,
+      "learning_rate": 4.8473246714042155e-05,
+      "loss": 0.4623,
+      "step": 945
+    },
+    {
+      "epoch": 6.012698412698413,
+      "grad_norm": 2.059169054031372,
+      "learning_rate": 4.844120756297617e-05,
+      "loss": 0.4164,
+      "step": 950
+    },
+    {
+      "epoch": 6.044444444444444,
+      "grad_norm": 2.4746127128601074,
+      "learning_rate": 4.840884653308735e-05,
+      "loss": 0.3552,
+      "step": 955
+    },
+    {
+      "epoch": 6.076190476190476,
+      "grad_norm": 2.504425287246704,
+      "learning_rate": 4.8376164068735485e-05,
+      "loss": 0.3368,
+      "step": 960
+    },
+    {
+      "epoch": 6.1079365079365076,
+      "grad_norm": 2.062577486038208,
+      "learning_rate": 4.83431606186941e-05,
+      "loss": 0.3139,
+      "step": 965
+    },
+    {
+      "epoch": 6.13968253968254,
+      "grad_norm": 2.4934544563293457,
+      "learning_rate": 4.830983663614427e-05,
+      "loss": 0.3777,
+      "step": 970
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 2.5747485160827637,
+      "learning_rate": 4.827619257866839e-05,
+      "loss": 0.373,
+      "step": 975
+    },
+    {
+      "epoch": 6.203174603174603,
+      "grad_norm": 2.449357271194458,
+      "learning_rate": 4.8242228908243946e-05,
+      "loss": 0.3936,
+      "step": 980
+    },
+    {
+      "epoch": 6.234920634920635,
+      "grad_norm": 2.952680826187134,
+      "learning_rate": 4.82079460912371e-05,
+      "loss": 0.407,
+      "step": 985
+    },
+    {
+      "epoch": 6.266666666666667,
+      "grad_norm": 2.1754496097564697,
+      "learning_rate": 4.817334459839633e-05,
+      "loss": 0.3189,
+      "step": 990
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 2.8406214714050293,
+      "learning_rate": 4.8138424904845947e-05,
+      "loss": 0.3883,
+      "step": 995
+    },
+    {
+      "epoch": 6.33015873015873,
+      "grad_norm": 1.7533257007598877,
+      "learning_rate": 4.8103187490079604e-05,
+      "loss": 0.3131,
+      "step": 1000
+    },
+    {
+      "epoch": 6.3619047619047615,
+      "grad_norm": 2.4574601650238037,
+      "learning_rate": 4.806763283795366e-05,
+      "loss": 0.3606,
+      "step": 1005
+    },
+    {
+      "epoch": 6.393650793650794,
+      "grad_norm": 2.002281427383423,
+      "learning_rate": 4.8031761436680575e-05,
+      "loss": 0.37,
+      "step": 1010
+    },
+    {
+      "epoch": 6.425396825396826,
+      "grad_norm": 2.823315143585205,
+      "learning_rate": 4.79955737788222e-05,
+      "loss": 0.3791,
+      "step": 1015
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 2.7891204357147217,
+      "learning_rate": 4.795907036128299e-05,
+      "loss": 0.3556,
+      "step": 1020
+    },
+    {
+      "epoch": 6.488888888888889,
+      "grad_norm": 2.2387146949768066,
+      "learning_rate": 4.7922251685303213e-05,
+      "loss": 0.3929,
+      "step": 1025
+    },
+    {
+      "epoch": 6.520634920634921,
+      "grad_norm": 2.5023891925811768,
+      "learning_rate": 4.788511825645205e-05,
+      "loss": 0.379,
+      "step": 1030
+    },
+    {
+      "epoch": 6.552380952380952,
+      "grad_norm": 2.2654805183410645,
+      "learning_rate": 4.7847670584620653e-05,
+      "loss": 0.3435,
+      "step": 1035
+    },
+    {
+      "epoch": 6.584126984126984,
+      "grad_norm": 3.3823065757751465,
+      "learning_rate": 4.7809909184015146e-05,
+      "loss": 0.4109,
+      "step": 1040
+    },
+    {
+      "epoch": 6.6158730158730155,
+      "grad_norm": 2.6096551418304443,
+      "learning_rate": 4.7771834573149576e-05,
+      "loss": 0.4233,
+      "step": 1045
+    },
+    {
+      "epoch": 6.647619047619048,
+      "grad_norm": 2.3933897018432617,
+      "learning_rate": 4.773344727483876e-05,
+      "loss": 0.3709,
+      "step": 1050
+    },
+    {
+      "epoch": 6.67936507936508,
+      "grad_norm": 2.189544916152954,
+      "learning_rate": 4.769474781619114e-05,
+      "loss": 0.3287,
+      "step": 1055
+    },
+    {
+      "epoch": 6.711111111111111,
+      "grad_norm": 2.450892686843872,
+      "learning_rate": 4.765573672860154e-05,
+      "loss": 0.4022,
+      "step": 1060
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 2.4342429637908936,
+      "learning_rate": 4.761641454774386e-05,
+      "loss": 0.4029,
+      "step": 1065
+    },
+    {
+      "epoch": 6.7746031746031745,
+      "grad_norm": 2.2122364044189453,
+      "learning_rate": 4.75767818135637e-05,
+      "loss": 0.3322,
+      "step": 1070
+    },
+    {
+      "epoch": 6.806349206349206,
+      "grad_norm": 3.968445301055908,
+      "learning_rate": 4.7536839070271e-05,
+      "loss": 0.3836,
+      "step": 1075
+    },
+    {
+      "epoch": 6.838095238095238,
+      "grad_norm": 3.529158353805542,
+      "learning_rate": 4.749658686633251e-05,
+      "loss": 0.4745,
+      "step": 1080
+    },
+    {
+      "epoch": 6.86984126984127,
+      "grad_norm": 2.430727243423462,
+      "learning_rate": 4.7456025754464304e-05,
+      "loss": 0.3664,
+      "step": 1085
+    },
+    {
+      "epoch": 6.901587301587302,
+      "grad_norm": 2.6552302837371826,
+      "learning_rate": 4.7415156291624166e-05,
+      "loss": 0.4359,
+      "step": 1090
+    },
+    {
+      "epoch": 6.933333333333334,
+      "grad_norm": 2.134822130203247,
+      "learning_rate": 4.737397903900393e-05,
+      "loss": 0.3969,
+      "step": 1095
+    },
+    {
+      "epoch": 6.965079365079365,
+      "grad_norm": 2.5052947998046875,
+      "learning_rate": 4.7332494562021815e-05,
+      "loss": 0.4069,
+      "step": 1100
+    },
+    {
+      "epoch": 6.996825396825397,
+      "grad_norm": 2.1377065181732178,
+      "learning_rate": 4.729070343031463e-05,
+      "loss": 0.3853,
+      "step": 1105
+    },
+    {
+      "epoch": 7.025396825396825,
+      "grad_norm": 1.9704042673110962,
+      "learning_rate": 4.724860621772995e-05,
+      "loss": 0.3283,
+      "step": 1110
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 2.476968765258789,
+      "learning_rate": 4.7206203502318256e-05,
+      "loss": 0.3325,
+      "step": 1115
+    },
+    {
+      "epoch": 7.088888888888889,
+      "grad_norm": 1.9231969118118286,
+      "learning_rate": 4.716349586632499e-05,
+      "loss": 0.2876,
+      "step": 1120
+    },
+    {
+      "epoch": 7.12063492063492,
+      "grad_norm": 2.6444814205169678,
+      "learning_rate": 4.712048389618254e-05,
+      "loss": 0.3005,
+      "step": 1125
+    },
+    {
+      "epoch": 7.152380952380953,
+      "grad_norm": 3.2589964866638184,
+      "learning_rate": 4.7077168182502216e-05,
+      "loss": 0.4023,
+      "step": 1130
+    },
+    {
+      "epoch": 7.184126984126984,
+      "grad_norm": 2.5481936931610107,
+      "learning_rate": 4.703354932006615e-05,
+      "loss": 0.3302,
+      "step": 1135
+    },
+    {
+      "epoch": 7.215873015873016,
+      "grad_norm": 1.7125908136367798,
+      "learning_rate": 4.698962790781906e-05,
+      "loss": 0.3329,
+      "step": 1140
+    },
+    {
+      "epoch": 7.247619047619048,
+      "grad_norm": 2.2756667137145996,
+      "learning_rate": 4.6945404548860115e-05,
+      "loss": 0.3369,
+      "step": 1145
+    },
+    {
+      "epoch": 7.279365079365079,
+      "grad_norm": 2.9158453941345215,
+      "learning_rate": 4.6900879850434604e-05,
+      "loss": 0.3339,
+      "step": 1150
+    },
+    {
+      "epoch": 7.311111111111111,
+      "grad_norm": 2.3047537803649902,
+      "learning_rate": 4.685605442392559e-05,
+      "loss": 0.3915,
+      "step": 1155
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 2.7815029621124268,
+      "learning_rate": 4.681092888484554e-05,
+      "loss": 0.3317,
+      "step": 1160
+    },
+    {
+      "epoch": 7.374603174603175,
+      "grad_norm": 2.2644097805023193,
+      "learning_rate": 4.676550385282787e-05,
+      "loss": 0.3314,
+      "step": 1165
+    },
+    {
+      "epoch": 7.406349206349207,
+      "grad_norm": 2.5144474506378174,
+      "learning_rate": 4.671977995161843e-05,
+      "loss": 0.3188,
+      "step": 1170
+    },
+    {
+      "epoch": 7.438095238095238,
+      "grad_norm": 3.120821714401245,
+      "learning_rate": 4.667375780906693e-05,
+      "loss": 0.3523,
+      "step": 1175
+    },
+    {
+      "epoch": 7.46984126984127,
+      "grad_norm": 4.47842264175415,
+      "learning_rate": 4.662743805711832e-05,
+      "loss": 0.3611,
+      "step": 1180
+    },
+    {
+      "epoch": 7.501587301587302,
+      "grad_norm": 1.9228928089141846,
+      "learning_rate": 4.658082133180416e-05,
+      "loss": 0.3612,
+      "step": 1185
+    },
+    {
+      "epoch": 7.533333333333333,
+      "grad_norm": 2.1507537364959717,
+      "learning_rate": 4.6533908273233815e-05,
+      "loss": 0.3321,
+      "step": 1190
+    },
+    {
+      "epoch": 7.565079365079365,
+      "grad_norm": 2.1849119663238525,
+      "learning_rate": 4.64866995255857e-05,
+      "loss": 0.2943,
+      "step": 1195
+    },
+    {
+      "epoch": 7.5968253968253965,
+      "grad_norm": 2.1777775287628174,
+      "learning_rate": 4.643919573709843e-05,
+      "loss": 0.353,
+      "step": 1200
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 2.5231118202209473,
+      "learning_rate": 4.639139756006195e-05,
+      "loss": 0.3571,
+      "step": 1205
+    },
+    {
+      "epoch": 7.660317460317461,
+      "grad_norm": 1.8409479856491089,
+      "learning_rate": 4.6343305650808516e-05,
+      "loss": 0.3691,
+      "step": 1210
+    },
+    {
+      "epoch": 7.692063492063492,
+      "grad_norm": 1.7940895557403564,
+      "learning_rate": 4.629492066970373e-05,
+      "loss": 0.3738,
+      "step": 1215
+    },
+    {
+      "epoch": 7.723809523809524,
+      "grad_norm": 2.014902114868164,
+      "learning_rate": 4.6246243281137474e-05,
+      "loss": 0.361,
+      "step": 1220
+    },
+    {
+      "epoch": 7.7555555555555555,
+      "grad_norm": 3.4182560443878174,
+      "learning_rate": 4.6197274153514735e-05,
+      "loss": 0.3663,
+      "step": 1225
+    },
+    {
+      "epoch": 7.787301587301587,
+      "grad_norm": 2.518728256225586,
+      "learning_rate": 4.614801395924649e-05,
+      "loss": 0.3646,
+      "step": 1230
+    },
+    {
+      "epoch": 7.819047619047619,
+      "grad_norm": 2.154189109802246,
+      "learning_rate": 4.6098463374740466e-05,
+      "loss": 0.3331,
+      "step": 1235
+    },
+    {
+      "epoch": 7.85079365079365,
+      "grad_norm": 2.536081075668335,
+      "learning_rate": 4.604862308039177e-05,
+      "loss": 0.3742,
+      "step": 1240
+    },
+    {
+      "epoch": 7.882539682539683,
+      "grad_norm": 2.340764045715332,
+      "learning_rate": 4.599849376057366e-05,
+      "loss": 0.3352,
+      "step": 1245
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 3.5488364696502686,
+      "learning_rate": 4.5948076103628094e-05,
+      "loss": 0.3663,
+      "step": 1250
+    },
+    {
+      "epoch": 7.946031746031746,
+      "grad_norm": 2.779360294342041,
+      "learning_rate": 4.589737080185625e-05,
+      "loss": 0.3362,
+      "step": 1255
+    },
+    {
+      "epoch": 7.977777777777778,
+      "grad_norm": 1.8792667388916016,
+      "learning_rate": 4.5846378551509097e-05,
+      "loss": 0.346,
+      "step": 1260
+    },
+    {
+      "epoch": 8.006349206349206,
+      "grad_norm": 2.453295946121216,
+      "learning_rate": 4.579510005277774e-05,
+      "loss": 0.3509,
+      "step": 1265
+    },
+    {
+      "epoch": 8.038095238095238,
+      "grad_norm": 1.9493130445480347,
+      "learning_rate": 4.574353600978388e-05,
+      "loss": 0.3062,
+      "step": 1270
+    },
+    {
+      "epoch": 8.06984126984127,
+      "grad_norm": 1.9360930919647217,
+      "learning_rate": 4.56916871305701e-05,
+      "loss": 0.3056,
+      "step": 1275
+    },
+    {
+      "epoch": 8.101587301587301,
+      "grad_norm": 1.5592070817947388,
+      "learning_rate": 4.563955412709021e-05,
+      "loss": 0.2785,
+      "step": 1280
+    },
+    {
+      "epoch": 8.133333333333333,
+      "grad_norm": 1.8093425035476685,
+      "learning_rate": 4.5587137715199354e-05,
+      "loss": 0.308,
+      "step": 1285
+    },
+    {
+      "epoch": 8.165079365079364,
+      "grad_norm": 2.2939181327819824,
+      "learning_rate": 4.5534438614644294e-05,
+      "loss": 0.3038,
+      "step": 1290
+    },
+    {
+      "epoch": 8.196825396825396,
+      "grad_norm": 2.4204866886138916,
+      "learning_rate": 4.548145754905346e-05,
+      "loss": 0.3375,
+      "step": 1295
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 1.725534439086914,
+      "learning_rate": 4.5428195245927064e-05,
+      "loss": 0.3101,
+      "step": 1300
+    },
+    {
+      "epoch": 8.260317460317461,
+      "grad_norm": 1.637730360031128,
+      "learning_rate": 4.537465243662704e-05,
+      "loss": 0.2931,
+      "step": 1305
+    },
+    {
+      "epoch": 8.292063492063493,
+      "grad_norm": 1.3372169733047485,
+      "learning_rate": 4.532082985636709e-05,
+      "loss": 0.2763,
+      "step": 1310
+    },
+    {
+      "epoch": 8.323809523809524,
+      "grad_norm": 2.5993168354034424,
+      "learning_rate": 4.5266728244202494e-05,
+      "loss": 0.3458,
+      "step": 1315
+    },
+    {
+      "epoch": 8.355555555555556,
+      "grad_norm": 2.461862564086914,
+      "learning_rate": 4.521234834302006e-05,
+      "loss": 0.3693,
+      "step": 1320
+    },
+    {
+      "epoch": 8.387301587301588,
+      "grad_norm": 1.8519413471221924,
+      "learning_rate": 4.5157690899527816e-05,
+      "loss": 0.3327,
+      "step": 1325
+    },
+    {
+      "epoch": 8.41904761904762,
+      "grad_norm": 2.1535580158233643,
+      "learning_rate": 4.510275666424487e-05,
+      "loss": 0.3229,
+      "step": 1330
+    },
+    {
+      "epoch": 8.450793650793651,
+      "grad_norm": 1.6819690465927124,
+      "learning_rate": 4.5047546391491e-05,
+      "loss": 0.2925,
+      "step": 1335
+    },
+    {
+      "epoch": 8.482539682539683,
+      "grad_norm": 1.6538281440734863,
+      "learning_rate": 4.499206083937638e-05,
+      "loss": 0.3218,
+      "step": 1340
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 1.8956862688064575,
+      "learning_rate": 4.493630076979112e-05,
+      "loss": 0.3423,
+      "step": 1345
+    },
+    {
+      "epoch": 8.546031746031746,
+      "grad_norm": 2.274681806564331,
+      "learning_rate": 4.48802669483948e-05,
+      "loss": 0.3152,
+      "step": 1350
+    },
+    {
+      "epoch": 8.577777777777778,
+      "grad_norm": 2.2956337928771973,
+      "learning_rate": 4.4823960144606014e-05,
+      "loss": 0.3417,
+      "step": 1355
+    },
+    {
+      "epoch": 8.60952380952381,
+      "grad_norm": 1.8650286197662354,
+      "learning_rate": 4.4767381131591734e-05,
+      "loss": 0.2896,
+      "step": 1360
+    },
+    {
+      "epoch": 8.64126984126984,
+      "grad_norm": 1.3998652696609497,
+      "learning_rate": 4.471053068625674e-05,
+      "loss": 0.3372,
+      "step": 1365
+    },
+    {
+      "epoch": 8.673015873015872,
+      "grad_norm": 2.855074167251587,
+      "learning_rate": 4.465340958923293e-05,
+      "loss": 0.332,
+      "step": 1370
+    },
+    {
+      "epoch": 8.704761904761904,
+      "grad_norm": 1.6865357160568237,
+      "learning_rate": 4.459601862486862e-05,
+      "loss": 0.3053,
+      "step": 1375
+    },
+    {
+      "epoch": 8.736507936507937,
+      "grad_norm": 2.501856803894043,
+      "learning_rate": 4.453835858121773e-05,
+      "loss": 0.3119,
+      "step": 1380
+    },
+    {
+      "epoch": 8.768253968253969,
+      "grad_norm": 2.4325456619262695,
+      "learning_rate": 4.4480430250029046e-05,
+      "loss": 0.3395,
+      "step": 1385
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 1.4845948219299316,
+      "learning_rate": 4.4422234426735256e-05,
+      "loss": 0.3237,
+      "step": 1390
+    },
+    {
+      "epoch": 8.831746031746032,
+      "grad_norm": 1.3553249835968018,
+      "learning_rate": 4.436377191044208e-05,
+      "loss": 0.3387,
+      "step": 1395
+    },
+    {
+      "epoch": 8.863492063492064,
+      "grad_norm": 1.8338890075683594,
+      "learning_rate": 4.430504350391729e-05,
+      "loss": 0.3618,
+      "step": 1400
+    },
+    {
+      "epoch": 8.895238095238096,
+      "grad_norm": 2.291538953781128,
+      "learning_rate": 4.4246050013579686e-05,
+      "loss": 0.3608,
+      "step": 1405
+    },
+    {
+      "epoch": 8.926984126984127,
+      "grad_norm": 1.3809788227081299,
+      "learning_rate": 4.4186792249488005e-05,
+      "loss": 0.3077,
+      "step": 1410
+    },
+    {
+      "epoch": 8.958730158730159,
+      "grad_norm": 1.5944230556488037,
+      "learning_rate": 4.412727102532983e-05,
+      "loss": 0.3307,
+      "step": 1415
+    },
+    {
+      "epoch": 8.99047619047619,
+      "grad_norm": 2.2244362831115723,
+      "learning_rate": 4.4067487158410396e-05,
+      "loss": 0.3469,
+      "step": 1420
+    },
+    {
+      "epoch": 9.019047619047619,
+      "grad_norm": 1.444221019744873,
+      "learning_rate": 4.400744146964136e-05,
+      "loss": 0.3049,
+      "step": 1425
+    },
+    {
+      "epoch": 9.05079365079365,
+      "grad_norm": 1.5847752094268799,
+      "learning_rate": 4.394713478352955e-05,
+      "loss": 0.2715,
+      "step": 1430
+    },
+    {
+      "epoch": 9.082539682539682,
+      "grad_norm": 1.6062681674957275,
+      "learning_rate": 4.388656792816562e-05,
+      "loss": 0.2487,
+      "step": 1435
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 2.099787712097168,
+      "learning_rate": 4.382574173521272e-05,
+      "loss": 0.2866,
+      "step": 1440
+    },
+    {
+      "epoch": 9.146031746031746,
+      "grad_norm": 1.0997334718704224,
+      "learning_rate": 4.376465703989502e-05,
+      "loss": 0.3052,
+      "step": 1445
+    },
+    {
+      "epoch": 9.177777777777777,
+      "grad_norm": 2.4327454566955566,
+      "learning_rate": 4.370331468098628e-05,
+      "loss": 0.3212,
+      "step": 1450
+    },
+    {
+      "epoch": 9.209523809523809,
+      "grad_norm": 1.4816385507583618,
+      "learning_rate": 4.364171550079833e-05,
+      "loss": 0.3046,
+      "step": 1455
+    },
+    {
+      "epoch": 9.24126984126984,
+      "grad_norm": 2.039186716079712,
+      "learning_rate": 4.357986034516947e-05,
+      "loss": 0.3165,
+      "step": 1460
+    },
+    {
+      "epoch": 9.273015873015874,
+      "grad_norm": 1.437852382659912,
+      "learning_rate": 4.3517750063452934e-05,
+      "loss": 0.3037,
+      "step": 1465
+    },
+    {
+      "epoch": 9.304761904761905,
+      "grad_norm": 1.818982720375061,
+      "learning_rate": 4.345538550850512e-05,
+      "loss": 0.3122,
+      "step": 1470
+    },
+    {
+      "epoch": 9.336507936507937,
+      "grad_norm": 1.12025785446167,
+      "learning_rate": 4.339276753667395e-05,
+      "loss": 0.2909,
+      "step": 1475
+    },
+    {
+      "epoch": 9.368253968253969,
+      "grad_norm": 1.6094844341278076,
+      "learning_rate": 4.3329897007787125e-05,
+      "loss": 0.2823,
+      "step": 1480
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 1.916200041770935,
+      "learning_rate": 4.326677478514024e-05,
+      "loss": 0.2939,
+      "step": 1485
+    },
+    {
+      "epoch": 9.431746031746032,
+      "grad_norm": 1.97919499874115,
+      "learning_rate": 4.320340173548503e-05,
+      "loss": 0.2826,
+      "step": 1490
+    },
+    {
+      "epoch": 9.463492063492064,
+      "grad_norm": 2.0238938331604004,
+      "learning_rate": 4.313977872901737e-05,
+      "loss": 0.3273,
+      "step": 1495
+    },
+    {
+      "epoch": 9.495238095238095,
+      "grad_norm": 2.5840957164764404,
+      "learning_rate": 4.307590663936541e-05,
+      "loss": 0.2889,
+      "step": 1500
+    },
+    {
+      "epoch": 9.526984126984127,
+      "grad_norm": 2.3503904342651367,
+      "learning_rate": 4.30117863435775e-05,
+      "loss": 0.3012,
+      "step": 1505
+    },
+    {
+      "epoch": 9.558730158730159,
+      "grad_norm": 2.019792318344116,
+      "learning_rate": 4.294741872211024e-05,
+      "loss": 0.3267,
+      "step": 1510
+    },
+    {
+      "epoch": 9.59047619047619,
+      "grad_norm": 2.2713353633880615,
+      "learning_rate": 4.288280465881632e-05,
+      "loss": 0.3096,
+      "step": 1515
+    },
+    {
+      "epoch": 9.622222222222222,
+      "grad_norm": 2.4236693382263184,
+      "learning_rate": 4.281794504093237e-05,
+      "loss": 0.3291,
+      "step": 1520
+    },
+    {
+      "epoch": 9.653968253968253,
+      "grad_norm": 1.772703766822815,
+      "learning_rate": 4.275284075906686e-05,
+      "loss": 0.3117,
+      "step": 1525
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.9665186405181885,
+      "learning_rate": 4.268749270718778e-05,
+      "loss": 0.326,
+      "step": 1530
+    },
+    {
+      "epoch": 9.717460317460317,
+      "grad_norm": 1.9472782611846924,
+      "learning_rate": 4.262190178261044e-05,
+      "loss": 0.2683,
+      "step": 1535
+    },
+    {
+      "epoch": 9.74920634920635,
+      "grad_norm": 2.0638089179992676,
+      "learning_rate": 4.255606888598508e-05,
+      "loss": 0.314,
+      "step": 1540
+    },
+    {
+      "epoch": 9.780952380952382,
+      "grad_norm": 2.1349925994873047,
+      "learning_rate": 4.248999492128456e-05,
+      "loss": 0.2897,
+      "step": 1545
+    },
+    {
+      "epoch": 9.812698412698413,
+      "grad_norm": 2.112536907196045,
+      "learning_rate": 4.242368079579192e-05,
+      "loss": 0.31,
+      "step": 1550
+    },
+    {
+      "epoch": 9.844444444444445,
+      "grad_norm": 1.6859878301620483,
+      "learning_rate": 4.2357127420087917e-05,
+      "loss": 0.3412,
+      "step": 1555
+    },
+    {
+      "epoch": 9.876190476190477,
+      "grad_norm": 1.9178651571273804,
+      "learning_rate": 4.229033570803853e-05,
+      "loss": 0.334,
+      "step": 1560
+    },
+    {
+      "epoch": 9.907936507936508,
+      "grad_norm": 2.562436103820801,
+      "learning_rate": 4.2223306576782426e-05,
+      "loss": 0.3379,
+      "step": 1565
+    },
+    {
+      "epoch": 9.93968253968254,
+      "grad_norm": 1.8472412824630737,
+      "learning_rate": 4.215604094671835e-05,
+      "loss": 0.3415,
+      "step": 1570
+    },
+    {
+      "epoch": 9.971428571428572,
+      "grad_norm": 1.9416279792785645,
+      "learning_rate": 4.208853974149246e-05,
+      "loss": 0.3085,
+      "step": 1575
+    },
+    {
+      "epoch": 10.0,
+      "grad_norm": 2.0056397914886475,
+      "learning_rate": 4.202080388798571e-05,
+      "loss": 0.3263,
+      "step": 1580
+    },
+    {
+      "epoch": 10.031746031746032,
+      "grad_norm": 2.195781946182251,
+      "learning_rate": 4.1952834316301065e-05,
+      "loss": 0.2867,
+      "step": 1585
+    },
+    {
+      "epoch": 10.063492063492063,
+      "grad_norm": 1.7489805221557617,
+      "learning_rate": 4.1884631959750766e-05,
+      "loss": 0.2589,
+      "step": 1590
+    },
+    {
+      "epoch": 10.095238095238095,
+      "grad_norm": 1.9361369609832764,
+      "learning_rate": 4.181619775484348e-05,
+      "loss": 0.2722,
+      "step": 1595
+    },
+    {
+      "epoch": 10.126984126984127,
+      "grad_norm": 2.24322509765625,
+      "learning_rate": 4.174753264127147e-05,
+      "loss": 0.2534,
+      "step": 1600
+    },
+    {
+      "epoch": 10.158730158730158,
+      "grad_norm": 2.4550466537475586,
+      "learning_rate": 4.167863756189767e-05,
+      "loss": 0.2777,
+      "step": 1605
+    },
+    {
+      "epoch": 10.19047619047619,
+      "grad_norm": 1.9439811706542969,
+      "learning_rate": 4.160951346274278e-05,
+      "loss": 0.2864,
+      "step": 1610
+    },
+    {
+      "epoch": 10.222222222222221,
+      "grad_norm": 1.633494257926941,
+      "learning_rate": 4.154016129297219e-05,
+      "loss": 0.2858,
+      "step": 1615
+    },
+    {
+      "epoch": 10.253968253968253,
+      "grad_norm": 1.69782292842865,
+      "learning_rate": 4.147058200488305e-05,
+      "loss": 0.2942,
+      "step": 1620
+    },
+    {
+      "epoch": 10.285714285714286,
+      "grad_norm": 1.613031268119812,
+      "learning_rate": 4.140077655389113e-05,
+      "loss": 0.2632,
+      "step": 1625
+    },
+    {
+      "epoch": 10.317460317460318,
+      "grad_norm": 2.0266177654266357,
+      "learning_rate": 4.1330745898517714e-05,
+      "loss": 0.3011,
+      "step": 1630
+    },
+    {
+      "epoch": 10.34920634920635,
+      "grad_norm": 1.8945387601852417,
+      "learning_rate": 4.1260491000376446e-05,
+      "loss": 0.2832,
+      "step": 1635
+    },
+    {
+      "epoch": 10.380952380952381,
+      "grad_norm": 1.7012510299682617,
+      "learning_rate": 4.119001282416009e-05,
+      "loss": 0.2718,
+      "step": 1640
+    },
+    {
+      "epoch": 10.412698412698413,
+      "grad_norm": 1.5538525581359863,
+      "learning_rate": 4.111931233762738e-05,
+      "loss": 0.3232,
+      "step": 1645
+    },
+    {
+      "epoch": 10.444444444444445,
+      "grad_norm": 2.3083150386810303,
+      "learning_rate": 4.1048390511589595e-05,
+      "loss": 0.3057,
+      "step": 1650
+    },
+    {
+      "epoch": 10.476190476190476,
+      "grad_norm": 1.293314814567566,
+      "learning_rate": 4.097724831989733e-05,
+      "loss": 0.2523,
+      "step": 1655
+    },
+    {
+      "epoch": 10.507936507936508,
+      "grad_norm": 2.517212152481079,
+      "learning_rate": 4.09058867394271e-05,
+      "loss": 0.3269,
+      "step": 1660
+    },
+    {
+      "epoch": 10.53968253968254,
+      "grad_norm": 2.057063102722168,
+      "learning_rate": 4.083430675006791e-05,
+      "loss": 0.2844,
+      "step": 1665
+    },
+    {
+      "epoch": 10.571428571428571,
+      "grad_norm": 1.5663833618164062,
+      "learning_rate": 4.0762509334707786e-05,
+      "loss": 0.3005,
+      "step": 1670
+    },
+    {
+      "epoch": 10.603174603174603,
+      "grad_norm": 2.5423505306243896,
+      "learning_rate": 4.069049547922035e-05,
+      "loss": 0.2802,
+      "step": 1675
+    },
+    {
+      "epoch": 10.634920634920634,
+      "grad_norm": 1.578316569328308,
+      "learning_rate": 4.061826617245119e-05,
+      "loss": 0.2667,
+      "step": 1680
+    },
+    {
+      "epoch": 10.666666666666666,
+      "grad_norm": 1.502928376197815,
+      "learning_rate": 4.0545822406204334e-05,
+      "loss": 0.3059,
+      "step": 1685
+    },
+    {
+      "epoch": 10.698412698412698,
+      "grad_norm": 1.2470905780792236,
+      "learning_rate": 4.047316517522864e-05,
+      "loss": 0.2879,
+      "step": 1690
+    },
+    {
+      "epoch": 10.73015873015873,
+      "grad_norm": 1.8238775730133057,
+      "learning_rate": 4.0400295477204105e-05,
+      "loss": 0.2923,
+      "step": 1695
+    },
+    {
+      "epoch": 10.761904761904763,
+      "grad_norm": 2.0516586303710938,
+      "learning_rate": 4.032721431272819e-05,
+      "loss": 0.3086,
+      "step": 1700
+    },
+    {
+      "epoch": 10.793650793650794,
+      "grad_norm": 1.3188791275024414,
+      "learning_rate": 4.0253922685302046e-05,
+      "loss": 0.2893,
+      "step": 1705
+    },
+    {
+      "epoch": 10.825396825396826,
+      "grad_norm": 1.7352266311645508,
+      "learning_rate": 4.01804216013168e-05,
+      "loss": 0.2981,
+      "step": 1710
+    },
+    {
+      "epoch": 10.857142857142858,
+      "grad_norm": 1.3449515104293823,
+      "learning_rate": 4.0106712070039656e-05,
+      "loss": 0.2841,
+      "step": 1715
+    },
+    {
+      "epoch": 10.88888888888889,
+      "grad_norm": 2.505431890487671,
+      "learning_rate": 4.00327951036001e-05,
+      "loss": 0.3034,
+      "step": 1720
+    },
+    {
+      "epoch": 10.920634920634921,
+      "grad_norm": 1.8870325088500977,
+      "learning_rate": 3.9958671716975966e-05,
+      "loss": 0.305,
+      "step": 1725
+    },
+    {
+      "epoch": 10.952380952380953,
+      "grad_norm": 2.913130044937134,
+      "learning_rate": 3.988434292797951e-05,
+      "loss": 0.3212,
+      "step": 1730
+    },
+    {
+      "epoch": 10.984126984126984,
+      "grad_norm": 1.7870115041732788,
+      "learning_rate": 3.980980975724344e-05,
+      "loss": 0.3108,
+      "step": 1735
+    },
+    {
+      "epoch": 11.012698412698413,
+      "grad_norm": 3.050985336303711,
+      "learning_rate": 3.9735073228206896e-05,
+      "loss": 0.3043,
+      "step": 1740
+    },
+    {
+      "epoch": 11.044444444444444,
+      "grad_norm": 1.5993611812591553,
+      "learning_rate": 3.96601343671014e-05,
+      "loss": 0.2465,
+      "step": 1745
+    },
+    {
+      "epoch": 11.076190476190476,
+      "grad_norm": 1.626888632774353,
+      "learning_rate": 3.9584994202936746e-05,
+      "loss": 0.2688,
+      "step": 1750
+    },
+    {
+      "epoch": 11.107936507936508,
+      "grad_norm": 1.7132880687713623,
+      "learning_rate": 3.950965376748689e-05,
+      "loss": 0.2458,
+      "step": 1755
+    },
+    {
+      "epoch": 11.13968253968254,
+      "grad_norm": 1.7764930725097656,
+      "learning_rate": 3.94341140952758e-05,
+      "loss": 0.2189,
+      "step": 1760
+    },
+    {
+      "epoch": 11.17142857142857,
+      "grad_norm": 2.5560712814331055,
+      "learning_rate": 3.9358376223563206e-05,
+      "loss": 0.2866,
+      "step": 1765
+    },
+    {
+      "epoch": 11.203174603174602,
+      "grad_norm": 1.1177359819412231,
+      "learning_rate": 3.928244119233038e-05,
+      "loss": 0.233,
+      "step": 1770
+    },
+    {
+      "epoch": 11.234920634920634,
+      "grad_norm": 1.584670901298523,
+      "learning_rate": 3.9206310044265866e-05,
+      "loss": 0.273,
+      "step": 1775
+    },
+    {
+      "epoch": 11.266666666666667,
+      "grad_norm": 1.6278687715530396,
+      "learning_rate": 3.912998382475115e-05,
+      "loss": 0.2746,
+      "step": 1780
+    },
+    {
+      "epoch": 11.2984126984127,
+      "grad_norm": 1.657038688659668,
+      "learning_rate": 3.905346358184629e-05,
+      "loss": 0.2885,
+      "step": 1785
+    },
+    {
+      "epoch": 11.33015873015873,
+      "grad_norm": 1.2840272188186646,
+      "learning_rate": 3.897675036627557e-05,
+      "loss": 0.2932,
+      "step": 1790
+    },
+    {
+      "epoch": 11.361904761904762,
+      "grad_norm": 1.5678766965866089,
+      "learning_rate": 3.8899845231413026e-05,
+      "loss": 0.2945,
+      "step": 1795
+    },
+    {
+      "epoch": 11.393650793650794,
+      "grad_norm": 1.788948655128479,
+      "learning_rate": 3.8822749233268006e-05,
+      "loss": 0.3013,
+      "step": 1800
+    },
+    {
+      "epoch": 11.425396825396826,
+      "grad_norm": 1.2259769439697266,
+      "learning_rate": 3.8745463430470664e-05,
+      "loss": 0.2582,
+      "step": 1805
+    },
+    {
+      "epoch": 11.457142857142857,
+      "grad_norm": 1.5430735349655151,
+      "learning_rate": 3.866798888425741e-05,
+      "loss": 0.275,
+      "step": 1810
+    },
+    {
+      "epoch": 11.488888888888889,
+      "grad_norm": 1.9102168083190918,
+      "learning_rate": 3.8590326658456376e-05,
+      "loss": 0.2909,
+      "step": 1815
+    },
+    {
+      "epoch": 11.52063492063492,
+      "grad_norm": 1.6118320226669312,
+      "learning_rate": 3.851247781947277e-05,
+      "loss": 0.2922,
+      "step": 1820
+    },
+    {
+      "epoch": 11.552380952380952,
+      "grad_norm": 1.393646478652954,
+      "learning_rate": 3.843444343627424e-05,
+      "loss": 0.2783,
+      "step": 1825
+    },
+    {
+      "epoch": 11.584126984126984,
+      "grad_norm": 2.522909641265869,
+      "learning_rate": 3.83562245803762e-05,
+      "loss": 0.2933,
+      "step": 1830
+    },
+    {
+      "epoch": 11.615873015873015,
+      "grad_norm": 2.2534332275390625,
+      "learning_rate": 3.827782232582714e-05,
+      "loss": 0.3081,
+      "step": 1835
+    },
+    {
+      "epoch": 11.647619047619047,
+      "grad_norm": 2.4088056087493896,
+      "learning_rate": 3.819923774919383e-05,
+      "loss": 0.276,
+      "step": 1840
+    },
+    {
+      "epoch": 11.679365079365079,
+      "grad_norm": 1.7626562118530273,
+      "learning_rate": 3.8120471929546576e-05,
+      "loss": 0.2697,
+      "step": 1845
+    },
+    {
+      "epoch": 11.71111111111111,
+      "grad_norm": 1.8656691312789917,
+      "learning_rate": 3.8041525948444414e-05,
+      "loss": 0.2979,
+      "step": 1850
+    },
+    {
+      "epoch": 11.742857142857144,
+      "grad_norm": 1.6537258625030518,
+      "learning_rate": 3.7962400889920185e-05,
+      "loss": 0.3042,
+      "step": 1855
+    },
+    {
+      "epoch": 11.774603174603175,
+      "grad_norm": 1.696975827217102,
+      "learning_rate": 3.788309784046574e-05,
+      "loss": 0.2984,
+      "step": 1860
+    },
+    {
+      "epoch": 11.806349206349207,
+      "grad_norm": 1.976236343383789,
+      "learning_rate": 3.780361788901696e-05,
+      "loss": 0.2711,
+      "step": 1865
+    },
+    {
+      "epoch": 11.838095238095239,
+      "grad_norm": 1.7781676054000854,
+      "learning_rate": 3.772396212693885e-05,
+      "loss": 0.3116,
+      "step": 1870
+    },
+    {
+      "epoch": 11.86984126984127,
+      "grad_norm": 1.6252080202102661,
+      "learning_rate": 3.7644131648010494e-05,
+      "loss": 0.2879,
+      "step": 1875
+    },
+    {
+      "epoch": 11.901587301587302,
+      "grad_norm": 1.7511012554168701,
+      "learning_rate": 3.75641275484101e-05,
+      "loss": 0.3106,
+      "step": 1880
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 4710,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 30,
+  "save_steps": 157,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.046991654073139e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
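
Reviewer note on the trainer state above: the file records `max_steps`, `num_train_epochs` and `save_steps`, but not the scheduler settings. The sketch below checks two things against the logged values: that `save_steps` equals `max_steps / num_train_epochs` (one checkpoint per epoch's worth of optimizer steps), and that the logged learning rates are consistent with linear warmup followed by cosine decay. The peak rate `PEAK_LR = 5e-5` and `WARMUP_STEPS = 471` are assumptions inferred from the logged numbers, not values stated in this commit, and `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from checkpoint-1884/trainer_state.json.
MAX_STEPS = 4710
NUM_TRAIN_EPOCHS = 30
SAVE_STEPS = 157

# ASSUMPTIONS (inferred from log_history, not stated in the file):
# a peak learning rate of 5e-5 and a warmup of 10% of max_steps.
PEAK_LR = 5e-5
WARMUP_STEPS = 471

# Optimizer steps per epoch implied by the trainer settings.
steps_per_epoch = MAX_STEPS // NUM_TRAIN_EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, logged learning_rate) pairs taken from log_history.
logged = [
    (5, 5.307855626326963e-07),
    (1420, 4.4067487158410396e-05),
    (1880, 3.75641275484101e-05),
]

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch, "save_steps:", SAVE_STEPS)
    for step, lr in logged:
        print(step, "logged:", lr, "reconstructed:", lr_at(step))
```

If the reconstructed values track the logged ones, the run is most likely using a cosine schedule with warmup; a mismatch would point to a different scheduler or different warmup settings.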

checkpoint-1884/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-2041/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0
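
Reviewer note: the README's "How to Get Started with the Model" section is still the unfilled template. A minimal sketch of how a LoRA checkpoint like this one is usually loaded with `transformers` and `peft` follows. It assumes the checkpoint directory also contains the adapter weights (they are not among the files shown in this view), that `accelerate` is installed for `device_map="auto"`, and that enough memory is available for the 14B base model. The function names `load_adapter` and `generate` are hypothetical helpers, not part of this repository.

```python
def load_adapter(adapter_dir="checkpoint-2041",
                 base_id="Qwen/Qwen2.5-Coder-14B-Instruct"):
    """Load the base model and attach the LoRA adapter from adapter_dir."""
    # Imported lazily so this file can be imported without the heavy
    # dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The checkpoint directory ships its own tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt, max_new_tokens=256):
    """Run one chat turn through the tokenizer's chat template."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Typical usage would be `model, tok = load_adapter()` followed by `generate(model, tok, "Write a function that reverses a string.")`; which checkpoint performs best is not something this commit records (`best_model_checkpoint` is `null`).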

checkpoint-2041/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
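
Reviewer note on the adapter config above: with `use_rslora` set to `false`, standard LoRA scales the low-rank update by `lora_alpha / r`; rank-stabilized LoRA would use `lora_alpha / sqrt(r)` instead. The sketch below works out that factor for this config and the number of trainable parameters a rank-`r` adapter adds to one linear layer. The helper names and the 1024 x 1024 layer size are illustrative assumptions, not the actual Qwen2.5 dimensions.

```python
import math

# Fields copied from checkpoint-2041/adapter_config.json.
adapter_config = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "use_rslora": False,
    "target_modules": [
        "k_proj", "o_proj", "q_proj", "up_proj",
        "v_proj", "down_proj", "gate_proj",
    ],
}


def lora_scaling(cfg: dict) -> float:
    """Scaling applied to the low-rank update B @ A."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    if cfg.get("use_rslora"):
        return alpha / math.sqrt(r)  # rank-stabilized variant
    return alpha / r  # standard LoRA


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(adapter_config))
    # Hypothetical layer size, for illustration only.
    print("params for a 1024x1024 layer:", lora_params(1024, 1024, adapter_config["r"]))
```

Because all seven attention and MLP projections are targeted, the adapter touches every linear layer in each transformer block, which is a common choice when the fine-tuning data differs noticeably from the base model's.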

checkpoint-2041/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

checkpoint-2041/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-2041/special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

checkpoint-2041/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

checkpoint-2041/trainer_state.json ADDED

	@@ -0,0 +1,2890 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 12.920634920634921,
+  "eval_steps": 500,
+  "global_step": 2041,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.031746031746031744,
+      "grad_norm": 0.5545095205307007,
+      "learning_rate": 5.307855626326963e-07,
+      "loss": 3.7162,
+      "step": 5
+    },
+    {
+      "epoch": 0.06349206349206349,
+      "grad_norm": 0.6163601279258728,
+      "learning_rate": 1.0615711252653927e-06,
+      "loss": 3.9388,
+      "step": 10
+    },
+    {
+      "epoch": 0.09523809523809523,
+      "grad_norm": 0.5541530847549438,
+      "learning_rate": 1.5923566878980892e-06,
+      "loss": 3.9165,
+      "step": 15
+    },
+    {
+      "epoch": 0.12698412698412698,
+      "grad_norm": 0.457332044839859,
+      "learning_rate": 2.1231422505307854e-06,
+      "loss": 3.7326,
+      "step": 20
+    },
+    {
+      "epoch": 0.15873015873015872,
+      "grad_norm": 0.5335279107093811,
+      "learning_rate": 2.653927813163482e-06,
+      "loss": 3.8251,
+      "step": 25
+    },
+    {
+      "epoch": 0.19047619047619047,
+      "grad_norm": 0.7080379724502563,
+      "learning_rate": 3.1847133757961785e-06,
+      "loss": 3.7534,
+      "step": 30
+    },
+    {
+      "epoch": 0.2222222222222222,
+      "grad_norm": 0.520993709564209,
+      "learning_rate": 3.715498938428875e-06,
+      "loss": 3.898,
+      "step": 35
+    },
+    {
+      "epoch": 0.25396825396825395,
+      "grad_norm": 0.5451405644416809,
+      "learning_rate": 4.246284501061571e-06,
+      "loss": 3.8951,
+      "step": 40
+    },
+    {
+      "epoch": 0.2857142857142857,
+      "grad_norm": 0.6205154657363892,
+      "learning_rate": 4.777070063694268e-06,
+      "loss": 3.7666,
+      "step": 45
+    },
+    {
+      "epoch": 0.31746031746031744,
+      "grad_norm": 0.7404439449310303,
+      "learning_rate": 5.307855626326964e-06,
+      "loss": 4.0258,
+      "step": 50
+    },
+    {
+      "epoch": 0.3492063492063492,
+      "grad_norm": 0.6272220015525818,
+      "learning_rate": 5.838641188959661e-06,
+      "loss": 3.8464,
+      "step": 55
+    },
+    {
+      "epoch": 0.38095238095238093,
+      "grad_norm": 0.7744691967964172,
+      "learning_rate": 6.369426751592357e-06,
+      "loss": 3.7299,
+      "step": 60
+    },
+    {
+      "epoch": 0.4126984126984127,
+      "grad_norm": 0.8805738687515259,
+      "learning_rate": 6.900212314225053e-06,
+      "loss": 3.5008,
+      "step": 65
+    },
+    {
+      "epoch": 0.4444444444444444,
+      "grad_norm": 1.0740723609924316,
+      "learning_rate": 7.43099787685775e-06,
+      "loss": 3.7552,
+      "step": 70
+    },
+    {
+      "epoch": 0.47619047619047616,
+      "grad_norm": 0.965708315372467,
+      "learning_rate": 7.961783439490445e-06,
+      "loss": 3.5516,
+      "step": 75
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "grad_norm": 0.9812778234481812,
+      "learning_rate": 8.492569002123141e-06,
+      "loss": 3.6003,
+      "step": 80
+    },
+    {
+      "epoch": 0.5396825396825397,
+      "grad_norm": 0.8831024169921875,
+      "learning_rate": 9.023354564755838e-06,
+      "loss": 3.613,
+      "step": 85
+    },
+    {
+      "epoch": 0.5714285714285714,
+      "grad_norm": 0.8358364105224609,
+      "learning_rate": 9.554140127388536e-06,
+      "loss": 3.1858,
+      "step": 90
+    },
+    {
+      "epoch": 0.6031746031746031,
+      "grad_norm": 1.0740444660186768,
+      "learning_rate": 1.0084925690021232e-05,
+      "loss": 3.0937,
+      "step": 95
+    },
+    {
+      "epoch": 0.6349206349206349,
+      "grad_norm": 1.0987530946731567,
+      "learning_rate": 1.0615711252653929e-05,
+      "loss": 3.154,
+      "step": 100
+    },
+    {
+      "epoch": 0.6666666666666666,
+      "grad_norm": 1.2300925254821777,
+      "learning_rate": 1.1146496815286625e-05,
+      "loss": 2.9414,
+      "step": 105
+    },
+    {
+      "epoch": 0.6984126984126984,
+      "grad_norm": 1.2214170694351196,
+      "learning_rate": 1.1677282377919321e-05,
+      "loss": 2.9464,
+      "step": 110
+    },
+    {
+      "epoch": 0.7301587301587301,
+      "grad_norm": 1.2803975343704224,
+      "learning_rate": 1.2208067940552018e-05,
+      "loss": 2.8921,
+      "step": 115
+    },
+    {
+      "epoch": 0.7619047619047619,
+      "grad_norm": 1.2232719659805298,
+      "learning_rate": 1.2738853503184714e-05,
+      "loss": 2.5252,
+      "step": 120
+    },
+    {
+      "epoch": 0.7936507936507936,
+      "grad_norm": 1.204835295677185,
+      "learning_rate": 1.326963906581741e-05,
+      "loss": 2.5215,
+      "step": 125
+    },
+    {
+      "epoch": 0.8253968253968254,
+      "grad_norm": 1.4095579385757446,
+      "learning_rate": 1.3800424628450107e-05,
+      "loss": 2.136,
+      "step": 130
+    },
+    {
+      "epoch": 0.8571428571428571,
+      "grad_norm": 1.4166598320007324,
+      "learning_rate": 1.4331210191082803e-05,
+      "loss": 2.2653,
+      "step": 135
+    },
+    {
+      "epoch": 0.8888888888888888,
+      "grad_norm": 1.3040446043014526,
+      "learning_rate": 1.48619957537155e-05,
+      "loss": 2.0193,
+      "step": 140
+    },
+    {
+      "epoch": 0.9206349206349206,
+      "grad_norm": 1.4114688634872437,
+      "learning_rate": 1.5392781316348196e-05,
+      "loss": 1.7935,
+      "step": 145
+    },
+    {
+      "epoch": 0.9523809523809523,
+      "grad_norm": 1.8066726922988892,
+      "learning_rate": 1.592356687898089e-05,
+      "loss": 1.5731,
+      "step": 150
+    },
+    {
+      "epoch": 0.9841269841269841,
+      "grad_norm": 1.4303158521652222,
+      "learning_rate": 1.6454352441613588e-05,
+      "loss": 1.6552,
+      "step": 155
+    },
+    {
+      "epoch": 1.0126984126984127,
+      "grad_norm": 1.6671762466430664,
+      "learning_rate": 1.6985138004246283e-05,
+      "loss": 1.6973,
+      "step": 160
+    },
+    {
+      "epoch": 1.0444444444444445,
+      "grad_norm": 1.5719650983810425,
+      "learning_rate": 1.751592356687898e-05,
+      "loss": 1.312,
+      "step": 165
+    },
+    {
+      "epoch": 1.0761904761904761,
+      "grad_norm": 1.4845054149627686,
+      "learning_rate": 1.8046709129511676e-05,
+      "loss": 1.3601,
+      "step": 170
+    },
+    {
+      "epoch": 1.107936507936508,
+      "grad_norm": 1.1172235012054443,
+      "learning_rate": 1.8577494692144374e-05,
+      "loss": 1.3137,
+      "step": 175
+    },
+    {
+      "epoch": 1.1396825396825396,
+      "grad_norm": 1.9621731042861938,
+      "learning_rate": 1.910828025477707e-05,
+      "loss": 1.1778,
+      "step": 180
+    },
+    {
+      "epoch": 1.1714285714285715,
+      "grad_norm": 1.7722721099853516,
+      "learning_rate": 1.963906581740977e-05,
+      "loss": 1.4534,
+      "step": 185
+    },
+    {
+      "epoch": 1.2031746031746031,
+      "grad_norm": 1.3677467107772827,
+      "learning_rate": 2.0169851380042464e-05,
+      "loss": 1.3356,
+      "step": 190
+    },
+    {
+      "epoch": 1.234920634920635,
+      "grad_norm": 1.3260482549667358,
+      "learning_rate": 2.0700636942675162e-05,
+      "loss": 1.0876,
+      "step": 195
+    },
+    {
+      "epoch": 1.2666666666666666,
+      "grad_norm": 1.5176818370819092,
+      "learning_rate": 2.1231422505307857e-05,
+      "loss": 1.1602,
+      "step": 200
+    },
+    {
+      "epoch": 1.2984126984126985,
+      "grad_norm": 1.2793077230453491,
+      "learning_rate": 2.1762208067940555e-05,
+      "loss": 1.1505,
+      "step": 205
+    },
+    {
+      "epoch": 1.33015873015873,
+      "grad_norm": 1.196784257888794,
+      "learning_rate": 2.229299363057325e-05,
+      "loss": 1.0664,
+      "step": 210
+    },
+    {
+      "epoch": 1.361904761904762,
+      "grad_norm": 1.303207516670227,
+      "learning_rate": 2.2823779193205948e-05,
+      "loss": 1.2557,
+      "step": 215
+    },
+    {
+      "epoch": 1.3936507936507936,
+      "grad_norm": 1.2853388786315918,
+      "learning_rate": 2.3354564755838642e-05,
+      "loss": 1.0704,
+      "step": 220
+    },
+    {
+      "epoch": 1.4253968253968254,
+      "grad_norm": 1.381369948387146,
+      "learning_rate": 2.388535031847134e-05,
+      "loss": 1.1371,
+      "step": 225
+    },
+    {
+      "epoch": 1.457142857142857,
+      "grad_norm": 1.8012712001800537,
+      "learning_rate": 2.4416135881104035e-05,
+      "loss": 1.248,
+      "step": 230
+    },
+    {
+      "epoch": 1.488888888888889,
+      "grad_norm": 1.7397032976150513,
+      "learning_rate": 2.4946921443736733e-05,
+      "loss": 1.2782,
+      "step": 235
+    },
+    {
+      "epoch": 1.5206349206349206,
+      "grad_norm": 1.4026210308074951,
+      "learning_rate": 2.5477707006369428e-05,
+      "loss": 1.154,
+      "step": 240
+    },
+    {
+      "epoch": 1.5523809523809524,
+      "grad_norm": 1.2906067371368408,
+      "learning_rate": 2.6008492569002126e-05,
+      "loss": 0.9141,
+      "step": 245
+    },
+    {
+      "epoch": 1.5841269841269843,
+      "grad_norm": 1.265598177909851,
+      "learning_rate": 2.653927813163482e-05,
+      "loss": 1.0625,
+      "step": 250
+    },
+    {
+      "epoch": 1.615873015873016,
+      "grad_norm": 1.6044715642929077,
+      "learning_rate": 2.707006369426752e-05,
+      "loss": 0.9624,
+      "step": 255
+    },
+    {
+      "epoch": 1.6476190476190475,
+      "grad_norm": 1.4612747430801392,
+      "learning_rate": 2.7600849256900213e-05,
+      "loss": 1.0413,
+      "step": 260
+    },
+    {
+      "epoch": 1.6793650793650794,
+      "grad_norm": 1.6222745180130005,
+      "learning_rate": 2.8131634819532908e-05,
+      "loss": 1.0929,
+      "step": 265
+    },
+    {
+      "epoch": 1.7111111111111112,
+      "grad_norm": 1.1456222534179688,
+      "learning_rate": 2.8662420382165606e-05,
+      "loss": 0.9957,
+      "step": 270
+    },
+    {
+      "epoch": 1.7428571428571429,
+      "grad_norm": 1.5746041536331177,
+      "learning_rate": 2.91932059447983e-05,
+      "loss": 1.0274,
+      "step": 275
+    },
+    {
+      "epoch": 1.7746031746031745,
+      "grad_norm": 1.3407832384109497,
+      "learning_rate": 2.9723991507431e-05,
+      "loss": 0.9487,
+      "step": 280
+    },
+    {
+      "epoch": 1.8063492063492064,
+      "grad_norm": 1.6232194900512695,
+      "learning_rate": 3.0254777070063693e-05,
+      "loss": 1.0966,
+      "step": 285
+    },
+    {
+      "epoch": 1.8380952380952382,
+      "grad_norm": 1.4920552968978882,
+      "learning_rate": 3.078556263269639e-05,
+      "loss": 0.9099,
+      "step": 290
+    },
+    {
+      "epoch": 1.8698412698412699,
+      "grad_norm": 1.2123301029205322,
+      "learning_rate": 3.1316348195329086e-05,
+      "loss": 1.0902,
+      "step": 295
+    },
+    {
+      "epoch": 1.9015873015873015,
+      "grad_norm": 1.2080968618392944,
+      "learning_rate": 3.184713375796178e-05,
+      "loss": 0.943,
+      "step": 300
+    },
+    {
+      "epoch": 1.9333333333333333,
+      "grad_norm": 1.190319299697876,
+      "learning_rate": 3.237791932059448e-05,
+      "loss": 0.7893,
+      "step": 305
+    },
+    {
+      "epoch": 1.9650793650793652,
+      "grad_norm": 1.5929204225540161,
+      "learning_rate": 3.2908704883227177e-05,
+      "loss": 1.0232,
+      "step": 310
+    },
+    {
+      "epoch": 1.9968253968253968,
+      "grad_norm": 1.0138347148895264,
+      "learning_rate": 3.343949044585987e-05,
+      "loss": 0.6693,
+      "step": 315
+    },
+    {
+      "epoch": 2.0253968253968253,
+      "grad_norm": 1.3012847900390625,
+      "learning_rate": 3.3970276008492566e-05,
+      "loss": 0.8355,
+      "step": 320
+    },
+    {
+      "epoch": 2.057142857142857,
+      "grad_norm": 1.2264782190322876,
+      "learning_rate": 3.450106157112527e-05,
+      "loss": 0.9872,
+      "step": 325
+    },
+    {
+      "epoch": 2.088888888888889,
+      "grad_norm": 1.139275312423706,
+      "learning_rate": 3.503184713375796e-05,
+      "loss": 0.8662,
+      "step": 330
+    },
+    {
+      "epoch": 2.1206349206349207,
+      "grad_norm": 1.3836581707000732,
+      "learning_rate": 3.5562632696390657e-05,
+      "loss": 0.9549,
+      "step": 335
+    },
+    {
+      "epoch": 2.1523809523809523,
+      "grad_norm": 1.368600845336914,
+      "learning_rate": 3.609341825902335e-05,
+      "loss": 0.9195,
+      "step": 340
+    },
+    {
+      "epoch": 2.1841269841269844,
+      "grad_norm": 1.8793011903762817,
+      "learning_rate": 3.662420382165605e-05,
+      "loss": 0.8505,
+      "step": 345
+    },
+    {
+      "epoch": 2.215873015873016,
+      "grad_norm": 1.305284023284912,
+      "learning_rate": 3.715498938428875e-05,
+      "loss": 0.7755,
+      "step": 350
+    },
+    {
+      "epoch": 2.2476190476190476,
+      "grad_norm": 1.7851749658584595,
+      "learning_rate": 3.768577494692145e-05,
+      "loss": 0.9242,
+      "step": 355
+    },
+    {
+      "epoch": 2.2793650793650793,
+      "grad_norm": 1.4341535568237305,
+      "learning_rate": 3.821656050955414e-05,
+      "loss": 0.8221,
+      "step": 360
+    },
+    {
+      "epoch": 2.311111111111111,
+      "grad_norm": 1.39107346534729,
+      "learning_rate": 3.874734607218684e-05,
+      "loss": 0.6999,
+      "step": 365
+    },
+    {
+      "epoch": 2.342857142857143,
+      "grad_norm": 1.2304264307022095,
+      "learning_rate": 3.927813163481954e-05,
+      "loss": 0.8362,
+      "step": 370
+    },
+    {
+      "epoch": 2.3746031746031746,
+      "grad_norm": 1.8470840454101562,
+      "learning_rate": 3.9808917197452234e-05,
+      "loss": 0.9398,
+      "step": 375
+    },
+    {
+      "epoch": 2.4063492063492062,
+      "grad_norm": 1.2533882856369019,
+      "learning_rate": 4.033970276008493e-05,
+      "loss": 0.7754,
+      "step": 380
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 1.5335006713867188,
+      "learning_rate": 4.087048832271762e-05,
+      "loss": 1.1124,
+      "step": 385
+    },
+    {
+      "epoch": 2.46984126984127,
+      "grad_norm": 1.5298357009887695,
+      "learning_rate": 4.1401273885350325e-05,
+      "loss": 1.017,
+      "step": 390
+    },
+    {
+      "epoch": 2.5015873015873016,
+      "grad_norm": 1.4403260946273804,
+      "learning_rate": 4.193205944798302e-05,
+      "loss": 0.8831,
+      "step": 395
+    },
+    {
+      "epoch": 2.533333333333333,
+      "grad_norm": 1.1528433561325073,
+      "learning_rate": 4.2462845010615714e-05,
+      "loss": 0.801,
+      "step": 400
+    },
+    {
+      "epoch": 2.565079365079365,
+      "grad_norm": 1.3371326923370361,
+      "learning_rate": 4.299363057324841e-05,
+      "loss": 0.8692,
+      "step": 405
+    },
+    {
+      "epoch": 2.596825396825397,
+      "grad_norm": 1.4064775705337524,
+      "learning_rate": 4.352441613588111e-05,
+      "loss": 0.9059,
+      "step": 410
+    },
+    {
+      "epoch": 2.6285714285714286,
+      "grad_norm": 1.4531422853469849,
+      "learning_rate": 4.4055201698513805e-05,
+      "loss": 0.7344,
+      "step": 415
+    },
+    {
+      "epoch": 2.66031746031746,
+      "grad_norm": 1.7043890953063965,
+      "learning_rate": 4.45859872611465e-05,
+      "loss": 0.8298,
+      "step": 420
+    },
+    {
+      "epoch": 2.6920634920634923,
+      "grad_norm": 1.5105586051940918,
+      "learning_rate": 4.5116772823779194e-05,
+      "loss": 0.7768,
+      "step": 425
+    },
+    {
+      "epoch": 2.723809523809524,
+      "grad_norm": 1.8101528882980347,
+      "learning_rate": 4.5647558386411895e-05,
+      "loss": 0.733,
+      "step": 430
+    },
+    {
+      "epoch": 2.7555555555555555,
+      "grad_norm": 1.6365174055099487,
+      "learning_rate": 4.617834394904459e-05,
+      "loss": 0.8061,
+      "step": 435
+    },
+    {
+      "epoch": 2.787301587301587,
+      "grad_norm": 1.7808202505111694,
+      "learning_rate": 4.6709129511677285e-05,
+      "loss": 0.8333,
+      "step": 440
+    },
+    {
+      "epoch": 2.819047619047619,
+      "grad_norm": 1.5223265886306763,
+      "learning_rate": 4.723991507430998e-05,
+      "loss": 0.7557,
+      "step": 445
+    },
+    {
+      "epoch": 2.850793650793651,
+      "grad_norm": 1.3064416646957397,
+      "learning_rate": 4.777070063694268e-05,
+      "loss": 0.8041,
+      "step": 450
+    },
+    {
+      "epoch": 2.8825396825396825,
+      "grad_norm": 1.8025637865066528,
+      "learning_rate": 4.8301486199575375e-05,
+      "loss": 0.9534,
+      "step": 455
+    },
+    {
+      "epoch": 2.914285714285714,
+      "grad_norm": 1.924846887588501,
+      "learning_rate": 4.883227176220807e-05,
+      "loss": 0.9066,
+      "step": 460
+    },
+    {
+      "epoch": 2.9460317460317462,
+      "grad_norm": 1.9862899780273438,
+      "learning_rate": 4.9363057324840765e-05,
+      "loss": 0.7994,
+      "step": 465
+    },
+    {
+      "epoch": 2.977777777777778,
+      "grad_norm": 1.9615916013717651,
+      "learning_rate": 4.9893842887473466e-05,
+      "loss": 0.7045,
+      "step": 470
+    },
+    {
+      "epoch": 3.0063492063492063,
+      "grad_norm": 1.519852876663208,
+      "learning_rate": 4.999989014936042e-05,
+      "loss": 0.7212,
+      "step": 475
+    },
+    {
+      "epoch": 3.038095238095238,
+      "grad_norm": 1.9328887462615967,
+      "learning_rate": 4.999944388279162e-05,
+      "loss": 0.6598,
+      "step": 480
+    },
+    {
+      "epoch": 3.06984126984127,
+      "grad_norm": 2.0340709686279297,
+      "learning_rate": 4.999865434075176e-05,
+      "loss": 0.6829,
+      "step": 485
+    },
+    {
+      "epoch": 3.1015873015873017,
+      "grad_norm": 1.8775280714035034,
+      "learning_rate": 4.999752153408229e-05,
+      "loss": 0.6664,
+      "step": 490
+    },
+    {
+      "epoch": 3.1333333333333333,
+      "grad_norm": 2.385218381881714,
+      "learning_rate": 4.999604547833814e-05,
+      "loss": 0.6836,
+      "step": 495
+    },
+    {
+      "epoch": 3.165079365079365,
+      "grad_norm": 2.1743783950805664,
+      "learning_rate": 4.999422619378752e-05,
+      "loss": 0.7,
+      "step": 500
+    },
+    {
+      "epoch": 3.196825396825397,
+      "grad_norm": 2.20786452293396,
+      "learning_rate": 4.999206370541162e-05,
+      "loss": 0.7253,
+      "step": 505
+    },
+    {
+      "epoch": 3.2285714285714286,
+      "grad_norm": 1.8182263374328613,
+      "learning_rate": 4.998955804290425e-05,
+      "loss": 0.6824,
+      "step": 510
+    },
+    {
+      "epoch": 3.2603174603174603,
+      "grad_norm": 2.2959372997283936,
+      "learning_rate": 4.9986709240671495e-05,
+      "loss": 0.601,
+      "step": 515
+    },
+    {
+      "epoch": 3.292063492063492,
+      "grad_norm": 2.385838031768799,
+      "learning_rate": 4.998351733783116e-05,
+      "loss": 0.7417,
+      "step": 520
+    },
+    {
+      "epoch": 3.323809523809524,
+      "grad_norm": 2.0416879653930664,
+      "learning_rate": 4.997998237821233e-05,
+      "loss": 0.6463,
+      "step": 525
+    },
+    {
+      "epoch": 3.3555555555555556,
+      "grad_norm": 2.2781031131744385,
+      "learning_rate": 4.9976104410354654e-05,
+      "loss": 0.6998,
+      "step": 530
+    },
+    {
+      "epoch": 3.3873015873015873,
+      "grad_norm": 2.146778106689453,
+      "learning_rate": 4.9971883487507775e-05,
+      "loss": 0.7694,
+      "step": 535
+    },
+    {
+      "epoch": 3.419047619047619,
+      "grad_norm": 2.1369104385375977,
+      "learning_rate": 4.9967319667630567e-05,
+      "loss": 0.6615,
+      "step": 540
+    },
+    {
+      "epoch": 3.450793650793651,
+      "grad_norm": 2.4529733657836914,
+      "learning_rate": 4.996241301339029e-05,
+      "loss": 0.6109,
+      "step": 545
+    },
+    {
+      "epoch": 3.4825396825396826,
+      "grad_norm": 2.07030987739563,
+      "learning_rate": 4.995716359216183e-05,
+      "loss": 0.7611,
+      "step": 550
+    },
+    {
+      "epoch": 3.5142857142857142,
+      "grad_norm": 2.4329919815063477,
+      "learning_rate": 4.995157147602669e-05,
+      "loss": 0.7515,
+      "step": 555
+    },
+    {
+      "epoch": 3.546031746031746,
+      "grad_norm": 2.056351900100708,
+      "learning_rate": 4.994563674177202e-05,
+      "loss": 0.6885,
+      "step": 560
+    },
+    {
+      "epoch": 3.5777777777777775,
+      "grad_norm": 2.3665318489074707,
+      "learning_rate": 4.993935947088958e-05,
+      "loss": 0.6271,
+      "step": 565
+    },
+    {
+      "epoch": 3.6095238095238096,
+      "grad_norm": 2.677706480026245,
+      "learning_rate": 4.993273974957463e-05,
+      "loss": 0.5586,
+      "step": 570
+    },
+    {
+      "epoch": 3.641269841269841,
+      "grad_norm": 3.422136068344116,
+      "learning_rate": 4.9925777668724685e-05,
+      "loss": 0.7552,
+      "step": 575
+    },
+    {
+      "epoch": 3.6730158730158733,
+      "grad_norm": 2.4525184631347656,
+      "learning_rate": 4.991847332393835e-05,
+      "loss": 0.7367,
+      "step": 580
+    },
+    {
+      "epoch": 3.704761904761905,
+      "grad_norm": 2.4242067337036133,
+      "learning_rate": 4.991082681551396e-05,
+      "loss": 0.7044,
+      "step": 585
+    },
+    {
+      "epoch": 3.7365079365079366,
+      "grad_norm": 1.8419867753982544,
+      "learning_rate": 4.9902838248448184e-05,
+      "loss": 0.5966,
+      "step": 590
+    },
+    {
+      "epoch": 3.768253968253968,
+      "grad_norm": 2.1394360065460205,
+      "learning_rate": 4.989450773243463e-05,
+      "loss": 0.6736,
+      "step": 595
+    },
+    {
+      "epoch": 3.8,
+      "grad_norm": 1.285447597503662,
+      "learning_rate": 4.9885835381862326e-05,
+      "loss": 0.5021,
+      "step": 600
+    },
+    {
+      "epoch": 3.831746031746032,
+      "grad_norm": 2.724978446960449,
+      "learning_rate": 4.987682131581413e-05,
+      "loss": 0.6128,
+      "step": 605
+    },
+    {
+      "epoch": 3.8634920634920635,
+      "grad_norm": 2.239682912826538,
+      "learning_rate": 4.986746565806508e-05,
+      "loss": 0.5457,
+      "step": 610
+    },
+    {
+      "epoch": 3.895238095238095,
+      "grad_norm": 2.48944091796875,
+      "learning_rate": 4.9857768537080784e-05,
+      "loss": 0.6927,
+      "step": 615
+    },
+    {
+      "epoch": 3.9269841269841272,
+      "grad_norm": 2.4086852073669434,
+      "learning_rate": 4.9847730086015534e-05,
+      "loss": 0.5963,
+      "step": 620
+    },
+    {
+      "epoch": 3.958730158730159,
+      "grad_norm": 2.0070106983184814,
+      "learning_rate": 4.9837350442710553e-05,
+      "loss": 0.5856,
+      "step": 625
+    },
+    {
+      "epoch": 3.9904761904761905,
+      "grad_norm": 1.9726545810699463,
+      "learning_rate": 4.98266297496921e-05,
+      "loss": 0.6208,
+      "step": 630
+    },
+    {
+      "epoch": 4.019047619047619,
+      "grad_norm": 2.6137828826904297,
+      "learning_rate": 4.981556815416948e-05,
+      "loss": 0.6319,
+      "step": 635
+    },
+    {
+      "epoch": 4.050793650793651,
+      "grad_norm": 2.3489890098571777,
+      "learning_rate": 4.9804165808033054e-05,
+      "loss": 0.5887,
+      "step": 640
+    },
+    {
+      "epoch": 4.082539682539682,
+      "grad_norm": 2.8010590076446533,
+      "learning_rate": 4.979242286785214e-05,
+      "loss": 0.5257,
+      "step": 645
+    },
+    {
+      "epoch": 4.114285714285714,
+      "grad_norm": 2.993411064147949,
+      "learning_rate": 4.978033949487284e-05,
+      "loss": 0.4545,
+      "step": 650
+    },
+    {
+      "epoch": 4.146031746031746,
+      "grad_norm": 2.669935703277588,
+      "learning_rate": 4.976791585501588e-05,
+      "loss": 0.5989,
+      "step": 655
+    },
+    {
+      "epoch": 4.177777777777778,
+      "grad_norm": 3.084409236907959,
+      "learning_rate": 4.9755152118874294e-05,
+      "loss": 0.528,
+      "step": 660
+    },
+    {
+      "epoch": 4.20952380952381,
+      "grad_norm": 2.797873020172119,
+      "learning_rate": 4.974204846171106e-05,
+      "loss": 0.5249,
+      "step": 665
+    },
+    {
+      "epoch": 4.241269841269841,
+      "grad_norm": 3.667867422103882,
+      "learning_rate": 4.9728605063456765e-05,
+      "loss": 0.5838,
+      "step": 670
+    },
+    {
+      "epoch": 4.273015873015873,
+      "grad_norm": 2.6918869018554688,
+      "learning_rate": 4.971482210870706e-05,
+      "loss": 0.5143,
+      "step": 675
+    },
+    {
+      "epoch": 4.304761904761905,
+      "grad_norm": 2.1545379161834717,
+      "learning_rate": 4.970069978672017e-05,
+      "loss": 0.5317,
+      "step": 680
+    },
+    {
+      "epoch": 4.336507936507936,
+      "grad_norm": 2.1043529510498047,
+      "learning_rate": 4.9686238291414275e-05,
+      "loss": 0.4815,
+      "step": 685
+    },
+    {
+      "epoch": 4.368253968253969,
+      "grad_norm": 2.1359753608703613,
+      "learning_rate": 4.9671437821364855e-05,
+      "loss": 0.4935,
+      "step": 690
+    },
+    {
+      "epoch": 4.4,
+      "grad_norm": 3.092057228088379,
+      "learning_rate": 4.965629857980197e-05,
+      "loss": 0.6831,
+      "step": 695
+    },
+    {
+      "epoch": 4.431746031746032,
+      "grad_norm": 2.5296835899353027,
+      "learning_rate": 4.964082077460745e-05,
+      "loss": 0.5323,
+      "step": 700
+    },
+    {
+      "epoch": 4.463492063492064,
+      "grad_norm": 1.6655627489089966,
+      "learning_rate": 4.962500461831207e-05,
+      "loss": 0.4553,
+      "step": 705
+    },
+    {
+      "epoch": 4.495238095238095,
+      "grad_norm": 2.6663475036621094,
+      "learning_rate": 4.9608850328092576e-05,
+      "loss": 0.463,
+      "step": 710
+    },
+    {
+      "epoch": 4.526984126984127,
+      "grad_norm": 2.3763060569763184,
+      "learning_rate": 4.959235812576879e-05,
+      "loss": 0.4861,
+      "step": 715
+    },
+    {
+      "epoch": 4.5587301587301585,
+      "grad_norm": 2.2217962741851807,
+      "learning_rate": 4.957552823780047e-05,
+      "loss": 0.468,
+      "step": 720
+    },
+    {
+      "epoch": 4.59047619047619,
+      "grad_norm": 2.8885600566864014,
+      "learning_rate": 4.9558360895284295e-05,
+      "loss": 0.4588,
+      "step": 725
+    },
+    {
+      "epoch": 4.622222222222222,
+      "grad_norm": 2.5661261081695557,
+      "learning_rate": 4.954085633395058e-05,
+      "loss": 0.4926,
+      "step": 730
+    },
+    {
+      "epoch": 4.653968253968254,
+      "grad_norm": 2.304365396499634,
+      "learning_rate": 4.952301479416015e-05,
+      "loss": 0.494,
+      "step": 735
+    },
+    {
+      "epoch": 4.685714285714286,
+      "grad_norm": 2.690577983856201,
+      "learning_rate": 4.9504836520900976e-05,
+      "loss": 0.5814,
+      "step": 740
+    },
+    {
+      "epoch": 4.717460317460318,
+      "grad_norm": 2.7180025577545166,
+      "learning_rate": 4.948632176378481e-05,
+      "loss": 0.5329,
+      "step": 745
+    },
+    {
+      "epoch": 4.749206349206349,
+      "grad_norm": 2.716587543487549,
+      "learning_rate": 4.9467470777043806e-05,
+      "loss": 0.5264,
+      "step": 750
+    },
+    {
+      "epoch": 4.780952380952381,
+      "grad_norm": 2.315419912338257,
+      "learning_rate": 4.9448283819526954e-05,
+      "loss": 0.4756,
+      "step": 755
+    },
+    {
+      "epoch": 4.8126984126984125,
+      "grad_norm": 2.1679515838623047,
+      "learning_rate": 4.9428761154696605e-05,
+      "loss": 0.4819,
+      "step": 760
+    },
+    {
+      "epoch": 4.844444444444444,
+      "grad_norm": 3.389266014099121,
+      "learning_rate": 4.9408903050624796e-05,
+      "loss": 0.5121,
+      "step": 765
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 3.4317383766174316,
+      "learning_rate": 4.938870977998959e-05,
+      "loss": 0.4535,
+      "step": 770
+    },
+    {
+      "epoch": 4.907936507936508,
+      "grad_norm": 2.9491918087005615,
+      "learning_rate": 4.9368181620071344e-05,
+      "loss": 0.5333,
+      "step": 775
+    },
+    {
+      "epoch": 4.93968253968254,
+      "grad_norm": 2.516798496246338,
+      "learning_rate": 4.934731885274887e-05,
+      "loss": 0.5367,
+      "step": 780
+    },
+    {
+      "epoch": 4.9714285714285715,
+      "grad_norm": 3.0031046867370605,
+      "learning_rate": 4.9326121764495596e-05,
+      "loss": 0.4957,
+      "step": 785
+    },
+    {
+      "epoch": 5.0,
+      "grad_norm": 3.334085702896118,
+      "learning_rate": 4.9304590646375614e-05,
+      "loss": 0.5287,
+      "step": 790
+    },
+    {
+      "epoch": 5.031746031746032,
+      "grad_norm": 1.9608453512191772,
+      "learning_rate": 4.928272579403969e-05,
+      "loss": 0.36,
+      "step": 795
+    },
+    {
+      "epoch": 5.063492063492063,
+      "grad_norm": 2.328850746154785,
+      "learning_rate": 4.92605275077212e-05,
+      "loss": 0.3628,
+      "step": 800
+    },
+    {
+      "epoch": 5.095238095238095,
+      "grad_norm": 2.3446412086486816,
+      "learning_rate": 4.923799609223202e-05,
+      "loss": 0.3327,
+      "step": 805
+    },
+    {
+      "epoch": 5.1269841269841265,
+      "grad_norm": 2.476181745529175,
+      "learning_rate": 4.921513185695831e-05,
+      "loss": 0.4246,
+      "step": 810
+    },
+    {
+      "epoch": 5.158730158730159,
+      "grad_norm": 3.1026763916015625,
+      "learning_rate": 4.91919351158563e-05,
+      "loss": 0.5048,
+      "step": 815
+    },
+    {
+      "epoch": 5.190476190476191,
+      "grad_norm": 2.8165297508239746,
+      "learning_rate": 4.916840618744798e-05,
+      "loss": 0.4361,
+      "step": 820
+    },
+    {
+      "epoch": 5.222222222222222,
+      "grad_norm": 1.8732138872146606,
+      "learning_rate": 4.9144545394816687e-05,
+      "loss": 0.4693,
+      "step": 825
+    },
+    {
+      "epoch": 5.253968253968254,
+      "grad_norm": 1.7250264883041382,
+      "learning_rate": 4.91203530656027e-05,
+      "loss": 0.4076,
+      "step": 830
+    },
+    {
+      "epoch": 5.285714285714286,
+      "grad_norm": 2.105459690093994,
+      "learning_rate": 4.9095829531998725e-05,
+      "loss": 0.3589,
+      "step": 835
+    },
+    {
+      "epoch": 5.317460317460317,
+      "grad_norm": 3.6825687885284424,
+      "learning_rate": 4.9070975130745387e-05,
+      "loss": 0.5263,
+      "step": 840
+    },
+    {
+      "epoch": 5.349206349206349,
+      "grad_norm": 2.947052001953125,
+      "learning_rate": 4.90457902031265e-05,
+      "loss": 0.4632,
+      "step": 845
+    },
+    {
+      "epoch": 5.380952380952381,
+      "grad_norm": 1.9546104669570923,
+      "learning_rate": 4.902027509496448e-05,
+      "loss": 0.4348,
+      "step": 850
+    },
+    {
+      "epoch": 5.412698412698413,
+      "grad_norm": 2.4471983909606934,
+      "learning_rate": 4.899443015661557e-05,
+      "loss": 0.4209,
+      "step": 855
+    },
+    {
+      "epoch": 5.444444444444445,
+      "grad_norm": 1.827124834060669,
+      "learning_rate": 4.8968255742964975e-05,
+      "loss": 0.413,
+      "step": 860
+    },
+    {
+      "epoch": 5.476190476190476,
+      "grad_norm": 2.654707431793213,
+      "learning_rate": 4.894175221342207e-05,
+      "loss": 0.432,
+      "step": 865
+    },
+    {
+      "epoch": 5.507936507936508,
+      "grad_norm": 2.648967981338501,
+      "learning_rate": 4.8914919931915407e-05,
+      "loss": 0.4339,
+      "step": 870
+    },
+    {
+      "epoch": 5.5396825396825395,
+      "grad_norm": 2.874075412750244,
+      "learning_rate": 4.888775926688775e-05,
+      "loss": 0.4392,
+      "step": 875
+    },
+    {
+      "epoch": 5.571428571428571,
+      "grad_norm": 2.9674830436706543,
+      "learning_rate": 4.8860270591291e-05,
+      "loss": 0.4459,
+      "step": 880
+    },
+    {
+      "epoch": 5.603174603174603,
+      "grad_norm": 2.054748296737671,
+      "learning_rate": 4.883245428258107e-05,
+      "loss": 0.4313,
+      "step": 885
+    },
+    {
+      "epoch": 5.634920634920634,
+      "grad_norm": 1.9174392223358154,
+      "learning_rate": 4.880431072271272e-05,
+      "loss": 0.3906,
+      "step": 890
+    },
+    {
+      "epoch": 5.666666666666667,
+      "grad_norm": 2.5257787704467773,
+      "learning_rate": 4.87758402981343e-05,
+      "loss": 0.4219,
+      "step": 895
+    },
+    {
+      "epoch": 5.698412698412699,
+      "grad_norm": 2.6365532875061035,
+      "learning_rate": 4.8747043399782424e-05,
+      "loss": 0.3978,
+      "step": 900
+    },
+    {
+      "epoch": 5.73015873015873,
+      "grad_norm": 2.0583746433258057,
+      "learning_rate": 4.871792042307667e-05,
+      "loss": 0.4847,
+      "step": 905
+    },
+    {
+      "epoch": 5.761904761904762,
+      "grad_norm": 2.035872459411621,
+      "learning_rate": 4.868847176791406e-05,
+      "loss": 0.4675,
+      "step": 910
+    },
+    {
+      "epoch": 5.7936507936507935,
+      "grad_norm": 2.3722939491271973,
+      "learning_rate": 4.8658697838663625e-05,
+      "loss": 0.4586,
+      "step": 915
+    },
+    {
+      "epoch": 5.825396825396825,
+      "grad_norm": 1.2609732151031494,
+      "learning_rate": 4.862859904416085e-05,
+      "loss": 0.3274,
+      "step": 920
+    },
+    {
+      "epoch": 5.857142857142857,
+      "grad_norm": 2.3673977851867676,
+      "learning_rate": 4.8598175797702036e-05,
+      "loss": 0.4685,
+      "step": 925
+    },
+    {
+      "epoch": 5.888888888888889,
+      "grad_norm": 2.8414175510406494,
+      "learning_rate": 4.856742851703866e-05,
+      "loss": 0.4762,
+      "step": 930
+    },
+    {
+      "epoch": 5.920634920634921,
+      "grad_norm": 2.4126765727996826,
+      "learning_rate": 4.853635762437159e-05,
+      "loss": 0.4075,
+      "step": 935
+    },
+    {
+      "epoch": 5.9523809523809526,
+      "grad_norm": 1.8691045045852661,
+      "learning_rate": 4.8504963546345334e-05,
+      "loss": 0.4865,
+      "step": 940
+    },
+    {
+      "epoch": 5.984126984126984,
+      "grad_norm": 3.5297420024871826,
+      "learning_rate": 4.8473246714042155e-05,
+      "loss": 0.4623,
+      "step": 945
+    },
+    {
+      "epoch": 6.012698412698413,
+      "grad_norm": 2.059169054031372,
+      "learning_rate": 4.844120756297617e-05,
+      "loss": 0.4164,
+      "step": 950
+    },
+    {
+      "epoch": 6.044444444444444,
+      "grad_norm": 2.4746127128601074,
+      "learning_rate": 4.840884653308735e-05,
+      "loss": 0.3552,
+      "step": 955
+    },
+    {
+      "epoch": 6.076190476190476,
+      "grad_norm": 2.504425287246704,
+      "learning_rate": 4.8376164068735485e-05,
+      "loss": 0.3368,
+      "step": 960
+    },
+    {
+      "epoch": 6.1079365079365076,
+      "grad_norm": 2.062577486038208,
+      "learning_rate": 4.83431606186941e-05,
+      "loss": 0.3139,
+      "step": 965
+    },
+    {
+      "epoch": 6.13968253968254,
+      "grad_norm": 2.4934544563293457,
+      "learning_rate": 4.830983663614427e-05,
+      "loss": 0.3777,
+      "step": 970
+    },
+    {
+      "epoch": 6.171428571428572,
+      "grad_norm": 2.5747485160827637,
+      "learning_rate": 4.827619257866839e-05,
+      "loss": 0.373,
+      "step": 975
+    },
+    {
+      "epoch": 6.203174603174603,
+      "grad_norm": 2.449357271194458,
+      "learning_rate": 4.8242228908243946e-05,
+      "loss": 0.3936,
+      "step": 980
+    },
+    {
+      "epoch": 6.234920634920635,
+      "grad_norm": 2.952680826187134,
+      "learning_rate": 4.82079460912371e-05,
+      "loss": 0.407,
+      "step": 985
+    },
+    {
+      "epoch": 6.266666666666667,
+      "grad_norm": 2.1754496097564697,
+      "learning_rate": 4.817334459839633e-05,
+      "loss": 0.3189,
+      "step": 990
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 2.8406214714050293,
+      "learning_rate": 4.8138424904845947e-05,
+      "loss": 0.3883,
+      "step": 995
+    },
+    {
+      "epoch": 6.33015873015873,
+      "grad_norm": 1.7533257007598877,
+      "learning_rate": 4.8103187490079604e-05,
+      "loss": 0.3131,
+      "step": 1000
+    },
+    {
+      "epoch": 6.3619047619047615,
+      "grad_norm": 2.4574601650238037,
+      "learning_rate": 4.806763283795366e-05,
+      "loss": 0.3606,
+      "step": 1005
+    },
+    {
+      "epoch": 6.393650793650794,
+      "grad_norm": 2.002281427383423,
+      "learning_rate": 4.8031761436680575e-05,
+      "loss": 0.37,
+      "step": 1010
+    },
+    {
+      "epoch": 6.425396825396826,
+      "grad_norm": 2.823315143585205,
+      "learning_rate": 4.79955737788222e-05,
+      "loss": 0.3791,
+      "step": 1015
+    },
+    {
+      "epoch": 6.457142857142857,
+      "grad_norm": 2.7891204357147217,
+      "learning_rate": 4.795907036128299e-05,
+      "loss": 0.3556,
+      "step": 1020
+    },
+    {
+      "epoch": 6.488888888888889,
+      "grad_norm": 2.2387146949768066,
+      "learning_rate": 4.7922251685303213e-05,
+      "loss": 0.3929,
+      "step": 1025
+    },
+    {
+      "epoch": 6.520634920634921,
+      "grad_norm": 2.5023891925811768,
+      "learning_rate": 4.788511825645205e-05,
+      "loss": 0.379,
+      "step": 1030
+    },
+    {
+      "epoch": 6.552380952380952,
+      "grad_norm": 2.2654805183410645,
+      "learning_rate": 4.7847670584620653e-05,
+      "loss": 0.3435,
+      "step": 1035
+    },
+    {
+      "epoch": 6.584126984126984,
+      "grad_norm": 3.3823065757751465,
+      "learning_rate": 4.7809909184015146e-05,
+      "loss": 0.4109,
+      "step": 1040
+    },
+    {
+      "epoch": 6.6158730158730155,
+      "grad_norm": 2.6096551418304443,
+      "learning_rate": 4.7771834573149576e-05,
+      "loss": 0.4233,
+      "step": 1045
+    },
+    {
+      "epoch": 6.647619047619048,
+      "grad_norm": 2.3933897018432617,
+      "learning_rate": 4.773344727483876e-05,
+      "loss": 0.3709,
+      "step": 1050
+    },
+    {
+      "epoch": 6.67936507936508,
+      "grad_norm": 2.189544916152954,
+      "learning_rate": 4.769474781619114e-05,
+      "loss": 0.3287,
+      "step": 1055
+    },
+    {
+      "epoch": 6.711111111111111,
+      "grad_norm": 2.450892686843872,
+      "learning_rate": 4.765573672860154e-05,
+      "loss": 0.4022,
+      "step": 1060
+    },
+    {
+      "epoch": 6.742857142857143,
+      "grad_norm": 2.4342429637908936,
+      "learning_rate": 4.761641454774386e-05,
+      "loss": 0.4029,
+      "step": 1065
+    },
+    {
+      "epoch": 6.7746031746031745,
+      "grad_norm": 2.2122364044189453,
+      "learning_rate": 4.75767818135637e-05,
+      "loss": 0.3322,
+      "step": 1070
+    },
+    {
+      "epoch": 6.806349206349206,
+      "grad_norm": 3.968445301055908,
+      "learning_rate": 4.7536839070271e-05,
+      "loss": 0.3836,
+      "step": 1075
+    },
+    {
+      "epoch": 6.838095238095238,
+      "grad_norm": 3.529158353805542,
+      "learning_rate": 4.749658686633251e-05,
+      "loss": 0.4745,
+      "step": 1080
+    },
+    {
+      "epoch": 6.86984126984127,
+      "grad_norm": 2.430727243423462,
+      "learning_rate": 4.7456025754464304e-05,
+      "loss": 0.3664,
+      "step": 1085
+    },
+    {
+      "epoch": 6.901587301587302,
+      "grad_norm": 2.6552302837371826,
+      "learning_rate": 4.7415156291624166e-05,
+      "loss": 0.4359,
+      "step": 1090
+    },
+    {
+      "epoch": 6.933333333333334,
+      "grad_norm": 2.134822130203247,
+      "learning_rate": 4.737397903900393e-05,
+      "loss": 0.3969,
+      "step": 1095
+    },
+    {
+      "epoch": 6.965079365079365,
+      "grad_norm": 2.5052947998046875,
+      "learning_rate": 4.7332494562021815e-05,
+      "loss": 0.4069,
+      "step": 1100
+    },
+    {
+      "epoch": 6.996825396825397,
+      "grad_norm": 2.1377065181732178,
+      "learning_rate": 4.729070343031463e-05,
+      "loss": 0.3853,
+      "step": 1105
+    },
+    {
+      "epoch": 7.025396825396825,
+      "grad_norm": 1.9704042673110962,
+      "learning_rate": 4.724860621772995e-05,
+      "loss": 0.3283,
+      "step": 1110
+    },
+    {
+      "epoch": 7.057142857142857,
+      "grad_norm": 2.476968765258789,
+      "learning_rate": 4.7206203502318256e-05,
+      "loss": 0.3325,
+      "step": 1115
+    },
+    {
+      "epoch": 7.088888888888889,
+      "grad_norm": 1.9231969118118286,
+      "learning_rate": 4.716349586632499e-05,
+      "loss": 0.2876,
+      "step": 1120
+    },
+    {
+      "epoch": 7.12063492063492,
+      "grad_norm": 2.6444814205169678,
+      "learning_rate": 4.712048389618254e-05,
+      "loss": 0.3005,
+      "step": 1125
+    },
+    {
+      "epoch": 7.152380952380953,
+      "grad_norm": 3.2589964866638184,
+      "learning_rate": 4.7077168182502216e-05,
+      "loss": 0.4023,
+      "step": 1130
+    },
+    {
+      "epoch": 7.184126984126984,
+      "grad_norm": 2.5481936931610107,
+      "learning_rate": 4.703354932006615e-05,
+      "loss": 0.3302,
+      "step": 1135
+    },
+    {
+      "epoch": 7.215873015873016,
+      "grad_norm": 1.7125908136367798,
+      "learning_rate": 4.698962790781906e-05,
+      "loss": 0.3329,
+      "step": 1140
+    },
+    {
+      "epoch": 7.247619047619048,
+      "grad_norm": 2.2756667137145996,
+      "learning_rate": 4.6945404548860115e-05,
+      "loss": 0.3369,
+      "step": 1145
+    },
+    {
+      "epoch": 7.279365079365079,
+      "grad_norm": 2.9158453941345215,
+      "learning_rate": 4.6900879850434604e-05,
+      "loss": 0.3339,
+      "step": 1150
+    },
+    {
+      "epoch": 7.311111111111111,
+      "grad_norm": 2.3047537803649902,
+      "learning_rate": 4.685605442392559e-05,
+      "loss": 0.3915,
+      "step": 1155
+    },
+    {
+      "epoch": 7.3428571428571425,
+      "grad_norm": 2.7815029621124268,
+      "learning_rate": 4.681092888484554e-05,
+      "loss": 0.3317,
+      "step": 1160
+    },
+    {
+      "epoch": 7.374603174603175,
+      "grad_norm": 2.2644097805023193,
+      "learning_rate": 4.676550385282787e-05,
+      "loss": 0.3314,
+      "step": 1165
+    },
+    {
+      "epoch": 7.406349206349207,
+      "grad_norm": 2.5144474506378174,
+      "learning_rate": 4.671977995161843e-05,
+      "loss": 0.3188,
+      "step": 1170
+    },
+    {
+      "epoch": 7.438095238095238,
+      "grad_norm": 3.120821714401245,
+      "learning_rate": 4.667375780906693e-05,
+      "loss": 0.3523,
+      "step": 1175
+    },
+    {
+      "epoch": 7.46984126984127,
+      "grad_norm": 4.47842264175415,
+      "learning_rate": 4.662743805711832e-05,
+      "loss": 0.3611,
+      "step": 1180
+    },
+    {
+      "epoch": 7.501587301587302,
+      "grad_norm": 1.9228928089141846,
+      "learning_rate": 4.658082133180416e-05,
+      "loss": 0.3612,
+      "step": 1185
+    },
+    {
+      "epoch": 7.533333333333333,
+      "grad_norm": 2.1507537364959717,
+      "learning_rate": 4.6533908273233815e-05,
+      "loss": 0.3321,
+      "step": 1190
+    },
+    {
+      "epoch": 7.565079365079365,
+      "grad_norm": 2.1849119663238525,
+      "learning_rate": 4.64866995255857e-05,
+      "loss": 0.2943,
+      "step": 1195
+    },
+    {
+      "epoch": 7.5968253968253965,
+      "grad_norm": 2.1777775287628174,
+      "learning_rate": 4.643919573709843e-05,
+      "loss": 0.353,
+      "step": 1200
+    },
+    {
+      "epoch": 7.628571428571428,
+      "grad_norm": 2.5231118202209473,
+      "learning_rate": 4.639139756006195e-05,
+      "loss": 0.3571,
+      "step": 1205
+    },
+    {
+      "epoch": 7.660317460317461,
+      "grad_norm": 1.8409479856491089,
+      "learning_rate": 4.6343305650808516e-05,
+      "loss": 0.3691,
+      "step": 1210
+    },
+    {
+      "epoch": 7.692063492063492,
+      "grad_norm": 1.7940895557403564,
+      "learning_rate": 4.629492066970373e-05,
+      "loss": 0.3738,
+      "step": 1215
+    },
+    {
+      "epoch": 7.723809523809524,
+      "grad_norm": 2.014902114868164,
+      "learning_rate": 4.6246243281137474e-05,
+      "loss": 0.361,
+      "step": 1220
+    },
+    {
+      "epoch": 7.7555555555555555,
+      "grad_norm": 3.4182560443878174,
+      "learning_rate": 4.6197274153514735e-05,
+      "loss": 0.3663,
+      "step": 1225
+    },
+    {
+      "epoch": 7.787301587301587,
+      "grad_norm": 2.518728256225586,
+      "learning_rate": 4.614801395924649e-05,
+      "loss": 0.3646,
+      "step": 1230
+    },
+    {
+      "epoch": 7.819047619047619,
+      "grad_norm": 2.154189109802246,
+      "learning_rate": 4.6098463374740466e-05,
+      "loss": 0.3331,
+      "step": 1235
+    },
+    {
+      "epoch": 7.85079365079365,
+      "grad_norm": 2.536081075668335,
+      "learning_rate": 4.604862308039177e-05,
+      "loss": 0.3742,
+      "step": 1240
+    },
+    {
+      "epoch": 7.882539682539683,
+      "grad_norm": 2.340764045715332,
+      "learning_rate": 4.599849376057366e-05,
+      "loss": 0.3352,
+      "step": 1245
+    },
+    {
+      "epoch": 7.914285714285715,
+      "grad_norm": 3.5488364696502686,
+      "learning_rate": 4.5948076103628094e-05,
+      "loss": 0.3663,
+      "step": 1250
+    },
+    {
+      "epoch": 7.946031746031746,
+      "grad_norm": 2.779360294342041,
+      "learning_rate": 4.589737080185625e-05,
+      "loss": 0.3362,
+      "step": 1255
+    },
+    {
+      "epoch": 7.977777777777778,
+      "grad_norm": 1.8792667388916016,
+      "learning_rate": 4.5846378551509097e-05,
+      "loss": 0.346,
+      "step": 1260
+    },
+    {
+      "epoch": 8.006349206349206,
+      "grad_norm": 2.453295946121216,
+      "learning_rate": 4.579510005277774e-05,
+      "loss": 0.3509,
+      "step": 1265
+    },
+    {
+      "epoch": 8.038095238095238,
+      "grad_norm": 1.9493130445480347,
+      "learning_rate": 4.574353600978388e-05,
+      "loss": 0.3062,
+      "step": 1270
+    },
+    {
+      "epoch": 8.06984126984127,
+      "grad_norm": 1.9360930919647217,
+      "learning_rate": 4.56916871305701e-05,
+      "loss": 0.3056,
+      "step": 1275
+    },
+    {
+      "epoch": 8.101587301587301,
+      "grad_norm": 1.5592070817947388,
+      "learning_rate": 4.563955412709021e-05,
+      "loss": 0.2785,
+      "step": 1280
+    },
+    {
+      "epoch": 8.133333333333333,
+      "grad_norm": 1.8093425035476685,
+      "learning_rate": 4.5587137715199354e-05,
+      "loss": 0.308,
+      "step": 1285
+    },
+    {
+      "epoch": 8.165079365079364,
+      "grad_norm": 2.2939181327819824,
+      "learning_rate": 4.5534438614644294e-05,
+      "loss": 0.3038,
+      "step": 1290
+    },
+    {
+      "epoch": 8.196825396825396,
+      "grad_norm": 2.4204866886138916,
+      "learning_rate": 4.548145754905346e-05,
+      "loss": 0.3375,
+      "step": 1295
+    },
+    {
+      "epoch": 8.228571428571428,
+      "grad_norm": 1.725534439086914,
+      "learning_rate": 4.5428195245927064e-05,
+      "loss": 0.3101,
+      "step": 1300
+    },
+    {
+      "epoch": 8.260317460317461,
+      "grad_norm": 1.637730360031128,
+      "learning_rate": 4.537465243662704e-05,
+      "loss": 0.2931,
+      "step": 1305
+    },
+    {
+      "epoch": 8.292063492063493,
+      "grad_norm": 1.3372169733047485,
+      "learning_rate": 4.532082985636709e-05,
+      "loss": 0.2763,
+      "step": 1310
+    },
+    {
+      "epoch": 8.323809523809524,
+      "grad_norm": 2.5993168354034424,
+      "learning_rate": 4.5266728244202494e-05,
+      "loss": 0.3458,
+      "step": 1315
+    },
+    {
+      "epoch": 8.355555555555556,
+      "grad_norm": 2.461862564086914,
+      "learning_rate": 4.521234834302006e-05,
+      "loss": 0.3693,
+      "step": 1320
+    },
+    {
+      "epoch": 8.387301587301588,
+      "grad_norm": 1.8519413471221924,
+      "learning_rate": 4.5157690899527816e-05,
+      "loss": 0.3327,
+      "step": 1325
+    },
+    {
+      "epoch": 8.41904761904762,
+      "grad_norm": 2.1535580158233643,
+      "learning_rate": 4.510275666424487e-05,
+      "loss": 0.3229,
+      "step": 1330
+    },
+    {
+      "epoch": 8.450793650793651,
+      "grad_norm": 1.6819690465927124,
+      "learning_rate": 4.5047546391491e-05,
+      "loss": 0.2925,
+      "step": 1335
+    },
+    {
+      "epoch": 8.482539682539683,
+      "grad_norm": 1.6538281440734863,
+      "learning_rate": 4.499206083937638e-05,
+      "loss": 0.3218,
+      "step": 1340
+    },
+    {
+      "epoch": 8.514285714285714,
+      "grad_norm": 1.8956862688064575,
+      "learning_rate": 4.493630076979112e-05,
+      "loss": 0.3423,
+      "step": 1345
+    },
+    {
+      "epoch": 8.546031746031746,
+      "grad_norm": 2.274681806564331,
+      "learning_rate": 4.48802669483948e-05,
+      "loss": 0.3152,
+      "step": 1350
+    },
+    {
+      "epoch": 8.577777777777778,
+      "grad_norm": 2.2956337928771973,
+      "learning_rate": 4.4823960144606014e-05,
+      "loss": 0.3417,
+      "step": 1355
+    },
+    {
+      "epoch": 8.60952380952381,
+      "grad_norm": 1.8650286197662354,
+      "learning_rate": 4.4767381131591734e-05,
+      "loss": 0.2896,
+      "step": 1360
+    },
+    {
+      "epoch": 8.64126984126984,
+      "grad_norm": 1.3998652696609497,
+      "learning_rate": 4.471053068625674e-05,
+      "loss": 0.3372,
+      "step": 1365
+    },
+    {
+      "epoch": 8.673015873015872,
+      "grad_norm": 2.855074167251587,
+      "learning_rate": 4.465340958923293e-05,
+      "loss": 0.332,
+      "step": 1370
+    },
+    {
+      "epoch": 8.704761904761904,
+      "grad_norm": 1.6865357160568237,
+      "learning_rate": 4.459601862486862e-05,
+      "loss": 0.3053,
+      "step": 1375
+    },
+    {
+      "epoch": 8.736507936507937,
+      "grad_norm": 2.501856803894043,
+      "learning_rate": 4.453835858121773e-05,
+      "loss": 0.3119,
+      "step": 1380
+    },
+    {
+      "epoch": 8.768253968253969,
+      "grad_norm": 2.4325456619262695,
+      "learning_rate": 4.4480430250029046e-05,
+      "loss": 0.3395,
+      "step": 1385
+    },
+    {
+      "epoch": 8.8,
+      "grad_norm": 1.4845948219299316,
+      "learning_rate": 4.4422234426735256e-05,
+      "loss": 0.3237,
+      "step": 1390
+    },
+    {
+      "epoch": 8.831746031746032,
+      "grad_norm": 1.3553249835968018,
+      "learning_rate": 4.436377191044208e-05,
+      "loss": 0.3387,
+      "step": 1395
+    },
+    {
+      "epoch": 8.863492063492064,
+      "grad_norm": 1.8338890075683594,
+      "learning_rate": 4.430504350391729e-05,
+      "loss": 0.3618,
+      "step": 1400
+    },
+    {
+      "epoch": 8.895238095238096,
+      "grad_norm": 2.291538953781128,
+      "learning_rate": 4.4246050013579686e-05,
+      "loss": 0.3608,
+      "step": 1405
+    },
+    {
+      "epoch": 8.926984126984127,
+      "grad_norm": 1.3809788227081299,
+      "learning_rate": 4.4186792249488005e-05,
+      "loss": 0.3077,
+      "step": 1410
+    },
+    {
+      "epoch": 8.958730158730159,
+      "grad_norm": 1.5944230556488037,
+      "learning_rate": 4.412727102532983e-05,
+      "loss": 0.3307,
+      "step": 1415
+    },
+    {
+      "epoch": 8.99047619047619,
+      "grad_norm": 2.2244362831115723,
+      "learning_rate": 4.4067487158410396e-05,
+      "loss": 0.3469,
+      "step": 1420
+    },
+    {
+      "epoch": 9.019047619047619,
+      "grad_norm": 1.444221019744873,
+      "learning_rate": 4.400744146964136e-05,
+      "loss": 0.3049,
+      "step": 1425
+    },
+    {
+      "epoch": 9.05079365079365,
+      "grad_norm": 1.5847752094268799,
+      "learning_rate": 4.394713478352955e-05,
+      "loss": 0.2715,
+      "step": 1430
+    },
+    {
+      "epoch": 9.082539682539682,
+      "grad_norm": 1.6062681674957275,
+      "learning_rate": 4.388656792816562e-05,
+      "loss": 0.2487,
+      "step": 1435
+    },
+    {
+      "epoch": 9.114285714285714,
+      "grad_norm": 2.099787712097168,
+      "learning_rate": 4.382574173521272e-05,
+      "loss": 0.2866,
+      "step": 1440
+    },
+    {
+      "epoch": 9.146031746031746,
+      "grad_norm": 1.0997334718704224,
+      "learning_rate": 4.376465703989502e-05,
+      "loss": 0.3052,
+      "step": 1445
+    },
+    {
+      "epoch": 9.177777777777777,
+      "grad_norm": 2.4327454566955566,
+      "learning_rate": 4.370331468098628e-05,
+      "loss": 0.3212,
+      "step": 1450
+    },
+    {
+      "epoch": 9.209523809523809,
+      "grad_norm": 1.4816385507583618,
+      "learning_rate": 4.364171550079833e-05,
+      "loss": 0.3046,
+      "step": 1455
+    },
+    {
+      "epoch": 9.24126984126984,
+      "grad_norm": 2.039186716079712,
+      "learning_rate": 4.357986034516947e-05,
+      "loss": 0.3165,
+      "step": 1460
+    },
+    {
+      "epoch": 9.273015873015874,
+      "grad_norm": 1.437852382659912,
+      "learning_rate": 4.3517750063452934e-05,
+      "loss": 0.3037,
+      "step": 1465
+    },
+    {
+      "epoch": 9.304761904761905,
+      "grad_norm": 1.818982720375061,
+      "learning_rate": 4.345538550850512e-05,
+      "loss": 0.3122,
+      "step": 1470
+    },
+    {
+      "epoch": 9.336507936507937,
+      "grad_norm": 1.12025785446167,
+      "learning_rate": 4.339276753667395e-05,
+      "loss": 0.2909,
+      "step": 1475
+    },
+    {
+      "epoch": 9.368253968253969,
+      "grad_norm": 1.6094844341278076,
+      "learning_rate": 4.3329897007787125e-05,
+      "loss": 0.2823,
+      "step": 1480
+    },
+    {
+      "epoch": 9.4,
+      "grad_norm": 1.916200041770935,
+      "learning_rate": 4.326677478514024e-05,
+      "loss": 0.2939,
+      "step": 1485
+    },
+    {
+      "epoch": 9.431746031746032,
+      "grad_norm": 1.97919499874115,
+      "learning_rate": 4.320340173548503e-05,
+      "loss": 0.2826,
+      "step": 1490
+    },
+    {
+      "epoch": 9.463492063492064,
+      "grad_norm": 2.0238938331604004,
+      "learning_rate": 4.313977872901737e-05,
+      "loss": 0.3273,
+      "step": 1495
+    },
+    {
+      "epoch": 9.495238095238095,
+      "grad_norm": 2.5840957164764404,
+      "learning_rate": 4.307590663936541e-05,
+      "loss": 0.2889,
+      "step": 1500
+    },
+    {
+      "epoch": 9.526984126984127,
+      "grad_norm": 2.3503904342651367,
+      "learning_rate": 4.30117863435775e-05,
+      "loss": 0.3012,
+      "step": 1505
+    },
+    {
+      "epoch": 9.558730158730159,
+      "grad_norm": 2.019792318344116,
+      "learning_rate": 4.294741872211024e-05,
+      "loss": 0.3267,
+      "step": 1510
+    },
+    {
+      "epoch": 9.59047619047619,
+      "grad_norm": 2.2713353633880615,
+      "learning_rate": 4.288280465881632e-05,
+      "loss": 0.3096,
+      "step": 1515
+    },
+    {
+      "epoch": 9.622222222222222,
+      "grad_norm": 2.4236693382263184,
+      "learning_rate": 4.281794504093237e-05,
+      "loss": 0.3291,
+      "step": 1520
+    },
+    {
+      "epoch": 9.653968253968253,
+      "grad_norm": 1.772703766822815,
+      "learning_rate": 4.275284075906686e-05,
+      "loss": 0.3117,
+      "step": 1525
+    },
+    {
+      "epoch": 9.685714285714285,
+      "grad_norm": 1.9665186405181885,
+      "learning_rate": 4.268749270718778e-05,
+      "loss": 0.326,
+      "step": 1530
+    },
+    {
+      "epoch": 9.717460317460317,
+      "grad_norm": 1.9472782611846924,
+      "learning_rate": 4.262190178261044e-05,
+      "loss": 0.2683,
+      "step": 1535
+    },
+    {
+      "epoch": 9.74920634920635,
+      "grad_norm": 2.0638089179992676,
+      "learning_rate": 4.255606888598508e-05,
+      "loss": 0.314,
+      "step": 1540
+    },
+    {
+      "epoch": 9.780952380952382,
+      "grad_norm": 2.1349925994873047,
+      "learning_rate": 4.248999492128456e-05,
+      "loss": 0.2897,
+      "step": 1545
+    },
+    {
+      "epoch": 9.812698412698413,
+      "grad_norm": 2.112536907196045,
+      "learning_rate": 4.242368079579192e-05,
+      "loss": 0.31,
+      "step": 1550
+    },
+    {
+      "epoch": 9.844444444444445,
+      "grad_norm": 1.6859878301620483,
+      "learning_rate": 4.2357127420087917e-05,
+      "loss": 0.3412,
+      "step": 1555
+    },
+    {
+      "epoch": 9.876190476190477,
+      "grad_norm": 1.9178651571273804,
+      "learning_rate": 4.229033570803853e-05,
+      "loss": 0.334,
+      "step": 1560
+    },
+    {
+      "epoch": 9.907936507936508,
+      "grad_norm": 2.562436103820801,
+      "learning_rate": 4.2223306576782426e-05,
+      "loss": 0.3379,
+      "step": 1565
+    },
+    {
+      "epoch": 9.93968253968254,
+      "grad_norm": 1.8472412824630737,
+      "learning_rate": 4.215604094671835e-05,
+      "loss": 0.3415,
+      "step": 1570
+    },
+    {
+      "epoch": 9.971428571428572,
+      "grad_norm": 1.9416279792785645,
+      "learning_rate": 4.208853974149246e-05,
+      "loss": 0.3085,
+      "step": 1575
+    },
+    {
+      "epoch": 10.0,
+      "grad_norm": 2.0056397914886475,
+      "learning_rate": 4.202080388798571e-05,
+      "loss": 0.3263,
+      "step": 1580
+    },
+    {
+      "epoch": 10.031746031746032,
+      "grad_norm": 2.195781946182251,
+      "learning_rate": 4.1952834316301065e-05,
+      "loss": 0.2867,
+      "step": 1585
+    },
+    {
+      "epoch": 10.063492063492063,
+      "grad_norm": 1.7489805221557617,
+      "learning_rate": 4.1884631959750766e-05,
+      "loss": 0.2589,
+      "step": 1590
+    },
+    {
+      "epoch": 10.095238095238095,
+      "grad_norm": 1.9361369609832764,
+      "learning_rate": 4.181619775484348e-05,
+      "loss": 0.2722,
+      "step": 1595
+    },
+    {
+      "epoch": 10.126984126984127,
+      "grad_norm": 2.24322509765625,
+      "learning_rate": 4.174753264127147e-05,
+      "loss": 0.2534,
+      "step": 1600
+    },
+    {
+      "epoch": 10.158730158730158,
+      "grad_norm": 2.4550466537475586,
+      "learning_rate": 4.167863756189767e-05,
+      "loss": 0.2777,
+      "step": 1605
+    },
+    {
+      "epoch": 10.19047619047619,
+      "grad_norm": 1.9439811706542969,
+      "learning_rate": 4.160951346274278e-05,
+      "loss": 0.2864,
+      "step": 1610
+    },
+    {
+      "epoch": 10.222222222222221,
+      "grad_norm": 1.633494257926941,
+      "learning_rate": 4.154016129297219e-05,
+      "loss": 0.2858,
+      "step": 1615
+    },
+    {
+      "epoch": 10.253968253968253,
+      "grad_norm": 1.69782292842865,
+      "learning_rate": 4.147058200488305e-05,
+      "loss": 0.2942,
+      "step": 1620
+    },
+    {
+      "epoch": 10.285714285714286,
+      "grad_norm": 1.613031268119812,
+      "learning_rate": 4.140077655389113e-05,
+      "loss": 0.2632,
+      "step": 1625
+    },
+    {
+      "epoch": 10.317460317460318,
+      "grad_norm": 2.0266177654266357,
+      "learning_rate": 4.1330745898517714e-05,
+      "loss": 0.3011,
+      "step": 1630
+    },
+    {
+      "epoch": 10.34920634920635,
+      "grad_norm": 1.8945387601852417,
+      "learning_rate": 4.1260491000376446e-05,
+      "loss": 0.2832,
+      "step": 1635
+    },
+    {
+      "epoch": 10.380952380952381,
+      "grad_norm": 1.7012510299682617,
+      "learning_rate": 4.119001282416009e-05,
+      "loss": 0.2718,
+      "step": 1640
+    },
+    {
+      "epoch": 10.412698412698413,
+      "grad_norm": 1.5538525581359863,
+      "learning_rate": 4.111931233762738e-05,
+      "loss": 0.3232,
+      "step": 1645
+    },
+    {
+      "epoch": 10.444444444444445,
+      "grad_norm": 2.3083150386810303,
+      "learning_rate": 4.1048390511589595e-05,
+      "loss": 0.3057,
+      "step": 1650
+    },
+    {
+      "epoch": 10.476190476190476,
+      "grad_norm": 1.293314814567566,
+      "learning_rate": 4.097724831989733e-05,
+      "loss": 0.2523,
+      "step": 1655
+    },
+    {
+      "epoch": 10.507936507936508,
+      "grad_norm": 2.517212152481079,
+      "learning_rate": 4.09058867394271e-05,
+      "loss": 0.3269,
+      "step": 1660
+    },
+    {
+      "epoch": 10.53968253968254,
+      "grad_norm": 2.057063102722168,
+      "learning_rate": 4.083430675006791e-05,
+      "loss": 0.2844,
+      "step": 1665
+    },
+    {
+      "epoch": 10.571428571428571,
+      "grad_norm": 1.5663833618164062,
+      "learning_rate": 4.0762509334707786e-05,
+      "loss": 0.3005,
+      "step": 1670
+    },
+    {
+      "epoch": 10.603174603174603,
+      "grad_norm": 2.5423505306243896,
+      "learning_rate": 4.069049547922035e-05,
+      "loss": 0.2802,
+      "step": 1675
+    },
+    {
+      "epoch": 10.634920634920634,
+      "grad_norm": 1.578316569328308,
+      "learning_rate": 4.061826617245119e-05,
+      "loss": 0.2667,
+      "step": 1680
+    },
+    {
+      "epoch": 10.666666666666666,
+      "grad_norm": 1.502928376197815,
+      "learning_rate": 4.0545822406204334e-05,
+      "loss": 0.3059,
+      "step": 1685
+    },
+    {
+      "epoch": 10.698412698412698,
+      "grad_norm": 1.2470905780792236,
+      "learning_rate": 4.047316517522864e-05,
+      "loss": 0.2879,
+      "step": 1690
+    },
+    {
+      "epoch": 10.73015873015873,
+      "grad_norm": 1.8238775730133057,
+      "learning_rate": 4.0400295477204105e-05,
+      "loss": 0.2923,
+      "step": 1695
+    },
+    {
+      "epoch": 10.761904761904763,
+      "grad_norm": 2.0516586303710938,
+      "learning_rate": 4.032721431272819e-05,
+      "loss": 0.3086,
+      "step": 1700
+    },
+    {
+      "epoch": 10.793650793650794,
+      "grad_norm": 1.3188791275024414,
+      "learning_rate": 4.0253922685302046e-05,
+      "loss": 0.2893,
+      "step": 1705
+    },
+    {
+      "epoch": 10.825396825396826,
+      "grad_norm": 1.7352266311645508,
+      "learning_rate": 4.01804216013168e-05,
+      "loss": 0.2981,
+      "step": 1710
+    },
+    {
+      "epoch": 10.857142857142858,
+      "grad_norm": 1.3449515104293823,
+      "learning_rate": 4.0106712070039656e-05,
+      "loss": 0.2841,
+      "step": 1715
+    },
+    {
+      "epoch": 10.88888888888889,
+      "grad_norm": 2.505431890487671,
+      "learning_rate": 4.00327951036001e-05,
+      "loss": 0.3034,
+      "step": 1720
+    },
+    {
+      "epoch": 10.920634920634921,
+      "grad_norm": 1.8870325088500977,
+      "learning_rate": 3.9958671716975966e-05,
+      "loss": 0.305,
+      "step": 1725
+    },
+    {
+      "epoch": 10.952380952380953,
+      "grad_norm": 2.913130044937134,
+      "learning_rate": 3.988434292797951e-05,
+      "loss": 0.3212,
+      "step": 1730
+    },
+    {
+      "epoch": 10.984126984126984,
+      "grad_norm": 1.7870115041732788,
+      "learning_rate": 3.980980975724344e-05,
+      "loss": 0.3108,
+      "step": 1735
+    },
+    {
+      "epoch": 11.012698412698413,
+      "grad_norm": 3.050985336303711,
+      "learning_rate": 3.9735073228206896e-05,
+      "loss": 0.3043,
+      "step": 1740
+    },
+    {
+      "epoch": 11.044444444444444,
+      "grad_norm": 1.5993611812591553,
+      "learning_rate": 3.96601343671014e-05,
+      "loss": 0.2465,
+      "step": 1745
+    },
+    {
+      "epoch": 11.076190476190476,
+      "grad_norm": 1.626888632774353,
+      "learning_rate": 3.9584994202936746e-05,
+      "loss": 0.2688,
+      "step": 1750
+    },
+    {
+      "epoch": 11.107936507936508,
+      "grad_norm": 1.7132880687713623,
+      "learning_rate": 3.950965376748689e-05,
+      "loss": 0.2458,
+      "step": 1755
+    },
+    {
+      "epoch": 11.13968253968254,
+      "grad_norm": 1.7764930725097656,
+      "learning_rate": 3.94341140952758e-05,
+      "loss": 0.2189,
+      "step": 1760
+    },
+    {
+      "epoch": 11.17142857142857,
+      "grad_norm": 2.5560712814331055,
+      "learning_rate": 3.9358376223563206e-05,
+      "loss": 0.2866,
+      "step": 1765
+    },
+    {
+      "epoch": 11.203174603174602,
+      "grad_norm": 1.1177359819412231,
+      "learning_rate": 3.928244119233038e-05,
+      "loss": 0.233,
+      "step": 1770
+    },
+    {
+      "epoch": 11.234920634920634,
+      "grad_norm": 1.584670901298523,
+      "learning_rate": 3.9206310044265866e-05,
+      "loss": 0.273,
+      "step": 1775
+    },
+    {
+      "epoch": 11.266666666666667,
+      "grad_norm": 1.6278687715530396,
+      "learning_rate": 3.912998382475115e-05,
+      "loss": 0.2746,
+      "step": 1780
+    },
+    {
+      "epoch": 11.2984126984127,
+      "grad_norm": 1.657038688659668,
+      "learning_rate": 3.905346358184629e-05,
+      "loss": 0.2885,
+      "step": 1785
+    },
+    {
+      "epoch": 11.33015873015873,
+      "grad_norm": 1.2840272188186646,
+      "learning_rate": 3.897675036627557e-05,
+      "loss": 0.2932,
+      "step": 1790
+    },
+    {
+      "epoch": 11.361904761904762,
+      "grad_norm": 1.5678766965866089,
+      "learning_rate": 3.8899845231413026e-05,
+      "loss": 0.2945,
+      "step": 1795
+    },
+    {
+      "epoch": 11.393650793650794,
+      "grad_norm": 1.788948655128479,
+      "learning_rate": 3.8822749233268006e-05,
+      "loss": 0.3013,
+      "step": 1800
+    },
+    {
+      "epoch": 11.425396825396826,
+      "grad_norm": 1.2259769439697266,
+      "learning_rate": 3.8745463430470664e-05,
+      "loss": 0.2582,
+      "step": 1805
+    },
+    {
+      "epoch": 11.457142857142857,
+      "grad_norm": 1.5430735349655151,
+      "learning_rate": 3.866798888425741e-05,
+      "loss": 0.275,
+      "step": 1810
+    },
+    {
+      "epoch": 11.488888888888889,
+      "grad_norm": 1.9102168083190918,
+      "learning_rate": 3.8590326658456376e-05,
+      "loss": 0.2909,
+      "step": 1815
+    },
+    {
+      "epoch": 11.52063492063492,
+      "grad_norm": 1.6118320226669312,
+      "learning_rate": 3.851247781947277e-05,
+      "loss": 0.2922,
+      "step": 1820
+    },
+    {
+      "epoch": 11.552380952380952,
+      "grad_norm": 1.393646478652954,
+      "learning_rate": 3.843444343627424e-05,
+      "loss": 0.2783,
+      "step": 1825
+    },
+    {
+      "epoch": 11.584126984126984,
+      "grad_norm": 2.522909641265869,
+      "learning_rate": 3.83562245803762e-05,
+      "loss": 0.2933,
+      "step": 1830
+    },
+    {
+      "epoch": 11.615873015873015,
+      "grad_norm": 2.2534332275390625,
+      "learning_rate": 3.827782232582714e-05,
+      "loss": 0.3081,
+      "step": 1835
+    },
+    {
+      "epoch": 11.647619047619047,
+      "grad_norm": 2.4088056087493896,
+      "learning_rate": 3.819923774919383e-05,
+      "loss": 0.276,
+      "step": 1840
+    },
+    {
+      "epoch": 11.679365079365079,
+      "grad_norm": 1.7626562118530273,
+      "learning_rate": 3.8120471929546576e-05,
+      "loss": 0.2697,
+      "step": 1845
+    },
+    {
+      "epoch": 11.71111111111111,
+      "grad_norm": 1.8656691312789917,
+      "learning_rate": 3.8041525948444414e-05,
+      "loss": 0.2979,
+      "step": 1850
+    },
+    {
+      "epoch": 11.742857142857144,
+      "grad_norm": 1.6537258625030518,
+      "learning_rate": 3.7962400889920185e-05,
+      "loss": 0.3042,
+      "step": 1855
+    },
+    {
+      "epoch": 11.774603174603175,
+      "grad_norm": 1.696975827217102,
+      "learning_rate": 3.788309784046574e-05,
+      "loss": 0.2984,
+      "step": 1860
+    },
+    {
+      "epoch": 11.806349206349207,
+      "grad_norm": 1.976236343383789,
+      "learning_rate": 3.780361788901696e-05,
+      "loss": 0.2711,
+      "step": 1865
+    },
+    {
+      "epoch": 11.838095238095239,
+      "grad_norm": 1.7781676054000854,
+      "learning_rate": 3.772396212693885e-05,
+      "loss": 0.3116,
+      "step": 1870
+    },
+    {
+      "epoch": 11.86984126984127,
+      "grad_norm": 1.6252080202102661,
+      "learning_rate": 3.7644131648010494e-05,
+      "loss": 0.2879,
+      "step": 1875
+    },
+    {
+      "epoch": 11.901587301587302,
+      "grad_norm": 1.7511012554168701,
+      "learning_rate": 3.75641275484101e-05,
+      "loss": 0.3106,
+      "step": 1880
+    },
+    {
+      "epoch": 11.933333333333334,
+      "grad_norm": 1.5861046314239502,
+      "learning_rate": 3.7483950926699885e-05,
+      "loss": 0.2703,
+      "step": 1885
+    },
+    {
+      "epoch": 11.965079365079365,
+      "grad_norm": 1.903902530670166,
+      "learning_rate": 3.740360288381105e-05,
+      "loss": 0.2808,
+      "step": 1890
+    },
+    {
+      "epoch": 11.996825396825397,
+      "grad_norm": 1.644740104675293,
+      "learning_rate": 3.732308452302864e-05,
+      "loss": 0.2883,
+      "step": 1895
+    },
+    {
+      "epoch": 12.025396825396825,
+      "grad_norm": 1.3671797513961792,
+      "learning_rate": 3.724239694997637e-05,
+      "loss": 0.2661,
+      "step": 1900
+    },
+    {
+      "epoch": 12.057142857142857,
+      "grad_norm": 1.1910501718521118,
+      "learning_rate": 3.716154127260147e-05,
+      "loss": 0.2352,
+      "step": 1905
+    },
+    {
+      "epoch": 12.088888888888889,
+      "grad_norm": 1.45137619972229,
+      "learning_rate": 3.708051860115947e-05,
+      "loss": 0.2703,
+      "step": 1910
+    },
+    {
+      "epoch": 12.12063492063492,
+      "grad_norm": 2.625089645385742,
+      "learning_rate": 3.699933004819895e-05,
+      "loss": 0.2524,
+      "step": 1915
+    },
+    {
+      "epoch": 12.152380952380952,
+      "grad_norm": 1.731602430343628,
+      "learning_rate": 3.691797672854625e-05,
+      "loss": 0.2533,
+      "step": 1920
+    },
+    {
+      "epoch": 12.184126984126983,
+      "grad_norm": 2.1266143321990967,
+      "learning_rate": 3.683645975929019e-05,
+      "loss": 0.2666,
+      "step": 1925
+    },
+    {
+      "epoch": 12.215873015873015,
+      "grad_norm": 1.7163398265838623,
+      "learning_rate": 3.675478025976671e-05,
+      "loss": 0.2838,
+      "step": 1930
+    },
+    {
+      "epoch": 12.247619047619047,
+      "grad_norm": 1.8726729154586792,
+      "learning_rate": 3.66729393515435e-05,
+      "loss": 0.2646,
+      "step": 1935
+    },
+    {
+      "epoch": 12.27936507936508,
+      "grad_norm": 1.2229722738265991,
+      "learning_rate": 3.659093815840462e-05,
+      "loss": 0.267,
+      "step": 1940
+    },
+    {
+      "epoch": 12.311111111111112,
+      "grad_norm": 1.341997504234314,
+      "learning_rate": 3.650877780633505e-05,
+      "loss": 0.2464,
+      "step": 1945
+    },
+    {
+      "epoch": 12.342857142857143,
+      "grad_norm": 1.3466174602508545,
+      "learning_rate": 3.6426459423505214e-05,
+      "loss": 0.2521,
+      "step": 1950
+    },
+    {
+      "epoch": 12.374603174603175,
+      "grad_norm": 1.7022134065628052,
+      "learning_rate": 3.6343984140255516e-05,
+      "loss": 0.2662,
+      "step": 1955
+    },
+    {
+      "epoch": 12.406349206349207,
+      "grad_norm": 2.5058987140655518,
+      "learning_rate": 3.626135308908084e-05,
+      "loss": 0.2745,
+      "step": 1960
+    },
+    {
+      "epoch": 12.438095238095238,
+      "grad_norm": 1.7667903900146484,
+      "learning_rate": 3.6178567404614936e-05,
+      "loss": 0.2752,
+      "step": 1965
+    },
+    {
+      "epoch": 12.46984126984127,
+      "grad_norm": 1.7334171533584595,
+      "learning_rate": 3.609562822361487e-05,
+      "loss": 0.2667,
+      "step": 1970
+    },
+    {
+      "epoch": 12.501587301587302,
+      "grad_norm": 1.7093929052352905,
+      "learning_rate": 3.601253668494546e-05,
+      "loss": 0.2829,
+      "step": 1975
+    },
+    {
+      "epoch": 12.533333333333333,
+      "grad_norm": 1.8272080421447754,
+      "learning_rate": 3.592929392956355e-05,
+      "loss": 0.2583,
+      "step": 1980
+    },
+    {
+      "epoch": 12.565079365079365,
+      "grad_norm": 1.5226343870162964,
+      "learning_rate": 3.584590110050241e-05,
+      "loss": 0.2652,
+      "step": 1985
+    },
+    {
+      "epoch": 12.596825396825396,
+      "grad_norm": 1.7395323514938354,
+      "learning_rate": 3.5762359342856036e-05,
+      "loss": 0.2585,
+      "step": 1990
+    },
+    {
+      "epoch": 12.628571428571428,
+      "grad_norm": 2.278144359588623,
+      "learning_rate": 3.567866980376337e-05,
+      "loss": 0.2862,
+      "step": 1995
+    },
+    {
+      "epoch": 12.66031746031746,
+      "grad_norm": 1.9335429668426514,
+      "learning_rate": 3.559483363239262e-05,
+      "loss": 0.2945,
+      "step": 2000
+    },
+    {
+      "epoch": 12.692063492063491,
+      "grad_norm": 1.3866902589797974,
+      "learning_rate": 3.551085197992545e-05,
+      "loss": 0.2675,
+      "step": 2005
+    },
+    {
+      "epoch": 12.723809523809523,
+      "grad_norm": 1.7110604047775269,
+      "learning_rate": 3.5426725999541174e-05,
+      "loss": 0.2705,
+      "step": 2010
+    },
+    {
+      "epoch": 12.755555555555556,
+      "grad_norm": 1.6411776542663574,
+      "learning_rate": 3.534245684640089e-05,
+      "loss": 0.2482,
+      "step": 2015
+    },
+    {
+      "epoch": 12.787301587301588,
+      "grad_norm": 1.4663981199264526,
+      "learning_rate": 3.525804567763167e-05,
+      "loss": 0.2838,
+      "step": 2020
+    },
+    {
+      "epoch": 12.81904761904762,
+      "grad_norm": 1.8676140308380127,
+      "learning_rate": 3.517349365231065e-05,
+      "loss": 0.2906,
+      "step": 2025
+    },
+    {
+      "epoch": 12.850793650793651,
+      "grad_norm": 1.6150639057159424,
+      "learning_rate": 3.508880193144911e-05,
+      "loss": 0.3011,
+      "step": 2030
+    },
+    {
+      "epoch": 12.882539682539683,
+      "grad_norm": 2.5578598976135254,
+      "learning_rate": 3.500397167797654e-05,
+      "loss": 0.2846,
+      "step": 2035
+    },
+    {
+      "epoch": 12.914285714285715,
+      "grad_norm": 1.3131941556930542,
+      "learning_rate": 3.491900405672466e-05,
+      "loss": 0.2509,
+      "step": 2040
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 4710,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 30,
+  "save_steps": 157,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.384151192993792e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
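
The closing fields of this `trainer_state.json` fit together arithmetically: `max_steps` (4710) equals `num_train_epochs` (30) times `save_steps` (157), which matches the checkpoint directory names in this commit (157, 1413, 1570, 1727, 1884, 2041, ...). The following is a minimal sketch for summarizing such a file, not part of the commit. It assumes the per-step entries above sit under the standard Trainer key `log_history`, which is not visible in this excerpt; the helper names `mean_loss_by_epoch` and `checkpoint_steps` are hypothetical.

```python
from collections import defaultdict


def mean_loss_by_epoch(state):
    """Average the logged training loss within each whole epoch."""
    buckets = defaultdict(list)
    # Assumption: log entries live under "log_history", as in standard Trainer state files.
    for entry in state.get("log_history", []):
        if "loss" in entry and "epoch" in entry:
            buckets[int(entry["epoch"])].append(entry["loss"])
    return {epoch: sum(v) / len(v) for epoch, v in sorted(buckets.items())}


def checkpoint_steps(state):
    """Steps at which a checkpoint is expected, assuming one save every save_steps."""
    return list(range(state["save_steps"], state["max_steps"] + 1, state["save_steps"]))


# Three entries copied from the log above, plus the summary fields of the file.
sample_state = {
    "log_history": [
        {"epoch": 10.984126984126984, "loss": 0.3108, "step": 1735},
        {"epoch": 11.012698412698413, "loss": 0.3043, "step": 1740},
        {"epoch": 11.044444444444444, "loss": 0.2465, "step": 1745},
    ],
    "max_steps": 4710,
    "num_train_epochs": 30,
    "save_steps": 157,
}

per_epoch = mean_loss_by_epoch(sample_state)
expected_checkpoints = checkpoint_steps(sample_state)
```

In practice the dictionary would come from `json.load` on the full file; the inline sample only keeps the sketch self-contained.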

checkpoint-2041/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-2355/adapter_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2.5-Coder-14B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "k_proj",
+    "o_proj",
+    "q_proj",
+    "up_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_rslora": false
+}
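
To make the LoRA settings above concrete, here is a small sketch, not part of the commit, of what `r`, `lora_alpha`, and `use_rslora` imply. With `use_rslora` false, the adapter update is scaled by `lora_alpha / r`, so 16 / 8 = 2.0 here; with rank-stabilized LoRA the divisor would be the square root of `r`. The helper names and the 1024-wide layer in the usage line are illustrative assumptions, since the base model's layer dimensions do not appear in this diff.

```python
import math


def lora_scaling(cfg):
    """Scaling factor applied to the low-rank update B @ A."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    # Rank-stabilized LoRA divides by sqrt(r); plain LoRA divides by r.
    return alpha / math.sqrt(r) if cfg.get("use_rslora") else alpha / r


def lora_params_per_linear(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Values copied from the adapter_config.json above.
adapter_cfg = {"r": 8, "lora_alpha": 16, "use_rslora": False}

scaling = lora_scaling(adapter_cfg)
# Hypothetical 1024 x 1024 projection, used only to illustrate the formula.
example_params = lora_params_per_linear(1024, 1024, adapter_cfg["r"])
```

The same per-layer count applies to each of the seven entries in `target_modules`, with the true input and output widths of each projection substituted.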

checkpoint-2355/tokenizer_config.json ADDED

	@@ -0,0 +1,209 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "right",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
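
The `chat_template` above is a Jinja template that produces a ChatML-style prompt. As a rough illustration of its no-tools branch only, the sketch below re-implements that path in plain Python. It is a simplification under stated assumptions, not the tokenizer's own code: the function name `render_chatml` is hypothetical, and tool definitions, assistant tool calls, and `tool` role messages are deliberately left out.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Simplified rendering of the template's no-tools branch (plain text messages only)."""
    parts = []
    # A leading system message is used as-is; otherwise the default system prompt is inserted.
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"]
    else:
        system_text = DEFAULT_SYSTEM
    parts.append("<|im_start|>system\n" + system_text + "<|im_end|>\n")

    for index, message in enumerate(messages):
        role = message["role"]
        if role == "system" and index == 0:
            continue  # already emitted above
        if role not in ("user", "system", "assistant"):
            # Tool messages need the tool_response wrapping that this sketch omits.
            raise ValueError("unsupported role in this simplified sketch: " + role)
        parts.append("<|im_start|>" + role + "\n" + message["content"] + "<|im_end|>\n")

    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "hi"}])
```

Each turn ends with `<|im_end|>`, which is also the `eos_token` in this config, so generation is expected to stop at the end of the assistant turn.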

checkpoint-2669/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.15.0

checkpoint-2669/added_tokens.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "</tool_call>": 151658,
+  "<tool_call>": 151657,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}