Upload 11 files
- README.md +108 -3
- all_results.json +16 -0
- config.json +32 -0
- eval_results.json +10 -0
- generation_config.json +6 -0
- model.safetensors +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +214 -0
- train_results.json +9 -0
- trainer_state.json +252 -0
    	
README.md (CHANGED)

# Pythia-14M Fine-Tuned for High-Quality English Sentence Generation

This model is a fine-tuned version of the Pythia-14M language model, optimized for generating high-quality English sentences. It builds on the base model [agentlans/pythia-14m-finewebedu-sentences](https://huggingface.co/agentlans/pythia-14m-finewebedu-sentences) and was further trained on a curated dataset of well-formed English sentences, [agentlans/high-quality-english-sentences](https://huggingface.co/datasets/agentlans/high-quality-english-sentences).

## Model Description

The model uses the Pythia-14M architecture, a relatively compact language model. It has been fine-tuned specifically to generate (mostly) grammatically correct and coherent English sentences across a variety of topics and styles.

## Intended Uses & Limitations

This model is designed for:

- Generating high-quality English sentences
- Completing partial sentences
- Assisting with writing tasks that require well-formed English

Limitations:

- Not suitable for tasks requiring deep domain knowledge
- May struggle with very long-form text generation
- Fails on non-English text
- At only 14M parameters it is a very small model, so expectations should be modest

## Training Data

The model was fine-tuned on a combination of datasets:

- Web-scraped educational content (finewebedu)
- High-quality web text (fineweb)
- Filtered Common Crawl data (C4)

For the composition and preprocessing of the training data, see [agentlans/high-quality-english-sentences](https://huggingface.co/datasets/agentlans/high-quality-english-sentences).

## How To Use

To generate 10 random sentences starting from an empty string on a CUDA device:

```python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='agentlans/pythia-14m-sentences', device='cuda')

set_seed(1234)
results = generator("", max_length=100, num_return_sequences=10, do_sample=True)

for x in results:
    print(x['generated_text'])
```

Output:

```text
The most common cause of the number of diseases is the common cause of death.
And there are many people in the war.
The average household income is 35.5 percent.
He was the most influential theologians of the country in this world.
On the other hand, the students will be able to learn the value of the current and the time.
However, the effect of the study would be greater than that of a drug-related drug drug.
To understand today, our nation's largest international commitment to the use of new technology and technology across the country.
On Sunday, the UK was first held in the state of the Australian, where a foreign trade union was used since the first year.
I've said that the program is most effective in education in the middle of the world.
So a year, it is important to identify a community where a student has a disability.
```

To let the model continue the sentence:

```python
results = generator("The meaning of life is", max_length=100, num_return_sequences=10, do_sample=True)
for x in results:
    print(x['generated_text'])
```

Output:

```text
The meaning of life is one of the most extraordinary stories of the great world, and some of the most brilliant examples of the world of science.
The meaning of life is to develop.
The meaning of life is to the person, or to make it a personal impression of what is the case for the reader.
The meaning of life is no longer the most important concept of the human language.
The meaning of life is the form of a personal or personal character.
The meaning of life is the world's real and our future.
The meaning of life is the true one of the nation's largest historical experiences.
The meaning of life is the basis of the Church's first, the church of the Holy Spirit, and a living faith.
The meaning of life is that the law requires that the truth be lost.
The meaning of life is the best reason for the poor and poor economy.
```

## Training Procedure

The model was trained using the following hyperparameters:

- Learning rate: 5e-05
- Train batch size: 8
- Eval batch size: 8
- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- LR scheduler: Linear
- Number of epochs: 3.0
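Given these settings, the schedule's overall shape can be sketched from first principles. A minimal illustration, assuming no warmup and no gradient accumulation (neither is stated above), and taking the train-set size of 40,997 sentences reported in all_results.json below:

```python
import math

learning_rate = 5e-05  # initial LR from the hyperparameters above
train_samples = 40997  # train set size reported in all_results.json
batch_size = 8
epochs = 3

# One optimizer step per batch (assuming no gradient accumulation).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def linear_lr(step: int) -> float:
    """LR under a linear decay to zero with no warmup (an assumption)."""
    return learning_rate * max(0.0, 1.0 - step / total_steps)

print(total_steps)             # 15375 optimizer steps over the whole run
print(linear_lr(total_steps))  # 0.0 at the end of training
```

The step count lines up with the `train_steps_per_second` and `train_runtime` reported below (13.193 × 1165.39 ≈ 15,375).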

## Evaluation Results

On the evaluation set, the model achieved:

- Loss: 6.2540
- Accuracy: 0.1776
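The perplexity reported in all_results.json follows directly from this loss: perplexity is the exponential of the cross-entropy loss. A quick check of the numbers:

```python
import math

eval_loss = 6.2540  # evaluation loss from above

# Perplexity of a causal LM is exp(cross-entropy loss).
perplexity = math.exp(eval_loss)
print(round(perplexity))  # 520, matching the perplexity in all_results.json
```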

## Ethical Considerations

As with any text generation model, users should be aware of potential biases in the training data that may be reflected in the model's outputs. The model should not be used to generate or propagate harmful content.

## Technical Specifications

- Library: Transformers 4.45.1
- Framework: PyTorch 2.4.1+cu121
- Datasets: 3.0.1
- Tokenizers: 0.20.0
    	
all_results.json (ADDED)

{
    "epoch": 3.0,
    "eval_accuracy": 0.1775901465932144,
    "eval_loss": 6.25395393371582,
    "eval_runtime": 26.5184,
    "eval_samples": 4553,
    "eval_samples_per_second": 171.692,
    "eval_steps_per_second": 21.495,
    "perplexity": 520.0650675835933,
    "total_flos": 5764753863475200.0,
    "train_loss": 6.457221655868903,
    "train_runtime": 1165.3935,
    "train_samples": 40997,
    "train_samples_per_second": 105.536,
    "train_steps_per_second": 13.193
}
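Several of these fields are redundant, which makes them easy to sanity-check; note that the `train_*` throughput counts every sample in every epoch, not just one pass. A small sketch using only the values above:

```python
# Eval throughput: samples divided by wall-clock runtime.
eval_sps = 4553 / 26.5184
print(f"{eval_sps:.3f}")   # 171.692, matching eval_samples_per_second

# Train throughput counts all samples across the 3 epochs.
train_sps = 3.0 * 40997 / 1165.3935
print(f"{train_sps:.3f}")  # 105.536, matching train_samples_per_second
```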
    	
config.json (ADDED)

{
  "_name_or_path": "agentlans/pythia-14m-finewebedu-sentences",
  "architectures": [
    "GPTNeoXForCausalLM"
  ],
  "attention_bias": true,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classifier_dropout": 0.1,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_dropout": 0.0,
  "hidden_size": 128,
  "initializer_range": 0.02,
  "intermediate_size": 512,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 2048,
  "model_type": "gpt_neox",
  "num_attention_heads": 4,
  "num_hidden_layers": 6,
  "partial_rotary_factor": 0.25,
  "rope_scaling": null,
  "rope_theta": 10000,
  "rotary_emb_base": 10000,
  "rotary_pct": 0.25,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.45.1",
  "use_cache": true,
  "use_parallel_residual": true,
  "vocab_size": 50304
}
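These dimensions let us estimate the parameter count and confirm the "14M" in the model name. A back-of-the-envelope sketch, assuming the standard GPT-NeoX layout (untied input/output embeddings per `tie_word_embeddings: false`, a fused QKV projection, and biases on attention and MLP projections per `attention_bias: true`) rather than reading the checkpoint:

```python
hidden = 128    # hidden_size
inter = 512     # intermediate_size
layers = 6      # num_hidden_layers
vocab = 50304   # vocab_size

# Token embeddings; the LM head is a separate matrix (no bias) since
# tie_word_embeddings is false.
embed_in = vocab * hidden
embed_out = vocab * hidden

# Per transformer layer (weights + biases):
qkv = hidden * 3 * hidden + 3 * hidden                       # fused query_key_value
attn_out = hidden * hidden + hidden                          # attention output proj
mlp = (hidden * inter + inter) + (inter * hidden + hidden)   # up- and down-proj
norms = 2 * 2 * hidden                                       # two LayerNorms (gain + bias)
per_layer = qkv + attn_out + mlp + norms

final_norm = 2 * hidden
total = embed_in + embed_out + layers * per_layer + final_norm
print(total)  # 14067712 -> about 14M parameters
```

At 4 bytes per float32 weight this comes to about 56.3 MB, consistent with the 56,279,344-byte model.safetensors below (the small remainder is the safetensors header).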
    	
eval_results.json (ADDED)

{
    "epoch": 3.0,
    "eval_accuracy": 0.1775901465932144,
    "eval_loss": 6.25395393371582,
    "eval_runtime": 26.5184,
    "eval_samples": 4553,
    "eval_samples_per_second": 171.692,
    "eval_steps_per_second": 21.495,
    "perplexity": 520.0650675835933
}
    	
generation_config.json (ADDED)

{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.45.1"
}
    	
model.safetensors (ADDED)

version https://git-lfs.github.com/spec/v1
oid sha256:d57815e9690ab850851289477ac83e03a76c877aea0f87ed07a16c3f13da5507
size 56279344
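The pointer's file size alone gives a quick cross-check on the model's scale: config.json stores weights as float32 (4 bytes per value), so dividing out the bytes per parameter lands right at the "14M" in the model name (the file also contains a small safetensors header, so this slightly overcounts):

```python
size_bytes = 56279344  # from the LFS pointer above
bytes_per_param = 4    # torch_dtype is float32 in config.json

approx_params = size_bytes // bytes_per_param
print(approx_params)  # 14069836 -> about 14M parameters
```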
    	
special_tokens_map.json (ADDED)

{
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
    	
tokenizer.json (ADDED)

The diff for this file is too large to render; see the raw diff.
    	
tokenizer_config.json (ADDED)

{
  "add_bos_token": false,
  "add_eos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<|padding|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50254": {
      "content": "                        ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50255": {
      "content": "                       ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50256": {
      "content": "                      ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50257": {
      "content": "                     ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50258": {
      "content": "                    ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50259": {
      "content": "                   ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50260": {
      "content": "                  ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50261": {
      "content": "                 ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50262": {
      "content": "                ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50263": {
      "content": "               ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50264": {
      "content": "              ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50265": {
      "content": "             ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50266": {
      "content": "            ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50267": {
      "content": "           ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50268": {
      "content": "          ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50269": {
      "content": "         ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50270": {
      "content": "        ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50271": {
      "content": "       ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50272": {
      "content": "      ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50273": {
      "content": "     ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50274": {
      "content": "    ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50275": {
      "content": "   ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "50276": {
      "content": "  ",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
         | 
| 209 | 
            +
              "eos_token": "<|endoftext|>",
         | 
| 210 | 
            +
              "model_max_length": 1000000000000000019884624838656,
         | 
| 211 | 
            +
              "pad_token": null,
         | 
| 212 | 
            +
              "tokenizer_class": "GPTNeoXTokenizer",
         | 
| 213 | 
            +
              "unk_token": "<|endoftext|>"
         | 
| 214 | 
            +
            }
         | 
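A minimal sketch of what these tokenizer entries encode: vocab ids 50270 through 50276 map to literal runs of spaces of decreasing length (8 down to 2), a GPT-NeoX tokenizer convention that lets common indentation levels compress to a single token. The mapping below is reconstructed from the `content` fields above, not taken from any library API.

```python
# Reconstructed from the tokenizer_config.json entries above:
# id 50270 -> 8 spaces, id 50271 -> 7 spaces, ..., id 50276 -> 2 spaces.
space_tokens = {tid: " " * (50278 - tid) for tid in range(50270, 50277)}

# Each entry is a plain (non-special) token, so a run of 8 leading spaces
# in code can be emitted as the single token 50270.
```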
    	
        train_results.json
    ADDED
    
@@ -0,0 +1,9 @@
{
    "epoch": 3.0,
    "total_flos": 5764753863475200.0,
    "train_loss": 6.457221655868903,
    "train_runtime": 1165.3935,
    "train_samples": 40997,
    "train_samples_per_second": 105.536,
    "train_steps_per_second": 13.193
}
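A quick consistency check on these numbers (a sketch, not part of the repo): the reported throughput times the runtime should recover the total number of samples processed, i.e. `train_samples` times the 3 epochs.

```python
# train_samples_per_second * train_runtime ~= train_samples * num_epochs
samples_seen = 105.536 * 1165.3935  # values from train_results.json above
# 40997 samples * 3 epochs = 122991 samples processed in total
```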
    	
        trainer_state.json
    ADDED
    
@@ -0,0 +1,252 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 3.0,
  "eval_steps": 500,
  "global_step": 15375,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.0975609756097561,
      "grad_norm": 19.442411422729492,
      "learning_rate": 4.8373983739837406e-05,
      "loss": 6.7559,
      "step": 500
    },
    {
      "epoch": 0.1951219512195122,
      "grad_norm": 22.672739028930664,
      "learning_rate": 4.6747967479674795e-05,
      "loss": 6.6932,
      "step": 1000
    },
    {
      "epoch": 0.2926829268292683,
      "grad_norm": 21.795516967773438,
      "learning_rate": 4.51219512195122e-05,
      "loss": 6.6652,
      "step": 1500
    },
    {
      "epoch": 0.3902439024390244,
      "grad_norm": 19.84381866455078,
      "learning_rate": 4.3495934959349595e-05,
      "loss": 6.6335,
      "step": 2000
    },
    {
      "epoch": 0.4878048780487805,
      "grad_norm": 14.22912883758545,
      "learning_rate": 4.186991869918699e-05,
      "loss": 6.6248,
      "step": 2500
    },
    {
      "epoch": 0.5853658536585366,
      "grad_norm": 14.391462326049805,
      "learning_rate": 4.0243902439024395e-05,
      "loss": 6.5929,
      "step": 3000
    },
    {
      "epoch": 0.6829268292682927,
      "grad_norm": 19.81720733642578,
      "learning_rate": 3.861788617886179e-05,
      "loss": 6.5589,
      "step": 3500
    },
    {
      "epoch": 0.7804878048780488,
      "grad_norm": 15.33761978149414,
      "learning_rate": 3.699186991869919e-05,
      "loss": 6.5327,
      "step": 4000
    },
    {
      "epoch": 0.8780487804878049,
      "grad_norm": 14.190281867980957,
      "learning_rate": 3.5365853658536584e-05,
      "loss": 6.5175,
      "step": 4500
    },
    {
      "epoch": 0.975609756097561,
      "grad_norm": 16.57828712463379,
      "learning_rate": 3.373983739837399e-05,
      "loss": 6.5137,
      "step": 5000
    },
    {
      "epoch": 1.0731707317073171,
      "grad_norm": 16.75761604309082,
      "learning_rate": 3.2113821138211384e-05,
      "loss": 6.495,
      "step": 5500
    },
    {
      "epoch": 1.170731707317073,
      "grad_norm": 18.840726852416992,
      "learning_rate": 3.048780487804878e-05,
      "loss": 6.4757,
      "step": 6000
    },
    {
      "epoch": 1.2682926829268293,
      "grad_norm": 17.630483627319336,
      "learning_rate": 2.886178861788618e-05,
      "loss": 6.4633,
      "step": 6500
    },
    {
      "epoch": 1.3658536585365852,
      "grad_norm": 16.721818923950195,
      "learning_rate": 2.7235772357723577e-05,
      "loss": 6.4462,
      "step": 7000
    },
    {
      "epoch": 1.4634146341463414,
      "grad_norm": 14.650636672973633,
      "learning_rate": 2.5609756097560977e-05,
      "loss": 6.4404,
      "step": 7500
    },
    {
      "epoch": 1.5609756097560976,
      "grad_norm": 13.825970649719238,
      "learning_rate": 2.3983739837398377e-05,
      "loss": 6.4326,
      "step": 8000
    },
    {
      "epoch": 1.6585365853658538,
      "grad_norm": 11.85326862335205,
      "learning_rate": 2.2357723577235773e-05,
      "loss": 6.4239,
      "step": 8500
    },
    {
      "epoch": 1.7560975609756098,
      "grad_norm": 13.92196273803711,
      "learning_rate": 2.073170731707317e-05,
      "loss": 6.4098,
      "step": 9000
    },
    {
      "epoch": 1.8536585365853657,
      "grad_norm": 12.077308654785156,
      "learning_rate": 1.9105691056910573e-05,
      "loss": 6.3987,
      "step": 9500
    },
    {
      "epoch": 1.951219512195122,
      "grad_norm": 12.406614303588867,
      "learning_rate": 1.747967479674797e-05,
      "loss": 6.3957,
      "step": 10000
    },
    {
      "epoch": 2.048780487804878,
      "grad_norm": 14.001736640930176,
      "learning_rate": 1.5853658536585366e-05,
      "loss": 6.3752,
      "step": 10500
    },
    {
      "epoch": 2.1463414634146343,
      "grad_norm": 12.691810607910156,
      "learning_rate": 1.4227642276422764e-05,
      "loss": 6.3566,
      "step": 11000
    },
    {
      "epoch": 2.2439024390243905,
      "grad_norm": 10.062420845031738,
      "learning_rate": 1.2601626016260162e-05,
      "loss": 6.3492,
      "step": 11500
    },
    {
      "epoch": 2.341463414634146,
      "grad_norm": 11.78906536102295,
      "learning_rate": 1.0975609756097562e-05,
      "loss": 6.3447,
      "step": 12000
    },
    {
      "epoch": 2.4390243902439024,
      "grad_norm": 13.368131637573242,
      "learning_rate": 9.34959349593496e-06,
      "loss": 6.339,
      "step": 12500
    },
    {
      "epoch": 2.5365853658536586,
      "grad_norm": 12.125652313232422,
      "learning_rate": 7.723577235772358e-06,
      "loss": 6.3305,
      "step": 13000
    },
    {
      "epoch": 2.6341463414634148,
      "grad_norm": 13.748695373535156,
      "learning_rate": 6.0975609756097564e-06,
      "loss": 6.3205,
      "step": 13500
    },
    {
      "epoch": 2.7317073170731705,
      "grad_norm": 13.787367820739746,
      "learning_rate": 4.471544715447155e-06,
      "loss": 6.3196,
      "step": 14000
    },
    {
      "epoch": 2.8292682926829267,
      "grad_norm": 15.013029098510742,
      "learning_rate": 2.8455284552845528e-06,
      "loss": 6.3116,
      "step": 14500
    },
    {
      "epoch": 2.926829268292683,
      "grad_norm": 15.244904518127441,
      "learning_rate": 1.2195121951219514e-06,
      "loss": 6.3107,
      "step": 15000
    },
    {
      "epoch": 3.0,
      "step": 15375,
      "total_flos": 5764753863475200.0,
      "train_loss": 6.457221655868903,
      "train_runtime": 1165.3935,
      "train_samples_per_second": 105.536,
      "train_steps_per_second": 13.193
    }
  ],
  "logging_steps": 500,
  "max_steps": 15375,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 500,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 5764753863475200.0,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
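The `log_history` above can be consumed directly with the standard `json` module. A minimal sketch (not part of the repo) that parses a `trainer_state.json`-shaped document and checks that the logged learning rates follow a linear decay from 5e-5 to 0 over `max_steps`; only two log entries are inlined here for brevity, with values copied from the state file above.

```python
import json

# Two representative entries from the log_history above, inlined so the
# sketch is self-contained instead of reading trainer_state.json from disk.
state = json.loads("""
{
  "max_steps": 15375,
  "log_history": [
    {"step": 500,   "loss": 6.7559, "learning_rate": 4.8373983739837406e-05},
    {"step": 15000, "loss": 6.3107, "learning_rate": 1.2195121951219514e-06}
  ]
}
""")

# The logged rates match a linear schedule decaying from 5e-5 to 0.
for entry in state["log_history"]:
    expected = 5e-5 * (1 - entry["step"] / state["max_steps"])
    assert abs(entry["learning_rate"] - expected) < 1e-10
```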