Commit · a5f7239
1 Parent(s): 5232dfb

Fix README
    	
README.md CHANGED
    
    @@ -3,16 +3,17 @@ language: en
     tags:
     - exbert
     - multiberts
    -- multiberts-seed-
    +- multiberts-seed-0
     license: apache-2.0
     datasets:
     - bookcorpus
     - wikipedia
     ---
    -# MultiBERTs Seed
    -Seed
    +# MultiBERTs Seed 0 Checkpoint 900k (uncased)
    +Seed 0 intermediate checkpoint 900k MultiBERTs (pretrained BERT) model on English language using a masked language modeling (MLM) objective. It was introduced in
     [this paper](https://arxiv.org/pdf/2106.16163.pdf) and first released in
    -[this repository](https://github.com/google-research/language/tree/master/language/multiberts). This
    +[this repository](https://github.com/google-research/language/tree/master/language/multiberts). This is an intermediate checkpoint.
    +The final checkpoint can be found at [multiberts-seed-0](https://hf.co/multberts-seed-0). This model is uncased: it does not make a difference
     between english and English.
     
     Disclaimer: The team releasing MultiBERTs did not write a model card for this model so this model card has been written by [gchhablani](https://hf.co/gchhablani).
    @@ -47,7 +48,7 @@ Here is how to use this model to get the features of a given text in PyTorch:
     ```python
     from transformers import BertTokenizer, BertModel
     tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    -model = BertModel.from_pretrained("multiberts-seed-
    +model = BertModel.from_pretrained("multiberts-seed-0-900k")
     text = "Replace me by any text you'd like."
     encoded_input = tokenizer(text, return_tensors='pt')
     output = model(**encoded_input)
    @@ -81,7 +82,7 @@ The details of the masking procedure for each sentence are the following:
     - In the 10% remaining cases, the masked tokens are left as is.
     
     ### Pretraining
    -The model was trained on 16 Cloud TPU v2 chips for two million steps with a batch size
    +The full model was trained on 16 Cloud TPU v2 chips for two million steps with a batch size
     of 256. The sequence length was set to 512 throughout. The optimizer
     used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
     learning rate warmup for 10,000 steps and linear decay of the learning rate after.
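
The Pretraining paragraph in the diff describes a learning rate schedule: warmup for 10,000 steps, then linear decay, with a peak rate of 1e-4 over two million steps. A minimal sketch of that schedule follows, to show where the 900k checkpoint falls on it. The function name `learning_rate` is hypothetical, and the sketch assumes the rate starts at 0, warms up linearly, and decays to exactly 0 at the final step. The README states none of these three details.

```python
PEAK_LR = 1e-4            # learning rate stated in the README
WARMUP_STEPS = 10_000     # warmup length stated in the README
TOTAL_STEPS = 2_000_000   # "two million steps" stated in the README


def learning_rate(step: int) -> float:
    """Hypothetical helper: linear warmup followed by linear decay.

    Assumption: the rate starts at 0, reaches PEAK_LR at WARMUP_STEPS,
    and decays linearly to 0 at TOTAL_STEPS. The README does not give
    the start or end values, so treat these as illustrative.
    """
    if step < 0 or step > TOTAL_STEPS:
        raise ValueError("step must be between 0 and TOTAL_STEPS")
    if step < WARMUP_STEPS:
        # Warmup phase: scale the peak rate by the fraction of warmup done.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: scale the peak rate by the fraction of steps remaining.
    remaining = TOTAL_STEPS - step
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 900_000, 2_000_000):
        print(f"step {step:>9,}: lr = {learning_rate(step):.3e}")
```

Under these assumptions, the intermediate checkpoint at step 900,000 sits a little under halfway through the decay phase, so its learning rate is still a bit over half of the peak value.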
