---
language: en
tags:
- roberta-base
- roberta-base-epoch_3
license: mit
datasets:
- wikipedia
- bookcorpus
---

# RoBERTa, Intermediate Checkpoint - Epoch 3

This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
trained on Wikipedia and the Book Corpus only.
We trained this model for almost 100K steps, corresponding to 83 epochs.
We release all 84 checkpoints (including the randomly initialized weights from before training)
so that the training dynamics of such models, and other possible use cases, can be studied.
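
For example, the released checkpoints can be compared against each other to see how the model's predictions change over training. The snippet below is a minimal sketch, assuming the checkpoints follow the `yanaiela/roberta-base-epoch_N` naming scheme used elsewhere in this card:

```python
from transformers import pipeline

prompt = "Hello, I'm the <mask> RoBERTa-base language model"

# Compare an early checkpoint (this one, epoch 3) with the final one (epoch 83).
for epoch in [3, 83]:
    unmasker = pipeline(
        "fill-mask",
        model=f"yanaiela/roberta-base-epoch_{epoch}",
        device=-1,
        top_k=5,
    )
    print(epoch, [pred["token_str"] for pred in unmasker(prompt)])
```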

These models were trained as part of a work that studies how simple statistics of the data,
such as co-occurrences, affect model predictions, as described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).

This is RoBERTa-base epoch_3.

## Model Description

This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base), for English: it
is a Transformers model pretrained on a large corpus of English data with the
Masked Language Modelling (MLM) objective.

The intended uses, limitations, training data and training procedure for the fully trained model are similar
to those of [RoBERTa-base](https://huggingface.co/roberta-base). There are two major
differences from the original model:

* We trained our model for 100K steps, instead of 500K.
* We only use Wikipedia and the Book Corpus, as these corpora are publicly available.

### How to use

Using code from
[RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on
PyTorch:

```python
from transformers import pipeline

# Load this checkpoint (epoch 3) as a fill-mask pipeline on CPU and query it.
model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_3', device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")
```
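
If you prefer to load the checkpoint directly instead of through a pipeline, a minimal sketch along these lines should also work (this is standard `transformers` masked-LM usage, shown only as an illustration):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "yanaiela/roberta-base-epoch_3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForMaskedLM.from_pretrained(repo)

text = "Hello, I'm the <mask> RoBERTa-base language model"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Rank the vocabulary at the masked position and print the top 10 tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(10).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```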

## Citation info

```bibtex
@article{2207.14251,
  Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
  Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
  Year = {2022},
  Eprint = {arXiv:2207.14251},
}
```