---
language: en
tags:
- roberta-base
- roberta-base-epoch_3
license: mit
datasets:
- wikipedia
- bookcorpus
---

# RoBERTa, Intermediate Checkpoint - Epoch 3

This model is part of our reimplementation of the [RoBERTa model](https://arxiv.org/abs/1907.11692),
trained on Wikipedia and the Book Corpus only.
We trained this model for almost 100K steps, corresponding to 83 epochs.
We release all 84 checkpoints (including the randomly initialized weights from before training)
so that the training dynamics of such models, and other possible use cases, can be studied.
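
For example, the released checkpoints can be compared against each other to see how the model's predictions change over training. The snippet below is a minimal sketch, assuming the checkpoints follow the `yanaiela/roberta-base-epoch_N` naming scheme used elsewhere in this card:

```python
from transformers import pipeline

prompt = "Hello, I'm the <mask> RoBERTa-base language model"

# Compare an early checkpoint (this one, epoch 3) with the final one (epoch 83).
for epoch in [3, 83]:
    unmasker = pipeline(
        "fill-mask",
        model=f"yanaiela/roberta-base-epoch_{epoch}",
        device=-1,
        top_k=5,
    )
    print(epoch, [pred["token_str"] for pred in unmasker(prompt)])
```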

These models were trained as part of a work that studies how simple statistics of the data,
such as co-occurrences, affect model predictions, as described in the paper
[Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions](https://arxiv.org/abs/2207.14251).

This is RoBERTa-base epoch_3.

## Model Description

This model was captured during a reproduction of
[RoBERTa-base](https://huggingface.co/roberta-base), for English: it
is a Transformers model pretrained on a large corpus of English data with the
Masked Language Modelling (MLM) objective.

The intended uses, limitations, training data and training procedure for the fully trained model are similar
to those of [RoBERTa-base](https://huggingface.co/roberta-base). There are two major
differences from the original model:

* We trained our model for 100K steps, instead of 500K.
* We only use Wikipedia and the Book Corpus, as these corpora are publicly available.

### How to use

Using code from
[RoBERTa-base](https://huggingface.co/roberta-base), here is an example based on
PyTorch:

```python
from transformers import pipeline

# Load this checkpoint (epoch 3) as a fill-mask pipeline on CPU and query it.
model = pipeline("fill-mask", model='yanaiela/roberta-base-epoch_3', device=-1, top_k=10)
model("Hello, I'm the <mask> RoBERTa-base language model")
```
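
If you prefer to load the checkpoint directly instead of through a pipeline, a minimal sketch along these lines should also work (this is standard `transformers` masked-LM usage, shown only as an illustration):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "yanaiela/roberta-base-epoch_3"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForMaskedLM.from_pretrained(repo)

text = "Hello, I'm the <mask> RoBERTa-base language model"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Rank the vocabulary at the masked position and print the top 10 tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(10).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```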

## Citation info

```bibtex
@article{2207.14251,
  Author = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Amir Feder and Abhilasha Ravichander and Marius Mosbach and Yonatan Belinkov and Hinrich Schütze and Yoav Goldberg},
  Title = {Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions},
  Year = {2022},
  Eprint = {arXiv:2207.14251},
}
```