bert-large-uncased fine-tuned on the RTE dataset using torchdistill and Google Colab.
The hyperparameters match those in Hugging Face's example and/or the BERT paper, and the training configuration (including hyperparameters) is available here.
I submitted prediction files to the GLUE leaderboard, and the overall GLUE score was 80.2.
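
A minimal sketch of running this checkpoint for inference with Hugging Face Transformers. The repository ID is the one shown on this page. The label order (0 = entailment, 1 = not_entailment) is an assumption based on the GLUE RTE convention and should be checked against `model.config.id2label`. The helper names `label_from_logits` and `predict` are illustrative and not part of torchdistill.

```python
# Assumed GLUE RTE label order (0 = entailment, 1 = not_entailment);
# verify against model.config.id2label before relying on it.
RTE_LABELS = ("entailment", "not_entailment")
MODEL_ID = "yoshitomo-matsubara/bert-large-uncased-rte"


def label_from_logits(logits):
    """Return the RTE label name for one row of two logits."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return RTE_LABELS[best]


def predict(premise, hypothesis, model_id=MODEL_ID):
    """Classify one premise/hypothesis pair (downloads the checkpoint on first use)."""
    # Imported lazily so label_from_logits works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # RTE is a sentence-pair task, so both texts go into a single encoded input.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the RTE dataset.
    print(predict("A man is playing a guitar on stage.",
                  "A person is performing music."))
```

Running the script requires `torch` and `transformers` plus network access to download the weights. Keeping the label lookup separate from the model call makes it possible to swap in the mapping from the model's own config if it differs.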

Yoshitomo Matsubara: "torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP" at EMNLP 2023 Workshop for Natural Language Processing Open Source Software (NLP-OSS)

[Paper] [OpenReview] [Preprint]

@inproceedings{matsubara2023torchdistill,
  title={{torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP}},
  author={Matsubara, Yoshitomo},
  booktitle={Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)},
  publisher={Empirical Methods in Natural Language Processing},
  pages={153--164},
  year={2023}
}

Model size: 0.3B params (Safetensors; tensor types: I64, F32)

Collection: torchdistill: Reproducing fine-tuned BERT models (18 items, updated Dec 5, 2024)
GLUE leaderboard: https://gluebenchmark.com/leaderboard/
Code: https://github.com/yoshitomo-matsubara/torchdistill?tab=readme-ov-file#glue

Paper: arXiv:2310.17644 (published Oct 26, 2023)