Commit 29fb934
Parent(s): 5e25e0d
Update README.md

README.md CHANGED

@@ -17,7 +17,7 @@ widget:
 pipeline_tag: fill-mask
 ---

-# DistilRoBERTa-base-ca
+# DistilRoBERTa-base-ca-v2

 ## Overview
 - **Architecture:** DistilRoBERTa-base
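The front matter above keeps `pipeline_tag: fill-mask`, so the renamed DistilRoBERTa-base-ca-v2 checkpoint can be exercised with the standard Hugging Face fill-mask pipeline. The sketch below is illustrative only: the repository id and the Catalan prompt are assumptions, not details taken from this card.

```python
# Minimal fill-mask sketch for the distilled Catalan model.
# The repository id below is an assumption; substitute the model's actual Hub id.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="projecte-aina/distilroberta-base-ca-v2")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in unmasker("Em dic <mask>."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```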
@@ -25,6 +25,31 @@ pipeline_tag: fill-mask
 - **Task:** Fill-Mask
 - **Data:** Crawling

+## Table of Contents
+<details>
+<summary>Click to expand</summary>
+
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-use)
+- [How to use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
+- [Training](#training)
+- [Training data](#training-data)
+- [Training procedure](#training-procedure)
+- [Evaluation](#evaluation)
+- [CLUB benchmark](#club-benchmark)
+- [Evaluation results](#evaluation-results)
+- [Licensing Information](#licensing-information)
+- [Additional information](#additional-information)
+- [Author](#author)
+- [Contact information](#contact-information)
+- [Copyright](#copyright)
+- [Licensing information](#licensing-information)
+- [Funding](#funding)
+- [Citing information](#citing-information)
+- [Disclaimer](#disclaimer)
+
+</details>

 ## Model description

@@ -77,14 +102,6 @@ At the time of submission, no measures have been taken to estimate the bias embe

 ## Training

-### Training procedure
-
-This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
-
 ### Training data

 The training corpus consists of several corpora gathered from web crawling and public corpora, as shown in the table below:
@@ -106,9 +123,17 @@ The training corpus consists of several corpora gathered from web crawling and p
 | Catalan Open Subtitles | 0.02 |
 | Tweets | 0.02 |

+### Training procedure
+
+This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
+
+It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
+
 ## Evaluation

-###
+### CLUB benchmark

 This model has been fine-tuned on the downstream tasks of the [Catalan Language Understanding Evaluation benchmark (CLUB)](https://club.aina.bsc.es/), which includes the following datasets:

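The Training procedure paragraphs added above describe knowledge distillation only at a high level. The following is a minimal sketch of the usual teacher-student loss, assuming PyTorch; the temperature, loss weighting, and masked-LM details actually used for this model are not documented in the card and are assumptions here.

```python
# Hedged sketch of a teacher-student (knowledge distillation) objective.
# Not the exact recipe behind this model; values of temperature and alpha are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Combine a soft loss (match the teacher's output distribution) with the
    usual hard masked-LM cross-entropy on the true tokens."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: cross-entropy against the masked-out tokens
    # (positions labeled -100 are ignored, following Hugging Face conventions).
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

Scaling the soft term by the squared temperature keeps its gradient magnitude comparable to the hard term, which is the conventional choice in distillation recipes.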
@@ -128,7 +153,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow

 | Model \ Task |NER (F1)|POS (F1)|STS-ca (Comb.)|TeCla (Acc.)|TEca (Acc.)|CatalanQA (F1/EM)| XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------------------|:-------|:-------|:-------------|:-----------|:----------|:----------------|:------------------------------|
-| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50
+| RoBERTa-base-ca-v2 | **89.29** | **98.96** | **79.07** | **74.26** | **83.14** | **89.50**/**76.63** | **73.64**/**55.42** |
 | DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |

 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.
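The table in this hunk compares the student to its teacher after fine-tuning on the CLUB tasks. As a rough illustration of how such a run starts from the fill-mask checkpoint, here is a hedged sketch for a token-classification task such as NER; the repository id, label count, and training setup are assumptions, not details from the card.

```python
# Hedged sketch: loading the distilled checkpoint with a token-classification head,
# as one would for the CLUB NER task. The Hub id and num_labels are assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "projecte-aina/distilroberta-base-ca-v2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)

# A standard Trainer loop over the NER dataset would then fine-tune the randomly
# initialized classification head together with the distilled encoder.
```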
@@ -137,7 +162,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow

 ### Authors

-
+Language Technologies Unit at Barcelona Supercomputing Center ([langtech@bsc.es](langtech@bsc.es)).

 ### Contact information

@@ -145,7 +170,7 @@ For further information, send an email to [[email protected]]([email protected]).

 ### Copyright

-Copyright by the
+Copyright by the Language Technologies Unit at Barcelona Supercomputing Center.

 ### Licensing information
