Commit 29fb934
Parent(s): 5e25e0d
Update README.md

README.md CHANGED

@@ -17,7 +17,7 @@ widget:
 pipeline_tag: fill-mask
 ---

-# DistilRoBERTa-base-ca
+# DistilRoBERTa-base-ca-v2

 ## Overview
 - **Architecture:** DistilRoBERTa-base
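The front matter above keeps `pipeline_tag: fill-mask`, so the renamed DistilRoBERTa-base-ca-v2 checkpoint can be exercised with the standard Hugging Face fill-mask pipeline. The sketch below is illustrative only: the repository id and the Catalan prompt are assumptions, not details taken from this card.

```python
# Minimal fill-mask sketch for the distilled Catalan model.
# The repository id below is an assumption; substitute the model's actual Hub id.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="projecte-aina/distilroberta-base-ca-v2")

# RoBERTa-style tokenizers use "<mask>" as the mask token.
for prediction in unmasker("Em dic <mask>."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```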
@@ -25,6 +25,31 @@ pipeline_tag: fill-mask
 - **Task:** Fill-Mask
 - **Data:** Crawling

+## Table of Contents
+<details>
+<summary>Click to expand</summary>
+
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-use)
+- [How to use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
+- [Training](#training)
+- [Training data](#training-data)
+- [Training procedure](#training-procedure)
+- [Evaluation](#evaluation)
+- [CLUB benchmark](#club-benchmark)
+- [Evaluation results](#evaluation-results)
+- [Licensing Information](#licensing-information)
+- [Additional information](#additional-information)
+- [Author](#author)
+- [Contact information](#contact-information)
+- [Copyright](#copyright)
+- [Licensing information](#licensing-information)
+- [Funding](#funding)
+- [Citing information](#citing-information)
+- [Disclaimer](#disclaimer)
+
+</details>

 ## Model description

@@ -77,14 +102,6 @@ At the time of submission, no measures have been taken to estimate the bias embe

 ## Training

-### Training procedure
-
-This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
-
 ### Training data

 The training corpus consists of several corpora gathered from web crawling and public corpora, as shown in the table below:
@@ -106,9 +123,17 @@ The training corpus consists of several corpora gathered from web crawling and p
 | Catalan Open Subtitles | 0.02 |
 | Tweets | 0.02 |

+### Training procedure
+
+This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
+
+It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
+
 ## Evaluation

-###
+### CLUB benchmark

 This model has been fine-tuned on the downstream tasks of the [Catalan Language Understanding Evaluation benchmark (CLUB)](https://club.aina.bsc.es/), which includes the following datasets:

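The Training procedure paragraphs added above describe knowledge distillation only at a high level. The following is a minimal sketch of the usual teacher-student loss, assuming PyTorch; the temperature, loss weighting, and masked-LM details actually used for this model are not documented in the card and are assumptions here.

```python
# Hedged sketch of a teacher-student (knowledge distillation) objective.
# Not the exact recipe behind this model; values of temperature and alpha are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Combine a soft loss (match the teacher's output distribution) with the
    usual hard masked-LM cross-entropy on the true tokens."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: cross-entropy against the masked-out tokens
    # (positions labeled -100 are ignored, following Hugging Face conventions).
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

Scaling the soft term by the squared temperature keeps its gradient magnitude comparable to the hard term, which is the conventional choice in distillation recipes.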
@@ -128,7 +153,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow

 | Model \ Task |NER (F1)|POS (F1)|STS-ca (Comb.)|TeCla (Acc.)|TEca (Acc.)|CatalanQA (F1/EM)| XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------------------|:-------|:-------|:-------------|:-----------|:----------|:----------------|:------------------------------|
-| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50
+| RoBERTa-base-ca-v2 | **89.29** | **98.96** | **79.07** | **74.26** | **83.14** | **89.50**/**76.63** | **73.64**/**55.42** |
 | DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |

 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.
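The table in this hunk compares the student to its teacher after fine-tuning on the CLUB tasks. As a rough illustration of how such a run starts from the fill-mask checkpoint, here is a hedged sketch for a token-classification task such as NER; the repository id, label count, and training setup are assumptions, not details from the card.

```python
# Hedged sketch: loading the distilled checkpoint with a token-classification head,
# as one would for the CLUB NER task. The Hub id and num_labels are assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "projecte-aina/distilroberta-base-ca-v2"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=9)

# A standard Trainer loop over the NER dataset would then fine-tune the randomly
# initialized classification head together with the distilled encoder.
```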
@@ -137,7 +162,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow

 ### Authors

-
+Language Technologies Unit at Barcelona Supercomputing Center ([langtech@bsc.es](langtech@bsc.es)).

 ### Contact information

@@ -145,7 +170,7 @@ For further information, send an email to [[email protected]]([email protected]).

 ### Copyright

-Copyright by the
+Copyright by the Language Technologies Unit at Barcelona Supercomputing Center.

 ### Licensing information
