Update README.md
For detailed documentation, look here: https://github.com/AI4Bharat/indic-bart/
# Pre-training corpus
We used the <a href="https://indicnlp.ai4bharat.org/corpora/">IndicCorp</a> data spanning 12 languages with 452 million sentences (9 billion tokens). The model was trained using the text-infilling objective used in mBART.
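To make the objective concrete: text infilling corrupts the input by replacing contiguous spans of tokens with a single mask token, and the model learns to reconstruct the original sequence. Below is a minimal pure-Python sketch of the corruption step only; the mask probability, the Poisson span-length parameter, and the `<mask>` token string are illustrative choices, not the exact training configuration.

```python
import math
import random


def poisson_sample(rng, lam):
    """Sample from a Poisson distribution (Knuth's algorithm)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def text_infill(tokens, mask_token="<mask>", mask_prob=0.35, lam=3.5, seed=0):
    """Corrupt a token sequence in the style of mBART text infilling:
    each selected span (length drawn from Poisson(lam)) is replaced by a
    SINGLE mask token, so the model must also predict span lengths."""
    rng = random.Random(seed)
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = poisson_sample(rng, lam)
            out.append(mask_token)
            # A span of length 0 inserts a bare mask without consuming input.
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = "I am feeling very happy today because the model works well".split()
print(text_infill(tokens))
```

The decoder is then trained to emit the original, uncorrupted sentence from this masked input.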
# Usage:
# Fine-tuning on a downstream task
1. If you wish to fine-tune this model, you can do so using the toolkit <a href="https://github.com/prajdabre/yanmtt">YANMTT</a> following the instructions <a href="https://github.com/AI4Bharat/indic-bart">here</a>.
2. (Untested) Alternatively, you may use the official huggingface scripts for <a href="https://github.com/huggingface/transformers/tree/master/examples/pytorch/translation">translation</a> and <a href="https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization">summarization</a>.
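The official scripts in option 2 are driven entirely by command-line flags. A hypothetical invocation might look like the following; the data file, language codes, output path, and hyperparameters are placeholders, and (as the "Untested" caveat above suggests) IndicBART's custom language tokens may need additional handling before this works out of the box.

```shell
# Placeholder invocation of the Hugging Face translation example script;
# adjust paths, language codes, and hyperparameters for your setup.
python examples/pytorch/translation/run_translation.py \
  --model_name_or_path ai4bharat/IndicBART \
  --do_train \
  --train_file train.json \
  --source_lang hi \
  --target_lang en \
  --output_dir ./indicbart-finetuned \
  --per_device_train_batch_size 8 \
  --num_train_epochs 3 \
  --overwrite_output_dir
```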
# Contributors