More information, including the training manifest and configuration, is available in the [Wav2Vec2-NL repository on Zenodo](http://doi.org/10.5281/zenodo.15550628).

Analyses of Dutch phonetic and lexical features encoded in Wav2Vec2-NL hidden states are reported in the paper [What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training](https://arxiv.org/abs/2506.00981) (Interspeech 2025; see full citation [below](#citation)).

Note: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for an explanation of fine-tuning Wav2Vec2 models on HuggingFace.
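
To use Wav2Vec2-NL for speech recognition along the lines of that blog, you would build a character-level vocabulary from your labeled transcripts and wrap the pretrained encoder in a CTC head before fine-tuning. A minimal sketch, assuming a hypothetical `vocab.json` built from your own data:

```python
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC

# Hypothetical character-level vocabulary built from your labeled transcripts,
# e.g. {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, ...}
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",
    unk_token="<unk>",
    pad_token="<pad>",
    word_delimiter_token="|",
)

# Pretrained encoder with a freshly initialised CTC head sized to the new
# vocabulary; this model still needs fine-tuning on labeled audio.
model = Wav2Vec2ForCTC.from_pretrained(
    "amsterdamNLP/Wav2Vec2-NL",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
```
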
# Usage
```python
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
model = Wav2Vec2Model.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
```

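As a quick check that the model loads correctly, here is a minimal sketch of extracting frame-level representations, continuing from the snippet above (the silent one-second array is only a placeholder for a real 16 kHz mono recording):

```python
import numpy as np
import torch

# Placeholder input: one second of silence at 16 kHz
waveform = np.zeros(16000, dtype=np.float32)

# feature_extractor and model are the objects loaded in the snippet above
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
print(len(outputs.hidden_states))       # embedding output + one tensor per transformer layer
```
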
# Citation
The _Wav2Vec2-NL_ model was published as part of:
de Heer Kloots, M., Mohebbi, H., Pouw, C., Shen, G., Zuidema, W., Bentum, M. (2025). What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training. _Proc. INTERSPEECH 2025_. https://doi.org/10.48550/arXiv.2506.00981

BibTeX entry:
```bibtex