More information, including the training manifest and configuration, is available in the [Wav2Vec2-NL repository on Zenodo](http://doi.org/10.5281/zenodo.15550628).

Analyses of Dutch phonetic and lexical features encoded in Wav2Vec2-NL hidden states are reported in the paper [What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training](https://arxiv.org/abs/2506.00981) (Interspeech 2025; see full citation [below](#citation)).

Note: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for an explanation of fine-tuning Wav2Vec2 models on HuggingFace.
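
To use Wav2Vec2-NL for speech recognition along the lines of that blog, you would build a character-level vocabulary from your labeled transcripts and wrap the pretrained encoder in a CTC head before fine-tuning. A minimal sketch, assuming a hypothetical `vocab.json` built from your own data:

```python
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC

# Hypothetical character-level vocabulary built from your labeled transcripts,
# e.g. {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, ...}
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json",
    unk_token="<unk>",
    pad_token="<pad>",
    word_delimiter_token="|",
)

# Pretrained encoder with a freshly initialised CTC head sized to the new
# vocabulary; this model still needs fine-tuning on labeled audio.
model = Wav2Vec2ForCTC.from_pretrained(
    "amsterdamNLP/Wav2Vec2-NL",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
```
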
# Usage
```python
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
model = Wav2Vec2Model.from_pretrained('amsterdamNLP/Wav2Vec2-NL')
```

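As a quick check that the model loads correctly, here is a minimal sketch of extracting frame-level representations, continuing from the snippet above (the silent one-second array is only a placeholder for a real 16 kHz mono recording):

```python
import numpy as np
import torch

# Placeholder input: one second of silence at 16 kHz
waveform = np.zeros(16000, dtype=np.float32)

# feature_extractor and model are the objects loaded in the snippet above
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
print(len(outputs.hidden_states))       # embedding output + one tensor per transformer layer
```
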
# Citation
The _Wav2Vec2-NL_ model was published as part of:
de Heer Kloots, M., Mohebbi, H., Pouw, C., Shen, G., Zuidema, W., Bentum, M. (2025). What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training. _Proc. INTERSPEECH 2025_. https://doi.org/10.48550/arXiv.2506.00981

BibTeX entry:
```bibtex