Commit 62dd410 · 1 Parent(s): 3228b61

Update README.md
    	
README.md CHANGED
    
    | @@ -1,181 +0,0 @@ | |
| 1 | 
            -
            ---
         | 
| 2 | 
            -
            language:
         | 
| 3 | 
            -
            - sv-SE
         | 
| 4 | 
            -
            license: apache-2.0
         | 
| 5 | 
            -
            tags:
         | 
| 6 | 
            -
            - automatic-speech-recognition
         | 
| 7 | 
            -
            - mozilla-foundation/common_voice_7_0
         | 
| 8 | 
            -
            - generated_from_trainer
         | 
| 9 | 
            -
            - sv
         | 
| 10 | 
            -
            - robust-speech-event
         | 
| 11 | 
            -
            - model_for_talk
         | 
| 12 | 
            -
            datasets:
         | 
| 13 | 
            -
            - mozilla-foundation/common_voice_7_0
         | 
| 14 | 
            -
            model-index:
         | 
| 15 | 
            -
            - name: XLS-R-300M - Swedish
         | 
| 16 | 
            -
              results:
         | 
| 17 | 
            -
              - task: 
         | 
| 18 | 
            -
                  name: Automatic Speech Recognition 
         | 
| 19 | 
            -
                  type: automatic-speech-recognition
         | 
| 20 | 
            -
                dataset:
         | 
| 21 | 
            -
                  name: Common Voice 7
         | 
| 22 | 
            -
                  type: mozilla-foundation/common_voice_7_0
         | 
| 23 | 
            -
                  args: sv-SE
         | 
| 24 | 
            -
                metrics:
         | 
| 25 | 
            -
                   - name: Test WER
         | 
| 26 | 
            -
                     type: wer
         | 
| 27 | 
            -
                     value: 18.85
         | 
| 28 | 
            -
                   - name: Test CER
         | 
| 29 | 
            -
                     type: cer
         | 
| 30 | 
            -
                     value: 6.6
         | 
| 31 | 
            -
              - task: 
         | 
| 32 | 
            -
                  name: Automatic Speech Recognition
         | 
| 33 | 
            -
                  type: automatic-speech-recognition
         | 
| 34 | 
            -
                dataset:
         | 
| 35 | 
            -
                  name: Robust Speech Event - Dev Data
         | 
| 36 | 
            -
                  type: speech-recognition-community-v2/dev_data
         | 
| 37 | 
            -
                  args: sv
         | 
| 38 | 
            -
                metrics:
         | 
| 39 | 
            -
                   - name: Test WER
         | 
| 40 | 
            -
                     type: wer
         | 
| 41 | 
            -
                     value: 27.01
         | 
| 42 | 
            -
                   - name: Test CER
         | 
| 43 | 
            -
                     type: cer
         | 
| 44 | 
            -
                     value: 13.14
         | 
| 45 | 
            -
            ---
         | 
| 46 | 
            -
             | 
| 47 | 
            -
            <!-- This model card has been generated automatically according to the information the Trainer had access to. You
         | 
| 48 | 
            -
            should probably proofread and complete it, then remove this comment. -->
         | 
| 49 | 
            -
             | 
| 50 | 
            -
            # XLS-R-300m-SV
         | 
| 51 | 
            -
             | 
| 52 | 
            -
            This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_7_0 - SV-SE dataset.
         | 
| 53 | 
            -
            It achieves the following results on the evaluation set:
         | 
| 54 | 
            -
            - Loss: 0.3171
         | 
| 55 | 
            -
            - Wer: 0.2730
         | 
| 56 | 
            -
             | 
| 57 | 
            -
            ## Model description
         | 
| 58 | 
            -
             | 
| 59 | 
            -
            More information needed
         | 
| 60 | 
            -
             | 
| 61 | 
            -
            ## Intended uses & limitations
         | 
| 62 | 
            -
             | 
| 63 | 
            -
            More information needed
         | 
| 64 | 
            -
             | 
| 65 | 
            -
            ## Training and evaluation data
         | 
| 66 | 
            -
             | 
| 67 | 
            -
            More information needed
         | 
| 68 | 
            -
             | 
| 69 | 
            -
            ## Training procedure
         | 
| 70 | 
            -
             | 
| 71 | 
            -
            ### Training hyperparameters
         | 
| 72 | 
            -
             | 
| 73 | 
            -
            The following hyperparameters were used during training:
         | 
| 74 | 
            -
            - learning_rate: 7.5e-05
         | 
| 75 | 
            -
            - train_batch_size: 8
         | 
| 76 | 
            -
            - eval_batch_size: 8
         | 
| 77 | 
            -
            - seed: 42
         | 
| 78 | 
            -
            - gradient_accumulation_steps: 4
         | 
| 79 | 
            -
            - total_train_batch_size: 32
         | 
| 80 | 
            -
            - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
         | 
| 81 | 
            -
            - lr_scheduler_type: linear
         | 
| 82 | 
            -
            - lr_scheduler_warmup_steps: 2000
         | 
| 83 | 
            -
            - num_epochs: 50.0
         | 
| 84 | 
            -
            - mixed_precision_training: Native AMP
         | 
| 85 | 
            -
             | 
| 86 | 
            -
            ### Training results
         | 
| 87 | 
            -
             | 
| 88 | 
            -
            | Training Loss | Epoch | Step  | Validation Loss | Wer    |
         | 
| 89 | 
            -
            |:-------------:|:-----:|:-----:|:---------------:|:------:|
         | 
| 90 | 
            -
            | 3.3349        | 1.45  | 500   | 3.2858          | 1.0    |
         | 
| 91 | 
            -
            | 2.9298        | 2.91  | 1000  | 2.9225          | 1.0000 |
         | 
| 92 | 
            -
            | 2.0839        | 4.36  | 1500  | 1.1546          | 0.8295 |
         | 
| 93 | 
            -
            | 1.7093        | 5.81  | 2000  | 0.6827          | 0.5701 |
         | 
| 94 | 
            -
            | 1.5855        | 7.27  | 2500  | 0.5597          | 0.4947 |
         | 
| 95 | 
            -
            | 1.4831        | 8.72  | 3000  | 0.4923          | 0.4527 |
         | 
| 96 | 
            -
            | 1.4416        | 10.17 | 3500  | 0.4670          | 0.4270 |
         | 
| 97 | 
            -
            | 1.3848        | 11.63 | 4000  | 0.4341          | 0.3980 |
         | 
| 98 | 
            -
            | 1.3749        | 13.08 | 4500  | 0.4203          | 0.4011 |
         | 
| 99 | 
            -
            | 1.3311        | 14.53 | 5000  | 0.4310          | 0.3961 |
         | 
| 100 | 
            -
            | 1.317         | 15.99 | 5500  | 0.3898          | 0.4322 |
         | 
| 101 | 
            -
            | 1.2799        | 17.44 | 6000  | 0.3806          | 0.3572 |
         | 
| 102 | 
            -
            | 1.2771        | 18.89 | 6500  | 0.3828          | 0.3427 |
         | 
| 103 | 
            -
            | 1.2451        | 20.35 | 7000  | 0.3702          | 0.3359 |
         | 
| 104 | 
            -
            | 1.2182        | 21.8  | 7500  | 0.3685          | 0.3270 |
         | 
| 105 | 
            -
            | 1.2152        | 23.26 | 8000  | 0.3650          | 0.3308 |
         | 
| 106 | 
            -
            | 1.1837        | 24.71 | 8500  | 0.3568          | 0.3187 |
         | 
| 107 | 
            -
            | 1.1721        | 26.16 | 9000  | 0.3659          | 0.3249 |
         | 
| 108 | 
            -
            | 1.1764        | 27.61 | 9500  | 0.3547          | 0.3145 |
         | 
| 109 | 
            -
            | 1.1606        | 29.07 | 10000 | 0.3514          | 0.3104 |
         | 
| 110 | 
            -
            | 1.1431        | 30.52 | 10500 | 0.3469          | 0.3062 |
         | 
| 111 | 
            -
            | 1.1047        | 31.97 | 11000 | 0.3313          | 0.2979 |
         | 
| 112 | 
            -
            | 1.1315        | 33.43 | 11500 | 0.3298          | 0.2992 |
         | 
| 113 | 
            -
            | 1.1022        | 34.88 | 12000 | 0.3296          | 0.2973 |
         | 
| 114 | 
            -
            | 1.0935        | 36.34 | 12500 | 0.3278          | 0.2926 |
         | 
| 115 | 
            -
            | 1.0676        | 37.79 | 13000 | 0.3208          | 0.2868 |
         | 
| 116 | 
            -
            | 1.0571        | 39.24 | 13500 | 0.3322          | 0.2885 |
         | 
| 117 | 
            -
            | 1.0536        | 40.7  | 14000 | 0.3245          | 0.2831 |
         | 
| 118 | 
            -
            | 1.0525        | 42.15 | 14500 | 0.3285          | 0.2826 |
         | 
| 119 | 
            -
            | 1.0464        | 43.6  | 15000 | 0.3223          | 0.2796 |
         | 
| 120 | 
            -
            | 1.0415        | 45.06 | 15500 | 0.3166          | 0.2774 |
         | 
| 121 | 
            -
            | 1.0356        | 46.51 | 16000 | 0.3177          | 0.2746 |
         | 
| 122 | 
            -
            | 1.04          | 47.96 | 16500 | 0.3150          | 0.2735 |
         | 
| 123 | 
            -
            | 1.0209        | 49.42 | 17000 | 0.3175          | 0.2731 |
         | 
| 124 | 
            -
             | 
| 125 | 
            -
             | 
| 126 | 
            -
            ### Framework versions
         | 
| 127 | 
            -
             | 
| 128 | 
            -
            - Transformers 4.16.0.dev0
         | 
| 129 | 
            -
            - Pytorch 1.10.0+cu102
         | 
| 130 | 
            -
            - Datasets 1.17.1.dev0
         | 
| 131 | 
            -
            - Tokenizers 0.10.3
         | 
| 132 | 
            -
             | 
| 133 | 
            -
            #### Evaluation Commands
         | 
| 134 | 
            -
             | 
| 135 | 
            -
            1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`
         | 
| 136 | 
            -
             | 
| 137 | 
            -
            ```bash
         | 
| 138 | 
            -
            python eval.py --model_id hf-test/xls-r-300m-sv --dataset mozilla-foundation/common_voice_7_0 --config sv-SE --split test
         | 
| 139 | 
            -
            ```
         | 
| 140 | 
            -
             | 
| 141 | 
            -
            2. To evaluate on `speech-recognition-community-v2/dev_data`
         | 
| 142 | 
            -
             | 
| 143 | 
            -
            ```bash
         | 
| 144 | 
            -
            python eval.py --model_id hf-test/xls-r-300m-sv --dataset speech-recognition-community-v2/dev_data --config sv --split validation --chunk_length_s 5.0 --stride_length_s 1.0
         | 
| 145 | 
            -
            ```
         | 
| 146 | 
            -
             | 
| 147 | 
            -
            ### Inference With LM
         | 
| 148 | 
            -
             | 
| 149 | 
            -
            ```python
         | 
| 150 | 
            -
            import torch
         | 
| 151 | 
            -
            from datasets import load_dataset
         | 
| 152 | 
            -
            from transformers import AutoModelForCTC, AutoProcessor
         | 
| 153 | 
            -
            import torchaudio.functional as F
         | 
| 154 | 
            -
             | 
| 155 | 
            -
             | 
| 156 | 
            -
            model_id = "hf-test/xls-r-300m-sv"
         | 
| 157 | 
            -
             | 
| 158 | 
            -
            sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "sv-SE", split="test", streaming=True, use_auth_token=True))
         | 
| 159 | 
            -
             | 
| 160 | 
            -
            sample = next(sample_iter)
         | 
| 161 | 
            -
            resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
         | 
| 162 | 
            -
             | 
| 163 | 
            -
            model = AutoModelForCTC.from_pretrained(model_id)
         | 
| 164 | 
            -
            processor = AutoProcessor.from_pretrained(model_id)
         | 
| 165 | 
            -
             | 
| 166 | 
            -
            input_values = processor(resampled_audio, return_tensors="pt").input_values
         | 
| 167 | 
            -
             | 
| 168 | 
            -
            with torch.no_grad():
         | 
| 169 | 
            -
                logits = model(input_values).logits
         | 
| 170 | 
            -
             | 
| 171 | 
            -
            transcription = processor.batch_decode(logits.numpy()).text
         | 
| 172 | 
            -
            # => "jag lämnade grovjobbet åt honom"
         | 
| 173 | 
            -
            ```
         | 
| 174 | 
            -
             | 
| 175 | 
            -
            ### Eval results on Common Voice 7 "test" (WER):
         | 
| 176 | 
            -
             | 
| 177 | 
            -
            | Without LM | With LM (run `./eval.py`) |
         | 
| 178 | 
            -
            |---|---|
         | 
| 179 | 
            -
            | 27.30 | 18.85 |
         | 
| 180 | 
            -
             | 
| 181 | 
            -
             | 
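
The removed README reports word error rate (WER) both as a fraction (`Wer: 0.2730`) and as a percentage (27.30 without LM, 18.85 with LM). As a reading aid, here is a minimal sketch of how WER is conventionally computed, as word-level edit distance divided by the number of reference words, along with the relative improvement implied by the two reported figures. The helper names `word_error_rate` and `relative_reduction` and the sample hypotheses are illustrative assumptions. This is not the `eval.py` script referenced in the README, which may normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Reference sentence taken from the README's inference example;
# the hypotheses below are made up for illustration.
reference = "jag lämnade grovjobbet åt honom"
exact_match = word_error_rate(reference, reference)
one_substitution = word_error_rate(reference, "jag lämnade jobbet åt honom")
one_deletion = word_error_rate(reference, "jag lämnade åt honom")

# WER figures reported in the README for Common Voice 7 "test".
lm_gain = relative_reduction(27.30, 18.85)

# Effective batch size from the listed hyperparameters.
total_train_batch_size = 8 * 4  # train_batch_size * gradient_accumulation_steps

print(f"exact match WER:      {exact_match:.2f}")
print(f"one substitution WER: {one_substitution:.2f}")
print(f"one deletion WER:     {one_deletion:.2f}")
print(f"relative WER reduction with LM: {lm_gain:.1%}")
print(f"total_train_batch_size: {total_train_batch_size}")
```

Under these assumptions, the language model accounts for roughly a 31% relative reduction in WER on the Common Voice 7 test split, and the product of the per-device batch size and the accumulation steps matches the `total_train_batch_size: 32` listed in the hyperparameters.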
