Just can't run!
I copied the code from your example, and it just raises the following error:
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'
Seeing the same issue.
Great catch - fixed in https://huggingface.co/distil-whisper/distil-medium.en/commit/26f298e3a65ea076cbe4498ff70b84d33a8cca32
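For anyone wanting to verify the fix has reached their environment, a quick check (my own sketch, not part of the official repro) is to reload the generation config and inspect the attribute from the traceback:

from transformers import GenerationConfig

# reload the generation config from the Hub and print the attribute
# the traceback complains about (getattr returns None if it's absent)
gen_cfg = GenerationConfig.from_pretrained("distil-whisper/distil-medium.en")
print(getattr(gen_cfg, "lang_to_id", None))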
This does not solve the problem during fine-tuning, @sanchit-gandhi.
I still get the same error whenever my code enters the eval loop during fine-tuning.
I am facing the same issue when running the evaluation.
Hi @Owos & @thoool - This seems to work for me, here's a repro: https://github.com/Vaibhavs10/scratchpad/blob/main/distil_whisper_medium_repro.ipynb
Can you try upgrading your version of transformers, or share a reproducible snippet?
I also made a bunch of language detection fixes to the Whisper fine-tuning blog post and Colab - could you try using the latest versions to ensure you receive the bug fixes?
- Blog post: https://huggingface.co/blog/fine-tune-whisper
- Colab: https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb
- Script: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#single-gpu-whisper-training
Let me know if the issue persists!
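To double-check which transformers version is actually active in your environment before and after upgrading, a trivial check is:

import transformers

print(transformers.__version__)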
I just upgraded transformers from 4.38.2 to 4.41.2; however, the error persists.
My setup is somewhat different because I have been trying to fine-tune a German version of Distil-Whisper, like so:
accelerate launch run_distillation.py \
  --model_name_or_path "./distil-large-v3-init" \
  --teacher_model_name_or_path "openai/whisper-large-v3" \
  --train_dataset_name "mozilla-foundation/common_voice_17_0" \
  --train_dataset_config_name "de" \
  --train_split_name "train" \
  --text_column_name "sentence" \
  --eval_dataset_name "mozilla-foundation/common_voice_17_0" \
  --eval_dataset_config_name "de" \
  --eval_split_name "validation" \
  --eval_text_column_name "sentence" \
  --eval_steps 1_000 \
  --save_steps 1_000 \
  --warmup_steps 100 \
  --learning_rate 0.0001 \
  --lr_scheduler_type "constant_with_warmup" \
  --timestamp_probability 0.2 \
  --condition_on_prev_probability 0.2 \
  --language "de" \
  --task "transcribe" \
  --logging_steps 25 \
  --save_total_limit 3 \
  --max_steps 100_000 \
  --wer_threshold 20 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --dataloader_num_workers 2 \
  --preprocessing_num_workers 2 \
  --ddp_timeout 7200 \
  --dtype "bfloat16" \
  --attn_implementation "sdpa" \
  --output_dir "./" \
  --do_train \
  --do_eval \
  --gradient_checkpointing \
  --overwrite_output_dir \
  --predict_with_generate \
  --freeze_encoder \
  --freeze_embed_positions \
  --use_pseudo_labels=False
For the evaluation, I run the following command from inside my checkpoint folder:
python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --language "de" \
  --attn_implementation "sdpa" \
  --streaming
Sure, here's the traceback:
Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 572, in main
    language = language_to_id(data_args.language, model.generation_config) if data_args.language else None
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 378, in language_to_id
    if language in generation_config.lang_to_id.keys():
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'
Are you passing the language argument to run_eval.py when evaluating an English-only checkpoint? Note that the language argument should only be passed for multilingual checkpoints. I've opened a PR to throw a better warning here: https://github.com/huggingface/distil-whisper/pull/139
Otherwise, you're likely using a model with an outdated generation config for distillation! Could you update the generation config to match that of the original pre-trained model?
from transformers import GenerationConfig, AutoConfig

# fill me with the Hub model id of the checkpoint you're distilling
MODEL_NAME = "sanchit-gandhi/whisper-small-hi"

# infer which original pre-trained model the checkpoint was distilled from
vocab_size = AutoConfig.from_pretrained(MODEL_NAME).vocab_size
if vocab_size == 51864:
    original_model = "openai/whisper-tiny.en"
elif vocab_size == 51865:
    original_model = "openai/whisper-tiny"
else:
    original_model = "openai/whisper-large-v3"

# load the updated generation config
generation_config = GenerationConfig.from_pretrained(original_model)

# push the updated generation config to the Hub
generation_config.push_to_hub(MODEL_NAME)
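Note that push_to_hub requires write access to MODEL_NAME; if you're distilling from a local checkpoint instead, calling generation_config.save_pretrained on the checkpoint directory achieves the same result on disk.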
I am not quite sure if I understand this correctly.
The model that I used as a teacher model is --teacher_model_name_or_path "openai/whisper-large-v3", and I set --language "de" while using --train_dataset_name "mozilla-foundation/common_voice_17_0". So I end up with a German distilled version of whisper-large-v3, which is stored locally.
When executing the run_eval.py file, I indeed pass --language "de" just like I did during training. Do you mean I don't have to set language as I now have a German version and no longer a multilingual version of Whisper?
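In case it helps to narrow this down, one way to see what the checkpoint actually contains (a sketch, run from inside the checkpoint folder) is to inspect the local generation config directly:

from transformers import GenerationConfig

# if this prints None, the local generation config defines no lang_to_id map,
# which is exactly what the AttributeError above is complaining about
gen_cfg = GenerationConfig.from_pretrained("./")
print(getattr(gen_cfg, "lang_to_id", None))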
FWIW:
python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --attn_implementation "sdpa" \
  --streaming \
  --return_timestamps False
seems to circumvent the problem. That being said, I now face this error:
Start benchmarking common_voice_17_0/test...
Reading metadata...: 16183it [00:00, 41952.06it/s]
/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:537: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
Samples: 16183it [13:45, 19.60it/s]
Datasets: 0%| | 0/1 [13:45<?, ?it/s]
Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in main
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in <listcomp>
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 587, in __call__
    s = self.standardize_spellings(s)
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in __call__
    return " ".join(self.mapping.get(word, word) for word in s.split())
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in <genexpr>
    return " ".join(self.mapping.get(word, word) for word in s.split())
AttributeError: 'NoneType' object has no attribute 'get'
Ah I see what's happening! The checkpoint you're evaluating is an intermediate checkpoint (i.e. one saved partway during training with accelerator.save_state). This saves the model weights to checkpoint-35000-epoch-1, but not the config, tokenizer, feature extractor or generation config. That is most likely why the normalizer crashed above: without the tokenizer files, its spelling mapping ends up as None.
To remedy this, could you copy the corresponding files into this checkpoint dir?
from transformers import GenerationConfig, WhisperConfig, WhisperProcessor

BASE_DIR = "/home/operation/whisper_finetune/distil-whisper/training/"
CHECKPOINT = "checkpoint-35000-epoch-1"

# load the config, processor and generation config saved in the training dir
config = WhisperConfig.from_pretrained(BASE_DIR)
processor = WhisperProcessor.from_pretrained(BASE_DIR)
generation_config = GenerationConfig.from_pretrained(BASE_DIR)

# copy them into the intermediate checkpoint dir
config.save_pretrained(BASE_DIR + CHECKPOINT)
processor.save_pretrained(BASE_DIR + CHECKPOINT)
generation_config.save_pretrained(BASE_DIR + CHECKPOINT)
You should then be able to run evaluation using the scripts you shared above.
What do you think about updating the distillation script to save the config/processor/generation config during intermediate saves, @eustlb? That would be useful for evaluating intermediate checkpoints.
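Something along these lines, perhaps (a rough sketch with a hypothetical save_checkpoint helper, not the actual script code):

def save_checkpoint(accelerator, output_dir, config, processor, generation_config):
    # save_state persists the weights and optimizer state, as the script does today
    accelerator.save_state(output_dir)
    # additionally persist the static artefacts so every intermediate
    # checkpoint can be loaded directly with from_pretrained / run_eval.py
    if accelerator.is_main_process:
        config.save_pretrained(output_dir)
        processor.save_pretrained(output_dir)
        generation_config.save_pretrained(output_dir)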