Available Gradio with VAD and diarization 🚀

#19
by 0livi3r - opened

🚀 I've integrated the Voxtral-mini-3b model into a Whisper-WebUI project! Early tests are impressive: the French transcription quality is significantly better than with standard Whisper models.

I also added compatible VAD and diarization, and removed the audio length limitations.

Curious? Check out the branch here:
https://github.com/OlivierAlbertini/Voxtral-WebUI

You can use this branch: https://github.com/OlivierAlbertini/Whisper-WebUI/tree/feature/voxtral
I've noticed better quality for French transcription.

After I run it, I get a model download error.

You need to be logged in to Hugging Face (https://huggingface.co/docs/huggingface_hub/en/guides/cli).
See also https://github.com/OlivierAlbertini/Whisper-WebUI/blob/feature/voxtral/VOXTRAL_SETUP.md
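
Before launching the WebUI, it can help to check whether a Hugging Face token is visible at all. The following is a minimal sketch, assuming the usual `huggingface_hub` conventions: the `HF_TOKEN` environment variable, or a token file at `~/.cache/huggingface/token` unless `HF_HOME` points elsewhere. `find_hf_token` is a hypothetical helper written for this post, not part of the branch or of `huggingface_hub`.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return (source, token) for the first token found, or (None, None).

    Assumes the default huggingface_hub locations; this is an illustrative
    check, not the library's own lookup logic.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. A token passed through the environment takes priority here.
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return "HF_TOKEN environment variable", token

    # 2. Otherwise look for the token file written by the CLI login.
    hf_home = Path(env.get("HF_HOME") or home / ".cache" / "huggingface")
    token_file = hf_home / "token"
    if token_file.is_file():
        token = token_file.read_text(encoding="utf-8").strip()
        if token:
            return str(token_file), token

    return None, None


if __name__ == "__main__":
    source, token = find_hf_token()
    if token:
        print(f"Found a token via {source}")
    else:
        print("No token found; log in with the Hugging Face CLI first.")
```

The script only reports where a token was found and never prints the token itself.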

Thanks for your reply. With your help, I installed and ran it successfully, but there is a problem. When I transcribe with the Voxtral-Mini-3B-2507 model, the subtitles in the result are split into one segment every 30 seconds, even though I turned on the VAD function. Is something wrong?
QQ20250721-160035.png

Console log:
C:\Users\tk199\Voxtral-WebUI-feature-voxtral\venv\Lib\site-packages\ctranslate2\__init__.py:8: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Use "voxtral-mini" implementation
Device "cuda" is detected

* Running on local URL: http://127.0.0.1:7860
* To create a public link, set share=True in launch().

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.71s/it]
C:\Users\tk199\Voxtral-WebUI-feature-voxtral\modules\whisper\voxtral_whisper_inference.py:357: UserWarning: PySoundFile failed. Trying audioread instead.
  audio_data, sr = librosa.load(audio_path, sr=None)
C:\Users\tk199\Voxtral-WebUI-feature-voxtral\venv\Lib\site-packages\librosa\core\audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
  Deprecated as of librosa version 0.10.0.
  It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)
C:\Users\tk199\Voxtral-WebUI-feature-voxtral\modules\whisper\voxtral_whisper_inference.py:71: UserWarning: PySoundFile failed. Trying audioread instead.
  audio_data, sr = librosa.load(audio_path, sr=None)
C:\Users\tk199\Voxtral-WebUI-feature-voxtral\venv\Lib\site-packages\librosa\core\audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
  Deprecated as of librosa version 0.10.0.
  It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)
The following generation flags are not valid and may be ignored: ['temperature']. Set TRANSFORMERS_VERBOSITY=info for more details.
[previous line repeated 21 times in total]
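
On the 30-second subtitle blocks described above: one possible post-processing workaround is to split each long segment into sentence-level cues and estimate their timestamps. The sketch below is only an illustration under stated assumptions. It assumes each segment arrives as plain text with a single start and end time, and it interpolates timestamps by character count, so the cue boundaries are approximate. `split_chunk_into_cues` is a hypothetical helper, not something the branch provides, and the cause of the 30-second segmentation is not confirmed in this thread.

```python
import re


def split_chunk_into_cues(text, start, end):
    """Split one long segment into sentence cues with estimated timestamps.

    Timestamps are interpolated in proportion to sentence length in
    characters, so they are rough estimates rather than aligned times.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if not sentences:
        return []

    total_chars = sum(len(s) for s in sentences)
    cues = []
    cursor = start
    consumed = 0
    for sentence in sentences:
        consumed += len(sentence)
        # Share of the segment duration used up to the end of this sentence.
        cue_end = start + (end - start) * consumed / total_chars
        cues.append((round(cursor, 2), round(cue_end, 2), sentence))
        cursor = cue_end
    return cues


if __name__ == "__main__":
    for cue in split_chunk_into_cues("Bonjour. Comment allez-vous ? Très bien.", 0.0, 30.0):
        print(cue)
```

Proper alignment would need word-level timestamps or a forced aligner; proportional splitting only makes the subtitles easier to read.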

Repository Not Found for url: https://huggingface.co/api/models/voxtral-mini-3b/revision/main.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
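
The URL in this error shows that the bare name `voxtral-mini-3b` was sent to the Hub as the repo ID. A minimal sketch of one way to guard against that follows, assuming the intended repository is `mistralai/Voxtral-Mini-3B-2507` (the model named earlier in the thread; the `mistralai` namespace is my assumption). The alias table and both helpers are hypothetical and are not taken from the branch.

```python
# Hypothetical alias table: short UI names mapped to full Hub repo IDs.
# The full ID below is an assumption about which repository is intended.
MODEL_ALIASES = {
    "voxtral-mini-3b": "mistralai/Voxtral-Mini-3B-2507",
}


def resolve_repo_id(name, aliases=MODEL_ALIASES):
    """Map a short model name to a full repo ID when an alias is known.

    Unknown names are returned unchanged, since some Hub repos
    legitimately have no namespace.
    """
    key = name.strip()
    return aliases.get(key.lower(), key)


def api_url(repo_id, revision="main"):
    """Build a URL with the same shape as the one in the error message."""
    return f"https://huggingface.co/api/models/{repo_id}/revision/{revision}"


if __name__ == "__main__":
    for name in ("voxtral-mini-3b", "mistralai/Voxtral-Mini-3B-2507"):
        print(name, "->", api_url(resolve_repo_id(name)))
```

If the resolved ID still produces "Repository Not Found", the remaining suspects are authentication and a wrong `repo_type`, as the error text itself suggests.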
