Whisper-WebUI Premium - Ultra Fast and High Accuracy Speech to Text Transcription App for All Languages - Windows, RunPod, Massed Compute 1-Click Installers - Supporting RTX 1000 to 5000 series

#222

by MonsterMMORPG - opened 3 days ago


edited 2 days ago



Download Installers and App

https://www.patreon.com/posts/145395299

Features

It has a better interface, more features, and default settings tuned for maximum accuracy
It shows the transcription in real time in both the Gradio interface and the CMD window
It prints clearer status and output in the CMD window, such as the start time and the file being processed
It saves every generated transcription under the same name as the input file, with proper name sanitization
After a deep scan of the entire pipeline, the default parameters are set for maximum accuracy and quality
Supports both audio and video uploads for ultra fast transcription
1-Click installers for Windows local PC, RunPod (Linux-Cloud) and Massed Compute (Linux-Cloud)
The app and the installers are built for RTX 1000 series to RTX 5000 series GPUs, with pre-compiled libraries
The installers set up Torch 2.8, CUDA 12.9, and the latest Flash Attention, Sage Attention and xFormers, all precompiled
Works on GPUs with as little as 6 GB of VRAM
OpenAI Whisper Supported Models (auto-downloaded into the models subfolder):
- tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, large-v3-turbo, turbo
Distil-Whisper Supported Models for Faster-Whisper & Insanely-Fast-Whisper (auto-downloaded into the models subfolder):
- distil-large-v2, distil-large-v3, distil-medium.en, distil-small.en
Supported transcription output formats:
- SRT (SubRip) - .srt, VTT/WebVTT (Web Video Text Tracks) - .vtt, TXT (Plain Text) - .txt
- LRC (Lyrics File) - .lrc, JSON - .json, TSV (Tab-Separated Values) - .tsv
Batch folder processing and multiple output formats at once
Supported languages for Audio to Text transcription are listed below:
- Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Cantonese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latin, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Nynorsk, Occitan, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Turkish, Turkmen, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba
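
To illustrate the features above about saving each transcription under a sanitized version of the input file name and exporting to SRT, here is a minimal Python sketch. It is not the app's actual code, which is not shown in this post. The function names (`sanitize_stem`, `srt_timestamp`, `to_srt`, `output_path`), the set of replaced characters, and the `(start, end, text)` segment layout are all assumptions made for illustration.

```python
import re
from pathlib import Path


def sanitize_stem(filename: str) -> str:
    """Return a filesystem-safe stem derived from an input file name.

    Hypothetical helper: the real app's sanitization rules are not documented here.
    """
    stem = Path(filename).stem
    # Replace characters that are invalid in Windows file names, plus control chars.
    stem = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", stem)
    # Collapse whitespace runs, then trim leading/trailing spaces and dots.
    stem = re.sub(r"\s+", " ", stem).strip(" .")
    return stem or "transcript"


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render an iterable of (start, end, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def output_path(input_file: str, out_dir: str, extension: str) -> Path:
    """Build the output path: same (sanitized) name as the input, new extension."""
    return Path(out_dir) / f"{sanitize_stem(input_file)}.{extension}"


if __name__ == "__main__":
    segments = [(0.0, 1.25, " Hello "), (1.25, 3.0, "world")]
    target = output_path("Interview: Part 1?.mp4", "outputs", "srt")
    print(target)
    print(to_srt(segments))
```

Deriving the output name from the input stem keeps batch results easy to match back to their source files, and the same `output_path` helper could be called once per selected format (for example `srt`, `vtt`, `txt`) when several formats are exported at once.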

