# OpenAI Whisper-Base Fine-Tuned Model for Speech-to-Text

This repository hosts a fine-tuned version of the OpenAI Whisper-Base model, optimized for speech-to-text transcription on the [Mozilla Common Voice 13.0](https://commonvoice.mozilla.org/) dataset. The model transcribes speech to text efficiently while maintaining high accuracy.

## Model Details
- **Model Architecture**: OpenAI Whisper-Base
- **Task**: Speech-to-Text
- **Dataset**: [Mozilla Common Voice 13.0](https://commonvoice.mozilla.org/)
- **Quantization**: FP16
- **Fine-tuning Framework**: Hugging Face Transformers
## 🚀 Usage

### Installation
```bash
pip install transformers torch torchaudio
```

### Loading the Model
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "AventIQ-AI/whisper-speech-text"
model = WhisperForConditionalGeneration.from_pretrained(model_name).to(device)
processor = WhisperProcessor.from_pretrained(model_name)
```
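Since the published weights are stored in FP16, you can optionally keep them in half precision to roughly halve memory use. A minimal sketch, assuming a CUDA-capable GPU (FP16 inference on CPU is typically slow):

```python
# Optional: load the FP16 weights in half precision on the GPU
model = WhisperForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.float16
).to(device)
```

If you load in half precision, cast the input features to the same dtype (e.g. `inputs = inputs.to(torch.float16)`) before calling `generate`.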
### Speech-to-Text Inference
```python
import torchaudio

def transcribe(audio_path):
    # Load the audio and convert it to the 16 kHz mono input Whisper expects
    waveform, sample_rate = torchaudio.load(audio_path)
    if waveform.shape[0] > 1:  # downmix multi-channel audio to mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sample_rate != 16000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

    inputs = processor(
        waveform.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt"
    ).input_features.to(device)

    # Generate token IDs and decode them to text
    with torch.no_grad():
        predicted_ids = model.generate(inputs)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

# Example usage
audio_file = "sample_audio.wav"
print(transcribe(audio_file))
```
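Alternatively, the high-level `pipeline` API wraps the same steps (loading, resampling, decoding) in one call; it needs `ffmpeg` installed to read audio files. A minimal sketch, not an official example from this repository:

```python
import torch
from transformers import pipeline

# Convenience wrapper around the same checkpoint
asr = pipeline(
    "automatic-speech-recognition",
    model="AventIQ-AI/whisper-speech-text",
    device=0 if torch.cuda.is_available() else -1,
)
print(asr("sample_audio.wav")["text"])
```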
## 📊 Evaluation Results

After fine-tuning, the model was evaluated on the validation split of the Common Voice 13.0 dataset, with the following results:

| Metric  | Score | Meaning                                                             |
|---------|-------|---------------------------------------------------------------------|
| **WER** | 8.2%  | Word Error Rate: word-level transcription errors (lower is better)  |
| **CER** | 4.5%  | Character Error Rate: character-level errors (lower is better)      |

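For reference, both metrics can be computed with the Hugging Face `evaluate` library (`pip install evaluate jiwer`). A minimal sketch with placeholder data; in practice the references come from Common Voice and the predictions from `transcribe()` above:

```python
import evaluate

# Placeholder reference transcripts and model outputs
references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumps over a lazy dog"]

wer = evaluate.load("wer").compute(references=references, predictions=predictions)
cer = evaluate.load("cer").compute(references=references, predictions=predictions)
print(f"WER: {wer:.1%}  CER: {cer:.1%}")
```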
## Fine-Tuning Details

### Dataset
The Mozilla Common Voice 13.0 dataset, which contains diverse multilingual speech samples, was used for fine-tuning.
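The dataset is hosted on the Hugging Face Hub (gated: it requires accepting the terms and logging in with `huggingface-cli login`). A minimal sketch, assuming the English subset:

```python
from datasets import Audio, load_dataset

# Load the English validation split of Common Voice 13.0
cv = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="validation")

# Resample the audio column to the 16 kHz rate Whisper expects
cv = cv.cast_column("audio", Audio(sampling_rate=16000))
print(cv[0]["sentence"])
```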
### Training
- **Number of epochs**: 3
- **Batch size**: 8
- **Evaluation strategy**: epoch
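These hyperparameters map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows; this is a sketch, not the exact training script, and `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-cv13",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
)
```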
### Quantization
Post-training FP16 (half-precision) quantization was applied with PyTorch to reduce the model size and improve inference efficiency.
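A minimal sketch of such an FP16 conversion (the exact export step used for this checkpoint is not published; the output path is a placeholder):

```python
# Convert the fine-tuned model to half precision and save it
fp16_model = model.half()
fp16_model.save_pretrained("./whisper-base-cv13-fp16")
processor.save_pretrained("./whisper-base-cv13-fp16")
```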
## 📂 Repository Structure
```bash
.
├── model/              # Quantized model files
├── tokenizer_config/   # Tokenizer configuration and vocabulary files
├── model.safetensors   # Quantized model weights
└── README.md           # Model documentation
```
## ⚠️ Limitations
- The model may struggle with highly noisy or overlapping speech.
- FP16 quantization may lead to a slight loss of accuracy compared to the full-precision model.
- Performance may vary across accents and dialects.

## 🤝 Contributing
Contributions are welcome! Feel free to open an issue or submit a pull request with suggestions or improvements.