Habib-HF/tarbiyah-ai-v1-1 - Fine-tuned Whisper Small for Quranic Recitation

Model Description

This model is a fine-tuned version of OpenAI's whisper-small model, specifically adapted for Automatic Speech Recognition (ASR) of Quranic Arabic recitation. It has been trained to accurately transcribe spoken Quranic verses into text.

The primary goal of this model is to serve as the core AI engine for the [Your App Name] app, which aims to provide real-time feedback and learning tools for Quranic recitation.

Intended Use

This model is intended for:

  • Quranic Recitation Practice: Assisting individuals in practicing their Quran recitation.
  • Transcription: Generating text from spoken Quranic verses.
  • Integration: As a backend API for mobile or web applications focused on Quranic learning and Tajweed.
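For integration, a fine-tuned Whisper checkpoint like this one can be consumed through the standard `transformers` ASR pipeline. A minimal sketch (the audio file path is a placeholder; requires `transformers` and `torch`):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an ASR pipeline
# (model id taken from this card).
asr = pipeline(
    "automatic-speech-recognition",
    model="Habib-HF/tarbiyah-ai-v1-1",
    chunk_length_s=30,  # Whisper processes audio in 30-second windows
)

# Transcribe a recitation clip (path is a placeholder).
result = asr("recitation.wav")
print(result["text"])
```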

Limitations and Bias

For responsible use, it is important to understand this model's limitations:

  • Data Specificity: While fine-tuned on Quranic recitation, its performance may vary with different recitation styles or Qira'at not present in the training data.
  • Speaker Characteristics: The model was primarily fine-tuned on adult voices. Its performance on children's voices or highly varied accents (e.g., strong regional Arabic accents not represented in the training data) is expected to be suboptimal. Future iterations will address children's voices specifically.
  • Audio Quality: Performance may degrade significantly with noisy backgrounds, poor microphone quality, or very fast/unclear recitation.
  • No Tajweed Correction (Yet): This version primarily focuses on word-level transcription accuracy (WER). Advanced Tajweed rule detection (like Madd duration, Ghunna quality) is a future development phase built on top of this model.

Training Data

This model was fine-tuned on a subset of the MohamedRashad/Quran-Recitations dataset from the Hugging Face Hub.

  • Training Samples: Approximately [10,000] samples from the train split.
  • Evaluation Samples: Approximately [1,000] samples from the train split (used for validation).
  • Data Characteristics: The dataset consists of various reciters reading Quranic verses.
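The training procedure below loads this dataset in streaming mode and skips malformed audio and over-long transcripts. A minimal sketch of such a skip-on-error filter (the function name, field names, and threshold are illustrative, not the actual training code):

```python
def safe_samples(samples, max_label_chars=448):
    """Yield only usable samples from a streaming dataset, skipping
    malformed audio and over-long transcripts (threshold is illustrative)."""
    for sample in samples:
        try:
            audio = sample["audio"]["array"]  # decoding may raise on bad files
            text = sample["text"]
        except Exception:
            continue  # skip malformed audio
        if len(audio) == 0 or len(text) > max_label_chars:
            continue  # skip empty audio or labels too long for the decoder
        yield sample
```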

Training Procedure

The model was fine-tuned using the transformers library's Seq2SeqTrainer in a Google Colab Pro environment.

  • Base Model: openai/whisper-small
  • Training Steps: max_steps=4000
  • Gradient Accumulation: gradient_accumulation_steps=2
  • Mixed Precision: fp16=True
  • Data Loading: Streaming (streaming=True) with custom error handling to skip malformed audio files and long text sequences. dataloader_num_workers=0 was used to prevent pickling errors.
  • Optimizer: AdamW
  • Learning Rate: 1e-5
  • Evaluation Strategy: Evaluated every 500 steps (eval_steps=500).
  • Best Model Saving: The best model checkpoint (based on lowest WER) was loaded and saved at the end of training (load_best_model_at_end=True).
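The hyperparameters listed above could be assembled roughly as follows. This is a sketch, not the actual training script: `output_dir`, the batch size, and the save/generate settings are assumptions not stated on this card, and the argument names follow recent `transformers` versions (`eval_strategy` was `evaluation_strategy` in older releases).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-quran",  # assumed path
    per_device_train_batch_size=8,       # assumption; not stated on the card
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    max_steps=4000,
    fp16=True,
    eval_strategy="steps",
    eval_steps=500,
    save_steps=500,                      # assumption; checkpoints must align with eval
    predict_with_generate=True,          # assumption; needed to compute WER on decoded text
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,             # lower WER is better
    dataloader_num_workers=0,            # avoids pickling errors with streaming
)
```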

Evaluation Results

The best performance achieved on the evaluation set during training was:

  • Word Error Rate (WER): [40.48%] (achieved at approximately Step [2000])

(Note: this is an initial result, but it marks a significant improvement over the base Whisper model's general Arabic WER on recitation and provides a strong foundation for further fine-tuning. Additional training on more data is planned to reach a lower WER for production use.)
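For reference, WER is the word-level edit distance (substitutions + insertions + deletions) between the reference transcript and the hypothesis, divided by the number of reference words. A minimal, library-free sketch of the metric (training-time evaluation would more typically use the `jiwer` or `evaluate` packages):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, two-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev + cost)   # substitution / match
            prev = cur
    return d[len(hyp)] / len(ref)
```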

Acknowledgements

  • OpenAI: For developing the groundbreaking Whisper model.
  • MohamedRashad: For curating and open-sourcing the Quran-Recitations dataset on Hugging Face.

License

This model is licensed under the MIT License.

