YosefA/wave2vec2_amharic_stt

@@ -1,71 +1,39 @@
----
-language: "dar"
-thumbnail:
-pipeline_tag: automatic-speech-recognition
-tags:
-- CTC
-- pytorch
-- speechbrain
-- Transformer
-license: "apache-2.0"
-datasets:
-- Dvoice
-metrics:
-- wer
-- cer
 ---
-<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
-<br/><br/>
-# Pipeline description
-This ASR system is composed of 2 different but linked blocks:
-- Tokenizer (unigram) that transforms words into subword units and is trained with the train transcriptions.
-- Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)) is combined with two DNN layers and finetuned on the Darija dataset.
-The obtained final acoustic representation is given to the CTC greedy decoder.
-The system is trained with recordings sampled at 16kHz (single channel).
-The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.
-# Install SpeechBrain
-First of all, please install transformers and SpeechBrain with the following command:
-```
-pip install speechbrain transformers
-```
-Please notice that we encourage you to read the SpeechBrain tutorials and learn more about
-[SpeechBrain](https://speechbrain.github.io).
-# Transcribing your own audio files (in Amharic)
-```python
-from speechbrain.inference.ASR import EncoderASR
-asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-dvoice-amharic", savedir="pretrained_models/asr-wav2vec2-dvoice-amharic")
-asr_model.transcribe_file('speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav')
-```
-# Inference on GPU
-To perform inference on the GPU, add  `run_opts={"device":"cuda"}`  when calling the `from_hparams` method.
-# Training
-The model was trained with SpeechBrain.
-To train it from scratch follow these steps:
-1. Clone SpeechBrain:
-```bash
-git clone https://github.com/speechbrain/speechbrain/
-```
-2. Install it:
-```bash
-cd speechbrain
-pip install -r requirements.txt
-pip install -e .
-```
-3. Run Training:
-```bash
-cd recipes/DVoice/ASR/CTC
-python train_with_wav2vec2.py hparams/train_amh_with_wav2vec.yaml --data_folder=/localscratch/ALFFA_PUBLIC/ASR/AMHARIC/data/
-```
-You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1vNT7RjRuELs7pumBHmfYsrOp9m46D0ym?usp=sharing).
-# Limitations
-The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

+# Amharic Speech-to-Text Transcription
+## Group 10
+### Project Description
+This project develops a system that transcribes Amharic speech into text. It aims for accurate, efficient transcription by fine-tuning a pretrained Wav2Vec2 speech recognition model with SpeechBrain.
 ---
+### Group Members
+| **Name**                  | **ID**         |
+|---------------------------|----------------|
+| Yosef Ayele Eshetu        | UGR/2067/13    |
+| Yonas Engdu               | UGR/4575/13    |
+| Yosef Aweke Dinku         | UGR/5887/13    |
+| Yosef Muluneh Bane        | UGR/5715/13    |
+---
+### Technologies
+- Python for writing training scripts
+- Facebook's Wav2Vec2 as the base model
+- SpeechBrain for training
+---
+### About the Repository
+This repository provides the tools needed to run automatic speech recognition with an end-to-end SpeechBrain system pretrained on the [ALFFA Amharic dataset](https://github.com/besacier/ALFFA_PUBLIC/tree/master/ASR/AMHARIC).
+---
+### Datasets Used for Fine-tuning
+- `facebook/2M-Belebele`
+- `fsicoli/common_voice_19_0`
+---