YosefA committed on
Commit 89dde49 · verified · 1 Parent(s): 5e9f4ed

Update README.md

Files changed (1)
  1. README.md +28 -60
README.md CHANGED
@@ -1,71 +1,39 @@
- ---
- language: "dar"
- thumbnail:
- pipeline_tag: automatic-speech-recognition
- tags:
- - CTC
- - pytorch
- - speechbrain
- - Transformer
- license: "apache-2.0"
- datasets:
- - Dvoice
- metrics:
- - wer
- - cer
  ---

- <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
- <br/><br/>


- # Pipeline description
- This ASR system is composed of 2 different but linked blocks:
- - Tokenizer (unigram) that transforms words into subword units and is trained with the train transcriptions.
- - Acoustic model (wav2vec2.0 + CTC). A pretrained wav2vec 2.0 model ([facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)) is combined with two DNN layers and finetuned on the Darija dataset.
- The obtained final acoustic representation is given to the CTC greedy decoder.
- The system is trained with recordings sampled at 16kHz (single channel).
- The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file* if needed.

- # Install SpeechBrain
- First of all, please install transformers and SpeechBrain with the following command:
- ```
- pip install speechbrain transformers
- ```
- Please notice that we encourage you to read the SpeechBrain tutorials and learn more about
- [SpeechBrain](https://speechbrain.github.io).

- # Transcribing your own audio files (in Amharic)
- ```python
- from speechbrain.inference.ASR import EncoderASR
- asr_model = EncoderASR.from_hparams(source="speechbrain/asr-wav2vec2-dvoice-amharic", savedir="pretrained_models/asr-wav2vec2-dvoice-amharic")
- asr_model.transcribe_file('speechbrain/asr-wav2vec2-dvoice-amharic/example_amharic.wav')
- ```

- # Inference on GPU
- To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.


- # Training
- The model was trained with SpeechBrain.
- To train it from scratch follow these steps:
- 1. Clone SpeechBrain:
- ```bash
- git clone https://github.com/speechbrain/speechbrain/
- ```
- 2. Install it:
- ```bash
- cd speechbrain
- pip install -r requirements.txt
- pip install -e .
- ```
- 3. Run Training:
- ```bash
- cd recipes/DVoice/ASR/CTC
- python train_with_wav2vec2.py hparams/train_amh_with_wav2vec.yaml --data_folder=/localscratch/ALFFA_PUBLIC/ASR/AMHARIC/data/
- ```
- You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1vNT7RjRuELs7pumBHmfYsrOp9m46D0ym?usp=sharing).


- # Limitations
- The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
 
+ # Amharic Speech-to-Text Transcription
+
+ ## Group 10
+
+ ### Project Description
+ This project focuses on developing a system for transcribing Amharic speech into text. The system aims to provide accurate and efficient transcription capabilities for the Amharic language, leveraging state-of-the-art technologies in speech recognition and natural language processing.
+
  ---

+ ### Group Members

+ | **Name** | **ID** |
+ |---------------------------|----------------|
+ | Yosef Ayele Eshetu | UGR/2067/13 |
+ | Yonas Engdu | UGR/4575/13 |
+ | Yosef Aweke Dinku | UGR/5887/13 |
+ | Yosef Muluneh Bane | UGR/5715/13 |

+ ---

+ ### Technologies
+ - Python for writing training scripts
+ - Facebook's Wav2Vec2 as the base model
+ - SpeechBrain for training

+ ---
+
+ ### About the Repository
+ This repository provides all the necessary tools to perform automatic speech recognition from an end-to-end system pretrained on the [ALFFA Amharic dataset](https://github.com/besacier/ALFFA_PUBLIC/tree/master/ASR/AMHARIC) within SpeechBrain.

+ ---

+ ### Datasets Used for Fine-tuning
+ - `facebook/2M-Belebele`
+ - `fsicoli/common_voice_19_0`

+ ---