EgypTalk-ASR-v2
NAMAA-Space/EgypTalk-ASR-v2 is a high-performance automatic speech recognition (ASR) model for Egyptian Arabic, trained with NVIDIA NeMo and optimized for real-world speech from native Egyptian speakers.
The model was trained on over 200 hours of high-quality, manually curated audio collected and prepared by the NAMAA team. It builds on NVIDIA's FastConformer Hybrid Large architecture and is fine-tuned for Egyptian Arabic, enabling highly accurate transcription in casual, formal, and mixed-dialect settings.
Demo: Try it here
🗣️ Model Description
- Architecture: FastConformer Hybrid Large from the NVIDIA NeMo ASR collection.
- Framework: PyTorch Lightning + NVIDIA NeMo.
- Languages: Egyptian Arabic (with the capability to handle some Modern Standard Arabic).
- Dataset: 200+ hours of proprietary, high-quality Egyptian Arabic audio, covering:
  - Spontaneous conversation
  - Broadcast media
  - Interviews
  - Read speech
- Tokenizer: SentencePiece (trained specifically for Egyptian Arabic phonetic coverage).
- Input Format: 16 kHz mono WAV files (see the conversion sketch after this list).
- Output: Raw transcribed text in Arabic.
 
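Since the model expects 16 kHz mono WAV input, recordings in other formats or at other sample rates should be converted first. A minimal sketch of such a conversion, assuming `librosa` and `soundfile` are installed (any resampler, e.g. ffmpeg or torchaudio, works just as well; the file names are placeholders):

```python
import librosa
import soundfile as sf

def to_16k_mono_wav(src_path: str, dst_path: str) -> str:
    """Resample any audio file to the 16 kHz mono WAV the model expects."""
    # librosa.load resamples to the requested rate and downmixes to mono
    audio, sr = librosa.load(src_path, sr=16000, mono=True)
    sf.write(dst_path, audio, sr)
    return dst_path

# "recording.mp3" is a hypothetical input file
wav_path = to_16k_mono_wav("recording.mp3", "sample.wav")
```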
🚀 Key Features
- Egyptian Arabic Dialect Optimized: handles local pronunciations, colloquialisms, and speech patterns.
- High Accuracy: achieves a strong word error rate (WER) on Egyptian Arabic test sets.
- FastConformer Efficiency: low-latency, streaming-capable inference.
- Robust Dataset: covers multiple domains (media, conversation, formal speech).
 
💻 Usage
```python
from nemo.collections.asr.models import ASRModel

# Load the model from the Hugging Face Hub
model = ASRModel.from_pretrained("NAMAA-Space/EgypTalk-ASR-v2")

# Transcribe a 16 kHz mono WAV file
transcription = model.transcribe(["sample.wav"])
print(transcription)
```
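`transcribe` accepts a list of paths, so several files can be processed in one call; recent NeMo releases also take a `batch_size` argument. A sketch, assuming the WAVs have already been converted to 16 kHz mono (the file names are placeholders):

```python
# Batch transcription of prepared 16 kHz mono WAV files
files = ["clip_01.wav", "clip_02.wav", "clip_03.wav"]
results = model.transcribe(files, batch_size=8)

# Depending on the NeMo version, entries may be plain strings or
# hypothesis objects carrying a .text attribute.
for path, result in zip(files, results):
    text = result.text if hasattr(result, "text") else result
    print(f"{path}: {text}")
```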
🛠️ Training Details
- Pretrained Base Model: nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0
- Training Framework: PyTorch Lightning (DDP strategy); see the sketch after this list
- Training Duration: 100 epochs, mixed precision enabled
- Optimizer: Adam with learning rate 1e-3
- Batch Size: 32 (train) / 8 (validation, test)
- Augmentations: silence trimming, start/end token usage
 
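The exact training configuration is not published, so the following is only a minimal sketch of how a comparable fine-tuning run could be wired up in NeMo with PyTorch Lightning, mirroring the settings listed above; the manifest paths are hypothetical:

```python
import pytorch_lightning as pl
from nemo.collections.asr.models import ASRModel

# Start from the same pretrained base model named above
model = ASRModel.from_pretrained("nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0")

# NeMo data configs can be plain dicts; the manifest paths are placeholders
model.setup_training_data({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 32,   # train batch size from the card
    "shuffle": True,
})
model.setup_validation_data({
    "manifest_filepath": "val_manifest.json",
    "sample_rate": 16000,
    "batch_size": 8,    # validation batch size from the card
})

# Adam with learning rate 1e-3, as listed above
model.setup_optimization({"name": "adam", "lr": 1e-3})

trainer = pl.Trainer(
    accelerator="gpu",
    devices=-1,
    strategy="ddp",         # DDP strategy from the card
    precision="16-mixed",   # mixed precision from the card
    max_epochs=100,         # training duration from the card
)
trainer.fit(model)
```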
Citation
```bibtex
@misc{egyptalk-asr-v2,
  title={NAMAA-Space/EgypTalk-ASR-v2},
  author={NAMAA},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/NAMAA-Space/EgypTalk-ASR-v2}},
  note={Accessed: 2025-03-02}
}
```