SmolLM2-135M Arabic End-of-Utterance Detector

Fine-tuned SmolLM2-135M model for detecting end-of-utterance (EOU) in Arabic conversations.

Model Description

This model predicts when an Arabic speaker has finished their turn in a conversation based on transcribed speech. It's designed for real-time voice assistants, LiveKit agents, and conversational AI systems.

Key Features:

🎯 High Accuracy: F1-Score of 0.913
🌍 Multi-Dialect: Supports Levantine, Egyptian, and Gulf Arabic
⚡ Fast Inference: 10-20ms per prediction on GPU (RTX 4070), 30-50ms on CPU
🔄 Context-Aware: Can use previous utterances for better predictions
🎙️ Production-Ready: Integrated with LiveKit for real-time use

Performance

Metric	Score
F1 Score	0.913
Accuracy	0.913
Precision	0.906
Recall	0.921
AUC-ROC	0.958
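
The reported F1 score can be cross-checked against the precision and recall in the table, since F1 is their harmonic mean. A minimal sketch using only the table values (the function name `f1_from` is illustrative):

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the Performance table.
f1 = f1_from(0.906, 0.921)
print(f"{f1:.3f}")  # 0.913
```

The result agrees with the reported F1 to three decimal places.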

Inference Speed:

CPU: 30-50ms per prediction
GPU (RTX 4070): 10-20ms per prediction
Batch (32 samples): 3-6ms per prediction
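
Per-prediction latency converts directly into throughput. The sketch below does that conversion for the ranges listed above. It assumes the batch figure is amortized (total batch time divided by 32), which the card does not state explicitly; the helper names are illustrative.

```python
def predictions_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-prediction latency in milliseconds."""
    return 1000.0 / latency_ms


def batch_wall_time_ms(per_prediction_ms: float, batch_size: int) -> float:
    """Total time for one batch if the per-prediction figure is amortized."""
    return per_prediction_ms * batch_size


# Ranges copied from the list above: (best case, worst case) in ms.
reported = {"cpu": (30, 50), "gpu": (10, 20), "batch32": (3, 6)}

for name, (fast, slow) in reported.items():
    low = predictions_per_second(slow)
    high = predictions_per_second(fast)
    print(f"{name}: {low:.0f}-{high:.0f} predictions/s")

# Under the amortization assumption, one full batch of 32 takes 96-192 ms.
print(batch_wall_time_ms(3, 32), batch_wall_time_ms(6, 32))
```

If that assumption holds, batching improves throughput for offline processing, but a single live utterance is still better served by an unbatched call.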

Training Details

Training Data

Dataset: Reverb/arabic-eou-conversations
Total Examples: 11,660 (balanced 50/50 EOU/NOT_EOU)
Dialects:
- Levantine (شامي)
- Egyptian (مصري)
- Gulf (خليجي)
Split: 80% train, 10% validation, 10% test
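
For reference, the split percentages translate into the following approximate example counts. This is plain arithmetic on the numbers above; the exact counts depend on how the split was rounded and stratified, which the card does not specify.

```python
TOTAL_EXAMPLES = 11_660

train_size = TOTAL_EXAMPLES * 80 // 100
val_size = TOTAL_EXAMPLES * 10 // 100
test_size = TOTAL_EXAMPLES - train_size - val_size  # remainder goes to test

# A balanced 50/50 dataset has the same number of examples per class.
per_class = TOTAL_EXAMPLES // 2

print(train_size, val_size, test_size)  # 9328 1166 1166
print(per_class)  # 5830
```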

Training Configuration

Base Model: HuggingFaceTB/SmolLM2-135M
Parameters: 135 million
Hardware: NVIDIA RTX 4070 (8GB VRAM)
Batch Size: 32 (effective: 64 with gradient accumulation)
Learning Rate: 2e-5
Epochs: 5
Optimizer: AdamW
Mixed Precision: FP16
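
The batch settings imply a gradient accumulation factor and a rough number of optimizer updates. The estimate below assumes a single GPU, a training set of 9,328 examples (80% of 11,660), and that a final partial batch still counts as one update; none of these is stated in the configuration itself.

```python
import math

per_device_batch = 32
effective_batch = 64
epochs = 5
train_examples = 9_328  # assumption: 80% of the 11,660 examples

# Micro-batches accumulated before each optimizer step.
accumulation_steps = effective_batch // per_device_batch

# Optimizer updates per epoch and over the whole run.
updates_per_epoch = math.ceil(train_examples / effective_batch)
total_updates = updates_per_epoch * epochs

print(accumulation_steps, updates_per_epoch, total_updates)  # 2 146 730
```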

Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "Reverb/smollm2-135m-arabic-eou"
)
tokenizer = AutoTokenizer.from_pretrained(
    "Reverb/smollm2-135m-arabic-eou"
)

# Predict
text = "شو رأيك نروح نتغدا؟"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()
    confidence = probs[0][prediction].item()

print(f"EOU: {prediction == 1}, Confidence: {confidence:.3f}")
# Output: EOU: True, Confidence: 0.952
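
The softmax and argmax step in the snippet above can be separated from the model call, which makes it easy to apply a custom threshold instead of a plain argmax. The following is a dependency-free sketch: `decide_eou` and its `threshold` parameter are illustrative names rather than part of the released code, and the logits are made-up values.

```python
import math
from typing import Sequence, Tuple


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_eou(logits: Sequence[float], threshold: float = 0.5) -> Tuple[bool, float]:
    """Return (is_eou, p_eou) for logits ordered as [NOT_EOU, EOU]."""
    p_eou = softmax(logits)[1]
    return p_eou >= threshold, p_eou


# Made-up logits that favor the EOU class.
is_eou, p_eou = decide_eou([-1.5, 1.5])
print(is_eou, round(p_eou, 3))  # True 0.953
```

With the model, the same function could be fed `outputs.logits[0].tolist()`. Raising the threshold trades recall for precision, which reduces the chance of interrupting a speaker mid-turn.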

With Context

# Using previous utterance as context
context = "كيف حالك؟"
current = "الحمد لله بخير"
text_with_context = f"{context} [SEP] {current}"

inputs = tokenizer(text_with_context, return_tensors="pt", max_length=256, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    is_eou = torch.argmax(probs, dim=-1).item() == 1
    confidence = probs[0][1 if is_eou else 0].item()

print(f"EOU: {is_eou}, Confidence: {confidence:.3f}")
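
In a running conversation the context string has to be rebuilt for every new utterance. A small helper for that is sketched below. It reuses the literal ` [SEP] ` separator from the example above; `build_input` and `max_context` are hypothetical names, and since the card only shows a single previous utterance as context, longer histories may not match the training format.

```python
from typing import Sequence

SEPARATOR = " [SEP] "  # same literal separator as in the example above


def build_input(history: Sequence[str], current: str, max_context: int = 1) -> str:
    """Join the last `max_context` utterances with the current one."""
    # history[-0:] would return everything, so handle zero explicitly.
    context = list(history[-max_context:]) if max_context > 0 else []
    return SEPARATOR.join(context + [current])


history = ["كيف حالك؟"]
print(build_input(history, "الحمد لله بخير"))
```

The returned string can be passed to the tokenizer exactly as `text_with_context` is above.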

Batch Prediction

texts = [
    "شو رأيك",           # Partial - NOT_EOU
    "شو رأيك نروح نتغدا؟"  # Complete - EOU
]

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=256)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    predictions = torch.argmax(probs, dim=-1)

for text, pred, prob in zip(texts, predictions, probs):
    is_eou = pred.item() == 1
    conf = prob[pred].item()
    print(f"'{text}' → {'EOU' if is_eou else 'NOT_EOU'} ({conf:.3f})")

Intended Use

Primary Use Cases

Voice Assistants: Detect when users finish speaking
LiveKit Agents: Real-time turn detection in voice conversations
Dialogue Systems: Turn-taking in conversational AI
Transcription Systems: Add turn boundaries to speech transcripts
Conversation Analysis: Analyze turn-taking patterns

Example Applications

Real-time Voice Agent

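# Process STT transcription.
# detect_eou and generate_response are placeholders for application code:
# detect_eou wraps the model call from Quick Start and returns (is_eou, confidence).
is_eou, confidence = detect_eou(transcription)
if is_eou and confidence > 0.7:
    # User finished speaking, generate response
    agent_response = generate_response(transcription)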
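
One way to wire this check into a stream of partial transcripts is sketched below. The detector is passed in as a callable so the control flow can be exercised without loading the model. `first_complete_turn` is a hypothetical helper, and `stub_detector` is a stand-in that only looks for a question mark; it does not reflect how the model behaves.

```python
from typing import Callable, Iterable, Optional, Tuple

# A detector maps a transcript to (is_eou, confidence).
Detector = Callable[[str], Tuple[bool, float]]


def first_complete_turn(
    partials: Iterable[str],
    detect_eou: Detector,
    min_confidence: float = 0.7,
) -> Optional[str]:
    """Return the first transcript judged complete, or None if none is."""
    for text in partials:
        is_eou, confidence = detect_eou(text)
        if is_eou and confidence > min_confidence:
            return text
    return None


def stub_detector(text: str) -> Tuple[bool, float]:
    """Hypothetical stand-in for the model: a question mark ends the turn."""
    return text.endswith(("؟", "?")), 0.9


partials = ["شو", "شو رأيك", "شو رأيك نروح نتغدا؟"]
turn = first_complete_turn(partials, stub_detector)
print(turn)
```

In a real agent, `detect_eou` would wrap the tokenizer and model call from Quick Start, and the returned transcript would be handed to the response generator.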

LiveKit Integration

from livekit_eou_sdk import ArabicEOUTurnDetector

detector = ArabicEOUTurnDetector(threshold=0.7)
# `await` is only valid inside an async function, such as an agent's entrypoint coroutine
is_eou, conf = await detector.process_transcription(text, is_final=True)

Limitations

Dialect Coverage: Optimized for Levantine, Egyptian, and Gulf dialects. May not perform as well on other Arabic dialects.
Formal Arabic: Designed for conversational/colloquial Arabic. Performance on Modern Standard Arabic (MSA) or Classical Arabic may vary.
Domain: Trained on general conversational data. May require fine-tuning for specialized domains (medical, legal, etc.).
Context: Best results when using conversation context. Single utterances without context may have lower accuracy.
Spoken Language: Designed for transcribed spoken language, not written text.

Bias and Fairness

The model was trained on balanced data across three major Arabic dialects
Performance is consistent across all three dialects (Levantine, Egyptian, Gulf)
May have reduced performance on underrepresented dialects or regional variations
No demographic or gender-based biases were intentionally introduced

Model Architecture

Type: Sequence Classification (Binary)
Base: LlamaForSequenceClassification (SmolLM2-135M)
Input: Arabic text (max 256 tokens)
Output: Binary classification (0=NOT_EOU, 1=EOU)
Classes: 2 (NOT_EOU, EOU)
Model Size: ~270MB
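
The ~270MB figure is consistent with storing about 135 million parameters at 2 bytes each. A quick check follows, assuming decimal megabytes and ignoring file metadata and the small classification head. The stored precision is an inference from the arithmetic, not something stated above.

```python
PARAMS = 135_000_000  # "135 million" from Training Configuration


def weights_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in decimal megabytes."""
    return num_params * bytes_per_param / 1_000_000


print(weights_size_mb(PARAMS, 2))  # 270.0 (16-bit weights)
print(weights_size_mb(PARAMS, 4))  # 540.0 (32-bit weights)
```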

Citation

@misc{arabic-eou-detector-2025,
  author = {Reverb},
  title = {Arabic End-of-Utterance Detector},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Reverb/smollm2-135m-arabic-eou}},
  note = {Fine-tuned SmolLM2-135M for Arabic EOU detection}
}

License

MIT License

Acknowledgments

Base Model: SmolLM2-135M by Hugging Face
Framework: PyTorch, Transformers
Dataset: Arabic EOU Conversations

Contact

For questions or issues, please open an issue on the model repository.

Related Resources

Dataset: Reverb/arabic-eou-conversations
Code Repository: Available in model files
LiveKit SDK: Included for real-time integration

Model Card Version: 1.0
Last Updated: December 2025
