بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ (In the name of Allah, the Most Gracious, the Most Merciful)
Shehab 0.5B - Egyptian LLM
Model ID: Prickly-Labs/Shehab-0.5B-Instruct-v0.1a
Base Model: Qwen2.5-0.5B
Author: Ahmed Sherief under Prickly Labs
License: MIT
Access: Private, request-only (requests will not be approved for now)
Overview
Shehab 0.5B is a compact instruction-tuned large language model built to understand and generate natural Egyptian Arabic. It was developed with zero budget, using only free-tier tools (Google Gemini and Kaggle), and trained entirely on consumer-grade infrastructure. This project serves as a proof-of-concept for efficient, culturally relevant LLM development without enterprise resources.
This model is not deployed publicly, but is shared selectively in private Gradio apps and Discord environments as a portfolio showcase.
Datasets
Shehab was trained on two private datasets created by Prickly Labs:
- Prickly-Labs/1.9M-Egyptian-Corpus: used for continued pretraining
- Prickly-Labs/Shehab-230k-Instruct: used for instruction tuning
⚠️ Note: These datasets are currently private for safety and ethical reasons. They contain potentially harmful content, and releasing them without filtering could enable misuse. Public release may happen in the future, but no date is planned.
Here's a simple example of a typical interaction:
Prompt:
ازاي ازود ذكائي؟ ("How do I increase my intelligence?")
Shehab's Response:
الذكاء ده زي العضلة، كل ما تتمرن عليها بتكبر. حاول تعمل حاجات جديدة ومختلفة، حتى لو حاجة بسيطة.
("Intelligence is like a muscle: the more you train it, the more it grows. Try doing new and different things, even something simple.")
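For context, here is a minimal inference sketch showing how an approved user could reproduce an interaction like the one above with the standard transformers chat API. The sampling settings are illustrative assumptions, not recommended defaults.

```python
# Minimal inference sketch (assumes access to the private repo has been granted
# and that you are authenticated with a Hugging Face token).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Prickly-Labs/Shehab-0.5B-Instruct-v0.1a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "How do I increase my intelligence?" in Egyptian Arabic
messages = [{"role": "user", "content": "ازاي ازود ذكائي؟"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```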
Training Details
Base Training (Continued Pretraining)
- Objective: continue pretraining Qwen2.5-0.5B on a rich, culturally rooted Egyptian corpus
- Dataset: 1.94 million Egyptian Arabic samples
- Epochs: 1
- Batch size: 24
- Gradient Accumulation: 32
- Learning Rate: 1e-4
- Kernel: Liger
- Trainer: sft_trainer with FSDP (full_shard)
- Compute: Kaggle (free T4 GPU)
- Training Time: ~44.5 hours
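The sketch below is a rough reconstruction of this stage using TRL's SFTTrainer. The corpus column name, fp16 precision, and output path are assumptions (and the exact config fields depend on the TRL/transformers versions), since the actual training scripts are not published.

```python
# Hypothetical reconstruction of the continued-pretraining stage described above.
# The "text" column name and fp16 precision are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

corpus = load_dataset("Prickly-Labs/1.9M-Egyptian-Corpus", split="train")  # private dataset

config = SFTConfig(
    output_dir="shehab-0.5b-base",
    num_train_epochs=1,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=32,
    learning_rate=1e-4,
    fp16=True,                 # T4 GPUs support fp16, not bf16
    use_liger_kernel=True,     # Liger kernel (needs a recent transformers release)
    fsdp="full_shard",         # shard the model across Kaggle's free GPUs
    dataset_text_field="text", # assumed column name in the private corpus
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    args=config,
    train_dataset=corpus,
)
trainer.train()
```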
Instruction Tuning
- Dataset: 230k Egyptian instruction-response pairs
- Split: 95% train / 5% test
- Epochs: 1
- Batch size: 16
- Gradient Accumulation: 32
- Learning Rate: 1e-5
- Trainer: sft_trainer with FSDP (full_shard)
- Training Time: ~5 hours
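As a rough illustration of the instruction-tuning data flow, the sketch below applies the Qwen2.5 chat template to instruction-response pairs and carves out the 5% test split. The column names are assumptions, since the dataset is private and its schema is not published.

```python
# Hypothetical preprocessing for the instruction-tuning stage. Column names
# ("instruction" / "response") are assumed, not confirmed.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
pairs = load_dataset("Prickly-Labs/Shehab-230k-Instruct", split="train")

def to_chat_text(example):
    # Render each pair as a single chat-formatted training string
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["response"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

pairs = pairs.map(to_chat_text)
splits = pairs.train_test_split(test_size=0.05, seed=42)  # 95% train / 5% test
train_ds, eval_ds = splits["train"], splits["test"]
```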
Notes
- This model is not a fine-tune in the traditional sense: it was built through continued pretraining followed by instruction tuning.
- It demonstrates what is possible with zero funding, creative workflows, and deep cultural intention.
- Built entirely under Prickly Labs, a grassroots Arabic AI research initiative.
Dataset Disclosure
Both datasets used in training are currently private, and this is intentional. They contain potentially harmful, emotionally heavy, or offensive content due to the data collection strategy, which prioritizes realism, rawness, and cultural relevance.
While there are plans to eventually clean and publish parts of these datasets for academic and community benefit, this will not happen soon and will depend on resources and proper curation.
🚫 Access & Requests
The model is currently private and request-only on the HuggingFace Hub.
However, requests will not be approved at this stage.
If you're a trusted collaborator, you may be granted private access via Discord bots or custom Gradio apps.
This restricted setup ensures:
- Ethical oversight
- Controlled feedback loops during experimentation
- No misuse while the model is still under refinement
If you're truly interested in using or studying the model, you can still:
- Click the "Request Access" button on the HuggingFace model page and briefly explain why.
- If approved, you'll receive an email with access and/or collaboration options.
Note: Access is being filtered strictly for now.
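If a request is ever approved, the private repo loads like any other gated Hugging Face model once you authenticate with an access token. A minimal sketch using the standard huggingface_hub and transformers calls:

```python
# Authenticate once (or run `huggingface-cli login` in a terminal),
# then load the private repo as usual.
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

login()  # prompts for your Hugging Face access token

model_id = "Prickly-Labs/Shehab-0.5B-Instruct-v0.1a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```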
Want to Contribute or Learn?
If you're genuinely interested in:
- Understanding how this model was built
- Learning the methods I used for emotional tuning in Arabic
- Collaborating on future experiments in low-resource model training
Feel free to request access and mention your interest; I'm always happy to teach serious learners or collaborate with passionate builders.
❤️ Special Thanks
Thanks to my close friends who supported the development of this project with feedback, patience, and moral fuel.
This is only the beginning. Prickly-Labs/Shehab-0.5B-Instruct-v0.1a is the base for future, larger models that may eventually be open and public, built with the same spirit: local, emotionally aware, and culturally fluent AI for Arabs.
Ahmed Sherief
Founder, Prickly Labs 🌵