

بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ

Shehab 0.5B – Egyptian LLM

Model ID: Prickly-Labs/Shehab-0.5B-Instruct-v0.1a
Base Model: Qwen2.5-0.5B
Author: Ahmed Sherief under Prickly Labs
License: MIT
Access: Private – request-only (requests will not be approved for now)


🧠 Overview

Shehab 0.5B is a compact instruction-tuned large language model built to understand and generate natural Egyptian Arabic. It was developed with zero budget, using only free-tier tools (Google Gemini and Kaggle), and trained entirely on consumer-grade infrastructure. This project serves as a proof-of-concept for efficient, culturally relevant LLM development without enterprise resources.

This model is not deployed publicly, but is shared selectively in private Gradio apps and Discord environments as a portfolio showcase.


📚 Datasets

Shehab was trained on two private datasets created by Prickly Labs:

  • Prickly-Labs/1.9M-Egyptian-Corpus – used for continued pretraining
  • Prickly-Labs/Shehab-230k-Instruct – used for instruction tuning

⚠️ Note: These datasets are currently private for safety and ethical reasons. They contain potentially harmful content, and releasing them without filtering could cause misuse. Public release may happen in the future, but no date is planned.

Here's a simple example of a typical interaction:

Prompt:
ازاي ازود ذكائي؟
("How do I increase my intelligence?")

Shehab's Response:
الذكاء ده زي العضلة، كل ما تتمرن عليها بتكبر. حاول تعمل حاجات جديدة ومختلفة، حتى لو حاجة بسيطة.
("Intelligence is like a muscle: the more you exercise it, the more it grows. Try doing new and different things, even something simple.")
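
As a minimal sketch only: an interaction like the one above could be reproduced with the transformers library, assuming you have been granted access to the gated repository and are authenticated. The chat-template call follows the Qwen2 convention inherited from the base model; the sampling settings are illustrative, not values recommended on this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Prickly-Labs/Shehab-0.5B-Instruct-v0.1a"

# Loading the gated repo assumes access has been granted and you are logged in.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Render a single user turn with the model's chat template (Qwen2-style).
messages = [{"role": "user", "content": "ازاي ازود ذكائي؟"}]  # "How do I increase my intelligence?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling values below are illustrative defaults, not settings published on this card.
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```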


🔧 Training Details

Base Training (Continued Pretraining)

  • Objective: Continue pretraining Qwen2.5-0.5B on a rich, culturally rooted Egyptian corpus
  • Dataset: 1.94 million Egyptian Arabic samples
  • Epochs: 1
  • Batch size: 24
  • Gradient Accumulation: 32
  • Learning Rate: 1e-4
  • Kernel: Liger
  • Trainer: sft_trainer with FSDP (full_shard); a configuration sketch follows this list
  • Compute: Kaggle (free T4 GPU)
  • Training Time: ~44.5 hours
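
The exact training script is not published; purely as a non-authoritative sketch, the hyperparameters above could map onto TRL's SFTTrainer roughly as follows. The output directory, the dataset text column, and loading the private corpus by name are assumptions, not details from this card.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The corpus is private; this load call is illustrative only.
corpus = load_dataset("Prickly-Labs/1.9M-Egyptian-Corpus", split="train")

args = SFTConfig(
    output_dir="shehab-0.5b-cpt",   # assumed name
    num_train_epochs=1,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=32,
    learning_rate=1e-4,
    use_liger_kernel=True,          # Liger kernel, as listed above
    fsdp="full_shard",              # FSDP full_shard, as listed above
    dataset_text_field="text",      # assumed column name
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",      # continued pretraining starts from the base model
    args=args,
    train_dataset=corpus,
)
trainer.train()
```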

Instruction Tuning

  • Dataset: 230k Egyptian instruction-response pairs
  • Split: 95% train / 5% test
  • Epochs: 1
  • Batch size: 16
  • Gradient Accumulation: 32
  • Learning Rate: 1e-5
  • Trainer: sft_trainer with FSDP (full_shard); see the sketch after this list
  • Training Time: ~5 hours
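
Likewise only a sketch: the 95/5 split and the settings above might be wired up as below, continuing from the hypothetical checkpoint of the previous stage (the dataset schema and the split seed are assumptions).

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical load: the instruct dataset is private and its schema is not documented here.
instruct = load_dataset("Prickly-Labs/Shehab-230k-Instruct", split="train")
splits = instruct.train_test_split(test_size=0.05, seed=42)  # 95% train / 5% test; seed assumed

args = SFTConfig(
    output_dir="shehab-0.5b-instruct",  # assumed name
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=32,
    learning_rate=1e-5,
    fsdp="full_shard",
)

trainer = SFTTrainer(
    model="shehab-0.5b-cpt",            # the continued-pretraining checkpoint from the previous sketch
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```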

📌 Notes

  • This model is not a fine-tune in the traditional sense: it was produced through continued pretraining, followed by instruction tuning.
  • It demonstrates what is possible with zero funding, creative workflows, and deep cultural intention.
  • Built entirely under Prickly Labs, a grassroots Arabic AI research initiative.

🔒 Dataset Disclosure

Both datasets used in training are currently private, and this is intentional. They contain potentially harmful, emotionally heavy, or offensive content because the data-collection strategy prioritizes realism, rawness, and cultural relevance.

While there are plans to eventually clean and publish parts of these datasets for academic and community benefit, this will not happen soon and will depend on resources and proper curation.


🚫 Access & Requests

The model is currently private and request-only on the HuggingFace Hub.
However, requests will not be approved at this stage.

If you're a trusted collaborator, you may be granted private access via Discord bots or custom Gradio apps.

This restricted setup ensures:

  • Ethical oversight
  • Controlled feedback loops during experimentation
  • No misuse while the model is still under refinement

If you're truly interested in using or studying the model, you can still:

  1. Click the "Request Access" button on the HuggingFace model page and briefly explain why.
  2. If approved, you'll receive an email with access and/or collaboration options.

Note: Access is being filtered strictly for now.
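
For completeness: if access is ever granted, the repository is gated, so downloads must be authenticated. A minimal sketch using huggingface_hub (the token value is a placeholder):

```python
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a read token from your Hugging Face account settings; "hf_xxx" is a placeholder.
login(token="hf_xxx")

model_id = "Prickly-Labs/Shehab-0.5B-Instruct-v0.1a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```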


🙋 Want to Contribute or Learn?

If you're genuinely interested in:

  • Understanding how this model was built
  • Learning the methods I used for emotional tuning in Arabic
  • Collaborating on future experiments in low-resource model training

Feel free to request access and mention your interest; I'm always happy to teach serious learners or collaborate with passionate builders.


❤️ Special Thanks

Thanks to my close friends who supported the development of this project with feedback, patience, and moral fuel.
This is only the beginning. Prickly-Labs/Shehab-0.5B-Instruct-v0.1a is the base for future, larger models that may eventually be open and public, built with the same spirit: local, emotionally aware, and culturally fluent AI for Arabs.


Ahmed Sherief
Founder, Prickly Labs 🌵
