---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- qwen3
- code
- math
- reasoning
- medical
- instruction
- if
datasets:
- nvidia/OpenCodeReasoning
- unsloth/OpenMathReasoning-mini
- patrickfleith/instruction-freak-reasoning
- FreedomIntelligence/medical-o1-reasoning-SFT
- Malikeh1375/medical-question-answering-datasets
- Myashka/SO-Python_QA-filtered-2023-no_code-tanh_score
- ArdentTJ/t1_daily_conversations
---

"*We are all experts at something, but we’re all also beginners at something else.*"

— *The Imitation Game (2014)*

# Arcana Qwen3 2.4B A0.6B

This is a Mixture of Experts (MoE) Qwen3 model with 2.4B total parameters, consisting of 4 experts of 0.6B parameters each. All of the expert models are listed below.

This model aims to provide more accurate results with higher efficiency and lower memory usage.
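
As a quick check on the name (2.4B total, A0.6B active), the arithmetic below assumes exactly one 0.6B expert handles each prompt; the per-expert parameter count is approximate.

```py
# Approximate parameter budget of the prompt-routed setup described above.
# Assumption: a single 0.6B expert (selected by the router) runs per prompt.
n_experts = 4
params_per_expert = 0.6e9

total_params = n_experts * params_per_expert  # ~2.4B parameters stored
active_params = params_per_expert             # ~0.6B parameters active per prompt

print(f"total ~ {total_params / 1e9:.1f}B, active per prompt ~ {active_params / 1e9:.1f}B")
```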

## Expert Models:

### Key Training Parameters (SFTConfig)

The fine-tuned experts below were trained with the following SFT configuration (an illustrative trainer sketch follows the list):

* `per_device_train_batch_size = 2`
* `gradient_accumulation_steps = 4`
* `warmup_steps = 5`
* `num_train_epochs = 1`
* `learning_rate = 2e-5`
* `optim = "adamw_8bit"`
* `weight_decay = 0.01`
* `seed = 3407`
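
The snippet below is a minimal, hedged sketch of how these settings map onto a `trl` `SFTConfig`/`SFTTrainer` run, not the exact training script. The `output_dir`, the placeholder `"text"` column, and `bf16=True` (matching the BF16 fine-tuning mentioned below) are illustrative assumptions.

```py
# Minimal sketch: the SFT settings listed above plugged into trl's SFTConfig/SFTTrainer.
# Assumptions: the training data has been pre-formatted into a "text" column and the
# base model is Qwen/Qwen3-0.6B; output_dir is a placeholder. optim="adamw_8bit"
# requires bitsandbytes to be installed.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; replace with one of the expert datasets listed below
train_dataset = Dataset.from_dict({"text": ["<formatted training example>"]})

training_args = SFTConfig(
    output_dir="qwen3-0.6b-expert",  # placeholder
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs=1,
    learning_rate=2e-5,
    optim="adamw_8bit",
    weight_decay=0.01,
    seed=3407,
    bf16=True,  # experts were fully fine-tuned in BF16
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # trl loads the model and its tokenizer from the Hub
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```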

### Coding:

[suayptalha/Qwen3-0.6B-Code-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Code-Expert)

This model was fully fine-tuned in BF16 on the first 20k rows of the `nvidia/OpenCodeReasoning` dataset for 1 epoch.
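
For the "first 20k rows" selection, the sketch below uses the `datasets` slicing API. The `"split_0"` configuration/split names are an assumption about how `nvidia/OpenCodeReasoning` is laid out (check the dataset card for the actual names); the same slicing idea applies to the medical expert's 20k-row subset.

```py
# Minimal sketch of taking the first 20k rows via split slicing.
# NOTE: "split_0" is an assumed config/split name for nvidia/OpenCodeReasoning;
# check the dataset card for the actual layout.
from datasets import load_dataset

code_subset = load_dataset(
    "nvidia/OpenCodeReasoning",
    "split_0",                # assumed config name
    split="split_0[:20000]",  # slice syntax: first 20,000 rows
)
print(code_subset)
```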

### Math:

[suayptalha/Qwen3-0.6B-Math-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Math-Expert)

This model was fully fine-tuned in BF16 on the entire `unsloth/OpenMathReasoning-mini` dataset for 1 epoch.

### Medical:

[suayptalha/Qwen3-0.6B-Medical-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Medical-Expert)

This model was fully fine-tuned in BF16 on the first 20k rows of the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset for 1 epoch.

### Instruction Following:

[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

The base `Qwen/Qwen3-0.6B` model was used directly for this expert; no fine-tuning was applied.

## Router Model:

The router model can be found [here](https://huggingface.co/suayptalha/MoE-Router-v2). It is a fine-tuned version of `distilbert/distilbert-base-uncased` trained on 7 different datasets to decide which expert should handle a given prompt.
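
The router can also be queried on its own. The sketch below assumes the checkpoint loads as a standard DistilBERT sequence-classification model; the label names come from whatever `id2label` mapping is stored in its config and are not documented here.

```py
# Minimal sketch: ask the router which expert it would pick for a prompt.
# Assumption: suayptalha/MoE-Router-v2 is a standard sequence-classification
# checkpoint; its id2label mapping defines the expert labels.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

router_name = "suayptalha/MoE-Router-v2"
router_tokenizer = AutoTokenizer.from_pretrained(router_name)
router = AutoModelForSequenceClassification.from_pretrained(router_name)
router.eval()

prompt = "Write a Python function that checks whether a number is prime."
inputs = router_tokenizer(prompt, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = router(**inputs).logits

expert_id = logits.argmax(dim=-1).item()
print(router.config.id2label.get(expert_id, str(expert_id)))
```

The packaged checkpoint in the Usage section below is expected to run this routing step internally through its custom modeling code, so calling the router manually is optional.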

## Usage:

```py
import torch
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the full model repository locally
local_dir = snapshot_download(
    repo_id="suayptalha/Qwen3-2.4B-A0.6B",
)

# trust_remote_code is required because the routing logic lives in the repo's custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    local_dir,
)

model.to(device)
model.eval()

prompt = "I have pain in my chest, what should I do?"
messages = [{"role": "user", "content": prompt}]

# Format the conversation with the chat template; generation runs on the raw text below
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

with torch.no_grad():
    # The custom generate() exposed by the remote code takes the formatted prompt text directly
    output_ids = model.generate(
        text=prompt,
        max_new_tokens=1024,
        temperature=0.6,
        top_p=0.95,
    )

output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```

## License:

This project is licensed under the Apache License 2.0. See the [LICENSE](./LICENSE) file for details.

## Support:

<a href="https://www.buymeacoffee.com/suayptalha" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>