README.md · suayptalha/Arcana-Qwen3-2.4B-A0.6B at main



	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen3-0.6B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- moe
	- qwen3
	- code
	- math
	- reasoning
	- medical
	- instruction
	- if
	datasets:
	- nvidia/OpenCodeReasoning
	- unsloth/OpenMathReasoning-mini
	- patrickfleith/instruction-freak-reasoning
	- FreedomIntelligence/medical-o1-reasoning-SFT
	- Malikeh1375/medical-question-answering-datasets
	- Myashka/SO-Python_QA-filtered-2023-no_code-tanh_score
	- ArdentTJ/t1_daily_conversations
	---

	![The Imitation Game](qwen3-moe.jpg)

	"We are all experts at something, but we’re all also beginners at something else."

	— The Imitation Game (2014)

	# Arcana Qwen3 2.4B A0.6B

	This is a Qwen3 MoE (Mixture of Experts) model with 2.4B total parameters, made up of 4 experts with 0.6B parameters each. All the expert models are listed below.
	The model aims to give more accurate results with better efficiency and lower memory usage.

	## Expert Models:

	### Key Training Parameters (SFTConfig)

	* `per_device_train_batch_size = 2`
	* `gradient_accumulation_steps = 4`
	* `warmup_steps = 5`
	* `num_train_epochs = 1`
	* `learning_rate = 2e-5`
	* `optim = "adamw_8bit"`
	* `weight_decay = 0.01`
	* `seed = 3407`
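To make the interaction between the batch settings concrete, here is a minimal sketch that collects the values listed above in a plain dictionary and computes the effective batch size. The helper `effective_batch_size` and the `num_devices` argument are hypothetical names added for illustration; the README does not state how many devices were used, so one device is assumed.

```python
# Hyperparameters copied from the list above.
sft_params = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 5,
    "num_train_epochs": 1,
    "learning_rate": 2e-5,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "seed": 3407,
}


def effective_batch_size(params, num_devices=1):
    """Samples consumed per optimizer step (hypothetical helper).

    num_devices is an assumption; the README does not state the device count.
    """
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(sft_params))  # 8 on a single device
```

With `trl` installed, a dictionary like this could presumably be unpacked into `SFTConfig(**sft_params)`, since each key is a standard `TrainingArguments` field, but the exact training script is not shown here.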

	### Coding:
	[suayptalha/Qwen3-0.6B-Code-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Code-Expert)

	This model was fully fine-tuned in BF16 on the first 20k rows of the `nvidia/OpenCodeReasoning` dataset for 1 epoch.

	### Math:
	[suayptalha/Qwen3-0.6B-Math-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Math-Expert)

	This model was fully fine-tuned in BF16 on the entire `unsloth/OpenMathReasoning-mini` dataset for 1 epoch.

	### Medical:
	[suayptalha/Qwen3-0.6B-Medical-Expert](https://huggingface.co/suayptalha/Qwen3-0.6B-Medical-Expert)

	This model was fully fine-tuned in BF16 on the first 20k rows of the `FreedomIntelligence/medical-o1-reasoning-SFT` dataset for 1 epoch.

	### Instruction Following:
	[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

	The `Qwen/Qwen3-0.6B` model was used directly for this expert; no fine-tuning was applied.

	## Router Model:
	The router model is available [here](https://huggingface.co/suayptalha/MoE-Router-v2). It is a version of `distilbert/distilbert-base-uncased` trained on 7 different datasets.
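The sketch below illustrates how a classifier-style router could dispatch a prompt to one of the four experts. It assumes top-1, prompt-level routing, which the "A0.6B" in the model name suggests but this README does not spell out. The label names, their order, and the example logits are all hypothetical; the real router's label mapping lives in the model's remote code.

```python
import math

# Hypothetical label -> expert mapping; repository names are taken from this README,
# but the label names and their order are assumptions.
EXPERTS = {
    "code": "suayptalha/Qwen3-0.6B-Code-Expert",
    "math": "suayptalha/Qwen3-0.6B-Math-Expert",
    "medical": "suayptalha/Qwen3-0.6B-Medical-Expert",
    "instruction": "Qwen/Qwen3-0.6B",
}
LABELS = list(EXPERTS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits):
    """Return (label, expert repo, probability) for the highest-scoring expert."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per expert")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[best]
    return label, EXPERTS[label], probs[best]


# Made-up router logits for a medical-sounding prompt.
label, repo, confidence = route([0.2, -1.0, 3.1, 0.5])
print(label, repo, round(confidence, 3))
```

Under this assumption only the selected expert runs for a given prompt, so the active parameter count per request stays at 0.6B even though all four experts together hold 2.4B.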

	## Usage:
	```py
	import torch
	from huggingface_hub import snapshot_download
	from transformers import AutoModelForCausalLM, AutoTokenizer

	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# Download the full repository, including the custom MoE modeling code.
	local_dir = snapshot_download(
	    repo_id="suayptalha/Arcana-Qwen3-2.4B-A0.6B",
	)

	# trust_remote_code=True is required because the model class is defined in the repository.
	model = AutoModelForCausalLM.from_pretrained(
	    local_dir,
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(
	    local_dir,
	)

	model.to(device)
	model.eval()

	prompt = "I have pain in my chest, what should I do?"
	messages = [{"role": "user", "content": prompt}]

	# Render the chat messages into a single prompt string.
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	with torch.no_grad():
	    # The custom generate() takes the raw prompt text.
	    output_ids = model.generate(
	        text=prompt,
	        max_new_tokens=1024,
	        temperature=0.6,
	        top_p=0.95,
	    )

	output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
	print(output_text)
	```

	## License:

	This project is licensed under the Apache License 2.0. See the [LICENSE](./LICENSE) file for details.

	## Support:

	<a href="https://www.buymeacoffee.com/suayptalha" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>