Model Overview

RefinedNeuro/RN_TR_R2 is an 8.03B-parameter Turkish-language reasoning model fine-tuned from ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 with the GRPO method on Q&A data distilled from Qwen3. It excels at open-ended reasoning tasks in Turkish, particularly STEM, history, and cultural questions.


Evaluation


Benchmark: RN_TR_R2_Benchmark_Results (Open-ended Turkish culture & reasoning questions)

| Model Name | Score (%) | Description |
|---|---|---|
| RefinedNeuro/RN_TR_R2 | 82.4 | Upgraded from RN_TR_R1 via GRPO on distilled Qwen3 data |
| Qwen3-8B | 76.8 | Off-the-shelf Qwen3-8B |
| RefinedNeuro/RN_TR_R1 | 64.8 | Baseline Turkish-fine-tuned model |
| ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 63.2 | Turkish-adapted Llama-8B via direct preference optimization |
| Meta-Llama-3.1-8B-Instruct | 63.2 | Meta's Llama-3.1-8B with instruct tuning |

RN_TR_R2 achieves a 17.6 pp improvement over the RN_TR_R1 baseline.
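The percentage-point deltas follow directly from the benchmark scores in the table:

```python
# Benchmark scores (%) from the table above
scores = {
    "RN_TR_R2": 82.4,
    "Qwen3-8B": 76.8,
    "RN_TR_R1": 64.8,
    "Turkish-Llama-8b-DPO-v0.1": 63.2,
    "Meta-Llama-3.1-8B-Instruct": 63.2,
}

# Improvement of RN_TR_R2 over its RN_TR_R1 baseline, in percentage points
delta = round(scores["RN_TR_R2"] - scores["RN_TR_R1"], 1)
print(delta)  # → 17.6
```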


Intended Use

  • Primary use cases: Question answering, step-by-step reasoning, educational tools in Turkish (math, physics, chemistry, biology, geometry, trigonometry, statistics, history, culture).
  • Not recommended: Generating creative fiction or tasks requiring memorized facts outside its training scope.
  • Known limitations: No systematic failure modes have been identified, but like all LLMs, RN_TR_R2 may hallucinate.

Training Data

  • Base model: ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1

  • Fine-tuning dataset: RefinedNeuro/Qwen3-Reasoning-Distill-Q-A-Dataset

    • Domains: Mathematics, Physics, Chemistry, Biology, Geometry, Trigonometry, Statistics
    • Grade levels: 6th–12th grade
    • Prompt variations (e.g., long/detailed vs. short/easy; answer must be a single number)
    • Size: 13,000 examples (~9 million tokens)
    • License: Creative Commons
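The prompt variations described above can be sketched with a small template helper. The templates below are hypothetical illustrations of the long/detailed, short/easy, and single-number styles, not the dataset's actual wording:

```python
# Hypothetical sketch of the dataset's prompt-variation styles.
def build_prompt(question: str, style: str) -> str:
    """Wrap a question in one of three illustrative instruction styles."""
    templates = {
        # long/detailed: ask for a step-by-step worked solution
        "long": f"{question}\nAdım adım, ayrıntılı şekilde çöz.",
        # short/easy: ask for a brief answer
        "short": f"{question}\nKısaca cevapla.",
        # numeric: the answer must be a single number
        "numeric": f"{question}\nCevap tek bir sayı olmalı.",
    }
    return templates[style]

print(build_prompt("12 ile 7'nin toplamı kaçtır?", "numeric"))
```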

Training Configuration

  • Method: GRPO (Group Relative Policy Optimization) via unsloth
  • Hardware: 2× RTX 3090 GPUs for 1 week (~336 GPU-hours)
  • Batch size: 1
  • Context length: 4,096 tokens during training (max_seq_length=8192 at inference)
  • Optimizer & scheduler: cosine LR schedule, peak LR = 5×10⁻⁶
  • Library: unsloth==2025.3.18
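GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than using a learned value function: the reward of each sample is normalized by the mean and standard deviation of its group. A minimal sketch of that advantage computation (function name illustrative):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against its group:
    A_i = (r_i - mean(group)) / (std(group) + eps)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt: two correct (reward 1), two not (reward 0)
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions better than their group average get positive advantages and are reinforced; worse-than-average ones are pushed down.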

Usage Example

```bash
pip install unsloth==2025.3.18
```

```python
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Load the model and tokenizer (4-bit quantized to fit on a single consumer GPU)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="RefinedNeuro/RN_TR_R2",
    max_seq_length=8192,
    load_in_4bit=True,
)

# Switch to optimized inference mode
FastLanguageModel.for_inference(model)

# System prompt: think step by step inside <think></think> tags,
# summarize the reasoning, then give the final answer inside \boxed{}.
SYSTEM_PROMPT = """
Sana sorulan sorulara cevap verirken, adım adım düşün ve gerekçelendir.
Önce <think></think> tagları arasında adım adım düşün.
Sonra düşünme adımlarını özetle.
En son \\boxed{} içerisinde sorunun cevabını ver.
"""

def answer_question(question: str):
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user',   'content': question},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    # Stream tokens to stdout as they are generated
    streamer = TextStreamer(tokenizer)
    print("-" * 60)
    _ = model.generate(
        input_ids=inputs,
        streamer=streamer,
        max_new_tokens=2048,
        temperature=0.6,
        top_p=0.1,
    )

# Examples
answer_question("Türkiye'nin başkenti neresidir?")  # "What is the capital of Turkey?"
answer_question("6 ile 2'nin çarpımı nedir?")       # "What is the product of 6 and 2?"
```
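Because the system prompt instructs the model to place its final answer inside `\boxed{}`, the answer can be recovered from the generated text with a simple regex. A minimal sketch (the helper name is illustrative):

```python
import re

def extract_boxed_answer(text: str):
    """Return the payload of the last \\boxed{...} in the model output,
    or None if no boxed answer is present."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

sample_output = "<think>6 x 2 = 12</think>\nÖzet: çarpım 12'dir.\n\\boxed{12}"
print(extract_boxed_answer(sample_output))  # → 12
```

Taking the last match skips any intermediate `\boxed{}` the model may emit inside its `<think>` block.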

Sample Q&A

> **Q:** f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulunuz.
>
> **A:** <think>

1. Adım: f(x) = 3x⁴ - 2x³ + 5x - 7 fonksiyonunun x = 1 noktasındaki türevini bulmak için önce türev fonksiyonu f'(x)'i bulmamız gerekmektedir.

2. Adım: Türevi bulmak için fonksiyonu parçalara ayırarak analiz edeceğiz.

f(x) = 3x⁴ - 2x³ + 5x - 7

3. Adım: Her terimin türevini ayrı ayrı buluyoruz.

- 3x⁴: Türevi 12x³'tür.
- -2x³: Türevi -6x²'dir.
- 5x: Türevi 5'tir.
- -7: Türevi 0'dır (çünkü sabit bir sayıdır).

4. Adım: Elde edilen türevleri birleştiriyoruz.

f'(x) = 12x³ - 6x² + 5

5. Adım: Şimdi x = 1 yerine koyarak f'(1) değerini hesaplayacağız.

f'(1) = 12(1)³ - 6(1)² + 5

f'(1) = 12 - 6 + 5

f'(1) = 11

6. Adım: x = 1 noktasındaki türevi 11 olarak bulduk.

\boxed{11}

</think>

\boxed{11}
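The boxed result can be checked numerically with a central finite difference, which agrees with the model's symbolic answer f'(1) = 11:

```python
def f(x):
    """The polynomial from the sample question."""
    return 3 * x**4 - 2 * x**3 + 5 * x - 7

def numeric_derivative(func, x, h=1e-6):
    """Central difference approximation; error is O(h^2)."""
    return (func(x + h) - func(x - h)) / (2 * h)

print(round(numeric_derivative(f, 1.0), 6))  # → 11.0
```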

Citation

If you use this model, please cite:

@misc{farsak2025rn_tr_r2,
  title        = {RefinedNeuro/RN\_TR\_R2: A Turkish Reasoning Model},
  author       = {Mehmet Can Farsak and Serhat Atayeter and RefinedNeuro},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/RefinedNeuro/RN_TR_R2}},
  license      = {Apache-2.0}
}

License

Apache 2.0, a permissive open-source license.


Authors

  • Authors: Mehmet Can Farsak, Serhat Atayeter, RefinedNeuro