---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- moderately abliterated variant
---

# **Qwen3-0.6B-ft-bf16**

> **Qwen3-0.6B-ft-bf16** is a fine-tuned, moderately abliterated variant based on **Qwen3-0.6B**, part of Qwen3, the latest generation of large language models in the Qwen series. Abliteration reduces a model's tendency to refuse requests; here it is applied only moderately. The fine-tune emphasizes **improved context awareness** and **balanced behavioral flexibility** across a wide range of natural language tasks, while retaining the core strengths of Qwen3: instruction following, multilingual understanding, and strong reasoning.

## Key Highlights

- **Improved Context Awareness**: Better at maintaining and using long-range conversational context, which is particularly useful for multi-turn dialogue, summarization, and document-based reasoning.
- **Moderate Abliteration**: Relaxes the base model's behavioral constraints to a moderate degree, aiming for more dynamic and expressive responses while preserving alignment and safety.
- **Thinking Mode Support**: Switches between a deep-reasoning (thinking) mode and a lightweight conversational (non-thinking) mode, controlled by the `enable_thinking` argument of `apply_chat_template`.
- **Multilingual Proficiency**: Supports 100+ languages and dialects for translation and multilingual instruction following.
- **Instruction and Agent Alignment**: Performs well at instruction following, tool integration, and agent-based interaction with external environments.
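
The base Qwen3 chat template also documents `/think` and `/no_think` soft switches: when `enable_thinking=True`, adding one of these tags to a user or system message switches the mode turn by turn, and the model follows the most recent tag; with `enable_thinking=False` the tags are ignored. A minimal sketch, assuming this fine-tune keeps the base Qwen3 template, of a hypothetical client-side helper `effective_thinking_mode` that applies that rule to a message list (for example, to choose the matching sampling preset before generating). It only predicts the mode; the model itself decides.

```python
from typing import Dict, List


def effective_thinking_mode(messages: List[Dict[str, str]], enable_thinking: bool = True) -> bool:
    """Predict whether the next reply uses thinking mode.

    Hypothetical helper mirroring the documented base Qwen3 rule: with
    enable_thinking=True, the most recent /think or /no_think tag in a user
    or system message wins; with enable_thinking=False, tags are ignored.
    """
    if not enable_thinking:
        return False
    # Walk the conversation from the newest message to the oldest.
    for message in reversed(messages):
        if message["role"] not in ("user", "system"):
            continue
        content = message["content"]
        think_pos = content.rfind("/think")
        no_think_pos = content.rfind("/no_think")
        if think_pos == -1 and no_think_pos == -1:
            continue
        # Within a single message, whichever tag appears last wins.
        return think_pos > no_think_pos
    # No tag anywhere: the template default (thinking on) applies.
    return True


history = [
    {"role": "user", "content": "Summarize orbital mechanics. /no_think"},
    {"role": "assistant", "content": "Orbits balance gravity and velocity."},
    {"role": "user", "content": "Now derive the escape velocity. /think"},
]
print(effective_thinking_mode(history))  # True: the latest tag is /think
```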

---

## Quickstart with 🤗 Transformers

```bash
pip install transformers==4.51.3
pip install torch accelerate  # device_map="auto" requires accelerate
pip install "huggingface_hub[hf_xet]"
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen3-0.6B-ft-bf16"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",  # load weights in the dtype stored in the checkpoint
    device_map="auto"    # place layers on available devices (needs `accelerate`)
)

# Define prompt and apply chat template
prompt = "How does a rocket reach escape velocity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # set to False for non-thinking mode
)

# Tokenize input
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# Keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Optional: separate thinking content from the final answer
try:
    # Position just after the last </think> token (ID 151668)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: treat everything as the answer
    index = 0
thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
print("thinking content:", thinking_content)
print("content:", content)
```
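
The separation step in the Quickstart can be factored into a small function that is easy to test and reuse across calls. A minimal sketch; the name `split_thinking` is hypothetical, and it keeps the Quickstart's assumption that token ID 151668 is `</think>`.

```python
from typing import List, Tuple

# Token ID of </think>, as used in the Quickstart above.
THINK_END_ID = 151668


def split_thinking(output_ids: List[int], think_end_id: int = THINK_END_ID) -> Tuple[List[int], List[int]]:
    """Split generated token IDs into (thinking, answer) at the LAST </think>.

    The </think> token itself stays with the thinking part. If no </think>
    is present, the whole output is treated as the answer.
    """
    try:
        # Searching the reversed list finds the last occurrence.
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]
```

Each half can then be decoded with `tokenizer.decode(..., skip_special_tokens=True)` exactly as in the Quickstart.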
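
---

## Recommended Settings

- **Sampling (thinking mode)**:
  - `temperature=0.6`, `top_p=0.95`, `top_k=20`, `min_p=0.0`
  - Use sampling (`do_sample=True`); the base Qwen3 guidance advises against greedy decoding in thinking mode, which can cause endless repetition.
- **Sampling (non-thinking mode)**:
  - `temperature=0.7`, `top_p=0.8`, `top_k=20`, `min_p=0.0`
- **Max new tokens**:
  - General: `32768`
  - Highly complex problems (e.g. competition-level math or programming): `38912`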
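
The recommended values can be bundled into keyword arguments for `model.generate`. A minimal sketch; `generation_kwargs` and its `complex_problem` flag are hypothetical names, and `do_sample=True` is set explicitly because transformers ignores `temperature`, `top_p`, `top_k`, and `min_p` under greedy decoding.

```python
from typing import Any, Dict

# Values copied from the Recommended Settings list above.
THINKING_SAMPLING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
NON_THINKING_SAMPLING = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}


def generation_kwargs(thinking: bool, complex_problem: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for model.generate() from the recommended settings.

    Returns a fresh dict on every call, so callers can modify it safely.
    """
    preset = THINKING_SAMPLING if thinking else NON_THINKING_SAMPLING
    return {
        "do_sample": True,  # sampling parameters only take effect when sampling
        **preset,
        "max_new_tokens": 38912 if complex_problem else 32768,
    }
```

With this helper, `model.generate(**model_inputs, **generation_kwargs(thinking=True))` would replace the bare `max_new_tokens=32768` call in the Quickstart; pass `thinking=False` when the chat template was rendered with `enable_thinking=False`.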

---

## Prompting Tips

- **Math**:
  Add to the prompt: *"Please reason step by step, and put your final answer within \boxed{}."*
- **MCQs**:
  Ask for the answer as a JSON object holding only the choice letter, e.g.
  `{"answer": "B"}`
- **Multi-Turn Chats**:
  Store only the final response in the conversation history; omit the thinking content.
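
The three tips can be expressed as small client-side helpers. A minimal sketch, assuming replies arrive as decoded text that may still include a `<think>...</think>` block; every function name and the exact MCQ instruction wording are hypothetical, and `extract_boxed` is a simple brace matcher rather than a full LaTeX parser.

```python
import json
import re
from typing import Dict, List, Optional

# Instruction text from the Math tip above.
MATH_INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."
# Hypothetical wording for the MCQ tip above.
MCQ_INSTRUCTION = 'Answer with a JSON object containing only the choice letter, e.g. {"answer": "B"}.'


def with_instruction(question: str, instruction: str) -> str:
    """Append a task-specific instruction to the user prompt."""
    return f"{question}\n{instruction}"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    begin = start + len("\\boxed{")
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces


def parse_mcq_answer(text: str) -> Optional[str]:
    """Return the choice letter from the last flat {"answer": ...} JSON object."""
    for candidate in reversed(re.findall(r"\{[^{}]*\}", text)):
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(obj.get("answer"), str):
            return obj["answer"].strip().upper()
    return None


def strip_thinking(reply: str) -> str:
    """Keep only the text after the last </think> tag (the final response)."""
    return reply.rpartition("</think>")[2].strip()


def add_assistant_turn(history: List[Dict[str, str]], reply: str) -> None:
    """Store only the final response in the conversation history."""
    history.append({"role": "assistant", "content": strip_thinking(reply)})
```

For example, `extract_boxed(reply)` pulls the final answer out of a math response, and `add_assistant_turn(history, reply)` keeps the reasoning out of the next turn's context.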