Model Details
Model Description
This model is based on Qwen3. It is an iterative set of experiments that tries to remove thinking mode from Qwen3 architecturally, rather than suppressing it by injecting empty <think>\n\n</think> tokens into the prompt.
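For context, stock Qwen3 handles non-thinking mode at the prompt level: with enable_thinking=False, the chat template injects an empty <think>\n\n</think> block into the generation prompt. The snippet below is a minimal illustration of that behaviour, assuming the upstream Qwen/Qwen3-4B tokenizer and its default chat template (not the tokenizer in this repo); this project removes the tokens themselves instead.

from transformers import AutoTokenizer

# Illustration only: the upstream Qwen3 tokenizer, not the one in this repo.
base_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
text = base_tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # non-thinking mode in stock Qwen3
)
# The rendered prompt ends with an injected empty thinking block:
# ...<|im_start|>assistant\n<think>\n\n</think>\n\n
print(text)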
Current edits to the Qwen3 model
- The thinking tokens have been stripped from the tokenizer.json and tokenizer_config.json files
- The embedding has also been truncated from 151936 to 151667 rows, removing the lookup entries for the thinking tokens (a quick sanity check is sketched below)
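As a quick sanity check on those edits (a sketch, assuming the repo name below and the standard transformers APIs), you can load the model and confirm that the embedding matrix has 151667 rows and that the thinking tokens are no longer in the vocabulary:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DESUCLUB/Qwen3-NoThinkEmbed"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# The truncated embedding should have 151667 rows (down from 151936).
print(model.get_input_embeddings().weight.shape)

# The stripped tokenizer should no longer contain the thinking tokens.
vocab = tokenizer.get_vocab()
print("<think>" in vocab, "</think>" in vocab)  # expected: False False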
Usage:
- You can use this model via the code below
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "DESUCLUB/Qwen3-NoThinkEmbed"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")

print("content:", content)
Reproducing the NoThink model
The code used to reproduce this model can also be found in this repo, under think_remover.py; a rough sketch of the embedding-truncation step follows the notes below.
- Do note that if you are trying to reproduce this model, you will need to edit the Qwen3-4B tokenizer.json yourself or use the files provided here
- The tokenizer has been modified to remove all thinking tokens
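For orientation only, the embedding-truncation step can be sketched roughly as below. This is not think_remover.py itself, just an outline using the standard transformers resize API and the sizes quoted above; the base checkpoint name and output directory are assumptions, and the tokenizer files still need to be edited separately as noted above.

from transformers import AutoModelForCausalLM

BASE = "Qwen/Qwen3-4B"   # assumed base checkpoint
NEW_VOCAB = 151667       # new embedding size per the note above; drops the thinking-token rows

model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto")

# Shrink the input embeddings (and the output head, if present) to the new vocab size.
model.resize_token_embeddings(NEW_VOCAB)
model.save_pretrained("Qwen3-NoThinkEmbed")

# The tokenizer is handled separately: edit tokenizer.json / tokenizer_config.json
# to remove the thinking tokens, or copy the files provided in this repo.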
Credits:
Credit goes to the Qwen Team for developing the Qwen3 suite of models, as well as for providing the baseline for the inference code above.