Disable thinking mode?

#3
by daaain - opened

Is there a special token to disable thinking? I'm using the MLX version if that matters

I'm sorry, I'm useless to you since I don't use MLX and can't run this yet... but I wanted to say thank you for making me spit my coffee out laughing at what looked like a request for a "Disabled thinking mode."

Yes, please check our chat template.

daaain changed discussion title from Disabled thinking mode? to Disable thinking mode?

Thanks, so if I understand correctly: either append /nothink to the prompt, or pass the enable_thinking parameter if the inference library supports it?

https://huggingface.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja#L47
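For backends that don't expose an enable_thinking switch (e.g. llama.cpp or mlx-lm), the /nothink convention from the chat template can be applied by hand before sending the messages. A minimal sketch — the helper name and message shape here are mine, not part of any library API:

```python
def with_nothink(messages):
    """Return a copy of the chat messages with '/nothink' appended to the
    last user turn, which the GLM chat template treats as a request to
    skip the thinking block (see chat_template.jinja linked above)."""
    out = [dict(m) for m in messages]  # shallow-copy each message
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = m["content"].rstrip() + " /nothink"
            break
    return out

messages = [{"role": "user", "content": "What is the capital of France?"}]
print(with_nothink(messages)[0]["content"])
# → What is the capital of France? /nothink
```

The original list is left untouched, so the same conversation history can still be reused with thinking enabled.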

@AbyssianOne haha, the irony of being too autistic to notice πŸ˜… or maybe just the temporary disability of being too tired...

Yes, vLLM and SGLang support the enable_thinking parameter — check our GitHub.
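With an OpenAI-compatible server (vLLM or SGLang), the flag can be passed per request by forwarding it into the chat template. A sketch of the request payload — the chat_template_kwargs mechanism is how vLLM forwards extra template variables, but the exact supported form for this model is an assumption, so check the GLM GitHub repo:

```python
import json

# Hypothetical payload for an OpenAI-compatible /v1/chat/completions
# endpoint; "chat_template_kwargs" forwards extra variables (such as
# enable_thinking) into the model's Jinja chat template.
payload = {
    "model": "zai-org/GLM-4.5-Air",
    "messages": [{"role": "user", "content": "Hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
}
print(json.dumps(payload, indent=2))
```

The same dict can be sent via the OpenAI Python client's extra_body argument instead of a raw HTTP request.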

Thanks a lot! I'm GPU poor, so only llama.cpp and mlx-lm (via LM Studio currently) for me πŸ˜…

But also have to say this model is an absolute sweet spot for people with more powerful Macs, I'm getting 20 tokens / sec on my M2 Max laptop with the 4bit quant, so really grateful for your work!
