Disable thinking mode?
Is there a special token to disable thinking? I'm using the MLX version if that matters
I'm sorry, I'm useless to you since I don't use MLX and can't test this yet... but I wanted to say thank you for making me spit my coffee out laughing at what looked like a request for a "Disabled thinking mode."
Yes, please check our chat template.
Thanks, so if I understand correctly, I can either write /nothink in the prompt, or set enable_thinking in the chat template if the inference library supports it?
https://huggingface.co/zai-org/GLM-4.5-Air/blob/main/chat_template.jinja#L47
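For anyone wondering how an enable_thinking flag can interact with a chat template, here is a minimal, hypothetical Jinja sketch. It is not the real GLM-4.5-Air template (see the link above for that); it only illustrates the mechanism: a boolean template variable that, when false, appends a /nothink marker to the user turn.

```python
from jinja2 import Template

# Hypothetical, heavily simplified stand-in for a real chat template.
# It only demonstrates how an enable_thinking variable can switch the
# rendered prompt; the actual template in the repo is more involved.
TEMPLATE = Template(
    "{% for m in messages %}"
    "<|{{ m.role }}|>{{ m.content }}"
    "{% if m.role == 'user' and not enable_thinking %} /nothink{% endif %}"
    "{% endfor %}"
    "<|assistant|>"
)

messages = [{"role": "user", "content": "Hello"}]

# With thinking enabled, the prompt is rendered as-is.
with_thinking = TEMPLATE.render(messages=messages, enable_thinking=True)
# With thinking disabled, the /nothink marker is appended to the user turn.
without_thinking = TEMPLATE.render(messages=messages, enable_thinking=False)

print(with_thinking)     # <|user|>Hello<|assistant|>
print(without_thinking)  # <|user|>Hello /nothink<|assistant|>
```

In libraries that render the template for you (e.g. via `apply_chat_template`), the flag is typically passed through as a keyword argument rather than edited into the template by hand.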
@AbyssianOne haha, the irony of being too autistic to notice... or maybe just the temporary disability of being too tired...
Yes, vLLM and SGLang support the enable_thinking param, check our GitHub.
Thanks a lot! I'm GPU poor, so only llama.cpp and mlx-lm (via LM Studio currently) for me.
But I also have to say this model is an absolute sweet spot for people with more powerful Macs. I'm getting 20 tokens/sec on my M2 Max laptop with the 4-bit quant, so really grateful for your work!