Good idea to remove the hybrid thinking mode
(hybrid = thinking/reasoning + normal mode in one model. thinking/reasoning can be switched on or off in the prompt (/no_think
in that case).)
I noticed Qwen3-32B worse evaluation results (in no think mode) vs Qwen2.5-32B in Falcon-H1 evaluation testing first. This is indeed maybe due to the hybrid thinking mode. So it's a no-brainer to remove the hybrid mode (if that's the reason for the improvement) and thank you for doing so, when the scores increase by that much.
Another reason is that SSD space is cheap, so if one needs the reasoning model, one can simply download it and use it and it will perform better than a hybrid thinking LLM.
PS: Many, including me, hope you release the other smaller Qwen3s, each as non-reasoning and reasoning (reasoning of this 235B ofc too).
I noticed that latest models, such as this Qwen3 instruct model and Kimi K2, emphasize their non-thinking mode performance. Can anyone shed some light on why the community is preferring non-thinking models recently?