Thinking=True on GGUF?
How do we set thinking parameter to true on GGUF? 🤔
Hi! We're actively working on an Ollama model with the corresponding go template. Ultimately, enabling thinking is a matter of enabling the right section of the system prompt, so in the meantime you can call apply_chat_template on the client side and then use the expanded string with raw generate.
@quantflex
At the moment, to do this in llama.cpp, you would have to call apply_chat_template with thinking=True on the client side (or do the equivalent string manipulation in the programming language of your choice) and then use the formatted string as the input for raw generation. We have not updated the built-in chat template logic in llama.cpp itself yet.
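As a rough illustration of the "equivalent string manipulation" option, here is a minimal sketch that formats the messages into a raw prompt string with an optional thinking section in the system prompt, which you could then pass to a raw completion endpoint. The role tokens follow the Granite 3.x convention (`<|start_of_role|>` etc.), and the thinking instruction below is a placeholder, not the real wording — copy the exact sentence from the template linked below.

```python
# Placeholder thinking instruction -- substitute the exact sentence from the
# template linked below (around line 88).
THINKING_INSTRUCTION = (
    "You can write down your thoughts before responding to the user."
)


def build_prompt(messages, thinking=False):
    """Expand chat messages into a raw Granite-style prompt string.

    This stands in for apply_chat_template(..., thinking=True) when no
    tokenizer is available on the client side.
    """
    system = "You are a helpful assistant."
    if thinking:
        # Enabling thinking is just enabling this extra section of the
        # system prompt.
        system += " " + THINKING_INSTRUCTION

    parts = [f"<|start_of_role|>system<|end_of_role|>{system}<|end_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>"
            f"{m['content']}<|end_of_text|>"
        )
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "\n".join(parts)


prompt = build_prompt(
    [{"role": "user", "content": "Why is the sky blue?"}], thinking=True
)
# Feed `prompt` to raw generation, e.g. llama.cpp's server completion
# endpoint, without applying any further chat template on top of it.
```

The key point is that the raw-generation path must not re-apply a chat template, since the string is already fully formatted.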
The key addition to the system prompt can be seen at line 88 here: https://ollama.com/gabegoodhart/granite3.2-preview:8b/blobs/f7e156ba65ab