Thinking=True on GGUF?
How do we set thinking parameter to true on GGUF? 🤔
Hi! We're actively working on an Ollama model with the corresponding go template. Ultimately, enabling thinking is a matter of enabling the right section of the system prompt, so in the meantime you can call apply_chat_template on the client side and then use the expanded string with raw generate.
@quantflex
At the moment, to do this in llama.cpp, you would have to call apply_chat_template with thinking=True on the client side (or do the equivalent string manipulation in the programming language of your choice) and then use the formatted string as the input for raw generation. We have not updated the built-in chat template logic in llama.cpp itself yet.
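As a rough illustration of the "equivalent string manipulation" option, here is a minimal sketch that formats the messages into a raw prompt string with an optional thinking section in the system prompt, which you could then pass to a raw completion endpoint. The role tokens follow the Granite 3.x convention (`<|start_of_role|>` etc.), and the thinking instruction below is a placeholder, not the real wording — copy the exact sentence from the template linked below.

```python
# Placeholder thinking instruction -- substitute the exact sentence from the
# template linked below (around line 88).
THINKING_INSTRUCTION = (
    "You can write down your thoughts before responding to the user."
)


def build_prompt(messages, thinking=False):
    """Expand chat messages into a raw Granite-style prompt string.

    This stands in for apply_chat_template(..., thinking=True) when no
    tokenizer is available on the client side.
    """
    system = "You are a helpful assistant."
    if thinking:
        # Enabling thinking is just enabling this extra section of the
        # system prompt.
        system += " " + THINKING_INSTRUCTION

    parts = [f"<|start_of_role|>system<|end_of_role|>{system}<|end_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>"
            f"{m['content']}<|end_of_text|>"
        )
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "\n".join(parts)


prompt = build_prompt(
    [{"role": "user", "content": "Why is the sky blue?"}], thinking=True
)
# Feed `prompt` to raw generation, e.g. llama.cpp's server completion
# endpoint, without applying any further chat template on top of it.
```

The key point is that the raw-generation path must not re-apply a chat template, since the string is already fully formatted.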
The key addition to the system prompt can be seen at line 88 here: https://ollama.com/gabegoodhart/granite3.2-preview:8b/blobs/f7e156ba65ab