---
datasets:
- nlphuji/flickr30k
base_model:
- unsloth/Mistral-Small-3.2-24B-Instruct-2506
- mistralai/Mistral-Small-3.2-24B-Instruct-2506
---
|
|
|
|
|
Quantized on a GH200 using `llm-compressor`'s latest changes; works well for me with vLLM's `main` branch on my RTX 3090 Ti as of 2025-07-01.
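
For reference, here's a minimal sketch of the kind of `llm-compressor` one-shot run involved. The quantization scheme, ignore list, and output directory below are my assumptions for illustration, not a record of the exact recipe used for this checkpoint:

```python
# Hypothetical sketch of an llm-compressor one-shot quantization run.
# The FP8_DYNAMIC scheme and ignore list are assumptions, not the exact
# recipe used to produce this checkpoint.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",      # assumed scheme; needs no calibration pass
    ignore=["re:.*lm_head"],   # keep the output head in full precision
)

oneshot(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    recipe=recipe,
    output_dir="Mistral-Small-3.2-24B-Instruct-2506-quantized",
)
```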
|
|
|
|
|
# What about tool calling? |
|
|
|
|
|
Per https://vllm-dev.slack.com/archives/C07QP347J4D/p1751401629797809?thread_ts=1751399869.254259&cid=C07QP347J4D, there is currently no way to get tool calling with Mistral-HF formatted models. |
|
|
|
|
|
I've worked around this on a GitHub branch: https://github.com/sjuxax/vllm/tree/Mistral3.1-rebase. It includes code to remap the weights from the HF-Mistral layout to the native Mistral layout, allowing use of `MistralTokenizer`.
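
Conceptually, the remap is a rename of state-dict keys from the HF naming scheme to the names native Mistral checkpoints use. The sketch below illustrates the idea; the substitution table is my approximation of the naming differences, not the branch's actual code:

```python
# Toy illustration of HF-Mistral -> native-Mistral weight-key remapping.
# These substitutions reflect the general naming differences between the
# two layouts; the branch's real mapping table may differ in detail.
import re

HF_TO_MISTRAL = [
    (r"^model\.embed_tokens\.", "tok_embeddings."),
    (r"^model\.norm\.", "norm."),
    (r"^lm_head\.", "output."),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.", r"layers.\1.attention.wq."),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.", r"layers.\1.attention.wk."),
    (r"^model\.layers\.(\d+)\.self_attn\.v_proj\.", r"layers.\1.attention.wv."),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.", r"layers.\1.attention.wo."),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.", r"layers.\1.feed_forward.w1."),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.", r"layers.\1.feed_forward.w2."),
    (r"^model\.layers\.(\d+)\.mlp\.up_proj\.", r"layers.\1.feed_forward.w3."),
]

def remap_key(hf_key: str) -> str:
    """Rewrite one HF-style weight name into the native Mistral style."""
    for pattern, replacement in HF_TO_MISTRAL:
        new_key, n = re.subn(pattern, replacement, hf_key)
        if n:
            return new_key
    return hf_key  # unhandled or already-native keys pass through

# remap_key("model.layers.0.self_attn.q_proj.weight")
# -> "layers.0.attention.wq.weight"
```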
|
|
|
|
|
I've updated the `config.json` to be compatible with this approach, and I'm about to push the `tekken.json` tokenizer. With that, if you build that branch, you should be able to run this checkpoint with `MistralTokenizer` and get tool calling.
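
Once served from that branch, tool calling should go through the standard OpenAI-compatible API. Here's a sketch, assuming a server started with something like `vllm serve <this-checkpoint> --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral`; the `get_weather` tool and the served model name are placeholders:

```python
# Example tool-calling request against a vLLM OpenAI-compatible server
# started from the branch above. The weather tool is illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="<this-checkpoint>",  # use the name the server reports
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

print(response.choices[0].message.tool_calls)
```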
|
|
|
|
|
--- |
|
|
|
|
|
Note: I spoke a little too soon above. We also needed https://github.com/vllm-project/vllm/pull/20503 to get tool calling to work properly. I've merged and pushed this to the `Mistral3.1-rebase` branch.