---
datasets:
- nlphuji/flickr30k
base_model:
- unsloth/Mistral-Small-3.2-24B-Instruct-2506
- mistralai/Mistral-Small-3.2-24B-Instruct-2506
---
|
|
|
|
|
Quantized on a GH200 using `llm-compressor`'s latest changes; works well for me with vLLM's `main` branch on my RTX 3090 Ti as of 2025-07-01.
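
For reference, here's a minimal sketch of the kind of `llm-compressor` one-shot run involved. The quantization scheme, ignore list, and output directory below are my assumptions for illustration, not a record of the exact recipe used for this checkpoint:

```python
# Hypothetical sketch of an llm-compressor one-shot quantization run.
# The FP8_DYNAMIC scheme and ignore list are assumptions, not the exact
# recipe used to produce this checkpoint.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",      # assumed scheme; needs no calibration pass
    ignore=["re:.*lm_head"],   # keep the output head in full precision
)

oneshot(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    recipe=recipe,
    output_dir="Mistral-Small-3.2-24B-Instruct-2506-quantized",
)
```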
|
|
|
|
|
# What about tool calling? |
|
|
|
|
|
Per https://vllm-dev.slack.com/archives/C07QP347J4D/p1751401629797809?thread_ts=1751399869.254259&cid=C07QP347J4D, there is currently no way to get tool calling with Mistral-HF formatted models. |
|
|
|
|
|
I've worked around this on a GitHub branch: https://github.com/sjuxax/vllm/tree/Mistral3.1-rebase. It includes code to remap the weights from the HF-Mistral layout to the native Mistral layout, allowing use of `MistralTokenizer`.
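
Conceptually, the remap is a rename of state-dict keys from the HF naming scheme to the names native Mistral checkpoints use. The sketch below illustrates the idea; the substitution table is my approximation of the naming differences, not the branch's actual code:

```python
# Toy illustration of HF-Mistral -> native-Mistral weight-key remapping.
# These substitutions reflect the general naming differences between the
# two layouts; the branch's real mapping table may differ in detail.
import re

HF_TO_MISTRAL = [
    (r"^model\.embed_tokens\.", "tok_embeddings."),
    (r"^model\.norm\.", "norm."),
    (r"^lm_head\.", "output."),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.", r"layers.\1.attention.wq."),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.", r"layers.\1.attention.wk."),
    (r"^model\.layers\.(\d+)\.self_attn\.v_proj\.", r"layers.\1.attention.wv."),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.", r"layers.\1.attention.wo."),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.", r"layers.\1.feed_forward.w1."),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.", r"layers.\1.feed_forward.w2."),
    (r"^model\.layers\.(\d+)\.mlp\.up_proj\.", r"layers.\1.feed_forward.w3."),
]

def remap_key(hf_key: str) -> str:
    """Rewrite one HF-style weight name into the native Mistral style."""
    for pattern, replacement in HF_TO_MISTRAL:
        new_key, n = re.subn(pattern, replacement, hf_key)
        if n:
            return new_key
    return hf_key  # unhandled or already-native keys pass through

# remap_key("model.layers.0.self_attn.q_proj.weight")
# -> "layers.0.attention.wq.weight"
```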
|
|
|
|
|
I've updated the `config.json` to be compatible with this approach, and I'm about to push the `tekken.json` tokenizer. With that, if you build that branch, you should be able to run this checkpoint with `MistralTokenizer` and get tool calling.
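
Once served from that branch, tool calling should go through the standard OpenAI-compatible API. Here's a sketch, assuming a server started with something like `vllm serve <this-checkpoint> --tokenizer-mode mistral --enable-auto-tool-choice --tool-call-parser mistral`; the `get_weather` tool and the served model name are placeholders:

```python
# Example tool-calling request against a vLLM OpenAI-compatible server
# started from the branch above. The weather tool is illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="<this-checkpoint>",  # use the name the server reports
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

print(response.choices[0].message.tool_calls)
```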
|
|
|
|
|
--- |
|
|
|
|
|
Note: I spoke a little too soon above. We also needed https://github.com/vllm-project/vllm/pull/20503 to get tool calling to work properly. I've merged and pushed this to the `Mistral3.1-rebase` branch.