# Home-cooked Mistral Small Omni
This is a multimodal model created by merging Mistral Small 2506 (with vision capabilities) and Voxtral 2507 (with audio capabilities) using a modified version of the `mergekit` tool. For detailed merging instructions, see the sections below.

## License and Attribution
This model is a merged derivative work combining Mistral Small 2506 and Voxtral 2507, both originally released by Mistral AI under the Apache 2.0 license. The merged model is also distributed under the Apache 2.0 license, and the full license text, along with original copyright notices, is included in this repository. I have no affiliation, sponsorship, or formal relationship with Mistral AI. This project is an independent effort to combine the vision and audio capabilities of the two models.
## Steps to reproduce
### Merge the text model
Install `mergekit` from this commit: https://github.com/arcee-ai/mergekit/tree/0027c5c51471fa891d438eccda5455ebe55b536e
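For a reproducible setup, the pinned commit can be installed directly with pip (assuming a working Python environment; the commit hash is the one linked above):

```shell
# Install mergekit pinned to the exact commit used for this merge
pip install "git+https://github.com/arcee-ai/mergekit.git@0027c5c51471fa891d438eccda5455ebe55b536e"
```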
Modify the `mergekit` source code: open `mergekit/merge_methods/generalized_task_arithmetic.py` and add the three lines marked `ADD THIS`:
```python
# Normalize the vectors to get the directions and angles
v0 = normalize(v0, eps)
v1 = normalize(v1, eps)

if v0.shape != v1.shape:  # ADD THIS
    res = np.array([0.0])  # ADD THIS
    return maybe_torch(res, is_torch)  # ADD THIS

# Dot product with the normalized vectors (can't use np.dot in W)
dot = np.sum(v0 * v1)

# If absolute value of dot product is almost 1, vectors are ~colinear, so use lerp
if np.abs(dot) > DOT_THRESHOLD:
    res = lerp(t, v0_copy, v1_copy)
    return maybe_torch(res, is_torch)
```
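To see what the patch does in isolation, here is a minimal, self-contained sketch of the SLERP routine with the shape guard. `DOT_THRESHOLD`, `normalize`, and `lerp` are reimplemented here to mirror the mergekit helpers; the zero placeholder is a dummy value for tensors that exist in only one model (such as Voxtral's audio stack), so the merge no longer crashes on them:

```python
import numpy as np

DOT_THRESHOLD = 0.9995

def normalize(v, eps=1e-8):
    norm = np.linalg.norm(v)
    return v if norm < eps else v / norm

def lerp(t, v0, v1):
    return (1 - t) * v0 + t * v1

def patched_slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation with the shape-mismatch guard added."""
    v0_copy, v1_copy = np.copy(v0), np.copy(v1)
    v0, v1 = normalize(v0, eps), normalize(v1, eps)

    # The added guard: tensors unique to one model have no counterpart
    # to interpolate with, so return a dummy value instead of crashing.
    if v0.shape != v1.shape:
        return np.array([0.0])

    dot = np.sum(v0 * v1)
    # Nearly colinear vectors: fall back to plain linear interpolation
    if np.abs(dot) > DOT_THRESHOLD:
        return lerp(t, v0_copy, v1_copy)

    # Standard SLERP on the original (unnormalized) vectors
    theta_0 = np.arccos(dot)
    sin_theta_0 = np.sin(theta_0)
    theta_t = theta_0 * t
    s0 = np.sin(theta_0 - theta_t) / sin_theta_0
    s1 = np.sin(theta_t) / sin_theta_0
    return s0 * v0_copy + s1 * v1_copy
```

Without the guard, mergekit aborts as soon as it hits a tensor present in only one of the two checkpoints; with it, those tensors are simply skipped over by the interpolation path.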
Prepare a YAML merge config (e.g. `mistral_o.yaml`):

```yaml
name: mistral-omni
merge_method: slerp
models:
  - model: ../models/Voxtral-Small-24B-2507
  - model: ../models/Mistral-Small-3.2-24B-Instruct-2506
base_model: ../models/Mistral-Small-3.2-24B-Instruct-2506
parameters:
  t:
    - filter: self_attn
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - filter: mlp
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
```
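Each six-element `value` list is a gradient: mergekit spaces the anchors evenly over the layer depth and linearly interpolates between them to get a per-layer `t` (0 keeps the base model's weights, 1 takes Voxtral's). A sketch of that expansion, assuming mergekit's usual gradient semantics:

```python
import numpy as np

def expand_gradient(anchors, num_layers):
    """Expand a mergekit-style gradient list into one t value per layer.

    Anchors are placed at evenly spaced relative depths (0.0 to 1.0) and
    intermediate layers receive linearly interpolated values.
    """
    xs = np.linspace(0.0, 1.0, num=len(anchors))    # anchor positions
    layers = np.linspace(0.0, 1.0, num=num_layers)  # relative layer depths
    return np.interp(layers, xs, anchors)

# With the config above, the middle layers blend most strongly (t up to 0.5)
# while the first and last layers stay close to the base model:
t_per_layer = expand_gradient([0.1, 0.3, 0.5, 0.3, 0.1, 0], num_layers=11)
```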
Run the merge:

```sh
mergekit-yaml mistral_o.yaml ../models/mistral_o
```
Go to the `mistral_o` output directory, then download `tekken.json` from Voxtral and place it there: https://huggingface.co/mistralai/Voxtral-Small-24B-2507/blob/main/tekken.json
Finally, use llama.cpp's `convert_hf_to_gguf.py` to convert the merged model back to GGUF as usual.
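The conversion is the standard llama.cpp workflow; the flags below (`--outtype`, `--outfile`) are the script's usual options, and the output filename here is just an example:

```shell
# Run from a llama.cpp checkout, pointing at the merged model directory
python convert_hf_to_gguf.py ../models/mistral_o \
    --outtype bf16 \
    --outfile mistral-omni-bf16.gguf
```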
### Merge the mmproj models
Download these mmproj files:
- Audio: https://huggingface.co/ggml-org/Voxtral-Mini-3B-2507-GGUF/blob/main/mmproj-Voxtral-Mini-3B-2507-Q8_0.gguf
- Vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF/blob/main/mmproj-F16.gguf
Rename them to `audio.gguf` and `vision.gguf`, respectively.
Then run `merge_mmproj_models.py` from this repo. The output file will be `mmproj-model.gguf`.