Home-cooked Mistral Small Omni

This is a multimodal model created by merging Mistral Small 2506 (with vision capabilities) and Voxtral 2507 (with audio capabilities) using a modified version of the mergekit tool.

For detailed merging instructions, refer to the sections below.

License and Attribution

This model is a merged derivative work combining Mistral Small 2506 and Voxtral 2507, both originally released by Mistral AI under the Apache 2.0 license. The merged model is also distributed under the Apache 2.0 license, and the full license text, along with original copyright notices, is included in this repository. I have no affiliation, sponsorship, or formal relationship with Mistral AI. This project is an independent effort to combine the vision and audio capabilities of the two models.

Steps to reproduce

Merge text model

Install mergekit pinned to this exact commit: https://github.com/arcee-ai/mergekit/tree/0027c5c51471fa891d438eccda5455ebe55b536e

Modify the mergekit source code: open the file mergekit/merge_methods/generalized_task_arithmetic.py and add the three lines marked ADD THIS to its slerp implementation:

    # Normalize the vectors to get the directions and angles
    v0 = normalize(v0, eps)
    v1 = normalize(v1, eps)

    if v0.shape != v1.shape:                # ADD THIS
        res = np.array([0.0])               # ADD THIS
        return maybe_torch(res, is_torch)   # ADD THIS

    # Dot product with the normalized vectors (can't use np.dot in W)
    dot = np.sum(v0 * v1)

    # If absolute value of dot product is almost 1, vectors are ~colinear, so use lerp
    if np.abs(dot) > DOT_THRESHOLD:
        res = lerp(t, v0_copy, v1_copy)
        return maybe_torch(res, is_torch)
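For context, the patch guards slerp against tensors that exist in only one of the two models (e.g. audio vs. vision projection weights), whose shapes therefore cannot match. A minimal pure-Python sketch of slerp with that guard (simplified for illustration; mergekit's real code operates on numpy/torch tensors and handles more edge cases):

```python
import math

DOT_THRESHOLD = 0.9995  # same threshold mergekit uses for the lerp fallback

def lerp(t, a, b):
    # Plain linear interpolation between two vectors
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(t, v0, v1, eps=1e-8):
    # Shape guard, mirroring the ADD THIS patch: mismatched tensors
    # cannot be interpolated, so return a dummy result instead of crashing.
    if len(v0) != len(v1):
        return [0.0]
    # Normalize to unit vectors to measure the angle between them
    n0 = math.sqrt(sum(x * x for x in v0)) or eps
    n1 = math.sqrt(sum(x * x for x in v1)) or eps
    u0 = [x / n0 for x in v0]
    u1 = [x / n1 for x in v1]
    dot = sum(a * b for a, b in zip(u0, u1))
    # Nearly collinear vectors: the angle is ~0, so fall back to lerp
    if abs(dot) > DOT_THRESHOLD:
        return lerp(t, v0, v1)
    # Spherical interpolation on the original (unnormalized) vectors
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

With the guard in place, mergekit simply skips interpolation for model-specific tensors and keeps going instead of aborting the merge.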

Prepare a YAML config file for the merge, e.g. mistral_o.yaml:

name: mistral-omni
merge_method: slerp
models:
  - model: ../models/Voxtral-Small-24B-2507
  - model: ../models/Mistral-Small-3.2-24B-Instruct-2506
base_model: ../models/Mistral-Small-3.2-24B-Instruct-2506
parameters:
  t:
    - filter: self_attn
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - filter: mlp
      value: [0.1, 0.3, 0.5, 0.3, 0.1, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
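Each value list above is a gradient: mergekit spreads the anchor values across the layer indices, so t starts low, peaks at 0.5 in the middle layers, and drops back to 0 near the output. A sketch of that behavior, assuming evenly spaced anchors with piecewise-linear interpolation (my reading of mergekit's gradient handling, not its exact code):

```python
def layer_t(values, num_layers):
    # Spread a list of anchor t values evenly across num_layers layers,
    # linearly interpolating between adjacent anchors.
    if num_layers == 1:
        return [values[0]]
    out = []
    for i in range(num_layers):
        # Position of this layer on the anchor scale [0, len(values) - 1]
        pos = i / (num_layers - 1) * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

So for a 40-layer model, the [0.1, 0.3, 0.5, 0.3, 0.1, 0] gradient yields a smooth ramp that blends Voxtral most strongly into the middle layers while keeping the first and last layers close to the Mistral base.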

Merge it:

mergekit-yaml mistral_o.yaml ../models/mistral_o

Go to the mistral_o output directory, then download tekken.json from Voxtral and place it there: https://huggingface.co/mistralai/Voxtral-Small-24B-2507/blob/main/tekken.json

Finally, use llama.cpp's convert_hf_to_gguf.py to convert the merged model to GGUF as usual

Merge mmproj models

Download these mmproj files:

Rename them to audio.gguf and vision.gguf respectively

Then run merge_mmproj_models.py from this repo. The output file will be mmproj-model.gguf.
