# Voxtral Mini 3B – 2507 – Quantized (MLX)
Public quantized weights derived from the MLX bf16 conversion mlx-community/Voxtral-Mini-3B-2507-bf16.
Upstream model: mistralai/Voxtral-Mini-3B-2507.
## Variants (quantization profiles)

- Q4: folder `mlx-q4/`
- Q5: folder `mlx-q5/`
- Q6: folder `mlx-q6/`
- Q8: folder `mlx-q8/`
Published variants appear as subfolders at the top of this repo when available.
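If you only need one quantization profile, you can fetch just that subfolder instead of the whole repo. This is a minimal sketch, assuming the folder layout listed above and the same mlx_lm API as the quickstart below; `snapshot_download` from huggingface_hub filters files with `allow_patterns`:

```python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# Fetch only the Q4 variant folder from the repo.
local_dir = snapshot_download(
    repo_id="NeoRoth/voxtral-3b-quantized",
    allow_patterns=["mlx-q4/*"],
)

# Point the loader at the variant subfolder on disk.
model, tokenizer = load(f"{local_dir}/mlx-q4")
print(generate(model, tokenizer, "Hello!", max_tokens=64))
```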
## Quantization notes
- Only inference weights are quantized (Q4/Q5/Q6/Q8 as above).
- Embeddings are NOT quantized to preserve shape compatibility. Therefore, any "bits per weight" metric may exceed the nominal target (informational, not an error).
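As a back-of-the-envelope illustration of that note, the effective bits per weight is a parameter-weighted average of the quantized bit width and the unquantized embedding precision. The counts below are made-up placeholders, not measurements of this model, and the overhead of quantization scales/biases is ignored:

```python
def effective_bpw(quantized_params: float, embed_params: float,
                  q_bits: int, embed_bits: int = 16) -> float:
    """Parameter-weighted average bit width across the whole model."""
    total = quantized_params + embed_params
    return (quantized_params * q_bits + embed_params * embed_bits) / total

# Hypothetical split: 3.0B params quantized to 4 bits, 0.4B kept in bf16.
print(f"{effective_bpw(3.0e9, 0.4e9, q_bits=4):.2f} bits/weight")  # ~5.41
```

So a nominal "Q4" model can legitimately report well over 4 bits per weight once the bf16 embeddings are averaged in.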
## Quickstart (MLX)

```python
from mlx_lm import load, generate

model, tokenizer = load("NeoRoth/voxtral-3b-quantized")
print(generate(model, tokenizer, "Hello!", max_tokens=64))
```
## Integrity (SHA256)

- Q4: `model-00001-of-00001.safetensors`: `eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2`
- Q8: `model-00001-of-00001.safetensors`: `37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3`
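A minimal sketch for verifying a download against these checksums, using Python's standard `hashlib`; the local path assumes the variant folder layout listed above:

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks to avoid loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected_q4 = "eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2"
assert sha256sum("mlx-q4/model-00001-of-00001.safetensors") == expected_q4
```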
## License

- Apache-2.0 (see `LICENSE.txt`).
## Credits

- Upstream model: mistralai/Voxtral-Mini-3B-2507
- MLX bf16 base used for quantization: mlx-community/Voxtral-Mini-3B-2507-bf16