Devstral-Small-2507-Rebased-Vision

This model was created by taking Mistral-Small-3.2-24B-Instruct-2506 and replacing the weights under language_model with the weights from Devstral-Small-2507; the vision components of Mistral-Small-3.2 are left unchanged. The result is Devstral with vision capabilities, but you should expect a small quality degradation (see the Evaluation section below).
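A minimal sketch of what such a weight swap can look like, using plain dictionaries in place of real checkpoints. This is only an illustration, not the script used for this model: the function name rebase_language_model, the "language_model." prefix, and the parameter names in the toy example are assumptions, and real checkpoints may use different key layouts.

```python
def rebase_language_model(vision_state, donor_state, prefix="language_model."):
    """Return a copy of vision_state whose language-model entries come from donor_state.

    Assumption: each donor parameter maps to the same name under `prefix`
    in the vision checkpoint. Real key layouts may differ.
    """
    merged = dict(vision_state)  # shallow copy, so the input mapping is not modified
    for name, tensor in donor_state.items():
        target = prefix + name
        if target not in merged:
            # Fail loudly instead of silently adding unexpected parameters.
            raise KeyError(f"no matching parameter for {target!r}")
        merged[target] = tensor
    return merged


# Toy stand-ins for the two checkpoints (hypothetical names, strings instead of tensors).
vision_state = {
    "vision_tower.patch_conv.weight": "vision-A",
    "multi_modal_projector.linear_1.weight": "projector-A",
    "language_model.model.embed_tokens.weight": "text-A",
    "language_model.lm_head.weight": "head-A",
}
donor_state = {
    "model.embed_tokens.weight": "text-B",
    "lm_head.weight": "head-B",
}

merged = rebase_language_model(vision_state, donor_state)
```

In this toy run, the two language_model entries take the donor values while the vision tower and projector entries keep their original values. With real checkpoints, a shape check per parameter would be a sensible addition.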

Notes: I used unsloth's uploads of these models for convenience, since they also include some extra files and configs. I didn't name this simply "-Vision" because it was not trained or fine-tuned after the weight rebase, and to avoid a name clash in case a future version by mistralai has vision.

The code will be released soon.

Evaluation

Evaluation was performed on 7 benchmarks using lm_eval and sglang. Scripts and other details will also be released with the code. This is not a comprehensive evaluation, and it is not directly comparable to the official benchmark numbers from Mistral; the goal was to approximate the quality degradation. Make sure to test on your own downstream tasks!

Model Evaluation Comparison

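Here's a comparison of the evaluation results for Devstral-Small-2507 and Devstral-Small-2507-rebased, including the relative loss and the relative standard error for each task. Relative loss is the drop from the original score to the rebased score, as a percentage of the original score; a negative value means the rebased model scored higher.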

Tasks	Metric	Devstral-Small-2507	Devstral-Small-2507-rebased	Relative Loss (%)	Relative Stderr (%)
arc_challenge_chat	exact_match	0.9292	0.9283	0.10%	±0.81%
eq_bench	eqbench	72.3376	73.7481	-1.95%	±3.52%
gsm8k	exact_match	0.8643	0.862	0.27%	±1.09%
gsm8k	exact_match	0.8605	0.8567	0.44%	±1.10%
ifeval	inst_level_loose_acc	0.6631	0.6595	0.54%	N/A
ifeval	inst_level_strict_acc	0.6067	0.6019	0.79%	N/A
ifeval	prompt_level_loose_acc	0.5619	0.5545	1.32%	±3.81%
ifeval	prompt_level_strict_acc	0.4917	0.4861	1.14%	±4.37%
mbpp	pass_at_1	0.118	0.112	5.08%	±12.20%
mmlu_pro	exact_match	0.5786	0.579	-0.07%	±0.76%
triviaqa	exact_match	0.7075	0.7068	0.10%	±0.48%
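The Relative Loss column can be reproduced from the two score columns. The sketch below assumes the formula (original - rebased) / original, which agrees with the rows in the table; the helper name relative_loss is made up for this example.

```python
def relative_loss(original, rebased):
    """Percentage drop from original to rebased; negative means rebased scored higher."""
    return (original - rebased) / original * 100.0


# A few (original, rebased) score pairs copied from the table above.
rows = {
    "arc_challenge_chat": (0.9292, 0.9283),
    "eq_bench": (72.3376, 73.7481),
    "mbpp": (0.118, 0.112),
}

for task, (original, rebased) in rows.items():
    print(f"{task}: {relative_loss(original, rebased):.2f}%")
# arc_challenge_chat: 0.10%
# eq_bench: -1.95%
# mbpp: 5.08%
```

Note that a small absolute gap can look large in relative terms when the base score is low, as with mbpp, which is why the relative standard error is worth reading alongside it.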

Model tree for kmouratidis/Devstral-Small-2507-Rebased-Vision