MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

🔥🔥 Introducing MMR1: a Multimodal Reasoning Model trained with Variance-Aware Sampling (VAS)

💡 Highlights

  • Variance-Aware Sampling (VAS) for multimodal RL training:
    • Establishes a theoretical link between reward variance and gradient signal strength;
    • Proposes the Variance Promotion Score (VPS) integrating Outcome Variance and Trajectory Diversity;
    • Enables more efficient and stable optimization under limited-data conditions.
  • Open-sources ~1.6M Long-CoT cold-start samples, annotated by Gemini 2.5 Pro/Flash and verified with GPT-4o.
  • Releases a suite of SFT and RL checkpoints at multiple scales: 3B, 7B, and 32B variants.
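The VAS bullets above state the idea only at a high level. The following is a minimal Python sketch under stated assumptions, not the MMR1 implementation. It assumes binary rewards, GRPO-style group-normalized advantages, a hypothetical `trajectory_diversity` proxy (fraction of distinct responses), and a hypothetical VPS defined as a weighted sum with illustrative weights `alpha` and `beta`. It shows why a prompt whose rollouts all receive the same reward yields zero advantage, and therefore no gradient signal. It also shows how a score combining outcome variance and diversity could bias sampling toward more informative prompts.

```python
import random
from statistics import mean, pstdev


def outcome_variance(rewards):
    """Population variance of per-rollout rewards for one prompt.
    For binary 0/1 rewards this equals p * (1 - p)."""
    mu = mean(rewards)
    return mean((r - mu) ** 2 for r in rewards)


def trajectory_diversity(responses):
    """Hypothetical proxy: fraction of distinct responses among the rollouts."""
    return len(set(responses)) / len(responses)


def variance_promotion_score(rewards, responses, alpha=1.0, beta=1.0):
    """Hypothetical VPS: weighted sum of outcome variance and trajectory diversity.
    alpha and beta are illustrative weights, not values taken from MMR1."""
    return alpha * outcome_variance(rewards) + beta * trajectory_diversity(responses)


def group_advantages(rewards):
    """GRPO-style group-normalized advantages (an assumption for illustration).
    If every reward in the group is identical, all advantages are zero,
    so the prompt contributes no policy-gradient signal."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    if sd == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]


def sample_prompts(scores, k, seed=0):
    """Sample k prompt indices with probability proportional to their score."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    weights = [s + 1e-6 for s in scores]  # small floor so no prompt has zero weight
    return rng.choices(indices, weights=weights, k=k)


# Usage: two hypothetical prompts, each with four rollouts (reward, response id).
pools = {
    "all_correct": ([1, 1, 1, 1], ["a", "a", "a", "a"]),
    "mixed": ([1, 0, 1, 0], ["a", "b", "c", "d"]),
}
scores = [variance_promotion_score(r, t) for r, t in pools.values()]
picks = sample_prompts(scores, k=100)
```

In this toy setup the "all_correct" prompt scores 0.25 (diversity only) and the "mixed" prompt scores 1.25, so proportional sampling draws the mixed prompt far more often. A real system would also need to choose how to measure diversity and how often to refresh scores as the policy changes; those choices are not specified here.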

📦 Resources

📑 Citation

If you find MMR1 useful for your research and applications, please cite using this BibTeX:

@misc{leng2025mmr1,
  title={MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources}, 
  author={Sicong Leng and Jing Wang and Jiaxi Li and Hao Zhang and Zhiqiang Hu and Boqiang Zhang and Yuming Jiang and Hang Zhang and Xin Li and Lidong Bing and Deli Zhao and Wei Lu and Yu Rong and Aixin Sun and Shijian Lu},
  year={2025},
  eprint={2509.21268},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21268}, 
}