csfufu
/

Revisual-R1-final

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions

csfufu commited on Jun 5

Commit

3f0e0aa

·

verified ·

1 Parent(s): 1cdf887

Create README.md

Files changed (1) hide show

README.md +57 -0

README.md ADDED Viewed

	@@ -0,0 +1,57 @@

+---
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+language:
+- en
+license: apache-2.0
+pipeline_tag: image-text-to-text
+tags:
+- transformers
+- multimodal
+library_name: transformers
+---
+## 🌟 ReVisual-R1 (7B) — Open-Source Multimodal Reasoner
+> **One cold-start, two RL stages, endless reasoning power.**
+---
+### 🔑 Highlights
+* **SOTA on 9 tough benchmarks** covering visual–math + text reasoning.
+* **Three-Stage SRO Training**
+  1. **Text Cold-Start** — seed deep reflection
+  2. **Multimodal RL** — align vision & logic
+  3. **Text RL** — polish fluency & brevity
+* **PAD** (Prioritized Advantage Distillation) keeps gradients alive.
+* **Efficient-Length Reward** = concise, self-reflective CoT.
+---
+### 📚 Resources
+* [Paper](https://arxiv.org/abs/2506.04207)
+* [Code](https://github.com/CSfufu/Revisual-R1)
+---
+### 📌 Citation
+```bibtex
+@misc{chen2025advancingmultimodalreasoningoptimized,
+  title         = {Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
+  author        = {Shuang Chen and Yue Guo and Zhaochen Su and Yafu Li and Yulun Wu and Jiacheng Chen and
+                   Jiayu Chen and Weijie Wang and Xiaoye Qu and Yu Cheng},
+  year          = {2025},
+  eprint        = {2506.04207},
+  archivePrefix = {arXiv},
+  primaryClass  = {cs.LG},
+  url           = {https://arxiv.org/abs/2506.04207}
+}
+```
+Take ReVisual-R1 for a spin and let us know what you build! 🎯