csfufu committed on
Commit 3f0e0aa · verified · 1 Parent(s): 1cdf887

Create README.md

Files changed (1)
  1. README.md +57 -0
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ tags:
+ - transformers
+ - multimodal
+ library_name: transformers
+ ---
+
+
+ ## 🌟 ReVisual-R1 (7B): Open-Source Multimodal Reasoner
+
+ > **One cold-start, two RL stages, endless reasoning power.**
+
+ ---
+
+ ### 🔑 Highlights
+
+ * **SOTA on 9 tough benchmarks** covering visual–math + text reasoning.
+ * **Three-Stage SRO Training**
+
+   1. **Text Cold-Start**: seed deep reflection
+   2. **Multimodal RL**: align vision & logic
+   3. **Text RL**: polish fluency & brevity
+ * **PAD** (Prioritized Advantage Distillation) keeps gradients alive.
+ * **Efficient-Length Reward** = concise, self-reflective CoT.
+
+ ---
+
+ ### 📚 Resources
+
+ * [Paper](https://arxiv.org/abs/2506.04207)
+ * [Code](https://github.com/CSfufu/Revisual-R1)
+
+
+ ---
+
+ ### 📌 Citation
+
+ ```bibtex
+ @misc{chen2025advancingmultimodalreasoningoptimized,
+   title         = {Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
+   author        = {Shuang Chen and Yue Guo and Zhaochen Su and Yafu Li and Yulun Wu and Jiacheng Chen and
+                    Jiayu Chen and Weijie Wang and Xiaoye Qu and Yu Cheng},
+   year          = {2025},
+   eprint        = {2506.04207},
+   archivePrefix = {arXiv},
+   primaryClass  = {cs.LG},
+   url           = {https://arxiv.org/abs/2506.04207}
+ }
+ ```
+
+ Take ReVisual-R1 for a spin and let us know what you build! 🎯
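
The PAD highlight in this README says only that it "keeps gradients alive". As a rough illustration of the problem that phrase points at, the sketch below assumes GRPO-style group-normalized advantages (the README does not name the RL algorithm, so this is an assumption) and shows that a group whose rollouts all earn the same reward yields all-zero advantages, and therefore no learning signal. The `prioritized_subsample` helper, its `low` threshold, and its softmax weighting are hypothetical choices made for illustration, not the authors' implementation; the linked paper and code remain the reference for the actual method.

```python
import math
import random
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def prioritized_subsample(advantages, k, low=1e-3, temperature=1.0, rng=None):
    """Hypothetical prioritization step (names and thresholds are assumptions).

    Drops samples whose |advantage| is below `low`, then draws up to `k` of
    the remaining indices without replacement, weighted by
    exp(|advantage| / temperature).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    pool = [i for i, a in enumerate(advantages) if abs(a) >= low]
    chosen = []
    while pool and len(chosen) < k:
        weights = [math.exp(abs(advantages[i]) / temperature) for i in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Every rollout in this group gets the same reward: all advantages are zero.
flat = group_advantages([1.0, 1.0, 1.0, 1.0])
# A group with mixed outcomes produces non-zero advantages.
mixed = group_advantages([1.0, 0.0, 0.0, 1.0])

all_adv = flat + mixed
kept = prioritized_subsample(all_adv, k=3)
```

Here `kept` can only contain indices from the mixed-reward group (positions 4 to 7), because the flat group's advantages are exactly zero and fall below `low`.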