zli12321 committed · verified
Commit f02450c · 1 Parent(s): 15576a9

Update README.md

Files changed (1): README.md +19 -1
README.md CHANGED
@@ -2,6 +2,9 @@
 license: apache-2.0
 ---
 
+
+
+
 # VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
 
 [Zongxia Li*](https://zli12321.github.io/), [Xiyang Wu*](https://wuxiyang1996.github.io/), [Yubin Qin](https://www.linkedin.com/in/yubin-qin/), [Guangyao Shi](https://guangyaoshi.github.io/), [Hongyang Du](https://www.linkedin.com/in/hongyangdu/), [Dinesh Manocha](https://www.cs.umd.edu/people/dmanocha), [Tianyi Zhou](https://tianyizhou.github.io/), [Jordan Lee Boyd-Graber](https://users.umiacs.umd.edu/~ying/)
@@ -9,6 +12,10 @@ license: apache-2.0
 [[📖 Paper](https://arxiv.org/abs/2505.01481)] [[🤗 Dataset](https://huggingface.co/datasets/IntelligenceLab/VideoHallu)] [[🌐 Website](https://wuxiyang1996.github.io/videohallu_page/)]
 
 
+# Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation
+[[📖 Paper](https://arxiv.org/abs/2506.15068)]
+
+
 
 ## 👀 About VideoHallu
 
@@ -25,7 +32,8 @@ We also use GRPO to train [Qwen-2.5-VL-7B](https://huggingface.co/Qwen/Qwen2.5-V
 
 
 ## 🏅 <a name='rb'></a>Reward Model
-We use [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) as the base model to finetune on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations. We use RewardBert as the reward in GRPO finetuning.
+- RewardBert is targeted specifically at free-form GRPO training, where answers cannot be judged by a simple correctness check.
+- We finetune [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert) on [MOCHA](https://arxiv.org/abs/2010.03636), [Prometheus-preference](https://huggingface.co/datasets/prometheus-eval/Preference-Collection), and [Pedants](https://arxiv.org/abs/2402.11161) to evaluate free-form text generations, and we use RewardBert as the reward in GRPO finetuning.
 
 #### Method: `compute_score`
 **Parameters**
@@ -75,6 +83,16 @@ If you find our work helpful for your research, please consider citing our work.
   url={https://arxiv.org/abs/2501.02189},
 }
 
+@misc{li2025semanticallyawarerewardsopenendedr1,
+  title={Semantically-Aware Rewards for Open-Ended R1 Training in Free-Form Generation},
+  author={Zongxia Li and Yapei Chang and Yuhang Zhou and Xiyang Wu and Zichao Liang and Yoo Yeon Sung and Jordan Lee Boyd-Graber},
+  year={2025},
+  eprint={2506.15068},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2506.15068},
+}
+
 @misc{guan2024hallusionbenchadvanceddiagnosticsuite,
   title={HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models},
   author={Tianrui Guan and Fuxiao Liu and Xiyang Wu and Ruiqi Xian and Zongxia Li and Xiaoyu Liu and Xijun Wang and Lichang Chen and Furong Huang and Yaser Yacoob and Dinesh Manocha and Tianyi Zhou},
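
The Reward Model hunk above describes finetuning ModernBERT into a scorer for free-form answers. As a rough sketch of that setup (not the commit's actual training code: the base checkpoint name, the joint pair encoding, and the single-logit regression head are all assumptions), such a scorer could be wired up as follows:

```python
# Minimal sketch of a ModernBERT-based answer scorer (assumptions noted below).
# Requires transformers >= 4.48, which added ModernBERT support.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BASE = "answerdotai/ModernBERT-base"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=1  # one scalar score per (reference, candidate) pair
)
model.eval()

def score(reference: str, candidate: str) -> float:
    # Encode the pair jointly so the encoder can attend across both texts.
    inputs = tokenizer(reference, candidate, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logit = model(**inputs).logits  # shape (1, 1)
    # Squash to [0, 1] so the value can serve directly as a reward. The head
    # here is untrained; RewardBert finetunes it on MOCHA, Prometheus
    # preferences, and Pedants before use.
    return torch.sigmoid(logit).item()
```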
 
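For context on how such a scorer feeds GRPO, here is a minimal, self-contained sketch. The `compute_score(reference, candidate)` signature mirrors the method name shown in the Reward Model hunk, but the real parameter list is cut off by the diff, so the wrapper below is hypothetical and the scorer is stubbed with token overlap:

```python
# Hypothetical use of a RewardBert-style scorer as a GRPO reward function.
from statistics import mean, pstdev
from typing import List

class StubScorer:
    """Stand-in for RewardBert; the real scorer uses a finetuned ModernBERT."""

    def compute_score(self, reference: str, candidate: str) -> float:
        # Token-overlap placeholder so the example runs without model weights.
        ref, cand = set(reference.lower().split()), set(candidate.lower().split())
        return len(ref & cand) / max(len(ref | cand), 1)

def grpo_advantages(reference: str, completions: List[str]) -> List[float]:
    # GRPO samples a group of completions per prompt, scores each one, and
    # normalizes rewards within the group to get per-completion advantages.
    scorer = StubScorer()
    rewards = [scorer.compute_score(reference, c) for c in completions]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

if __name__ == "__main__":
    ref = "The ball falls because gravity pulls it toward the ground."
    group = [
        "Gravity pulls the ball down toward the ground.",
        "The ball stays suspended in midair.",
    ]
    print(grpo_advantages(ref, group))  # faithful answer gets the higher advantage
```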