sienna223 committed on
Commit b566ef2 · verified · 1 Parent(s): d3de53b

Update README.md

Files changed (1)
  1. README.md +146 -40
README.md CHANGED
@@ -1,62 +1,168 @@
1
  ---
2
- library_name: peft
3
- license: other
4
  base_model: Qwen/Qwen2.5-VL-72B-Instruct
 
 
5
  tags:
6
- - llama-factory
7
  - lora
8
- - generated_from_trainer
9
- model-index:
10
- - name: 72B
11
- results: []
12
  ---
13
 
14
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
15
- should probably proofread and complete it, then remove this comment. -->
16
 
17
- # v7-72B
 
 
 
 
18
 
19
- This model is a fine-tuned version of [/share/project/shared_models/Qwen2.5-VL-72B-Instruct](https://huggingface.co//share/project/shared_models/Qwen2.5-VL-72B-Instruct) on the v7-2_8models_omnigen2-4samples_gpt4-1_range_0to25 dataset.
20
 
21
- ## Model description
 
22
 
23
- More information needed
24
 
25
- ## Intended uses & limitations
 
 
 
 
26
 
27
- More information needed
28
 
29
- ## Training and evaluation data
 
 
 
30
 
31
- More information needed
32
 
33
- ## Training procedure
34
 
35
- ### Training hyperparameters
36
 
37
- The following hyperparameters were used during training:
38
- - learning_rate: 0.0001
39
- - train_batch_size: 1
40
- - eval_batch_size: 8
41
- - seed: 42
42
- - distributed_type: multi-GPU
43
- - num_devices: 16
44
- - gradient_accumulation_steps: 8
45
- - total_train_batch_size: 128
46
- - total_eval_batch_size: 128
47
- - optimizer: Use adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
48
- - lr_scheduler_type: cosine
49
- - lr_scheduler_warmup_ratio: 0.1
50
- - num_epochs: 3.0
51
 
52
- ### Training results
 
 
53
 
 
 
 
54
 
 
 
55
 
56
- ### Framework versions
57
 
58
- - PEFT 0.15.2
59
- - Transformers 4.55.0
60
- - Pytorch 2.8.0+cu128
61
- - Datasets 3.6.0
62
- - Tokenizers 0.21.1
1
  ---
 
 
2
  base_model: Qwen/Qwen2.5-VL-72B-Instruct
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
  tags:
6
+ - base_model:adapter:Qwen/Qwen2.5-VL-72B-Instruct
7
  - lora
8
+ - transformers
 
 
 
9
  ---
10
 
11
+ <p align="center">
12
+ <img src="assets/logo.png" width="65%">
13
+ </p>
14
+
15
+ <p align="center">
16
+ <a href="https://vectorspacelab.github.io/EditScore"><img src="https://img.shields.io/badge/Project%20Page-EditScore-yellow" alt="project page"></a>
17
+ <a href="https://arxiv.org/abs/2509.23909"><img src="https://img.shields.io/badge/arXiv%20paper-2509.23909-b31b1b.svg" alt="arxiv"></a>
18
+ <a href="https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe"><img src="https://img.shields.io/badge/EditScore-🤗-yellow" alt="model"></a>
19
+ <a href="https://huggingface.co/datasets/EditScore/EditReward-Bench"><img src="https://img.shields.io/badge/EditReward--Bench-🤗-yellow" alt="dataset"></a>
20
+ </p>
21
+
22
+ <h4 align="center">
23
+ <p>
24
+ <a href=#-news>News</a> |
25
+ <a href=#-quick-start>Quick Start</a> |
26
+ <a href=#-benchmark-your-image-editing-reward-model>Benchmark Usage</a> |
27
+ <a href=#%EF%B8%8F-citing-us>Citation</a>
28
+ </p>
29
+ </h4>
30
+
31
+ **EditScore** is a series of state-of-the-art open-source reward models (7B–72B) designed to evaluate and enhance instruction-guided image editing.
32
+ ## ✨ Highlights
33
+ - **State-of-the-Art Performance**: Effectively matches the performance of leading proprietary VLMs. With a self-ensembling strategy, **our largest model surpasses even GPT-5** on our comprehensive benchmark, **EditReward-Bench**.
34
+ - **A Reliable Evaluation Standard**: We introduce **EditReward-Bench**, the first public benchmark specifically designed for evaluating reward models in image editing, featuring 13 subtasks, 11 state-of-the-art editing models (*including proprietary models*) and expert human annotations.
35
+ - **Simple and Easy-to-Use**: Get an accurate quality score for your image edits with just a few lines of code.
36
+ - **Versatile Applications**: Ready to use as a best-in-class reranker to improve editing outputs, or as a high-fidelity reward signal for **stable and effective Reinforcement Learning (RL) fine-tuning**.
37
+
38
+ ## 🔥 News
39
+ - **2025-09-30**: We release **OmniGen2-EditScore7B**, unlocking online RL for image editing via high-fidelity EditScore. LoRA weights are available at [Hugging Face](https://huggingface.co/OmniGen2/OmniGen2-EditScore7B) and [ModelScope](https://www.modelscope.cn/models/OmniGen2/OmniGen2-EditScore7B).
40
+ - **2025-09-30**: We are excited to release **EditScore** and **EditReward-Bench**! Model weights and the benchmark dataset are now publicly available. You can access them on Hugging Face: [Models Collection](https://huggingface.co/collections/EditScore/editscore-68d8e27ee676981221db3cfe) and [Benchmark Dataset](https://huggingface.co/datasets/EditScore/EditReward-Bench), and on ModelScope: [Models Collection](https://www.modelscope.cn/collections/EditScore-8b0d53aa945d4e) and [Benchmark Dataset](https://www.modelscope.cn/datasets/EditScore/EditReward-Bench).
41
+
42
+ ## 📖 Introduction
43
+ While Reinforcement Learning (RL) holds immense potential for instruction-guided image editing, its progress has been severely hindered by the absence of a high-fidelity, efficient reward signal.
44
+
45
+ To overcome this barrier, we provide a systematic, two-part solution:
46
+
47
+ - **A Rigorous Evaluation Standard**: We first introduce **EditReward-Bench**, a new public benchmark for the direct and reliable evaluation of reward models. It features 13 diverse subtasks and expert human annotations, establishing a gold standard for measuring reward signal quality.
48
+
49
+ - **A Powerful & Versatile Tool**: Guided by our benchmark, we developed the **EditScore** model series. Through meticulous data curation and an effective self-ensembling strategy, EditScore sets a new state of the art for open-source reward models, even surpassing the accuracy of leading proprietary VLMs.
50
 
51
+ <p align="center">
52
+ <img src="assets/table_reward_model_results.png" width="95%">
53
+ <br>
54
+ <em>Benchmark results on EditReward-Bench.</em>
55
+ </p>
56
 
57
+ We demonstrate the practical utility of EditScore through two key applications:
58
 
59
+ - **As a State-of-the-Art Reranker**: Use EditScore to perform Best-of-*N* selection and instantly improve the output quality of diverse editing models.
60
+ - **As a High-Fidelity Reward for RL**: Use EditScore as a robust reward signal to fine-tune models via RL, enabling stable training and unlocking significant performance gains where general-purpose VLMs fail.
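
Best-of-*N* selection, as used in the reranker application, amounts to scoring every candidate edit and keeping the top one. The sketch below illustrates that idea only; `best_of_n` and `score_fn` are hypothetical names introduced here, and `score_fn` stands in for any reward model call that maps a candidate to a number (higher is better).

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], score_fn: Callable[[T], float]) -> Tuple[T, float]:
    """Return the highest-scoring candidate together with its score.

    `score_fn` is a hypothetical stand-in for a reward model call.
    Ties are resolved in favour of the earliest candidate.
    """
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    # Score each candidate exactly once, remembering its position.
    scored = [(score_fn(candidate), index) for index, candidate in enumerate(candidates)]
    best_score, best_index = max(scored, key=lambda pair: pair[0])
    return candidates[best_index], best_score


if __name__ == "__main__":
    # Toy example: candidates are strings and the "score" is their length.
    winner, score = best_of_n(["edit_a", "edit_bbb", "edit_cc"], score_fn=len)
    print(winner, score)
```

In practice `score_fn` would wrap a scorer call on an (input image, edited image, instruction) triple; the toy example uses string length only so the block runs without any model.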
61
 
62
+ This repository releases both the **EditScore** models and the **EditReward-Bench** dataset to facilitate future research in reward modeling, policy optimization, and AI-driven model improvement.
63
 
64
+ <p align="center">
65
+ <img src="assets/figure_edit_results.png" width="95%">
66
+ <br>
67
+ <em>EditScore as a superior reward signal for image editing.</em>
68
+ </p>
69
 
 
70
 
71
+ ## 📌 TODO
72
+ We are actively working on improving EditScore and expanding its capabilities. Here's what's next:
73
+ - [ ] Release RL training code applying EditScore to OmniGen2.
74
+ - [ ] Provide Best-of-N inference scripts for OmniGen2, Flux-dev-Kontext, and Qwen-Image-Edit.
75
 
76
+ ## 🚀 Quick Start
77
 
78
+ ### 🛠️ Environment Setup
79
 
80
+ #### ✅ Recommended Setup
81
 
82
+ ```bash
+ # 1. Clone the repo
+ git clone git@github.com:VectorSpaceLab/EditScore.git
+ cd EditScore

+ # 2. (Optional) Create a clean Python environment
+ conda create -n editscore python=3.12
+ conda activate editscore

+ # 3. Install dependencies
+ # 3.1 Install PyTorch (choose the wheel index matching your CUDA version)
+ pip install torch==2.7.1 torchvision --extra-index-url https://download.pytorch.org/whl/cu126

+ # 3.2 Install other required packages
+ pip install -r requirements.txt

+ # EditScore runs without vLLM, but we recommend installing it for best performance.
+ pip install vllm
+ ```
101
+
102
+ #### 🌏 For users in Mainland China
103
+
104
+ ```bash
+ # Install PyTorch from a domestic mirror
+ pip install torch==2.7.1 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu126
+
+ # Install other dependencies from the Tsinghua mirror
+ pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
+
+ # EditScore runs without vLLM, but we recommend installing it for best performance.
+ pip install vllm -i https://pypi.tuna.tsinghua.edu.cn/simple
+ ```
114
+
115
+ ---
116
+
117
+ ### 🧪 Usage Example
118
+ Using EditScore is straightforward. The model will be automatically downloaded from the Hugging Face Hub on its first run.
119
+ ```python
+ from PIL import Image
+ from editscore import EditScore
+
+ # Load the EditScore model. It will be downloaded automatically.
+ # Replace with the specific model version you want to use.
+ model_path = "Qwen/Qwen2.5-VL-7B-Instruct"
+ lora_path = "EditScore/EditScore-7B"
+
+ scorer = EditScore(
+     backbone="qwen25vl",  # set to "qwen25vl_vllm" for faster inference
+     model_name_or_path=model_path,
+     enable_lora=True,
+     lora_path=lora_path,
+     score_range=25,
+     num_pass=1,  # Increase for better performance via self-ensembling
+ )
+
+ input_image = Image.open("example_images/input.png")
+ output_image = Image.open("example_images/output.png")
+ instruction = "Adjust the background to a glass wall."
+
+ # `result` is a dictionary containing the final score and other details.
+ result = scorer.evaluate([input_image, output_image], instruction)
+ print(f"Edit Score: {result['final_score']}")
+ ```
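
The `num_pass` comment in the example refers to self-ensembling. How EditScore aggregates multiple passes internally is not stated in this README; one common reading, assumed for this sketch, is to average the scores of several independent scoring passes. `ensemble_score` and `score_once` are hypothetical names and are not part of the `editscore` package.

```python
from statistics import mean
from typing import Callable


def ensemble_score(score_once: Callable[[], float], num_pass: int = 1) -> float:
    """Average `num_pass` independent calls to `score_once`.

    Assumption: self-ensembling is modelled here as a plain mean over passes;
    the real aggregation used by EditScore may differ.
    """
    if num_pass < 1:
        raise ValueError("num_pass must be at least 1")
    return mean(score_once() for _ in range(num_pass))


if __name__ == "__main__":
    # Toy example: three noisy passes around a "true" score of 20.
    passes = iter([18.0, 20.0, 22.0])
    print(ensemble_score(lambda: next(passes), num_pass=3))
```

Averaging reduces the variance of a stochastic scorer at the cost of `num_pass` times the compute, which is the trade-off the `num_pass` parameter exposes.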
145
+
146
+ ---
147
 
148
+ ## 📊 Benchmark Your Image-Editing Reward Model
149
+ We provide an evaluation script to benchmark reward models on **EditReward-Bench**. To evaluate your own custom reward model, simply create a scorer class with a similar interface and update the script.
150
+ ```bash
+ # This script will evaluate the default EditScore model on the benchmark
+ bash evaluate.sh
+
+ # Or speed up inference with vLLM
+ bash evaluate_vllm.sh
+ ```
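
For the custom-scorer route, the only interface visible in the usage example is an `evaluate(images, instruction)` method that returns a dictionary with a `final_score` key. The adapter below is a minimal sketch under that assumption: `CallableScorer` is a hypothetical name, clamping to `[0, score_range]` is an assumption about the score scale, and the evaluation script may expect additional keys or constructor arguments.

```python
from typing import Any, Callable, Dict, Sequence


class CallableScorer:
    """Wrap any scoring function in an EditScore-like `evaluate` interface.

    Hypothetical adapter: only the `evaluate(...)` call shape and the
    `final_score` key are taken from the usage example.
    """

    def __init__(self, score_fn: Callable[[Any, Any, str], float], score_range: int = 25):
        self.score_fn = score_fn
        self.score_range = score_range

    def evaluate(self, images: Sequence[Any], instruction: str) -> Dict[str, float]:
        if len(images) != 2:
            raise ValueError("expected [input_image, output_image]")
        input_image, output_image = images
        raw = float(self.score_fn(input_image, output_image, instruction))
        # Assumption: valid scores lie in [0, score_range]; clamp anything outside.
        clamped = min(max(raw, 0.0), float(self.score_range))
        return {"final_score": clamped}


if __name__ == "__main__":
    # Toy baseline that ignores the images and returns a constant score.
    scorer = CallableScorer(lambda src, dst, text: 12.5)
    print(scorer.evaluate(["input.png", "output.png"], "Adjust the background."))
```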
157
+
158
+ ## ❤️ Citing Us
159
+ If you find this repository or our work useful, please consider giving a star ⭐ and citation 🦖, which would be greatly appreciated:
160
+
161
+ ```bibtex
+ @article{luo2025editscore,
+   title={EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling},
+   author={Xin Luo and Jiahao Wang and Chenyuan Wu and Shitao Xiao and Xiyan Jiang and Defu Lian and Jiajun Zhang and Dong Liu and Zheng Liu},
+   journal={arXiv preprint arXiv:2509.23909},
+   year={2025}
+ }
+ ```