# ARPO_UITARS1.5_7B

**Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark**  
[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)


## Model Summary

`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B with **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark of desktop GUI-agent tasks.
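
UI-Tars-1.5-7B builds on the Qwen2.5-VL architecture, so this checkpoint can typically be loaded with the standard `transformers` vision-language classes. The snippet below is only a minimal loading sketch, not the official inference script: the repository id is a placeholder, and the exact model class depends on the architecture recorded in this checkpoint's config.

```python
# Minimal loading sketch (assumption: the checkpoint keeps the Qwen2.5-VL
# architecture of the UI-Tars-1.5 base model; the repo id is a placeholder).
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ARPO_UITARS1.5_7B"  # replace with the full Hugging Face repo id

processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit a single modern GPU
    device_map="auto",
)
```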


## 📊 Performance

| Model                          | OSWorld (128-task subset) | OSWorld (overall) |
|--------------------------------|---------------------------|-------------------|
| UI-Tars-1.5                    | 68.7%                     | 23.5%             |
| UI-Tars-1.5 + GRPO             | 72.9%                     | 26.0%             |
| **UI-Tars-1.5 + ARPO (Ours)**  | **83.9%**                 | **29.9%**         |

All numbers are task success rates, evaluated with a budget of at most 15 steps per trajectory.
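
Concretely, each task is rolled out in the OSWorld virtual machine for at most 15 model calls: the agent receives a screenshot, predicts an action, and the loop stops once the task terminates or the step budget is exhausted. The sketch below only illustrates this loop; the environment methods, `build_prompt`, and `parse_action` are hypothetical stand-ins for the actual evaluation harness linked above.

```python
# Illustrative evaluation loop with a 15-step budget. The env, build_prompt
# and parse_action helpers are hypothetical stand-ins, not the real harness.
MAX_STEPS = 15

def run_task(env, model, processor, instruction, max_steps=MAX_STEPS):
    screenshot = env.reset()  # initial VM screenshot
    for _ in range(max_steps):
        inputs = build_prompt(processor, instruction, screenshot)
        output_ids = model.generate(**inputs, max_new_tokens=256)
        action = parse_action(processor.decode(output_ids[0], skip_special_tokens=True))
        screenshot, done, success = env.step(action)  # execute click/type/etc. in the VM
        if done:
            return success  # task outcome judged by the OSWorld evaluator
    return False  # step budget exhausted without finishing the task
```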




## 📝 Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```

---

## 🔗 Related Resources

- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)