# ARPO_UITARS1.5_7B
**Trained with ARPO (Agentic Replay Policy Optimization) on OSWorld benchmark**
[[Paper]](https://github.com/dvlab-research/ARPO/blob/main/paper.pdf) • [[Code]](https://github.com/dvlab-research/ARPO) • [[Logs]](https://wandb.ai/fanbinlu/arpo)
## Model Summary
`ARPO_UITARS1.5_7B` is fine-tuned from UI-Tars-1.5-7B using **Agentic Replay Policy Optimization (ARPO)** on the **OSWorld** benchmark for GUI agents.
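The checkpoint can be loaded like its UI-Tars-1.5 base model. Below is a minimal inference sketch using Hugging Face `transformers` (assuming a recent version that provides `Qwen2_5_VLForConditionalGeneration`, the architecture family UI-Tars-1.5-7B is built on); `MODEL_ID`, the screenshot path, and the instruction are illustrative placeholders, not confirmed names from this repository.

```python
# Minimal inference sketch. Assumptions: the checkpoint keeps the Qwen2.5-VL
# architecture of the UI-Tars-1.5-7B base model, and MODEL_ID is a placeholder
# to be replaced with the actual Hub repository ID or local path.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "ARPO_UITARS1.5_7B"  # placeholder, not a confirmed Hub path

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# One agent step: a desktop screenshot plus an instruction in, the next GUI action out.
screenshot = Image.open("screenshot.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Open the terminal and list the files in the home directory."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[screenshot], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
```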
## Performance
| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-----------------------------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| **UI-Tars-1.5 + ARPO (Ours)** | **83.9%** | **29.9%** |
Evaluation setting: max 15 steps per trajectory.
## Citation
If you use this model in your work, please cite:
```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```
---
## Related Resources
- [OSWorld Benchmark](https://github.com/FanbinLu/OSWorld)
- [EasyR1 Framework](https://github.com/hiyouga/EasyR1)
- [Training Logs on W&B](https://wandb.ai/fanbinlu/arpo)