# ARPO_UITARS1.5_7B

Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark.

[Paper] • [Code] • [Logs]
## Model Summary
ARPO_UITARS1.5_7B is fine-tuned from UI-Tars-1.5-7B using Agentic Replay Policy Optimization (ARPO) on the OSWorld benchmark for GUI agents.
## Performance
| Model | OSWorld (128 Tasks) | OSWorld Overall |
|---|---|---|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| UI-Tars-1.5 + ARPO (Ours) | 83.9% | 29.9% |
Evaluation setting: a maximum of 15 steps per trajectory.
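As a quick reading of the table above, the gains on the 128-task subset can be expressed in percentage points and as a relative improvement over the base model (all values taken directly from the table):

```python
# Scores on the OSWorld 128-task subset, in percent (from the table above).
base = 68.7  # UI-Tars-1.5
grpo = 72.9  # UI-Tars-1.5 + GRPO
arpo = 83.9  # UI-Tars-1.5 + ARPO

# Absolute gains in percentage points.
gain_over_base = round(arpo - base, 1)  # 15.2
gain_over_grpo = round(arpo - grpo, 1)  # 11.0

# Relative improvement of ARPO over the base model.
relative_gain = round((arpo - base) / base * 100, 1)  # 22.1

print(gain_over_base, gain_over_grpo, relative_gain)
```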
## Citation
If you use this model in your work, please cite:
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}