# ARPO_UITARS1.5_7B

Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark.

[Paper] • [Code] • [Logs]

## Model Summary

ARPO_UITARS1.5_7B is fine-tuned from UI-Tars-1.5-7B using Agentic Replay Policy Optimization (ARPO) on the OSWorld benchmark for GUI agents.
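For orientation, below is a minimal inference sketch, assuming the checkpoint follows the Qwen2.5-VL architecture that UI-TARS-1.5 is built on. The repo id, screenshot path, and instruction are placeholders, and this is not the authors' official pipeline.

```python
# Minimal inference sketch (illustrative, not the official pipeline).
# Assumes a Qwen2.5-VL-style checkpoint; repo id and inputs are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "ARPO_UITARS1.5_7B"  # placeholder; use the actual hub repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

screenshot = Image.open("screenshot.png")  # current GUI state
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Open the Settings application."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[screenshot], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens (the proposed GUI action).
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```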

## 📊 Performance

| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| UI-Tars-1.5 + ARPO (Ours) | 83.9% | 29.9% |

Evaluation setting: max 15 steps per trajectory.
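To make the step budget concrete, here is a schematic rollout loop. `env` and `agent_step` are hypothetical stand-ins, not the benchmark's real API; the sketch only illustrates how the 15-step cap bounds each evaluated trajectory.

```python
# Schematic evaluation rollout under the 15-step cap (hypothetical interface).
MAX_STEPS = 15

def run_task(env, agent_step):
    obs = env.reset()                  # initial screenshot + task instruction
    for _ in range(MAX_STEPS):
        action = agent_step(obs)       # model proposes a GUI action
        obs, done = env.step(action)   # execute action, observe new screen
        if done:                       # task terminated before the cap
            break
    return env.evaluate()              # benchmark's task-specific success check
```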

πŸ“ Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Lu, Fanbin and Zhong, Zhisheng and Liu, Shu and Fu, Chi-Wing and Jia, Jiaya},
  journal={arXiv preprint},
  year={2025}
}
```

## 🔗 Related Resources