
# ARPO_UITARS1.5_7B

Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark.
[Paper] • [Code] • [Logs]

## Model Summary

ARPO_UITARS1.5_7B is fine-tuned from UI-Tars-1.5-7B using Agentic Replay Policy Optimization (ARPO) on the OSWorld benchmark for GUI agents.
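
For reference, a minimal inference sketch (not an official snippet): it assumes a recent `transformers` release in which the UI-Tars-1.5 / Qwen2.5-VL-family architecture is handled by `AutoModelForImageTextToText`, and the screenshot path and instruction below are placeholder inputs. Check the exact prompt and action format against the UI-Tars documentation before use.

```python
# Minimal inference sketch (unofficial). Assumes the checkpoint keeps the
# image-text-to-text interface of its UI-Tars-1.5 base; "screenshot.png"
# and the instruction are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Fanbin/ARPO_UITARS1.5_7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# A GUI-agent turn: one screenshot plus a natural-language instruction.
image = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and decode only the newly generated action text.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

In an OSWorld-style loop, the decoded output would be parsed into a GUI action, executed in the environment, and the next screenshot fed back in, for up to the 15 steps per trajectory used in evaluation below.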

## 📊 Performance

| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| UI-Tars-1.5 + ARPO (Ours) | 83.9% | 29.9% |

Evaluation setting: max 15 steps per trajectory.

πŸ“ Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```

## 🔗 Related Resources
