
# ARPO_UITARS1.5_7B

Trained with ARPO (Agentic Replay Policy Optimization) on the OSWorld benchmark.
[Paper] • [Code] • [Logs]

## Model Summary

ARPO_UITARS1.5_7B is fine-tuned from UI-Tars-1.5-7B using Agentic Replay Policy Optimization (ARPO) on the OSWorld benchmark for GUI agents.
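
For reference, a minimal inference sketch (not an official snippet): it assumes a recent `transformers` release in which the UI-Tars-1.5 / Qwen2.5-VL-family architecture is handled by `AutoModelForImageTextToText`, and the screenshot path and instruction below are placeholder inputs. Check the exact prompt and action format against the UI-Tars documentation before use.

```python
# Minimal inference sketch (unofficial). Assumes the checkpoint keeps the
# image-text-to-text interface of its UI-Tars-1.5 base; "screenshot.png"
# and the instruction are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Fanbin/ARPO_UITARS1.5_7B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# A GUI-agent turn: one screenshot plus a natural-language instruction.
image = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Open the Settings application."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and decode only the newly generated action text.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

In an OSWorld-style loop, the decoded output would be parsed into a GUI action, executed in the environment, and the next screenshot fed back in, for up to the 15 steps per trajectory used in evaluation below.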

## 📊 Performance

| Model | OSWorld (128 Tasks) | OSWorld Overall |
|-------|---------------------|-----------------|
| UI-Tars-1.5 | 68.7% | 23.5% |
| UI-Tars-1.5 + GRPO | 72.9% | 26.0% |
| UI-Tars-1.5 + ARPO (Ours) | 83.9% | 29.9% |

Evaluation setting: max 15 steps per trajectory.

πŸ“ Citation

If you use this model in your work, please cite:

```bibtex
@article{lu2025arpo,
  title={ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay},
  author={Fanbin Lu and Zhisheng Zhong and Shu Liu and Chi-Wing Fu and Jiaya Jia},
  journal={arXiv preprint},
  year={2025}
}
```

## 🔗 Related Resources
