InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 211
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory Paper • 2508.09736 • Published Aug 13, 2025 • 57
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels Paper • 2508.17437 • Published Aug 20, 2025 • 38
Mobile-Agent-v3: Foundamental Agents for GUI Automation Paper • 2508.15144 • Published Aug 21, 2025 • 64
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC Paper • 2502.14282 • Published Feb 20, 2025 • 29
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation Paper • 2506.04614 • Published Jun 5, 2025 • 19
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20, 2024 • 175
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26, 2025 • 43
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 160