Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence Paper • 2506.15677 • Published Jun 18 • 24
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29 • 24
Scaling Autonomous Agents via Automatic Reward Modeling And Planning Paper • 2502.12130 • Published Feb 17 • 2
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting Paper • 2306.11528 • Published Jun 20, 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 8