SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? Paper • 2410.03859 • Published Oct 4, 2024
VideoGameBench: Can Vision-Language Models complete popular video games? Paper • 2505.18134 • Published May 23, 2025