EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty Paper • 2510.00732 • Published Oct 1 • 5
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20 • 43
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use Paper • 2504.07981 • Published Apr 4 • 2
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Paper • 2411.18932 • Published Nov 28, 2024 • 1
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20, 2024 • 20
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024 • 36
CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification Paper • 2405.00253 • Published Apr 30, 2024
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models Paper • 2405.00390 • Published May 1, 2024
Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models Paper • 2401.13298 • Published Jan 24, 2024
MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems Paper • 2404.09486 • Published Apr 15, 2024 • 2
Positional Artefacts Propagate Through Masked Language Model Embeddings Paper • 2011.04393 • Published Nov 9, 2020 • 1