Pillar-0: A New Frontier for Radiology Foundation Models Paper • 2511.17803 • Published 20 days ago
Constantly Improving Image Models Need Constantly Improving Benchmarks Paper • 2510.15021 • Published Oct 16, 2025
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint Paper • 2505.23759 • Published May 29, 2025
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17, 2025
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions Paper • 2409.12962 • Published Sep 19, 2024
Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark! Article • Jul 23, 2024
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video Paper • 2401.05314 • Published Jan 10, 2024