Pillar-0: A New Frontier for Radiology Foundation Models Paper • 2511.17803 • Published 6 days ago • 17
Constantly Improving Image Models Need Constantly Improving Benchmarks Paper • 2510.15021 • Published Oct 16 • 6
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint Paper • 2505.23759 • Published May 29 • 5 • 2
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17 • 39 • 2
TULIP: Towards Unified Language-Image Pretraining Paper • 2503.15485 • Published Mar 19 • 49 • 2
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions Paper • 2409.12962 • Published Sep 19, 2024 • 2 • 2
Visual Haystacks: Answering Harder Questions About Sets of Images Paper • 2407.13766 • Published Jul 18, 2024 • 2 • 4
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Paper • 2403.19822 • Published Mar 28, 2024
ALOHa: A New Measure for Hallucination in Captioning Models Paper • 2404.02904 • Published Apr 3, 2024