SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning Paper • 2506.21355 • Published Jun 26, 2025 • 10
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning Paper • 2506.22992 • Published Jun 28, 2025 • 12
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs Paper • 2405.18740 • Published May 29, 2024
Almanac Copilot: Towards Autonomous Electronic Health Record Navigation Paper • 2405.07896 • Published Apr 30, 2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments Paper • 2405.07960 • Published May 13, 2024 • 1
MIRIAD: Augmenting LLMs with millions of medical query-response pairs Paper • 2506.06091 • Published Jun 6, 2025 • 10
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards Paper • 2506.11474 • Published Jun 13, 2025 • 18
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models Paper • 2401.15269 • Published Jan 27, 2024 • 2
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10, 2025 • 16
AgentRxiv: Towards Collaborative Autonomous Research Paper • 2503.18102 • Published Mar 23, 2025 • 25
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks Paper • 2404.00376 • Published Mar 30, 2024 • 5
Predicting sepsis in multi-site, multi-national intensive care cohorts using deep learning Paper • 2107.05230 • Published Jul 12, 2021
Almanac: Retrieval-Augmented Language Models for Clinical Medicine Paper • 2303.01229 • Published Mar 1, 2023 • 1
Early Recognition of Sepsis with Gaussian Process Temporal Convolutional Networks and Dynamic Time Warping Paper • 1902.01659 • Published Feb 5, 2019