CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark • Paper • 2510.26160 • Published 4 days ago • 4
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence • Paper • 2510.23538 • Published 6 days ago • 91
Reasoning with Sampling: Your Base Model is Smarter Than You Think • Paper • 2510.14901 • Published 17 days ago • 44
Unified Reinforcement and Imitation Learning for Vision-Language Models • Paper • 2510.19307 • Published 12 days ago • 26
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning • Paper • 2510.15444 • Published 17 days ago • 144
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models • Paper • 2510.16641 • Published 15 days ago • 4
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs • Paper • 2510.07499 • Published 25 days ago • 48
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs • Paper • 2509.24107 • Published Sep 28 • 76
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning • Paper • 2510.06217 • Published 26 days ago • 62
Less is More: Recursive Reasoning with Tiny Networks • Paper • 2510.04871 • Published 27 days ago • 462
ACON: Optimizing Context Compression for Long-horizon LLM Agents • Paper • 2510.00615 • Published Oct 1 • 31
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning • Paper • 2509.21070 • Published Sep 25 • 9
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models • Paper • 2509.19803 • Published Sep 24 • 117
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines • Paper • 2509.21320 • Published Sep 25 • 99
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT • Paper • 2509.19284 • Published Sep 23 • 22
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe • Paper • 2509.18154 • Published Sep 16 • 49
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering • Paper • 2509.17396 • Published Sep 22 • 19