HardTests: Synthesizing High-Quality Test Cases for LLM Coding • Paper • 2505.24098 • Published May 30 • 44
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models • Paper • 2504.13367 • Published Apr 17 • 25
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling • Paper • 2402.10466 • Published Feb 16, 2024 • 19