LLMAuditor: A Framework for Auditing Large Language Models Using Human-in-the-Loop Paper • 2402.09346 • Published Feb 14, 2024
The Art of Refusal: A Survey of Abstention in Large Language Models Paper • 2407.18418 • Published Jul 25, 2024
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation Paper • 2505.17613 • Published May 23, 2025
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only Paper • 2410.11055 • Published Oct 14, 2024