DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering Paper • 2507.11527 • Published Jul 15 • 31 • 1
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26, 2024 • 38 • 3