ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code • Paper • 2506.02314 • Published Jun 2
Reliable and Efficient Amortized Model-Based Evaluation • Collection • Datasets and Models for the REEval project • 24 items • Updated Jun 17