ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code Paper • 2506.02314 • Published Jun 2
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models Paper • 2403.02715 • Published Mar 5, 2024 • 3
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Paper • 2306.11698 • Published Jun 20, 2023 • 12