Submitted by zhhli 13 Judging with Confidence: Calibrating Autoraters to Preference Distributions Google 2
Submitted by Srizzle 6 Performance Prediction for Large Systems via Text-to-Text Regression Google 273 2