AutoEval Done Right: Using Synthetic Data for Model Evaluation

Anastasios N. Angelopoulos; Jitendra Malik; Michael I. Jordan; Nir Yosef; Pierre Boyeau

arxiv: 2403.07008 · v3 · pith:HT6QQ4TBnew · submitted 2024-03-09 · 💻 cs.LG · cs.AI · cs.CL · stat.ME


Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan

classification 💻 cs.LG · cs.AI · cs.CL · stat.ME

keywords data · algorithms · evaluation · human-labeled · purpose · sample · synthetic · ai-labeled


The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. These algorithms increase the effective human-labeled sample size by up to 50% on experiments with GPT-4.
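The abstract does not spell out the estimator, so the following is only a minimal sketch of one way AI labels and a small human-labeled set can be combined without bias. It assumes a difference-style (prediction-powered) mean estimate: average the AI labels over a large pool, then correct that average with the mean human-minus-AI gap measured on the human-labeled subset. The function name `debiased_mean` and its arguments are hypothetical and are not taken from the paper.

```python
import statistics


def debiased_mean(human_labels, ai_on_labeled, ai_on_unlabeled):
    """Estimate a mean metric from few human labels plus many AI labels.

    Assumption (not from the source text): a difference-style estimator.
    human_labels    -- human scores on the small labeled set
    ai_on_labeled   -- AI scores on the same labeled examples, same order
    ai_on_unlabeled -- AI scores on the large pool without human labels
    """
    if len(human_labels) != len(ai_on_labeled):
        raise ValueError("human and AI labels must cover the same examples")
    if not human_labels or not ai_on_unlabeled:
        raise ValueError("both the labeled set and the AI-only pool must be non-empty")

    # Average gap between human and AI labels; this term corrects
    # any systematic bias in the AI annotator.
    correction = statistics.fmean(
        [h - a for h, a in zip(human_labels, ai_on_labeled)]
    )
    # Cheap AI-only average over the large pool, shifted by the correction.
    return statistics.fmean(ai_on_unlabeled) + correction
```

If the AI labels track the human labels closely, the correction term has low variance and the large AI-labeled pool does most of the work; if they do not, the estimate falls back toward what the human labels alone support.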
