arxiv: 2403.07008 · v3 · pith:HT6QQ4TBnew · submitted 2024-03-09 · 💻 cs.LG · cs.AI · cs.CL · stat.ME

AutoEval Done Right: Using Synthetic Data for Model Evaluation

classification 💻 cs.LG cs.AI cs.CL stat.ME
keywords data, algorithms, evaluation, human-labeled, purpose, sample, synthetic, ai-labeled

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. These algorithms increase the effective human-labeled sample size by up to 50% on experiments with GPT-4.
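To make the idea of combining the two label sources concrete, here is a minimal sketch of one common bias-corrected construction: the human-labeled mean is adjusted by the gap between the AI-labeled mean on a large unlabeled pool and the AI-labeled mean on the human-labeled subset. The abstract does not spell out the paper's algorithms, so this is an illustration under stated assumptions rather than the authors' method. The function name `autoeval_mean`, its arguments, and the weight `lam` are hypothetical, and the sketch assumes the human-labeled and unlabeled examples are drawn from the same distribution.

```python
import random
from statistics import fmean


def autoeval_mean(human_labels, ai_on_labeled, ai_on_unlabeled, lam=1.0):
    """Bias-corrected estimate of a mean metric such as accuracy.

    human_labels    : human scores on the small labeled set
    ai_on_labeled   : AI scores on the same examples (paired with human_labels)
    ai_on_unlabeled : AI scores on a large pool with no human labels
    lam             : hypothetical weight on the AI signal; lam=0 ignores the
                      AI labels and returns the plain human-labeled mean
    """
    if len(human_labels) != len(ai_on_labeled):
        raise ValueError("human_labels and ai_on_labeled must be paired")
    if not human_labels or not ai_on_unlabeled:
        raise ValueError("both the labeled and unlabeled sets must be non-empty")
    # The two AI means have the same expectation when both sets come from the
    # same distribution, so the correction term has mean zero and the estimate
    # stays unbiased for any fixed lam.
    correction = fmean(ai_on_unlabeled) - fmean(ai_on_labeled)
    return fmean(human_labels) + lam * correction


if __name__ == "__main__":
    # Simulated data (an assumption for illustration only): the true rate is
    # 0.7 and the AI judge agrees with the human label 85% of the time.
    rng = random.Random(0)

    def judge(y):
        return y if rng.random() < 0.85 else 1 - y

    human = [int(rng.random() < 0.7) for _ in range(200)]
    ai_labeled = [judge(y) for y in human]
    ai_unlabeled = [judge(int(rng.random() < 0.7)) for _ in range(5000)]
    print(autoeval_mean(human, ai_labeled, ai_unlabeled))
```

When the AI scores track the human scores closely, the correction term cancels much of the sampling noise in the human-labeled mean, which is one way to read the abstract's "effective sample size" claim. When they do not, a smaller `lam` limits the added noise.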
