MINCE shrinks IFEVAL by 54%, MMLU by 89%, and GSM8K by 70% via few-model Monte Carlo calibration while keeping maximum drift at or below 2.62 percentage points.
arXiv preprint arXiv:2502.10312
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
MINCE: Shrinking LLM Evaluation Datasets via Few-Model Monte Carlo Calibration