In Proceedings of the 41st International Conference on Machine Learning, ICML’24

tinyBenchmarks: Evaluating LLMs with Fewer Examples · 2022 · arXiv 2209.11830

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

BenchMarker: An Education-Inspired Toolkit for Highlighting Flaws in Multiple-Choice Benchmarks

cs.CL · 2026-02-05 · unverdicted · novelty 7.0

BenchMarker toolkit audits 12 MCQA benchmarks for contamination, shortcuts, and writing errors using LLM judges, finding widespread flaws that inflate or deflate accuracy and alter rankings.

