Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff · 2021 · DOI 10.4230/lipics.itcs.2021.41

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

cs.SE · 2026-05-22 · unverdicted · novelty 6.0

An empirical study of 57 ML evaluation harnesses shows 41.4% of operational issues occur in the specification stage, driven mainly by unimplemented features, documentation gaps, and missing input validation.

citing papers explorer

Showing 1 of 1 citing paper.

Towards Evaluation Engineering: An Empirical Study of ML Evaluation Harnesses in the Wild

cs.SE · 2026-05-22 · unverdicted · none · ref 22
