Large-scale study finds that counterfactual metrics on semi-simulated data do not select the same estimators as observable metrics on real data, and benchmark rankings fail to transfer.
predictive modeling: a theoretical analysis
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
The authors adapt established RCT validity principles from other fields into a standardized framework with 33 guidelines tailored to AI evaluation contexts.