Journal of Business and Economic Statistics
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Life After Benchmark Saturation: A Case Study of CORE-Bench
  Using CORE-Bench as a case study, the paper shows that saturated benchmarks can still deliver insights on efficiency, reliability, model-scaffold differences, and human collaboration even after accuracy plateaus, and introduces improved benchmark versions plus a small randomized experiment demonstra
- Foundation Models for Credit Risk Prediction: A Game Changer?
  Tabular foundation models outperform standard methods in credit risk PD and LGD tasks, with larger gains on smaller datasets when used out-of-the-box.
- Which Small-Sample Correction Should Be Used When Analyzing Stepped-Wedge Designs with Time-Varying Treatment Effects?
  Simulations recommend the Mancl-DeRouen correction with t-distribution for continuous outcomes and the Morel-Bokossa-Neerchal estimator for binary outcomes in ETI models for SW-CRTs, while long-term effect estimates remain unstable.