PMLB provides a standardised, version-controlled archive of datasets widely used in AutoML research.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: ACCEPT (1)
Representative citing papers:
- ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules
ScoringBench evaluates tabular models on 97 datasets with proper scoring rules and shows that model rankings shift substantially when switching from point-estimate to probabilistic metrics.
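
As a rough illustration of why a ranking can change between a point-estimate metric and a proper scoring rule, the sketch below compares two hypothetical models on toy data. The data, the model names, and the choice of RMSE and Gaussian negative log-likelihood are assumptions made for this example only; they are not taken from ScoringBench and may differ from the metrics it actually uses.

```python
import math

def rmse(y_true, means):
    """Point-estimate metric: looks only at the predicted means."""
    return math.sqrt(sum((y - m) ** 2 for y, m in zip(y_true, means)) / len(y_true))

def gaussian_nll(y_true, means, sigmas):
    """Mean negative log-likelihood of Gaussian predictive distributions.

    This is the logarithmic score, a proper scoring rule: it penalises
    both inaccurate means and miscalibrated predictive spread.
    """
    total = 0.0
    for y, m, s in zip(y_true, means, sigmas):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)

# Hypothetical toy targets (not from any benchmark).
y_true = [1.0, 2.0, 3.0, 4.0]

# Model A: slightly larger errors, but its stated uncertainty matches them.
means_a, sigmas_a = [1.5, 1.5, 3.5, 3.5], [0.5, 0.5, 0.5, 0.5]

# Model B: slightly smaller errors, but heavily overconfident.
means_b, sigmas_b = [1.4, 1.6, 3.4, 3.6], [0.05, 0.05, 0.05, 0.05]

rmse_a, rmse_b = rmse(y_true, means_a), rmse(y_true, means_b)
nll_a = gaussian_nll(y_true, means_a, sigmas_a)
nll_b = gaussian_nll(y_true, means_b, sigmas_b)

print(f"RMSE  A={rmse_a:.3f}  B={rmse_b:.3f}")  # lower is better
print(f"NLL   A={nll_a:.3f}  B={nll_b:.3f}")    # lower is better
```

In this toy setup model B comes out ahead on RMSE while model A comes out ahead on the log score, because the log score also charges B for its overconfident predictive spread.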