Olson, William La Cava, Patryk Orzechowski, Ryan J
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG 5
verdicts: UNVERDICTED 5
roles: background 1
polarities: background 1
citing papers explorer
- STRABLE: Benchmarking Tabular Machine Learning with Strings
  A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text-heavy tables.
- ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms
  ERBench is a new evaluation framework for symbolic regression that tests equation recovery robustness across dimensionality, sampling size, distribution, and domain using more ground-truth formulas than prior benchmarks.
- Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
  TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
- Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models
  Defines MCT as the weakest confidence an abductive explanation can guarantee and proposes an optimization-based algorithm to generate minimal explanations meeting a target confidence threshold for boosted tree classifiers.
- Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models
  A CBR system based on similarity of local explanations provides visualizations that fraud analysts at a Dutch bank found useful and easy to use for processing ML-generated fraud alerts.