pith. sign in

arxiv: 2605.24680 · v1 · pith:3OI5JPPTnew · submitted 2026-05-23 · 💻 cs.LG

Trajectory-Based Difficulty Scoring for Reliable Learning on Tabular Data

classification 💻 cs.LG
keywords datadifficultypredictiontabularerrorinstanceslearningregression
0
0 comments X
read the original abstract

Gradient-boosted trees achieve strong performance on tabular data, yet often leave a long tail of poorly predicted instances. We introduce a Trajectory-based Difficulty Score (TDS), an instance-level difficulty estimator for boosted ensembles derived from per-tree cumulative prediction trajectories. For each instance, we compute interpretable trajectory descriptors (e.g., variance, oscillation peaks, sign switches, and tail stability) and train a lightweight regression model to predict held-out loss. An empirical CDF calibrates the resulting signal into a score in $[0,1]$ that supports ranking hard cases. Across diverse tabular benchmarks and ensemble sizes, TDS exhibits strong rank correlation with error and outperforms established instance-hardness and uncertainty baselines on classification, while remaining competitive on regression. We then show how a single difficulty signal improves multiple data mining workflows: difficulty-driven active learning for label-efficient training, difficulty-thresholded selective prediction for improved risk-coverage trade-offs, and TDS-stratified (Mondrian) conformal prediction for more uniform conditional coverage. Finally, clustering high-TDS instances using SHAP attributions reveals coherent failure modes characterized by compact feature-value ranges, supporting error analysis and targeted data acquisition.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.