Is deep learning finally better than decision trees on tabular data?

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

fields

cs.LG: 7

years

2026: 5 · 2025: 2

representative citing papers

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

TabArena: A Living Benchmark for Machine Learning on Tabular Data

cs.LG · 2025-06-20 · conditional · novelty 8.0

TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.

Beyond IID: How General Are Tabular Foundation Models, Really?

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.

TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

TabPrep is a new feature engineering pipeline that targets three data patterns and improves performance of tree-based, neural, linear, and foundation models on tabular benchmarks, often more than model architecture changes.

citing papers explorer

  • STRABLE: Benchmarking Tabular Machine Learning with Strings cs.LG · 2026-05-12 · unverdicted · none · ref 69

    A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.

  • TabArena: A Living Benchmark for Machine Learning on Tabular Data cs.LG · 2025-06-20 · conditional · none · ref 50

    TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.

  • Beyond IID: How General Are Tabular Foundation Models, Really? cs.LG · 2026-06-29 · unverdicted · none · ref 94

    Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.

  • TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks cs.LG · 2026-06-01 · unverdicted · none · ref 38

    TabPrep is a new feature engineering pipeline that targets three data patterns and improves performance of tree-based, neural, linear, and foundation models on tabular benchmarks, often more than model architecture changes.

  • RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy cs.LG · 2026-05-03 · unverdicted · none · ref 61

    RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.

  • Benchmarking Optimizers for MLPs in Tabular Deep Learning cs.LG · 2026-04-16 · unverdicted · none · ref 13

    Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.

  • Multivariate Uncertainty Quantification with Tomographic Quantile Forests cs.LG · 2025-12-18 · unverdicted · none · ref 25

    Tomographic Quantile Forests estimate multivariate conditional distributions nonparametrically by training one model on directional quantiles and reconstructing via sliced Wasserstein minimization.