A Closer Look at Deep Learning on Tabular Data · 2024 · arXiv preprint arXiv:2407.00956

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

background: 3 · baseline: 1

representative citing papers

STRABLE: Benchmarking Tabular Machine Learning with Strings

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text-heavy tables.

TabArena: A Living Benchmark for Machine Learning on Tabular Data

cs.LG · 2025-06-20 · conditional · novelty 8.0

TabArena launches a dynamic, updatable benchmarking system for tabular ML that shows boosted trees remain competitive, deep learning matches them under larger budgets with ensembling, foundation models excel on small data, and cross-model ensembles advance SOTA while flagging validation overfitting.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.

BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

BoostLLM trains sequential PEFT adapters in a boosting framework with tree path inputs to improve LLM performance on few-shot tabular classification, matching or exceeding XGBoost.

Prior-Aligned Data Cleaning for Tabular Foundation Models

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

L2C2 is a deep RL framework that learns to clean tabular data by aligning it to the synthetic prior of tabular foundation models, yielding higher accuracy on some benchmarks and cross-dataset policy transfer.

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

cs.CL · 2025-09-08 · unverdicted · novelty 6.0

MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.

xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

cs.LG · 2025-08-12 · unverdicted · novelty 6.0

xRFM merges kernel-based feature learning with tree structures for scalable, interpretable tabular modeling. It reports top performance on 100 regression datasets and competitive results on 200 classification datasets against 31 baselines, including GBDTs and TabPFNv2.

Foundation Models for Credit Risk Prediction: A Game Changer?

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Tabular foundation models outperform standard methods in credit risk PD and LGD tasks, with larger gains on smaller datasets when used out-of-the-box.

Evaluating Deep Learning Models for Multiclass Classification of LIGO Gravitational-Wave Glitches

gr-qc · 2026-04-09 · unverdicted · novelty 5.0

Benchmark finds some deep learning models match gradient-boosted trees on LIGO glitch classification with fewer parameters and partially consistent feature importance across architectures.

A Data-Centric Framework for Intraoperative Fluorescence Lifetime Imaging for Glioma Surgical Guidance

cs.CV · 2026-04-28 · unverdicted · novelty 4.0

A data-centric AI framework cleans FLIm labels via confident learning and achieves 96% accuracy classifying glioma infiltration into low, moderate, and high cellularity.

