A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
Tabular data: Is deep learning all you need?
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
citing papers explorer
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
-
Benchmarking Optimizers for MLPs in Tabular Deep Learning
Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.