TabArena: A Living Benchmark for Machine Learning on Tabular Data

arxiv: 2506.16791 · v4 · pith:OHUAB6VQnew · submitted 2025-06-20 · 💻 cs.LG · cs.AI

TabArena: A Living Benchmark for Machine Learning on Tabular Data

Nick Erickson , Lennart Purucker , Andrej Tschalzev , David Holzm\"uller , Prateek Mutalik Desai , David Salinas , Frank Hutter This is my paper

Pith reviewed 2026-05-19 08:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords tabular datamachine learning benchmarkliving benchmarkgradient boosted treesdeep learningmodel ensemblingtabular foundation modelsvalidation overfitting

0 comments p. Extension

pith:OHUAB6VQ Add to your LaTeX paper

What is a Pith Number?

\usepackage{pith}
\pithnumber{OHUAB6VQ}

Prints a linked pith:OHUAB6VQ badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

TabArena launches as a continuously updated benchmark showing that ensembles across tabular models set new performance records while exposing overfitting in some deep learning approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes TabArena as the first living, continuously maintained benchmarking system for tabular machine learning to replace outdated static benchmarks. Through a large-scale initial study on curated datasets and models, it demonstrates that cross-model ensembles advance the state of the art, with deep learning methods reaching parity with gradient-boosted trees when given larger time budgets and ensembling. Foundation models perform particularly well on smaller datasets. The work also identifies that certain deep learning models appear overrepresented in top ensembles due to validation-set overfitting and provides a public leaderboard plus maintenance protocols for ongoing updates.

Core claim

TabArena is a living tabular benchmarking system that initializes a public leaderboard through manual curation of representative datasets and well-implemented models, followed by large-scale experiments. These experiments show that ensembles across different models improve upon single-model or same-model results to advance the state of the art in tabular machine learning. Gradient-boosted trees remain strong, deep learning catches up under larger time budgets when ensembled, and foundation models excel on smaller datasets, while some deep models overfit validation sets and become overrepresented in cross-model ensembles.

What carries the argument

TabArena, the continuously maintained tabular benchmarking system with curated datasets, models, validation protocols, ensembling procedures, and maintenance team that enables ongoing leaderboard updates.

If this is right

Validation method choice and ensembling of hyperparameter configurations are required to measure any model's full potential on tabular tasks.
Cross-model ensembles consistently outperform both individual models and within-model ensembles on the curated collection.
Deep learning models require additional safeguards against validation-set overfitting to participate reliably in cross-model ensembles.
Foundation models offer a practical advantage specifically on smaller tabular datasets.
The public leaderboard and maintenance protocols will allow new models and dataset fixes to be incorporated without resetting the benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Sustained maintenance of TabArena could establish a de facto standard reference for tracking long-term progress in tabular machine learning similar to established benchmarks in other domains.
Model developers should prioritize techniques that reduce validation overfitting so their contributions remain effective when combined with other models.
The observed performance patterns suggest that future work may benefit from exploring adaptive ensembles that weight tree-based, deep, and foundation models according to dataset size and characteristics.
If new data distributions emerge over time, the living nature of the benchmark will make it possible to detect and quantify shifts in which modeling approaches remain effective.

Load-bearing premise

The manually selected collection of datasets accurately represents the practical tabular problems that arise in real applications.

What would settle it

A follow-up study on an independent collection of real-world tabular datasets that produces different performance rankings, particularly reversing the observed benefits of cross-model ensembling or the relative standing of deep learning versus gradient-boosted trees, would falsify the generalizability of TabArena's initial findings.

Figures

Figures reproduced from arXiv: 2506.16791 by Andrej Tschalzev, David Holzm\"uller, David Salinas, Frank Hutter, Lennart Purucker, Nick Erickson, Prateek Mutalik Desai.

**Figure 1.** Figure 1: TabArena-v0.1 Leaderboard. We evaluate models under default parameters, tuning, and weighted ensembling [1] of hyperparameters. Since TabICL and TabPFNv2 are not applicable to all datasets, we evaluate them on subsets of the benchmark in [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Data Curation Results. The figure shows why and how many datasets we filter based on our criteria. We filter datasets that are duplicates, not from a tabular domain, not a real predictive task, tiny, have quality or license issues, and are not IID. modality, such as images, where it is unclear whether tabular machine learning is a reasonable alternative to domain-specific methods; (4) The dataset stems fro… view at source ↗

**Figure 3.** Figure 3: Characteristics of Datasets in TabArena. On the left, we show the number of datasets per task type, license, source of the dataset, and age group. On the right, we show the number of features (columns) and samples (rows), as well as the percentage of categorical features per dataset [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Leaderboard for TabPFNv2-compatible (left) and TabICL-compatible (right) datasets. For TabPFNv2, we obtain 33 datasets (≤ 10K training samples, ≤ 500 features). For TabICL, we obtain 36 classification datasets (≤ 100K, ≤ 500). Everything but the datasets is identical to [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: (Left) Pareto front of improvability and inference time. We report the median inference time per 1000 samples across all datasets. (Right) Improvability tuning trajectories. Time is shown as the tuning time with points from left to right marking ensembles of increasing numbers of random configurations (1, 2, 5, 10, 25, 50, 100, 150, 201). The trajectories are sampled 20 times from all trials and averaged. … view at source ↗

**Figure 6.** Figure 6: (Left) Model Efficiency. Median training times with cross-validation across TabPFNv2- and TabICL-compatible datasets (see [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: (Left) TabArena Ensemble. Simulated performance of an ensemble using all models in TabArena compared to the leaderboard from [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are static. Their design is not updated even if flaws are discovered, model versions are updated, or new models are released. To address this, we introduce TabArena, the first continuously maintained living tabular benchmarking system. To launch TabArena, we manually curate a representative collection of datasets and well-implemented models, conduct a large-scale benchmarking study to initialize a public leaderboard, and assemble a team of experienced maintainers. Our results highlight the influence of validation method and ensembling of hyperparameter configurations to benchmark models at their full potential. While gradient-boosted trees are still strong contenders on practical tabular datasets, we observe that deep learning methods have caught up under larger time budgets with ensembling. At the same time, foundation models excel on smaller datasets. Finally, we show that ensembles across models advance the state-of-the-art in tabular machine learning. We observe that some deep learning models are overrepresented in cross-model ensembles due to validation set overfitting, and we encourage model developers to address this issue. We launch TabArena with a public leaderboard, reproducible code, and maintenance protocols to create a living benchmark available at https://tabarena.ai.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TabArena sets up the first living benchmark for tabular ML with a public leaderboard and initial findings on ensembling and validation, but the dataset curation lacks quantitative checks that would strengthen the generalization of those findings.

read the letter

TabArena launches a living benchmark for tabular machine learning, with a public leaderboard at tabarena.ai, reproducible code, and a team of maintainers to keep it updated. That is the main new piece: moving away from static benchmarks that age quickly as models and practices change. The initial study compares trees, deep learning models, and foundation models across a curated set of datasets, and it reports that gradient boosted trees stay competitive while deep learning closes the gap with ensembling and larger time budgets. Foundation models look stronger on smaller datasets, and cross-model ensembles improve results overall, though some deep learning entries show up more often in those ensembles because they overfit the validation sets. The paper flags this overfitting issue and suggests model developers address it. The setup is practical and the observations on validation methods and ensembling are useful for anyone running tabular experiments. Reproducibility is handled well with the released code and protocols. The soft spot is the dataset collection. The authors call it representative after manual curation, but the paper does not show quantitative comparisons, such as distribution distances on size, dimensionality, imbalance, or domain coverage, against larger references like OpenML or Kaggle collections. Without that, the ensemble gains and the overfitting pattern could be tied to the specific datasets chosen rather than holding more broadly. This is a moderate concern for the central claims, not a fatal one, but it limits how far the results can be pushed. The work is aimed at researchers and practitioners who need current comparisons in tabular ML instead of outdated static tables. A reader focused on benchmark design or applied tabular methods would find concrete takeaways here. I would send it for peer review. The living benchmark idea is timely and the empirical points are worth referee scrutiny to tighten the dataset validation and confirm the scope of the findings.

Referee Report

2 major / 2 minor

Summary. The paper introduces TabArena as the first continuously maintained living benchmark for tabular machine learning. It describes the manual curation of a representative collection of datasets and well-implemented models, a large-scale benchmarking study that initializes a public leaderboard, and the assembly of a maintenance team. Key empirical observations include the continued competitiveness of gradient-boosted trees on practical datasets, the ability of deep learning methods to match or exceed them under larger time budgets with ensembling, the strength of foundation models on smaller datasets, and the finding that cross-model ensembles advance the state-of-the-art while some deep learning models appear overrepresented in such ensembles due to validation-set overfitting.

Significance. A successfully maintained living benchmark would address a genuine gap in tabular ML evaluation, where static benchmarks rapidly obsolesce. The reported findings on ensembling practices and validation overfitting could usefully guide model developers if the underlying dataset collection is shown to be representative; the provision of reproducible code and maintenance protocols is a concrete strength that supports ongoing community use.

major comments (2)

[Dataset curation section] Dataset curation section: the claim that the manually curated collection is 'representative' of practical tabular problems is not supported by any quantitative comparison (e.g., Kolmogorov-Smirnov or Earth-mover distance on instance count, feature dimensionality, class imbalance, or domain coverage) against reference corpora such as OpenML or Kaggle. This assumption is load-bearing for the generalization of the ensemble SOTA advancement and the overfitting observations.
[Benchmarking study and results sections] Benchmarking study and results sections: the abstract and text reference the influence of validation method and ensembling of hyperparameter configurations, yet the manuscript provides insufficient detail on exact dataset selection criteria, validation protocols, and statistical significance testing for the performance claims (e.g., no reported p-values or confidence intervals for cross-model ensemble gains).

minor comments (2)

[Abstract and maintenance protocols] The abstract states that 'we assemble a team of experienced maintainers' but the main text does not elaborate on their specific roles or expertise, which would strengthen the claim of long-term sustainability.
[Leaderboard presentation] Leaderboard tables or figures should explicitly report the number of runs, random seeds, and any multiple-testing corrections to allow readers to assess the reliability of the reported rankings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript introducing TabArena. The comments highlight opportunities to strengthen the justification for our dataset collection and to improve transparency in the benchmarking methodology. We address each major comment below and will incorporate revisions to enhance the paper accordingly.

read point-by-point responses

Referee: [Dataset curation section] Dataset curation section: the claim that the manually curated collection is 'representative' of practical tabular problems is not supported by any quantitative comparison (e.g., Kolmogorov-Smirnov or Earth-mover distance on instance count, feature dimensionality, class imbalance, or domain coverage) against reference corpora such as OpenML or Kaggle. This assumption is load-bearing for the generalization of the ensemble SOTA advancement and the overfitting observations.

Authors: We agree that the current manuscript does not include quantitative statistical comparisons to support the representativeness claim. Our curation process relied on expert-driven selection to achieve diversity across dataset sizes, feature dimensionalities, domains, and imbalance levels, informed by practical tabular ML use cases. To address this point directly, the revised manuscript will add a dedicated analysis (likely in an appendix or new subsection) that compares key dataset statistics from our collection against reference corpora such as OpenML and Kaggle, using metrics including Kolmogorov-Smirnov tests and Earth Mover's Distance on instance counts, feature counts, and class imbalance ratios. This addition will provide quantitative grounding for the generalization of our findings on cross-model ensembles and validation overfitting. revision: yes
Referee: [Benchmarking study and results sections] Benchmarking study and results sections: the abstract and text reference the influence of validation method and ensembling of hyperparameter configurations, yet the manuscript provides insufficient detail on exact dataset selection criteria, validation protocols, and statistical significance testing for the performance claims (e.g., no reported p-values or confidence intervals for cross-model ensemble gains).

Authors: We concur that additional methodological detail is warranted for reproducibility and to support the performance claims. The revised manuscript will expand the relevant sections to explicitly state the dataset selection criteria (including any quantitative thresholds for size, quality, and diversity), provide precise descriptions of the validation protocols (such as the specific train/validation/test splitting strategies and hyperparameter ensembling procedures), and include statistical significance testing. In particular, we will report p-values and confidence intervals for the gains achieved by cross-model ensembles relative to individual models. These clarifications will be added without altering the core empirical results. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results on external datasets

full rationale

The paper reports outcomes from large-scale empirical runs of models on a manually curated collection of tabular datasets. All central claims—ensemble gains advancing SOTA, overrepresentation of certain deep learning models due to validation-set overfitting, and relative strengths of gradient-boosted trees versus deep learning under different budgets—are direct observations from these external benchmark executions rather than quantities derived from the paper's own equations or self-referential definitions. No load-bearing step reduces by construction to fitted parameters, self-citations, or ansatzes introduced in prior work by the same authors. The manual curation and living-benchmark protocols are descriptive and do not enter the performance measurements as circular inputs. This is a standard empirical benchmarking study whose results remain falsifiable against independent dataset collections.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

The central claims rest on the representativeness of manually chosen datasets and models plus the assumption that the observed performance patterns generalize beyond the initial study.

free parameters (2)

Dataset curation choices
Manually selected collection of representative datasets used to initialize the leaderboard.
Model implementation selections
Well-implemented models chosen for the large-scale benchmarking study.

axioms (1)

domain assumption The curated datasets represent practical tabular data problems.
Invoked to support conclusions about model performance and ensembling benefits.

invented entities (1)

TabArena no independent evidence
purpose: Continuously maintained living tabular benchmarking system with public leaderboard and maintenance protocols.
New system introduced to address static benchmark limitations.

pith-pipeline@v0.9.0 · 5783 in / 1389 out tokens · 43741 ms · 2026-05-19T08:38:16.631863+00:00 · methodology

discussion (0)

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection
cs.LG 2026-02 accept novelty 8.0

MacrOData supplies three large, curated benchmark suites totaling 2,446 datasets for tabular outlier detection, complete with standardized splits, metadata, and a public leaderboard.
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
cs.LG 2026-05 unverdicted novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
cs.LG 2026-05 unverdicted novelty 7.0

TFM-Retouche is an architecture-agnostic input-space residual adapter that improves tabular foundation model accuracy on 51 datasets by learning input corrections through the frozen backbone, with an identity guard to...
Agentic-imodels: Evolving agentic interpretability tools via autoresearch
cs.AI 2026-05 unverdicted novelty 7.0

Agentic-imodels evolves scikit-learn regressors via an autoresearch loop to jointly boost predictive performance and LLM-simulatability, improving downstream agentic data science tasks by up to 73% on the BLADE benchmark.
RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy
cs.LG 2026-05 unverdicted novelty 7.0

RamanBench unifies 74 datasets into the first large-scale reproducible benchmark for ML on Raman spectra, finding tabular foundation models outperform baselines but no method generalizes across datasets.
Selecting Feature Interactions for Generalized Additive Models by Distilling Foundation Models
cs.LG 2026-04 unverdicted novelty 7.0

TabDistill distills feature interactions from tabular foundation models via post-hoc attribution and inserts them into GAMs, yielding consistent predictive gains.
OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale
cs.LG 2026-04 unverdicted novelty 7.0

OmniTabBench shows no single model family dominates tabular tasks and maps performance advantages to specific dataset properties like size and skewness.
TS-Arena -- A Live Forecast Pre-Registration Platform
cs.LG 2025-12 conditional novelty 7.0

TS-Arena is a live pre-registration platform that evaluates time series forecasts on future data streams to eliminate information leakage.
TabPFN-3: Technical Report
cs.LG 2026-05 unverdicted novelty 6.0

TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.
BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification
cs.LG 2026-05 unverdicted novelty 6.0

BoostLLM trains sequential PEFT adapters as weak learners in a residual process, using decision-tree paths as a second input view, to improve few-shot tabular classification over standard LLM fine-tuning and match or ...
BoostLLM: Boosting-inspired LLM Fine-tuning for Few-shot Tabular Classification
cs.LG 2026-05 unverdicted novelty 6.0

BoostLLM trains sequential PEFT adapters in a boosting framework with tree path inputs to improve LLM performance on few-shot tabular classification, matching or exceeding XGBoost.
TFM-Retouche: A Lightweight Input-Space Adapter for Tabular Foundation Models
cs.LG 2026-05 unverdicted novelty 6.0

TFM-Retouche is an input-space residual adapter that lifts TabICLv2 performance by 56 Elo points on 51 tabular datasets while remaining architecture-agnostic and computationally light.
Tabular foundation models for in-context prediction of molecular properties
cs.LG 2026-04 unverdicted novelty 6.0

Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
Benchmarking Optimizers for MLPs in Tabular Deep Learning
cs.LG 2026-04 unverdicted novelty 6.0

Muon optimizer outperforms AdamW across 17 tabular datasets when training MLPs under a shared protocol.
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
cs.LG 2025-11 unverdicted novelty 6.0

TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast productio...
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
cs.LG 2025-08 unverdicted novelty 6.0

xRFM merges kernel-based feature learning with tree structures for scalable, interpretable tabular modeling and reports top performance on 100 regression and competitive results on 200 classification datasets versus 3...
TabCF: Distributional Control Function Estimation with Tabular Foundation Models
stat.ML 2026-05 unverdicted novelty 5.0

TabCF is a tuning-light method using tabular foundation models for control function regression to estimate distributional causal effects such as interventional means and quantiles.
Heterogeneous Scientific Foundation Model Collaboration
cs.AI 2026-04 unverdicted novelty 5.0

Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms
cs.LG 2026-04 unverdicted novelty 4.0

TabPFN maintains high ROC-AUC and structured attention under controlled additions of irrelevant features, nonlinear correlations, and mislabeled targets in binary classification.

Reference graph

Works this paper leans on

141 extracted references · 141 canonical work pages · cited by 17 Pith papers · 2 internal anchors

[1]

Caruana, A

R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In R. Greiner, editor,Proceedings of the 21st International Conference on Machine Learning (ICML’04). Omnipress, 2004

work page 2004
[2]

Borisov, T

V . Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci. Deep neural networks and tabular data: A survey.IEEE Transactions on Neural Networks and Learning Systems, 2022

work page 2022
[3]

van Breugel and M

B. van Breugel and M. van der Schaar. Why tabular foundation models should be a research priority. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning (ICML’24), volume 251 ofProceedings of Machine Learning Research. PMLR, 2024

work page 2024
[4]

Herrmann, F

M. Herrmann, F. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix, and B. Bischl. Position: Why we must rethink empirical research in machine learning. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, 11 J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Con...

work page 2024
[5]

Hollmann, S

N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

work page 2025
[6]

Representation learning for tabular data: A comprehensive survey.arXiv preprint arXiv:2504.16109, 2025

Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, and Han-Jia Ye. Representation learning for tabular data: A comprehensive survey.arXiv preprint arXiv:2504.16109, 2025

work page arXiv 2025
[7]

Kohli, M

R. Kohli, M. Feurer, B. Bischl, K. Eggensperger, and F. Hutter. Towards quantifying the effect of datasets for benchmarking: A look at tabular machine learning. InData-centric Machine Learning Research Workshop at the International Conference on Learning Representations, 2024

work page 2024
[8]

A data-centric perspective on evaluating machine learning models for tabular data

Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, and Heiner Stuckenschmidt. A data-centric perspective on evaluating machine learning models for tabular data. InThe Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

work page 2024
[9]

Tabm: Advancing tabular deep learning with parameter-efficient ensembling.arXiv preprint arXiv:2410.24210, 2024

Yury Gorishniy, Akim Kotelnikov, and Artem Babenko. Tabm: Advancing tabular deep learning with parameter-efficient ensembling.arXiv preprint arXiv:2410.24210, 2024

work page arXiv 2024
[10]

Tabred: Ana- lyzing pitfalls and filling the gaps in tabular deep learning benchmarks.arXiv preprint arXiv:2406.19380, 2024

Ivan Rubachev, Nikolay Kartashev, Yury Gorishniy, and Artem Babenko. Tabred: Ana- lyzing pitfalls and filling the gaps in tabular deep learning benchmarks.arXiv preprint arXiv:2406.19380, 2024

work page arXiv 2024
[11]

Unreflected use of tabular data repositories can undermine research quality

Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke, Frank Hutter, Christian Bartelt, and Heiner Stuckenschmidt. Unreflected use of tabular data repositories can undermine research quality. InThe Future of Machine Learning Data Practices and Repositories at ICLR 2025, 2025

work page 2025
[12]

L. Breiman. Random forests. 45:5–32, 2001

work page 2001
[13]

Geurts, D

P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. 63(1):3–42, 2006

work page 2006
[14]

Chen and C

T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In B. Krishnapuram, M. Shah, A. Smola, C. Aggarwal, D. Shen, and R. Rastogi, editors,Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), pages 785–794, 2016

work page 2016
[15]

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Proceedings of the 31st International Conference on Advances in Neural Information Processing Systems (NeurIPS’17), 2017

work page 2017
[16]

Prokhorenkova, G

L. Prokhorenkova, G. Gusev, A. V orobev, A. Dorogush, and A. Gulin. Catboost: Unbiased boosting with categorical features. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors,Proceedings of the 31st International Conference on Advances in Neural Information Processing Systems (NeurIPS’18), page 6639–6649, 2018

work page 2018
[17]

Accurate intelligible models with pairwise interactions

Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. InProceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 623–631, 2013

work page 2013
[18]

Interpretml: A unified framework for machine learning interpretability.arXiv preprint arXiv:1909.09223, 2019

Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. Interpretml: A unified framework for machine learning interpretability.arXiv preprint arXiv:1909.09223, 2019

work page arXiv 1909
[19]

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon- tabular: Robust and accurate automl for structured data.arXiv:2003.06505 [stat.ML], 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003
[20]

Better by default: Strong pre-tuned mlps and boosted trees on tabular data.Advances in Neural Information Processing Systems, 37:26577–26658, 2024

David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by default: Strong pre-tuned mlps and boosted trees on tabular data.Advances in Neural Information Processing Systems, 37:26577–26658, 2024

work page 2024
[21]

Modern neighborhood components analysis: A deep tabular baseline two decades later.arXiv preprint arXiv:2407.03257, 2024

Han-Jia Ye, Huai-Hong Yin, and De-Chuan Zhan. Modern neighborhood components analysis: A deep tabular baseline two decades later.arXiv preprint arXiv:2407.03257, 2024

work page arXiv 2024
[22]

Tabicl: A tabular foundation model for in-context learning on large data.arXiv preprint arXiv:2502.05564, 2025

Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. Tabicl: A tabular foundation model for in-context learning on large data.arXiv preprint arXiv:2502.05564, 2025. 12

work page arXiv 2025
[23]

Tabdpt: Scaling tabular foundation models on real data.arXiv preprint arXiv:2410.18164, 2024

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C Cresswell, Keyvan Golestan, Guangwei Yu, Maksims V olkovs, and Anthony L Caterini. Tabdpt: Scaling tabular foundation models.arXiv preprint arXiv:2410.18164, 2024

work page arXiv 2024
[24]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. 12:2825–2830, 2011

work page 2011
[25]

2023 kaggle ai report, 2023

Bojan Tunguz, Dieter, Heads or Tails, Karnika Kapoor, Parul Pandey, Paul Mooney, Phil Culliton, Rob Mulla, Sanyam Bhutani, and Will Cukierski. 2023 kaggle ai report, 2023. URL https://kaggle.com/competitions/2023-kaggle-ai-report

work page 2023
[26]

Carte: Pretraining and transfer for tabular learning

Myung Jun Kim, Leo Grinsztajn, and Gael Varoquaux. Carte: Pretraining and transfer for tabular learning. InInternational Conference on Machine Learning, pages 23843–23866. PMLR, 2024

work page 2024
[27]

Neural network ensembles, cross valida- tion, and active learning

Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross valida- tion, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, edi- tors,Advances in Neural Information Processing Systems, volume 7. MIT Press,

work page
[28]

URL https://proceedings.neurips.cc/paper_files/paper/1994/file/ b8c37e33defde51cf91e1e03e51657da-Paper.pdf

work page 1994
[29]

Bischl, G

B. Bischl, G. Casalicchio, M. Feurer, F. Hutter, M. Lang, R. Mantovani, J. N. van Rijn, and J. Vanschoren. Openml benchmarking suites and the openml100.arXiv:1708.03731v1 [stat.ML], 2019

work page arXiv 2019
[30]

Bischl, G

B. Bischl, G. Casalicchio, M. Feurer, F. Hutter, M. Lang, R. Mantovani, J. van Rijn, and J. Vanschoren. OpenML benchmarking suites. In J. Vanschoren and S. Yeung, editors, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

work page 2021
[31]

Gorishniy, I

Y . Gorishniy, I. Rubachev, V . Khrulkov, and A. Babenko. Revisiting deep learning models for tabular data. In M. Ranzato, A. Beygelzimer, K. Nguyen, P. Liang, J. Vaughan, and Y . Dauphin, editors,Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS’21), pages 18932–18943, 2021

work page 2021
[32]

Tabular data: Deep learning is not all you need

Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, 2022

work page 2022
[33]

Grinsztajn, E

L. Grinsztajn, E. Oyallon, and G. Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS’22), 2022

work page 2022
[34]

McElfresh, S

D. McElfresh, S. Khandagale, J. Valverde, V . Prasad C, G. Ramakrishnan, M. Goldblum, and C. White. When do neural nets outperform boosted trees on tabular data? In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS’23),...

work page 2023
[35]

Fischer, L

S. Fischer, L. Harutyunyan, M. Feurer, and B. Bischl. Openml-ctr23 – a curated tabular regression benchmarking suite. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors,Second International Conference on Automated Machine Learning - Workshop Track, 2023

work page 2023
[36]

Gijsbers, M

P. Gijsbers, M. Bueno, S. Coors, E. LeDell, S. Poirier, J. Thomas, B. Bischl, and J. Vanschoren. Amlb: an automl benchmark. 25(101):1–65, 2024

work page 2024
[37]

A closer look at deep learning on tabular data.arXiv preprint arXiv:2407.00956, 2024

Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, and De-Chuan Zhan. A closer look at deep learning on tabular data.arXiv preprint arXiv:2407.00956, 2024

work page arXiv 2024
[38]

Tabrepo: A large scale repository of tabular model evalua- tions and its automl applications

David Salinas and Nick Erickson. Tabrepo: A large scale repository of tabular model evalua- tions and its automl applications. InAutoML Conference 2024 (ABCD Track), 2024

work page 2024
[39]

The proposed uscf rating system, its development, theory, and applications

Arpad E Elo. The proposed uscf rating system, its development, theory, and applications. Chess life, 22(8):242–247, 1967

work page 1967
[40]

Chatbot 13 arena: An open platform for evaluating llms by human preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E Gonzalez, et al. Chatbot 13 arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

work page 2024
[41]

Maier, F

J. Maier, F. Möller, and L. Purucker. Hardware aware ensemble selection for balancing predictive accuracy and cost. In M. Lindauer, K. Eggensperger, R. Garnett, J. Vanschoren, and J. Gardner, editors,Third International Conference on Automated Machine Learning - Workshop Track, 2024

work page 2024
[42]

Nagler, L

T. Nagler, L. Schneider, B. Bischl, and M. Feurer. Reshuffling resampling splits can improve generalization of hyperparameter optimization. InProceedings of the 37th International Conference on Advances in Neural Information Processing Systems (NeurIPS’24), 2024

work page 2024
[43]

Overtuning in hyperparameter opti- mization

Lennart Schneider, Bernd Bischl, and Matthias Feurer. Overtuning in hyperparameter opti- mization. InInternational Conference on Automated Machine Learning, 2025

work page 2025
[44]

Purucker, L

L. Purucker, L. Schneider, M. Anastacio, J. Beel, B. Bischl, and H. Hoos. Q(d)o-es: Population- based quality (diversity) optimisation for post hoc ensemble selection in automl. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors,Proceedings of the Second International Conference on Automated Machine Learning. Proceedings of Machine Lear...

work page 2023
[45]

Purucker and J

L. Purucker and J. Beel. Cma-es for post hoc ensembling in automl: A great success and salvageable failure. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors, Proceedings of the Second International Conference on Automated Machine Learning. Pro- ceedings of Machine Learning Research, 2023

work page 2023
[46]

Pytorch tabular: A framework for deep learning with tabular data.arXiv preprint arXiv:2104.13638, 2021

Manu Joseph. Pytorch tabular: A framework for deep learning with tabular data.arXiv preprint arXiv:2104.13638, 2021

work page arXiv 2021
[47]

Olson, W

R. Olson, W. La Cava, P. Orzechowski, R. Urbanowicz, and J. Moore. PMLB: a large benchmark suite for machine learning evaluation and comparison.BioData mining, 10:1–13, 2017

work page 2017
[48]

Pmlb v1.0: an open-source dataset collection for benchmarking machine learning methods.Bioinformatics, 38(3):878–880, 2022

Joseph D Romano, Trang T Le, William La Cava, John T Gregg, Daniel J Goldberg, Praneel Chakraborty, Natasha L Ray, Daniel Himmelstein, Weixuan Fu, and Jason H Moore. Pmlb v1.0: an open-source dataset collection for benchmarking machine learning methods.Bioinformatics, 38(3):878–880, 2022

work page 2022
[49]

Pmlbmini: A tabular classification bench- mark suite for data-scarce applications

Ricardo Knauer, Marvin Grimm, and Erik Rodner. Pmlbmini: A tabular classification bench- mark suite for data-scarce applications. InAutoML Conference 2024 (ABCD Track), 2024

work page 2024
[50]

Is deep learning finally better than decision trees on tabular data?arXiv preprint arXiv:2402.03970, 2024

Guri Zabërgja, Arlind Kadra, Christian Frey, and Josif Grabocka. Is deep learning finally better than decision trees on tabular data?arXiv preprint arXiv:2402.03970, 2024

work page arXiv 2024
[51]

Talent: A tabular analytics and learning toolbox.arXiv preprint arXiv:2407.04057, 2024

Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, and Han-Jia Ye. Talent: A tabular analytics and learning toolbox.arXiv preprint arXiv:2407.04057, 2024

work page arXiv 2024
[52]

Tabr: Tabular deep learning meets nearest neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. Tabr: Tabular deep learning meets nearest neighbors. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[53]

Rewardbench: Evaluating reward models for language modeling

Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, et al. Rewardbench: Evaluating reward models for language modeling.arXiv preprint arXiv:2403.13787, 2024

work page arXiv 2024
[54]

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, et al. Livebench: A challenging, contamination-free llm benchmark.arXiv preprint arXiv:2406.19314, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[55]

A framework for few-shot language model evaluation, September 2021

Leo Gao, Jonathan Tow, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Kyle McDonell, Niklas Muennighoff, Jason Phang, Laria Reynolds, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. A framework for few-shot language model evaluation, September 2021. URLhttps://doi.org/10.5281/zenodo.5371628

work page doi:10.5281/zenodo.5371628 2021
[56]

Open llm leaderboard v2

Clémentine Fourrier, Nathan Habib, Alina Lozovskaya, Konrad Szafer, and Thomas Wolf. Open llm leaderboard v2. https://huggingface.co/spaces/open-llm-leaderboard/ open_llm_leaderboard, 2024

work page 2024
[57]

Gift-eval: A benchmark for general time series forecasting model evaluation.arXiv preprint arXiv:2410.10393, 2024

Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Gift-eval: A benchmark for general time series forecasting model evaluation.arXiv preprint arXiv:2410.10393, 2024. 14

work page arXiv 2024
[58]

Bouthillier, P

X. Bouthillier, P. Delaunay, M. Bronzi, A. Trofimov, B. Nichyporuk, J. Szeto, N. Mohammadi Sepahvand, E. Raff, K. Madan, V . V oleti, S. Ebrahimi Kahou, V . Michalski, T. Arbel, C. Pal, G. Varoquaux, and P. Vincent. Accounting for variance in machine learning benchmarks. In A. Smola, A. Dimakis, and I. Stoica, editors,Proceedings of Machine Learning and S...

work page 2021
[59]

J. Demšar. Statistical comparisons of classifiers over multiple data sets. 7:1–30, 2006

work page 2006
[60]

S. Herbold. Autorank: A python package for automated ranking of classifiers.Journal of Open Source Software, 5(48):2173–2173, 2020

work page 2020
[61]

A closer look at tabpfn v2: Strength, limitation, and extension.arXiv preprint arXiv:2502.17361, 2025

Han-Jia Ye, Si-Yang Liu, and Wei-Lun Chao. A closer look at tabpfn v2: Strength, limitation, and extension.arXiv preprint arXiv:2502.17361, 2025

work page arXiv 2025
[62]

Feurer, J

M. Feurer, J. van Rijn, A. Kadra, P. Gijsbers, N. Mallik, S. Ravi, A. Müller, J. Vanschoren, and F. Hutter. OpenML-Python: an extensible Python API for OpenML. 22(100):1–5, 2021

work page 2021
[63]

Openml: Insights from 10 years and more than a thousand papers.Patterns, 2025

Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, et al. Openml: Insights from 10 years and more than a thousand papers.Patterns, 2025

work page 2025
[64]

Turl: Table understanding through representation learning.ACM SIGMOD Record, 51(1):33–40, 2022

Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. Turl: Table understanding through representation learning.ACM SIGMOD Record, 51(1):33–40, 2022

work page 2022
[65]

Gittables: A large-scale corpus of relational tables.Proceedings of the ACM on Management of Data, 1(1):1–17, 2023

Madelon Hulsebos, Çagatay Demiralp, and Paul Groth. Gittables: A large-scale corpus of relational tables.Proceedings of the ACM on Management of Data, 1(1):1–17, 2023

work page 2023
[66]

Knowledge discovery on rfm model using bernoulli sequence.Expert Systems with applications, 36(3):5866–5871, 2009

I-Cheng Yeh, King-Jang Yang, and Tao-Ming Ting. Knowledge discovery on rfm model using bernoulli sequence.Expert Systems with applications, 36(3):5866–5871, 2009

work page 2009
[67]

Using the adap learning algorithm to forecast the onset of diabetes mellitus

Jack W Smith, James E Everhart, William C Dickson, William C Knowler, and Robert Scott Johannes. Using the adap learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the annual symposium on computer application in medical care, page 261, 1988

work page 1988
[68]

Annealing

Unknown. Annealing. https://doi.org/10.24432/C5RW2F, 1990. UCI Machine Learning Repository

work page doi:10.24432/c5rw2f 1990
[69]

A similarity- based qsar model for predicting acute toxicity towards the fathead minnow (pimephales promelas).SAR and QSAR in Environmental Research, 26(3):217–243, 2015

Matteo Cassotti, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. A similarity- based qsar model for predicting acute toxicity towards the fathead minnow (pimephales promelas).SAR and QSAR in Environmental Research, 26(3):217–243, 2015

work page 2015
[70]

H. Hofmann. Statlog (german credit data) [dataset]. https://doi.org/10.24432/C5NC77,

work page doi:10.24432/c5nc77
[72]

Review and analysis of risk factor of maternal health in remote area using the internet of things (iot)

Marzia Ahmed, Mohammod Abul Kashem, Mostafijur Rahman, and Sabira Khatun. Review and analysis of risk factor of maternal health in remote area using the internet of things (iot). InInECCE2019: Proceedings of the 5th International Conference on Electrical, Control & Computer Engineering, Kuantan, Pahang, Malaysia, 29th July 2019, pages 357–365. Springer, 2020

work page 2019
[73]

Modeling of strength of high-performance concrete using artificial neural networks

I-C Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete research, 28(12):1797–1808, 1998

work page 1998
[74]

Quantitative structure–activity relationship models for ready biodegradability of chemicals

Kamel Mansouri, Tine Ringsted, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. Quantitative structure–activity relationship models for ready biodegradability of chemicals. Journal of chemical information and modeling, 53(4):867–878, 2013

work page 2013
[75]

Healthcare insurance expenses

Kaggle User Arunjangir245. Healthcare insurance expenses. https://www.kaggle.com/ datasets/arunjangir245/healthcare-insurance-expenses/, 2023. Kaggle dataset

work page 2023
[76]

Phishing detection based associative classification data mining.Expert Systems with Applications, 41(13):5948–5959, 2014

Neda Abdelhamid, Aladdin Ayesh, and Fadi Thabtah. Phishing detection based associative classification data mining.Expert Systems with Applications, 41(13):5948–5959, 2014

work page 2014
[77]

Fitness club dataset for ml classification

Kaggle User Ddosad. Fitness club dataset for ml classification. https://www.kaggle.com/ datasets/ddosad/datacamps-data-science-associate-certification , 2023. Kaggle dataset

work page 2023
[78]

Airfoil self-noise and prediction

Thomas F Brooks, D Stuart Pope, and Michael A Marcolini. Airfoil self-noise and prediction. Technical report, 1989. 15

work page 1989
[79]

Another dataset on used fiat 500 (1538 rows).https://www.kaggle

Kaggle User Paolocons. Another dataset on used fiat 500 (1538 rows).https://www.kaggle. com/datasets/paolocons/another-fiat-500-dataset-1538-rows , 2020. Kaggle dataset

work page 2020
[80]

Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data.GigaScience, 9(11):giaa128, 2020

Sergey E Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M Mirkes, Yuliya V Orlova, Emmanuel Barillot, Alexander N Gorban, and Andrei Zinovyev. Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data.GigaScience, 9(11):giaa128, 2020

work page 2020
[81]

Is this a good customer? https://www.kaggle.com/datasets/ podsyp/is-this-a-good-customer, 2020

Kaggle User Podsyp. Is this a good customer? https://www.kaggle.com/datasets/ podsyp/is-this-a-good-customer, 2020. Kaggle dataset

work page 2020

Showing first 80 references.

[1] [1]

Caruana, A

R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In R. Greiner, editor,Proceedings of the 21st International Conference on Machine Learning (ICML’04). Omnipress, 2004

work page 2004

[2] [2]

Borisov, T

V . Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci. Deep neural networks and tabular data: A survey.IEEE Transactions on Neural Networks and Learning Systems, 2022

work page 2022

[3] [3]

van Breugel and M

B. van Breugel and M. van der Schaar. Why tabular foundation models should be a research priority. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning (ICML’24), volume 251 ofProceedings of Machine Learning Research. PMLR, 2024

work page 2024

[4] [4]

Herrmann, F

M. Herrmann, F. Lange, K. Eggensperger, G. Casalicchio, M. Wever, M. Feurer, D. Rügamer, E. Hüllermeier, A.-L. Boulesteix, and B. Bischl. Position: Why we must rethink empirical research in machine learning. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, 11 J. Scarlett, and F. Berkenkamp, editors,Proceedings of the 41st International Con...

work page 2024

[5] [5]

Hollmann, S

N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

work page 2025

[6] [6]

Representation learning for tabular data: A comprehensive survey.arXiv preprint arXiv:2504.16109, 2025

Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, and Han-Jia Ye. Representation learning for tabular data: A comprehensive survey.arXiv preprint arXiv:2504.16109, 2025

work page arXiv 2025

[7] [7]

Kohli, M

R. Kohli, M. Feurer, B. Bischl, K. Eggensperger, and F. Hutter. Towards quantifying the effect of datasets for benchmarking: A look at tabular machine learning. InData-centric Machine Learning Research Workshop at the International Conference on Learning Representations, 2024

work page 2024

[8] [8]

A data-centric perspective on evaluating machine learning models for tabular data

Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, and Heiner Stuckenschmidt. A data-centric perspective on evaluating machine learning models for tabular data. InThe Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024

work page 2024

[9] [9]

Tabm: Advancing tabular deep learning with parameter-efficient ensembling.arXiv preprint arXiv:2410.24210, 2024

Yury Gorishniy, Akim Kotelnikov, and Artem Babenko. Tabm: Advancing tabular deep learning with parameter-efficient ensembling.arXiv preprint arXiv:2410.24210, 2024

work page arXiv 2024

[10] [10]

Tabred: Ana- lyzing pitfalls and filling the gaps in tabular deep learning benchmarks.arXiv preprint arXiv:2406.19380, 2024

Ivan Rubachev, Nikolay Kartashev, Yury Gorishniy, and Artem Babenko. Tabred: Ana- lyzing pitfalls and filling the gaps in tabular deep learning benchmarks.arXiv preprint arXiv:2406.19380, 2024

work page arXiv 2024

[11] [11]

Unreflected use of tabular data repositories can undermine research quality

Andrej Tschalzev, Lennart Purucker, Stefan Lüdtke, Frank Hutter, Christian Bartelt, and Heiner Stuckenschmidt. Unreflected use of tabular data repositories can undermine research quality. InThe Future of Machine Learning Data Practices and Repositories at ICLR 2025, 2025

work page 2025

[12] [12]

L. Breiman. Random forests. 45:5–32, 2001

work page 2001

[13] [13]

Geurts, D

P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. 63(1):3–42, 2006

work page 2006

[14] [14]

Chen and C

T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In B. Krishnapuram, M. Shah, A. Smola, C. Aggarwal, D. Shen, and R. Rastogi, editors,Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), pages 785–794, 2016

work page 2016

[15] [15]

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Proceedings of the 31st International Conference on Advances in Neural Information Processing Systems (NeurIPS’17), 2017

work page 2017

[16] [16]

Prokhorenkova, G

L. Prokhorenkova, G. Gusev, A. V orobev, A. Dorogush, and A. Gulin. Catboost: Unbiased boosting with categorical features. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors,Proceedings of the 31st International Conference on Advances in Neural Information Processing Systems (NeurIPS’18), page 6639–6649, 2018

work page 2018

[17] [17]

Accurate intelligible models with pairwise interactions

Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. InProceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 623–631, 2013

work page 2013

[18] [18]

Interpretml: A unified framework for machine learning interpretability.arXiv preprint arXiv:1909.09223, 2019

Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. Interpretml: A unified framework for machine learning interpretability.arXiv preprint arXiv:1909.09223, 2019

work page arXiv 1909

[19] [19]

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, and A. Smola. Autogluon- tabular: Robust and accurate automl for structured data.arXiv:2003.06505 [stat.ML], 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003

[20] [20]

Better by default: Strong pre-tuned mlps and boosted trees on tabular data.Advances in Neural Information Processing Systems, 37:26577–26658, 2024

David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by default: Strong pre-tuned mlps and boosted trees on tabular data.Advances in Neural Information Processing Systems, 37:26577–26658, 2024

work page 2024

[21] [21]

Modern neighborhood components analysis: A deep tabular baseline two decades later.arXiv preprint arXiv:2407.03257, 2024

Han-Jia Ye, Huai-Hong Yin, and De-Chuan Zhan. Modern neighborhood components analysis: A deep tabular baseline two decades later.arXiv preprint arXiv:2407.03257, 2024

work page arXiv 2024

[22] [22]

Tabicl: A tabular foundation model for in-context learning on large data.arXiv preprint arXiv:2502.05564, 2025

Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. Tabicl: A tabular foundation model for in-context learning on large data.arXiv preprint arXiv:2502.05564, 2025. 12

work page arXiv 2025

[23] [23]

Tabdpt: Scaling tabular foundation models on real data.arXiv preprint arXiv:2410.18164, 2024

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C Cresswell, Keyvan Golestan, Guangwei Yu, Maksims V olkovs, and Anthony L Caterini. Tabdpt: Scaling tabular foundation models.arXiv preprint arXiv:2410.18164, 2024

work page arXiv 2024

[24] [24]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. 12:2825–2830, 2011

work page 2011

[25] [25]

2023 kaggle ai report, 2023

Bojan Tunguz, Dieter, Heads or Tails, Karnika Kapoor, Parul Pandey, Paul Mooney, Phil Culliton, Rob Mulla, Sanyam Bhutani, and Will Cukierski. 2023 kaggle ai report, 2023. URL https://kaggle.com/competitions/2023-kaggle-ai-report

work page 2023

[26] [26]

Carte: Pretraining and transfer for tabular learning

Myung Jun Kim, Leo Grinsztajn, and Gael Varoquaux. Carte: Pretraining and transfer for tabular learning. InInternational Conference on Machine Learning, pages 23843–23866. PMLR, 2024

work page 2024

[27] [27]

Neural network ensembles, cross valida- tion, and active learning

Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross valida- tion, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, edi- tors,Advances in Neural Information Processing Systems, volume 7. MIT Press,

work page

[28] [28]

URL https://proceedings.neurips.cc/paper_files/paper/1994/file/ b8c37e33defde51cf91e1e03e51657da-Paper.pdf

work page 1994

[29] [29]

Bischl, G

B. Bischl, G. Casalicchio, M. Feurer, F. Hutter, M. Lang, R. Mantovani, J. N. van Rijn, and J. Vanschoren. Openml benchmarking suites and the openml100.arXiv:1708.03731v1 [stat.ML], 2019

work page arXiv 2019

[30] [30]

Bischl, G

B. Bischl, G. Casalicchio, M. Feurer, F. Hutter, M. Lang, R. Mantovani, J. van Rijn, and J. Vanschoren. OpenML benchmarking suites. In J. Vanschoren and S. Yeung, editors, Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021

work page 2021

[31] [31]

Gorishniy, I

Y . Gorishniy, I. Rubachev, V . Khrulkov, and A. Babenko. Revisiting deep learning models for tabular data. In M. Ranzato, A. Beygelzimer, K. Nguyen, P. Liang, J. Vaughan, and Y . Dauphin, editors,Proceedings of the 34th International Conference on Advances in Neural Information Processing Systems (NeurIPS’21), pages 18932–18943, 2021

work page 2021

[32] [32]

Tabular data: Deep learning is not all you need

Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need. Information Fusion, 81:84–90, 2022

work page 2022

[33] [33]

Grinsztajn, E

L. Grinsztajn, E. Oyallon, and G. Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors,Proceedings of the 35th International Conference on Advances in Neural Information Processing Systems (NeurIPS’22), 2022

work page 2022

[34] [34]

McElfresh, S

D. McElfresh, S. Khandagale, J. Valverde, V . Prasad C, G. Ramakrishnan, M. Goldblum, and C. White. When do neural nets outperform boosted trees on tabular data? In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Proceedings of the 36th International Conference on Advances in Neural Information Processing Systems (NeurIPS’23),...

work page 2023

[35] [35]

Fischer, L

S. Fischer, L. Harutyunyan, M. Feurer, and B. Bischl. Openml-ctr23 – a curated tabular regression benchmarking suite. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors,Second International Conference on Automated Machine Learning - Workshop Track, 2023

work page 2023

[36] [36]

Gijsbers, M

P. Gijsbers, M. Bueno, S. Coors, E. LeDell, S. Poirier, J. Thomas, B. Bischl, and J. Vanschoren. Amlb: an automl benchmark. 25(101):1–65, 2024

work page 2024

[37] [37]

A closer look at deep learning on tabular data.arXiv preprint arXiv:2407.00956, 2024

Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, and De-Chuan Zhan. A closer look at deep learning on tabular data.arXiv preprint arXiv:2407.00956, 2024

work page arXiv 2024

[38] [38]

Tabrepo: A large scale repository of tabular model evalua- tions and its automl applications

David Salinas and Nick Erickson. Tabrepo: A large scale repository of tabular model evalua- tions and its automl applications. InAutoML Conference 2024 (ABCD Track), 2024

work page 2024

[39] [39]

The proposed uscf rating system, its development, theory, and applications

Arpad E Elo. The proposed uscf rating system, its development, theory, and applications. Chess life, 22(8):242–247, 1967

work page 1967

[40] [40]

Chatbot 13 arena: An open platform for evaluating llms by human preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E Gonzalez, et al. Chatbot 13 arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

work page 2024

[41] [41]

Maier, F

J. Maier, F. Möller, and L. Purucker. Hardware aware ensemble selection for balancing predictive accuracy and cost. In M. Lindauer, K. Eggensperger, R. Garnett, J. Vanschoren, and J. Gardner, editors,Third International Conference on Automated Machine Learning - Workshop Track, 2024

work page 2024

[42] [42]

Nagler, L

T. Nagler, L. Schneider, B. Bischl, and M. Feurer. Reshuffling resampling splits can improve generalization of hyperparameter optimization. InProceedings of the 37th International Conference on Advances in Neural Information Processing Systems (NeurIPS’24), 2024

work page 2024

[43] [43]

Overtuning in hyperparameter opti- mization

Lennart Schneider, Bernd Bischl, and Matthias Feurer. Overtuning in hyperparameter opti- mization. InInternational Conference on Automated Machine Learning, 2025

work page 2025

[44] [44]

Purucker, L

L. Purucker, L. Schneider, M. Anastacio, J. Beel, B. Bischl, and H. Hoos. Q(d)o-es: Population- based quality (diversity) optimisation for post hoc ensemble selection in automl. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors,Proceedings of the Second International Conference on Automated Machine Learning. Proceedings of Machine Lear...

work page 2023

[45] [45]

Purucker and J

L. Purucker and J. Beel. Cma-es for post hoc ensembling in automl: A great success and salvageable failure. In A. Faust, C. White, F. Hutter, R. Garnett, and J. Gardner, editors, Proceedings of the Second International Conference on Automated Machine Learning. Pro- ceedings of Machine Learning Research, 2023

work page 2023

[46] [46]

Pytorch tabular: A framework for deep learning with tabular data.arXiv preprint arXiv:2104.13638, 2021

Manu Joseph. Pytorch tabular: A framework for deep learning with tabular data.arXiv preprint arXiv:2104.13638, 2021

work page arXiv 2021

[47] [47]

Olson, W

R. Olson, W. La Cava, P. Orzechowski, R. Urbanowicz, and J. Moore. PMLB: a large benchmark suite for machine learning evaluation and comparison.BioData mining, 10:1–13, 2017

work page 2017

[48] [48]

Pmlb v1.0: an open-source dataset collection for benchmarking machine learning methods.Bioinformatics, 38(3):878–880, 2022

Joseph D Romano, Trang T Le, William La Cava, John T Gregg, Daniel J Goldberg, Praneel Chakraborty, Natasha L Ray, Daniel Himmelstein, Weixuan Fu, and Jason H Moore. Pmlb v1.0: an open-source dataset collection for benchmarking machine learning methods.Bioinformatics, 38(3):878–880, 2022

work page 2022

[49] [49]

Pmlbmini: A tabular classification bench- mark suite for data-scarce applications

Ricardo Knauer, Marvin Grimm, and Erik Rodner. Pmlbmini: A tabular classification bench- mark suite for data-scarce applications. InAutoML Conference 2024 (ABCD Track), 2024

work page 2024

[50] [50]

Is deep learning finally better than decision trees on tabular data?arXiv preprint arXiv:2402.03970, 2024

Guri Zabërgja, Arlind Kadra, Christian Frey, and Josif Grabocka. Is deep learning finally better than decision trees on tabular data?arXiv preprint arXiv:2402.03970, 2024

work page arXiv 2024

[51] [51]

Talent: A tabular analytics and learning toolbox.arXiv preprint arXiv:2407.04057, 2024

Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, and Han-Jia Ye. Talent: A tabular analytics and learning toolbox.arXiv preprint arXiv:2407.04057, 2024

work page arXiv 2024

[52] [52]

Tabr: Tabular deep learning meets nearest neighbors

Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. Tabr: Tabular deep learning meets nearest neighbors. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[53] [53]

Rewardbench: Evaluating reward models for language modeling

Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, et al. Rewardbench: Evaluating reward models for language modeling.arXiv preprint arXiv:2403.13787, 2024

work page arXiv 2024

[54] [54]

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, et al. Livebench: A challenging, contamination-free llm benchmark.arXiv preprint arXiv:2406.19314, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[55] [55]

A framework for few-shot language model evaluation, September 2021

Leo Gao, Jonathan Tow, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Kyle McDonell, Niklas Muennighoff, Jason Phang, Laria Reynolds, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. A framework for few-shot language model evaluation, September 2021. URLhttps://doi.org/10.5281/zenodo.5371628

work page doi:10.5281/zenodo.5371628 2021

[56] [56]

Open llm leaderboard v2

Clémentine Fourrier, Nathan Habib, Alina Lozovskaya, Konrad Szafer, and Thomas Wolf. Open llm leaderboard v2. https://huggingface.co/spaces/open-llm-leaderboard/ open_llm_leaderboard, 2024

work page 2024

[57] [57]

Gift-eval: A benchmark for general time series forecasting model evaluation.arXiv preprint arXiv:2410.10393, 2024

Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, and Doyen Sahoo. Gift-eval: A benchmark for general time series forecasting model evaluation.arXiv preprint arXiv:2410.10393, 2024. 14

work page arXiv 2024

[58] [58]

Bouthillier, P

X. Bouthillier, P. Delaunay, M. Bronzi, A. Trofimov, B. Nichyporuk, J. Szeto, N. Mohammadi Sepahvand, E. Raff, K. Madan, V . V oleti, S. Ebrahimi Kahou, V . Michalski, T. Arbel, C. Pal, G. Varoquaux, and P. Vincent. Accounting for variance in machine learning benchmarks. In A. Smola, A. Dimakis, and I. Stoica, editors,Proceedings of Machine Learning and S...

work page 2021

[59] [59]

J. Demšar. Statistical comparisons of classifiers over multiple data sets. 7:1–30, 2006

work page 2006

[60] [60]

S. Herbold. Autorank: A python package for automated ranking of classifiers.Journal of Open Source Software, 5(48):2173–2173, 2020

work page 2020

[61] [61]

A closer look at tabpfn v2: Strength, limitation, and extension.arXiv preprint arXiv:2502.17361, 2025

Han-Jia Ye, Si-Yang Liu, and Wei-Lun Chao. A closer look at tabpfn v2: Strength, limitation, and extension.arXiv preprint arXiv:2502.17361, 2025

work page arXiv 2025

[62] [62]

Feurer, J

M. Feurer, J. van Rijn, A. Kadra, P. Gijsbers, N. Mallik, S. Ravi, A. Müller, J. Vanschoren, and F. Hutter. OpenML-Python: an extensible Python API for OpenML. 22(100):1–5, 2021

work page 2021

[63] [63]

Openml: Insights from 10 years and more than a thousand papers.Patterns, 2025

Bernd Bischl, Giuseppe Casalicchio, Taniya Das, Matthias Feurer, Sebastian Fischer, Pieter Gijsbers, Subhaditya Mukherjee, Andreas C Müller, László Németh, Luis Oala, et al. Openml: Insights from 10 years and more than a thousand papers.Patterns, 2025

work page 2025

[64] [64]

Turl: Table understanding through representation learning.ACM SIGMOD Record, 51(1):33–40, 2022

Xiang Deng, Huan Sun, Alyssa Lees, You Wu, and Cong Yu. Turl: Table understanding through representation learning.ACM SIGMOD Record, 51(1):33–40, 2022

work page 2022

[65] [65]

Gittables: A large-scale corpus of relational tables.Proceedings of the ACM on Management of Data, 1(1):1–17, 2023

Madelon Hulsebos, Çagatay Demiralp, and Paul Groth. Gittables: A large-scale corpus of relational tables.Proceedings of the ACM on Management of Data, 1(1):1–17, 2023

work page 2023

[66] [66]

Knowledge discovery on rfm model using bernoulli sequence.Expert Systems with applications, 36(3):5866–5871, 2009

I-Cheng Yeh, King-Jang Yang, and Tao-Ming Ting. Knowledge discovery on rfm model using bernoulli sequence.Expert Systems with applications, 36(3):5866–5871, 2009

work page 2009

[67] [67]

Using the adap learning algorithm to forecast the onset of diabetes mellitus

Jack W Smith, James E Everhart, William C Dickson, William C Knowler, and Robert Scott Johannes. Using the adap learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the annual symposium on computer application in medical care, page 261, 1988

work page 1988

[68] [68]

Annealing

Unknown. Annealing. https://doi.org/10.24432/C5RW2F, 1990. UCI Machine Learning Repository

work page doi:10.24432/c5rw2f 1990

[69] [69]

A similarity- based qsar model for predicting acute toxicity towards the fathead minnow (pimephales promelas).SAR and QSAR in Environmental Research, 26(3):217–243, 2015

Matteo Cassotti, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. A similarity- based qsar model for predicting acute toxicity towards the fathead minnow (pimephales promelas).SAR and QSAR in Environmental Research, 26(3):217–243, 2015

work page 2015

[70] [70]

H. Hofmann. Statlog (german credit data) [dataset]. https://doi.org/10.24432/C5NC77,

work page doi:10.24432/c5nc77

[71] [72]

Review and analysis of risk factor of maternal health in remote area using the internet of things (iot)

Marzia Ahmed, Mohammod Abul Kashem, Mostafijur Rahman, and Sabira Khatun. Review and analysis of risk factor of maternal health in remote area using the internet of things (iot). InInECCE2019: Proceedings of the 5th International Conference on Electrical, Control & Computer Engineering, Kuantan, Pahang, Malaysia, 29th July 2019, pages 357–365. Springer, 2020

work page 2019

[72] [73]

Modeling of strength of high-performance concrete using artificial neural networks

I-C Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete research, 28(12):1797–1808, 1998

work page 1998

[73] [74]

Quantitative structure–activity relationship models for ready biodegradability of chemicals

Kamel Mansouri, Tine Ringsted, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. Quantitative structure–activity relationship models for ready biodegradability of chemicals. Journal of chemical information and modeling, 53(4):867–878, 2013

work page 2013

[74] [75]

Healthcare insurance expenses

Kaggle User Arunjangir245. Healthcare insurance expenses. https://www.kaggle.com/ datasets/arunjangir245/healthcare-insurance-expenses/, 2023. Kaggle dataset

work page 2023

[75] [76]

Phishing detection based associative classification data mining.Expert Systems with Applications, 41(13):5948–5959, 2014

Neda Abdelhamid, Aladdin Ayesh, and Fadi Thabtah. Phishing detection based associative classification data mining.Expert Systems with Applications, 41(13):5948–5959, 2014

work page 2014

[76] [77]

Fitness club dataset for ml classification

Kaggle User Ddosad. Fitness club dataset for ml classification. https://www.kaggle.com/ datasets/ddosad/datacamps-data-science-associate-certification , 2023. Kaggle dataset

work page 2023

[77] [78]

Airfoil self-noise and prediction

Thomas F Brooks, D Stuart Pope, and Michael A Marcolini. Airfoil self-noise and prediction. Technical report, 1989. 15

work page 1989

[78] [79]

Another dataset on used fiat 500 (1538 rows).https://www.kaggle

Kaggle User Paolocons. Another dataset on used fiat 500 (1538 rows).https://www.kaggle. com/datasets/paolocons/another-fiat-500-dataset-1538-rows , 2020. Kaggle dataset

work page 2020

[79] [80]

Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data.GigaScience, 9(11):giaa128, 2020

Sergey E Golovenkin, Jonathan Bac, Alexander Chervov, Evgeny M Mirkes, Yuliya V Orlova, Emmanuel Barillot, Alexander N Gorban, and Andrei Zinovyev. Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data.GigaScience, 9(11):giaa128, 2020

work page 2020

[80] [81]

Is this a good customer? https://www.kaggle.com/datasets/ podsyp/is-this-a-good-customer, 2020

Kaggle User Podsyp. Is this a good customer? https://www.kaggle.com/datasets/ podsyp/is-this-a-good-customer, 2020. Kaggle dataset

work page 2020