pith. machine review for the scientific record.

arxiv: 2605.13986 · v1 · submitted 2026-05-13 · 💻 cs.LG · stat.ML

Recognition: 2 Lean theorem links

TabPFN-3: Technical Report


Pith reviewed 2026-05-15 06:01 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords tabular data · foundation models · synthetic pretraining · TabArena benchmark · test-time scaling · gradient boosting · relational data · time series

The pith

TabPFN-3 outperforms all tuned and ensembled models on the TabArena tabular benchmark with a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

TabPFN-3 advances tabular foundation models by pretraining exclusively on synthetic data to handle datasets up to one million rows. It delivers superior prediction accuracy compared to gradient-boosted trees and other baselines while cutting training and inference times substantially. The model also introduces test-time compute scaling, allowing further performance gains through additional computation at inference. These improvements extend to time series, relational, and tabular-text data, positioning TabPFN-3 as a versatile tool for high-value prediction problems in science and industry.

Core claim

TabPFN-3 achieves state-of-the-art performance on tabular prediction tasks by scaling a transformer-based foundation model pretrained on synthetic data. On the TabArena benchmark, a single forward pass surpasses all other models including tuned and ensembled baselines, while dominating the speed-performance trade-off. The TabPFN-3-Plus variant, leveraging test-time compute, further improves results by over 200 Elo points overall and 420 on large subsets, outperforming AutoGluon while being ten times faster. The approach extends to new domains, with new state-of-the-art results on relational benchmarks and tabular-text tasks, all while running up to twenty times faster than its predecessor and scaling to one million rows on a single H100 GPU via a reduced KV cache and row chunking.

What carries the argument

Synthetic pretraining combined with test-time compute scaling in a transformer architecture for tabular data.
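The mechanism this rests on is in-context prediction: fit() merely stores the labeled table as context, and predict() is a single forward pass conditioned on that context, with no gradient updates. The sketch below is a toy numpy stand-in (softmax-weighted attention over context rows), not the authors' transformer; the class and its interface are illustrative assumptions only.

```python
import numpy as np

class ToyInContextPredictor:
    """Toy stand-in for PFN-style in-context prediction (NOT the
    TabPFN-3 architecture): fit() only stores the labeled table as
    context; predict() is a single pass that attends over the stored
    rows. A real PFN replaces this kernel with a pretrained transformer."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature

    def fit(self, X, y):
        # No gradient updates: the "training set" simply becomes context.
        self.X_ctx = np.asarray(X, dtype=float)
        self.y_ctx = np.asarray(y)
        self.classes_ = np.unique(self.y_ctx)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Attention-like weights: softmax over negative squared distances.
        d2 = ((X[:, None, :] - self.X_ctx[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / self.temperature)
        w /= w.sum(axis=1, keepdims=True)
        onehot = self.y_ctx[None, :, None] == self.classes_[None, None, :]
        return (w[:, :, None] * onehot).sum(axis=1)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

The point of the toy is the shape of the contract: all "training" cost is a table copy, and all intelligence lives in the forward pass.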

If this is right

  • Beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M rows and 200 features.
  • Ranks first on datasets with many classes.
  • Achieves new SOTA on RelBenchV1 for relational data.
  • Provides SOTA on TabSTAR for tabular-text data via TabPFN-3-Plus.
  • Enables up to 120x faster SHAP-value computation and ranks 2nd on fev-bench via TabPFN-TS-3.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The reliance on synthetic data may allow the model to avoid privacy issues associated with real data training.
  • Test-time scaling opens the door to further performance improvements by allocating more compute during prediction without retraining.
  • The speed gains could make foundation models practical for real-time tabular applications where traditional methods were too slow.
  • Integration improvements suggest easier adoption in existing pipelines for time-series and interpretability tasks.
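The test-time-scaling point can be made concrete. The report excerpts here do not spell out the mechanism, but one plausible reading, consistent with the N1/N2/N4 estimator counts in Figure 2, is that extra inference compute buys more randomized views of the same frozen model, averaged at the end. `predict_proba` below is a hypothetical stand-in for any fitted in-context model, not a TabPFN API:

```python
import numpy as np

def scale_test_time_compute(predict_proba, X_train, y_train, X_test,
                            n_estimators=4, seed=0):
    """Hedged sketch of one plausible test-time-scaling mechanism:
    run the same frozen model over n_estimators randomized views of
    the data (here: shuffled feature order) and average the predicted
    probabilities. More inference compute, no retraining."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_estimators):
        perm = rng.permutation(X_train.shape[1])  # one randomized view
        probs.append(predict_proba(X_train[:, perm], y_train, X_test[:, perm]))
    return np.mean(probs, axis=0)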

Load-bearing premise

The synthetic data used for pretraining sufficiently covers the distribution of real-world tabular datasets to enable strong generalization without any real-data training.
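A minimal sketch of what "synthetic data from an SCM prior" means in practice: sample a random DAG, propagate noise through random causal mechanisms, and carve one node out as the label. The actual generator in the report is far richer (mixed feature types, missingness, the combiner mechanisms of Figure 24); every name below is illustrative.

```python
import numpy as np

def sample_scm_dataset(n_rows=256, n_nodes=8, edge_prob=0.4, seed=0):
    """Minimal sketch of drawing one synthetic classification dataset
    from a structural-causal-model prior. Nodes are topologically
    ordered, so a strictly lower-triangular edge mask yields a DAG."""
    rng = np.random.default_rng(seed)
    edges = np.tril(rng.random((n_nodes, n_nodes)) < edge_prob, k=-1)
    acts = [np.tanh, np.sin, lambda v: np.maximum(v, 0.0)]
    cols = []
    for j in range(n_nodes):
        val = rng.normal(size=n_rows)              # exogenous noise
        for p in np.nonzero(edges[j])[0]:          # causal mechanisms
            f = acts[rng.integers(len(acts))]
            val = val + rng.normal() * f(cols[p])
        cols.append(val)
    data = np.stack(cols, axis=1)
    target = rng.integers(1, n_nodes)              # a non-root node as label
    y = (data[:, target] > np.median(data[:, target])).astype(int)
    X = np.delete(data, target, axis=1)
    return X, y
```

Pretraining then amounts to drawing millions of such (X, y) tables and teaching the transformer to predict held-out rows in context.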

What would settle it

Evaluating TabPFN-3 on a large collection of previously unseen real-world tabular datasets collected after the model's release to check if the performance margins hold.
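Such a check would likely be scored the way the report scores its TALENT slices: per-dataset ranks (best = 1, ties averaged), averaged across datasets, with non-parametric bootstrap confidence intervals. A sketch, assuming a higher-is-better scores matrix of shape (datasets, methods):

```python
import numpy as np
from scipy.stats import rankdata

def mean_rank_with_ci(scores, n_boot=1000, seed=0):
    """Sketch of a rank-aggregation protocol: rank methods within each
    dataset (best score -> rank 1, ties get average ranks), average
    across datasets, and bootstrap over datasets for 95% CIs.
    `scores` is (n_datasets, n_methods), higher = better."""
    rng = np.random.default_rng(seed)
    ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)
    mean = ranks.mean(axis=0)
    idx = rng.integers(0, len(ranks), size=(n_boot, len(ranks)))
    lo, hi = np.percentile(ranks[idx].mean(axis=1), [2.5, 97.5], axis=0)
    return mean, lo, hi
```

Stable mean ranks on genuinely post-release datasets would be strong evidence against benchmark-specific calibration of the prior.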

Figures

Figures reproduced from arXiv: 2605.13986 by Adrian Hayler, Alan Arazi, Anurag Garg, Benjamin Jäger, Bernhard Schölkopf, Brendan Roof, Clara Cornu, David Salinas, Diana Kriuchkova, Dominik Safaric, Eliott Kalfon, Felix Birkel, Frank Hutter, Georg Grab, Jake Robertson, Jan Hendrik Metzen, Jerry Chen, Julien Siems, Klemens Flöge, Kursat Kaya, Lennart Purucker, Léo Grinsztajn, Lilly Charlotte Wehrhahn, Lydia Sidhoum, Madelon Hulsebos, Magnus Bühler, Marie Salmon, Mihir Manium, Nick Erickson, Noah Hollmann, Oscar Key, Philipp Jund, Philipp Singer, Samuel Müller, Sauraj Gambhir, Shi Bin (Liam) Hoo, Simon Bing, Simone Alessi, Siyuan Guo, Vladyslav Moroshan, Yann LeCun.

Figure 1: Performance on the TabArena benchmark [1], largest data subset (10k–100k samples). TabPFN-3 outperforms any other model in a forward pass. TabPFN-3-Plus (Thinking) is dramatically better yet, outperforming AutoGluon 1.5 extreme [2], a complex ensemble of models tuned for 4 hours, while being 10x faster.
Figure 2: TabPFN-3 dominates the Pareto frontier on the largest datasets in TabArena (10k–100k rows). N1, N2, and N4 are model versions with 1, 2, and 4 estimators. Improvability measures how much worse a model is than the best per-dataset model. See Appendix E.2.1 and E.2.3 for details.
Figure 4: Evolution and performance of the TabPFN model family.
Figure 5: Architecture of TabPFN-3, adapted from the TabICLv2 architecture.
Figure 6: Chunking flattens the peak memory without impacting the time per call.
Figure 7: KV-cache on H100 for a single estimator without preprocessing: OOM frontier with chunking and KV-cache (a) and cached-predict latency vs. uncached paths (b). This achieves a KV-cache size of 7 GiB per estimator for 1M-row datasets, making TabPFN-3's default 8 estimators usable on common GPUs even for the largest supported datasets.
Figure 8: TabPFN-3's KV-cached predict allows for one to three orders of magnitude speedup. Results are for a single estimator without preprocessing on an H100, for n_features ∈ {10, 100} and n_test = 100.
Figure 9: Schematic visualization of our SCM prior.
Figure 10: TabPFN-3 performance on the standard TabArena benchmark.
Figure 11: Pareto frontier on TabArena: trade-off between prediction quality and total training + inference cost. N1, N2, and N4 are TabPFN-3 versions with 1, 2, and 4 estimators. Improvability measures how much a model would improve by switching to the best model on each individual dataset; see Appendix E.2.1.
Figure 13: Average rank on the TALENT benchmark, using the TabICLv2 evaluation protocol.
Figure 14: Performance over the TabSTAR Text-Tabular Collection.
Figure 15: TabPFN-3 achieves state-of-the-art performance on the large-rows benchmark (up to …).
Figure 16: TabPFN-3 tops the normalized scaling curves for ROC-AUC OvR classification and …
Figure 17: On the synthetic many-class benchmark TabPFN-3 achieves a normalized ROC-AUC …
Figure 18: TabPFN scales well to high-dimensional, low-sample classification. Normalized ROC-AUC on the many-features benchmark slice, consisting of 6 classification datasets with 102–322 samples and 1,117–22,215 features. This high-dimensional, low-sample regime is particularly challenging for standard tree-based baselines. Increasing the number of TabPFN-3 estimators improves feature-space coverage and substantial…
Figure 20: Qualitative forecast comparison on a fev-bench task (…).
Figure 21: TabPFN-3 tops performance on RelBenchV1 among foundation models.
Figure 22: TabPFN-3 extracts semantically meaningful row embeddings.
Figure 23: Visualization of directed acyclic graphs underlying our SCM prior.
Figure 24: Visualization of functional relationships generated by the new combiner mechanisms in …
Figure 25: Example classification dataset generated from the prior.
Figure 26: Example demonstrating the extrapolation capabilities of TabPFN-3 (using our …).
Figure 27: TabPFN-3 as a T/X/S-Learner. Used as a T/S-Learner, TabPFN-3 achieves strong performance in terms of QINI score (↑) in uplift modeling on the scikit-uplift benchmark. We report worsened performance in terms of PEHE (↓) on the RealCause benchmark compared to the previous version.
Figure 28: Average rank on the TALENT benchmark broken down by task type (regression, …).
Figure 29: Average rank on the many-classes TALENT slice (4 datasets, all 100 classes).
Figure 30: Average rank on the large-rows (100k–1M rows) TALENT slice.
Figure 31: Performance on the classification tasks of the TabSTAR text-tabular collection.
Figure 32: Performance on the regression tasks of the TabSTAR text-tabular collection.
Figure 33: Critical difference diagram for ROC-AUC on the large-scale classification benchmark.
Figure 34: Critical difference diagram for RMSE on the large-scale regression benchmark.
Figure 35: Critical difference diagram for pinball loss on our quantile regression benchmark.
Figure 36: Critical difference diagram for ROC AUC on the synthetic many-class benchmark (up …).
Figure 37: Forward-pass inference speed-ups on the TabPFN-3 architecture.
Figure 38: Efficiency gains for SHAP-value computation with KV-cache across training table …
Figure 39: solar_with_weather_15T — 15-minute solar generation with weather covariates. Panel scores: TabPFN-TS-3 MASE=0.20, CRPS=0.03; TabICL (v2.0.3) MASE=0.63, CRPS=0.09; Chronos-2 MASE=0.49, CRPS=0.07; TiRex MASE=0.55, CRPS=0.08.
Figure 40: rossmann_1W — weekly Rossmann store sales (series 1).
Figure 41: rohlik_orders_1D — daily online-grocery orders. Panel scores: TabPFN-TS-3 MASE=0.59, CRPS=0.03; TabICL (v2.0.3) MASE=0.61, CRPS=0.03; Chronos-2 MASE=0.58, CRPS=0.03; TiRex MASE=0.62, CRPS=0.03.
Figure 42: LOOP_SEATTLE_1H — hourly Seattle freeway loop-detector counts.
Figure 43: ETT_1H — hourly Electricity Transformer Temperature. Panel scores: TabPFN-TS-3 MASE=0.65, CRPS=0.05; TabICL (v2.0.3) MASE=0.89, CRPS=0.07; Chronos-2 MASE=0.76, CRPS=0.06; TiRex MASE=0.73, CRPS=0.06.
Figure 44: entsoe_1H — hourly ENTSO-E European electricity load.
Figure 45: Pairwise skill-score comparison on fev-bench (100 tasks) under SQL (left) and MASE (right). Cell (i, j) is the skill score of model i relative to model j, with 95% confidence intervals from bootstrapped resampling; cells whose interval overlaps zero are shown in italics. Rows and columns are ordered by overall skill score.
Original abstract

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data. On the standard tabular benchmark TabArena, a forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and pareto-dominates the speed/performance frontier. On more diverse datasets, TabPFN-3 ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering TabPFN-3-Plus (Thinking) exploits this to beat all non-TabPFN models by over 200 Elo on TabArena, rising to 420 Elo on the largest data subset, and outperforms AutoGluon 1.5 extreme while being 10x faster, without using LLMs, real data, internet search or any other model besides TabPFN. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on relational data (new SOTA foundation model on RelBenchV1) and tabular-text data (SOTA on TabSTAR via TabPFN-3-Plus); and improves existing integrations: a specialized checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation is up to 120x faster. TabPFN-3 achieves this performance while being up to 20x faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking scale to 1M rows on one H100 with fast inference speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces TabPFN-3, a scaled tabular foundation model pretrained exclusively on synthetic data drawn from the authors' prior. It claims a single forward pass outperforms all tuned and ensembled baselines on the TabArena benchmark while Pareto-dominating the speed-performance frontier; TabPFN-3-Plus (Thinking) achieves >200 Elo gains (up to 420 on large subsets) over non-TabPFN models, beats AutoGluon 1.5 extreme at 10x speed, and delivers new SOTA results on RelBenchV1 (relational) and TabSTAR (tabular-text), plus second place on fev-bench (time series). Additional engineering claims include up to 20x speedups over TabPFN-2.5, 120x faster SHAP, and scaling to 1M rows on one H100 via a reduced KV cache and row chunking.

Significance. If the empirical results hold after rigorous validation, the work would represent a meaningful advance for tabular foundation models by demonstrating that synthetic pretraining plus test-time compute scaling can surpass heavily tuned gradient-boosted trees and AutoML systems on public benchmarks while delivering substantial inference speed-ups and cross-modal extensions. The reported ability to handle up to 1M rows without real-data fine-tuning would be practically significant for industry deployments.

major comments (2)
  1. [Abstract] Abstract and Experimental Results: The headline claims of 'significant margin' outperformance on TabArena and 200–420 Elo gains for TabPFN-3-Plus are presented without error bars, statistical significance tests, exact train/test splits, or ablation studies on the synthetic generator, rendering the margins unverifiable from the provided information.
  2. [Pretraining Methodology] Pretraining section: The statement that pretraining uses 'exclusively synthetic data from our prior' lacks any decontamination protocol, held-out real-data validation set, or ablation freezing the generator before benchmark exposure; without these, the reported superiority over tuned baselines on TabArena risks circularity if the generator's feature/label/missingness distributions were calibrated to the evaluation suites.
minor comments (1)
  1. [Abstract] Clarify the precise mechanism of 'test-time compute scaling' in TabPFN-3-Plus (Thinking) and confirm it uses only TabPFN internals with no external models or search.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our technical report. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of our results without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract and Experimental Results: The headline claims of 'significant margin' outperformance on TabArena and 200–420 Elo gains for TabPFN-3-Plus are presented without error bars, statistical significance tests, exact train/test splits, or ablation studies on the synthetic generator, rendering the margins unverifiable from the provided information.

    Authors: We agree that additional statistical details would improve verifiability. In the revised manuscript we have added error bars (standard deviation over 10 independent runs with different random seeds) to all TabArena metrics, included Wilcoxon signed-rank tests confirming significance (p < 0.01 for the headline margins), explicitly cited the exact TabArena train/test splits per the benchmark protocol, and inserted an appendix ablation varying the synthetic generator's key hyperparameters while measuring downstream Elo impact. revision: yes
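The Wilcoxon signed-rank test proposed here is straightforward to script. The per-dataset AUC scores below are fabricated placeholders for illustration, not numbers from the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hedged sketch of the paired significance test the simulated rebuttal
# proposes. Scores are made up: 30 datasets, baseline vs. candidate model.
rng = np.random.default_rng(0)
baseline = rng.uniform(0.70, 0.95, size=30)                    # per-dataset AUC
candidate = np.clip(baseline + rng.normal(0.03, 0.01, size=30), 0.0, 1.0)

# One-sided Wilcoxon signed-rank test on the paired per-dataset differences.
stat, p = wilcoxon(candidate, baseline, alternative="greater")
significant = p < 0.01
```

Note that a paired rank test only supports "better on these datasets"; it cannot by itself certify the Elo margins quoted in the abstract.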

  2. Referee: [Pretraining Methodology] Pretraining section: The statement that pretraining uses 'exclusively synthetic data from our prior' lacks any decontamination protocol, held-out real-data validation set, or ablation freezing the generator before benchmark exposure; without these, the reported superiority over tuned baselines on TabArena risks circularity if the generator's feature/label/missingness distributions were calibrated to the evaluation suites.

    Authors: The generator was developed and frozen in prior work before TabArena and the other cited benchmarks existed, so no direct calibration occurred. We have added a new subsection to the Pretraining section that (1) describes the decontamination protocol (Kolmogorov-Smirnov tests on held-out real tabular samples to confirm distribution mismatch), (2) references a held-out real-data validation set used during generator development, and (3) reports an ablation in which the generator parameters are frozen prior to any benchmark exposure, showing that TabPFN-3 performance remains essentially unchanged. revision: yes
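The per-feature Kolmogorov-Smirnov check described here can be sketched directly; `contamination_flags` is a hypothetical helper, not the authors' actual protocol:

```python
import numpy as np
from scipy.stats import ks_2samp

def contamination_flags(synthetic, real):
    """Hedged sketch of a per-column decontamination check: a two-sample
    Kolmogorov-Smirnov test per feature. A LARGE p-value means the
    synthetic column is statistically indistinguishable from the real
    one, which would be a red flag for a generator claimed to be
    benchmark-agnostic; tiny p-values confirm distribution mismatch."""
    return [ks_2samp(synthetic[:, j], real[:, j]).pvalue
            for j in range(synthetic.shape[1])]
```

A marginal test like this is only a first filter: matching joint feature/label/missingness structure would need multivariate checks as well.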

Circularity Check

0 steps flagged

Minor self-citation to prior synthetic generator; benchmark results remain externally validated

full rationale

The paper reports empirical wins on independent public benchmarks (TabArena, RelBenchV1, fev-bench) after pretraining exclusively on synthetic data referenced to prior work. No equations, fitted parameters, or derivations reduce the claimed Elo margins, speedups, or Pareto dominance to quantities defined by the evaluation suites themselves. The single self-reference to 'our prior' for the data generator is present but does not carry the load-bearing justification for the performance numbers, which rest on external test sets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claims rest on the unstated assumption that synthetic data from prior TabPFN versions is sufficiently representative of real tabular distributions.

pith-pipeline@v0.9.0 · 5882 in / 1368 out tokens · 73855 ms · 2026-05-15T06:01:00.864685+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 9 internal anchors

  1. [1]

    Tabarena: A living benchmark for machine learning on tabular data.arXiv preprint arXiv:2506.16791, 2025

    Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, Frank Hutter, et al. Tabarena: A living benchmark for machine learning on tabular data.arXiv preprint arXiv:2506.16791, 2025

  2. [2]

    Autogluon-tabular: Robust and accurate automl for structured data.arXiv preprint arXiv:2003.06505, 2020

    Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. Autogluon-tabular: Robust and accurate automl for structured data.arXiv preprint arXiv:2003.06505, 2020

  3. [3]

    A targeted real-time early warning score (trewscore) for septic shock.Science translational medicine, 7(299):299ra122–299ra122, 2015

    Katharine E Henry, David N Hager, Peter J Pronovost, and Suchi Saria. A targeted real-time early warning score (trewscore) for septic shock.Science translational medicine, 7(299):299ra122–299ra122, 2015

  4. [4]

    Mimic-iii, a freely accessible critical care database.Scientific data, 3(1):1–9, 2016

    Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database.Scientific data, 3(1):1–9, 2016

  5. [5]

    Deep neural networks detect suicide risk from textual facebook posts.Scientific reports, 10(1):16685, 2020

    Yaakov Ophir, Refael Tikochinski, Christa SC Asterhan, Itay Sisso, and Roi Reichart. Deep neural networks detect suicide risk from textual facebook posts.Scientific reports, 10(1):16685, 2020

  6. [6]

    Credit scoring, statistical techniques and evaluation criteria: a review of the literature.Intelligent systems in accounting, finance and management, 18(2-3):59–88, 2011

    Hussein A Abdou and John Pointon. Credit scoring, statistical techniques and evaluation criteria: a review of the literature.Intelligent systems in accounting, finance and management, 18(2-3):59–88, 2011

  7. [7]

    Consumer credit-risk models via machine- learning algorithms.Journal of Banking & Finance, 34(11):2767–2787, 2010

    Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine- learning algorithms.Journal of Banking & Finance, 34(11):2767–2787, 2010

  8. [8]

    Benchmarking state-of- the-art classification algorithms for credit scoring: An update of research.European journal of operational research, 247(1):124–136, 2015

    Stefan Lessmann, Bart Baesens, Hsin-Vonn Seow, and Lyn C Thomas. Benchmarking state-of- the-art classification algorithms for credit scoring: An update of research.European journal of operational research, 247(1):124–136, 2015

  9. [9]

    A systematic literature review of machine learning methods applied to predictive maintenance.Computers & industrial engineering, 137:106024, 2019

    Thyago P Carvalho, Fabrízzio AAMN Soares, Roberto Vita, Roberto da P Francisco, João P Basto, and Symone GS Alcalá. A systematic literature review of machine learning methods applied to predictive maintenance.Computers & industrial engineering, 137:106024, 2019

  10. [10]

    Machine learning and reasoning for predictive maintenance in industry 4.0: Current status and challenges.Computers in industry, 123:103298, 2020

    Jovani Dalzochio, Rafael Kunst, Edison Pignaton, Alecio Binotto, Srijnan Sanyal, Jose Favilla, and Jorge Barbosa. Machine learning and reasoning for predictive maintenance in industry 4.0: Current status and challenges.Computers in industry, 123:103298, 2020

  11. [11]

    Searching for exotic particles in high-energy physics with deep learning.Nature communications, 5(1):4308, 2014

    Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning.Nature communications, 5(1):4308, 2014

  12. [12]

    Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm.npj Computational Materials, 6(1):138, 2020

    Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm.npj Computational Materials, 6(1):138, 2020

  13. [13]

    Tabular data: Deep learning is not all you need.Information fusion, 81:84–90, 2022

    Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need.Information fusion, 81:84–90, 2022

  14. [14]

    Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35: 507–520, 2022

    Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35: 507–520, 2022

  15. [15]

    Tabrepo: A large scale repository of tabular model evaluations and its automl applications

David Salinas and Nick Erickson. Tabrepo: A large scale repository of tabular model evaluations and its automl applications. In AutoML Conference 2024 (ABCD Track), 2024

  16. [16]

    TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

    Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. Tabpfn: A transformer that solves small tabular classification problems in a second.arXiv preprint arXiv:2207.01848, 2022

  17. [17]

    Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025. ISSN 1476-4687. doi: 10.1038/s41586-024-08328-6. URL https://doi.org/10.1038/s41586-024-08328-6

  18. [18]

    Tabpfn-2.5: Advancing the state of the art in tabular foundation models, 2025

Léo Grinsztajn, Klemens Flöge, Oscar Key, Brendan Roof, Felix Birkel, Phil Jund, Benjamin Jäger, Adrian Hayler, Dominik Safaric, Felix Jablonski, Simone Alessi, Mihir Manium, Rosen Yu, Anurag Garg, Jake Robertson, Shi Bin (Liam) Hoo, Vladyslav Moroshan, Magnus Bühler, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Sauraj Gambhi...

  19. [19]

    The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features

Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features. In NeurIPS Workshop on Time Series in the Age of Large Models, 2024

  20. [20]

    Do-pfn: In-context learning for causal effect estimation.arXiv preprint arXiv:2506.06039, 2025

    Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, and Bernhard Schölkopf. Do-pfn: In-context learning for causal effect estimation.arXiv preprint arXiv:2506.06039, 2025

  21. [21]

Causalpfn: Amortized causal effect estimation via in-context learning

    Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, and Rahul G. Krishnan. Causalpfn: Amortized causal effect estimation via in-context learning,

  22. [22]

URL https://arxiv.org/abs/2506.07918

  23. [23]

    Foundation models for causal inference via prior-data fitted networks, 2025

Yuchen Ma, Dennis Frauen, Emil Javurek, and Stefan Feuerriegel. Foundation models for causal inference via prior-data fitted networks, 2025. URL https://arxiv.org/abs/2506.10914

  24. [24]

Git-bo: High-dimensional Bayesian optimization with tabular foundation models. arXiv preprint arXiv:2505.20685, 2025

    Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. Git-bo: High-dimensional Bayesian optimization with tabular foundation models. arXiv preprint arXiv:2505.20685, 2025. doi: 10.48550/arXiv.2505.20685. URL https://arxiv.org/abs/2505.20685

  25. [25]

    Bringing graphs to the table: Zero-shot node classification via tabular foundation models.arXiv preprint arXiv:2509.07143, 2025

Adrian Hayler, Xingyue Huang, İsmail İlkan Ceylan, Michael Bronstein, and Ben Finkelshtein. Bringing graphs to the table: Zero-shot node classification via tabular foundation models. arXiv preprint arXiv:2509.07143, 2025. doi: 10.48550/arXiv.2509.07143. URL https://arxiv.org/abs/2509.07143

  26. [26]

    Turning tabular foundation models into graph foundation models, 2025

    Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, and Liudmila Prokhorenkova. Turning tabular foundation models into graph foundation models, 2025. URLhttps://arxiv.org/ abs/2508.20906

  27. [27]

    Interpretable machine learning for tabpfn

    David Rundel, Julius Kobialka, Constantin von Crailsheim, Matthias Feurer, Thomas Nagler, and David Rügamer. Interpretable machine learning for tabpfn. InWorld Conference on Explainable Artificial Intelligence, pages 465–476. Springer, 2024

  28. [28]

    A closer look at tabpfn v2: Understanding its strengths and extending its capabilities.Advances in Neural Information Processing Systems, 38: 135605–135637, 2026

    Han-Jia Ye, Si-Yang Liu, and Wei-Lun Harry Chao. A closer look at tabpfn v2: Understanding its strengths and extending its capabilities.Advances in Neural Information Processing Systems, 38: 135605–135637, 2026

  29. [29]

    Gradient free deep reinforcement learning with tabpfn.arXiv preprint arXiv:2509.11259, 2025

    David Schiff, Ofir Lindenbaum, and Yonathan Efroni. Gradient free deep reinforcement learning with tabpfn.arXiv preprint arXiv:2509.11259, 2025. doi: 10.48550/arXiv.2509.11259. URL https://arxiv.org/abs/2509.11259

  30. [30]

    TabICL: A tabular foundation model for in-context learning on large data

    Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=0VvD1PmNzM

  31. [31]

    TabICLv2: A better, faster, scalable, and open tabular foundation model

    Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. InInternational Conference on Machine Learning, 2026

  32. [32]

    Nakanishi

    Ken M. Nakanishi. Scalable-softmax is superior for attention, 2025. URLhttps://arxiv.org/ abs/2501.19399

  33. [33]

    Better by default: Strong pre-tuned mlps and boosted trees on tabular data

    David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by default: Strong pre-tuned mlps and boosted trees on tabular data. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ul- rich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Process- ing Systems 38: Annual Conference on Neural Information Proce...

  34. [34]

    FlashAttention-3: fast and accurate attention with asynchrony and low-precision

    Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao. FlashAttention-3: fast and accurate attention with asynchrony and low-precision. InProceed- ings of the 38th International Conference on Neural Information Processing Systems, NeurIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

  35. [35]

    shapiq: Shapley interactions for machine learning

    Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, and Eyke Hüllermeier. shapiq: Shapley interactions for machine learning. InThe Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. URL https://openreview.net/forum?id=knxGmi6SJi

  36. [36]

    Philip Boeken and Joris M. Mooij. Dynamic structural causal models, 2024. URLhttps://arxiv. org/abs/2406.01161. UAI 2024 Workshop on Causal Inference for Time Series Data

  37. [37]

    Talent: A tabular analytics and learning toolbox.Journal of Machine Learning Research, 26 (226):1–16, 2025

    Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Huai-Hong Yin, Tao Zhou, Jun-Peng Jiang, and Han-Jia Ye. Talent: A tabular analytics and learning toolbox.Journal of Machine Learning Research, 26 (226):1–16, 2025. URLhttp://jmlr.org/papers/v26/25-0512.html

  38. [38]

    TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

    Alan Arazi, Eilam Shapira, and Roi Reichart. TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neural Information Processing Systems, volume 38, pages 172108–172161. Curran Associates, Inc., 2025. URLhttps://proceedings.neurips.cc/p...

  39. [39]

    Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

    Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

  40. [40]

    Lightgbm: A highly efficient gradient boosting decision tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qi- wei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish- wanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran Associates, ...

  41. [41]

    Xgboost: A scalable tree boosting system

    Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016

  42. [42]

    Tabm: Advancing tabular deep learning with parameter-efficient ensembling

    Yury Gorishniy, Akim Kotelnikov, and Artem Babenko. Tabm: Advancing tabular deep learning with parameter-efficient ensembling. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=Sd4wYYOhmY

  43. [43]

    Revisiting nearest neighbor for tabular data: A deep tabular baseline two decades later

    Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan, and Wei-Lun Chao. Revisiting nearest neighbor for tabular data: A deep tabular baseline two decades later. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=JytL2MrlLT

  44. [44]

    xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

    Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, and Mikhail Belkin. xrfm: Accurate, scalable, and interpretable feature learning models for tabular data, 2025. URLhttps: //arxiv.org/abs/2508.10053

  45. [45]

    Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L

    Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, and Maksims Volkovs. Tabdpt: Scaling tabular foundation models on real data, 2025. URLhttps://arxiv.org/abs/2410.18164

  46. [46]

    Limix: Unleashing structured-data modeling capability for generalist intelligence.arXiv preprint arXiv:2509.03505, 2025

    Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Ha...

  47. [47]

    Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W

    Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, and Bernie Wang. Mitra: Mixed synthetic priors for enhancing tabular foundation models. InThe Thirty-ninth Annual Conference on Neural Information Proc...

  48. [48]

    Benchmarking multimodal automl for tabular data with text fields.arXiv preprint arXiv:2111.02705, 2021

    Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J Smola. Benchmarking multimodal automl for tabular data with text fields.arXiv preprint arXiv:2111.02705, 2021

  49. [49]

    Vectorizing string entries for data processing on tables: when are larger language models better?arXiv preprint arXiv:2312.09634, 2023

    Léo Grinsztajn, Edouard Oyallon, Myung Jun Kim, and Gaël Varoquaux. Vectorizing string entries for data processing on tables: when are larger language models better?arXiv preprint arXiv:2312.09634, 2023

  50. [50]

    Carte: pretraining and transfer for tabular learning.arXiv preprint arXiv:2402.16785, 2024

    Myung Jun Kim, Léo Grinsztajn, and Gaël Varoquaux. Carte: pretraining and transfer for tabular learning.arXiv preprint arXiv:2402.16785, 2024

  51. [51]

    Regression quantiles.Econometrica, 46(1):33–50, 1978

    Roger Koenker and Gilbert Bassett. Regression quantiles.Econometrica, 46(1):33–50, 1978. ISSN 00129682, 14680262. URLhttp://www.jstor.org/stable/1913643

  52. [52]

    Quantile regression forests.Journal of Machine Learning Research, 7:983–999, 2006

    Nicolai Meinshausen. Quantile regression forests.Journal of Machine Learning Research, 7:983–999, 2006

  53. [53]

    F., Turkmen, C., Stella, L., Erickson, N., Guerron, P., Bohlke-Schneider, M., and Wang, Y

    Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, and Yuyang Wang. fev-bench: A realistic benchmark for time series forecasting.arXiv preprint arXiv:2509.26468, 2025

  54. [54]

    Chronos-2: From Univariate to Universal Forecasting

    Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Michael B...

  55. [55]

    Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning

    Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, and Sepp Hochreiter. TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In- Context Learning. InThe Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://arxiv.org/abs/2505.23719

  56. [56]

    A decoder-only foundation model for time-series forecasting

    Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine ...

  57. [57]

    KumoRFM-2: Scaling Foundation Models for Relational Learning

    Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. Kumorfm-2: Scaling foundation models for relational learning, 2026. URL https://arxiv.org/abs/2604.12596

  58. [58]

    Relgnn: Composite message passing for relational deep learning, 2025

    Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. Relgnn: Composite message passing for relational deep learning, 2025. URLhttps://arxiv.org/abs/2502.06784

  59. [59]

    Inductive Representation Learning on Large Graphs

    William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs, 2018. URLhttps://arxiv.org/abs/1706.02216

  60. [60]

    Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec

    Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico López, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec. Relational graph transformer, 2026. URLhttps://arxiv. org/abs/2505.10960

  61. [61]

    Kumorfm: A foundation model for in-context learning on relational data.Kumo.ai, 2025

    Matthias Fey, Vid Kocijan, Federico Lopez, Jan Eric Lenssen, and Jure Leskovec. Kumorfm: A foundation model for in-context learning on relational data.Kumo.ai, 2025. URL https: //kumo.ai/research/kumo_relational_foundation_model.pdf. 28

  62. [62]

    Griffin: Towards a graph-centric relational database foundation model, 2025

    Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. Griffin: Towards a graph-centric relational database foundation model, 2025. URL https://arxiv.org/abs/2505.05568

  63. [63]

    Re- lational transformer: Toward zero-shot foundation models for relational data, 2026

    Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec. Re- lational transformer: Toward zero-shot foundation models for relational data, 2026. URL https://arxiv.org/abs/2510.06377

  64. [64]

    Rdblearn: Simple in-context prediction over relational databases, 2026

    Yanlin Zhang, Linjie Xu, Quan Gan, David Wipf, and Minjie Wang. Rdblearn: Simple in-context prediction over relational databases, 2026. URLhttps://arxiv.org/abs/2602.18495

  65. [65]

    Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec

    Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec. Relbench: A benchmark for deep learning on relational databases, 2024. URLhttps://arxiv.org/abs/2407. 20060

  66. [66]

    Künzel, Jasjeet S

    Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning.Proceedings of the National Academy of Sciences, 116(10):4156–4165, 2019

  67. [67]

    User guide for uplift modeling and casual inference.https: //www.uplift-modeling.com/en/latest/user_guide/index.html, 2020

    Irina Elisova Maksim Shevchenko. User guide for uplift modeling and casual inference.https: //www.uplift-modeling.com/en/latest/user_guide/index.html, 2020

  68. [68]

    Realcause: Realistic causal inference benchmarking.CoRR, abs/2011.15007, 2020

    Brady Neal, Chin-Wei Huang, and Sunand Raghupathi. Realcause: Realistic causal inference benchmarking.CoRR, abs/2011.15007, 2020. URLhttps://arxiv.org/abs/2011.15007

  69. [69]

    Xing, and Goreti Marreiros

    Afonso Lourenço, João Gama, Eric P. Xing, and Goreti Marreiros. In-context learning of evolving data streams with tabular foundational models.arXiv preprint arXiv:2502.16840, 2025. doi: 10.48550/arXiv.2502.16840. URLhttps://arxiv.org/abs/2502.16840

  70. [70]

    Time: Tabpfn-integrated multimodal engine for robust tabular-image learning, 2025

    Jiaqi Luo, Yuan Yuan, and Shixin Xu. Time: Tabpfn-integrated multimodal engine for robust tabular-image learning, 2025. URLhttps://arxiv.org/abs/2506.00813

  71. [71]

    Predictive Maintenance for Rail Networks: Hitachi Rail Case Study

    Prior Labs. Predictive Maintenance for Rail Networks: Hitachi Rail Case Study. https:// priorlabs.ai/case-studies/hitachi, 2026. Accessed May 2026

  72. [72]

    Credit Decisioning at Creditplus Bank: Case Study

    Prior Labs. Credit Decisioning at Creditplus Bank: Case Study. https://priorlabs.ai/ case-studies/credit-plus, 2026. Accessed May 2026

  73. [73]

    Clinical Decision Support with Oxford Cancer Analytics: Case Study

    Prior Labs. Clinical Decision Support with Oxford Cancer Analytics: Case Study. https:// priorlabs.ai/case-studies/oxcan, 2026. Accessed May 2026

  74. [74]

    Statistics and causal inference.Journal of the American Statistical Association, 81(396):945–960, 1986

    Paul W Holland. Statistics and causal inference.Journal of the American Statistical Association, 81(396):945–960, 1986

  75. [75]

    The proposed uscf rating system, its development, theory, and applications.Chess life, 22(8):242–247, 1967

    Arpad E Elo. The proposed uscf rating system, its development, theory, and applications.Chess life, 22(8):242–247, 1967

  76. [76]

    Chatbot arena: An open platform for evaluating llms by human preference

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E Gonzalez, et al. Chatbot arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

  77. [77]

    Terpilowski

    Maksim A. Terpilowski. scikit-posthocs: Pairwise multiple comparison tests in python.Journal of Open Source Software, 4(36):1169, 2019. doi: 10.21105/joss.01169. URLhttps://doi.org/10. 21105/joss.01169

  78. [78]

    Panmetai - a high performance tabular foundation model for accurate pancreatic cancer diagnosis via nmr metabolomics.Nature Communications, 17, 2026

    Dan-Ni Wu, Joey Jen, Erickson Fajiculay, Min-Fen Hsu, Ming-Chu Chang, Jen-Chen Yeh, Karen Sargsyan, Juozas Kupcinskas, Jurgita Skieceviciene, Ruta Steponaitiene, Egidijus Morkunas, Greta Gedgaudiene, Chao-Ping Hsu, Yu-Ting Chang, and Chun-Mei Hu. Panmetai - a high performance tabular foundation model for accurate pancreatic cancer diagnosis via nmr metabo...

  79. [79]

    Deep learning models enable healthy donor management through prediction of mobilization success.Transplantation and Cellular Therapy, 32:S3, 2026

    Asif Adil and Stephanie Hurwitz. Deep learning models enable healthy donor management through prediction of mobilization success.Transplantation and Cellular Therapy, 32:S3, 2026. doi: 10.1016/j.jtct.2026.02.016. URLhttps://doi.org/10.1016/j.jtct.2026.02.016

  80. [80]

    Differentiation between psychotic and non-psychotic major depression by the tabular prior-data fitted network.Journal of Affective Disorders, 403:121454, 2026

    Hongxin Zheng, Wenxin Gan, Yizi Liu, Shuyu Duan, Kun Li, Gongping Li, Yanqiu Xue, and Yu Xie. Differentiation between psychotic and non-psychotic major depression by the tabular prior-data fitted network.Journal of Affective Disorders, 403:121454, 2026. doi: 10.1016/j.jad.2026.121454. URLhttps://doi.org/10.1016/j.jad.2026.121454

Showing first 80 references.