pith. machine review for the scientific record.

arxiv: 2605.13986 · v1 · submitted 2026-05-13 · 💻 cs.LG · stat.ML

Recognition: 2 Lean theorem links

TabPFN-3: Technical Report


Pith reviewed 2026-05-15 06:01 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords tabular data · foundation models · synthetic pretraining · TabArena benchmark · test-time scaling · gradient boosting · relational data · time series

The pith

TabPFN-3 outperforms all tuned and ensembled models on the TabArena tabular benchmark with a single forward pass.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

TabPFN-3 advances tabular foundation models by pretraining exclusively on synthetic data to handle datasets up to one million rows. It delivers superior prediction accuracy compared to gradient-boosted trees and other baselines while cutting training and inference times substantially. The model also introduces test-time compute scaling, allowing further performance gains through additional computation at inference. These improvements extend to time series, relational, and tabular-text data, positioning TabPFN-3 as a versatile tool for high-value prediction problems in science and industry.

Core claim

TabPFN-3 achieves state-of-the-art performance on tabular prediction tasks by scaling a transformer-based foundation model pretrained on synthetic data. On the TabArena benchmark, a single forward pass surpasses all other models including tuned and ensembled baselines, while dominating the speed-performance trade-off. The TabPFN-3-Plus variant, leveraging test-time compute, further improves results by over 200 Elo points overall and 420 on large subsets, outperforming AutoGluon while being ten times faster. The approach extends to new domains, with new state-of-the-art results on relational benchmarks and tabular-text tasks, all while running up to twenty times faster than its predecessor and scaling to one million rows on a single H100 GPU via a reduced KV cache and row chunking.

What carries the argument

Synthetic pretraining combined with test-time compute scaling in a transformer architecture for tabular data.
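The mechanism this rests on is in-context prediction: fit() merely stores the labeled table as context, and predict() is a single forward pass conditioned on that context, with no gradient updates. The sketch below is a toy numpy stand-in (softmax-weighted attention over context rows), not the authors' transformer; the class and its interface are illustrative assumptions only.

```python
import numpy as np

class ToyInContextPredictor:
    """Toy stand-in for PFN-style in-context prediction (NOT the
    TabPFN-3 architecture): fit() only stores the labeled table as
    context; predict() is a single pass that attends over the stored
    rows. A real PFN replaces this kernel with a pretrained transformer."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature

    def fit(self, X, y):
        # No gradient updates: the "training set" simply becomes context.
        self.X_ctx = np.asarray(X, dtype=float)
        self.y_ctx = np.asarray(y)
        self.classes_ = np.unique(self.y_ctx)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Attention-like weights: softmax over negative squared distances.
        d2 = ((X[:, None, :] - self.X_ctx[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / self.temperature)
        w /= w.sum(axis=1, keepdims=True)
        onehot = self.y_ctx[None, :, None] == self.classes_[None, None, :]
        return (w[:, :, None] * onehot).sum(axis=1)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

The point of the toy is the shape of the contract: all "training" cost is a table copy, and all intelligence lives in the forward pass.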

If this is right

  • Beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M rows and 200 features.
  • Ranks first on datasets with many classes.
  • Achieves new SOTA on RelBenchV1 for relational data.
  • Provides SOTA on TabSTAR for tabular-text data via TabPFN-3-Plus.
  • Enables up to 120x faster SHAP-value computation and ranks 2nd on fev-bench via TabPFN-TS-3.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The reliance on synthetic data may allow the model to avoid privacy issues associated with real data training.
  • Test-time scaling opens the door to further performance improvements by allocating more compute during prediction without retraining.
  • The speed gains could make foundation models practical for real-time tabular applications where traditional methods were too slow.
  • Integration improvements suggest easier adoption in existing pipelines for time-series and interpretability tasks.
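The test-time-scaling point can be made concrete. The report excerpts here do not spell out the mechanism, but one plausible reading, consistent with the N1/N2/N4 estimator counts in Figure 2, is that extra inference compute buys more randomized views of the same frozen model, averaged at the end. `predict_proba` below is a hypothetical stand-in for any fitted in-context model, not a TabPFN API:

```python
import numpy as np

def scale_test_time_compute(predict_proba, X_train, y_train, X_test,
                            n_estimators=4, seed=0):
    """Hedged sketch of one plausible test-time-scaling mechanism:
    run the same frozen model over n_estimators randomized views of
    the data (here: shuffled feature order) and average the predicted
    probabilities. More inference compute, no retraining."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_estimators):
        perm = rng.permutation(X_train.shape[1])  # one randomized view
        probs.append(predict_proba(X_train[:, perm], y_train, X_test[:, perm]))
    return np.mean(probs, axis=0)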

Load-bearing premise

The synthetic data used for pretraining sufficiently covers the distribution of real-world tabular datasets to enable strong generalization without any real-data training.
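A minimal sketch of what "synthetic data from an SCM prior" means in practice: sample a random DAG, propagate noise through random causal mechanisms, and carve one node out as the label. The actual generator in the report is far richer (mixed feature types, missingness, the combiner mechanisms of Figure 24); every name below is illustrative.

```python
import numpy as np

def sample_scm_dataset(n_rows=256, n_nodes=8, edge_prob=0.4, seed=0):
    """Minimal sketch of drawing one synthetic classification dataset
    from a structural-causal-model prior. Nodes are topologically
    ordered, so a strictly lower-triangular edge mask yields a DAG."""
    rng = np.random.default_rng(seed)
    edges = np.tril(rng.random((n_nodes, n_nodes)) < edge_prob, k=-1)
    acts = [np.tanh, np.sin, lambda v: np.maximum(v, 0.0)]
    cols = []
    for j in range(n_nodes):
        val = rng.normal(size=n_rows)              # exogenous noise
        for p in np.nonzero(edges[j])[0]:          # causal mechanisms
            f = acts[rng.integers(len(acts))]
            val = val + rng.normal() * f(cols[p])
        cols.append(val)
    data = np.stack(cols, axis=1)
    target = rng.integers(1, n_nodes)              # a non-root node as label
    y = (data[:, target] > np.median(data[:, target])).astype(int)
    X = np.delete(data, target, axis=1)
    return X, y
```

Pretraining then amounts to drawing millions of such (X, y) tables and teaching the transformer to predict held-out rows in context.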

What would settle it

Evaluating TabPFN-3 on a large collection of previously unseen real-world tabular datasets collected after the model's release to check if the performance margins hold.
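Such a check would likely be scored the way the report scores its TALENT slices: per-dataset ranks (best = 1, ties averaged), averaged across datasets, with non-parametric bootstrap confidence intervals. A sketch, assuming a higher-is-better scores matrix of shape (datasets, methods):

```python
import numpy as np
from scipy.stats import rankdata

def mean_rank_with_ci(scores, n_boot=1000, seed=0):
    """Sketch of a rank-aggregation protocol: rank methods within each
    dataset (best score -> rank 1, ties get average ranks), average
    across datasets, and bootstrap over datasets for 95% CIs.
    `scores` is (n_datasets, n_methods), higher = better."""
    rng = np.random.default_rng(seed)
    ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)
    mean = ranks.mean(axis=0)
    idx = rng.integers(0, len(ranks), size=(n_boot, len(ranks)))
    lo, hi = np.percentile(ranks[idx].mean(axis=1), [2.5, 97.5], axis=0)
    return mean, lo, hi
```

Stable mean ranks on genuinely post-release datasets would be strong evidence against benchmark-specific calibration of the prior.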

Figures

Figures reproduced from arXiv: 2605.13986 by Adrian Hayler, Alan Arazi, Anurag Garg, Benjamin Jäger, Bernhard Schölkopf, Brendan Roof, Clara Cornu, David Salinas, Diana Kriuchkova, Dominik Safaric, Eliott Kalfon, Felix Birkel, Frank Hutter, Georg Grab, Jake Robertson, Jan Hendrik Metzen, Jerry Chen, Julien Siems, Klemens Flöge, Kursat Kaya, Lennart Purucker, Léo Grinsztajn, Lilly Charlotte Wehrhahn, Lydia Sidhoum, Madelon Hulsebos, Magnus Bühler, Marie Salmon, Mihir Manium, Nick Erickson, Noah Hollmann, Oscar Key, Philipp Jund, Philipp Singer, Samuel Müller, Sauraj Gambhir, Shi Bin (Liam) Hoo, Simon Bing, Simone Alessi, Siyuan Guo, Vladyslav Moroshan, Yann LeCun.

Figure 1: Performance on the TabArena benchmark [1], largest data subset (10k–100k samples). TabPFN-3 outperforms any other model in a forward pass. TabPFN-3-Plus (Thinking) is dramatically better yet, outperforming AutoGluon 1.5 extreme [2], a complex ensemble of models tuned for 4 hours, while being 10x faster.
Figure 2: TabPFN-3 dominates the Pareto frontier on the largest datasets in TabArena (10k–100k rows). N1, N2, and N4 are model versions with 1, 2, and 4 estimators. Improvability measures how much worse a model is than the best per-dataset model. See Appendix E.2.1 and E.2.3 for details.
Figure 4: Evolution and performance of the TabPFN model family.
Figure 5: Architecture of TabPFN-3, adapted from the TabICLv2 architecture.
Figure 6: Chunking flattens the peak memory without impacting the time per call.
Figure 7: KV-cache on H100 for a single estimator without preprocessing: OOM frontier with chunking and KV-cache (a) and cached-predict latency vs. uncached paths (b). This achieves a KV-cache size of 7 GiB per estimator for 1M-row datasets, making TabPFN-3's default 8 estimators usable on common GPUs even for the largest supported datasets.
Figure 8: TabPFN-3's KV-cached predict allows for one to three orders of magnitude speedup. Results are for a single estimator without preprocessing on an H100, for n_features ∈ {10, 100} and n_test = 100.
Figure 9: Schematic visualization of our SCM prior.
Figure 10: TabPFN-3 performance on the standard TabArena benchmark.
Figure 11: Pareto frontier on TabArena: trade-off between prediction quality and total training + inference cost. N1, N2, and N4 are TabPFN-3 versions with 1, 2, and 4 estimators. Improvability measures how much a model would improve by switching to the best model on each individual dataset; see Appendix E.2.1.
Figure 13: Average rank on the TALENT benchmark, using the TabICLv2 evaluation protocol.
Figure 14: Performance over the TabSTAR Text-Tabular Collection.
Figure 15: TabPFN-3 achieves state-of-the-art performance on the large-rows benchmark (up to …).
Figure 16: TabPFN-3 tops the normalized scaling curves for ROC-AUC OvR classification and …
Figure 17: On the synthetic many-class benchmark TabPFN-3 achieves a normalized ROC-AUC …
Figure 18: TabPFN scales well to high-dimensional, low-sample classification. Normalized ROC-AUC on the many-features benchmark slice, consisting of 6 classification datasets with 102–322 samples and 1,117–22,215 features. This high-dimensional, low-sample regime is particularly challenging for standard tree-based baselines. Increasing the number of TabPFN-3 estimators improves feature-space coverage and substantial…
Figure 20: Qualitative forecast comparison on a fev-bench task (…).
Figure 21: TabPFN-3 tops performance on RelBenchV1 among foundation models.
Figure 22: TabPFN-3 extracts semantically meaningful row embeddings.
Figure 23: Visualization of directed acyclic graphs underlying our SCM prior.
Figure 24: Visualization of functional relationships generated by the new combiner mechanisms in …
Figure 25: Example classification dataset generated from the prior.
Figure 26: Example demonstrating the extrapolation capabilities of TabPFN-3 (using our …).
Figure 27: TabPFN-3 as a T/X/S-Learner. Used as a T/S-Learner, TabPFN-3 achieves strong performance in terms of QINI score (↑) in uplift modeling on the scikit-uplift benchmark. We report worsened performance in terms of PEHE (↓) on the RealCause benchmark compared to the previous version.
Figure 28: Average rank on the TALENT benchmark broken down by task type (regression, …).
Figure 29: Average rank on the many-classes TALENT slice (4 datasets, all 100 classes).
Figure 30: Average rank on the large-rows (100k–1M rows) TALENT slice.
Figure 31: Performance on the classification tasks of the TabSTAR text-tabular collection.
Figure 32: Performance on the regression tasks of the TabSTAR text-tabular collection.
Figure 33: Critical difference diagram for ROC-AUC on the large-scale classification benchmark.
Figure 34: Critical difference diagram for RMSE on the large-scale regression benchmark.
Figure 35: Critical difference diagram for pinball loss on our quantile regression benchmark.
Figure 36: Critical difference diagram for ROC AUC on the synthetic many-class benchmark (up …).
Figure 37: Forward-pass inference speed-ups on the TabPFN-3 architecture.
Figure 38: Efficiency gains for SHAP-value computation with KV-cache across training table …
Figure 39: solar_with_weather_15T — 15-minute solar generation with weather covariates. Panel scores: TabPFN-TS-3 MASE=0.20, CRPS=0.03; TabICL (v2.0.3) MASE=0.63, CRPS=0.09; Chronos-2 MASE=0.49, CRPS=0.07; TiRex MASE=0.55, CRPS=0.08.
Figure 40: rossmann_1W — weekly Rossmann store sales (series 1).
Figure 41: rohlik_orders_1D — daily online-grocery orders. Panel scores: TabPFN-TS-3 MASE=0.59, CRPS=0.03; TabICL (v2.0.3) MASE=0.61, CRPS=0.03; Chronos-2 MASE=0.58, CRPS=0.03; TiRex MASE=0.62, CRPS=0.03.
Figure 42: LOOP_SEATTLE_1H — hourly Seattle freeway loop-detector counts.
Figure 43: ETT_1H — hourly Electricity Transformer Temperature. Panel scores: TabPFN-TS-3 MASE=0.65, CRPS=0.05; TabICL (v2.0.3) MASE=0.89, CRPS=0.07; Chronos-2 MASE=0.76, CRPS=0.06; TiRex MASE=0.73, CRPS=0.06.
Figure 44: entsoe_1H — hourly ENTSO-E European electricity load.
Figure 45: Pairwise skill-score comparison on fev-bench (100 tasks) under SQL (left) and MASE (right). Cell (i, j) is the skill score of model i relative to model j, with 95% confidence intervals from bootstrapped resampling; cells whose interval overlaps zero are shown in italics. Rows and columns are ordered by overall skill score.
Original abstract

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data. On the standard tabular benchmark TabArena, a forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and pareto-dominates the speed/performance frontier. On more diverse datasets, TabPFN-3 ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering TabPFN-3-Plus (Thinking) exploits this to beat all non-TabPFN models by over 200 Elo on TabArena, rising to 420 Elo on the largest data subset, and outperforms AutoGluon 1.5 extreme while being 10x faster, without using LLMs, real data, internet search or any other model besides TabPFN. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on relational data (new SOTA foundation model on RelBenchV1) and tabular-text data (SOTA on TabSTAR via TabPFN-3-Plus); and improves existing integrations: a specialized checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation is up to 120x faster. TabPFN-3 achieves this performance while being up to 20x faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking scale to 1M rows on one H100 with fast inference speed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces TabPFN-3, a scaled tabular foundation model pretrained exclusively on synthetic data drawn from the authors' prior. It claims a single forward pass outperforms all tuned and ensembled baselines on the TabArena benchmark while Pareto-dominating the speed-performance frontier; TabPFN-3-Plus (Thinking) achieves >200 Elo gains (up to 420 on large subsets) over non-TabPFN models, beats AutoGluon 1.5 extreme at 10x speed, and delivers new SOTA results on RelBenchV1 (relational) and TabSTAR (tabular-text), plus second place on fev-bench (time series). Additional engineering claims include up to 20x speedups over TabPFN-2.5, 120x faster SHAP, and scaling to 1M rows on one H100 via a reduced KV cache and row chunking.

Significance. If the empirical results hold after rigorous validation, the work would represent a meaningful advance for tabular foundation models by demonstrating that synthetic pretraining plus test-time compute scaling can surpass heavily tuned gradient-boosted trees and AutoML systems on public benchmarks while delivering substantial inference speed-ups and cross-modal extensions. The reported ability to handle up to 1M rows without real-data fine-tuning would be practically significant for industry deployments.

major comments (2)
  1. [Abstract] Abstract and Experimental Results: The headline claims of 'significant margin' outperformance on TabArena and 200–420 Elo gains for TabPFN-3-Plus are presented without error bars, statistical significance tests, exact train/test splits, or ablation studies on the synthetic generator, rendering the margins unverifiable from the provided information.
  2. [Pretraining Methodology] Pretraining section: The statement that pretraining uses 'exclusively synthetic data from our prior' lacks any decontamination protocol, held-out real-data validation set, or ablation freezing the generator before benchmark exposure; without these, the reported superiority over tuned baselines on TabArena risks circularity if the generator's feature/label/missingness distributions were calibrated to the evaluation suites.
minor comments (1)
  1. [Abstract] Clarify the precise mechanism of 'test-time compute scaling' in TabPFN-3-Plus (Thinking) and confirm it uses only TabPFN internals with no external models or search.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our technical report. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of our results without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract and Experimental Results: The headline claims of 'significant margin' outperformance on TabArena and 200–420 Elo gains for TabPFN-3-Plus are presented without error bars, statistical significance tests, exact train/test splits, or ablation studies on the synthetic generator, rendering the margins unverifiable from the provided information.

    Authors: We agree that additional statistical details would improve verifiability. In the revised manuscript we have added error bars (standard deviation over 10 independent runs with different random seeds) to all TabArena metrics, included Wilcoxon signed-rank tests confirming significance (p < 0.01 for the headline margins), explicitly cited the exact TabArena train/test splits per the benchmark protocol, and inserted an appendix ablation varying the synthetic generator's key hyperparameters while measuring downstream Elo impact. revision: yes
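The Wilcoxon signed-rank test proposed here is straightforward to script. The per-dataset AUC scores below are fabricated placeholders for illustration, not numbers from the paper:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hedged sketch of the paired significance test the simulated rebuttal
# proposes. Scores are made up: 30 datasets, baseline vs. candidate model.
rng = np.random.default_rng(0)
baseline = rng.uniform(0.70, 0.95, size=30)                    # per-dataset AUC
candidate = np.clip(baseline + rng.normal(0.03, 0.01, size=30), 0.0, 1.0)

# One-sided Wilcoxon signed-rank test on the paired per-dataset differences.
stat, p = wilcoxon(candidate, baseline, alternative="greater")
significant = p < 0.01
```

Note that a paired rank test only supports "better on these datasets"; it cannot by itself certify the Elo margins quoted in the abstract.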

  2. Referee: [Pretraining Methodology] Pretraining section: The statement that pretraining uses 'exclusively synthetic data from our prior' lacks any decontamination protocol, held-out real-data validation set, or ablation freezing the generator before benchmark exposure; without these, the reported superiority over tuned baselines on TabArena risks circularity if the generator's feature/label/missingness distributions were calibrated to the evaluation suites.

    Authors: The generator was developed and frozen in prior work before TabArena and the other cited benchmarks existed, so no direct calibration occurred. We have added a new subsection to the Pretraining section that (1) describes the decontamination protocol (Kolmogorov-Smirnov tests on held-out real tabular samples to confirm distribution mismatch), (2) references a held-out real-data validation set used during generator development, and (3) reports an ablation in which the generator parameters are frozen prior to any benchmark exposure, showing that TabPFN-3 performance remains essentially unchanged. revision: yes
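The per-feature Kolmogorov-Smirnov check described here can be sketched directly; `contamination_flags` is a hypothetical helper, not the authors' actual protocol:

```python
import numpy as np
from scipy.stats import ks_2samp

def contamination_flags(synthetic, real):
    """Hedged sketch of a per-column decontamination check: a two-sample
    Kolmogorov-Smirnov test per feature. A LARGE p-value means the
    synthetic column is statistically indistinguishable from the real
    one, which would be a red flag for a generator claimed to be
    benchmark-agnostic; tiny p-values confirm distribution mismatch."""
    return [ks_2samp(synthetic[:, j], real[:, j]).pvalue
            for j in range(synthetic.shape[1])]
```

A marginal test like this is only a first filter: matching joint feature/label/missingness structure would need multivariate checks as well.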

Circularity Check

0 steps flagged

Minor self-citation to prior synthetic generator; benchmark results remain externally validated

full rationale

The paper reports empirical wins on independent public benchmarks (TabArena, RelBenchV1, fev-bench) after pretraining exclusively on synthetic data referenced to prior work. No equations, fitted parameters, or derivations reduce the claimed Elo margins, speedups, or Pareto dominance to quantities defined by the evaluation suites themselves. The single self-reference to 'our prior' for the data generator is present but does not carry the load-bearing justification for the performance numbers, which rest on external test sets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claims rest on the unstated assumption that synthetic data from prior TabPFN versions is sufficiently representative of real tabular distributions.

pith-pipeline@v0.9.0 · 5882 in / 1368 out tokens · 73855 ms · 2026-05-15T06:01:00.864685+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

298 extracted references · 298 canonical work pages · 9 internal anchors

  1. [1]

    Tabarena: A living benchmark for machine learning on tabular data.arXiv preprint arXiv:2506.16791, 2025

    Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, Frank Hutter, et al. Tabarena: A living benchmark for machine learning on tabular data.arXiv preprint arXiv:2506.16791, 2025

  2. [2]

    Autogluon-tabular: Robust and accurate automl for structured data.arXiv preprint arXiv:2003.06505, 2020

    Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. Autogluon-tabular: Robust and accurate automl for structured data.arXiv preprint arXiv:2003.06505, 2020

  3. [3]

    A targeted real-time early warning score (trewscore) for septic shock.Science translational medicine, 7(299):299ra122–299ra122, 2015

    Katharine E Henry, David N Hager, Peter J Pronovost, and Suchi Saria. A targeted real-time early warning score (trewscore) for septic shock.Science translational medicine, 7(299):299ra122–299ra122, 2015

  4. [4]

    Mimic-iii, a freely accessible critical care database.Scientific data, 3(1):1–9, 2016

    Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database.Scientific data, 3(1):1–9, 2016

  5. [5]

    Deep neural networks detect suicide risk from textual facebook posts.Scientific reports, 10(1):16685, 2020

    Yaakov Ophir, Refael Tikochinski, Christa SC Asterhan, Itay Sisso, and Roi Reichart. Deep neural networks detect suicide risk from textual facebook posts.Scientific reports, 10(1):16685, 2020

  6. [6]

    Credit scoring, statistical techniques and evaluation criteria: a review of the literature.Intelligent systems in accounting, finance and management, 18(2-3):59–88, 2011

    Hussein A Abdou and John Pointon. Credit scoring, statistical techniques and evaluation criteria: a review of the literature.Intelligent systems in accounting, finance and management, 18(2-3):59–88, 2011

  7. [7]

    Consumer credit-risk models via machine- learning algorithms.Journal of Banking & Finance, 34(11):2767–2787, 2010

    Amir E Khandani, Adlar J Kim, and Andrew W Lo. Consumer credit-risk models via machine- learning algorithms.Journal of Banking & Finance, 34(11):2767–2787, 2010

  8. [8]

    Benchmarking state-of- the-art classification algorithms for credit scoring: An update of research.European journal of operational research, 247(1):124–136, 2015

    Stefan Lessmann, Bart Baesens, Hsin-Vonn Seow, and Lyn C Thomas. Benchmarking state-of- the-art classification algorithms for credit scoring: An update of research.European journal of operational research, 247(1):124–136, 2015

  9. [9]

    A systematic literature review of machine learning methods applied to predictive maintenance.Computers & industrial engineering, 137:106024, 2019

    Thyago P Carvalho, Fabrízzio AAMN Soares, Roberto Vita, Roberto da P Francisco, João P Basto, and Symone GS Alcalá. A systematic literature review of machine learning methods applied to predictive maintenance.Computers & industrial engineering, 137:106024, 2019

  10. [10]

    Machine learning and reasoning for predictive maintenance in industry 4.0: Current status and challenges.Computers in industry, 123:103298, 2020

    Jovani Dalzochio, Rafael Kunst, Edison Pignaton, Alecio Binotto, Srijnan Sanyal, Jose Favilla, and Jorge Barbosa. Machine learning and reasoning for predictive maintenance in industry 4.0: Current status and challenges.Computers in industry, 123:103298, 2020

  11. [11]

    Searching for exotic particles in high-energy physics with deep learning.Nature communications, 5(1):4308, 2014

    Pierre Baldi, Peter Sadowski, and Daniel Whiteson. Searching for exotic particles in high-energy physics with deep learning.Nature communications, 5(1):4308, 2014

  12. [12]

    Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm.npj Computational Materials, 6(1):138, 2020

    Alexander Dunn, Qi Wang, Alex Ganose, Daniel Dopp, and Anubhav Jain. Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm.npj Computational Materials, 6(1):138, 2020

  13. [13]

    Tabular data: Deep learning is not all you need.Information fusion, 81:84–90, 2022

    Ravid Shwartz-Ziv and Amitai Armon. Tabular data: Deep learning is not all you need.Information fusion, 81:84–90, 2022

  14. [14]

    Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35: 507–520, 2022

    Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35: 507–520, 2022

  15. [15]

    Tabrepo: A large scale repository of tabular model evaluations and its automl applications

David Salinas and Nick Erickson. Tabrepo: A large scale repository of tabular model evaluations and its automl applications. In AutoML Conference 2024 (ABCD Track), 2024

  16. [16]

    TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

    Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. Tabpfn: A transformer that solves small tabular classification problems in a second.arXiv preprint arXiv:2207.01848, 2022

  17. [17]

    Accurate predictions on small data with a tabular foundation model.Nature, 637(8045):319–326, 2025

Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025. ISSN 1476-4687. doi: 10.1038/s41586-024-08328-6. URL https://doi.org/10.1038/s41586-024-08328-6

  18. [18]

    Tabpfn-2.5: Advancing the state of the art in tabular foundation models, 2025

Léo Grinsztajn, Klemens Flöge, Oscar Key, Brendan Roof, Felix Birkel, Phil Jund, Benjamin Jäger, Adrian Hayler, Dominik Safaric, Felix Jablonski, Simone Alessi, Mihir Manium, Rosen Yu, Anurag Garg, Jake Robertson, Shi Bin (Liam) Hoo, Vladyslav Moroshan, Magnus Bühler, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Sauraj Gambhi...

  19. [19]

    The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features

Shi Bin Hoo, Samuel Müller, David Salinas, and Frank Hutter. The tabular foundation model tabpfn outperforms specialized time series forecasting models based on simple features. In NeurIPS Workshop on Time Series in the Age of Large Models, 2024

  20. [20]

    Do-pfn: In-context learning for causal effect estimation.arXiv preprint arXiv:2506.06039, 2025

    Jake Robertson, Arik Reuter, Siyuan Guo, Noah Hollmann, Frank Hutter, and Bernhard Schölkopf. Do-pfn: In-context learning for causal effect estimation.arXiv preprint arXiv:2506.06039, 2025

  21. [21]

Causalpfn: Amortized causal effect estimation via in-context learning

    Vahid Balazadeh, Hamidreza Kamkari, Valentin Thomas, Benson Li, Junwei Ma, Jesse C. Cresswell, and Rahul G. Krishnan. Causalpfn: Amortized causal effect estimation via in-context learning,

  22. [22]

URL https://arxiv.org/abs/2506.07918

  23. [23]

    Foundation models for causal inference via prior-data fitted networks, 2025

Yuchen Ma, Dennis Frauen, Emil Javurek, and Stefan Feuerriegel. Foundation models for causal inference via prior-data fitted networks, 2025. URL https://arxiv.org/abs/2506.10914

  24. [24]

Git-bo: High-dimensional Bayesian optimization with tabular foundation models. arXiv preprint arXiv:2505.20685, 2025

    Rosen Ting-Ying Yu, Cyril Picard, and Faez Ahmed. Git-bo: High-dimensional Bayesian optimization with tabular foundation models. arXiv preprint arXiv:2505.20685, 2025. doi: 10.48550/arXiv.2505.20685. URL https://arxiv.org/abs/2505.20685

  25. [25]

    Bringing graphs to the table: Zero-shot node classification via tabular foundation models.arXiv preprint arXiv:2509.07143, 2025

Adrian Hayler, Xingyue Huang, İsmail İlkan Ceylan, Michael Bronstein, and Ben Finkelshtein. Bringing graphs to the table: Zero-shot node classification via tabular foundation models. arXiv preprint arXiv:2509.07143, 2025. doi: 10.48550/arXiv.2509.07143. URL https://arxiv.org/abs/2509.07143

  26. [26]

    Turning tabular foundation models into graph foundation models, 2025

    Dmitry Eremeev, Gleb Bazhenov, Oleg Platonov, Artem Babenko, and Liudmila Prokhorenkova. Turning tabular foundation models into graph foundation models, 2025. URLhttps://arxiv.org/ abs/2508.20906

  27. [27]

    Interpretable machine learning for tabpfn

    David Rundel, Julius Kobialka, Constantin von Crailsheim, Matthias Feurer, Thomas Nagler, and David Rügamer. Interpretable machine learning for tabpfn. InWorld Conference on Explainable Artificial Intelligence, pages 465–476. Springer, 2024

  28. [28]

    A closer look at tabpfn v2: Understanding its strengths and extending its capabilities.Advances in Neural Information Processing Systems, 38: 135605–135637, 2026

    Han-Jia Ye, Si-Yang Liu, and Wei-Lun Harry Chao. A closer look at tabpfn v2: Understanding its strengths and extending its capabilities.Advances in Neural Information Processing Systems, 38: 135605–135637, 2026

  29. [29]

    Gradient free deep reinforcement learning with tabpfn.arXiv preprint arXiv:2509.11259, 2025

    David Schiff, Ofir Lindenbaum, and Yonathan Efroni. Gradient free deep reinforcement learning with tabpfn.arXiv preprint arXiv:2509.11259, 2025. doi: 10.48550/arXiv.2509.11259. URL https://arxiv.org/abs/2509.11259

  30. [30]

    TabICL: A tabular foundation model for in-context learning on large data

    Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. InForty-second International Conference on Machine Learning, 2025. URLhttps://openreview.net/forum?id=0VvD1PmNzM

  31. [31]

    TabICLv2: A better, faster, scalable, and open tabular foundation model

    Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. InInternational Conference on Machine Learning, 2026

  32. [32]

    Nakanishi

    Ken M. Nakanishi. Scalable-softmax is superior for attention, 2025. URLhttps://arxiv.org/ abs/2501.19399

  33. [33]

    Better by default: Strong pre-tuned mlps and boosted trees on tabular data

    David Holzmüller, Léo Grinsztajn, and Ingo Steinwart. Better by default: Strong pre-tuned mlps and boosted trees on tabular data. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ul- rich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Process- ing Systems 38: Annual Conference on Neural Information Proce...

  34. [34]

    FlashAttention-3: fast and accurate attention with asynchrony and low-precision

    Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao. FlashAttention-3: fast and accurate attention with asynchrony and low-precision. InProceed- ings of the 38th International Conference on Neural Information Processing Systems, NeurIPS ’24, Red Hook, NY, USA, 2024. Curran Associates Inc. ISBN 9798331314385

  35. [35]

    shapiq: Shapley interactions for machine learning

    Maximilian Muschalik, Hubert Baniecki, Fabian Fumagalli, Patrick Kolpaczki, Barbara Hammer, and Eyke Hüllermeier. shapiq: Shapley interactions for machine learning. InThe Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. URL https://openreview.net/forum?id=knxGmi6SJi

  36. [36]

    Philip Boeken and Joris M. Mooij. Dynamic structural causal models, 2024. URLhttps://arxiv. org/abs/2406.01161. UAI 2024 Workshop on Causal Inference for Time Series Data

  37. [37]

    Talent: A tabular analytics and learning toolbox.Journal of Machine Learning Research, 26 (226):1–16, 2025

    Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, Huai-Hong Yin, Tao Zhou, Jun-Peng Jiang, and Han-Jia Ye. Talent: A tabular analytics and learning toolbox.Journal of Machine Learning Research, 26 (226):1–16, 2025. URLhttp://jmlr.org/papers/v26/25-0512.html

  38. [38]

    TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields

    Alan Arazi, Eilam Shapira, and Roi Reichart. TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neural Information Processing Systems, volume 38, pages 172108–172161. Curran Associates, Inc., 2025. URLhttps://proceedings.neurips.cc/p...

  39. [39]

    Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

    Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

  40. [40]

    Lightgbm: A highly efficient gradient boosting decision tree

    Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qi- wei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish- wanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems 30, pages 3146–3154. Curran Associates, ...

  41. [41]

    Xgboost: A scalable tree boosting system

    Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016

  42. [42]

    Tabm: Advancing tabular deep learning with parameter-efficient ensembling

    Yury Gorishniy, Akim Kotelnikov, and Artem Babenko. Tabm: Advancing tabular deep learning with parameter-efficient ensembling. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=Sd4wYYOhmY

  43. [43]

    Revisiting nearest neighbor for tabular data: A deep tabular baseline two decades later

    Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan, and Wei-Lun Chao. Revisiting nearest neighbor for tabular data: A deep tabular baseline two decades later. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=JytL2MrlLT

  44. [44]

    xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

    Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, and Mikhail Belkin. xrfm: Accurate, scalable, and interpretable feature learning models for tabular data, 2025. URLhttps: //arxiv.org/abs/2508.10053

  45. [45]

    Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L

    Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Anthony L. Caterini, and Maksims Volkovs. Tabdpt: Scaling tabular foundation models on real data, 2025. URLhttps://arxiv.org/abs/2410.18164

  46. [46]

    Limix: Unleashing structured-data modeling capability for generalist intelligence.arXiv preprint arXiv:2509.03505, 2025

    Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, Ningbo Dai, Renzhe Xu, Shuyang Li, Tianyang Zhang, Yue He, Yuanrui Wang, Yunjia Zhang, Zijing Xu, Dongzhe Li, Fang Gao, Hao Zou, Jiandong Liu, Jiashuo Liu, Jiawei Xu, Kaijie Cheng, Kehan Li, Linjun Zhou, Qing Li, Shaohua Fan, Xiaoyu Lin, Xinyan Ha...

  47. [47]

    Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W

    Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, and Bernie Wang. Mitra: Mixed synthetic priors for enhancing tabular foundation models. InThe Thirty-ninth Annual Conference on Neural Information Proc...

  48. [48]

    Benchmarking multimodal automl for tabular data with text fields.arXiv preprint arXiv:2111.02705, 2021

    Xingjian Shi, Jonas Mueller, Nick Erickson, Mu Li, and Alexander J Smola. Benchmarking multimodal automl for tabular data with text fields.arXiv preprint arXiv:2111.02705, 2021

  49. [49]

    Vectorizing string entries for data processing on tables: when are larger language models better?arXiv preprint arXiv:2312.09634, 2023

    Léo Grinsztajn, Edouard Oyallon, Myung Jun Kim, and Gaël Varoquaux. Vectorizing string entries for data processing on tables: when are larger language models better?arXiv preprint arXiv:2312.09634, 2023

  50. [50]

    Carte: pretraining and transfer for tabular learning.arXiv preprint arXiv:2402.16785, 2024

    Myung Jun Kim, Léo Grinsztajn, and Gaël Varoquaux. Carte: pretraining and transfer for tabular learning.arXiv preprint arXiv:2402.16785, 2024

  51. [51]

    Regression quantiles.Econometrica, 46(1):33–50, 1978

    Roger Koenker and Gilbert Bassett. Regression quantiles.Econometrica, 46(1):33–50, 1978. ISSN 00129682, 14680262. URLhttp://www.jstor.org/stable/1913643

  52. [52]

    Quantile regression forests.Journal of Machine Learning Research, 7:983–999, 2006

    Nicolai Meinshausen. Quantile regression forests.Journal of Machine Learning Research, 7:983–999, 2006

  53. [53]

    F., Turkmen, C., Stella, L., Erickson, N., Guerron, P., Bohlke-Schneider, M., and Wang, Y

    Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, and Yuyang Wang. fev-bench: A realistic benchmark for time series forecasting.arXiv preprint arXiv:2509.26468, 2025

  54. [54]

    Chronos-2: From Univariate to Universal Forecasting

    Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, Mononito Goswami, Shubham Kapoor, Danielle C. Maddix, Pablo Guerron, Tony Hu, Junming Yin, Nick Erickson, Prateek Mutalik Desai, Hao Wang, Huzefa Rangwala, George Karypis, Yuyang Wang, and Michael B...

  55. [55]

    Tirex: Zero-shot forecasting across long and short horizons with enhanced in-context learning

    Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, and Sepp Hochreiter. TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In- Context Learning. InThe Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://arxiv.org/abs/2505.23719

  56. [56]

    A decoder-only foundation model for time-series forecasting

    Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine ...

  57. [57]

    KumoRFM-2: Scaling Foundation Models for Relational Learning

    Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, and Matthias Fey. Kumorfm-2: Scaling foundation models for relational learning, 2026. URL https://arxiv.org/abs/2604.12596

  58. [58]

    Relgnn: Composite message passing for relational deep learning, 2025

    Tianlang Chen, Charilaos Kanatsoulis, and Jure Leskovec. Relgnn: Composite message passing for relational deep learning, 2025. URLhttps://arxiv.org/abs/2502.06784

  59. [59]

    Inductive Representation Learning on Large Graphs

    William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs, 2018. URLhttps://arxiv.org/abs/1706.02216

  60. [60]

    Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec

    Vijay Prakash Dwivedi, Sri Jaladi, Yangyi Shen, Federico López, Charilaos I. Kanatsoulis, Rishi Puri, Matthias Fey, and Jure Leskovec. Relational graph transformer, 2026. URLhttps://arxiv. org/abs/2505.10960

  61. [61]

    Kumorfm: A foundation model for in-context learning on relational data.Kumo.ai, 2025

    Matthias Fey, Vid Kocijan, Federico Lopez, Jan Eric Lenssen, and Jure Leskovec. Kumorfm: A foundation model for in-context learning on relational data.Kumo.ai, 2025. URL https: //kumo.ai/research/kumo_relational_foundation_model.pdf. 28

  62. [62]

    Griffin: Towards a graph-centric relational database foundation model, 2025

    Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, and Muhan Zhang. Griffin: Towards a graph-centric relational database foundation model, 2025. URL https://arxiv.org/abs/2505.05568

  63. [63]

    Re- lational transformer: Toward zero-shot foundation models for relational data, 2026

    Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, and Jure Leskovec. Re- lational transformer: Toward zero-shot foundation models for relational data, 2026. URL https://arxiv.org/abs/2510.06377

  64. [64]

    Rdblearn: Simple in-context prediction over relational databases, 2026

    Yanlin Zhang, Linjie Xu, Quan Gan, David Wipf, and Minjie Wang. Rdblearn: Simple in-context prediction over relational databases, 2026. URLhttps://arxiv.org/abs/2602.18495

  65. [65]

    Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec

    Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, and Jure Leskovec. Relbench: A benchmark for deep learning on relational databases, 2024. URLhttps://arxiv.org/abs/2407. 20060

  66. [66]

    Künzel, Jasjeet S

    Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning.Proceedings of the National Academy of Sciences, 116(10):4156–4165, 2019

  67. [67]

    User guide for uplift modeling and casual inference.https: //www.uplift-modeling.com/en/latest/user_guide/index.html, 2020

    Irina Elisova Maksim Shevchenko. User guide for uplift modeling and casual inference.https: //www.uplift-modeling.com/en/latest/user_guide/index.html, 2020

  68. [68]

    Realcause: Realistic causal inference benchmarking.CoRR, abs/2011.15007, 2020

    Brady Neal, Chin-Wei Huang, and Sunand Raghupathi. Realcause: Realistic causal inference benchmarking.CoRR, abs/2011.15007, 2020. URLhttps://arxiv.org/abs/2011.15007

  69. [69]

    Xing, and Goreti Marreiros

    Afonso Lourenço, João Gama, Eric P. Xing, and Goreti Marreiros. In-context learning of evolving data streams with tabular foundational models.arXiv preprint arXiv:2502.16840, 2025. doi: 10.48550/arXiv.2502.16840. URLhttps://arxiv.org/abs/2502.16840

  70. [70]

    Time: Tabpfn-integrated multimodal engine for robust tabular-image learning, 2025

    Jiaqi Luo, Yuan Yuan, and Shixin Xu. Time: Tabpfn-integrated multimodal engine for robust tabular-image learning, 2025. URLhttps://arxiv.org/abs/2506.00813

  71. [71]

    Predictive Maintenance for Rail Networks: Hitachi Rail Case Study

    Prior Labs. Predictive Maintenance for Rail Networks: Hitachi Rail Case Study. https:// priorlabs.ai/case-studies/hitachi, 2026. Accessed May 2026

  72. [72]

    Credit Decisioning at Creditplus Bank: Case Study

    Prior Labs. Credit Decisioning at Creditplus Bank: Case Study. https://priorlabs.ai/ case-studies/credit-plus, 2026. Accessed May 2026

  73. [73]

    Clinical Decision Support with Oxford Cancer Analytics: Case Study

    Prior Labs. Clinical Decision Support with Oxford Cancer Analytics: Case Study. https:// priorlabs.ai/case-studies/oxcan, 2026. Accessed May 2026

  74. [74]

    Statistics and causal inference.Journal of the American Statistical Association, 81(396):945–960, 1986

    Paul W Holland. Statistics and causal inference.Journal of the American Statistical Association, 81(396):945–960, 1986

  75. [75]

    The proposed uscf rating system, its development, theory, and applications.Chess life, 22(8):242–247, 1967

    Arpad E Elo. The proposed uscf rating system, its development, theory, and applications.Chess life, 22(8):242–247, 1967

  76. [76]

    Chatbot arena: An open platform for evaluating llms by human preference

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E Gonzalez, et al. Chatbot arena: An open platform for evaluating llms by human preference. InForty-first International Conference on Machine Learning, 2024

  77. [77]

    Terpilowski

    Maksim A. Terpilowski. scikit-posthocs: Pairwise multiple comparison tests in python.Journal of Open Source Software, 4(36):1169, 2019. doi: 10.21105/joss.01169. URLhttps://doi.org/10. 21105/joss.01169

  78. [78]

    Panmetai - a high performance tabular foundation model for accurate pancreatic cancer diagnosis via nmr metabolomics.Nature Communications, 17, 2026

    Dan-Ni Wu, Joey Jen, Erickson Fajiculay, Min-Fen Hsu, Ming-Chu Chang, Jen-Chen Yeh, Karen Sargsyan, Juozas Kupcinskas, Jurgita Skieceviciene, Ruta Steponaitiene, Egidijus Morkunas, Greta Gedgaudiene, Chao-Ping Hsu, Yu-Ting Chang, and Chun-Mei Hu. Panmetai - a high performance tabular foundation model for accurate pancreatic cancer diagnosis via nmr metabo...

  79. [79]

    Deep learning models enable healthy donor management through prediction of mobilization success.Transplantation and Cellular Therapy, 32:S3, 2026

    Asif Adil and Stephanie Hurwitz. Deep learning models enable healthy donor management through prediction of mobilization success.Transplantation and Cellular Therapy, 32:S3, 2026. doi: 10.1016/j.jtct.2026.02.016. URLhttps://doi.org/10.1016/j.jtct.2026.02.016

  80. [80]

    Differentiation between psychotic and non-psychotic major depression by the tabular prior-data fitted network.Journal of Affective Disorders, 403:121454, 2026

    Hongxin Zheng, Wenxin Gan, Yizi Liu, Shuyu Duan, Kun Li, Gongping Li, Yanqiu Xue, and Yu Xie. Differentiation between psychotic and non-psychotic major depression by the tabular prior-data fitted network.Journal of Affective Disorders, 403:121454, 2026. doi: 10.1016/j.jad.2026.121454. URLhttps://doi.org/10.1016/j.jad.2026.121454

Showing first 80 references.