TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

Cormac Cureton; Narges Armanfard

arxiv: 2605.20234 · v1 · pith:6ZYQSJWEnew · submitted 2026-05-16 · 💻 cs.LG · cs.AI

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

Cormac Cureton , Narges Armanfard This is my paper

Pith reviewed 2026-05-21 07:36 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords tabular datamultitask learningin-context learningsynthetic priorTabPFNsmall datasetsmulti-target predictiondeep learning

0 comments

The pith

TabPFN-MT performs multitask in-context learning on tabular data by training on a multi-target synthetic prior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops TabPFN-MT to handle multiple prediction targets simultaneously in tabular datasets using in-context learning. Previous PFNs required separate runs for each target, missing opportunities to use information shared across tasks. By expanding the synthetic prior to include multiple targets and adding an expanded y-encoder with a shared decoder, the new model learns inter-task dependencies during training. This leads to better performance on small datasets averaging under 1,000 samples across hundreds of datasets, while making inference much faster by needing only one pass for any number of tasks.

Core claim

TabPFN-MT is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. This model uses an expanded y-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference. Within this regime of averaging fewer than 1,000 samples, extensive evaluations across 344 datasets demonstrate that TabPFN-MT establishes a new state-of-the-art for deep tabular multitask learning and remains highly competitive with the latest state-of-the-art single-task ensembles while reducing the inference cost for T tasks from O(T) to O(1) forward passes.

What carries the argument

Expanded y-encoder and shared decoder head operating on a multi-target synthetic prior, allowing joint modeling of multiple targets in a single in-context forward pass.

Load-bearing premise

The expanded multi-target synthetic prior successfully captures the inter-task dependencies that exist in real-world tabular data.

What would settle it

Running TabPFN-MT and single-task baselines on a collection of real multitask tabular datasets where task correlations are measured independently, and checking whether the multitask model shows larger gains on high-correlation datasets.

Figures

Figures reproduced from arXiv: 2605.20234 by Cormac Cureton, Narges Armanfard.

**Figure 1.** Figure 1: The TabPFN-MT Framework. The model relies on a synthetic multitask prior generated via highly sparse Multi-Layer Perceptrons (MLPs) simulating causal directed acyclic graphs (DAGs). During inference, the architecture leverages an expanded y-encoder and a dynamically sliced decoder head to perform simultaneous multitask in-context learning over a shared Transformer backbone. There have been several works th… view at source ↗

**Figure 2.** Figure 2: TabPFN-MT is in the top clique for Accuracy and is the highest-ranked multitask model. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Computational cost scaling across architectures. Total FLOPs (log scale) from training and [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Performance heatmaps for varying model configurations. The grid search evaluates 9 [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Critical difference diagrams for auxiliary metrics. Consistent with accuracy results, TabPFN-MT significantly outperforms all other evaluated multitask baselines across both F1 and ROC AUC. Notably, for both metrics, our model is not significantly different from the top-ranked baselines, placing it in the highest statistical tier [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗

**Figure 6.** Figure 6: Effect of task scaling and correlation on multitask gain. We report the relative ROC AUC [PITH_FULL_IMAGE:figures/full_fig_p021_6.png] view at source ↗

**Figure 7.** Figure 7: Cost of target scaling across architectures. Total runtime (fit and inference) is evaluated [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗

read the original abstract

Prior-Data Fitted networks (PFNs) have been very successful in tabular contexts, handling prediction tasks in context. However, they are designed for single-task inference, meaning that predicting several target values within a context requires repeated forward calls and precludes inter-task information sharing. We propose TabPFN-MT, which is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. This model uses an expanded $y$-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference. The model is uniquely specialized for small-to-medium datasets by relying on in-context learning rather than traditional gradient-based training. Within this regime (averaging fewer than 1,000 samples), extensive evaluations across 344 datasets demonstrate that TabPFN-MT establishes a new state-of-the-art for deep tabular multitask learning. Furthermore, despite the inherent compute asymmetry of joint optimization, our model remains highly competitive with the latest state-of-the-art single-task ensembles. Notably, on multitask datasets it achieves an overall Accuracy rank of 4.89, the highest average rank among all models tested. Crucially, TabPFN-MT delivers this highly competitive performance while reducing the inference cost for $T$ tasks from $O(T)$ to $O(1)$ forward passes, offering a massive computational efficiency improvement for multi-target tabular applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TabPFN-MT adds a shared decoder and multi-target prior to enable O(1) multitask inference on small tabular data, but the gains rest on an unverified assumption about the synthetic prior.

read the letter

The core advance is training a PFN-style model on a joint multi-target synthetic prior, then using an expanded y-encoder plus one shared decoder head so that T targets can be predicted in a single forward pass instead of T separate ones. That efficiency claim is the practical hook for anyone running many small tabular problems at once. The paper shows the architecture change and reports an average rank of 4.89 across 344 datasets in the multitask setting, staying competitive with strong single-task ensembles while cutting inference cost. Those numbers, if they hold under scrutiny, give a clear operational win for the small-data regime the model targets. The experiments appear to use the same evaluation collections as prior PFN work, which at least keeps the comparison grounded. The main soft spot is that nothing in the reported results checks whether the multi-target prior actually reproduces the label correlations or conditional distributions seen in the real evaluation sets. Without that diagnostic, the observed improvement could come from the architectural tweaks alone rather than from any learned inter-task structure. The abstract also gives no detail on how baselines were re-implemented or whether error bars and statistical tests were applied, so the SOTA claim needs the full experimental section to be convincing. This paper is aimed at practitioners who already use TabPFN or similar in-context methods on tabular data with multiple targets and who care about inference speed. A reader working on industrial multitask tabular pipelines would get immediate value from the efficiency angle. It is coherent enough and addresses a real use case, so it deserves a serious referee who can press on the prior-validation gap and the experimental controls.

Referee Report

2 major / 1 minor

Summary. The paper introduces TabPFN-MT as a multitask extension of Prior-Data Fitted Networks (PFNs) for tabular data. It trains the model on an expanded multi-target synthetic prior, using an expanded y-encoder and shared decoder head to support native in-context multitask learning and simultaneous inference across targets in O(1) forward passes rather than O(T). Evaluations across 344 small-to-medium datasets (average <1000 samples) claim a new state-of-the-art for deep tabular multitask learning, with an overall accuracy rank of 4.89 on the multitask subset, while remaining competitive with single-task ensembles despite the compute asymmetry of joint optimization.

Significance. If the results hold under rigorous verification, the work would offer a meaningful contribution to efficient multitask tabular modeling in low-data regimes by eliminating per-task gradient training and reducing inference cost. The in-context approach and claimed ability to exploit inter-task structure could be practically useful for multi-target applications, provided the performance gains are shown to arise from the multitask prior rather than architecture alone.

major comments (2)

[Abstract] Abstract: The headline claim that TabPFN-MT 'establishes a new state-of-the-art for deep tabular multitask learning' and achieves rank 4.89 rests on the untested assumption that the expanded multi-target synthetic prior successfully induces the model to exploit real inter-task dependencies. No diagnostic is reported (e.g., comparison of synthetic vs. empirical target correlations, label co-occurrence statistics, or conditional distributions on the 344 evaluation collections), so it remains possible that observed gains derive entirely from the architectural changes (expanded y-encoder + shared decoder head) rather than from prior-data fitting of multitask structure.
[Abstract] Abstract and evaluation description: The reported SOTA results and rank of 4.89 supply no information on baseline implementations, statistical testing, dataset selection criteria, or error bars. Without these details the reliability of the cross-model comparison cannot be assessed and the central empirical claim cannot be verified.

minor comments (1)

[Method] Provide a clear, self-contained definition of the multi-target synthetic prior (including how targets are jointly sampled) and contrast it explicitly with the single-task prior from prior TabPFN work.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and describe the revisions we intend to incorporate.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claim that TabPFN-MT 'establishes a new state-of-the-art for deep tabular multitask learning' and achieves rank 4.89 rests on the untested assumption that the expanded multi-target synthetic prior successfully induces the model to exploit real inter-task dependencies. No diagnostic is reported (e.g., comparison of synthetic vs. empirical target correlations, label co-occurrence statistics, or conditional distributions on the 344 evaluation collections), so it remains possible that observed gains derive entirely from the architectural changes (expanded y-encoder + shared decoder head) rather than from prior-data fitting of multitask structure.

Authors: We agree that direct diagnostics would strengthen the claim that performance gains arise from the multitask prior rather than architecture alone. The synthetic prior is explicitly constructed with correlated multi-target samples to induce inter-task structure, and the observed improvements on multitask datasets are consistent with this design. To address the concern rigorously, we will add an appendix containing (i) a comparison of pairwise target correlations between the synthetic prior and the 344 evaluation collections and (ii) an ablation isolating the multitask prior from the expanded y-encoder and shared decoder. revision: yes
Referee: [Abstract] Abstract and evaluation description: The reported SOTA results and rank of 4.89 supply no information on baseline implementations, statistical testing, dataset selection criteria, or error bars. Without these details the reliability of the cross-model comparison cannot be assessed and the central empirical claim cannot be verified.

Authors: All requested details are already present in the manuscript: baseline implementations and hyper-parameters are described in Section 4.1, dataset selection criteria (small-to-medium tabular datasets with average size <1000 samples) appear in Section 4.2, and statistical testing with error bars and rank aggregation are reported in Section 4.3 and the supplementary material. The rank 4.89 is the mean rank over the multitask subset. For improved accessibility we will add a concise summary of the evaluation protocol and a pointer to these sections directly in the abstract. revision: partial

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on held-out data

full rationale

The paper's central claims consist of measured performance ranks (e.g., Accuracy rank of 4.89) and efficiency gains obtained by running the trained model on 344 external tabular datasets. These quantities are not algebraically reduced to any fitted parameter, synthetic prior statistic, or self-referential definition inside the training loop. The multi-target synthetic prior is an input to training; the reported SOTA numbers are outputs of separate evaluation and do not collapse back to that prior by construction. No equations, uniqueness theorems, or self-citations are invoked to force the headline result. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central performance claims rest on the unverified premise that a synthetic multi-target prior induces useful inter-task representations; no free parameters or invented entities are described in the abstract.

axioms (1)

domain assumption Training on an expanded multi-target synthetic prior captures inter-task dependencies present in real tabular data
This premise is required for the multitask in-context learning benefit to transfer from synthetic training to the 344 evaluation datasets.

pith-pipeline@v0.9.0 · 5782 in / 1320 out tokens · 35850 ms · 2026-05-21T07:36:56.303462+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose TabPFN-MT, which is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. This model uses an expanded y-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Datasets are generated by sampling a random directed acyclic graph (DAG) representing an SCM, propagating noise variables through non-linear deterministic functions

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 3 internal anchors

[1]

Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025

Hollmann, Noah and Müller, Samuel and Purucker, Lennart and Krishnakumar, Arjun and Körfer, Max and Hoo, Shi Bin and Schirrmeister, Robin Tibor and Hutter, Frank , month = jan, year =. Accurate predictions on small data with a tabular foundation model , volume =. Nature , publisher =. doi:10.1038/s41586-024-08328-6 , abstract =

work page doi:10.1038/s41586-024-08328-6
[2]

Hollmann, Noah and Müller, Samuel and Eggensperger, Katharina and Hutter, Frank , year =. The

work page
[3]

Chen and C

Chen, Tianqi and Guestrin, Carlos , year =. Proceedings of the 22nd. doi:10.1145/2939672.2939785 , abstract =

work page doi:10.1145/2939672.2939785
[4]

Revisiting

Gorishniy, Yury and Rubachev, Ivan and Khrulkov, Valentin and Babenko, Artem , year =. Revisiting. Advances in

work page
[5]

SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training.arXiv preprint arXiv:2106.01342,

Somepalli, Gowthami and Goldblum, Micah and Schwarzschild, Avi and Bruss, C. Bayan and Goldstein, Tom , month = jun, year =. doi:10.48550/arXiv.2106.01342 , abstract =

work page doi:10.48550/arxiv.2106.01342
[6]

Advances in

Prokhorenkova, Liudmila and Gusev, Gleb and Vorobev, Aleksandr and Dorogush, Anna Veronika and Gulin, Andrey , year =. Advances in

work page
[7]

2024 , note =

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2024 , note =. doi:10.1609/aaai.v38i8.28749 , abstract =

work page doi:10.1609/aaai.v38i8.28749 2024
[8]

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , series =

Ma, Jiaqi and Zhao, Zhe and Yi, Xinyang and Chen, Jilin and Hong, Lichan and Chi, Ed H. , year =. Modeling. Proceedings of the 24th. doi:10.1145/3219819.3220007 , abstract =

work page doi:10.1145/3219819.3220007
[9]

Progressive

Tang, Hongyan and Liu, Junning and Zhao, Ming and Gong, Xudong , year =. Progressive. Proceedings of the 14th. doi:10.1145/3383313.3412236 , abstract =

work page doi:10.1145/3383313.3412236
[10]

Transformers

Müller, Samuel and Hollmann, Noah and Arango, Sebastian Pineda and Grabocka, Josif and Hutter, Frank , year =. Transformers. International

work page
[11]

Advances in

Feuer, Benjamin and Schirrmeister, Robin Tibor and Cherepanova, Valeriia and Hegde, Chinmay and Hutter, Frank and Goldblum, Micah and Cohen, Niv and White, Colin , editor =. Advances in. 2024 , pages =

work page 2024
[12]

Ye, Han-Jia and Liu, Si-Yang and Chao, Wei-Lun , month = feb, year =. A. doi:10.48550/arXiv.2502.17361 , abstract =

work page doi:10.48550/arxiv.2502.17361
[13]

doi:10.48550/arXiv.2502.06684 , abstract =

Arbel, Michael and Salinas, David and Hutter, Frank , month = feb, year =. doi:10.48550/arXiv.2502.06684 , abstract =

work page doi:10.48550/arxiv.2502.06684
[14]

Interpretable

Rundel, David and Kobialka, Julius and von Crailsheim, Constantin and Feurer, Matthias and Nagler, Thomas and Rügamer, David , editor =. Interpretable. Explainable. 2024 , keywords =. doi:10.1007/978-3-031-63797-1_23 , abstract =

work page doi:10.1007/978-3-031-63797-1_23 2024
[15]

Tokenize features, enhancing tables: the

Liu, Quangao and Yang, Wei and Liang, Chen and Pang, Longlong and Zou, Zhuozhang , month = jun, year =. Tokenize features, enhancing tables: the. doi:10.48550/arXiv.2406.06891 , abstract =

work page doi:10.48550/arxiv.2406.06891
[16]

International

Rubachev, Ivan and Kartashev, Nikolay and Gorishniy, Yury and Babenko, Artem , month = oct, year =. International

work page
[17]

Helli, Kai and Schnurr, David and Hollmann, Noah and Müller, Samuel and Hutter, Frank , month = nov, year =. Drift-. Advances in

work page
[18]

Advances in

Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , year =. Advances in

work page
[19]

Why do tree-based models still outperform deep learning on typical tabular data? , url =

Grinsztajn, Leo and Oyallon, Edouard and Varoquaux, Gael , month = jun, year =. Why do tree-based models still outperform deep learning on typical tabular data? , url =. Thirty-sixth

work page
[20]

Everything

Xiong, Zheyang and Cai, Ziyang and Cooper, John and Ge, Albert and Papageorgiou, Vasilis and Sifakis, Zack and Giannou, Angeliki and Lin, Ziqian and Yang, Liu and Agarwal, Saurabh and Chrysos, Grigorios and Oymak, Samet and Lee, Kangwook and Papailiopoulos, Dimitris , month = jun, year =. Everything. Forty-second

work page
[21]

When do neural nets outperform boosted trees on tabular data? , abstract =

McElfresh, Duncan and Khandagale, Sujay and Valverde, Jonathan and C., Vishak Prasad and Ramakrishnan, Ganesh and Goldblum, Micah and White, Colin , month = dec, year =. When do neural nets outperform boosted trees on tabular data? , abstract =. Proceedings of the 37th

work page
[22]

, month = jun, year =

Zeng, Yuchen and Dinh, Tuan and Kang, Wonjun and Mueller, Andreas C. , month = jun, year =. Forty-second

work page
[23]

TabNet: Attentive Interpretable Tabular Learning.,

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2021 , pages =. doi:10.1609/aaai.v35i8.16826 , number =

work page doi:10.1609/aaai.v35i8.16826 2021
[24]

and Golestan, Keyvan and Yu, Guangwei and Caterini, Anthony L

Ma, Junwei and Thomas, Valentin and Hosseinzadeh, Rasa and Labach, Alex and Cresswell, Jesse C. and Golestan, Keyvan and Yu, Guangwei and Caterini, Anthony L. and Volkovs, Maksims , month = oct, year =. The

work page
[25]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J. , month = sep, year =. Exploring the. doi:10.48550/arXiv.1910.10683 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1910.10683 1910
[26]

Language

Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...

work page
[27]

2024 , keywords =

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2024 , keywords =. doi:10.1609/aaai.v38i10.28988 , abstract =

work page doi:10.1609/aaai.v38i10.28988 2024
[28]

The Twelfth International Conference on Learning Representations , url =

Gorishniy, Yury and Rubachev, Ivan and Kartashev, Nikolay and Shlenskii, Daniil and Kotelnikov, Akim and Babenko, Artem , month = oct, year =. The Twelfth International Conference on Learning Representations , url =

work page
[29]

Better by default:

Holzmüller, David and Grinsztajn, Leo and Steinwart, Ingo , month = nov, year =. Better by default:. The Thirty-eighth Annual Conference on Neural Information Processing Systems , url =

work page
[30]

CoRR , year=

Adam: A Method for Stochastic Optimization , author=. CoRR , year=

work page
[31]

John Kim, Parviz Moin, and Robert Moser

Cipolla, Roberto and Gal, Yarin and Kendall, Alex , month = jun, year =. Multi-task. 2018. doi:10.1109/CVPR.2018.00781 , abstract =

work page doi:10.1109/cvpr.2018.00781 2018
[32]

Proceedings of the 35th

Chen, Zhao and Badrinarayanan, Vijay and Lee, Chen-Yu and Rabinovich, Andrew , month = jul, year =. Proceedings of the 35th

work page
[33]

Gradient

Yu, Tianhe and Kumar, Saurabh and Gupta, Abhishek and Levine, Sergey and Hausman, Karol and Finn, Chelsea , year =. Gradient. Advances in

work page
[34]

Multitask-

Sinodinos, Dimitrios and Nikpour, Bahareh and Wei, Jack Yi and Sinha, Sushant and Ma, Xiaoping and Rehman, Kashif and Yue, Stephen and Armanfard, Narges , month = mar, year =. Multitask-. doi:10.48550/arXiv.2603.22738 , abstract =

work page doi:10.48550/arxiv.2603.22738
[35]

Retiring

Ding, Frances and Hardt, Moritz and Miller, John and Schmidt, Ludwig , year =. Retiring. Advances in

work page
[36]

Improving

Li, Pengcheng and Li, Runze and Da, Qing and Zeng, An-Xiang and Zhang, Lijun , month = oct, year =. Improving. Proceedings of the 29th. doi:10.1145/3340531.3412713 , abstract =

work page doi:10.1145/3340531.3412713
[37]

Modeling the

Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu , month = aug, year =. Modeling the. Proceedings of the 27th. doi:10.1145/3447548.3467071 , abstract =

work page doi:10.1145/3447548.3467071
[38]

Deep Learning , author=

work page
[39]

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Huang, Xin and Khetan, Ashish and Cvitkovic, Milan and Karnin, Zohar , month = dec, year =. doi:10.48550/arXiv.2012.06678 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2012.06678 2012
[40]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Grinsztajn, Léo and Flöge, Klemens and Key, Oscar and Birkel, Felix and Jund, Philipp and Roof, Brendan and Jäger, Benjamin and Safaric, Dominik and Alessi, Simone and Hayler, Adrian and Manium, Mihir and Yu, Rosen and Jablonski, Felix and Hoo, Shi Bin and Garg, Anurag and Robertson, Jake and Bühler, Magnus and Moroshan, Vladyslav and Purucker, Lennart an...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2511.08667
[41]

Gadetsky, Artyom and Atanov, Andrei and Jiang, Yulun and Gao, Zhitong and Mighan, Ghazal Hosseini and Zamir, Amir and Brbic, Maria , month = apr, year =. Large (. doi:10.48550/arXiv.2504.02349 , abstract =

work page doi:10.48550/arxiv.2504.02349
[42]

Machine Learning 45(1), 5–32 (Oct 2001)

Random. Machine Learning , author =. 2001 , keywords =. doi:10.1023/A:1010933404324 , abstract =

work page doi:10.1023/a:1010933404324 2001
[43]

Machine Learning , author =

A. Machine Learning , author =. 2001 , keywords =. doi:10.1023/A:1010920819831 , abstract =

work page doi:10.1023/a:1010920819831 2001
[44]

Statistical. J. Mach. Learn. Res. , author =. 2006 , pages =

work page 2006
[45]

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year=

Attentive Single-Tasking of Multiple Tasks , author=. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year=

work page 2019
[46]

Information Fusion , author =

Tabular data:. Information Fusion , author =. 2022 , keywords =. doi:10.1016/j.inffus.2021.11.011 , abstract =

work page doi:10.1016/j.inffus.2021.11.011 2022
[47]

Proceedings of the

Sinodinos, Dimitrios and Wei, Jack Yi and Armanfard, Narges , year =. Proceedings of the

work page

[1] [1]

Accurate predictions on small data with a tab- ular foundation model.Nature, 637(8045):319–326, 2025

Hollmann, Noah and Müller, Samuel and Purucker, Lennart and Krishnakumar, Arjun and Körfer, Max and Hoo, Shi Bin and Schirrmeister, Robin Tibor and Hutter, Frank , month = jan, year =. Accurate predictions on small data with a tabular foundation model , volume =. Nature , publisher =. doi:10.1038/s41586-024-08328-6 , abstract =

work page doi:10.1038/s41586-024-08328-6

[2] [2]

Hollmann, Noah and Müller, Samuel and Eggensperger, Katharina and Hutter, Frank , year =. The

work page

[3] [3]

Chen and C

Chen, Tianqi and Guestrin, Carlos , year =. Proceedings of the 22nd. doi:10.1145/2939672.2939785 , abstract =

work page doi:10.1145/2939672.2939785

[4] [4]

Revisiting

Gorishniy, Yury and Rubachev, Ivan and Khrulkov, Valentin and Babenko, Artem , year =. Revisiting. Advances in

work page

[5] [5]

SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training.arXiv preprint arXiv:2106.01342,

Somepalli, Gowthami and Goldblum, Micah and Schwarzschild, Avi and Bruss, C. Bayan and Goldstein, Tom , month = jun, year =. doi:10.48550/arXiv.2106.01342 , abstract =

work page doi:10.48550/arxiv.2106.01342

[6] [6]

Advances in

Prokhorenkova, Liudmila and Gusev, Gleb and Vorobev, Aleksandr and Dorogush, Anna Veronika and Gulin, Andrey , year =. Advances in

work page

[7] [7]

2024 , note =

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2024 , note =. doi:10.1609/aaai.v38i8.28749 , abstract =

work page doi:10.1609/aaai.v38i8.28749 2024

[8] [8]

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , series =

Ma, Jiaqi and Zhao, Zhe and Yi, Xinyang and Chen, Jilin and Hong, Lichan and Chi, Ed H. , year =. Modeling. Proceedings of the 24th. doi:10.1145/3219819.3220007 , abstract =

work page doi:10.1145/3219819.3220007

[9] [9]

Progressive

Tang, Hongyan and Liu, Junning and Zhao, Ming and Gong, Xudong , year =. Progressive. Proceedings of the 14th. doi:10.1145/3383313.3412236 , abstract =

work page doi:10.1145/3383313.3412236

[10] [10]

Transformers

Müller, Samuel and Hollmann, Noah and Arango, Sebastian Pineda and Grabocka, Josif and Hutter, Frank , year =. Transformers. International

work page

[11] [11]

Advances in

Feuer, Benjamin and Schirrmeister, Robin Tibor and Cherepanova, Valeriia and Hegde, Chinmay and Hutter, Frank and Goldblum, Micah and Cohen, Niv and White, Colin , editor =. Advances in. 2024 , pages =

work page 2024

[12] [12]

Ye, Han-Jia and Liu, Si-Yang and Chao, Wei-Lun , month = feb, year =. A. doi:10.48550/arXiv.2502.17361 , abstract =

work page doi:10.48550/arxiv.2502.17361

[13] [13]

doi:10.48550/arXiv.2502.06684 , abstract =

Arbel, Michael and Salinas, David and Hutter, Frank , month = feb, year =. doi:10.48550/arXiv.2502.06684 , abstract =

work page doi:10.48550/arxiv.2502.06684

[14] [14]

Interpretable

Rundel, David and Kobialka, Julius and von Crailsheim, Constantin and Feurer, Matthias and Nagler, Thomas and Rügamer, David , editor =. Interpretable. Explainable. 2024 , keywords =. doi:10.1007/978-3-031-63797-1_23 , abstract =

work page doi:10.1007/978-3-031-63797-1_23 2024

[15] [15]

Tokenize features, enhancing tables: the

Liu, Quangao and Yang, Wei and Liang, Chen and Pang, Longlong and Zou, Zhuozhang , month = jun, year =. Tokenize features, enhancing tables: the. doi:10.48550/arXiv.2406.06891 , abstract =

work page doi:10.48550/arxiv.2406.06891

[16] [16]

International

Rubachev, Ivan and Kartashev, Nikolay and Gorishniy, Yury and Babenko, Artem , month = oct, year =. International

work page

[17] [17]

Helli, Kai and Schnurr, David and Hollmann, Noah and Müller, Samuel and Hutter, Frank , month = nov, year =. Drift-. Advances in

work page

[18] [18]

Advances in

Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan , year =. Advances in

work page

[19] [19]

Why do tree-based models still outperform deep learning on typical tabular data? , url =

Grinsztajn, Leo and Oyallon, Edouard and Varoquaux, Gael , month = jun, year =. Why do tree-based models still outperform deep learning on typical tabular data? , url =. Thirty-sixth

work page

[20] [20]

Everything

Xiong, Zheyang and Cai, Ziyang and Cooper, John and Ge, Albert and Papageorgiou, Vasilis and Sifakis, Zack and Giannou, Angeliki and Lin, Ziqian and Yang, Liu and Agarwal, Saurabh and Chrysos, Grigorios and Oymak, Samet and Lee, Kangwook and Papailiopoulos, Dimitris , month = jun, year =. Everything. Forty-second

work page

[21] [21]

When do neural nets outperform boosted trees on tabular data? , abstract =

McElfresh, Duncan and Khandagale, Sujay and Valverde, Jonathan and C., Vishak Prasad and Ramakrishnan, Ganesh and Goldblum, Micah and White, Colin , month = dec, year =. When do neural nets outperform boosted trees on tabular data? , abstract =. Proceedings of the 37th

work page

[22] [22]

, month = jun, year =

Zeng, Yuchen and Dinh, Tuan and Kang, Wonjun and Mueller, Andreas C. , month = jun, year =. Forty-second

work page

[23] [23]

TabNet: Attentive Interpretable Tabular Learning.,

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2021 , pages =. doi:10.1609/aaai.v35i8.16826 , number =

work page doi:10.1609/aaai.v35i8.16826 2021

[24] [24]

and Golestan, Keyvan and Yu, Guangwei and Caterini, Anthony L

Ma, Junwei and Thomas, Valentin and Hosseinzadeh, Rasa and Labach, Alex and Cresswell, Jesse C. and Golestan, Keyvan and Yu, Guangwei and Caterini, Anthony L. and Volkovs, Maksims , month = oct, year =. The

work page

[25] [25]

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J. , month = sep, year =. Exploring the. doi:10.48550/arXiv.1910.10683 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1910.10683 1910

[26] [26]

Language

Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winte...

work page

[27] [27]

2024 , keywords =

Proceedings of the AAAI Conference on Artificial Intelligence , author =. 2024 , keywords =. doi:10.1609/aaai.v38i10.28988 , abstract =

work page doi:10.1609/aaai.v38i10.28988 2024

[28] [28]

The Twelfth International Conference on Learning Representations , url =

Gorishniy, Yury and Rubachev, Ivan and Kartashev, Nikolay and Shlenskii, Daniil and Kotelnikov, Akim and Babenko, Artem , month = oct, year =. The Twelfth International Conference on Learning Representations , url =

work page

[29] [29]

Better by default:

Holzmüller, David and Grinsztajn, Leo and Steinwart, Ingo , month = nov, year =. Better by default:. The Thirty-eighth Annual Conference on Neural Information Processing Systems , url =

work page

[30] [30]

CoRR , year=

Adam: A Method for Stochastic Optimization , author=. CoRR , year=

work page

[31] [31]

John Kim, Parviz Moin, and Robert Moser

Cipolla, Roberto and Gal, Yarin and Kendall, Alex , month = jun, year =. Multi-task. 2018. doi:10.1109/CVPR.2018.00781 , abstract =

work page doi:10.1109/cvpr.2018.00781 2018

[32] [32]

Proceedings of the 35th

Chen, Zhao and Badrinarayanan, Vijay and Lee, Chen-Yu and Rabinovich, Andrew , month = jul, year =. Proceedings of the 35th

work page

[33] [33]

Gradient

Yu, Tianhe and Kumar, Saurabh and Gupta, Abhishek and Levine, Sergey and Hausman, Karol and Finn, Chelsea , year =. Gradient. Advances in

work page

[34] [34]

Multitask-

Sinodinos, Dimitrios and Nikpour, Bahareh and Wei, Jack Yi and Sinha, Sushant and Ma, Xiaoping and Rehman, Kashif and Yue, Stephen and Armanfard, Narges , month = mar, year =. Multitask-. doi:10.48550/arXiv.2603.22738 , abstract =

work page doi:10.48550/arxiv.2603.22738

[35] [35]

Retiring

Ding, Frances and Hardt, Moritz and Miller, John and Schmidt, Ludwig , year =. Retiring. Advances in

work page

[36] [36]

Improving

Li, Pengcheng and Li, Runze and Da, Qing and Zeng, An-Xiang and Zhang, Lijun , month = oct, year =. Improving. Proceedings of the 29th. doi:10.1145/3340531.3412713 , abstract =

work page doi:10.1145/3340531.3412713

[37] [37]

Modeling the

Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu , month = aug, year =. Modeling the. Proceedings of the 27th. doi:10.1145/3447548.3467071 , abstract =

work page doi:10.1145/3447548.3467071

[38] [38]

Deep Learning , author=

work page

[39] [39]

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Huang, Xin and Khetan, Ashish and Cvitkovic, Milan and Karnin, Zohar , month = dec, year =. doi:10.48550/arXiv.2012.06678 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2012.06678 2012

[40] [40]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Grinsztajn, Léo and Flöge, Klemens and Key, Oscar and Birkel, Felix and Jund, Philipp and Roof, Brendan and Jäger, Benjamin and Safaric, Dominik and Alessi, Simone and Hayler, Adrian and Manium, Mihir and Yu, Rosen and Jablonski, Felix and Hoo, Shi Bin and Garg, Anurag and Robertson, Jake and Bühler, Magnus and Moroshan, Vladyslav and Purucker, Lennart an...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2511.08667

[41] [41]

Gadetsky, Artyom and Atanov, Andrei and Jiang, Yulun and Gao, Zhitong and Mighan, Ghazal Hosseini and Zamir, Amir and Brbic, Maria , month = apr, year =. Large (. doi:10.48550/arXiv.2504.02349 , abstract =

work page doi:10.48550/arxiv.2504.02349

[42] [42]

Machine Learning 45(1), 5–32 (Oct 2001)

Random. Machine Learning , author =. 2001 , keywords =. doi:10.1023/A:1010933404324 , abstract =

work page doi:10.1023/a:1010933404324 2001

[43] [43]

Machine Learning , author =

A. Machine Learning , author =. 2001 , keywords =. doi:10.1023/A:1010920819831 , abstract =

work page doi:10.1023/a:1010920819831 2001

[44] [44]

Statistical. J. Mach. Learn. Res. , author =. 2006 , pages =

work page 2006

[45] [45]

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year=

Attentive Single-Tasking of Multiple Tasks , author=. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year=

work page 2019

[46] [46]

Information Fusion , author =

Tabular data:. Information Fusion , author =. 2022 , keywords =. doi:10.1016/j.inffus.2021.11.011 , abstract =

work page doi:10.1016/j.inffus.2021.11.011 2022

[47] [47]

Proceedings of the

Sinodinos, Dimitrios and Wei, Jack Yi and Armanfard, Narges , year =. Proceedings of the

work page