
arxiv: 2605.20234 · v1 · pith:6ZYQSJWEnew · submitted 2026-05-16 · 💻 cs.LG · cs.AI

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

Pith reviewed 2026-05-21 07:36 UTC · model grok-4.3

keywords tabular data · multitask learning · in-context learning · synthetic prior · TabPFN · small datasets · multi-target prediction · deep learning

The pith

TabPFN-MT performs multitask in-context learning on tabular data by training on a multi-target synthetic prior.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops TabPFN-MT, which predicts multiple targets of a tabular dataset simultaneously through in-context learning. Earlier PFNs required a separate forward pass for each target and so could not use information shared across tasks. By expanding the synthetic prior to include multiple targets and adding an expanded y-encoder with a shared decoder head, the new model learns inter-task dependencies during training. The paper reports improved multitask performance across 344 datasets averaging fewer than 1,000 samples, and inference needs only one forward pass regardless of the number of tasks.

Core claim

TabPFN-MT is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. The model uses an expanded y-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference. In the small-data regime (datasets averaging fewer than 1,000 samples), evaluations across 344 datasets indicate that TabPFN-MT establishes a new state of the art for deep tabular multitask learning and remains highly competitive with the latest single-task ensembles, while reducing the inference cost for T tasks from O(T) to O(1) forward passes.
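The O(T) versus O(1) claim can be made concrete with a small counting sketch. The `CountingModel` class and both `predict_*` helpers below are hypothetical stand-ins, not the paper's code: the "model" only returns per-target means of the context labels, so the example shows how many forward passes each strategy issues and nothing about accuracy.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


class CountingModel:
    """Hypothetical stand-in for an in-context learner that counts forward passes."""

    def __init__(self) -> None:
        self.forward_passes = 0

    def forward(self, context_x: Matrix, context_y: Matrix, query_x: Matrix) -> List[List[float]]:
        # context_y holds one label vector per context row; its width is the
        # number of targets handled in this single pass.
        self.forward_passes += 1
        n_targets = len(context_y[0])
        # Dummy prediction: the per-target mean of the context labels.
        means = [sum(row[t] for row in context_y) / len(context_y) for t in range(n_targets)]
        return [list(means) for _ in query_x]


def predict_single_task(model: CountingModel, context_x: Matrix, context_y: Matrix,
                        query_x: Matrix) -> List[List[float]]:
    """Single-task usage: one forward pass per target, so T passes in total."""
    n_targets = len(context_y[0])
    per_target = []
    for t in range(n_targets):
        y_t = [[row[t]] for row in context_y]  # keep only target t
        preds = model.forward(context_x, y_t, query_x)
        per_target.append([p[0] for p in preds])
    # Transpose from one list per target to one row per query.
    return [list(col) for col in zip(*per_target)]


def predict_multitask(model: CountingModel, context_x: Matrix, context_y: Matrix,
                      query_x: Matrix) -> List[List[float]]:
    """Multitask usage: all targets in one forward pass."""
    return model.forward(context_x, context_y, query_x)
```

With three targets, the single-task path increments `forward_passes` three times and the multitask path once, while both return one prediction row per query.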

What carries the argument

An expanded y-encoder and a shared decoder head, trained on a multi-target synthetic prior, allow joint modeling of multiple targets in a single in-context forward pass.
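As a rough shape-level illustration of these two components, the sketch below pads a label vector to a fixed maximum number of targets before encoding it, and slices a shared decoder output down to the targets actually present (the Figure 1 caption speaks of a "dynamically sliced decoder head"). Everything here is an assumption made for illustration: the constants `T_MAX`, `D_MODEL` and `N_CLASSES`, the zero-padding scheme, the linear layers with random weights, and the omission of the Transformer backbone. It is not the paper's architecture.

```python
import random
from typing import List, Sequence

T_MAX = 4      # hypothetical maximum number of targets supported
D_MODEL = 8    # hypothetical embedding width
N_CLASSES = 3  # hypothetical number of classes per target


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Random weight matrix with `rows` output units and `cols` input units."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix: Sequence[Sequence[float]], vec: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class MultiTargetHead:
    """Toy y-encoder plus shared decoder; the backbone between them is omitted."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_enc = make_matrix(D_MODEL, T_MAX, rng)              # T_MAX -> D_MODEL
        self.w_dec = make_matrix(T_MAX * N_CLASSES, D_MODEL, rng)  # D_MODEL -> T_MAX * N_CLASSES

    def encode_y(self, y: Sequence[float]) -> List[float]:
        """Embed a label vector with up to T_MAX targets (zero-padded)."""
        if len(y) > T_MAX:
            raise ValueError("more targets than T_MAX")
        padded = list(y) + [0.0] * (T_MAX - len(y))
        return matvec(self.w_enc, padded)

    def decode(self, hidden: Sequence[float], n_targets: int) -> List[List[float]]:
        """Produce logits for all T_MAX slots, then keep the first n_targets."""
        logits = matvec(self.w_dec, hidden)
        per_target = [logits[t * N_CLASSES:(t + 1) * N_CLASSES] for t in range(T_MAX)]
        return per_target[:n_targets]
```

For a two-target row, `decode(encode_y([1.0, 0.0]), 2)` yields two logit vectors of length `N_CLASSES` from one shared set of decoder weights, which is the property that lets the number of tasks vary without adding task-specific heads.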

Load-bearing premise

The expanded multi-target synthetic prior successfully captures the inter-task dependencies that exist in real-world tabular data.
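To make the premise more tangible, here is a minimal sketch of how a multi-target synthetic dataset could be drawn. It follows only the one-line description in the Figure 1 caption (sparse MLPs simulating causal DAGs); the function name, the layer sizes, the edge keep-probability, the tanh nonlinearity, the noise scale, and the rule of picking features and targets uniformly from the graph nodes are all assumptions, not the paper's prior.

```python
import math
import random
from typing import List, Tuple


def sample_multitarget_dataset(
    n_rows: int,
    n_features: int,
    n_targets: int,
    width: int = 6,
    depth: int = 3,
    keep_prob: float = 0.3,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Draw (X, Y) from a random sparse MLP; Y has one column per target."""
    rng = random.Random(seed)
    # Sparse weights: an edge exists only where the weight is nonzero, so the
    # layered network acts as a directed acyclic graph over its units.
    layers = [
        [[rng.gauss(0.0, 1.0) if rng.random() < keep_prob else 0.0
          for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]
    n_nodes = width * (depth + 1)
    if n_features + n_targets > n_nodes:
        raise ValueError("not enough graph nodes for the requested columns")
    # Features and targets are distinct nodes of the same graph, so targets
    # that share ancestors are statistically dependent.
    chosen = rng.sample(range(n_nodes), n_features + n_targets)
    feat_idx, targ_idx = chosen[:n_features], chosen[n_features:]

    X, Y = [], []
    for _ in range(n_rows):
        h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # root causes
        nodes = list(h)
        for w in layers:
            h = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + rng.gauss(0.0, 0.1))
                 for row in w]
            nodes.extend(h)
        X.append([nodes[i] for i in feat_idx])
        Y.append([nodes[i] for i in targ_idx])
    return X, Y
```

Because every target is read off the same sampled graph, a single draw yields jointly distributed targets rather than independent ones. Whether dependencies generated this way resemble those in real tables is exactly what the premise leaves open.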

What would settle it

Run TabPFN-MT and single-task baselines on a collection of real multitask tabular datasets whose task correlations are measured independently, and check whether the multitask model's gains over the baselines grow with task correlation.
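One way to run such a check, sketched under assumptions: for each dataset, record an independently measured task correlation together with the multitask and single-task scores, convert the scores to a relative gain, and test whether gain and correlation move together. The helper names and the choice of Pearson correlation are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def relative_gain(multitask_score: float, single_task_score: float) -> float:
    """Relative improvement of the multitask model over the single-task baseline."""
    return (multitask_score - single_task_score) / single_task_score


def gain_vs_correlation(records: List[Tuple[float, float, float]]) -> float:
    """records: (task_correlation, multitask_score, single_task_score) per dataset."""
    correlations = [r[0] for r in records]
    gains = [relative_gain(r[1], r[2]) for r in records]
    return pearson(correlations, gains)
```

A clearly positive value over many real datasets would support the premise; a value near zero would suggest the gains come from somewhere other than exploiting task correlation.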

Figures

Figures reproduced from arXiv: 2605.20234 by Cormac Cureton, Narges Armanfard.

Figure 1: The TabPFN-MT Framework. The model relies on a synthetic multitask prior generated via highly sparse Multi-Layer Perceptrons (MLPs) simulating causal directed acyclic graphs (DAGs). During inference, the architecture leverages an expanded y-encoder and a dynamically sliced decoder head to perform simultaneous multitask in-context learning over a shared Transformer backbone.
Figure 2: TabPFN-MT is in the top clique for Accuracy and is the highest-ranked multitask model.
Figure 3: Computational cost scaling across architectures. Total FLOPs (log scale) from training and …
Figure 4: Performance heatmaps for varying model configurations. The grid search evaluates 9 …
Figure 5: Critical difference diagrams for auxiliary metrics. Consistent with accuracy results, TabPFN-MT significantly outperforms all other evaluated multitask baselines across both F1 and ROC AUC. Notably, for both metrics, our model is not significantly different from the top-ranked baselines, placing it in the highest statistical tier.
Figure 6: Effect of task scaling and correlation on multitask gain. We report the relative ROC AUC …
Figure 7: Cost of target scaling across architectures. Total runtime (fit and inference) is evaluated …
Original abstract

Prior-Data Fitted Networks (PFNs) have been very successful in tabular contexts, handling prediction tasks in context. However, they are designed for single-task inference, meaning that predicting several target values within a context requires repeated forward calls and precludes inter-task information sharing. We propose TabPFN-MT, which is trained on an expanded multi-target synthetic prior to capture inter-task dependencies in context. This model uses an expanded $y$-encoder and a shared decoder head to enable multitask in-context learning and simultaneous inference. The model is uniquely specialized for small-to-medium datasets by relying on in-context learning rather than traditional gradient-based training. Within this regime (averaging fewer than 1,000 samples), extensive evaluations across 344 datasets demonstrate that TabPFN-MT establishes a new state-of-the-art for deep tabular multitask learning. Furthermore, despite the inherent compute asymmetry of joint optimization, our model remains highly competitive with the latest state-of-the-art single-task ensembles. Notably, on multitask datasets it achieves an overall Accuracy rank of 4.89, the highest average rank among all models tested. Crucially, TabPFN-MT delivers this highly competitive performance while reducing the inference cost for $T$ tasks from $O(T)$ to $O(1)$ forward passes, offering a massive computational efficiency improvement for multi-target tabular applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces TabPFN-MT as a multitask extension of Prior-Data Fitted Networks (PFNs) for tabular data. It trains the model on an expanded multi-target synthetic prior, using an expanded y-encoder and shared decoder head to support native in-context multitask learning and simultaneous inference across targets in O(1) forward passes rather than O(T). Evaluations across 344 small-to-medium datasets (average <1000 samples) claim a new state-of-the-art for deep tabular multitask learning, with an overall accuracy rank of 4.89 on the multitask subset, while remaining competitive with single-task ensembles despite the compute asymmetry of joint optimization.

Significance. If the results hold under rigorous verification, the work would offer a meaningful contribution to efficient multitask tabular modeling in low-data regimes by eliminating per-task gradient training and reducing inference cost. The in-context approach and claimed ability to exploit inter-task structure could be practically useful for multi-target applications, provided the performance gains are shown to arise from the multitask prior rather than architecture alone.

major comments (2)
  1. [Abstract] Abstract: The headline claim that TabPFN-MT 'establishes a new state-of-the-art for deep tabular multitask learning' and achieves rank 4.89 rests on the untested assumption that the expanded multi-target synthetic prior successfully induces the model to exploit real inter-task dependencies. No diagnostic is reported (e.g., comparison of synthetic vs. empirical target correlations, label co-occurrence statistics, or conditional distributions on the 344 evaluation collections), so it remains possible that observed gains derive entirely from the architectural changes (expanded y-encoder + shared decoder head) rather than from prior-data fitting of multitask structure.
  2. [Abstract] Abstract and evaluation description: The reported SOTA results and rank of 4.89 supply no information on baseline implementations, statistical testing, dataset selection criteria, or error bars. Without these details the reliability of the cross-model comparison cannot be assessed and the central empirical claim cannot be verified.
minor comments (1)
  1. [Method] Provide a clear, self-contained definition of the multi-target synthetic prior (including how targets are jointly sampled) and contrast it explicitly with the single-task prior from prior TabPFN work.
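The diagnostic requested in major comment 1 (synthetic versus empirical target correlations) could look roughly like the sketch below. It summarizes each dataset's label matrix by the mean absolute pairwise Pearson correlation between targets, then compares the average of that summary across synthetic and real collections. The function names, the choice of summary statistic, and the handling of constant targets are assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def mean_abs_target_correlation(Y: Sequence[Sequence[float]]) -> float:
    """Y has one row per sample and one column per target."""
    columns = list(zip(*Y))
    values = []
    for a, b in combinations(range(len(columns)), 2):
        r = pearson(columns[a], columns[b])
        if r is not None:  # skip pairs involving a constant target
            values.append(abs(r))
    if not values:
        raise ValueError("need at least two non-constant targets")
    return sum(values) / len(values)


def correlation_gap(synthetic: List[Sequence[Sequence[float]]],
                    real: List[Sequence[Sequence[float]]]) -> float:
    """Mean target-correlation strength of synthetic minus real label matrices."""
    syn = [mean_abs_target_correlation(Y) for Y in synthetic]
    rea = [mean_abs_target_correlation(Y) for Y in real]
    return sum(syn) / len(syn) - sum(rea) / len(rea)
```

A gap near zero would indicate that the prior produces target dependence of roughly the strength seen in the evaluation data; it would not by itself show that the dependence has the same structure.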

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and describe the revisions we intend to incorporate.

  1. Referee: [Abstract] Abstract: The headline claim that TabPFN-MT 'establishes a new state-of-the-art for deep tabular multitask learning' and achieves rank 4.89 rests on the untested assumption that the expanded multi-target synthetic prior successfully induces the model to exploit real inter-task dependencies. No diagnostic is reported (e.g., comparison of synthetic vs. empirical target correlations, label co-occurrence statistics, or conditional distributions on the 344 evaluation collections), so it remains possible that observed gains derive entirely from the architectural changes (expanded y-encoder + shared decoder head) rather than from prior-data fitting of multitask structure.

    Authors: We agree that direct diagnostics would strengthen the claim that performance gains arise from the multitask prior rather than architecture alone. The synthetic prior is explicitly constructed with correlated multi-target samples to induce inter-task structure, and the observed improvements on multitask datasets are consistent with this design. To address the concern rigorously, we will add an appendix containing (i) a comparison of pairwise target correlations between the synthetic prior and the 344 evaluation collections and (ii) an ablation isolating the multitask prior from the expanded y-encoder and shared decoder. revision: yes

  2. Referee: [Abstract] Abstract and evaluation description: The reported SOTA results and rank of 4.89 supply no information on baseline implementations, statistical testing, dataset selection criteria, or error bars. Without these details the reliability of the cross-model comparison cannot be assessed and the central empirical claim cannot be verified.

    Authors: All requested details are already present in the manuscript: baseline implementations and hyper-parameters are described in Section 4.1, dataset selection criteria (small-to-medium tabular datasets with average size <1000 samples) appear in Section 4.2, and statistical testing with error bars and rank aggregation are reported in Section 4.3 and the supplementary material. The rank 4.89 is the mean rank over the multitask subset. For improved accessibility we will add a concise summary of the evaluation protocol and a pointer to these sections directly in the abstract. revision: partial
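Since the response describes 4.89 as a mean rank over the multitask subset, a minimal sketch of that kind of aggregation may help readers interpret the number. The functions below rank models on each dataset by score (higher is better) and average the ranks across datasets. Giving tied models the average of their tied positions is an assumption, as is requiring every model to appear on every dataset; the paper's exact protocol is not stated here.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank models on one dataset; rank 1 is best, ties share the average rank."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # average of the tied 1-based positions
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def mean_ranks(per_dataset: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each model's rank over all datasets."""
    totals: Dict[str, float] = {}
    for scores in per_dataset:
        for model, rank in average_ranks(scores).items():
            totals[model] = totals.get(model, 0.0) + rank
    return {model: total / len(per_dataset) for model, total in totals.items()}
```

Under this scheme a lower mean rank is better, so a mean rank of 4.89 is meaningful only relative to the number of models compared and to the ranks of the other models.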

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on held-out data


The paper's central claims consist of measured performance ranks (e.g., Accuracy rank of 4.89) and efficiency gains obtained by running the trained model on 344 external tabular datasets. These quantities are not algebraically reduced to any fitted parameter, synthetic prior statistic, or self-referential definition inside the training loop. The multi-target synthetic prior is an input to training; the reported SOTA numbers are outputs of separate evaluation and do not collapse back to that prior by construction. No equations, uniqueness theorems, or self-citations are invoked to force the headline result. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claims rest on the unverified premise that a synthetic multi-target prior induces useful inter-task representations; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Training on an expanded multi-target synthetic prior captures inter-task dependencies present in real tabular data.
    This premise is required for the multitask in-context learning benefit to transfer from synthetic training to the 344 evaluation datasets.

pith-pipeline@v0.9.0 · 5782 in / 1320 out tokens · 35850 ms · 2026-05-21T07:36:56.303462+00:00 · methodology

