On the Uncertainty Quantification Ability of Tabular Foundation Models

Kian Ben-Jacob; Nima Negarandeh; Oriol Vendrell-Gallart; Ramin Bostanabad; Tyler R. Johnson

arXiv: 2606.01427 · v1 · pith:JPF2EBR2 · submitted 2026-05-31 · stat.ML · cs.LG

Pith reviewed 2026-06-28 15:57 UTC · model grok-4.3

Classification: stat.ML · cs.LG

Keywords: tabular foundation models · uncertainty quantification · TabPFN · Gaussian processes · regression · data scarcity · predictive accuracy

The pith

TabPFN matches Gaussian processes (GPs) on complex, high-dimensional regression when data are plentiful, but GPs often give better accuracy and uncertainty estimates when data are scarce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper empirically compares tabular foundation models, specifically TabPFN, against Gaussian processes on a range of regression tasks that vary in complexity, dataset size, and input dimensionality. It establishes that the learned priors in TabPFN are competitive with, or better than, explicit GP priors on large, complex problems, while the explicit priors in GPs deliver stronger predictive performance and uncertainty quantification in low-data regimes. The work also shows that a well-matched kernel can let GPs substantially outperform TabPFN. This matters for applications in mechanics and computational science that require reliable uncertainty estimates alongside predictions. The comparison uses default GP settings throughout for a consistent comparison against TabPFN version 2.5.

Core claim

TabPFN achieves highly competitive performance for complex, high-dimensional problems with sufficient data, while GPs often provide superior predictive accuracy and UQ in data-scarce settings. When the chosen kernel constitutes a good prior for the underlying function, GP performance can substantially exceed that of TabPFN.

What carries the argument

Empirical head-to-head evaluation of TabPFN's learned priors versus GPs' explicit kernel priors on regression accuracy and uncertainty quantification across varying data regimes.
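To make "explicit kernel prior" concrete, here is a minimal sketch of exact GP regression with a squared-exponential (RBF) kernel, using the Cholesky-based equations of Rasmussen and Williams (2006, Algorithm 2.1). The fixed hyperparameters (lengthscale 1, signal variance 1, noise variance 0.01) and the function names are illustrative assumptions, not the paper's default configuration. The sketch shows the property a UQ comparison relies on: predictive variance is small near training data and reverts to the prior variance far from it.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gp_predict(X_train, y_train, X_test, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Exact GP regression: predictive mean and variance of a new noisy observation."""
    K = rbf_kernel(X_train, X_train, lengthscale, variance) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    K_s = rbf_kernel(X_train, X_test, lengthscale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance minus the variance explained by the data, plus observation noise.
    var = variance - np.sum(v**2, axis=0) + noise
    return mean, var


if __name__ == "__main__":
    X = np.linspace(-3.0, 3.0, 13)[:, None]   # toy 1D training inputs
    y = np.sin(X[:, 0])                       # toy noise-free targets
    mean, var = gp_predict(X, y, np.array([[0.0], [10.0]]))
    print("mean:", mean, "std:", np.sqrt(var))
```

At a training input the predictive standard deviation is close to the noise level; at x = 10, far outside the data, the mean returns to zero and the variance to 1 + 0.01.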

If this is right

In data-scarce settings GPs are the stronger choice for both accuracy and calibrated uncertainty.
TabPFN becomes competitive or preferable once dataset size and dimensionality increase sufficiently.
Kernel choice directly controls whether GPs can substantially beat TabPFN on a given problem.
Default GP configurations already serve as a reproducible baseline against the current TabPFN release.
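Statements about "accuracy and calibrated uncertainty" depend on how both are scored. The review does not list the paper's metrics, so the sketch below assumes common choices: RMSE for point accuracy; the mean negative log predictive density (NLPD) and the closed-form CRPS of a Gaussian predictive distribution (a strictly proper scoring rule in the sense of Gneiting and Raftery, 2007); and the empirical coverage of central 95% intervals. The function names are hypothetical.

```python
import math
from statistics import fmean

Z_95 = 1.959963984540054  # standard normal quantile for a central 95% interval


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rmse(y, mu):
    """Root-mean-square error of the predictive means (point accuracy)."""
    return math.sqrt(fmean((yi - mi) ** 2 for yi, mi in zip(y, mu)))


def gaussian_nlpd(y, mu, sigma):
    """Mean negative log predictive density under N(mu, sigma^2); lower is better."""
    return fmean(
        0.5 * math.log(2.0 * math.pi * si**2) + 0.5 * ((yi - mi) / si) ** 2
        for yi, mi, si in zip(y, mu, sigma)
    )


def gaussian_crps(y, mu, sigma):
    """Mean closed-form CRPS of N(mu, sigma^2); lower is better, in the units of y."""
    scores = []
    for yi, mi, si in zip(y, mu, sigma):
        z = (yi - mi) / si
        scores.append(
            si * (z * (2.0 * _Phi(z) - 1.0) + 2.0 * _phi(z) - 1.0 / math.sqrt(math.pi))
        )
    return fmean(scores)


def interval_coverage(y, mu, sigma, z=Z_95):
    """Fraction of targets inside the central interval mu +/- z * sigma."""
    return fmean(
        1.0 if abs(yi - mi) <= z * si else 0.0 for yi, mi, si in zip(y, mu, sigma)
    )
```

A well-calibrated model should give coverage near 0.95 while keeping NLPD and CRPS low; an overconfident one can have good RMSE but poor NLPD. TabPFN's predictive distribution need not be Gaussian, so for it these scores would in practice be computed from its own predictive distribution or quantiles; the Gaussian forms here are a simplification.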

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid models that blend explicit kernels with learned components may close the gap between the two approaches in intermediate data regimes.
Model selection guidelines for UQ tasks should include dataset size and dimensionality thresholds rather than treating one method as universally superior.
The same trade-off pattern may appear when comparing other foundation models to classical probabilistic methods outside the tabular setting.

Load-bearing premise

Using a default setting to build all the GPs provides a fair comparison against TabPFN v2.5 across the tested range of dataset sizes, complexities, and input dimensionalities.
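The premise concerns how one fixed GP recipe behaves as dataset size changes. A hypothetical harness for that axis is sketched below: an RBF GP on standardized targets whose lengthscale is chosen by maximizing the log marginal likelihood over a small grid (a stand-in for "a default setting"; the paper's actual defaults are not described here), evaluated on a toy 2D function at several training sizes. The test function, grid, noise level, sizes, and all function names are assumptions. A TabPFN regressor would be evaluated inside the same loop, but its interface is deliberately not shown.

```python
import numpy as np


def rbf(A, B, lengthscale):
    """Unit-variance RBF kernel between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def default_gp(X, y, X_test, noise=1e-4, grid=(0.1, 0.2, 0.5, 1.0, 2.0)):
    """Hypothetical 'default' GP: standardize y, pick the lengthscale with the
    highest log marginal likelihood on a fixed grid, then predict."""
    y_mean, y_std = y.mean(), y.std()
    z = (y - y_mean) / y_std
    best = None
    for ls in grid:
        K = rbf(X, X, ls) + noise * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
        lml = -0.5 * z @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(X) * np.log(2 * np.pi)
        if best is None or lml > best[0]:
            best = (lml, ls, L, alpha)
    _, ls, L, alpha = best
    K_s = rbf(X, X_test, ls)
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = 1.0 - np.sum(v**2, axis=0) + noise
    # Undo the standardization of the targets.
    return y_mean + y_std * mean, y_std**2 * var


def toy_function(X):
    """Hypothetical smooth 2D test function."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2


def sweep(sizes=(10, 20, 40, 80), seed=0):
    """Return {n: (RMSE, mean Gaussian NLPD)} on a fixed test set for each training size n."""
    rng = np.random.default_rng(seed)
    X_test = rng.uniform(0.0, 1.0, size=(500, 2))
    y_test = toy_function(X_test)
    results = {}
    for n in sizes:
        X = rng.uniform(0.0, 1.0, size=(n, 2))
        mean, var = default_gp(X, toy_function(X), X_test)
        rmse = np.sqrt(np.mean((mean - y_test) ** 2))
        nlpd = np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y_test - mean) ** 2 / var)
        results[n] = (float(rmse), float(nlpd))
    return results


if __name__ == "__main__":
    for n, (rmse, nlpd) in sweep().items():
        print(f"n={n:3d}  RMSE={rmse:.4f}  NLPD={nlpd:.3f}")
```

Fixing the recipe (kernel family, grid, noise) before seeing any dataset mirrors the zero-shot use of TabPFN; whether such a recipe is representative is exactly what the premise asks.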

What would settle it

Re-running the experiments with kernels explicitly selected or optimized for each underlying function, and on functions for which the default kernel is a poor prior, then checking whether the low-data GP advantage over TabPFN persists, widens, or disappears.
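One way to run such a check is kernel selection by type-II maximum likelihood: compute the log marginal likelihood of each candidate kernel and keep the best. The sketch below does this for two fixed-hyperparameter candidates (RBF and periodic) on a hypothetical 1D periodic function, then compares their extrapolation error. The kernels, their hyperparameters (including a period set to the true value, which a real study would have to learn), the data, and the function names are illustrative assumptions, not the paper's protocol; the sketch only illustrates the mechanism behind the claim that a kernel which is a good prior can let a GP do much better.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """Unit-variance RBF kernel for 1D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def periodic(a, b, period=2.0 * np.pi, lengthscale=1.0):
    """Unit-variance periodic (exp-sine-squared) kernel for 1D inputs."""
    s = np.sin(np.pi * (a[:, None] - b[None, :]) / period)
    return np.exp(-2.0 * s**2 / lengthscale**2)


def fit(kernel, x, y, noise=1e-3):
    """Return (log marginal likelihood, posterior-mean predictor) for a zero-mean GP."""
    K = kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(x) * np.log(2 * np.pi)
    return lml, lambda x_new: kernel(x, x_new).T @ alpha


x_train = np.linspace(0.0, 4.0 * np.pi, 20)                 # two periods of training data
y_train = np.sin(x_train)                                   # toy target
x_extra = np.linspace(4.0 * np.pi + 0.5, 6.0 * np.pi, 50)   # extrapolation region

candidates = {"rbf": rbf, "periodic": periodic}
fits = {name: fit(k, x_train, y_train) for name, k in candidates.items()}
selected = max(fits, key=lambda name: fits[name][0])        # highest log marginal likelihood

extrap_rmse = {
    name: float(np.sqrt(np.mean((predict(x_extra) - np.sin(x_extra)) ** 2)))
    for name, (_, predict) in fits.items()
}
```

Here the marginal likelihood prefers the periodic kernel, which then extrapolates the signal almost exactly, while the RBF posterior mean decays toward zero beyond the data. Swapping in kernels whose structure does not match the function would be the corresponding test of the opposite case.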

Original abstract

Foundation models (FMs) have achieved substantial success in generalizing across tasks without problem-specific training or fine-tuning. However, many critical applications in mechanics and computational science require not only accurate predictions but also reliable uncertainty quantification (UQ). Herein we investigate the UQ capabilities of tabular FMs in regression tasks through a comprehensive empirical study comparing Tabular Prior-Data Fitted Networks (TabPFN) against Gaussian processes (GPs). We systematically evaluate these two methods across a host of regression problems with varying complexity, dataset sizes, and input dimensionalities. We use a default setting to build all the GPs for a fair comparison against TabPFN v2.5. Our findings highlight an important trade-off between explicit and learned priors: while TabPFN achieves highly competitive performance for complex, high-dimensional problems with sufficient data, GPs often provide superior predictive accuracy and UQ in data-scarce settings. Moreover, when the chosen kernel constitutes a good prior for the underlying function, GP performance can substantially exceed that of TabPFN. Our results can be reproduced from https://github.com/kianswarehouse/GPvsPFN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper benchmarks TabPFN against default GPs and reports a regime-dependent trade-off in UQ performance for tabular regression.

The main thing to know is that this paper finds GPs with default kernels often beat TabPFN on accuracy and UQ in low-data tabular regression, while TabPFN is competitive or better in high-dimensional cases with more data.

The contribution is an empirical head-to-head comparison across varying dataset sizes, dimensions, and complexities. They evaluate both methods on regression tasks relevant to mechanics and computational science, focusing on UQ quality in addition to point predictions. TabPFN is taken as-is in version 2.5, and GPs use a default setting to keep the comparison straightforward. The results highlight the difference between learned priors and explicit ones, with the added observation that good kernel choice can make GPs substantially stronger. The code is available on GitHub, which helps with verification.

This is useful as a benchmarking study because it systematically covers the axes of variation and reports the trade-offs without overclaiming new theory. The focus on UQ in real application areas gives it practical value.

The soft spot is the default GP setup. The authors note that GP performance can exceed TabPFN's when the kernel is a good prior, but this cuts both ways: the "often superior" result in data-scarce settings depends on how well the default kernel happens to suit the test problems. On functions for which the default kernel is a poor prior, the low-data GP advantage may shrink or disappear. This is a moderate concern rather than a deal-breaker, since the authors frame the default as the fair baseline against TabPFN's fixed model. The math and data handling appear standard for this type of work.

This paper is for readers who need guidance on choosing between tabular foundation models and Gaussian processes for regression with uncertainty quantification. It offers regime-specific insights that could inform method selection in scientific computing. The empirical grounding and reproducibility make it appropriate for peer review.

I recommend sending it to referees.

Referee Report

1 major / 0 minor

Summary. The paper conducts an empirical study comparing the uncertainty quantification performance of TabPFN v2.5 against Gaussian processes on regression tasks, varying dataset size, input dimensionality, and problem complexity. Using a fixed default configuration for all GPs, the authors conclude that TabPFN is competitive for complex, high-dimensional problems with adequate data while GPs frequently deliver better accuracy and UQ in low-data regimes, with GP performance improving further when the kernel matches the underlying function. Results are stated to be reproducible via the linked GitHub repository.

Significance. If the central empirical comparison holds under a fair GP baseline, the work usefully documents a practical trade-off between learned priors in tabular foundation models and explicit priors in GPs for regression UQ. The public code repository is a clear strength that enables direct verification of the reported trends.

major comments (1)

[Abstract / Experimental setup] Abstract and experimental setup (likely §3–4): the claim that GPs 'often provide superior predictive accuracy and UQ in data-scarce settings' rests on a single default GP configuration applied uniformly. Because a fixed default kernel can be a poor prior for many regression functions, this choice is load-bearing for the reported trade-off; the manuscript should either justify why the default remains representative across the tested regimes or include results with kernel selection/tuning to secure the 'often superior' conclusion.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our empirical comparison of TabPFN and GPs for uncertainty quantification. We address the single major comment below.

Referee: [Abstract / Experimental setup] Abstract and experimental setup (likely §3–4): the claim that GPs 'often provide superior predictive accuracy and UQ in data-scarce settings' rests on a single default GP configuration applied uniformly. Because a fixed default kernel can be a poor prior for many regression functions, this choice is load-bearing for the reported trade-off; the manuscript should either justify why the default remains representative across the tested regimes or include results with kernel selection/tuning to secure the 'often superior' conclusion.

Authors: We chose the fixed default GP configuration (as stated in the abstract and §3) specifically to enable a fair comparison with TabPFN v2.5, which is applied in its default zero-shot mode without task-specific tuning or kernel selection. This design choice reflects the practical use of tabular foundation models. The manuscript already notes that GP performance improves substantially when the kernel matches the underlying function. We will revise the experimental setup section to add an explicit justification for the representativeness of the default configuration in the context of this comparison. We do not plan to add kernel-tuned GP results, as that would shift the comparison away from the foundation-model setting.

Revision: partial.

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivations

The paper is an observational empirical study comparing TabPFN v2.5 against default GPs across regression benchmarks. It contains no mathematical derivation chain, no fitted parameters renamed as predictions, and no self-citation load-bearing steps that reduce claims to inputs by construction. The central trade-off claim is presented as an outcome of the described experiments rather than a self-referential result. The default-GP assumption is a methodological choice open to critique on fairness grounds but does not constitute circularity under the defined patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the chosen regression problems and the fairness of the default GP configuration as a benchmark; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: default GP settings constitute a fair and representative baseline for comparison with TabPFN v2.5.
Explicitly stated in the abstract as the protocol used for all GPs.
