arxiv: 2603.21033 · v2 · pith:5OYGWJZ5new · submitted 2026-03-22 · 💻 cs.CE · cs.LG

TabPFN Extensions for Interpretable Geotechnical Modelling

Pith reviewed 2026-05-21 10:17 UTC · model grok-4.3

classification 💻 cs.CE cs.LG
keywords geotechnical modelling · iterative imputation · soil mechanical parameters · SHAP interpretability · uncertainty decomposition · borehole data · tabular foundation model · consolidation reliability

The pith

TabPFN reduces error when imputing five mechanical soil parameters from sparse borehole data and produces attributions that match established geotechnical relations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates TabPFN on two geotechnical tasks that use real borehole records: classifying soil types from N-value and shear-wave velocity, and filling in missing values for five mechanical parameters. It applies the model without any retraining, adds cosine-similarity checks on embeddings, SHAP explanations, and a simple uncertainty breakdown, then pushes the imputed values through a one-dimensional consolidation calculation to obtain a reliability index. The results show lower root-mean-square errors than several common baselines on four of the five targets, and the explanations recover known relations such as the Skempton link between compression index and water content. This matters in geotechnics because site data are almost always incomplete, so any method that improves imputation while remaining interpretable can reduce reliance on conservative safety factors.
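To make the attribution step concrete, here is a minimal sketch of exact Shapley values on a toy model. It is an illustration only. The linear `predict_cc` function and its coefficients are hypothetical stand-ins and are not Skempton's correlation or the paper's model, and a single background point replaces the averaged background data that SHAP implementations normally use.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values of predict at x; absent features take background values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(background)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i from background to its actual value
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i in this ordering
            prev = new
    return [p / factorial(n) for p in phi]


def predict_cc(features):
    """Hypothetical linear model: features = [water content (%), plasticity index]."""
    water_content, plasticity_index = features
    return 0.009 * water_content + 0.001 * plasticity_index - 0.05


x = [60.0, 30.0]            # sample to explain (made-up values)
background = [40.0, 20.0]   # single reference sample (made-up values)
phi = shapley_values(predict_cc, x, background)
```

For a linear model each attribution equals the coefficient times the feature's offset from the background, so a positive value for water content means the toy model predicts a higher compression index for wetter samples. That sign is the kind of pattern the paper compares against known geotechnical relations.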

Core claim

TabPFN applied directly to geotechnical regression yields lower RMSE than mean imputation, linear regression, random forests, XGBoost, and a hierarchical Bayesian model on four of the five targets. The five targets are undrained shear strength, undrained modulus, preconsolidation pressure, compression index, and coefficient of consolidation. Embeddings group clay and sand samples consistently, SHAP attributions recover the Skempton compression-index correlation and the inverse dependence of preconsolidation pressure on water content, and a proxy decomposition of predictive uncertainty identifies the within-posterior term as the largest contributor. Marginal distributions of the imputed parameters are then fed a…

What carries the argument

Iterative imputation that repeatedly replaces missing entries with TabPFN predictions, paired with SHAP value computation and a proxy decomposition of uncertainty across context-perturbation classes.
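The iterative scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. A small k-nearest-neighbour average stands in for the TabPFN predictor, the function names `knn_predict` and `iterative_impute` are hypothetical, and the table values are made up.

```python
def knn_predict(train_X, train_y, query, k=3):
    """Average target of the k nearest training rows (stand-in for a TabPFN prediction)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def iterative_impute(table, n_iter=10, k=3):
    """Fill None entries column by column, reusing earlier imputations as features."""
    n_cols = len(table[0])
    missing = [[v is None for v in row] for row in table]
    # Initialise every gap with the mean of the observed values in its column
    # (assumes each column has at least one observed value).
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in table if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in table]

    for _ in range(n_iter):
        for j in range(n_cols):
            train = [i for i in range(len(filled)) if not missing[i][j]]
            targets = [i for i in range(len(filled)) if missing[i][j]]
            if not targets:
                continue
            others = [c for c in range(n_cols) if c != j]
            train_X = [[filled[i][c] for c in others] for i in train]
            train_y = [filled[i][j] for i in train]
            for i in targets:
                query = [filled[i][c] for c in others]
                # Overwrite the previous guess with a fresh prediction.
                filled[i][j] = knn_predict(train_X, train_y, query, k)
    return filled


# Hypothetical columns: water content (%), compression index, undrained strength (kPa).
table = [
    [30.0, 0.25, 40.0],
    [45.0, 0.40, None],
    [60.0, None, 25.0],
    [75.0, 0.70, 20.0],
    [50.0, None, None],
    [35.0, 0.30, 38.0],
]
completed = iterative_impute(table, n_iter=5, k=2)
```

Observed entries are never overwritten. Only the gaps are refreshed on each pass, so later passes can draw on the improved estimates of the other columns.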

If this is right

  • Iterative imputation lowers RMSE for all five mechanical parameters, and TabPFN attains the lowest RMSE on four of them.
  • SHAP attributions recover the Skempton compression-index correlation and the inverse preconsolidation-pressure versus water-content relation.
  • The within-posterior component dominates the proxy uncertainty decomposition.
  • Propagation of the imputed marginal distributions through a one-dimensional consolidation model yields concrete values for the reliability index beta and the serviceability exceedance probability.
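The propagation step in the last bullet can be illustrated with a small Monte Carlo sketch. Everything numerical here is an assumption made for illustration: the lognormal marginals for the compression index and the preconsolidation pressure, the layer thickness, void ratio, stresses, recompression ratio, and settlement limit are hypothetical and not taken from the paper. The settlement expression is the textbook one-dimensional consolidation formula, and the reliability index is computed as beta = -Phi^{-1}(Pf).

```python
import math
import random
from statistics import NormalDist


def settlement(cc, sigma_p, h=5.0, e0=1.2, sigma_0=50.0, d_sigma=60.0, cr_ratio=0.2):
    """1D consolidation settlement (m) of one clay layer; all defaults are hypothetical."""
    sigma_f = sigma_0 + d_sigma
    sigma_p = max(sigma_p, sigma_0)   # preconsolidation pressure cannot be below current stress
    cr = cr_ratio * cc                # assumed recompression index
    if sigma_f <= sigma_p:
        strain = cr * math.log10(sigma_f / sigma_0)           # stays overconsolidated
    else:
        strain = (cr * math.log10(sigma_p / sigma_0)          # recompression branch
                  + cc * math.log10(sigma_f / sigma_p))       # virgin compression branch
    return h * strain / (1.0 + e0)


def reliability(n=20000, s_limit=0.30, seed=0):
    """Monte Carlo estimate of (beta, Pf) for the event settlement > s_limit."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        cc = rng.lognormvariate(math.log(0.5), 0.25)        # assumed marginal for Cc
        sigma_p = rng.lognormvariate(math.log(80.0), 0.20)  # assumed marginal for sigma'_p (kPa)
        if settlement(cc, sigma_p) > s_limit:
            exceed += 1
    p_f = exceed / n
    # Clamp so the inverse normal CDF stays finite when no or all samples exceed.
    p_clamped = min(max(p_f, 1.0 / n), 1.0 - 1.0 / n)
    beta = -NormalDist().inv_cdf(p_clamped)
    return beta, p_f


beta, p_f = reliability()
```

Sampling the two parameters independently is itself a simplification that follows from using marginal distributions. Any dependence between the compression index and the preconsolidation pressure is ignored in this sketch.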

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same iterative-imputation-plus-SHAP workflow could be tried on other engineering domains that rely on sparse tabular records, such as structural health monitoring or environmental sensor networks.
  • Direct field measurements of the same soil parameters at additional locations would provide an external check on whether the proxy uncertainty breakdown tracks actual site-to-site variability.
  • Embedding similarity patterns might be examined for finer sub-classifications of soil behavior beyond the broad clay-sand split already observed.

Load-bearing premise

The proxy decomposition of predictive uncertainty across context-perturbation classes accurately isolates the dominant uncertainty source without external validation against field variability or alternative uncertainty methods.
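The paper's exact proxy is not reproduced in this summary. As a point of reference only, one generic way to split predictive variance across perturbed contexts is the law of total variance: the average variance inside each predictive posterior plus the variance of the posterior means across contexts. The sketch below assumes equal weights per context, and the function name `decompose` and the numbers are hypothetical.

```python
from statistics import fmean, pvariance


def decompose(posteriors):
    """posteriors: list of (mean, variance) pairs, one per perturbed context."""
    means = [m for m, _ in posteriors]
    variances = [v for _, v in posteriors]
    within = fmean(variances)     # average spread inside each predictive posterior
    between = pvariance(means)    # spread of the posterior means across contexts
    total = within + between
    share = within / total if total > 0 else float("nan")
    return {"within": within, "between": between, "total": total, "within_share": share}


# Made-up predictive means and variances for one sample under four perturbed contexts.
result = decompose([(0.50, 0.010), (0.52, 0.012), (0.48, 0.011), (0.51, 0.009)])
```

In this toy case the within term is large because the made-up posterior means barely move between contexts. A high within share can therefore reflect small perturbations as much as a dominant noise source, which is why the premise calls for external validation.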

What would settle it

A head-to-head comparison of the TabPFN-derived parameter distributions and the resulting consolidation reliability index against independent laboratory or field measurements from new sites would test whether the reported error reductions persist outside the current dataset.

Figures

Figures reproduced from arXiv: 2603.21033 by Daijiro Mizutani, Stephen Wu, Taiga Saito, Yu Otake.

Figure 1: Scatter plots of training (circles) and test (stars) data in the …
Figure 2: Predicted probability of Sand class P(Sand) over the N–Vs domain, with the training samples overlaid (circles). The bold contour marks the decision boundary at P(Sand) = 0.5; grey contours indicate the 0.1, 0.25, 0.75, and 0.9 levels.
    2.3 Embedding Analysis. A distinctive property of TabPFN as a foundation model is its contextually enriched internal representations, made accessible through the tabpfn-ext…
Figure 3: Cosine similarity heatmap of TabPFN embeddings between test and training samples (axis … training samples are ordered along the row axis and test samples along the column axis …
Figure 4: Normalised RMSE (RMSE / RMSE at iteration 1) per iteration for each mechanical …
Figure 5: Posterior distributions at iteration 10 for all five mechanical parameters across test samples.
Figure 6: Mean absolute SHAP values for each input feature across all test samples, shown separately …
Figure 7: SHAP value of the most influential index property versus its feature value, shown for each …
Original abstract

Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN~\citep{Hollmann2025}, a tabular foundation model, and its \texttt{tabpfn-extensions} library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters ($s_\mathrm{u}$, $E_{\mathrm{u}}$, ${\sigma'}_\mathrm{p}$, $C_\mathrm{c}$, $C_\mathrm{v}$) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualise predictive distributions, and compute SHAP attributions. On the regression benchmark we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and HBM; introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate marginal $C_\mathrm{c}$ and ${\sigma'}_\mathrm{p}$ distributions through a one-dimensional consolidation model to obtain the reliability index $\beta$ and serviceability exceedance probability $P_\mathrm{f}$. Embeddings exhibit label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; SHAP attributions are consistent with the Skempton compression-index correlation and the inverse preconsolidation-pressure-water-content dependence; the within-posterior component is largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as algorithmic innovation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper evaluates TabPFN and tabpfn-extensions on two geotechnical tasks using sparse borehole data: (1) soil-type classification from N-value and shear-wave velocity as an illustrative case, and (2) iterative imputation of five mechanical parameters (su, Eu, σ'p, Cc, Cv) on the BM/AirportSoilProperties dataset. It reports that TabPFN embeddings show label-consistent groupings, iterative imputation yields lower RMSE than mean imputation, linear regression, random forests, XGBoost, and HBM (lowest on four of five targets), SHAP attributions align with known relations such as the Skempton compression-index correlation, and a proxy decomposition of predictive uncertainty across context-perturbation classes identifies the within-posterior component as dominant. Marginal distributions of Cc and σ'p are propagated through a 1D consolidation model to obtain reliability index β and serviceability exceedance probability Pf. The contribution is positioned as a worked evaluation workflow for data-scarce geotechnics.

Significance. If the empirical results and uncertainty propagation hold, the manuscript offers a concrete, interpretable workflow for applying tabular foundation models to geotechnical site characterisation. Strengths include direct comparison against multiple baselines, consistency of SHAP values with established correlations, and end-to-end propagation from imputed parameters to reliability metrics. These elements could usefully complement conventional methods in data-limited settings, though the work presents itself as an evaluation rather than an algorithmic advance.

major comments (1)
  1. [Methods section on proxy uncertainty decomposition and results on propagation to β/Pf] The proxy decomposition of predictive uncertainty across context-perturbation classes (introduced in the methods and used for the within-posterior dominance claim) is presented without calibration against measured field variability (e.g., repeated borehole tests) or against standard alternatives such as bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors. This decomposition is load-bearing for the downstream propagation of Cc and σ'p marginals through the consolidation model to obtain β and Pf; without external validation the claim that it isolates the dominant uncertainty source remains unverified.
minor comments (2)
  1. [Regression benchmark and results sections] Details on data partitioning (train/test splits, cross-validation strategy) and any statistical significance tests for the reported RMSE reductions are not provided, which limits assessment of the robustness of the iterative imputation results.
  2. [Uncertainty quantification subsection] The manuscript would benefit from explicit sensitivity analysis of the proxy decomposition with respect to the choice of perturbation classes or context size.
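One way to act on the first minor comment is a paired bootstrap over test samples. This is offered as an illustration, not something the referee or the paper prescribes. The sketch resamples test indices with replacement and reports a percentile interval for the RMSE difference between two methods. The function names are hypothetical, and the error vectors are contrived (the baseline errors are exactly twice the model errors) so that the sign of the result is known in advance.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def paired_bootstrap_rmse_diff(err_a, err_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for RMSE(a) - RMSE(b) using the same resampled indices for both."""
    if len(err_a) != len(err_b):
        raise ValueError("error vectors must come from the same test samples")
    rng = random.Random(seed)
    n = len(err_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample test samples with replacement
        diffs.append(rmse([err_a[i] for i in idx]) - rmse([err_b[i] for i in idx]))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Contrived errors: the baseline is exactly twice as wrong on every test sample.
err_model = [0.5, -0.3, 0.8, -0.6, 0.2, 0.4, -0.7, 0.1]
err_baseline = [2.0 * e for e in err_model]
lower, upper = paired_bootstrap_rmse_diff(err_model, err_baseline)
```

An interval that lies entirely below zero indicates that the first method's RMSE is lower across resamples. With real data the interval may straddle zero, which is the case the referee's comment is probing.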

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the potential utility of the evaluation workflow for data-scarce geotechnical applications. We address the single major comment below and will make targeted revisions to clarify the scope and limitations of the proxy uncertainty analysis.

Point-by-point responses
  1. Referee: [Methods section on proxy uncertainty decomposition and results on propagation to β/Pf] The proxy decomposition of predictive uncertainty across context-perturbation classes (introduced in the methods and used for the within-posterior dominance claim) is presented without calibration against measured field variability (e.g., repeated borehole tests) or against standard alternatives such as bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors. This decomposition is load-bearing for the downstream propagation of Cc and σ'p marginals through the consolidation model to obtain β and Pf; without external validation the claim that it isolates the dominant uncertainty source remains unverified.

    Authors: We thank the referee for highlighting this important limitation. The proxy decomposition was introduced as a lightweight, training-free heuristic that attributes uncertainty by perturbing the in-context set within TabPFN, rather than as a calibrated or comprehensive uncertainty quantification procedure. We agree that it has not been validated against repeated borehole measurements or benchmarked against bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors, and that this weakens the strength of the within-posterior dominance statement. In the revised manuscript we will: (i) expand the Methods section to explicitly label the approach as an exploratory proxy and detail its assumptions and scope; (ii) insert a dedicated Limitations subsection that acknowledges the lack of external field calibration data and recommends future comparisons with the methods suggested by the referee; and (iii) revise the Results and Discussion to present the observed dominance of the within-posterior component as a finding under this specific proxy, rather than a verified isolation of the dominant source. The propagation of marginal Cc and σ'p distributions through the 1D consolidation model to obtain β and Pf relies on the predictive distributions obtained from iterative imputation; the decomposition supplies supplementary insight but is not presented as the sole justification for the reliability metrics. The current BM/AirportSoilProperties dataset does not contain repeated test records that would permit direct calibration, so we will flag this as an important direction for follow-on work.

    revision: partial

Circularity Check

0 steps flagged

Minor self-citation not load-bearing; independent baselines and model propagation present

full rationale

The paper evaluates TabPFN on imputation and classification tasks, reporting RMSE reductions against mean imputation, linear regression, random forests, XGBoost, and HBM, plus SHAP attributions matching the Skempton correlation and inverse preconsolidation-pressure dependence. It introduces a proxy uncertainty decomposition and propagates marginal distributions through a 1D consolidation model to obtain β and Pf. These steps are presented as external checks rather than reductions to author-defined fitted quantities. The sole citation to Hollmann2025 (original TabPFN) supports the base model but does not justify the central claims or the proxy decomposition by self-reference. No equations or definitions collapse the reported results to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions of tabular foundation models and geotechnical domain knowledge. It introduces no free parameters or invented entities; the one recorded axiom is the domain assumption listed below, and the validity of the proxy uncertainty decomposition remains unverified.

axioms (1)
  • domain assumption: Cosine similarity on TabPFN embeddings produces label-consistent geological groupings.
    Invoked to support Clay/Sand separation without a reported external validation metric.
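A simple quantitative check of this assumption could compare cosine similarities within and across labels. The sketch below is a hypothetical example of such a metric, not one reported in the paper. The embedding vectors are made up, and how embeddings are obtained from tabpfn-extensions is deliberately not shown.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def similarity_gap(embeddings, labels):
    """Mean same-label cosine similarity minus mean different-label similarity."""
    same, diff = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        bucket = same if labels[i] == labels[j] else diff
        bucket.append(cosine(embeddings[i], embeddings[j]))
    return sum(same) / len(same) - sum(diff) / len(diff)


# Made-up 2-D embeddings; real TabPFN embeddings are much higher-dimensional.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["Clay", "Clay", "Sand", "Sand"]
gap = similarity_gap(embeddings, labels)
```

A gap well above zero would indicate that samples sharing a label sit closer together in embedding space than samples with different labels. A gap near zero would undercut the grouping claim.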

pith-pipeline@v0.9.0 · 5843 in / 1135 out tokens · 54564 ms · 2026-05-21T10:17:51.854662+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TabPFN-3: Technical Report

    cs.LG 2026-05 unverdicted novelty 6.0

    TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · cited by 1 Pith paper

  1. Yongmin Cai, Kok-Kwang Phoon, Qiujing Pan, and Wuzhang Luo. Modifying the tailored clustering enabled regionalization (TCER) framework for outlier site detection and inference efficiency. Engineering Geology, 335:107537, 2024. doi: 10.1016/j.enggeo.2024.107537

  2. Yongmin Cai, Kok-Kwang Phoon, Yu Otake, and Yu Wang. Efficient dictionary learning for constructing quasi-local transformation models. Computers and Geotechnics, 180:107072. doi: 10.1016/j.compgeo.2025.107072

  3. Jianye Ching, Stephen Wu, and Kok-Kwang Phoon. Constructing quasi-site-specific multivariate probability distribution using hierarchical Bayesian model. Journal of Engineering Mechanics, 147(10):04021069, 2021. doi: 10.1061/(ASCE)EM.1943-7889.0001964

  4. Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Thomas, Shi Bin Hoo, Eddie Bergman, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025. doi: 10.1038/s41586-024-08328-6

  5. Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30, pages 4765–4774, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html

  6. Prajowal Manandhar, Tarek Abdoun, Mostafa E. Mobasher, and Waleed El-Sekelly. Predicting soil properties for the purpose of site characterization using advanced machine learning approaches. Geodata and AI, page 100054, 2026. doi: 10.1016/j.geoai.2026.100054

  7. Yu Otake, Jianye Ching, Taiga Saito, and Kotaro Asano. GEOAI benchmark problems BM/AirportSoilProperties/2/2025. Geodata and AI, 2:100012, 2025. doi: 10.1016/j.geoai.2025.100012

  8. Kok-Kwang Phoon, Jianye Ching, and Takayuki Shuku. Challenges in data-driven site characterization. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 16(1):114–126, 2022. doi: 10.1080/17499518.2021.1896005

  9. Kok-Kwang Phoon, Yongmin Cai, and Chong Tang. Geotechnical “facial recognition” challenge. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 11(3):03125001, 2025. doi: 10.1061/AJRUA6.RUENG-1553

  10. Prior Labs. tabpfn-extensions. https://github.com/priorlabs/tabpfn-extensions. Accessed: 2026-03-11

  11. Railway Technical Research Institute. Design Standards for Railway Structures and Commentary: Seismic Design. Maruzen Publishing, Tokyo, Japan, 2012. (In Japanese: Tetsudo Kozobutsu-to Sekkei Hyojun – Dokaisetsu: Taishin Sekkei, September 2012)

  12. Taiga Saito, Yu Otake, and Stephen Wu. Applying a tabular foundation model to geotechnical site characterization. Geodata and AI, page 100040, 2025. doi: 10.1016/j.geoai.2025.100040

  13. Taiga Saito, Yu Otake, Stephen Wu, and Keisuke Yano. Exploring high-order multivariate geotechnical features using the minimum information dependence model. Geodata and AI, 2:100009, 2025. doi: 10.1016/j.geoai.2025.100009

  14. Atma Sharma, Jianye Ching, and Kok-Kwang Phoon. A hierarchical Bayesian similarity measure for geotechnical site retrieval. Journal of Engineering Mechanics, 148(9):04022045. doi: 10.1061/(ASCE)EM.1943-7889.0002145

  15. Atma Sharma, Jianye Ching, and Kok-Kwang Phoon. A spectral algorithm for quasi-regional geotechnical site clustering. Computers and Geotechnics, 163:105624, 2023. doi: 10.1016/j.compgeo.2023.105624

  16. A. W. Skempton. Notes on the compressibility of clays. Quarterly Journal of the Geological Society, 100:119–135, 1944.

  17. Stephen Wu, Jianye Ching, and Kok-Kwang Phoon. Quasi-site-specific soil property prediction using a cluster-based hierarchical Bayesian model. Structural Safety, 99:102253, 2022. doi: 10.1016/j.strusafe.2022.102253

  18. Runhong Zhang, Chongzhi Wu, Anthony T. C. Goh, Thomas Böhlke, and Wengang Zhang. Estimation of diaphragm wall deflections for deep braced excavation in anisotropic clays using ensemble learning. Geoscience Frontiers, 12(1):365–373, 2021. doi: 10.1016/j.gsf.2020.03.003