TabPFN Extensions for Interpretable Geotechnical Modelling
Pith reviewed 2026-05-21 10:17 UTC · model grok-4.3
The pith
TabPFN reduces error when imputing five mechanical soil parameters from sparse borehole data and produces attributions that match established geotechnical relations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applied without retraining, TabPFN attains lower RMSE than mean imputation, linear regression, random forests, XGBoost, and a hierarchical Bayesian model on four of the five regression targets: undrained shear strength, undrained modulus, preconsolidation pressure, compression index, and coefficient of consolidation. TabPFN embeddings group clay and sand samples consistently with their labels. SHAP attributions recover the Skempton compression-index correlation and the inverse dependence of preconsolidation pressure on water content. A proxy decomposition of predictive uncertainty identifies the within-posterior term as the largest contributor. Marginal distributions of the imputed compression index and preconsolidation pressure are then propagated through a one-dimensional consolidation model to obtain the reliability index β and the serviceability exceedance probability Pf.
What carries the argument
Iterative imputation that repeatedly replaces missing entries with TabPFN predictions, paired with SHAP value computation and a proxy decomposition of uncertainty across context-perturbation classes.
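The iterative-imputation loop can be illustrated with a minimal sketch that assumes a round-robin scheme over columns. Missing entries are marked with `None` and first initialised with column means. They are then repeatedly overwritten with predictions made from the other, currently filled, columns.

A hypothetical k-nearest-neighbour regressor (`knn_fit_predict`) stands in for TabPFN so that the example runs on the Python standard library alone. TabPFN's own API is not reproduced here, and the function names are illustrative.

```python
from math import dist
from statistics import fmean


def knn_fit_predict(X_train, y_train, X_query, k=3):
    """Hypothetical stand-in for TabPFN: mean target of the k nearest training rows."""
    preds = []
    for q in X_query:
        # Rank training rows by Euclidean distance to the query row.
        ranked = sorted(range(len(X_train)), key=lambda i: dist(q, X_train[i]))
        preds.append(fmean(y_train[i] for i in ranked[:k]))
    return preds


def iterative_impute(data, fit_predict=knn_fit_predict, n_iter=5):
    """Round-robin iterative imputation; None marks a missing entry.

    Assumes every column has at least one observed value.
    """
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    # Initialise each missing entry with the mean of its column's observed values.
    means = [fmean(row[j] for row in data if row[j] is not None) for j in range(n_cols)]
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]

    for _ in range(n_iter):
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if not missing[i][j]]
            mis = [i for i in range(n_rows) if missing[i][j]]
            if not mis:
                continue  # nothing to impute in this column
            others = [c for c in range(n_cols) if c != j]
            X_obs = [[filled[i][c] for c in others] for i in obs]
            X_mis = [[filled[i][c] for c in others] for i in mis]
            y_obs = [filled[i][j] for i in obs]
            # Overwrite column j's missing entries with fresh predictions.
            for i, p in zip(mis, fit_predict(X_obs, y_obs, X_mis)):
                filled[i][j] = p
    return filled
```

Only `fit_predict` would change if a different per-column model were used; the outer loop stays the same.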
If this is right
- Iterative imputation reduces RMSE for all five mechanical parameters, and TabPFN attains the lowest RMSE on four of them.
- SHAP attributions recover the Skempton compression-index correlation and the inverse preconsolidation-pressure versus water-content relation.
- The within-posterior component dominates the proxy uncertainty decomposition.
- Propagation of the imputed marginal distributions through a one-dimensional consolidation model yields concrete values for the reliability index beta and the serviceability exceedance probability.
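The final propagation step can be approximated with a minimal Monte Carlo sketch. It assumes a single compressible layer and the textbook one-dimensional settlement formula with a break at the preconsolidation pressure.

All of the following are hypothetical values, not taken from the paper:
- the layer thickness `H`, initial void ratio `e0`, in-situ stress `sigma0` and load increment `dsigma`;
- the recompression index `cr`, which defaults to Cc/10 as a rule of thumb;
- the settlement limit;
- the lognormal samples, which stand in for the TabPFN predictive distributions of Cc and σ'p.

Pf is estimated as the fraction of samples whose settlement exceeds the limit, and β = −Φ⁻¹(Pf).

```python
import math
import random
from statistics import NormalDist


def consolidation_settlement(cc, sigma_p, *, H=5.0, e0=1.2, sigma0=80.0, dsigma=60.0, cr=None):
    """1D primary consolidation settlement (m) of one layer; stresses in kPa.

    All default geometry and stress values are hypothetical.
    """
    cr = 0.1 * cc if cr is None else cr  # assumed recompression index
    sigma_f = sigma0 + dsigma
    factor = H / (1.0 + e0)
    if sigma_f <= sigma_p:  # stays on the recompression branch
        return factor * cr * math.log10(sigma_f / sigma0)
    if sigma0 >= sigma_p:  # normally consolidated from the start
        return factor * cc * math.log10(sigma_f / sigma0)
    # Recompression up to sigma_p, then virgin compression beyond it.
    return factor * (cr * math.log10(sigma_p / sigma0) + cc * math.log10(sigma_f / sigma_p))


def reliability(cc_samples, sp_samples, s_limit, **layer):
    """Return (beta, pf): probability of exceeding s_limit and beta = -Phi^{-1}(pf)."""
    settlements = [consolidation_settlement(c, p, **layer) for c, p in zip(cc_samples, sp_samples)]
    pf = sum(s > s_limit for s in settlements) / len(settlements)
    pf_safe = min(max(pf, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < p < 1
    return -NormalDist().inv_cdf(pf_safe), pf


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical lognormal marginals for Cc and sigma'_p (kPa).
    cc = [rng.lognormvariate(math.log(0.4), 0.3) for _ in range(20_000)]
    sp = [rng.lognormvariate(math.log(100.0), 0.25) for _ in range(20_000)]
    beta, pf = reliability(cc, sp, s_limit=0.15)
    print(f"Pf = {pf:.3f}, beta = {beta:.2f}")
```

Sampling the two marginals independently ignores any Cc–σ'p correlation. A joint sampler would be needed if that dependence matters.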
Where Pith is reading between the lines
- The same iterative-imputation-plus-SHAP workflow could be tried on other engineering domains that rely on sparse tabular records, such as structural health monitoring or environmental sensor networks.
- Direct field measurements of the same soil parameters at additional locations would provide an external check on whether the proxy uncertainty breakdown tracks actual site-to-site variability.
- Embedding similarity patterns might be examined for finer sub-classifications of soil behavior beyond the broad clay-sand split already observed.
Load-bearing premise
The proxy decomposition of predictive uncertainty across context-perturbation classes is taken to isolate the dominant uncertainty source accurately. However, it has not been validated against field variability or against alternative uncertainty-quantification methods.
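The source does not give the decomposition's formula. One plausible reading applies the law of total variance, Var[y] = E_c[Var(y | c)] + Var_c(E[y | c]). It uses predictive samples drawn under several perturbed contexts c within one perturbation class, for example different random subsets of the in-context rows.

Under this reading:
- the within-posterior term is the average predictive variance inside each context;
- the between-context term is the variance of the per-context means.

This is an assumption, not the paper's definition. The class names and sample values below are hypothetical.

```python
from statistics import fmean, pvariance


def decompose_by_context(samples_per_context):
    """Split the predictive variance at one test point into two parts.

    samples_per_context: one list of predictive samples per perturbed context,
    all of equal length (so the identity below holds exactly).
    Returns (within, between, total) using population variances.
    """
    within = fmean(pvariance(s) for s in samples_per_context)     # E_c[Var(y | c)]
    between = pvariance([fmean(s) for s in samples_per_context])  # Var_c(E[y | c])
    total = pvariance([y for s in samples_per_context for y in s])
    return within, between, total


# Hypothetical perturbation classes, each with predictive samples from two contexts.
classes = {
    "row_subsampling": [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]],
    "feature_dropout": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
}
for name, groups in classes.items():
    w, b, t = decompose_by_context(groups)
    print(f"{name}: within={w:.3f} between={b:.3f} total={t:.3f}")
```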
What would settle it
A head-to-head comparison of the TabPFN-derived parameter distributions and the resulting consolidation reliability index against independent laboratory or field measurements from new sites would test whether the reported error reductions persist outside the current dataset.
Original abstract
Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and interpretability matter as much as predictive accuracy. We evaluate TabPFN (Hollmann et al., 2025), a tabular foundation model, and its tabpfn-extensions library on two geotechnical tasks: (1) soil-type classification from N-value and shear-wave velocity data as a controlled illustrative case, and (2) iterative imputation of five mechanical parameters (su, Eu, σ'p, Cc, Cv) in BM/AirportSoilProperties/2/2025. Without retraining, we apply cosine-similarity analysis to TabPFN embeddings, visualise predictive distributions, and compute SHAP attributions. On the regression benchmark we compare TabPFN with mean imputation, linear regression, random forests, XGBoost, and a hierarchical Bayesian model (HBM); introduce a proxy decomposition of predictive uncertainty across context-perturbation classes; and propagate marginal Cc and σ'p distributions through a one-dimensional consolidation model to obtain the reliability index β and the serviceability exceedance probability Pf. Embeddings exhibit label-consistent Clay/Sand grouping; iterative imputation reduces RMSE for all five targets, with TabPFN lowest on four; SHAP attributions are consistent with the Skempton compression-index correlation and the inverse dependence of preconsolidation pressure on water content; and the within-posterior component is the largest in the proxy decomposition. We position the contribution as a worked evaluation workflow that may complement established methods for data-scarce geotechnics, not as algorithmic innovation.
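The SHAP consistency check can be made concrete with a small example. Skempton's (1944) correlation for remoulded clays is commonly quoted as Cc ≈ 0.007(wL − 10), where wL is the liquid limit in percent.

The minimal sketch below computes baseline Shapley values by exact enumeration of feature coalitions. Features outside a coalition take the values of a single reference row, a simplification of the sampling estimators used by SHAP libraries.

The model `toy_cc` and its water-content term are hypothetical. For an additive model like this one, the liquid-limit attribution reduces to 0.007 times the liquid-limit difference from the baseline, which is the pattern a comparison with Skempton's relation looks for.

```python
from itertools import combinations
from math import factorial


def baseline_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_cc(v):
    """Hypothetical compression-index model: v = [liquid_limit_pct, water_content_pct]."""
    liquid_limit, water_content = v
    return 0.007 * (liquid_limit - 10.0) + 0.001 * (water_content - 30.0)


phi = baseline_shapley(toy_cc, x=[60.0, 45.0], baseline=[40.0, 35.0])
print(phi)  # liquid-limit attribution is about 0.007 * (60 - 40) = 0.14
```

Exact enumeration costs 2^(n−1) coalitions per feature. It is therefore practical only for a handful of features, which is why SHAP libraries approximate.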
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates TabPFN and tabpfn-extensions on two geotechnical tasks using sparse borehole data: (1) soil-type classification from N-value and shear-wave velocity as an illustrative case, and (2) iterative imputation of five mechanical parameters (su, Eu, σ'p, Cc, Cv) on the BM/AirportSoilProperties/2/2025 benchmark. It reports four findings:
- TabPFN embeddings show label-consistent groupings.
- Iterative imputation reduces RMSE for all five targets, and TabPFN attains lower RMSE than mean imputation, linear regression, random forests, XGBoost, and a hierarchical Bayesian model (HBM) on four of the five.
- SHAP attributions align with known relations such as the Skempton compression-index correlation.
- A proxy decomposition of predictive uncertainty across context-perturbation classes identifies the within-posterior component as dominant.

Marginal distributions of Cc and σ'p are propagated through a 1D consolidation model to obtain the reliability index β and the serviceability exceedance probability Pf. The contribution is positioned as a worked evaluation workflow for data-scarce geotechnics.
Significance. If the empirical results and uncertainty propagation hold, the manuscript offers a concrete, interpretable workflow for applying tabular foundation models to geotechnical site characterisation. Strengths include direct comparison against multiple baselines, consistency of SHAP values with established correlations, and end-to-end propagation from imputed parameters to reliability metrics. These elements could usefully complement conventional methods in data-limited settings, though the work presents itself as an evaluation rather than an algorithmic advance.
major comments (1)
- [Methods section on proxy uncertainty decomposition and results on propagation to β/Pf] The proxy decomposition of predictive uncertainty across context-perturbation classes (introduced in the methods and used for the within-posterior dominance claim) is presented without calibration against measured field variability (e.g., repeated borehole tests) or against standard alternatives such as bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors. This decomposition is load-bearing for the downstream propagation of Cc and σ'p marginals through the consolidation model to obtain β and Pf; without external validation the claim that it isolates the dominant uncertainty source remains unverified.
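A bootstrap ensemble is one of the alternatives the referee names. In this minimal sketch, a one-feature ordinary-least-squares model stands in for TabPFN. Each ensemble member is fitted to a resample of the training pairs drawn with replacement. The spread of member predictions at a query point is an epistemic-uncertainty estimate that could be set against the proxy decomposition. The data values are hypothetical.

```python
import random
from statistics import fmean, pvariance


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bootstrap_predictions(xs, ys, x_query, n_members=200, seed=0):
    """Predictions at x_query from OLS fits to bootstrap resamples of (x, y) pairs."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_members:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope undefined, draw again
        a, b = fit_line(bx, [ys[i] for i in idx])
        preds.append(a + b * x_query)
    return preds


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]  # hypothetical data
preds = bootstrap_predictions(xs, ys, x_query=7.0)
print(f"mean={fmean(preds):.2f}, variance={pvariance(preds):.4f}")
```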
minor comments (2)
- [Regression benchmark and results sections] Details on data partitioning (train/test splits, cross-validation strategy) and any statistical significance tests for the reported RMSE reductions are not provided, which limits assessment of the robustness of the iterative imputation results.
- [Uncertainty quantification subsection] The manuscript would benefit from explicit sensitivity analysis of the proxy decomposition with respect to the choice of perturbation classes or context size.
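The partitioning and significance details the referee asks for could be reported with a protocol like the minimal sketch below. It is an assumed protocol, not the paper's, and it has three steps:
1. Shuffle the rows into k folds.
2. Compute the per-fold RMSE for two methods.
3. Test the paired per-fold differences with an exact sign-flip permutation test.

With k = 5 folds there are only 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625. The per-fold RMSE values are hypothetical.

```python
import math
import random
from itertools import product
from statistics import fmean


def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def rmse(y_true, y_pred):
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def sign_flip_pvalue(diffs):
    """Exact two-sided paired test: enumerate every sign flip of the per-fold differences."""
    observed = abs(fmean(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    hits = sum(
        abs(fmean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12
        for signs in patterns
    )
    return hits / len(patterns)


# Hypothetical per-fold RMSE values for two methods on the same five folds.
rmse_a = [0.21, 0.19, 0.24, 0.20, 0.22]
rmse_b = [0.25, 0.22, 0.27, 0.21, 0.26]
diffs = [b - a for a, b in zip(rmse_a, rmse_b)]
print(sign_flip_pvalue(diffs))  # 0.0625: only the all-plus and all-minus patterns match
```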
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the potential utility of the evaluation workflow for data-scarce geotechnical applications. We address the single major comment below and will make targeted revisions to clarify the scope and limitations of the proxy uncertainty analysis.
Point-by-point responses
Referee: [Methods section on proxy uncertainty decomposition and results on propagation to β/Pf] The proxy decomposition of predictive uncertainty across context-perturbation classes (introduced in the methods and used for the within-posterior dominance claim) is presented without calibration against measured field variability (e.g., repeated borehole tests) or against standard alternatives such as bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors. This decomposition is load-bearing for the downstream propagation of Cc and σ'p marginals through the consolidation model to obtain β and Pf; without external validation the claim that it isolates the dominant uncertainty source remains unverified.
Authors: We thank the referee for highlighting this important limitation. The proxy decomposition was introduced as a lightweight, training-free heuristic that attributes uncertainty by perturbing the in-context set within TabPFN, rather than as a calibrated or comprehensive uncertainty quantification procedure. We agree that it has not been validated against repeated borehole measurements or benchmarked against bootstrap ensembles, Monte Carlo dropout, or hierarchical Bayesian posteriors, and that this weakens the strength of the within-posterior dominance statement.

In the revised manuscript we will:
(i) expand the Methods section to explicitly label the approach as an exploratory proxy and detail its assumptions and scope;
(ii) insert a dedicated Limitations subsection that acknowledges the lack of external field calibration data and recommends future comparisons with the methods suggested by the referee; and
(iii) revise the Results and Discussion to present the observed dominance of the within-posterior component as a finding under this specific proxy, rather than a verified isolation of the dominant source.

The propagation of marginal Cc and σ'p distributions through the 1D consolidation model to obtain β and Pf relies on the predictive distributions obtained from iterative imputation; the decomposition supplies supplementary insight but is not presented as the sole justification for the reliability metrics. The current BM/AirportSoilProperties dataset does not contain repeated test records that would permit direct calibration, so we will flag this as an important direction for follow-on work.
Revision: partial.
Circularity Check
Minor self-citation not load-bearing; independent baselines and model propagation present
Full rationale
The paper evaluates TabPFN on imputation and classification tasks. It reports RMSE reductions against mean imputation, linear regression, random forests, XGBoost, and HBM, plus SHAP attributions consistent with the Skempton correlation and with the inverse dependence of preconsolidation pressure on water content. It introduces a proxy uncertainty decomposition and propagates marginal distributions through a 1D consolidation model to obtain β and Pf. These steps are presented as external checks rather than reductions to author-defined fitted quantities. The abstract's only citation, Hollmann et al. (2025) (the original TabPFN paper), supports the base model but is not used to justify the central claims or the proxy decomposition. No equations or definitions collapse the reported results to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: cosine similarity on TabPFN embeddings produces label-consistent geological groupings.
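The cosine-similarity check behind this assumption can be sketched as follows. The sketch assumes each sample already has an embedding vector. How embeddings are extracted from TabPFN through tabpfn-extensions is not shown, and the 2-D vectors below are hypothetical. The function compares the mean pairwise similarity within a label against the mean across labels. A clearly higher within-label mean is what "label-consistent grouping" would look like numerically.

```python
import math
from itertools import combinations
from statistics import fmean


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def within_between(embeddings, labels):
    """Mean cosine similarity over same-label pairs and over cross-label pairs."""
    same, cross = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        (same if labels[i] == labels[j] else cross).append(sim)
    return fmean(same), fmean(cross)


# Hypothetical 2-D embeddings; real model embeddings are higher-dimensional.
emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
labels = ["Clay", "Clay", "Sand", "Sand"]
w, c = within_between(emb, labels)
print(f"within-label {w:.3f} vs cross-label {c:.3f}")
```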
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: iterative imputation of five mechanical parameters (su, Eu, σ′p, Cc, Cv) ... SHAP-based feature importance ... cosine-similarity analysis to TabPFN embeddings
- IndisputableMonolith/Foundation/ArrowOfTime.lean: z_monotone_absolute (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: posterior distributions ... proxy decomposition of predictive uncertainty across context-perturbation classes
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- TabPFN-3: Technical Report
  TabPFN-3 delivers state-of-the-art tabular prediction performance on benchmarks up to 1M rows, is up to 20x faster than prior versions, and introduces test-time scaling that beats non-TabPFN models by hundreds of Elo points.
Reference graph
Works this paper leans on
[1] Yongmin Cai, Kok-Kwang Phoon, Qiujing Pan, and Wuzhang Luo. Modifying the tailored clustering enabled regionalization (TCER) framework for outlier site detection and inference efficiency. Engineering Geology, 335:107537, 2024. doi: 10.1016/j.enggeo.2024.107537
[2] Yongmin Cai, Kok-Kwang Phoon, Yu Otake, and Yu Wang. Efficient dictionary learning for constructing quasi-local transformation models. Computers and Geotechnics, 180:107072. doi: 10.1016/j.compgeo.2025.107072
[3] Jianye Ching, Stephen Wu, and Kok-Kwang Phoon. Constructing quasi-site-specific multivariate probability distribution using hierarchical Bayesian model. Journal of Engineering Mechanics, 147(10):04021069, 2021. doi: 10.1061/(ASCE)EM.1943-7889.0001964
[4] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Thomas, Shi Bin Hoo, Eddie Bergman, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025. doi: 10.1038/s41586-024-08328-6
[5] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30, pages 4765–4774, 2017. URL https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
[6] Prajowal Manandhar, Tarek Abdoun, Mostafa E. Mobasher, and Waleed El-Sekelly. Predicting soil properties for the purpose of site characterization using advanced machine learning approaches. Geodata and AI, page 100054, 2026. doi: 10.1016/j.geoai.2026.100054
[7] Yu Otake, Jianye Ching, Taiga Saito, and Kotaro Asano. GEOAI benchmark problems BM/AirportSoilProperties/2/2025. Geodata and AI, 2:100012, 2025. doi: 10.1016/j.geoai.2025.100012
[8] Kok-Kwang Phoon, Jianye Ching, and Takayuki Shuku. Challenges in data-driven site characterization. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards, 16(1):114–126, 2022. doi: 10.1080/17499518.2021.1896005
[9] Kok-Kwang Phoon, Yongmin Cai, and Chong Tang. Geotechnical “facial recognition” challenge. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 11(3):03125001, 2025. doi: 10.1061/AJRUA6.RUENG-1553
[10] Prior Labs. tabpfn-extensions. https://github.com/priorlabs/tabpfn-extensions. Accessed: 2026-03-11
[11] Railway Technical Research Institute. Design Standards for Railway Structures and Commentary: Seismic Design. Maruzen Publishing, Tokyo, Japan, 2012. (In Japanese: Tetsudo Kozobutsu-to Sekkei Hyojun – Dokaisetsu: Taishin Sekkei, September 2012)
[12] Taiga Saito, Yu Otake, and Stephen Wu. Applying a tabular foundation model to geotechnical site characterization. Geodata and AI, page 100040, 2025. doi: 10.1016/j.geoai.2025.100040
[13] Taiga Saito, Yu Otake, Stephen Wu, and Keisuke Yano. Exploring high-order multivariate geotechnical features using the minimum information dependence model. Geodata and AI, 2:100009, 2025. doi: 10.1016/j.geoai.2025.100009
[14] Atma Sharma, Jianye Ching, and Kok-Kwang Phoon. A hierarchical Bayesian similarity measure for geotechnical site retrieval. Journal of Engineering Mechanics, 148(9):04022045. doi: 10.1061/(ASCE)EM.1943-7889.0002145
[15] Atma Sharma, Jianye Ching, and Kok-Kwang Phoon. A spectral algorithm for quasi-regional geotechnical site clustering. Computers and Geotechnics, 163:105624, 2023. doi: 10.1016/j.compgeo.2023.105624
[16] A. W. Skempton. Notes on the compressibility of clays. Quarterly Journal of the Geological Society, 100:119–135, 1944.
[17] Stephen Wu, Jianye Ching, and Kok-Kwang Phoon. Quasi-site-specific soil property prediction using a cluster-based hierarchical Bayesian model. Structural Safety, 99:102253, 2022. doi: 10.1016/j.strusafe.2022.102253
[18] Runhong Zhang, Chongzhi Wu, Anthony T. C. Goh, Thomas Böhlke, and Wengang Zhang. Estimation of diaphragm wall deflections for deep braced excavation in anisotropic clays using ensemble learning. Geoscience Frontiers, 12(1):365–373, 2021. doi: 10.1016/j.gsf.2020.03.003