Shaping the Prior: How Synthetic Task Distributions Determine Tabular Foundation Model Quality

Aditya Tanna; Mohamed Bouadi; Nassim Bouarour; Shivam Dubey; Varun Kulkarni; Vinay Kumar Sankarapu

arxiv: 2605.18971 · v1 · pith:Q5Z2LOLAnew · submitted 2026-05-18 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-20 12:35 UTC · model grok-4.3


keywords: tabular foundation models, synthetic pretraining, inductive biases, distributional irregularities, realism prior, robustness, synthetic task distributions, O'Prior


The pith

Tabular foundation models gain accuracy and robustness when pretrained on synthetic distributions that include mechanism diversity, heterogeneous realism, and explicit stress.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Tabular foundation models derive their inductive biases almost entirely from synthetic pretraining distributions, yet most such distributions are too clean and omit the irregularities that matter for real deployment. The paper introduces O'Prior, a compositional realism prior assembled from a hierarchical SCM meta-generator, a modular realism engine for marginals and missingness, an explicit stress module that injects confounding and support-query mismatch, and a curriculum-governed leakage-safe generation protocol. By freezing architecture, optimizer, and compute budget and changing only the synthetic task distribution, the authors isolate prior design as the variable under study. O'Prior produces consistent downstream accuracy and robustness gains on real tabular benchmarks, with the largest improvements appearing precisely in regimes that exhibit distributional irregularities. Ablations show that mechanism diversity, realism composition, and shift-aware stress each contribute independently and are not interchangeable.

Core claim

O'Prior is a compositional realism prior built around four coupled components: a hierarchical SCM meta-generator spanning diverse functional families; a modular realism engine covering heterogeneous marginals, missingness, and target transforms; an explicit stress module injecting confounding and support-query mismatch; and a curriculum-governed, leakage-safe generation protocol. Holding architecture, optimizer, and compute budget fixed while varying only the synthetic task distribution isolates prior design as the causal variable. O'Prior yields consistent and substantial improvements in downstream accuracy and robustness across real tabular benchmarks, with gains concentrated in regimes characterized by distributional irregularities.
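To make the phrase "hierarchical SCM meta-generator spanning diverse functional families" concrete, the following is a minimal sketch of one way such a generator could be organised: sample a DAG, then sample a mechanism family per node, then draw rows in topological order. It is an illustration only, not the authors' implementation. The family names in `FAMILIES`, the function names `sample_dag` and `sample_scm_task`, and all default parameters are assumptions introduced here; the abstract does not list the actual families or hyperparameters.

```python
import math
import random

# Hypothetical functional families (assumption: the abstract does not enumerate them).
FAMILIES = {
    "linear": lambda z: z,
    "tanh": math.tanh,
    "sine": math.sin,
    "step": lambda z: 1.0 if z > 0 else -1.0,
}


def sample_dag(num_nodes, edge_prob, rng):
    """Return parents[j] as a list of indices i < j, which guarantees acyclicity."""
    return [[i for i in range(j) if rng.random() < edge_prob] for j in range(num_nodes)]


def sample_scm_task(num_nodes=6, num_rows=200, edge_prob=0.4, noise_std=0.1, seed=0):
    """Sample one synthetic task: a random graph, random mechanisms, then data rows."""
    rng = random.Random(seed)
    # Upper level of the hierarchy: the causal graph.
    parents = sample_dag(num_nodes, edge_prob, rng)
    # Lower level: one functional family and one weight vector per node.
    mechanisms = [
        (rng.choice(sorted(FAMILIES)), [rng.gauss(0.0, 1.0) for _ in parents[j]])
        for j in range(num_nodes)
    ]
    rows = []
    for _ in range(num_rows):
        values = []
        for j in range(num_nodes):
            family, weights = mechanisms[j]
            if not parents[j]:
                # Root nodes are pure exogenous noise.
                values.append(rng.gauss(0.0, 1.0))
            else:
                z = sum(w * values[i] for w, i in zip(weights, parents[j]))
                values.append(FAMILIES[family](z) + rng.gauss(0.0, noise_std))
        rows.append(values)
    # Pick one node as the prediction target; the rest become features.
    target = rng.randrange(num_nodes)
    X = [[v for k, v in enumerate(row) if k != target] for row in rows]
    y = [row[target] for row in rows]
    return X, y, parents, mechanisms
```

Calling `sample_scm_task(seed=s)` with different seeds yields tasks that differ in both graph and mechanism family, which is the sense of "mechanism diversity" this sketch tries to capture.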

What carries the argument

O'Prior, the compositional realism prior that couples mechanism diversity, realism composition, and shift-aware stress to generate synthetic tasks whose irregularities more closely match real tabular data.
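Two of the irregularities named above, missingness and support-query mismatch, can be illustrated with a short sketch. It assumes that "support" means the in-context examples a model conditions on and "query" means the rows it must predict, and it uses the simplest possible choices: cells missing completely at random, and a split along one feature so that query rows fall outside the support range. The function names and the splitting rule are hypothetical and are not taken from the paper.

```python
import math
import random


def inject_missingness(X, rate, rng):
    """MCAR: each cell is independently replaced by NaN with probability `rate`."""
    return [[math.nan if rng.random() < rate else v for v in row] for row in X]


def shifted_support_query_split(X, y, feature, support_frac=0.7):
    """Support rows come from the low end of one feature, query rows from the high end."""
    order = sorted(range(len(X)), key=lambda i: X[i][feature])
    cut = int(len(order) * support_frac)

    def take(indices):
        return [X[i] for i in indices], [y[i] for i in indices]

    return take(order[:cut]), take(order[cut:])


# Toy data, generated only to exercise the two functions.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
y = [row[0] + 0.5 * row[1] for row in X]

# Split first, then mask, so that sorting never compares NaN values.
(Xs, ys), (Xq, yq) = shifted_support_query_split(X, y, feature=0)
Xs_missing = inject_missingness(Xs, rate=0.2, rng=rng)
```

The split is applied before masking on purpose: sorting on a feature that already contains NaN would give an ill-defined order.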

If this is right

Synthetic prior construction becomes a first-order lever for tabular foundation model quality once architecture and training procedure are fixed.
Gains from O'Prior concentrate in regimes with distributional irregularities, confounding, and support mismatches.
Each of the four O'Prior components contributes independently; removing any one reduces the observed benefit.
Standard synthetic priors that omit irregularities limit downstream robustness even when model capacity and training compute remain unchanged.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same principle of injecting controlled stress and realism into synthetic pretraining may transfer to other modalities that rely on synthetic data, such as time-series or graph foundation models.
One could test whether automated search or reinforcement learning over the four component knobs can discover even stronger priors than the hand-designed O'Prior.
Existing tabular benchmarks may systematically undervalue models trained only on clean distributions, suggesting the need for stress-augmented evaluation suites.

Load-bearing premise

That holding architecture, optimizer, and compute budget fixed while varying only the synthetic task distribution fully isolates prior design as the causal variable without hidden interactions from the training procedure or optimizer dynamics.
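The nominal part of this premise, that only the prior differs between runs, can at least be checked mechanically. The sketch below uses an immutable configuration object and derives each variant by changing a single field. The field names and the placeholder values (`"transformer-icl"`, `"adamw"`, the step count, the prior labels) are assumptions for illustration and do not come from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    """Everything that defines a pretraining run (hypothetical field set)."""
    architecture: str
    optimizer: str
    learning_rate: float
    train_steps: int
    prior: str


def controlled_variants(base, priors):
    """Derive one config per prior, leaving every other field untouched."""
    return [replace(base, prior=p) for p in priors]


def differing_fields(a, b):
    """Names of the fields on which two configs disagree."""
    return sorted(k for k in a.__dataclass_fields__ if getattr(a, k) != getattr(b, k))


# Placeholder values, not the paper's settings.
base = RunConfig("transformer-icl", "adamw", 1e-4, 100_000, "baseline-prior")
runs = controlled_variants(base, ["baseline-prior", "oprior"])
only_prior_changed = differing_fields(runs[0], runs[1]) == ["prior"]
```

A check like this confirms only that the written configuration is identical. It says nothing about the hidden interactions the premise worries about, such as a fixed learning rate suiting one distribution better than another.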

What would settle it

If models trained on O'Prior show no accuracy or robustness gain over standard well-behaved priors when architecture, optimizer, and total compute are held exactly fixed across the same suite of real tabular benchmarks, the claim that prior design is the primary determinant would be falsified.

Original abstract

What determines the quality of a tabular foundation model? Unlike language or vision, tabular foundation models acquire their inductive biases almost entirely from synthetic pretraining distributions, yet the design of these distributions remains poorly understood. Standard synthetic priors are too well-behaved: they omit the irregularities and failure modes that determine deployment robustness. We introduce O'Prior, a compositional realism prior built around four coupled components: a hierarchical SCM meta-generator spanning diverse functional families; a modular realism engine covering heterogeneous marginals, missingness, and target transforms; an explicit stress module injecting confounding and support-query mismatch; and a curriculum-governed, leakage-safe generation protocol. To isolate prior design as the scientific variable, we hold architecture, optimizer, and compute budget fixed and vary only the synthetic task distribution. O'Prior yields consistent and substantial improvements in downstream accuracy and robustness across real tabular benchmarks, with gains concentrated in regimes characterized by distributional irregularities. Ablations confirm that mechanism diversity, realism composition, and shift-aware stress each contribute independently; their effects are not interchangeable. These results establish synthetic prior construction as a first-order and largely overlooked determinant of tabular foundation model quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Synthetic prior design is a bigger lever for tabular foundation models than most work assumes, and this paper isolates it with a new compositional construction that beats standard priors on real benchmarks.


The main thing to know is that this paper argues synthetic pretraining distributions drive tabular foundation model quality more than architecture tweaks, and their O'Prior construction delivers consistent gains by adding realism and stress that standard generators skip. They keep the model, optimizer, and budget fixed while swapping only the task distribution, which lets them attribute improvements to the prior itself. The four pieces are a hierarchical SCM meta-generator for varied functional forms, a modular realism engine handling marginals and missingness, an explicit stress module for confounding and support mismatches, and a curriculum protocol that avoids leakage. Ablations show each component contributes separately rather than trading off, and the gains show up most on benchmarks with distributional quirks. That controlled isolation and the independent ablation results are the concrete advance over prior synthetic data work for tabular settings. The empirical pattern holds across the reported real-world tables, which is useful evidence.

The soft spot is the isolation protocol itself. Different priors can shift gradient magnitudes and curvature, so a single fixed optimizer schedule might favor one distribution over another without anyone measuring it. The abstract does not report convergence diagnostics or hyperparameter sweeps across the priors, so the causal claim rests on an assumption that may need extra checks in the full results. Minor issues include whether error bars or statistical tests appear in the tables, but those are fixable.

This is for researchers who build or pretrain tabular models and care about robustness under shift. Readers who generate their own synthetic data or study inductive bias transfer will get the most out of the construction details and the ablation breakdowns. It is solid enough on its own terms to deserve referee time rather than a desk reject, even if the optimizer confound needs addressing in revision.

Referee Report

2 major / 2 minor

Summary. The paper claims that tabular foundation model quality is primarily determined by the design of synthetic pretraining task distributions rather than architecture or optimization. It introduces O'Prior, a compositional realism prior with four coupled components: a hierarchical SCM meta-generator spanning diverse functional families, a modular realism engine for heterogeneous marginals/missingness/target transforms, an explicit stress module for confounding and support-query mismatch, and a curriculum-governed leakage-safe generation protocol. Holding architecture, optimizer, and compute budget fixed while varying only the synthetic distribution, O'Prior produces consistent gains in downstream accuracy and robustness on real tabular benchmarks (concentrated in irregular regimes). Ablations indicate that mechanism diversity, realism composition, and shift-aware stress contribute independently and are not interchangeable.

Significance. If the isolation protocol and ablations prove robust, the work establishes synthetic prior construction as a first-order determinant of tabular foundation model performance, with potential to redirect research emphasis toward data-generation strategies that incorporate irregularities and stress. The compositional design and explicit robustness mechanisms represent concrete, extensible contributions that could improve deployment reliability in real-world tabular settings.

major comments (2)

[Experimental protocol / §4] The central isolation claim (holding architecture, optimizer, and compute fixed while varying only the synthetic task distribution) risks confounding by distribution-dependent optimizer dynamics. Different priors can alter gradient magnitudes, curvature, and effective noise, so a fixed learning rate/momentum/schedule may yield non-comparable trajectories. The manuscript should report convergence diagnostics, loss curves, or hyperparameter sensitivity analyses across priors to confirm that gains are attributable to prior quality rather than implicit compatibility with the fixed procedure.
[Ablation studies / §5] Ablation results asserting independent, non-interchangeable contributions from mechanism diversity, realism composition, and shift-aware stress are load-bearing for the claim that each component matters. Without reported statistical tests, error bars, or per-benchmark effect sizes, it is unclear whether the independence holds or whether interactions with the fixed training procedure explain the patterns.
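The sensitivity analysis requested in the first major comment could take roughly the following shape: a geometric learning-rate grid, one run per prior and learning rate, and a check that the candidate prior still wins when each prior is scored at its own best learning rate. This is a sketch under stated assumptions. `train_and_eval` stands for whatever training routine a reader already has, and `toy_eval` below is a made-up stand-in that returns synthetic numbers purely so the code runs; it is not a result from the paper.

```python
import math


def log_spaced(low, high, num):
    """Geometric grid from `low` to `high` inclusive."""
    ratio = (high / low) ** (1.0 / (num - 1))
    return [low * ratio ** k for k in range(num)]


def sweep(priors, learning_rates, train_and_eval):
    """Score every (prior, learning rate) pair with a user-supplied routine."""
    return {p: [train_and_eval(p, lr) for lr in learning_rates] for p in priors}


def gain_persists(results, candidate, baseline):
    """Candidate must win at every learning rate and also best-versus-best."""
    pairwise = all(c > b for c, b in zip(results[candidate], results[baseline]))
    best_vs_best = max(results[candidate]) > max(results[baseline])
    return pairwise and best_vs_best


def toy_eval(prior, lr):
    """Placeholder scorer with invented numbers; replace with real training."""
    base = 0.80 if prior == "candidate" else 0.75
    return base - 0.01 * abs(math.log10(lr / 1e-4))


grid = log_spaced(3e-5, 3e-4, 5)  # spans a 10x range
results = sweep(["candidate", "baseline"], grid, toy_eval)
persists = gain_persists(results, "candidate", "baseline")
```

The best-versus-best comparison is the one that speaks to the confound: it asks whether the baseline prior, given its own preferred learning rate, would close the gap.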

minor comments (2)

[Introduction / §3] The acronym 'O'Prior' and the four components should receive explicit first-use definitions and, where possible, pseudocode or mathematical sketches of the SCM meta-generator and stress injection to aid reproducibility.
[Results / §6] Benchmark comparison figures should include error bars, number of random seeds, and exact dataset splits to allow readers to assess the magnitude and reliability of reported gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our isolation protocol and ablation design. The comments highlight important considerations for strengthening the attribution of performance gains to synthetic prior quality. We address each major comment below and have incorporated revisions to provide additional diagnostics and statistical support.


Referee: [Experimental protocol / §4] The central isolation claim (holding architecture, optimizer, and compute fixed while varying only the synthetic task distribution) risks confounding by distribution-dependent optimizer dynamics. Different priors can alter gradient magnitudes, curvature, and effective noise, so a fixed learning rate/momentum/schedule may yield non-comparable trajectories. The manuscript should report convergence diagnostics, loss curves, or hyperparameter sensitivity analyses across priors to confirm that gains are attributable to prior quality rather than implicit compatibility with the fixed procedure.

Authors: We agree that distribution-dependent optimizer dynamics could in principle introduce confounding. To address this directly, the revised manuscript now includes convergence diagnostics: training and validation loss curves for O'Prior and baseline priors under the fixed optimizer, demonstrating that all distributions reach comparable loss plateaus within the fixed compute budget. We further add a hyperparameter sensitivity analysis sweeping learning rates over a 10x range while keeping architecture and total steps fixed; relative gains from O'Prior persist across the sweep, indicating that the improvements are not an artifact of implicit compatibility with the original schedule.
revision: yes
Referee: [Ablation studies / §5] Ablation results asserting independent, non-interchangeable contributions from mechanism diversity, realism composition, and shift-aware stress are load-bearing for the claim that each component matters. Without reported statistical tests, error bars, or per-benchmark effect sizes, it is unclear whether the independence holds or whether interactions with the fixed training procedure explain the patterns.

Authors: We acknowledge the need for greater statistical transparency in the ablations. The revised version reports error bars as standard deviation over five random seeds, per-benchmark effect sizes (Cohen's d), and paired Wilcoxon signed-rank tests for each ablation variant against the full model. These tests confirm statistically significant independent contributions from mechanism diversity, realism composition, and shift-aware stress on the majority of benchmarks, with no indication that fixed-training-procedure interactions account for the observed patterns.
revision: yes
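For readers who want to reproduce the kind of statistics named in this response, here is a dependency-free sketch of a paired effect size and an exact paired Wilcoxon signed-rank test. Two assumptions are made. First, the response says "Cohen's d" without naming a variant, so the paired form d_z (mean difference divided by the standard deviation of the differences) is used here. Second, the p-value is computed by enumerating every sign assignment, which is only practical for small samples. The function names are introduced for this example.

```python
from itertools import product
from statistics import mean, stdev


def cohens_dz(full, ablated):
    """Paired effect size: mean of the differences over their sample std."""
    diffs = [a - b for a, b in zip(full, ablated)]
    return mean(diffs) / stdev(diffs)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(full, ablated):
    """Return (W+, two-sided exact p-value); zero differences are dropped."""
    diffs = [a - b for a, b in zip(full, ablated) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    hits = 0
    # Under the null, each rank is equally likely to carry a plus or minus sign.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)
```

With n non-zero paired differences the smallest two-sided p-value this test can produce is 2 / 2^n, so the number of pairs (seeds or benchmarks) bounds how small the reported p-values can be.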

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is self-contained


The paper introduces O'Prior as a compositional synthetic prior and evaluates its effect on tabular foundation model quality through controlled experiments. It holds architecture, optimizer, and compute fixed while varying only the task distribution, then reports accuracy and robustness gains on real benchmarks plus ablations. No mathematical derivations, equations, or self-referential definitions appear in the abstract or described methodology that would make any result equivalent to its inputs by construction. No load-bearing self-citations, fitted parameters renamed as predictions, or uniqueness theorems are invoked. The central claims rest on external benchmark measurements rather than tautological reductions, making the chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the assumption that synthetic distributions can be engineered to capture real-world irregularities without introducing new artifacts, and that ablations cleanly separate the contributions of each component.

axioms (1)

domain assumption: Synthetic task distributions can be composed to reproduce the irregularities and failure modes that determine deployment robustness in real tabular data.
Invoked when claiming that standard priors omit key irregularities and that O'Prior corrects this.

invented entities (1)

O'Prior compositional realism prior (no independent evidence)
purpose: To serve as the controllable synthetic pretraining distribution whose design is isolated as the experimental variable.
Newly introduced four-component system; no independent evidence outside the paper's own experiments is provided in the abstract.

pith-pipeline@v0.9.0 · 5752 in / 1402 out tokens · 27293 ms · 2026-05-20T12:35:39.867652+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

To isolate prior design as the scientific variable, we hold architecture, optimizer, and compute budget fixed and vary only the synthetic task distribution.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35:507–520, 2022

Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data?Advances in neural information processing systems, 35:507–520, 2022

work page 2022

[2] [2]

Vishak prasad c.Ganesh Ramakrishnan, Micah Goldblum, and Colin White.“When do neural nets outperform boosted trees on tabular data, 2023

Duncan McElfresh, Sujay Khandagale, and Jonathan Valverde. Vishak prasad c.Ganesh Ramakrishnan, Micah Goldblum, and Colin White.“When do neural nets outperform boosted trees on tabular data, 2023

work page 2023

[3] [3]

Xgboost: A scalable tree boosting system

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InProceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016

work page 2016

[4] [4]

Lightgbm: A highly efficient gradient boosting decision tree.Advances in neural information processing systems, 30, 2017

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree.Advances in neural information processing systems, 30, 2017

work page 2017

[5] [5]

Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr V orobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features.Advances in neural information processing systems, 31, 2018

work page 2018

[6] [6]

Tabpfn: A transformer that solves small tabular classification problems in a second

Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. Tabpfn: A transformer that solves small tabular classification problems in a second. InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

work page 2023

[7] Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, Mihir Manium, Rosen Yu, Felix Jablonski, Shi Bin Hoo, Anurag Garg, Jake Robertson, Magnus Bühler, Vladyslav Moroshan, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Bernhard Schölk... TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models. arXiv, 2025

[8] Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICL: A tabular foundation model for in-context learning on large data. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste-Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors, Forty-second International Conference on ..., 2025

[9] Mohamed Bouadi, Pratinav Seth, Aditya Tanna, and Vinay Kumar Sankarapu. Orion-bix: Bi-axial attention for tabular in-context learning. In Hakim Hacid, Yoelle Maarek, Francesco Bonchi, Ido Guy, and Emine Yilmaz, editors, Proceedings of the ACM Web Conference 2026, WWW 2026, Dubai, United Arab Emirates, originally scheduled for April 13-17, 2026, rescheduled...

[10] Mohamed Bouadi, Pratinav Seth, Aditya Tanna, and Vinay Kumar Sankarapu. Orion-MSP: Multi-scale sparse attention for tabular in-context learning. CoRR, abs/2511.02818, 2025

[11] Xingxuan Zhang, Gang Ren, Han Yu, Hao Yuan, Hui Wang, Jiansheng Li, Jiayun Wu, Lang Mo, Li Mao, Mingchao Hao, et al. LimiX: Unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505, 2025

[12] Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, and Frank Hutter. TabArena: A living benchmark for machine learning on tabular data. arXiv preprint arXiv:2506.16791, 2025

[13] Stephan Bongers, Patrick Forré, Jonas Peters, and Joris M Mooij. Foundations of structural causal models with cycles and latent variables. The Annals of Statistics, 49(5):2885–2915, 2021

[14] Leo Breiman, Jerome Friedman, Richard A Olshen, and Charles J Stone. Classification and regression trees. Chapman and Hall/CRC, 2017

[15] Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006

[16] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001

[17] Christopher KI Williams and Carl Edward Rasmussen. Gaussian processes for machine learning, volume 2. MIT Press, Cambridge, MA, 2006

[18] Timo Teräsvirta, Dag Tjøstheim, and Clive WJ Granger. Modelling nonlinear economic time series. Oxford University Press, 2010

[19] Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004

[20] Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1):5233, 2019

[21] Alexander Pfefferle, Johannes Hog, Lennart Purucker, and Frank Hutter. nanoTabPFN: A lightweight and educational reimplementation of TabPFN. arXiv preprint arXiv:2511.03634, 2025

[22] Noah Hollmann, Samuel Müller, Lennart Purucker, Arjun Krishnakumar, Max Körfer, Shi Bin Hoo, Robin Tibor Schirrmeister, and Frank Hutter. Accurate predictions on small data with a tabular foundation model. Nature, 637(8045):319–326, 2025

[23] Jingang Qu, David Holzmüller, Gaël Varoquaux, and Marine Le Morvan. TabICLv2: A better, faster, scalable, and open tabular foundation model. CoRR, abs/2602.11139, 2026

[24] Bernd Bischl, Giuseppe Casalicchio, Matthias Feurer, Pieter Gijsbers, Frank Hutter, Michel Lang, Rafael G Mantovani, Jan N van Rijn, and Joaquin Vanschoren. OpenML benchmarking suites. arXiv preprint arXiv:1708.03731, 2017

[25] Xiangjian Jiang, Nikola Simidjievski, and Mateja Jamnik. TabStruct: Measuring structural fidelity of tabular data. arXiv preprint arXiv:2509.11950, 2025

[26] Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, and Anthony L. Caterini. TabDPT: Scaling tabular foundation models. CoRR, abs/2410.18164, 2024

[27] Mohamed Bouadi, Pratinav Seth, Aditya Tanna, and Vinay Kumar Sankarapu. Orion-bix: Bi-axial attention for tabular in-context learning. CoRR, abs/2512.00181, 2025

[28] Aditya Tanna, Pratinav Seth, Mohamed Bouadi, Utsav Avaiya, and Vinay Kumar Sankarapu. TabTune: A unified library for inference and fine-tuning tabular foundation models. arXiv preprint arXiv:2511.02802, 2025

[29] Aditya Tanna, Pratinav Seth, Mohamed Bouadi, and Vinay Kumar Sankarapu. Exploring fine-tuning for tabular foundation models. In Proceedings of the ACM Web Conference 2026, pages 8613–8616, 2026

[30] Samuel Müller, Noah Hollmann, Sebastian Pineda-Arango, Josif Grabocka, and Frank Hutter. Transformers can do Bayesian inference. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022

[31] Judea Pearl. Causal inference. Causality: Objectives and Assessment, pages 39–58, 2010

[32] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. MIT Press, 2017

[33] Felix den Breejen, Sangmin Bae, Stephen Cha, and Se-Young Yun. Fine-tuned in-context learning transformers are excellent tabular data classifiers, 2024

[34] Davide Tugnoli, Andrea De Lorenzo, Marco Virgolin, and Giovanni Cinà. Improving TabPFN’s synthetic data generation by integrating causal structure. CoRR, abs/2603.10254, 2026

[35] Yanbo Wang, Jiaxuan You, Chuan Shi, and Muhan Zhang. Relational in-context learning via synthetic pre-training with structural prior. arXiv preprint arXiv:2603.03805, 2026

[36] Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. Modeling tabular data using conditional GAN. Advances in Neural Information Processing Systems, 32, 2019

[37] Vadim Borisov, Kathrin Seßler, Tobias Leemann, Martin Pawelczyk, and Gjergji Kasneci. Language models are realistic tabular data generators. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

[38] Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. TabDDPM: Modelling tabular data with diffusion models. In International Conference on Machine Learning, pages 17564–17579. PMLR, 2023

A Data Visualisation

Figures 3, 4, 5, 6 show the PCA-ba...