LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models

Guillermo Gomez-Trenado; Isaac Triguero; Oroel Ipas; Roc\'io Romero-Zaliz

arxiv: 2605.27254 · v1 · pith:AO2LOWZPnew · submitted 2026-05-26 · 💻 cs.LG · cs.AI

LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models

Oroel Ipas , Guillermo Gomez-Trenado , Roc\'io Romero-Zaliz , Isaac Triguero This is my paper

Pith reviewed 2026-06-29 18:19 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords tabular foundation modelscontext selectionlow-label learningunsupervised embeddingsmedoid selectionPrior-Fitted Networkcold-startlatent geometry

0 comments

The pith

Selecting context using latent embeddings from an unsupervised Prior-Fitted Network outperforms raw-feature and random selection for low-label tabular foundation models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that context selection for tabular foundation models in cold-start low-label settings should use the latent geometry induced by embeddings from an unsupervised Prior-Fitted Network rather than the original heterogeneous tabular features. Raw space lacks a reliable metric because of mixed types, scales, and nonlinear interactions, so selection there often falls below random as budgets grow. LUCoS selects representative medoids in the latent space and ranks first in mean AUC, accuracy, and F1 across 67 OpenML-CC18 datasets and six budgets. A gain decomposition shows that enforcing coverage drives most of the benefit at the smallest budgets while the choice of representation space dominates at larger budgets. The central insight is that reliable unsupervised context selection depends primarily on defining representativeness inside a meaningful geometry.

Core claim

LUCoS replaces the unreliable geometry of the original heterogeneous tabular feature space with the latent geometry induced by embeddings from an unsupervised Prior-Fitted Network and selects representative medoids from this space to serve as the labeled context. Across 67 OpenML-CC18 datasets and six low-label budgets, this yields first rank under mean AUC, ACC, and F1, with conclusions stable across metrics and dataset-level checks. Gain decomposition reveals that at the smallest budgets the main benefit comes from enforcing coverage; as the budget increases, the decisive factor becomes the representation space in which coverage is measured. LUCoS thereby mitigates the failures of original

What carries the argument

Latent geometry induced by embeddings from an unsupervised Prior-Fitted Network, used to select representative medoids as context

If this is right

Original-space selection performs below random on the majority of datasets as the labeling budget grows.
Enforcing coverage via medoid selection supplies the primary performance gain at the smallest budgets.
The choice of representation space becomes the decisive performance factor once budgets are moderately larger.
Reliable unsupervised context selection depends less on selector sophistication than on using a meaningful representation geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent-geometry approach could supply a general metric for instance selection in other heterogeneous data settings where foundation models already produce embeddings.
Pre-trained unsupervised networks may serve as label-free sources of representativeness metrics that transfer across tabular tasks without retraining.
The budget-dependent shift from coverage to representation space suggests a natural schedule for switching selection strategies as more labels become available.

Load-bearing premise

The latent geometry induced by embeddings from an unsupervised Prior-Fitted Network provides a meaningfully better metric for selecting representative medoids than the original heterogeneous tabular feature space.

What would settle it

If raw-space medoid selection or random selection achieves equal or higher mean AUC than LUCoS on a majority of the 67 datasets at multiple low-label budgets under the same protocol, the claimed advantage of the latent geometry would be falsified.

Figures

Figures reproduced from arXiv: 2605.27254 by Guillermo Gomez-Trenado, Isaac Triguero, Oroel Ipas, Roc\'io Romero-Zaliz.

**Figure 1.** Figure 1: Overview of LUCoS. The method consists of four stages: (i) embedding the unlabeled training data into a high-dimensional latent representation space, (ii) selecting representative instances using a geometric criterion, (iii) mapping the selected instances back to the original tabular space and querying the oracle to label them, and (iv) using the labeled instances as the in-context set of a supervised TFM … view at source ↗

**Figure 2.** Figure 2: Performance of different classification models versus number of selected training instances [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Paired AUC difference ∆AUC = AUCLUCoS − AUCOriginal-KM per labeling budget (n = 67 datasets). Boxes are split at zero (blue: LUCoS advantage; orange: Original-KM advantage); solid and dashed lines mark the median and mean. Wilcoxon signed-rank p-values are annotated above each boxplot (bold: p < 0.05). The advantage is near zero and non-significant at 1C–2C, shifts positively and reaches significance at 4C… view at source ↗

**Figure 4.** Figure 4: Left: Decomposition of the LUCoS gain over Random Unsupervised into ∆Sel (blue bars) and ∆Repr (orange hatched bars), shown as dataset-paired means per budget. Stars indicate Wilcoxon significance against zero: ∗p<0.05, ∗∗p<0.01, ∗∗∗p<0.001. The dashed vertical line marks the 2C–4C crossover, beyond which ∆Sel is non-significant and ∆Repr becomes the sole significant component. Right: Distribution of ∆Repr… view at source ↗

read the original abstract

Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, context selection directly determines predictive performance. Supervised oracle experiments show that carefully chosen labeled context sets can strongly outperform random selection under the same labeling budget. However, the cold-start setting, where instances must be selected before any labels are available, has received little attention in the TFM literature. This problem is fundamentally geometric. In vision and language, foundation models induce embedding spaces where simple geometric selection methods are effective. In contrast, tabular instance selection has so far been performed predominantly in the original tabular space, which lacks a natural metric; heterogeneous types, mixed scales, and nonlinear interactions make raw-space distances unreliable for context construction, and original-space selection falls below random on the majority of datasets as the budget grows. We propose LUCoS (Latent Unsupervised Context Selection), which replaces raw-feature geometry with the latent geometry induced by embeddings from an unsupervised Prior-Fitted Network (PFN) and selects representative medoids as context. Evaluated on 67 OpenML-CC18 datasets across six low-label budgets, LUCoS ranks first under mean AUC, ACC, and F1, with conclusions stable across metrics and dataset-level robustness checks. A gain decomposition reveals a simple mechanism: at the smallest budgets, the main benefit comes from enforcing coverage; as the budget increases, the decisive factor becomes the representation space in which coverage is measured. LUCoS mitigates failures of original feature space selection, showing that reliable unsupervised context selection depends less on selector sophistication than on defining representativeness in a meaningful representation geometry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LUCoS shows medoid selection in unsupervised PFN embeddings beats raw space and random on 67 datasets for low-label tabular context, but the geometry advantage is asserted more than dissected.

read the letter

The core contribution is replacing raw tabular distances with medoids drawn from embeddings of an unsupervised Prior-Fitted Network for cold-start context selection. On 67 OpenML-CC18 datasets and six labeling budgets, this ranks first on mean AUC, accuracy, and F1, and the gain split shows coverage dominates at the smallest budgets while the representation space takes over as the budget grows. Raw-space selection falling below random on most datasets is a clear, useful negative result.

The evaluation scale and the decomposition are the strongest parts; they give concrete evidence that the choice of space matters more than the selector once you move beyond tiny budgets. The method is simple and directly addresses a practical gap in tabular foundation model deployment.

The soft spot is the limited probing of why the PFN latent geometry works better. The abstract states that raw space lacks a natural metric and that PFN embeddings fix the problem, but there is no analysis of what structure the embeddings preserve, no ablation against other unsupervised embeddings, and no check on whether the advantage is specific to PFN or just any dimensionality reduction. The claim that representation space becomes decisive therefore rests on the performance numbers rather than direct evidence about the geometry itself.

This is for researchers working on tabular foundation models and unsupervised instance selection. It supplies a workable method plus large-scale empirical support, so it deserves peer review even though the embedding analysis could be tightened.

Referee Report

3 major / 2 minor

Summary. The paper proposes LUCoS for unsupervised context selection in low-label tabular learning with models like TabPFN. It replaces raw-feature geometry with latent embeddings from an unsupervised Prior-Fitted Network, selects representative medoids as context, and reports that this ranks first in mean AUC, ACC, and F1 across 67 OpenML-CC18 datasets and six labeling budgets. A gain decomposition attributes benefits to coverage enforcement at the smallest budgets and to the representation space at larger budgets, with the central claim that reliable selection depends on defining representativeness in a meaningful geometry rather than selector sophistication.

Significance. If the results hold after verification of the embedding geometry and evaluation details, the work would establish that unsupervised PFN-induced spaces can mitigate failures of heterogeneous tabular feature spaces for cold-start context construction, providing a practical advance for tabular foundation models and highlighting representation choice over algorithmic complexity in instance selection.

major comments (3)

[Abstract / gain decomposition] Abstract and gain decomposition: the claim that 'as the budget increases, the decisive factor becomes the representation space' is load-bearing for the central argument that PFN geometry is meaningfully superior, yet the manuscript provides no comparison of PFN embeddings against other unsupervised embeddings (e.g., autoencoders, PCA, or random projections) or analysis of embedding properties such as clustering quality or downstream relevance; this leaves open whether gains arise from the specific PFN space or generic effects of dimensionality reduction.
[Evaluation] Evaluation section: the reported top ranking across 67 datasets and stability across metrics is presented without details on hyperparameter choices for the unsupervised PFN, statistical testing procedures, or explicit checks against post-hoc dataset exclusions, which directly affects verifiability of the robustness claims given the low-label regime.
[Motivation / baseline comparison] § on original-space baseline: the statement that original-space selection 'falls below random on the majority of datasets as the budget grows' is used to motivate the latent approach, but without quantitative per-dataset breakdowns or controls for feature preprocessing, it is unclear whether the raw-space failure is due to heterogeneous metrics or other implementation factors.

minor comments (2)

[Method] Notation for the medoid selection objective and the unsupervised PFN embedding function should be introduced with explicit equations rather than prose descriptions to improve reproducibility.
[Results] Figure captions for the gain decomposition plots should include the exact formulas used for the coverage vs. representation attribution to allow readers to replicate the analysis.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the central claims and improve verifiability. We address each major point below, proposing revisions to strengthen the manuscript where the concerns are valid.

read point-by-point responses

Referee: [Abstract / gain decomposition] Abstract and gain decomposition: the claim that 'as the budget increases, the decisive factor becomes the representation space' is load-bearing for the central argument that PFN geometry is meaningfully superior, yet the manuscript provides no comparison of PFN embeddings against other unsupervised embeddings (e.g., autoencoders, PCA, or random projections) or analysis of embedding properties such as clustering quality or downstream relevance; this leaves open whether gains arise from the specific PFN space or generic effects of dimensionality reduction.

Authors: We agree that direct comparisons to other unsupervised embeddings would better isolate the contribution of the PFN-induced geometry. The manuscript emphasizes PFN because it is a tabular foundation model whose unsupervised variant provides a natural latent space without requiring labels, but we acknowledge the need for broader controls. In revision we will add experiments comparing PFN embeddings against autoencoders, PCA, and random projections on the same 67 datasets, including metrics for clustering quality (e.g., silhouette score) and downstream relevance to the selection task. revision: yes
Referee: [Evaluation] Evaluation section: the reported top ranking across 67 datasets and stability across metrics is presented without details on hyperparameter choices for the unsupervised PFN, statistical testing procedures, or explicit checks against post-hoc dataset exclusions, which directly affects verifiability of the robustness claims given the low-label regime.

Authors: We will expand the evaluation section to include: (i) the exact hyperparameter settings used for the unsupervised PFN (inherited from the original TabPFN training protocol with no additional tuning); (ii) the statistical testing procedure (Wilcoxon signed-rank tests with Holm correction across datasets and budgets); and (iii) explicit confirmation that no datasets were excluded post-hoc—all 67 OpenML-CC18 datasets were retained. These details will be added to the main text and appendix. revision: yes
Referee: [Motivation / baseline comparison] § on original-space baseline: the statement that original-space selection 'falls below random on the majority of datasets as the budget grows' is used to motivate the latent approach, but without quantitative per-dataset breakdowns or controls for feature preprocessing, it is unclear whether the raw-space failure is due to heterogeneous metrics or other implementation factors.

Authors: The original-space baseline used standard preprocessing (one-hot encoding for categoricals followed by z-score normalization on numerical features) before distance computation. The aggregate result that it falls below random on the majority of datasets is driven by the lack of a natural metric in heterogeneous tabular spaces. We will add per-dataset breakdowns (showing the fraction below random at each budget) to the appendix and clarify the preprocessing steps in the main text. We maintain that the heterogeneous metrics are the primary cause, but the additional breakdowns will improve transparency. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation stands independently

full rationale

The paper describes an empirical method (LUCoS) that applies an off-the-shelf unsupervised PFN to induce embeddings, then performs medoid selection in that space, with performance measured on 67 external OpenML datasets. No equations, fitted parameters, or self-referential definitions are present that would make reported gains equivalent to the inputs by construction. The gain decomposition is post-hoc analysis of experimental results rather than a derivation. No load-bearing self-citations, uniqueness theorems, or smuggled ansatzes appear; the central claim rests on direct comparison to random and original-space baselines, which is externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that PFN embeddings induce a geometry superior for medoid selection, but no details are given.

pith-pipeline@v0.9.1-grok · 5843 in / 1185 out tokens · 38145 ms · 2026-06-29T18:19:24.314422+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 5 canonical work pages · 3 internal anchors

[1]

Position: Why Tabular Foundation Models Should Be a Research Priority,

B. van Breugel and M. van der Schaar, “Position: Why Tabular Foundation Models Should Be a Research Priority,” inProceedings of the 41st International Conference on Machine Learning (ICML), 2024

2024
[2]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

L. Grinsztajn, K. Flöge, O. Key, F. Birkel, P. Jund, B. Roof, B. Jäger, D. Safaric, S. Alessi, A. Hayler,et al., “TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models,” arXiv preprint arXiv:2511.08667, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Deep Neural Networks and Tabular Data: A Survey,

V . Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, “Deep Neural Networks and Tabular Data: A Survey,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 6, pp. 7499–7519, 2022

2022
[4]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second,

N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter, “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second,” inInternational Conference on Learning Representations (ICLR), 2023

2023
[5]

Accurate Predictions on Small Data with a Tabular Foundation Model,

N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, S. B. Hoo, R. T. Schirrmeis- ter, and F. Hutter, “Accurate Predictions on Small Data with a Tabular Foundation Model,” Nature, vol. 637, no. 8045, pp. 319–326, 2025

2025
[6]

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data,

J. Qu, D. Holzmüller, G. Varoquaux, and M. Le Morvan, “TabICL: A Tabular Foundation Model for In-Context Learning on Large Data,” inProceedings of the 42nd International Conference on Machine Learning (ICML), 2025

2025
[7]

TabICLv2: A better, faster, scalable, and open tabular foundation model.arXiv:2602.11139, 2026

J. Qu, D. Holzmüller, G. Varoquaux, and M. Le Morvan, “TabICLv2: A better, faster, scalable, and open Tabular Foundation Model,”arXiv preprint arXiv:2602.11139, 2026

work page arXiv 2026
[8]

TabDPT: Scaling Tabular Foundation Models on Real Data,

J. Ma, V . Thomas, R. Hosseinzadeh, A. Labach, J. C. Cresswell, K. Golestan, G. Yu, A. L. Caterini, and M. V olkovs, “TabDPT: Scaling Tabular Foundation Models on Real Data,” in Advances in Neural Information Processing Systems (NeurIPS), 2025. 10

2025
[9]

Language Models are Few-Shot Learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,et al., “Language Models are Few-Shot Learners,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020

1901
[10]

TabLLM: Few- shot classification of tabular data with large language models,

S. Hegselmann, A. Buendia, H. Lang, M. Agrawal, X. Jiang, and D. Sontag, “TabLLM: Few- shot classification of tabular data with large language models,” inInternational Conference on Artificial Intelligence and Statistics (AISTATS), pp. 5549–5581, PMLR, 2023

2023
[11]

Cold-start active learning for image classification,

Q. Jin, M. Yuan, S. Li, H. Wang, M. Wang, and Z. Song, “Cold-start active learning for image classification,”Information Sciences, vol. 616, pp. 16–36, 2022

2022
[12]

Unsupervised Selective Labeling for More Effective Semi- supervised Learning,

X. Wang, L. Lian, and S. X. Yu, “Unsupervised Selective Labeling for More Effective Semi- supervised Learning,” inProceedings of the European Conference on Computer Vision (ECCV), pp. 427–445, Springer, 2022

2022
[13]

Enhancing Semi- Supervised Learning via Representative and Diverse Sample Selection,

Q. Shao, J. Kang, Q. Chen, Z. Li, H. Xu, Y . Cao, J. Liang, and J. Wu, “Enhancing Semi- Supervised Learning via Representative and Diverse Sample Selection,”Advances in Neural Information Processing Systems (NeurIPS), vol. 37, pp. 111199–111226, 2024

2024
[14]

Cold PAWS: Unsupervised Class Discovery and Addressing the Cold-start Problem for Semi-supervised Learning,

E. J. Mannix and H. D. Bondell, “Cold PAWS: Unsupervised Class Discovery and Addressing the Cold-start Problem for Semi-supervised Learning,”arXiv preprint arXiv:2305.10071, 2023

work page arXiv 2023
[15]

Active Learning for Convolutional Neural Networks: A Core-Set Approach,

O. Sener and S. Savarese, “Active Learning for Convolutional Neural Networks: A Core-Set Approach,” inInternational Conference on Learning Representations (ICLR), 2018

2018
[16]

Emerg- ing Properties in Self-Supervised Vision Transformers,

M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerg- ing Properties in Self-Supervised Vision Transformers,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9650–9660, 2021

2021
[17]

Masked Autoencoders are Scalable Vision Learners,

K. He, X. Chen, S. Xie, Y . Li, P. Dollár, and R. Girshick, “Masked Autoencoders are Scalable Vision Learners,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009, 2022

2022
[18]

Learning transferable visual models from natural language super- vision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark,et al., “Learning transferable visual models from natural language super- vision,” inProceedings of the 38th International Conference on Machine Learning (ICML), pp. 8748–8763, PMLR, 2021

2021
[19]

Representation Learning for Tabular Data: A Comprehensive Survey,

J.-P. Jiang, S.-Y . Liu, H.-R. Cai, Q.-L. Zhou, and H.-J. Ye, “Representation Learning for Tabular Data: A Comprehensive Survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 48, no. 6, pp. 6488–6508, 2026

2026
[20]

A Survey on Self-supervised Learning for Non-sequential Tabular Data,

W.-Y . Wang, W.-W. Du, D. Xu, W. Wang, and W.-C. Peng, “A Survey on Self-supervised Learning for Non-sequential Tabular Data,”Machine Learning, vol. 114, no. 1, p. 16, 2025

2025
[21]

Towards Localization via Data Em- bedding for TabPFN,

M. Koshil, T. Nagler, M. Feurer, and K. Eggensperger, “Towards Localization via Data Em- bedding for TabPFN,” inProceedings of the Third Table Representation Learning Workshop at NeurIPS, 2024

2024
[22]

TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

T. Zhao, G. Wang, Y . S. Tan, and Q. Zhang, “TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering,”arXiv preprint arXiv:2601.21656, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[23]

TuneTables: Context Optimization for Scalable Prior-data Fitted Networks,

B. Feuer, R. T. Schirrmeister, V . Cherepanova, C. Hegde, F. Hutter, M. Goldblum, N. Cohen, and C. White, “TuneTables: Context Optimization for Scalable Prior-data Fitted Networks,” Advances in Neural Information Processing Systems (NeurIPS), vol. 37, pp. 83430–83464, 2024

2024
[24]

End-to-End Compression for Tabular Foundation Models

G. Zabërgja, R. Kamel, A. Kadra, C. M. M. Frey, and J. Grabocka, “End-to-End Compression for Tabular Foundation Models,”arXiv preprint arXiv:2602.05649, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[25]

A survey on deep semi-supervised learning,

X. Yang, Z. Song, I. King, and Z. Xu, “A survey on deep semi-supervised learning,”IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 9, pp. 8934–8954, 2023

2023
[26]

A survey on semi-supervised learning,

J. E. Van Engelen and H. H. Hoos, “A survey on semi-supervised learning,”Machine learning, vol. 109, no. 2, pp. 373–440, 2020

2020
[27]

Theoretical Insights into In-context Learning with Unlabeled Data,

Y . Li, X. Chang, M. Kara, X. Liu, A. Roy-Chowdhury, and S. Oymak, “Theoretical Insights into In-context Learning with Unlabeled Data,” inAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[28]

Generalizing from a few examples: A survey on few-shot learning,

Y . Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few examples: A survey on few-shot learning,”ACM Computing Surveys, vol. 53, no. 3, pp. 1–34, 2020. 11

2020
[29]

Zero-Shot Coreset Selection via Iterative Subspace Sampling,

B. A. Griffin, J. Marks, and J. J. Corso, “Zero-Shot Coreset Selection via Iterative Subspace Sampling,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2114–2124, 2026

2026
[30]

Prototype selection for nearest neighbor classifi- cation: Taxonomy and empirical study,

S. García, J. Derrac, J. Cano, and F. Herrera, “Prototype selection for nearest neighbor classifi- cation: Taxonomy and empirical study,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 34, no. 3, pp. 417–435, 2012

2012
[31]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 4171–4186, 2019

2019
[32]

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain,

J. Yoon, Y . Zhang, J. Jordon, and M. van der Schaar, “VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain,”Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 11033–11043, 2020

2020
[33]

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption,

D. Bahri, H. Jiang, Y . Tay, and D. Metzler, “SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption,” inInternational Conference on Learning Representations (ICLR), 2022

2022
[34]

SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training,

G. Somepalli, A. Schwarzschild, M. Goldblum, C. B. Bruss, and T. Goldstein, “SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training,” inProceedings of the First Table Representation Learning Workshop at NeurIPS, 2022

2022
[35]

TransTab: Learning Transferable Tabular Transformers Across Tables,

Z. Wang and J. Sun, “TransTab: Learning Transferable Tabular Transformers Across Tables,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 2902–2915, 2022

2022
[36]

ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data,

P. Marszałek, T. Ku´smierczyk, W. Wydma´nski, J. Tabor, and M. ´Smieja, “ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data,” inAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025
[37]

OpenML Benchmarking Suites,

B. Bischl, G. Casalicchio, M. Feurer, P. Gijsbers, F. Hutter, M. Lang, R. G. Mantovani, J. N. van Rijn, and J. Vanschoren, “OpenML Benchmarking Suites,” inAdvances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2021

2021
[38]

A baseline for few-shot image classification,

G. S. Dhillon, P. Chaudhari, A. Ravichandran, and S. Soatto, “A baseline for few-shot image classification,” inInternational Conference on Learning Representations, 2020

2020
[39]

Statistical comparisons of classifiers over multiple data sets,

J. Demšar, “Statistical comparisons of classifiers over multiple data sets,”Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006

2006
[40]

Controlling the false discovery rate: a practical and powerful approach to multiple testing,

Y . Benjamini and Y . Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,”Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995

1995
[41]

On a test of whether one of two random variables is stochasti- cally larger than the other,

H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochasti- cally larger than the other,”The Annals of Mathematical Statistics, pp. 50–60, 1947

1947
[42]

Jackknife resampling,

H. Friedl and E. Stampfer, “Jackknife resampling,” inEncyclopedia of Environmetrics(A. H. El-Shaarawi and W. W. Piegorsch, eds.), John Wiley & Sons, 2002

2002
[43]

TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications,

D. Salinaset al., “TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications,” inProceedings of the 41st International Conference on Machine Learning (ICML), 2024

2024
[44]

XGBoost: A Scalable Tree Boosting System,

T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” inProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 785–794, 2016

2016
[45]

A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach,

S. García, J. R. Cano, and F. Herrera, “A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach,”Pattern Recognition, vol. 41, no. 8, pp. 2693–2709, 2008

2008
[46]

k-means++: The Advantages of Careful Seeding,

D. Arthur and S. Vassilvitskii, “k-means++: The Advantages of Careful Seeding,” inProceed- ings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035, 2007. 12 A Societal Impact Statement We do not identify significant foreseeable risks associated with this work beyond those already present in existing unsupervised instance se...

2007

[1] [1]

Position: Why Tabular Foundation Models Should Be a Research Priority,

B. van Breugel and M. van der Schaar, “Position: Why Tabular Foundation Models Should Be a Research Priority,” inProceedings of the 41st International Conference on Machine Learning (ICML), 2024

2024

[2] [2]

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

L. Grinsztajn, K. Flöge, O. Key, F. Birkel, P. Jund, B. Roof, B. Jäger, D. Safaric, S. Alessi, A. Hayler,et al., “TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models,” arXiv preprint arXiv:2511.08667, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[3] [3]

Deep Neural Networks and Tabular Data: A Survey,

V . Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, “Deep Neural Networks and Tabular Data: A Survey,”IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 6, pp. 7499–7519, 2022

2022

[4] [4]

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second,

N. Hollmann, S. Müller, K. Eggensperger, and F. Hutter, “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second,” inInternational Conference on Learning Representations (ICLR), 2023

2023

[5] [5]

Accurate Predictions on Small Data with a Tabular Foundation Model,

N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, S. B. Hoo, R. T. Schirrmeis- ter, and F. Hutter, “Accurate Predictions on Small Data with a Tabular Foundation Model,” Nature, vol. 637, no. 8045, pp. 319–326, 2025

2025

[6] [6]

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data,

J. Qu, D. Holzmüller, G. Varoquaux, and M. Le Morvan, “TabICL: A Tabular Foundation Model for In-Context Learning on Large Data,” inProceedings of the 42nd International Conference on Machine Learning (ICML), 2025

2025

[7] [7]

TabICLv2: A better, faster, scalable, and open tabular foundation model.arXiv:2602.11139, 2026

J. Qu, D. Holzmüller, G. Varoquaux, and M. Le Morvan, “TabICLv2: A better, faster, scalable, and open Tabular Foundation Model,”arXiv preprint arXiv:2602.11139, 2026

work page arXiv 2026

[8] [8]

TabDPT: Scaling Tabular Foundation Models on Real Data,

J. Ma, V . Thomas, R. Hosseinzadeh, A. Labach, J. C. Cresswell, K. Golestan, G. Yu, A. L. Caterini, and M. V olkovs, “TabDPT: Scaling Tabular Foundation Models on Real Data,” in Advances in Neural Information Processing Systems (NeurIPS), 2025. 10

2025

[9] [9]

Language Models are Few-Shot Learners,

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell,et al., “Language Models are Few-Shot Learners,” inAdvances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020

1901

[10] [10]

TabLLM: Few- shot classification of tabular data with large language models,

S. Hegselmann, A. Buendia, H. Lang, M. Agrawal, X. Jiang, and D. Sontag, “TabLLM: Few- shot classification of tabular data with large language models,” inInternational Conference on Artificial Intelligence and Statistics (AISTATS), pp. 5549–5581, PMLR, 2023

2023

[11] [11]

Cold-start active learning for image classification,

Q. Jin, M. Yuan, S. Li, H. Wang, M. Wang, and Z. Song, “Cold-start active learning for image classification,”Information Sciences, vol. 616, pp. 16–36, 2022

2022

[12] [12]

Unsupervised Selective Labeling for More Effective Semi- supervised Learning,

X. Wang, L. Lian, and S. X. Yu, “Unsupervised Selective Labeling for More Effective Semi- supervised Learning,” inProceedings of the European Conference on Computer Vision (ECCV), pp. 427–445, Springer, 2022

2022

[13] [13]

Enhancing Semi- Supervised Learning via Representative and Diverse Sample Selection,

Q. Shao, J. Kang, Q. Chen, Z. Li, H. Xu, Y . Cao, J. Liang, and J. Wu, “Enhancing Semi- Supervised Learning via Representative and Diverse Sample Selection,”Advances in Neural Information Processing Systems (NeurIPS), vol. 37, pp. 111199–111226, 2024

2024

[14] [14]

Cold PAWS: Unsupervised Class Discovery and Addressing the Cold-start Problem for Semi-supervised Learning,

E. J. Mannix and H. D. Bondell, “Cold PAWS: Unsupervised Class Discovery and Addressing the Cold-start Problem for Semi-supervised Learning,”arXiv preprint arXiv:2305.10071, 2023

work page arXiv 2023

[15] [15]

Active Learning for Convolutional Neural Networks: A Core-Set Approach,

O. Sener and S. Savarese, “Active Learning for Convolutional Neural Networks: A Core-Set Approach,” inInternational Conference on Learning Representations (ICLR), 2018

2018

[16] [16]

Emerg- ing Properties in Self-Supervised Vision Transformers,

M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerg- ing Properties in Self-Supervised Vision Transformers,” inProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9650–9660, 2021

2021

[17] [17]

Masked Autoencoders are Scalable Vision Learners,

K. He, X. Chen, S. Xie, Y . Li, P. Dollár, and R. Girshick, “Masked Autoencoders are Scalable Vision Learners,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009, 2022

2022

[18] [18]

Learning transferable visual models from natural language super- vision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark,et al., “Learning transferable visual models from natural language super- vision,” inProceedings of the 38th International Conference on Machine Learning (ICML), pp. 8748–8763, PMLR, 2021

2021

[19] [19]

Representation Learning for Tabular Data: A Comprehensive Survey,

J.-P. Jiang, S.-Y . Liu, H.-R. Cai, Q.-L. Zhou, and H.-J. Ye, “Representation Learning for Tabular Data: A Comprehensive Survey,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 48, no. 6, pp. 6488–6508, 2026

2026

[20] [20]

A Survey on Self-supervised Learning for Non-sequential Tabular Data,

W.-Y . Wang, W.-W. Du, D. Xu, W. Wang, and W.-C. Peng, “A Survey on Self-supervised Learning for Non-sequential Tabular Data,”Machine Learning, vol. 114, no. 1, p. 16, 2025

2025

[21] [21]

Towards Localization via Data Em- bedding for TabPFN,

M. Koshil, T. Nagler, M. Feurer, and K. Eggensperger, “Towards Localization via Data Em- bedding for TabPFN,” inProceedings of the Third Table Representation Learning Workshop at NeurIPS, 2024

2024

[22] [22]

TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

T. Zhao, G. Wang, Y . S. Tan, and Q. Zhang, “TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering,”arXiv preprint arXiv:2601.21656, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[23] [23]

TuneTables: Context Optimization for Scalable Prior-data Fitted Networks,

B. Feuer, R. T. Schirrmeister, V . Cherepanova, C. Hegde, F. Hutter, M. Goldblum, N. Cohen, and C. White, “TuneTables: Context Optimization for Scalable Prior-data Fitted Networks,” Advances in Neural Information Processing Systems (NeurIPS), vol. 37, pp. 83430–83464, 2024

2024

[24] [24]

End-to-End Compression for Tabular Foundation Models

G. Zabërgja, R. Kamel, A. Kadra, C. M. M. Frey, and J. Grabocka, “End-to-End Compression for Tabular Foundation Models,”arXiv preprint arXiv:2602.05649, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[25] [25]

A survey on deep semi-supervised learning,

X. Yang, Z. Song, I. King, and Z. Xu, “A survey on deep semi-supervised learning,”IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 9, pp. 8934–8954, 2023

2023

[26] [26]

A survey on semi-supervised learning,

J. E. Van Engelen and H. H. Hoos, “A survey on semi-supervised learning,”Machine learning, vol. 109, no. 2, pp. 373–440, 2020

2020

[27] [27]

Theoretical Insights into In-context Learning with Unlabeled Data,

Y . Li, X. Chang, M. Kara, X. Liu, A. Roy-Chowdhury, and S. Oymak, “Theoretical Insights into In-context Learning with Unlabeled Data,” inAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025

[28] [28]

Generalizing from a few examples: A survey on few-shot learning,

Y . Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few examples: A survey on few-shot learning,”ACM Computing Surveys, vol. 53, no. 3, pp. 1–34, 2020. 11

2020

[29] [29]

Zero-Shot Coreset Selection via Iterative Subspace Sampling,

B. A. Griffin, J. Marks, and J. J. Corso, “Zero-Shot Coreset Selection via Iterative Subspace Sampling,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2114–2124, 2026

2026

[30] [30]

Prototype selection for nearest neighbor classifi- cation: Taxonomy and empirical study,

S. García, J. Derrac, J. Cano, and F. Herrera, “Prototype selection for nearest neighbor classifi- cation: Taxonomy and empirical study,”IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 34, no. 3, pp. 417–435, 2012

2012

[31] [31]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” inProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pp. 4171–4186, 2019

2019

[32] [32]

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain,

J. Yoon, Y . Zhang, J. Jordon, and M. van der Schaar, “VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain,”Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 11033–11043, 2020

2020

[33] [33]

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption,

D. Bahri, H. Jiang, Y . Tay, and D. Metzler, “SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption,” inInternational Conference on Learning Representations (ICLR), 2022

2022

[34] [34]

SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training,

G. Somepalli, A. Schwarzschild, M. Goldblum, C. B. Bruss, and T. Goldstein, “SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training,” inProceedings of the First Table Representation Learning Workshop at NeurIPS, 2022

2022

[35] [35]

TransTab: Learning Transferable Tabular Transformers Across Tables,

Z. Wang and J. Sun, “TransTab: Learning Transferable Tabular Transformers Across Tables,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 2902–2915, 2022

2022

[36] [36]

ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data,

P. Marszałek, T. Ku´smierczyk, W. Wydma´nski, J. Tabor, and M. ´Smieja, “ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data,” inAdvances in Neural Information Processing Systems (NeurIPS), 2025

2025

[37] [37]

OpenML Benchmarking Suites,

B. Bischl, G. Casalicchio, M. Feurer, P. Gijsbers, F. Hutter, M. Lang, R. G. Mantovani, J. N. van Rijn, and J. Vanschoren, “OpenML Benchmarking Suites,” inAdvances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2021

2021

[38] [38]

A baseline for few-shot image classification,

G. S. Dhillon, P. Chaudhari, A. Ravichandran, and S. Soatto, “A baseline for few-shot image classification,” inInternational Conference on Learning Representations, 2020

2020

[39] [39]

Statistical comparisons of classifiers over multiple data sets,

J. Demšar, “Statistical comparisons of classifiers over multiple data sets,”Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006

2006

[40] [40]

Controlling the false discovery rate: a practical and powerful approach to multiple testing,

Y . Benjamini and Y . Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,”Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995

1995

[41] [41]

On a test of whether one of two random variables is stochasti- cally larger than the other,

H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochasti- cally larger than the other,”The Annals of Mathematical Statistics, pp. 50–60, 1947

1947

[42] [42]

Jackknife resampling,

H. Friedl and E. Stampfer, “Jackknife resampling,” inEncyclopedia of Environmetrics(A. H. El-Shaarawi and W. W. Piegorsch, eds.), John Wiley & Sons, 2002

2002

[43] [43]

TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications,

D. Salinaset al., “TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications,” inProceedings of the 41st International Conference on Machine Learning (ICML), 2024

2024

[44] [44]

XGBoost: A Scalable Tree Boosting System,

T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” inProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 785–794, 2016

2016

[45] [45]

A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach,

S. García, J. R. Cano, and F. Herrera, “A Memetic Algorithm for Evolutionary Prototype Selection: A Scaling Up Approach,”Pattern Recognition, vol. 41, no. 8, pp. 2693–2709, 2008

2008

[46] [46]

k-means++: The Advantages of Careful Seeding,

D. Arthur and S. Vassilvitskii, “k-means++: The Advantages of Careful Seeding,” inProceed- ings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035, 2007. 12 A Societal Impact Statement We do not identify significant foreseeable risks associated with this work beyond those already present in existing unsupervised instance se...

2007