14 Pith papers cite this work.
Citing papers
-
STRABLE: Benchmarking Tabular Machine Learning with Strings
A new corpus of 108 mixed string-numeric tables shows that advanced tabular learners with basic string embeddings perform well on most real-world data, while large LLM encoders help on free-text heavy tables.
-
What learning algorithm is in-context learning? Investigations with linear models
Transformers performing in-context learning implicitly implement gradient descent, ridge regression, and least-squares predictors for linear models, with behavior shifting based on model depth, width, and data noise.
-
SurvivalPFN: Amortizing Survival Prediction via In-Context Bayesian Inference
SurvivalPFN amortizes Bayesian survival analysis for right-censored data by pretraining a prior-data fitted network on synthetic, identifiable data-generating processes (DGPs) and then performing in-context inference, achieving competitive results on 61 real datasets.
-
Retrieval Augmented Time Series Forecasting
The paper proposes Retrieval Augmented Forecasting (RAF), which augments time-series foundation models with retrieved similar series to improve forecasting accuracy across domains.
-
ADKO: Agentic Decentralized Knowledge Optimization
ADKO is a decentralized framework where agents share compact GP-derived tokens and LM insights to achieve collaborative Bayesian optimization with a decomposed regret bound that includes compression and approximation losses.
-
Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation
Pre-trained TabPFN acts as an effective training-free summary network for neural posterior estimation, matching or outperforming standard methods while preserving useful marginal and location information in the posterior.
-
Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
Decoupled PFNs use controllable synthetic priors to train separate latent-signal and noise heads, making epistemic-aleatoric decomposition identifiable and improving acquisition in noisy settings.
-
Tabular foundation models for in-context prediction of molecular properties
Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
-
Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
SGNNs pretrain neural networks on synthetic corpora from multiple mechanistic models and noise levels to enable robust forecasting and back-to-simulation attribution across epidemiology, ecology, and other fields.
-
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
An amortized transformer model with conditional fixed-point iterations learns SCM causal mechanisms from data and graphs, matching per-dataset baselines and outperforming them in low-data regimes.
-
Modular Multimodal Classification Without Fine-Tuning: A Simple Compositional Approach
CoMET achieves strong multimodal classification performance by composing frozen modality encoders, PCA compression, and tabular foundation models without any training, reaching state-of-the-art on diverse benchmarks including large-scale hierarchical tasks.
-
Foundation Models for Credit Risk Prediction: A Game Changer?
Tabular foundation models outperform standard methods on credit-risk probability-of-default (PD) and loss-given-default (LGD) tasks when used out of the box, with larger gains on smaller datasets.
-
Tabular foundation models for robust calibration of near-infrared chemical sensing data
Preprocessing-optimized TabPFN achieves top average rank in regression on 66 NIR datasets and remains competitive on outliers and extrapolation compared to PLS, Ridge, CatBoost, and CNN-1D.
-
Correcting Class Imbalance in Prior-Data Fitted Networks for Tabular Classification
Thresholding and downsampling effectively mitigate class imbalance in PFNs for tabular classification, owing to PFNs' good calibration and their strength on limited data.
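The last entry names two generic remedies for class imbalance. The following is a minimal, stdlib-only Python sketch of both, built on several assumptions:

- Labels are binary (1 is the minority class).
- The model already outputs calibrated probabilities P(y=1). The PFN itself is not modeled here.
- The function names `downsample_majority` and `best_threshold` are hypothetical.
- Choosing the cutoff by balanced accuracy is an illustrative assumption, not necessarily the criterion used in the cited paper.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def downsample_majority(X: Sequence[T], y: Sequence[int], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Randomly keep, for every class, only as many rows as the rarest class has."""
    rng = random.Random(seed)
    by_class: Dict[int, List[T]] = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out: List[T] = []
    y_out: List[int] = []
    for label, rows in by_class.items():
        # Sample without replacement; the rarest class is therefore kept whole.
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def best_threshold(probs: Sequence[float], y_true: Sequence[int]) -> float:
    """Return the cutoff on P(y=1) that maximizes balanced accuracy on held-out data."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both classes in the validation labels")
    best_t, best_score = 0.5, -1.0
    # Every observed probability is a candidate cutoff (predict 1 when p >= t).
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, y_true) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, y_true) if p < t and y != 1)
        score = 0.5 * (tp / pos + tn / neg)
        if score > best_score:  # strict ">" keeps the lowest cutoff among ties
            best_t, best_score = t, score
    return best_t


# Usage on a toy 8:2 imbalanced sample.
X = list(range(10))
y = [0] * 8 + [1] * 2
X_bal, y_bal = downsample_majority(X, y)
print(sorted(y_bal))  # [0, 0, 1, 1]

# Hypothetical calibrated scores: both positives score below the default 0.5 cutoff.
probs = [0.10, 0.20, 0.15, 0.30, 0.35, 0.25, 0.05, 0.12, 0.40, 0.45]
print(best_threshold(probs, y))  # 0.4
```

With the default 0.5 cutoff, every row in the toy example would be predicted as the majority class. The tuned cutoff of 0.4 separates the two classes perfectly.

Downsampling changes the class prior that the model conditions on. As a result, its probabilities no longer match the original base rate. Tuning the threshold on untouched validation data is one simple way to compensate.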