Tabular foundation models excel on tiny- to medium-sized IID data but are outperformed by traditional tree-based and deep learning models on non-IID, large, and high-dimensional datasets, based on evaluations across 11 models and 142 datasets in the new BeyondArena benchmark.
Turning tabular foundation models into graph foundation models
7 Pith papers cite this work.
abstract
While foundation models have revolutionized fields such as natural language processing and computer vision, their potential in graph machine learning remains largely unexplored. One of the key challenges in designing graph foundation models (GFMs) is handling diverse node features that can vary across different graph datasets. While many works on GFMs have focused exclusively on text-attributed graphs, the problem of handling arbitrary features of other types in GFMs has not been fully addressed. However, this problem is not unique to the graph domain, as it also arises in the field of machine learning for tabular data. In this work, motivated by the recent success of tabular foundation models (TFMs) like TabPFNv2 and LimiX, we propose G2T-FM, a simple framework that allows tabular foundation models to be applied to graph node-level tasks. Specifically, G2T-FM augments the original node features with neighborhood feature aggregation, adds structural embeddings, and then applies a TFM to the constructed node representations. Even in the in-context learning setting, our model achieves strong results when combined with a strong TFM, outperforming both prior GFMs and well-tuned GNNs trained from scratch. Moreover, after finetuning, G2T-FM consistently surpasses well-tuned GNN baselines, often by a significant margin. In summary, our paper reveals the potential of a previously overlooked direction: utilizing tabular foundation models for graph machine learning tasks.
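The feature-construction step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which aggregations or structural embeddings G2T-FM uses, so mean aggregation over neighbors and node degree are assumptions chosen for simplicity, and `build_node_table` is a hypothetical helper name. The resulting table is what a tabular foundation model would receive as input; the model call itself is omitted.

```python
from typing import List, Sequence, Set, Tuple


def build_node_table(
    features: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Build one tabular row per node.

    Each row is: original features + mean of neighbor features + node degree.
    Mean aggregation and degree are illustrative assumptions, not the
    choices confirmed by the paper.
    """
    n = len(features)
    d = len(features[0]) if n else 0

    # Undirected adjacency as neighbor sets; self-loops are ignored.
    neighbors: List[Set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)

    table: List[List[float]] = []
    for i in range(n):
        nbrs = sorted(neighbors[i])
        if nbrs:
            # Neighborhood feature aggregation (assumed: per-column mean).
            mean_agg = [
                sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(d)
            ]
        else:
            # Isolated node: fall back to zeros.
            mean_agg = [0.0] * d
        # Structural embedding (assumed: degree only).
        structural = [float(len(nbrs))]
        table.append([float(x) for x in features[i]] + mean_agg + structural)
    return table


# Toy graph: a triangle (nodes 0, 1, 2) plus one isolated node (3).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 0)]
table = build_node_table(features, edges)
```

Each row of `table` can then be treated as an ordinary tabular sample. Rows for labeled nodes would serve as the context (training) set and rows for unlabeled nodes as the query set of whichever tabular model is used.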
fields: cs.LG (7)
citation-role summary: background (3)
citation-polarity summary: background (3)
representative citing papers
DyGFM introduces decoupled pre-training and divergence-conditioned prompts to create the first multi-domain dynamic graph foundation model that outperforms baselines on node classification and link prediction.
OpenRFM combines a relational transformer backbone with a batch-level ICL layer and homophily-aware synthetic-plus-real pre-training to improve relational in-context learning by ~30% over prior open models and surpass KumoRFMv1.
TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.
KumoRFM-2 pre-trains on synthetic and real relational data across row, column, foreign-key and cross-sample axes, injects task information early, and achieves up to 8% gains over supervised baselines on 41 benchmarks in few-shot and fine-tuned regimes while handling billion-scale datasets.
TabPFN-2.5 scales tabular foundation models to 20x larger datasets, outperforms tuned tree models on TabArena, achieves near-perfect win rates against default XGBoost, and adds a distillation engine for fast production deployment.
Introduces curvature-stratified evaluation showing relational learning model rankings are stable within curvature regimes but shift across them, making performance geometry-dependent.