KumoRFM-2: Scaling Foundation Models for Relational Learning

2026 · cs.LG · arXiv 2604.12596

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce KumoRFM-2, the next iteration of a pre-trained foundation model for relational data. KumoRFM-2 supports in-context learning as well as fine-tuning and is applicable to a wide range of predictive tasks. In contrast to tabular foundation models, KumoRFM-2 natively operates on relational data, processing one or more connected tables simultaneously without manual table flattening or target variable generation, all while preserving temporal consistency. KumoRFM-2 leverages a large corpus of synthetic and real-world data to pre-train across four axes: the row and column dimensions at the individual table level, and the foreign key and cross-sample dimensions at the database level. In contrast to its predecessor, KumoRFM-2 injects task information as early as possible, enabling sharper selection of task-relevant columns and improved robustness to noisy data. Through extensive experiments on 41 challenging benchmarks and analysis around expressivity and sensitivity, we demonstrate that KumoRFM-2 outperforms supervised and foundational approaches by up to 8%, while maintaining strong performance under extreme settings of cold start and noisy data. To our knowledge, this is the first time a few-shot foundation model has been shown to surpass supervised approaches on common benchmark tasks, with performance further improving upon fine-tuning. Finally, while KumoRFM-1 was limited to small-scale in-memory datasets, KumoRFM-2 scales to billion-scale relational datasets.
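To make "preserving temporal consistency" concrete, the sketch below shows one common way to read the phrase: when building context for a prediction anchored at a given time, only rows linked by a foreign key and timestamped at or before that time are used. This is an illustrative toy, not KumoRFM-2's method or API; the tables, column names, and the `context_rows` helper are all hypothetical.

```python
from datetime import date

# Hypothetical toy database: two tables linked by the foreign key orders.user_id.
users = [
    {"user_id": 1, "signup": date(2024, 1, 5)},
    {"user_id": 2, "signup": date(2024, 2, 9)},
]
orders = [
    {"order_id": 10, "user_id": 1, "ts": date(2024, 3, 1), "amount": 20.0},
    {"order_id": 11, "user_id": 1, "ts": date(2024, 6, 1), "amount": 35.0},
    {"order_id": 12, "user_id": 2, "ts": date(2024, 4, 15), "amount": 12.5},
]


def context_rows(user_id, anchor_time):
    """Return orders linked to user_id that occurred at or before anchor_time.

    Rows after anchor_time are excluded so that no future information
    leaks into the context used for a prediction made at anchor_time.
    """
    return [
        row for row in orders
        if row["user_id"] == user_id and row["ts"] <= anchor_time
    ]


ctx = context_rows(1, date(2024, 4, 1))
print([row["order_id"] for row in ctx])  # prints [10]; order 11 is in the future
```

The tables stay separate and are joined on the fly through the foreign key, rather than being flattened into one feature table ahead of time.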

citation-role summary

baseline 2

citation-polarity summary

baseline 2

representative citing papers

TabPFN-3: Technical Report

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · 2 refs

TabPFN-3 scales tabular foundation models to 1M rows with synthetic pretraining, test-time compute, and benchmark-leading performance on tabular, relational, and tabular-text tasks while being up to 20x faster than TabPFN-2.5.

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

cs.AI · 2026-06-02 · unverdicted · novelty 5.0

RelGT-AC adds column masking, unified task head, and TF-IDF encoding to RelGT, outperforming GraphSAGE on regression autocomplete tasks and gaining up to 10 AUROC on text-heavy tasks across RelBench v2 datasets.

Incorporating Deep Learning Design in Database Queries

cs.DB · 2026-05-22 · unverdicted · novelty 5.0

RelaNN associates tuples with learnable embeddings and lifts relational queries to jointly process data and embeddings, enabling declarative implementation of graph neural networks inside database systems.

RelAgent: LLM Agents as Data Scientists for Relational Learning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

RelAgent uses an LLM agent to autonomously generate SQL feature programs paired with classical models for interpretable relational learning predictions that execute efficiently on standard databases.

citing papers explorer

TabPFN-3: Technical Report

cs.LG · 2026-05-13 · unverdicted · none · ref 57 · 2 links · internal anchor

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

cs.AI · 2026-06-02 · unverdicted · none · ref 10 · internal anchor

Incorporating Deep Learning Design in Database Queries

cs.DB · 2026-05-22 · unverdicted · none · ref 26 · internal anchor

RelAgent: LLM Agents as Data Scientists for Relational Learning

cs.LG · 2026-05-08 · unverdicted · none · ref 16 · internal anchor
