arxiv: 2605.16085 · v1 · pith:DKDPDEJZ · submitted 2026-05-15 · 💻 cs.DB · cs.AI

Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks

Pith reviewed 2026-05-19 18:34 UTC · model grok-4.3

classification 💻 cs.DB cs.AI
keywords: relational databases, foundation models, language models, graph neural networks, relational entity graphs, BART, GraphSAGE, RelBench

The pith

A hybrid of fine-tuned BART and GraphSAGE on relational entity graphs enriches row embeddings and competes with supervised baselines on a RelBench relational-database prediction task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes combining language models with graph neural networks to model relational databases without flattening them into single tables. It shows that a GraphSAGE-based GNN over relational entity graphs improves BART's row representations by injecting structural context. On the driver-dnf prediction task from RelBench's rel-f1 dataset, the hybrid achieves results competitive with supervised baselines, suggesting a path toward foundation models that handle structured relational data efficiently. A sympathetic reader would care because it bridges the semantic understanding of text models with the relational structure that conventional flattening approaches discard.

Core claim

We propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63).
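The gaps quoted above are plain differences between ROC-AUC scores reported on a 0-100 scale. The sketch below recomputes them and, for reference, implements the rank-based (Mann-Whitney) definition of ROC-AUC; the names `SCORES`, `gap_to`, and `roc_auc` are illustrative and are not taken from the paper or from RelBench.

```python
from itertools import product

# ROC-AUC scores (x100) on rel-f1 / driver-dnf, as quoted in the core claim.
SCORES = {
    "hybrid": 67.40,
    "LightGBM": 68.86,
    "RDL": 72.62,
    "KumoRFM": 82.63,
}


def gap_to(baseline, model="hybrid"):
    """Points by which `model` trails `baseline` (positive means `model` is behind)."""
    return round(SCORES[baseline] - SCORES[model], 2)


def roc_auc(labels, scores):
    """ROC-AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


for name in ("LightGBM", "RDL", "KumoRFM"):
    print(f"gap to {name}: {gap_to(name):.2f}")
# gap to LightGBM: 1.46
# gap to RDL: 5.22
# gap to KumoRFM: 15.23
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; production evaluators compute the same quantity from sorted ranks.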

What carries the argument

Hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over relational entity graphs (REGs) to inject relational context.
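In relational deep learning, a relational entity graph (REG) turns every row into a node typed by its table and every primary-foreign key reference into an edge. The sketch below builds such a graph from toy in-memory tables; the table names, column names, the `fk_specs` format, and the `rev_` naming of reverse edges are hypothetical and only echo the rel-f1 setting (drivers, races, results).

```python
from collections import defaultdict

# Toy tables: each row is a dict whose "id" is the primary key (hypothetical schema).
tables = {
    "drivers": [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}],
    "races": [{"id": 10, "year": 2020}, {"id": 11, "year": 2021}],
    "results": [
        {"id": 100, "driver_id": 1, "race_id": 10, "position": 3},
        {"id": 101, "driver_id": 2, "race_id": 10, "position": 1},
        {"id": 102, "driver_id": 1, "race_id": 11, "position": 2},
    ],
}

# Foreign keys as (child table, FK column, parent table).
fk_specs = [("results", "driver_id", "drivers"), ("results", "race_id", "races")]


def build_reg(tables, fk_specs):
    """Return node ids per table and typed edge lists (both directions) of an REG."""
    nodes = {t: [row["id"] for row in rows] for t, rows in tables.items()}
    edges = defaultdict(list)
    for child, col, parent in fk_specs:
        parent_ids = set(nodes[parent])
        for row in tables[child]:
            target = row.get(col)
            if target in parent_ids:  # skip NULL or dangling references
                edges[(child, col, parent)].append((row["id"], target))
                # Reverse edge so messages can also flow from parent rows to child rows.
                edges[(parent, "rev_" + col, child)].append((target, row["id"]))
    return nodes, dict(edges)


nodes, edges = build_reg(tables, fk_specs)
print(edges[("results", "driver_id", "drivers")])  # [(100, 1), (101, 2), (102, 1)]
```

Adding reverse edges is a design choice that lets a driver node receive information from its results rows; without them, messages would only flow from child to parent.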

Load-bearing premise

That the specific hybrid of fine-tuned BART plus GraphSAGE on relational entity graphs will generalize to arbitrary unseen databases and tasks sufficiently to serve as a foundation model, rather than remaining competitive only on the tested RelBench subset.

What would settle it

Evaluating the hybrid model on a completely new relational database and task outside the RelBench benchmark to check if performance remains competitive without additional fine-tuning.

Figures

Figures reproduced from arXiv: 2605.16085 by Fabian Leeske, Jingcheng Wu, Lucas Etteldorf, Max Finkenbeiner, Mojtaba Nayyeri, Ratan Bahadur Thapa, Steffen Staab.

Figure 1: Overview of the hybrid architecture. A fine-tuned BART encoder generates row-level embeddings from linearized database rows, which serve as initial node features in the relational entity graph (REG). Node-type-specific linear layers project the 1024-dimensional BART embeddings to the 256-dimensional hidden space. Two shared SAGEConv layers then perform message passing across all edge types, and a linear de…
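The forward pass described in the Figure 1 caption can be sketched in NumPy as follows. It follows the stated dimensions (1024-dimensional BART embeddings, node-type-specific projections to 256 dimensions, two SAGEConv-style layers shared across edge types, a linear decoder), but it is not the authors' code: the random weights, mean aggregation, ReLU, pooling of neighbours over all incoming edge types, and a decoder that maps back to 1024 dimensions are assumptions, and random vectors stand in for BART embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
D_BART, D_HID = 1024, 256  # dimensions stated in the Figure 1 caption


def linear(d_in, d_out):
    """Random weight matrix and zero bias, standing in for trained parameters."""
    return rng.normal(0.0, d_in ** -0.5, (d_in, d_out)), np.zeros(d_out)


def sage_params():
    """(W_self, W_neigh, bias) for one SAGE-style layer."""
    w_self, b = linear(D_HID, D_HID)
    w_neigh, _ = linear(D_HID, D_HID)
    return w_self, w_neigh, b


def sage_layer(h, edges, w_self, w_neigh, b):
    """Mean-aggregation GraphSAGE layer; the same weights serve every edge type.

    h:     {node_type: (num_nodes, d) array}
    edges: {(src_type, relation, dst_type): (src_index_array, dst_index_array)}
    """
    agg = {t: np.zeros_like(x) for t, x in h.items()}
    deg = {t: np.zeros(len(x)) for t, x in h.items()}
    for (src_t, _, dst_t), (src, dst) in edges.items():
        np.add.at(agg[dst_t], dst, h[src_t][src])  # sum incoming messages
        np.add.at(deg[dst_t], dst, 1.0)
    out = {}
    for t, x in h.items():
        mean_neigh = agg[t] / np.maximum(deg[t], 1.0)[:, None]  # 0 for isolated nodes
        out[t] = np.maximum(x @ w_self + mean_neigh @ w_neigh + b, 0.0)  # ReLU
    return out


def forward(bart_emb, edges, proj, layers, decoder):
    """Type-specific projection -> shared SAGE layers -> linear decoder."""
    h = {t: x @ proj[t][0] + proj[t][1] for t, x in bart_emb.items()}
    for w_self, w_neigh, b in layers:
        h = sage_layer(h, edges, w_self, w_neigh, b)
    w_dec, b_dec = decoder
    return {t: x @ w_dec + b_dec for t, x in h.items()}


# Toy graph: 2 drivers, 3 races; random vectors stand in for BART row embeddings.
bart_emb = {"drivers": rng.normal(size=(2, D_BART)), "races": rng.normal(size=(3, D_BART))}
edges = {
    ("drivers", "raced_in", "races"): (np.array([0, 1, 1]), np.array([0, 0, 2])),
    ("races", "rev_raced_in", "drivers"): (np.array([0, 0, 2]), np.array([0, 1, 1])),
}
proj = {t: linear(D_BART, D_HID) for t in bart_emb}
layers = [sage_params() for _ in range(2)]
decoder = linear(D_HID, D_BART)  # assumption: decoder reconstructs 1024-d row features
out = forward(bart_emb, edges, proj, layers, decoder)
print({t: x.shape for t, x in out.items()})  # {'drivers': (2, 1024), 'races': (3, 1024)}
```

Sharing one set of SAGE weights across edge types keeps the parameter count independent of the schema, which matches the "shared" wording in the caption; per-relation weights (as in PyTorch Geometric's heterogeneous wrappers) would be the more expressive alternative.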
Figure 2: Loss curves during GNN pre-training with a masking probability of 0.15. (a) Combined scaled cosine and MSE loss for the training and validation sets. (b) Training MSE loss. The model largely converges by approximately epoch 5.

5. Experiments. To evaluate whether our hybrid LM-GNN architecture can generalize to unseen relational databases, we conduct experiments on a held-out dataset from RelBench. We first …
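Figure 2 refers to masked pre-training with a masking probability of 0.15 and a combined scaled-cosine and MSE objective. A minimal NumPy sketch, assuming a GraphMAE-style setup: masked nodes' input features are replaced by a mask token (zeros here), and the loss is computed only on masked nodes as the scaled cosine error (1 - cos)^γ plus a weighted MSE. The zero mask token, γ = 2, the MSE weight of 1.0, and all function names are assumptions, not values reported in the paper.

```python
import numpy as np


def mask_nodes(x, p=0.15, rng=None):
    """Replace a random ~p fraction of node feature rows with a mask token (zeros)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(len(x)) < p
    x_masked = x.copy()
    x_masked[mask] = 0.0
    return x_masked, mask


def scaled_cosine_error(pred, target, gamma=2.0, eps=1e-8):
    """GraphMAE-style SCE: mean over rows of (1 - cosine similarity) ** gamma."""
    cos = np.sum(pred * target, axis=1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1) + eps
    )
    cos = np.clip(cos, -1.0, 1.0)  # guard against rounding slightly outside [-1, 1]
    return float(np.mean((1.0 - cos) ** gamma))


def combined_loss(pred, target, mask, mse_weight=1.0, gamma=2.0):
    """SCE + weighted MSE, evaluated only on the masked nodes."""
    if not mask.any():
        return 0.0
    p, t = pred[mask], target[mask]
    mse = float(np.mean((p - t) ** 2))
    return scaled_cosine_error(p, t, gamma) + mse_weight * mse


rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 1024))  # stand-in for BART row embeddings
x_masked, mask = mask_nodes(x, p=0.15, rng=rng)
# A trained encoder-decoder would map x_masked (plus the graph) to reconstructions;
# a perfect reconstruction drives the loss to (numerically) zero.
print(combined_loss(x, x, mask))
```

The cosine term is scale-invariant and focuses on direction, while the MSE term also penalizes magnitude errors; combining them is one way to get both behaviours from a single objective.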
Figure 3: Downstream adaptation pipeline for the driver-dnf classification task. The driver identifier is encoded by the pre-trained BART encoder and enriched via the pre-trained GNN, while the task date is encoded by a separate date encoder. The resulting embeddings are concatenated and passed through an MLP head trained with cross-entropy loss. Color coding: light red denotes task input features, grey represents l…
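The downstream head in Figure 3 (enriched driver embedding concatenated with a date embedding, then an MLP trained with cross-entropy) can be sketched as below. The sinusoidal date encoder, its 16-dimensional size and 1950-01-01 origin, the 64-unit hidden layer, and the random weights are all hypothetical choices; the caption does not specify them, and random vectors stand in for the GNN output.

```python
from datetime import date

import numpy as np


def encode_date(d, dim=16):
    """Hypothetical date encoder: sinusoidal features of days since 1950-01-01."""
    t = (d - date(1950, 1, 1)).days
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) / (dim // 2)))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])


def mlp_logits(x, w1, b1, w2, b2):
    """Two-layer MLP head with ReLU; returns one logit per class."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)
driver_emb = rng.normal(size=(4, 256))  # stand-in for GNN-enriched driver embeddings
dates = [date(2020, 1, 1)] * 2 + [date(2021, 6, 1)] * 2
x = np.concatenate([driver_emb, np.stack([encode_date(d) for d in dates])], axis=1)
w1, b1 = rng.normal(0.0, 0.05, (x.shape[1], 64)), np.zeros(64)
w2, b2 = rng.normal(0.0, 0.05, (64, 2)), np.zeros(2)
labels = np.array([0, 1, 0, 1])  # hypothetical binary driver-dnf targets
loss = cross_entropy(mlp_logits(x, w1, b1, w2, b2), labels)
print(x.shape, loss)
```

Encoding the task date separately lets the same driver embedding be queried at different prediction times without re-running the GNN.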
Figure 4: Training and validation metrics (loss, accuracy, and ROC-AUC) on the rel-f1 downstream task with frozen GNN parameters. The validation ROC-AUC exhibits considerable instability and does not show a clear upward trend.

The training dynamics (Figs. 4 and 5) corroborate these findings. With GNN fine-tuning, ROC-AUC improves steadily across epochs, indicating that the model progressively adapts its relational r…
Figure 5: Training and validation metrics (loss, accuracy, and ROC-AUC) on the rel-f1 downstream task with learnable GNN parameters. In contrast to the frozen setting …
Figure 6: A t-SNE visualization of row embeddings after BART encoding (before GNN processing) for the rel-f1 database. Each color corresponds to a distinct node type within the generated heterogeneous graph (e.g., drivers, races). The encoder produces coherent type-specific clusters, although some expected overlap is visible.

GNN as Relational Encoder. The GNN compensates by injecting relational context through mess…
Figure 7: A t-SNE visualization of row embeddings after GNN message passing for the rel-f1 database. The color coding matches that of …
Abstract

Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path towards foundation models for relational databases.
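The abstract says BART captures intra-row semantics, and the Figure 1 caption says rows are linearized before encoding. The exact serialization is not given here, so the sketch below assumes one plausible format, `table | column: value ; ...` with NULL values skipped; the format, the function name, and the column names in the example are hypothetical. The resulting string would then be tokenized and passed to a BART encoder, a step omitted to keep the example dependency-free.

```python
def linearize_row(table, row, skip_null=True):
    """Serialize one database row into text for a language-model encoder.

    Hypothetical format: "<table> | col1: val1 ; col2: val2".
    """
    parts = []
    for col, val in row.items():
        if val is None and skip_null:
            continue  # omit missing values instead of emitting "None"
        parts.append(f"{col}: {val}")
    return f"{table} | " + " ; ".join(parts)


print(linearize_row("drivers", {"driverId": 1, "forename": "Ada", "code": None}))
# drivers | driverId: 1 ; forename: Ada
```

Prefixing the table name gives the encoder a cue about the row's entity type, which is one reason type-specific clusters could appear in BART embeddings before any message passing.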

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a hybrid architecture that pairs a fine-tuned BART encoder for intra-row textual semantics with a GraphSAGE GNN operating over relational entity graphs (REGs), in which rows are nodes and primary-foreign key links are edges. On the RelBench benchmark, the hybrid is evaluated on a single task, driver-dnf from the rel-f1 dataset, where it reports a ROC-AUC of 67.40; this is competitive with LightGBM (68.86), narrows the gap to RDL (72.62), and remains below KumoRFM (82.63). The authors conclude that such lightweight LM-GNN hybrids constitute a resource-efficient route toward foundation models for relational databases.

Significance. If the reported enrichment of row embeddings generalizes, the work would supply a practical, lower-cost alternative to full-scale relational foundation models by reusing existing language-model encoders and adding only a lightweight GNN. The concrete numerical comparisons to public baselines on a fixed benchmark constitute a clear, falsifiable strength.

major comments (2)
  1. [Abstract] Abstract: the claim that the hybrid 'offers a promising and resource-efficient path towards foundation models' rests on the premise of generalization to arbitrary unseen databases and tasks, yet only a single-task result on driver-dnf / rel-f1 is presented; no cross-database transfer, multi-task pre-training protocol, or ablation isolating the contribution of the REG structure versus simple feature concatenation is reported.
  2. [Abstract] Abstract: the competitiveness statement (67.40 vs. LightGBM 68.86 and RDL 72.62) cannot be assessed without details on training procedure, hyperparameter search, statistical significance, or confirmation that all baselines used identical data splits; these omissions are load-bearing for the central empirical claim.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'substantially enriches BART's row embeddings' is used without a quantitative baseline (e.g., BART-only performance on the same task).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the hybrid 'offers a promising and resource-efficient path towards foundation models' rests on the premise of generalization to arbitrary unseen databases and tasks, yet only a single-task result on driver-dnf / rel-f1 is presented; no cross-database transfer, multi-task pre-training protocol, or ablation isolating the contribution of the REG structure versus simple feature concatenation is reported.

    Authors: We agree that the current evaluation is restricted to a single task and that stronger claims about generalization would require additional experiments such as cross-database transfer or multi-task pre-training. The manuscript presents this result as an initial demonstration of the hybrid architecture's feasibility rather than a comprehensive validation of foundation-model capabilities. In the revised version we will tone down the abstract claim to emphasize the preliminary nature of the findings, add a dedicated limitations paragraph, and include a brief conceptual argument (supported by the architecture description) that the REG structure enables relational message passing that simple feature concatenation cannot replicate. We will also outline planned future work on broader transfer protocols.
    Revision: partial

  2. Referee: [Abstract] Abstract: the competitiveness statement (67.40 vs. LightGBM 68.86 and RDL 72.62) cannot be assessed without details on training procedure, hyperparameter search, statistical significance, or confirmation that all baselines used identical data splits; these omissions are load-bearing for the central empirical claim.

    Authors: We acknowledge that the original manuscript omitted several reproducibility details required to fully evaluate the reported numbers. In the revised manuscript we will expand the experimental section to describe the training procedure, the hyperparameter search protocol, the use of multiple random seeds for statistical significance, and explicit confirmation that all baselines were run on the identical RelBench-provided data splits. These additions will allow readers to assess the competitiveness claim directly.
    Revision: yes
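The rebuttal promises results over multiple random seeds. One minimal way to report them, sketched here as an assumption rather than the authors' protocol, is the per-model mean and sample standard deviation plus Welch's t statistic (with Welch-Satterthwaite degrees of freedom) for the difference between two models. Converting t into a p-value needs a t-distribution CDF, for example from SciPy, which is omitted to stay within the standard library; the function names are illustrative.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's variant does not assume equal variances across the two models, which is the safer default when, for example, a frozen and a fine-tuned GNN show very different seed-to-seed instability.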

Circularity Check

0 steps flagged

No significant circularity; empirical results are benchmarked externally

full rationale

The paper proposes a hybrid BART+GraphSAGE architecture and reports its ROC-AUC of 67.40 on the driver-dnf task from RelBench, directly comparing it to external baselines (LightGBM 68.86, RDL 72.62, KumoRFM 82.63). No equations, fitted parameters, or self-defined quantities are shown to reduce the reported performance metrics to the authors' own inputs by construction. The evaluation uses standard supervised metrics on a fixed public benchmark, with no load-bearing self-citation behind the central claim and no known results renamed as novel derivations. The architecture description and experimental outcomes remain self-contained against external references.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard domain assumptions about graph modeling of databases and language-model semantics; no new entities are postulated and free parameters are the usual training hyperparameters whose values are not reported in the abstract.

free parameters (2)
  • BART fine-tuning schedule
    Learning rate, number of epochs, and batch size for adapting BART to relational rows are chosen to optimize the hybrid's performance.
  • GraphSAGE aggregation and layer count
    The number of message-passing layers (the Figure 1 caption reports two shared SAGEConv layers) and the choice of mean, max, or pooling aggregator are selected during model development.
axioms (2)
  • domain assumption Relational databases can be represented as relational entity graphs without loss of predictive signal.
    Invoked when the authors replace manual flattening with GNN message passing over REGs.
  • domain assumption Fine-tuned BART embeddings capture sufficient intra-row semantics for downstream relational tasks.
    Core premise of the hybrid design that allows the GNN to focus on inter-row context.
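The aggregator choice listed among the free parameters above refers to how a GraphSAGE layer combines neighbour features. A NumPy sketch of the three named options follows: elementwise mean, elementwise max, and pooling, where each neighbour first passes through a dense layer with ReLU before an elementwise max (the pooling aggregator of Hamilton et al.). The `w_pool`/`b_pool` arguments stand in for learned weights, and which variant the paper uses is not stated in this summary.

```python
import numpy as np


def aggregate(neigh, kind="mean", w_pool=None, b_pool=None):
    """Aggregate a (num_neighbours, d) array of neighbour features into one vector.

    kind="mean": elementwise mean; kind="max": elementwise max;
    kind="pool": elementwise max over ReLU(neigh @ w_pool + b_pool).
    """
    if kind == "mean":
        return neigh.mean(axis=0)
    if kind == "max":
        return neigh.max(axis=0)
    if kind == "pool":
        return np.maximum(neigh @ w_pool + b_pool, 0.0).max(axis=0)
    raise ValueError(f"unknown aggregator: {kind}")


neigh = np.array([[1.0, -2.0], [3.0, 4.0]])
print(aggregate(neigh, "mean"))  # [2. 1.]
print(aggregate(neigh, "max"))   # [3. 4.]
```

Mean aggregation is insensitive to node degree, which suits high-fan-out foreign keys (a driver with hundreds of results rows), while max and pooling highlight the most extreme neighbour signal.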

pith-pipeline@v0.9.0 · 5780 in / 1504 out tokens · 55804 ms · 2026-05-19T18:34:33.806852+00:00 · methodology

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages · 3 internal anchors

  1. [1]

    M. Fey, W. Hu, K. Huang, J. E. Lenssen, R. Ranjan, J. Robinson, R. Ying, J. You, J. Leskovec, Position: Relational deep learning - graph representation learning on relational databases, in: Forty-first International Conference on Machine Learning, 2024. URL: https://proceedings.mlr.press/v235/fey24a.html

  2. [2]

    V. P. Dwivedi, C. Kanatsoulis, S. Huang, J. Leskovec, Relational deep learning: Challenges, foundations and next-generation architectures, in: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 5999–6009. doi:10.1145/3711896.3736558

  3. [3]

    Y. Wang, X. Wang, Q. Gan, M. Wang, Q. Yang, D. Wipf, M. Zhang, Griffin: Towards a graph-centric relational database foundation model, in: ICML, volume 267 of Proceedings of Machine Learning Research, PMLR, 2025, pp. 64604–64627. URL: https://proceedings.mlr.press/v267/wang25da.html

  4. [4]

    M. Fey, V. Kocijan, F. Lopez, J. E. Lenssen, J. Leskovec, KumoRFM: A Foundation Model for In-Context Learning on Relational Data, White Paper, Kumo AI, 2025. URL: https://kumo.ai/research/kumo_relational_foundation_model.pdf

  5. [5]

    L. Vogel, B. Hilprecht, C. Binnig, Towards foundation models for relational databases [vision paper], arXiv preprint arXiv:2305.15321 (2023). doi:10.48550/ARXIV.2305.15321

  6. [6]

    R. Bommasani, D. A. Hudson, et al., On the opportunities and risks of foundation models, CoRR abs/2108.07258 (2021). doi:10.48550/ARXIV.2108.07258

  7. [7]

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901. URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html

  8. [8]

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, in: Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, 2021, pp. 8748...

  9. [9]

    H. Zhou, L. Halilaj, S. Monka, S. Schmid, Y. Zhu, J. Wu, N. Nazer, S. Staab, Seeing and knowing in the wild: Open-domain visual entity recognition with large-scale knowledge graphs via contrastive learning, in: AAAI, AAAI Press, 2026, pp. 13638–13646. doi:10.1609/AAAI.V40I16.38370

  10. [10]

    L. Ericsson, H. Gouk, C. C. Loy, T. M. Hospedales, Self-supervised representation learning: Introduction, advances, and challenges, IEEE Signal Process. Mag. 39 (2022) 42–62. doi:10.1109/MSP.2021.3134634

  11. [11]

    F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, Q. He, A comprehensive survey on transfer learning, Proc. IEEE 109 (2021) 43–76. doi:10.1109/JPROC.2020.3004555

  12. [12]

    T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016, pp. 785–794. doi:10.1145/2939672.2939785

  13. [13]

    V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, G. Kasneci, Deep neural networks and tabular data: A survey, IEEE Transactions on Neural Networks and Learning Systems 35 (2022) 7499–7519. doi:10.1109/TNNLS.2022.3229161

  14. [14]

    L. Grinsztajn, E. Oyallon, G. Varoquaux, Why do tree-based models still outperform deep learning on typical tabular data?, Advances in Neural Information Processing Systems 35 (2022) 507–520. URL: http://papers.nips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html

  15. [15]

    X. Huang, A. Khetan, M. Cvitkovic, Z. Karnin, TabTransformer: Tabular data modeling using contextual embeddings, arXiv preprint arXiv:2012.06678 (2020). doi:10.48550/ARXIV.2012.06678

  16. [16]

    Y. Gorishniy, I. Rubachev, V. Khrulkov, A. Babenko, Revisiting deep learning models for tabular data, Advances in Neural Information Processing Systems 34 (2021) 18932–18943. URL: https://proceedings.neurips.cc/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html

  17. [17]

    G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, T. Goldstein, SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training, arXiv preprint arXiv:2106.01342 (2021). doi:10.48550/ARXIV.2106.01342

  18. [18]

    N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, S. B. Hoo, R. T. Schirrmeister, F. Hutter, Accurate predictions on small data with a tabular foundation model, Nature 637 (2025) 319–326. doi:10.1038/s41586-024-08328-6

  19. [19]

    M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in: ACL, Association for Computational Linguistics, 2020, pp. 7871–7880. doi:10.18653/V1/2020.ACL-MAIN.703

  20. [20]

    P. Yin, G. Neubig, W. Yih, S. Riedel, TaBERT: Pretraining for joint understanding of textual and tabular data, in: ACL, Association for Computational Linguistics, 2020, pp. 8413–8426. doi:10.18653/V1/2020.ACL-MAIN.745

  21. [21]

    X. Deng, H. Sun, A. Lees, Y. Wu, C. Yu, TURL: Table understanding through representation learning, Proceedings of the VLDB Endowment 14 (2020) 307–319. doi:10.14778/3430915.3430921

  22. [22]

    N. Tang, J. Fan, F. Li, J. Tu, X. Du, G. Li, S. Madden, M. Ouzzani, RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation, Proc. VLDB Endow. 14 (2021) 1254–1261. doi:10.14778/3457390.3457391

  23. [23]

    T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations (ICLR), 2017. URL: https://openreview.net/forum?id=SJU4ayYgl

  24. [24]

    Z. Ding, J. Wu, J. Wu, Y. Xia, B. Xiong, V. Tresp, Temporal fact reasoning over hyper-relational knowledge graphs, in: EMNLP (Findings), Findings of ACL, Association for Computational Linguistics, 2024, pp. 355–373. doi:10.18653/V1/2024.FINDINGS-EMNLP.20

  25. [25]

    W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems 30 (2017). URL: https://proceedings.neurips.cc/paper/2017/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html

  26. [26]

    Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, J. Tang, GraphMAE: Self-supervised masked graph autoencoders, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 594–604. doi:10.1145/3534678.3539321

  27. [27]

    V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson, Benchmarking graph neural networks, Journal of Machine Learning Research 24 (2023) 1–48. URL: https://jmlr.org/papers/v24/22-0567.html

  28. [28]

    M. Fey, J. E. Lenssen, Fast graph representation learning with PyTorch Geometric, CoRR abs/1903.02428 (2019). doi:10.48550/ARXIV.1903.02428

  29. [29]

    J. Robinson, R. Ranjan, W. Hu, K. Huang, J. Han, A. Dobles, M. Fey, J. E. Lenssen, Y. Yuan, Z. Zhang, et al., RelBench: A benchmark for deep learning on relational databases, Advances in Neural Information Processing Systems 37 (2024) 21330–21341. URL: http://papers.nips.cc/paper_files/paper/2024/hash/25cd345233c65fac1fec0ce61d0f7836-Abstract-Datasets_an...

  30. [30]

    X. Zhang, D. Song, D. Tao, Continual learning on graphs: Challenges, solutions, and opportunities, arXiv preprint arXiv:2402.11565 (2024). doi:10.48550/ARXIV.2402.11565

  31. [31]

    G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, LightGBM: A highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems 30 (2017). URL: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html

  32. [32]

    L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605. URL: https://www.jmlr.org/papers/v9/vandermaaten08a.html

  33. [33]

    Y. Zhu, N. Potyka, M. Nayyeri, B. Xiong, Y. He, E. Kharlamov, S. Staab, Predictive multiplicity of knowledge graph embeddings in link prediction, in: EMNLP (Findings), Findings of ACL, Association for Computational Linguistics, 2024, pp. 334–354. doi:10.18653/V1/2024.FINDINGS-EMNLP.19

  34. [34]

    Y. Zhu, J. Wu, Y. Wang, H. Zhou, J. Chen, E. Kharlamov, S. Staab, Certainty in uncertainty: Reasoning over uncertain knowledge graphs with statistical guarantees, in: EMNLP, Association for Computational Linguistics, 2025, pp. 8730–8752. doi:10.18653/V1/2025.EMNLP-MAIN.441