Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks
Pith reviewed 2026-05-19 18:34 UTC · model grok-4.3
The pith
A hybrid of fine-tuned BART and GraphSAGE on relational entity graphs enriches row embeddings and is competitive with supervised baselines on a relational database task (driver-dnf on RelBench's rel-f1).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63).
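The gaps quoted above follow directly from the four reported ROC-AUC values. A few lines of arithmetic make them explicit. The 1.46-point gap to LightGBM is derived here; the source does not state it.

```python
# ROC-AUC (0-100 scale) on rel-f1 / driver-dnf, as reported in the abstract.
reported = {
    "hybrid (BART + GraphSAGE)": 67.40,
    "LightGBM": 68.86,
    "RDL": 72.62,
    "KumoRFM": 82.63,
}


def gaps_to(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """ROC-AUC points by which every other model exceeds `reference`."""
    base = scores[reference]
    return {name: round(s - base, 2) for name, s in scores.items() if name != reference}


for name, gap in gaps_to("hybrid (BART + GraphSAGE)", reported).items():
    print(f"{name}: +{gap:.2f}")
# LightGBM: +1.46
# RDL: +5.22
# KumoRFM: +15.23
```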
What carries the argument
Hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over relational entity graphs (REGs) to inject relational context.
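A relational entity graph, as used in relational deep learning (Fey et al.), turns every table row into a node and every primary-foreign key link into an edge. The minimal sketch below builds such a graph from toy tables. The table names, columns, rows and the `build_reg` helper are hypothetical; they are only loosely shaped like a racing database and do not reproduce the actual rel-f1 schema. The sketch also ignores edge types and timestamps, which a real REG pipeline would keep.

```python
from collections import defaultdict

# Hypothetical toy tables; names and rows are illustrative, not real rel-f1 data.
tables = {
    "drivers": [{"driverId": 1, "name": "A"}, {"driverId": 2, "name": "B"}],
    "races": [{"raceId": 10, "year": 2020}],
    "results": [
        {"resultId": 100, "driverId": 1, "raceId": 10, "position": 1},
        {"resultId": 101, "driverId": 2, "raceId": 10, "position": 2},
    ],
}
primary_keys = {"drivers": "driverId", "races": "raceId", "results": "resultId"}
# (child table, foreign-key column) -> referenced parent table
foreign_keys = {("results", "driverId"): "drivers", ("results", "raceId"): "races"}


def build_reg(tables, primary_keys, foreign_keys):
    """Return (nodes, adjacency): one node (table, primary key) per row and one
    undirected edge per primary-foreign key link."""
    adj = defaultdict(set)
    for (child, fk_col), parent in foreign_keys.items():
        for row in tables[child]:
            src = (child, row[primary_keys[child]])
            dst = (parent, row[fk_col])
            adj[src].add(dst)
            adj[dst].add(src)
    nodes = [(t, row[primary_keys[t]]) for t, rows in tables.items() for row in rows]
    return nodes, {n: sorted(adj[n]) for n in nodes}


nodes, adj = build_reg(tables, primary_keys, foreign_keys)
print(adj[("races", 10)])  # [('results', 100), ('results', 101)]
```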
Load-bearing premise
That the specific hybrid of fine-tuned BART plus GraphSAGE on relational entity graphs will generalize to arbitrary unseen databases and tasks sufficiently to serve as a foundation model, rather than remaining competitive only on the tested RelBench subset.
What would settle it
Evaluating the hybrid model on a completely new relational database and task outside the RelBench benchmark to check if performance remains competitive without additional fine-tuning.
Original abstract
Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising and resource-efficient path towards foundation models for relational databases.
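One GraphSAGE layer with mean aggregation shows how the GNN "enriches" a row embedding. The layer combines each node's own vector with the average of its neighbors' vectors, following the concatenation form described by Hamilton et al. (2017).

This is a minimal sketch under stated assumptions:
- The input vectors stand in for BART row embeddings.
- The weight matrix `W` would normally be learned.
- The helper names are hypothetical.
- This report does not specify the paper's actual layer count or aggregation settings.

```python
import math


def matvec(W, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(h, adj, W, normalize=True):
    """One GraphSAGE-style layer with mean aggregation:
    h_v' = ReLU(W @ concat(h_v, mean of h_u over neighbors u)), optionally L2-normalized.
    `h` maps node -> embedding, `adj` maps node -> list of neighbor nodes,
    `W` has shape (out_dim, 2 * in_dim)."""
    dim = len(next(iter(h.values())))
    out = {}
    for v, hv in h.items():
        neigh = adj.get(v, [])
        if neigh:
            agg = [sum(h[u][i] for u in neigh) / len(neigh) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated row: no relational context to add
        z = [max(0.0, val) for val in matvec(W, hv + agg)]  # hv + agg concatenates the lists
        if normalize:
            norm = math.sqrt(sum(val * val for val in z)) or 1.0
            z = [val / norm for val in z]
        out[v] = z
    return out


# Toy example: three "rows" with 2-d embeddings.
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
W = [[1, 0, 1, 0],
     [0, 1, 0, 1]]  # adds self and neighbor-mean features dimension by dimension
print(sage_mean_layer(h, adj, W, normalize=False))
# {'a': [1.5, 1.0], 'b': [1.0, 1.0], 'c': [2.0, 1.0]}
```

Stacking several such layers lets information from rows several foreign-key hops away reach each embedding.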
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a hybrid architecture that pairs a fine-tuned BART encoder for intra-row textual semantics with a GraphSAGE GNN operating over relational entity graphs (REGs) constructed from database schemas. On the RelBench benchmark the hybrid is evaluated on the single driver-dnf task from the rel-f1 dataset, where it reports a ROC-AUC of 67.40 that is competitive with LightGBM (68.86) and narrows the gap to RDL (72.62) while remaining below KumoRFM (82.63). The authors conclude that such lightweight LM-GNN hybrids constitute a resource-efficient route toward foundation models for relational databases.
Significance. If the reported enrichment of row embeddings generalizes, the work would supply a practical, lower-cost alternative to full-scale relational foundation models by reusing existing language-model encoders and adding only a modest GNN layer. The concrete numerical comparisons to public baselines on a fixed benchmark constitute a clear, falsifiable strength.
major comments (2)
- [Abstract] Abstract: the claim that the hybrid 'offers a promising and resource-efficient path towards foundation models' rests on the premise of generalization to arbitrary unseen databases and tasks, yet only a single-task result on driver-dnf / rel-f1 is presented; no cross-database transfer, multi-task pre-training protocol, or ablation isolating the contribution of the REG structure versus simple feature concatenation is reported.
- [Abstract] Abstract: the competitiveness statement (67.40 vs. LightGBM 68.86 and RDL 72.62) cannot be assessed without details on training procedure, hyperparameter search, statistical significance, or confirmation that all baselines used identical data splits; these omissions are load-bearing for the central empirical claim.
minor comments (1)
- [Abstract] Abstract: the phrase 'substantially enriches BART's row embeddings' is used without a quantitative baseline (e.g., BART-only performance on the same task).
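The competitiveness comparison is expressed in ROC-AUC points, so it helps to recall what the metric measures. ROC-AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below implements that definition directly. It is for illustration only and is not RelBench's evaluation code. The scores quoted in this report are on a 0-100 scale.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as 1/2.
    O(P * N) pairwise version, fine for illustration but not for large test sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.5]
print(round(100 * roc_auc(labels, scores), 2))  # 75.0 on the 0-100 scale
```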
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract] Abstract: the claim that the hybrid 'offers a promising and resource-efficient path towards foundation models' rests on the premise of generalization to arbitrary unseen databases and tasks, yet only a single-task result on driver-dnf / rel-f1 is presented; no cross-database transfer, multi-task pre-training protocol, or ablation isolating the contribution of the REG structure versus simple feature concatenation is reported.
Authors: We agree that the current evaluation is restricted to a single task and that stronger claims about generalization would require additional experiments such as cross-database transfer or multi-task pre-training. The manuscript presents this result as an initial demonstration of the hybrid architecture's feasibility rather than a comprehensive validation of foundation-model capabilities. In the revised version we will tone down the abstract claim to emphasize the preliminary nature of the findings, add a dedicated limitations paragraph, and include a brief conceptual argument (supported by the architecture description) that the REG structure enables relational message passing that simple feature concatenation cannot replicate. We will also outline planned future work on broader transfer protocols. revision: partial
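Reachability makes the conceptual argument in this response concrete. After k rounds of message passing, a row's embedding can depend on rows up to k hops away in the REG. Concatenating features from directly joined rows, by contrast, gives a fixed one-hop view. The toy graph below uses hypothetical driver, result and race rows. It demonstrates reachability only; it does not show that a trained model actually exploits the extra context.

```python
def receptive_field(adj, start, num_layers):
    """Nodes whose features can reach `start` within `num_layers` rounds of
    message passing (breadth-first expansion; `start` itself included)."""
    seen = {start}
    frontier = {start}
    for _ in range(num_layers):
        frontier = {u for v in frontier for u in adj.get(v, ())} - seen
        seen |= frontier
    return seen


# Hypothetical toy REG: two drivers linked through their results in one race.
adj = {
    "driver_1": ["result_100"],
    "driver_2": ["result_101"],
    "result_100": ["driver_1", "race_10"],
    "result_101": ["driver_2", "race_10"],
    "race_10": ["result_100", "result_101"],
}

for k in range(1, 5):
    print(k, sorted(receptive_field(adj, "driver_1", k)))
```

In this toy graph, the other driver in the same race first enters driver_1's receptive field at four layers.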
Referee: [Abstract] Abstract: the competitiveness statement (67.40 vs. LightGBM 68.86 and RDL 72.62) cannot be assessed without details on training procedure, hyperparameter search, statistical significance, or confirmation that all baselines used identical data splits; these omissions are load-bearing for the central empirical claim.
Authors: We acknowledge that the original manuscript omitted several reproducibility details required to fully evaluate the reported numbers. In the revised manuscript we will expand the experimental section to describe the training procedure, the hyperparameter search protocol, the use of multiple random seeds for statistical significance, and explicit confirmation that all baselines were run on the identical RelBench-provided data splits. These additions will allow readers to assess the competitiveness claim directly. revision: yes
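The rebuttal promises multiple random seeds. One common summary is the mean and sample standard deviation per model. For a pairwise comparison, Welch's unequal-variance t statistic is a standard choice. The helpers below are a generic sketch, not the authors' protocol. Turning the statistic into a p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` provides it.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores (needs >= 2 seeds)."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    independent samples of per-seed scores with possibly unequal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's variant avoids assuming that different models have equal seed-to-seed variance. With only a handful of seeds, the test has little power, so small gaps may stay inconclusive.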
Circularity Check
No significant circularity; empirical results are benchmarked externally
full rationale
The paper proposes a hybrid BART+GraphSAGE architecture and reports its ROC-AUC of 67.40 on the driver-dnf task from RelBench, directly comparing it to external baselines (LightGBM 68.86, RDL 72.62, KumoRFM 82.63). No equations, fitted parameters, or self-defined quantities are shown to reduce the reported performance metrics to the authors' own inputs by construction. The evaluation uses standard supervised metrics on a fixed public benchmark without any self-citation load-bearing the central claim or any renaming of known results as novel derivations. The architecture description and experimental outcomes remain self-contained against external references.
Axiom & Free-Parameter Ledger
free parameters (2)
- BART fine-tuning schedule
- GraphSAGE aggregation and layer count
axioms (2)
- domain assumption: Relational databases can be represented as relational entity graphs without loss of predictive signal.
- domain assumption: Fine-tuned BART embeddings capture sufficient intra-row semantics for downstream relational tasks.
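Before a language-model encoder can capture intra-row semantics, each row must be turned into text. This report does not describe the serialization template used in the paper. The following are therefore assumptions: the "column is value" format below, the `serialize_row` name, and the choice to skip key columns. A tokenizer and the fine-tuned BART encoder would then consume the resulting string.

```python
def serialize_row(table, row, skip=()):
    """Render one database row as flat text for a text encoder such as BART.
    Columns in `skip` (e.g. keys or the prediction target) are omitted, and
    missing values are written out explicitly instead of being dropped."""
    parts = []
    for col, val in row.items():
        if col in skip:
            continue
        parts.append(f"{col} is {'unknown' if val is None else val}")
    return f"{table}: " + ", ".join(parts)


# Hypothetical row; the column names are illustrative, not taken from rel-f1.
print(serialize_row("drivers", {"driverId": 1, "forename": "Ada", "nationality": None},
                    skip=("driverId",)))
# drivers: forename is Ada, nationality is unknown
```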
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "hybrid architecture combining a fine-tuned BART encoder to capture intra-row semantics with a GraphSAGE-based GNN over REGs to inject relational context"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, theorem J_uniquely_calibrated_via_higher_derivative (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "masked value reconstruction objective... scaled cosine error + MSE"
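The second quoted passage pairs a masked value reconstruction objective with a scaled cosine error (SCE) term and an MSE term. GraphMAE (Hou et al., 2022) defines SCE as the average of (1 - cos(x, z))^γ over masked items, where x is the original feature vector, z its reconstruction, and γ ≥ 1. The sketch below adds the two terms with a weight λ (`lam`). The values of γ and λ, the plain weighted sum, and the function names are assumptions, not details taken from the paper.

```python
import math


def scaled_cosine_error(x, z, gamma=2.0):
    """GraphMAE-style SCE for one item: (1 - cos(x, z)) ** gamma."""
    dot = sum(a * b for a, b in zip(x, z))
    nx = math.sqrt(sum(a * a for a in x))
    nz = math.sqrt(sum(b * b for b in z))
    cos = dot / (nx * nz) if nx > 0 and nz > 0 else 0.0
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return (1.0 - cos) ** gamma


def mse(x, z):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)


def masked_reconstruction_loss(targets, recons, masked, gamma=2.0, lam=1.0):
    """Average of SCE + lam * MSE over the masked positions only.
    `targets[i]` is the original value vector, `recons[i]` its reconstruction."""
    if not masked:
        raise ValueError("at least one position must be masked")
    total = sum(
        scaled_cosine_error(targets[i], recons[i], gamma) + lam * mse(targets[i], recons[i])
        for i in masked
    )
    return total / len(masked)


targets = [[1.0, 0.0], [3.0, 4.0]]
recons = [[0.0, 1.0], [0.0, 0.0]]
print(masked_reconstruction_loss(targets, recons, masked=[0]))  # 2.0; position 1 is ignored
```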
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] M. Fey, W. Hu, K. Huang, J. E. Lenssen, R. Ranjan, J. Robinson, R. Ying, J. You, J. Leskovec, Position: Relational deep learning - graph representation learning on relational databases, in: Forty-first International Conference on Machine Learning, 2024. URL: https://proceedings.mlr.press/v235/fey24a.html
[2] V. P. Dwivedi, C. Kanatsoulis, S. Huang, J. Leskovec, Relational deep learning: Challenges, foundations and next-generation architectures, in: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2, 2025, pp. 5999–6009. doi:10.1145/3711896.3736558
[3] Y. Wang, X. Wang, Q. Gan, M. Wang, Q. Yang, D. Wipf, M. Zhang, Griffin: Towards a graph-centric relational database foundation model, in: ICML, volume 267 of Proceedings of Machine Learning Research, PMLR, 2025, pp. 64604–64627. URL: https://proceedings.mlr.press/v267/wang25da.html
[4] M. Fey, V. Kocijan, F. Lopez, J. E. Lenssen, J. Leskovec, KumoRFM: A Foundation Model for In-Context Learning on Relational Data, White Paper, Kumo AI, 2025. URL: https://kumo.ai/research/kumo_relational_foundation_model.pdf
[5] L. Vogel, B. Hilprecht, C. Binnig, Towards foundation models for relational databases [vision paper], arXiv preprint arXiv:2305.15321 (2023). doi:10.48550/ARXIV.2305.15321
[6] R. Bommasani, D. A. Hudson, et al., On the opportunities and risks of foundation models, CoRR abs/2108.07258 (2021). doi:10.48550/ARXIV.2108.07258
[7] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901. URL: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
[8] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, in: Proceedings of the 38th International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, 2021, pp. 8748...
[9] H. Zhou, L. Halilaj, S. Monka, S. Schmid, Y. Zhu, J. Wu, N. Nazer, S. Staab, Seeing and knowing in the wild: Open-domain visual entity recognition with large-scale knowledge graphs via contrastive learning, in: AAAI, AAAI Press, 2026, pp. 13638–13646. doi:10.1609/AAAI.V40I16.38370
[10] L. Ericsson, H. Gouk, C. C. Loy, T. M. Hospedales, Self-supervised representation learning: Introduction, advances, and challenges, IEEE Signal Process. Mag. 39 (2022) 42–62. doi:10.1109/MSP.2021.3134634
[11] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, Q. He, A comprehensive survey on transfer learning, Proc. IEEE 109 (2021) 43–76. doi:10.1109/JPROC.2020.3004555
[12] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016, pp. 785–794. doi:10.1145/2939672.2939785
[13] V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, G. Kasneci, Deep neural networks and tabular data: A survey, IEEE Transactions on Neural Networks and Learning Systems 35 (2022) 7499–7519. doi:10.1109/TNNLS.2022.3229161
[14] L. Grinsztajn, E. Oyallon, G. Varoquaux, Why do tree-based models still outperform deep learning on typical tabular data?, Advances in Neural Information Processing Systems 35 (2022) 507–520. URL: http://papers.nips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html
[15] X. Huang, A. Khetan, M. Cvitkovic, Z. Karnin, TabTransformer: Tabular data modeling using contextual embeddings, arXiv preprint arXiv:2012.06678 (2020). doi:10.48550/ARXIV.2012.06678
[16] Y. Gorishniy, I. Rubachev, V. Khrulkov, A. Babenko, Revisiting deep learning models for tabular data, Advances in Neural Information Processing Systems 34 (2021) 18932–18943. URL: https://proceedings.neurips.cc/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html
[17] G. Somepalli, M. Goldblum, A. Schwarzschild, C. B. Bruss, T. Goldstein, SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training, arXiv preprint arXiv:2106.01342 (2021). doi:10.48550/ARXIV.2106.01342
[18] N. Hollmann, S. Müller, L. Purucker, A. Krishnakumar, M. Körfer, S. B. Hoo, R. T. Schirrmeister, F. Hutter, Accurate predictions on small data with a tabular foundation model, Nature 637 (2025) 319–326. doi:10.1038/s41586-024-08328-6
[19] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in: ACL, Association for Computational Linguistics, 2020, pp. 7871–7880. doi:10.18653/V1/2020.ACL-MAIN.703
[20] P. Yin, G. Neubig, W. Yih, S. Riedel, TaBERT: Pretraining for joint understanding of textual and tabular data, in: ACL, Association for Computational Linguistics, 2020, pp. 8413–8426. doi:10.18653/V1/2020.ACL-MAIN.745
[21] X. Deng, H. Sun, A. Lees, Y. Wu, C. Yu, TURL: Table understanding through representation learning, Proceedings of the VLDB Endowment 14 (2020) 307–319. doi:10.14778/3430915.3430921
[22] N. Tang, J. Fan, F. Li, J. Tu, X. Du, G. Li, S. Madden, M. Ouzzani, RPT: relational pre-trained transformer is almost all you need towards democratizing data preparation, Proc. VLDB Endow. 14 (2021) 1254–1261. doi:10.14778/3457390.3457391
[23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Conference on Learning Representations (ICLR), 2017. URL: https://openreview.net/forum?id=SJU4ayYgl
[24] Z. Ding, J. Wu, J. Wu, Y. Xia, B. Xiong, V. Tresp, Temporal fact reasoning over hyper-relational knowledge graphs, in: EMNLP (Findings), Findings of ACL, Association for Computational Linguistics, 2024, pp. 355–373. doi:10.18653/V1/2024.FINDINGS-EMNLP.20
[25] W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems 30 (2017). URL: https://proceedings.neurips.cc/paper/2017/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html
[26] Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, J. Tang, GraphMAE: Self-supervised masked graph autoencoders, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 594–604. doi:10.1145/3534678.3539321
[27] V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, X. Bresson, Benchmarking graph neural networks, Journal of Machine Learning Research 24 (2023) 1–48. URL: https://jmlr.org/papers/v24/22-0567.html
[28] M. Fey, J. E. Lenssen, Fast graph representation learning with PyTorch Geometric, CoRR abs/1903.02428 (2019). doi:10.48550/ARXIV.1903.02428
[29] J. Robinson, R. Ranjan, W. Hu, K. Huang, J. Han, A. Dobles, M. Fey, J. E. Lenssen, Y. Yuan, Z. Zhang, et al., RelBench: A benchmark for deep learning on relational databases, Advances in Neural Information Processing Systems 37 (2024) 21330–21341. URL: http://papers.nips.cc/paper_files/paper/2024/hash/25cd345233c65fac1fec0ce61d0f7836-Abstract-Datasets_an...
[30] X. Zhang, D. Song, D. Tao, Continual learning on graphs: Challenges, solutions, and opportunities, arXiv preprint arXiv:2402.11565 (2024). doi:10.48550/ARXIV.2402.11565
[31] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, LightGBM: A highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems 30 (2017). URL: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html
[32] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605. URL: https://www.jmlr.org/papers/v9/vandermaaten08a.html
[33] Y. Zhu, N. Potyka, M. Nayyeri, B. Xiong, Y. He, E. Kharlamov, S. Staab, Predictive multiplicity of knowledge graph embeddings in link prediction, in: EMNLP (Findings), Findings of ACL, Association for Computational Linguistics, 2024, pp. 334–354. doi:10.18653/V1/2024.FINDINGS-EMNLP.19
[34] Y. Zhu, J. Wu, Y. Wang, H. Zhou, J. Chen, E. Kharlamov, S. Staab, Certainty in uncertainty: Reasoning over uncertain knowledge graphs with statistical guarantees, in: EMNLP, Association for Computational Linguistics, 2025, pp. 8730–8752. doi:10.18653/V1/2025.EMNLP-MAIN.441