{"paper":{"title":"Towards Foundation Models for Relational Databases with Language Models and Graph Neural Networks","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A hybrid of fine-tuned BART and GraphSAGE on relational entity graphs enriches embeddings and competes with supervised baselines for relational database tasks.","cross_cats":["cs.AI"],"primary_cat":"cs.DB","authors_text":"Fabian Leeske, Jingcheng Wu, Lucas Etteldorf, Max Finkenbeiner, Mojtaba Nayyeri, Ratan Bahadur Thapa, Steffen Staab","submitted_at":"2026-05-15T15:46:38Z","abstract_excerpt":"Relational databases store much of the world's structured information, and they are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited, as conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) for graph neural networks (GNNs), but remains task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture combining a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. 
This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the specific hybrid of fine-tuned BART plus GraphSAGE on relational entity graphs will generalize to arbitrary unseen databases and tasks sufficiently to serve as a foundation model, rather than remaining competitive only on the tested RelBench subset.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A BART-GraphSAGE hybrid achieves ROC-AUC 67.40 on one RelBench task, competitive with LightGBM but still behind specialized relational deep learning and foundation models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A hybrid of fine-tuned BART and GraphSAGE on relational entity graphs enriches embeddings and competes with supervised baselines for relational database tasks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"2151a3de4660a90363f0b7b92ec9d92fa3ceabd73feb6e6c3745e4ce7ec732f1"},"source":{"id":"2605.16085","kind":"arxiv","version":1},"verdict":{"id":"e03f2f8f-1de9-472a-983f-cba38d4fcc68","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T18:34:33.806852Z","strongest_claim":"Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task from the rel-f1 dataset. 
This performance is competitive with supervised baselines such as LightGBM (68.86) and narrows the gap to RDL (72.62) to within 5.22 points, though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63).","one_line_summary":"A BART-GraphSAGE hybrid achieves ROC-AUC 67.40 on one RelBench task, competitive with LightGBM but still behind specialized relational deep learning and foundation models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the specific hybrid of fine-tuned BART plus GraphSAGE on relational entity graphs will generalize to arbitrary unseen databases and tasks sufficiently to serve as a foundation model, rather than remaining competitive only on the tested RelBench subset.","pith_extraction_headline":"A hybrid of fine-tuned BART and GraphSAGE on relational entity graphs enriches embeddings and competes with supervised baselines for relational database tasks."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16085/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T19:01:18.968648Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T18:41:13.337619Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T17:33:41.534341Z","status":"skipped","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T16:41:55.501321Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"4a5214b351ba6e0415bb58ec44b0d3081f6d81f89955acddbf1c2ab844a178f4"},"references":{"count":34,"sample":[{"doi":"","year":2024,"title":"M. Fey, W. Hu, K. Huang, J. E. Lenssen, R. Ranjan, J. Robinson, R. Ying, J. You, J. 
Leskovec, Position: Relational deep learning-graph representation learning on relational databases, in: Forty-first ","work_id":"26530017-1cf5-46f5-82b4-19eeaff4404a","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1145/3711896.3736","year":2025,"title":"V. P. Dwivedi, C. Kanatsoulis, S. Huang, J. Leskovec, Relational deep learning: Challenges, foundations and next-generation architectures, in: Proceedings of the 31st ACM SIGKDD Conference on Knowle","work_id":"9697288d-9d2e-4815-9055-003b6b28e703","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Y. Wang, X. Wang, Q. Gan, M. Wang, Q. Yang, D. Wipf, M. Zhang, Griffin: Towards a graph-centric relational database foundation model, in: ICML, volume 267 of Proceedings of Machine Learning Research, P","work_id":"38571c97-cdff-42e5-a6a6-0f79e8790f7e","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"M. Fey, V. Kocijan, F. Lopez, J. E. Lenssen, J. Leskovec, KumoRFM: A Foundation Model for In-Context Learning on Relational Data, White Paper, Kumo AI, 2025. URL: https://kumo.ai/research /kumo_relat","work_id":"c707d78a-cd5c-40cf-810a-9c8d7de7b977","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.2305.15321","year":2023,"title":"L. Vogel, B. Hilprecht, C. Binnig, Towards foundation models for relational databases [vision paper], arXiv preprint arXiv:2305.15321 (2023). 
doi:10.48550/ARXIV.2305.15321","work_id":"cc116f9f-d0d4-484d-a32a-36ac3f141bb2","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":34,"snapshot_sha256":"8cee8d7eb573399c21b193d3b7f3b52fc2be542cf8494aa8174d9bb791850a3b","internal_anchors":3},"formal_canon":{"evidence_count":2,"snapshot_sha256":"2f7f70efaae5c5358a8bc99c5c5d5355a033cf7ab73149d0febb4355766f3616"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}