A Fair Evaluation of Graph Foundation Models for Node Property Prediction

Dmitry Eremeev; Gleb Bazhenov; Liudmila Prokhorenkova; Oleg Platonov

arXiv: 2606.24509 · v1 · pith:EBHIHC5Gnew · submitted 2026-06-23 · 💻 cs.LG · cs.AI · cs.SI

Pith reviewed 2026-06-26 00:48 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.SI

keywords graph foundation models · node property prediction · graph neural networks · prior-data fitted networks · model evaluation · predictive performance · inference cost

The pith

Only the most recent Graph Foundation Models based on Prior-data Fitted Networks outperform well-tuned Graph Neural Networks on node property prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts a unified reevaluation of nine recent Graph Foundation Models against strong Graph Neural Network baselines for node property prediction. It shows that only the newest models following the Prior-data Fitted Networks paradigm achieve better predictive accuracy than the baselines. Earlier GFMs do not surpass the GNNs under consistent conditions. The work corrects for varying evaluation practices that had made direct comparisons unreliable across studies. Readers care because these models target practical tasks such as fraud detection and recommendation systems, where knowing which approaches actually advance performance guides deployment decisions.

Core claim

In a fair and rigorous reevaluation of 9 recent GFMs for node property prediction, only the most recent ones based on the Prior-data Fitted Networks paradigm outperform well-tuned GNNs in predictive performance, although at a higher inference cost.

What carries the argument

A unified evaluation setting that standardizes datasets, protocols, and hyperparameter tuning to enable direct comparison of GFMs against GNN baselines.

If this is right

Most GFMs proposed to date do not improve predictive performance over well-tuned GNNs when placed under identical evaluation conditions.
PFN-based GFMs deliver higher accuracy but incur greater inference cost than the GNN baselines.
Inconsistent evaluation practices have previously obscured which models represent genuine advances.
Applications in fraud detection and recommendation systems may adopt the top PFN models only where the added inference cost is acceptable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Model developers could explore hybrids that retain GNN efficiency while incorporating prior-data fitting to close the cost gap.
Claims of foundation-model superiority on graphs should be tested against standardized GNN baselines before being treated as settled.
Extending the protocol to larger or more heterogeneous graphs would test whether the observed performance ordering holds beyond the current datasets.

Load-bearing premise

The chosen evaluation protocols, datasets, and hyperparameter tuning procedures constitute a fair and rigorous unified setting that enables reliable comparison across GFMs and GNN baselines.

What would settle it

Re-running the same experiments with additional datasets or different tuning procedures where well-tuned GNNs match or exceed the top PFN-based GFMs would falsify the performance claim.
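
The paper excerpts in the reference graph on this page mention two aggregate metrics, average rank and average normalized score. A minimal sketch of how such aggregates could be recomputed when re-running a comparison is given here. The min-max normalization, the tie handling, the function names, and the toy scores are all assumptions made for illustration; they are not taken from the paper and are not its results.

```python
from statistics import mean


def average_ranks(scores):
    """Average rank per model over datasets (rank 1 = best, higher score = better).

    `scores` maps dataset name -> {model name: score}. Tied models share the
    mean of the positions they occupy (an assumed tie rule).
    """
    ranks = {}
    for per_model in scores.values():
        ordered = sorted(per_model.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of models tied with position i.
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            for k in range(i, j + 1):
                ranks.setdefault(ordered[k][0], []).append(tied_rank)
            i = j + 1
    return {model: mean(r) for model, r in ranks.items()}


def average_normalized_scores(scores):
    """Average of per-dataset min-max normalized scores (assumed normalization)."""
    normalized = {}
    for per_model in scores.values():
        lo, hi = min(per_model.values()), max(per_model.values())
        span = hi - lo
        for model, score in per_model.items():
            value = (score - lo) / span if span else 1.0
            normalized.setdefault(model, []).append(value)
    return {model: mean(v) for model, v in normalized.items()}


# Made-up toy scores for three hypothetical models on two hypothetical datasets.
TOY_SCORES = {
    "dataset_1": {"model_a": 0.80, "model_b": 0.85, "model_c": 0.70},
    "dataset_2": {"model_a": 0.60, "model_b": 0.60, "model_c": 0.50},
}

if __name__ == "__main__":
    print(average_ranks(TOY_SCORES))
    print(average_normalized_scores(TOY_SCORES))
```

Reporting both aggregates side by side is useful because ranks ignore the size of the gaps between models, while normalized scores are sensitive to the weakest and strongest model on each dataset.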

Original abstract

Due to the wide use of graph-structured data in different fields of industry and science, the development of Graph Foundation Models (GFMs) has recently attracted a lot of attention. While many different types of models are called GFMs, particular interest has been paid to GFMs designed for node property prediction tasks, which is one of the most popular settings in Graph ML with lots of real-world applications from fraud detection in financial and social networks to recommendation systems for e-commerce and user-generated content platforms. While a number of GFMs for this task have been recently proposed, the field has not converged to a unified evaluation setting, and different works evaluate their models in widely different ways, preventing reliable comparison of GFMs with each other and with other types of models. In this work, we conduct a fair and rigorous reevaluation of 9 recent GFMs for node property prediction, comparing them to strong Graph Neural Network (GNN) baselines. We find that, among these GFMs, only the most recent ones based on the Prior-data Fitted Networks paradigm outperform well-tuned GNNs in predictive performance, although at a higher inference cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper runs a unified benchmark on nine GFMs for node property prediction and reports that only the newest PFN-based ones beat tuned GNN baselines, at higher inference cost.

The main thing to know is that after standardizing the setup, most GFMs for node property prediction do not beat well-tuned GNNs, with only the recent Prior-data Fitted Network variants coming out ahead.

The work is useful because it tackles the inconsistent evaluation practices the abstract flags. Running nine models plus strong GNN baselines under one protocol gives a clearer ordering than the scattered results in prior papers. That matters for tasks like fraud detection and recommendation, where people actually need to decide what to deploy. The authors deserve credit for including competitive GNN baselines instead of weak ones and for focusing on a practical task.

The soft spot is the reliance on the claim that the protocol is fair and rigorous. Without the exact hyperparameter search spaces, number of runs, dataset splits, or statistical tests visible in the abstract, it is hard to judge whether the GNNs were tuned as aggressively as the GFMs or if small protocol choices drive the result. The higher inference cost is noted but not quantified here, so the practical trade-off stays a bit vague. No load-bearing flaw jumps out from the stress test, and the empirical nature of the work avoids circularity.

This is the sort of paper a graph ML reading group should discuss to update their sense of where GFMs stand versus simpler methods. Practitioners choosing models would get direct value from the reported ordering.

It deserves peer review because it fills a real gap in evaluation consistency with replicable experiments, even if the tuning details will need close checking in revision.

Referee Report

0 major / 0 minor

Summary. The manuscript conducts a fair and rigorous reevaluation of 9 recent Graph Foundation Models (GFMs) for node property prediction tasks. It compares them to strong Graph Neural Network (GNN) baselines under a unified setting and reports that only the most recent GFMs based on the Prior-data Fitted Networks (PFN) paradigm outperform well-tuned GNNs, although at higher inference cost.

Significance. If the claimed unified evaluation protocol is sound and reproducible, the work would provide a much-needed standardized benchmark for GFMs versus GNNs. This addresses the current lack of convergence on evaluation settings in the field and could help clarify whether recent GFMs deliver practical advantages for node property prediction applications such as fraud detection and recommendation systems.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary recognizing the value of a unified evaluation for GFMs versus GNNs. The recommendation is listed as uncertain, but the report contains no specific major comments to address. We stand ready to provide further details on the protocol or results if that would resolve the uncertainty.

Circularity Check

0 steps flagged

No significant circularity in empirical benchmarking study

This is a purely empirical reevaluation paper that compares existing GFMs to GNN baselines on node property prediction tasks using datasets and protocols. No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or self-citation chains appear in the abstract or described content. The central claim rests on experimental results rather than any reduction of predictions to inputs by construction, satisfying the default expectation of no circularity for non-derivational work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical evaluation study; no free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5744 in / 1025 out tokens · 19503 ms · 2026-06-26T00:48:57.202212+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 8 linked inside Pith

[1] Ba, J. L., Kiros, J. R., and Hinton, G. E. Layer normalization. arXiv preprint arXiv:1607.06450.

[2] Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.

[3] Choi, J., Kang, W., Kim, M., Kim, J., and Park, N. Can TabPFN compete with GNNs for node classification via graph tabularization? arXiv preprint arXiv:2512.08798.

[4] De Nadai, M., Fabbri, F., Gigioli, P., Wang, A., Li, A., Silvestri, F., Kim, L., Lin, S., Radosavljevic, V., Ghael, S., Nyhan, D., Bouchard, H., Lalmas, M., and Damianou, A. Personalized audiobook recommendations at Spotify through graph neural networks. In Companion Proceedings of the ACM Web Conference 2024, pp. 403–412, 2024.

[5] Dong, K., Mao, H., Guo, Z., and Chawla, N. V. Universal link predictor by in-context learning on graphs. arXiv preprint arXiv:2402.07738.

[6] Eremeev, D., Bazhenov, G., Platonov, O., Babenko, A., and Prokhorenkova, L. Turning tabular foundation models into graph foundation models. arXiv preprint arXiv:2508.20906.

[7] Fey, M., Hu, W., Huang, K., Lenssen, J. E., Ranjan, R., Robinson, J., Ying, R., You, J., and Leskovec, J. Relational deep learning: Graph representation learning on relational databases. arXiv preprint arXiv:2312.04615.

[8] Frasca, F., Jogl, F., Eliasof, M., Ostrovsky, M., Schönlieb, C.-B., Gärtner, T., and Maron, H. Towards foundation models on graphs: An analysis on cross-dataset transfer of pretrained GNNs. arXiv preprint arXiv:2412.17609.

[9] Grinsztajn, L., Flöge, K., Key, O., Birkel, F., Jund, P., Roof, B., Manium, M., Hoo, S. B., Bühler, M., Garg, A., Safaric, D., Robertson, J., Jäger, B., Alessi, S., Hayler, A., Moroshan, V., Purucker, L., Singer, P., Arazi, A., Siems, J., Metzen, J. H., Grab, G., Erickson, N., Guo, S., Kalfon, E., Bing, S., Salinas, D., Cornu, C., Wehrhahn, L. C., ...

[10] Hayler, A., Huang, X., Ceylan, I. I., Bronstein, M., and Finkelshtein, B. Bringing graphs to the table: Zero-shot node classification via tabular foundation models. arXiv preprint arXiv:2509.07143.

[11] Lipton, Z. C. and Steinhardt, J. Troubling trends in machine learning scholarship. arXiv preprint arXiv:1807.03341.

[12] Platonov, O. and Prokhorenkova, L. Cluster attention for graph machine learning. arXiv preprint arXiv:2604.07492.

[13] Rubachev, I., Kotelnikov, A., Kartashev, N., and Babenko, A. On finetuning tabular foundation models. arXiv preprint arXiv:2506.08982.

[14] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(56):1929–1958.

[15] Stoll, T., Qian, C., Finkelshtein, B., Parviz, A., Weber, D., Frasca, F., Shavit, H., Siraudin, A., Mielke, A., Anastacio, M., Müller, E., Bechler-Speicher, M., Bronstein, M., Galkin, M., Hoos, H., Niepert, M., Perozzi, B., Tönshoff, J., and Morris, C. GraphBench: Next-generation graph learning benchmarking. arXiv preprint arXiv:2512.04475.

[16] Watanabe, S. Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance. arXiv preprint arXiv:2304.11127.

[17] Xia, L. and Huang, C. AnyGraph: Graph foundation model in the wild. arXiv preprint arXiv:2408.10700.

[18] Xia, L., Kao, B., and Huang, C. OpenGraph: Towards open graph foundation models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024.

[19] Yu, X., Gong, Z., Zhou, C., Fang, Y., and Zhang, H. SAMGPT: Text-free graph foundation model for multi-domain pre-training and cross-domain adaptation. In Proceedings of the ACM Web Conference 2025, pp. 1142–1153, 2025.

[20] Zhang, X., Ren, G., Yu, H., Yuan, H., Wang, H., Li, J., Wu, J., Mo, L., Mao, L., Hao, M., Dai, N., Xu, R., Li, S., Zhang, T., He, Y., Wang, Y., Zhang, Y., Xu, Z., Li, D., Gao, F., Zou, H., Liu, J., Liu, J., Xu, J., Cheng, K., Li, K., Zhou, L., Li, Q., Fan, S., Lin, X., Han, X., Li, X., Lu, Y., Xue, Y., Jiang, Y., Wang, Z., Wang, Z., and Cui, P. LimiX: Unleashing structured-data modeling capability for generalist intelligence. arXiv preprint arXiv:2509.03505.
[21] A. Non-PFN-Based GFMs for Node Property Prediction. The key challenge for GFMs dealing with node property prediction is that graphs can come from vastly different domains (e.g., social networks, co-purchasing networks, road networks). This implies that the graphs can have different...

[22] learning how to learn ... under the name of Transformer Neural Processes). The idea of PFNs is to train models to make predictions on previously unseen datasets in a single forward pass. PFNs perform in-context learning (ICL): rather than updating the model parameters for each new dataset, they use the context provided as input to make predictions without per-dataset training. In ...

[23] ... are significantly more involved: they create custom graph-native model architectures by augmenting standard PFN Transformers with graph neighborhood aggregation modules, design prior distributions over attributed graphs, and train these models on synthetic node property prediction datasets sampled from these prior distributions. Thus, these models handle graph-structured data natively instead of having to simplify it by converting to a table ...

[24] and Graph Foundation Models (Eremeev et al., 2025

[25] and normalizations (Ioffe & Szegedy, 2015; Ba et al., 2016), which often significantly improve the performance of neural models and are frequently used by the models being compared to GNNs. Further, from our extensive experience of training GNNs across a wide range of datasets, tasks, and settings, both for research and for industrial applications, we obs...

[26] with simple additive attention and Local Graph Transformer (LGT) (Shi et al., 2021; Platonov et al., 2023b) with scaled dot product attention (Vaswani et al.,

[27] (note that this Graph Transformer variant only allows each node to attend to its neighbors; we specifically refer to it as a Local Graph Transformer following Platonov & Prokhorenkova (2026) to distinguish it from more common Global Graph Transformers that allow each node to attend to any other node unconstrained by the graph structure). The official implementation of GNNs from Platonov et al. ...

[28] in the MPNNs class from the official codebase of Luo et al. (2024). In our experiments in Section 3, we observe that the GNN implementations from Luo et al. (2024) typically perform better for the simpler GCN and GraphSAGE models, while the GNN implementations from Platonov et al. (2023b) typically perform better for the more complex attention-based GAT a...

[29] Further, we also observe that the GNN implementations from Platonov et al. ... is the strongest of the GNNs in our experiments, achieving the fifth place among all the considered models according to both average rank and average normalized score aggregated metrics (and the third and the second places on the subset of only node classification datasets according to average rank and average normalized score aggregated metrics, respecti...

[30] and GraphBench (Stoll et al., 2025). While GraphBench mostly aims at widening the scope of Graph ML tasks and does not present many classic node property prediction tasks, GraphLand aims exactly at node property prediction and provides a diverse collection of datasets representing real-world industrial applications of this task. We believe GraphLand can b...

[31] provided in the Optuna library (Akiba et al., 2019). For each of the considered GNNs, we search over all hyperparameters available in the corresponding official implementation. The complete hyperparameter search spaces are provided in Appendix I. We train all GNNs with the AdamW optimizer (Kingma & Ba, 2015; Loshchilov & Hutter,

[32] for a maximum of 3000 steps using early stopping based on the validation set performance with a patience of 1000 steps. GFMs can support the in-context learning (ICL) regime, the fine-tuning (FT) regime, or both. One of the benefits of the ICL regime is that it does not require hyperparameter optimization, as the pretrained model is used as is without weight ...

[33] We can see that inference-time ensembling almost always improves the predictive performance of the ... Table 3. The memory (VRAM, in GB) required for: a single training run with the best hyperparameters (Tr), a single inference run with the best hyperparameters (Inf). Note that for GNNs, the models with the best hyperparameters can be widely different in size across GNN types and datasets ...
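
Excerpt [32] above describes training for a maximum of 3000 steps with early stopping on validation performance and a patience of 1000 steps. A minimal sketch of that stopping rule follows. The function name, the `train_step` and `evaluate` callables, and the `eval_every` parameter are hypothetical; the excerpt does not say how often validation is run, so evaluating at every step is an assumption, and this is not the authors' training code.

```python
def train_with_early_stopping(train_step, evaluate,
                              max_steps=3000, patience=1000, eval_every=1):
    """Run up to `max_steps` training steps, stopping early on a stale validation score.

    train_step(step): performs one optimization step (hypothetical callable).
    evaluate(step):   returns a validation score, higher is better (hypothetical).
    Training stops once `patience` steps have passed since the best score.
    Returns (best_step, best_score, last_step_run).
    """
    best_score = float("-inf")
    best_step = 0
    step = 0
    for step in range(1, max_steps + 1):
        train_step(step)
        if step % eval_every != 0:
            continue  # assumed: validation only on every `eval_every`-th step
        score = evaluate(step)
        if score > best_score:
            best_score, best_step = score, step
        elif step - best_step >= patience:
            break  # no improvement for `patience` steps
    return best_step, best_score, step


if __name__ == "__main__":
    # Toy validation curve that peaks at step 500 and then degrades.
    result = train_with_early_stopping(
        train_step=lambda step: None,
        evaluate=lambda step: -abs(step - 500),
    )
    print(result)
```

In practice the model weights from `best_step` would be checkpointed and restored for testing; that part is omitted here to keep the sketch independent of any particular framework.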
