pith. machine review for the scientific record.

arxiv: 2604.06391 · v1 · submitted 2026-04-07 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Toward a universal foundation model for graph-structured data

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:27 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords graph foundation model · structural prompts · transferable representations · biomedical networks · message-passing backbone · zero-shot generalization · pretrained graph model · SagePPI benchmark

The pith

A pretrained graph model using structural prompts transfers across diverse biomedical networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to build a reusable foundation model for graphs that works across different datasets, unlike standard graph neural networks, which tie their learned representations to one specific set of node features and labels. In biomedical research, networks such as molecular interaction maps or gene regulatory circuits vary widely between experiments, so models trained on one cannot easily be applied to another. The approach encodes general structural properties, including node degrees, centralities, communities, and diffusion patterns, as prompts that feed into a message-passing backbone. This produces embeddings in a shared space that support pretraining on mixed graphs and quick reuse on new ones. The result is strong benchmark performance, including better zero-shot and few-shot results on held-out data.

Core claim

The model learns transferable structural representations that are not tied to particular node identities or feature schemes. It does so by leveraging feature-agnostic graph properties, including degree statistics, centrality measures, community structure indicators, and diffusion-based signatures, encoded as structural prompts. These prompts are integrated with a message-passing backbone to embed diverse graphs into a shared representation space. The model is pretrained once on heterogeneous graphs and subsequently reused on unseen datasets with minimal adaptation, achieving a mean ROC-AUC of 95.5% on the SagePPI benchmark after supervised fine-tuning, a 21.8% improvement over the best supervised message-passing baseline.

What carries the argument

Structural prompts encoding feature-agnostic graph properties such as degree statistics, centrality measures, community structure, and diffusion signatures, integrated with a message-passing backbone to embed graphs into a shared representation space.
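As a concrete illustration of what "feature-agnostic graph properties" means, here is a minimal sketch computing two of the listed descriptors (degree and local clustering coefficient) on a toy graph. The dict-of-sets graph encoding and the two-feature prompt vector are assumptions for illustration, not the paper's implementation:

```python
# Sketch of structural-prompt construction on a toy undirected graph.
# Only degree and clustering are shown; the paper also lists k-core,
# ego-network statistics, PageRank, and community indicators.

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))

def structural_prompt(adj, v):
    """Feature-agnostic descriptor: [degree, clustering coefficient]."""
    return [float(len(adj[v])), clustering_coefficient(adj, v)]

# Toy graph: a triangle (0, 1, 2) with a pendant node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
prompts = {v: structural_prompt(adj, v) for v in adj}
# Node 0 sits in a closed triangle: degree 2, clustering 1.0.
```

Because these quantities depend only on topology, the same prompt function applies unchanged to any graph, regardless of its node-feature scheme — which is the property the transfer claim rests on.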

If this is right

  • The model can be applied to new biomedical graphs with minimal adaptation while matching or exceeding dataset-specific supervised baselines.
  • It shows superior zero-shot and few-shot generalization on graphs held out from pretraining.
  • The approach supports reusable models for graph analysis in domains where data varies across cohorts and institutions.
  • Pretraining once on heterogeneous graphs reduces the need for full retraining on each new dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The prompting technique might extend to other structured data types if similar topological summaries can be defined.
  • Explicit structural encoding could lower the data requirements for effective graph transfer learning in general.
  • The shared space might support new cross-graph tasks such as mapping knowledge between different molecular networks.
  • Testing the model on non-biomedical graphs like social or citation networks would check the breadth of the structural approach.

Load-bearing premise

Feature-agnostic properties like node degrees and community structures alone are enough to produce representations that transfer well across graphs with very different node features, topologies, and label spaces.

What would settle it

A zero-shot evaluation on a new graph dataset with mismatched topology and features where the pretrained model performs no better than a non-pretrained baseline or random guessing would falsify the transferability claim.
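That test reduces to comparing a zero-shot ROC-AUC against the 0.5 chance level. A minimal rank-based AUC (the Mann-Whitney form) makes the comparison concrete; the scores below are illustrative, not taken from the paper:

```python
# ROC-AUC as the probability that a random positive outscores a
# random negative, with ties counted as one half.

def roc_auc(pos_scores, neg_scores):
    """P(score_pos > score_neg), ties counted as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# A separating model vs. an uninformative one (illustrative scores).
informative = roc_auc([0.9, 0.8, 0.7], [0.2, 0.3, 0.1])   # 1.0
chance      = roc_auc([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])   # 0.5
# Zero-shot AUC near `chance` on a mismatched held-out graph would
# falsify the transferability claim; well above it would support it.
```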

Figures

Figures reproduced from arXiv: 2604.06391 by Lei Xing, Md. Tauhidul Islam, Sakib Mostafa.

Figure 1. Architecture of the Graph Foundation Model. a, Structural prompt construction. For each node, local context descriptors—degree, clustering coefficient, k-core number, ego-network statistics, and PageRank—are combined with global context indicators derived from two community detection algorithms, Label Propagation and SCoDA, each contributing community identity, community size, and intra-community density. … view at source ↗

Figure 2. The pretrained GFM encodes protein functional organization in SagePPI and generalizes with minimal supervision. a, Few-shot generalization curves. Mean ROC-AUC is shown as a function of the number of labeled proteins per GO process, evaluated on held-out test proteins using a logistic probe trained on K positive and K negative examples per label (K ∈ {1, 5, 10, 20}). Solid lines denote GFM variants; dashed… view at source ↗

Figure 3. The pretrained GFM generalizes across the ogbn-proteins benchmark and captures GO hierarchy structure. a, Macro-averaged ROC curves across all GO term labels for GFM zero-shot, GFM Trained, and four supervised GNN baselines (GCN, GraphSAGE, GAT, GIN). Solid lines denote GFM variants; dashed lines denote supervised baselines. The dotted diagonal denotes random performance. b, Summary classification performa… view at source ↗

Figure 4. The pretrained GFM generalizes across GO ontologies in the STRING interaction network. a, Mean ROC-AUC and accuracy for Molecular Function (MF) GO term prediction. All six models are compared under the same train/validation/test split on the StringGO network. b, Mean ROC-AUC and accuracy for Cellular Component (CC) GO term prediction under identical conditions. c, Macro-averaged ROC curves for MF. Solid li… view at source ↗

Figure 5. The pretrained GFM generalizes to unseen protein structural fold classes across 144 tissue-specific interaction networks. a, Classification performance at K = 20 labeled examples per fold class. Mean ROC-AUC, accuracy, and specificity are shown for all six models, evaluated on test fold classes that were entirely absent during training. Error bars denote one standard deviation across five evaluation seeds.… view at source ↗
Original abstract

Graphs are a central representation in biomedical research, capturing molecular interaction networks, gene regulatory circuits, cell–cell communication maps, and knowledge graphs. Despite their importance, currently there is not a broadly reusable foundation model available for graph analysis comparable to those that have transformed language and vision. Existing graph neural networks are typically trained on a single dataset and learn representations specific only to that graph's node features, topology, and label space, limiting their ability to transfer across domains. This lack of generalization is particularly problematic in biology and medicine, where networks vary substantially across cohorts, assays, and institutions. Here we introduce a graph foundation model designed to learn transferable structural representations that are not specific to particular node identities or feature schemes. Our approach leverages feature-agnostic graph properties, including degree statistics, centrality measures, community structure indicators, and diffusion-based signatures, and encodes them as structural prompts. These prompts are integrated with a message-passing backbone to embed diverse graphs into a shared representation space. The model is pretrained once on heterogeneous graphs and subsequently reused on unseen datasets with minimal adaptation. Across multiple benchmarks, our pretrained model matches or exceeds strong supervised baselines while demonstrating superior zero-shot and few-shot generalization on held-out graphs. On the SagePPI benchmark, supervised fine-tuning of the pretrained backbone achieves a mean ROC-AUC of 95.5%, a gain of 21.8% over the best supervised message-passing baseline. The proposed technique thus provides a unique approach toward reusable, foundation-scale models for graph-structured data in biomedical and network science applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a graph foundation model for biomedical and network data that encodes feature-agnostic structural properties (degree statistics, centrality, community structure, diffusion signatures) as prompts to a message-passing backbone. The model is pretrained on heterogeneous graphs and then adapted to new datasets, claiming to achieve strong performance and generalization, with a specific result of 95.5% mean ROC-AUC on the SagePPI benchmark after supervised fine-tuning, representing a 21.8% improvement over the best supervised message-passing baseline.

Significance. If the experimental claims are substantiated with full details and ablations, the work would address a genuine gap by moving toward reusable foundation models for graphs in domains where data heterogeneity across cohorts and assays limits current GNNs. The approach of using only structural prompts for transfer is a distinctive angle that, if shown to work without relying on node features, could enable broader reuse than dataset-specific models.

major comments (2)
  1. [Abstract] The reported performance numbers (95.5% ROC-AUC and 21.8% gain on SagePPI) and zero/few-shot generalization claims are presented without any experimental details on data splits, baseline definitions, hyperparameter choices, or ablation studies. This absence is load-bearing because the universality claim cannot be evaluated without knowing whether the comparison baselines incorporated node features while the proposed model used only structural prompts.
  2. [Results section (SagePPI evaluation)] The central claim that feature-agnostic structural prompts produce transferable representations across graphs differing in node features, topology, and label spaces is undermined by the lack of an ablation showing performance of a standard message-passing baseline when node features are also withheld, or confirmation that the backbone processes node features during fine-tuning. Without this, the 21.8% gain may reflect an unfair comparison rather than a foundation-model advantage.
minor comments (1)
  1. [Method] The integration of structural prompts with the message-passing backbone is described at a high level in the abstract but lacks a precise algorithmic description or pseudocode that would allow reproduction of how prompts are encoded and fused.
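The abstract indeed leaves the prompt–backbone fusion underspecified. One plausible reading, sketched here purely as an assumption (the combine rule and toy graph are not from the paper), treats the structural prompt as the node's input vector and updates it by neighbourhood mean aggregation:

```python
# Hypothetical fusion: structural prompts replace node features as the
# backbone's input, then one message-passing round averages each
# node's state with its neighbours' mean. The combine rule is an
# illustrative assumption, not the paper's algorithm.

def message_passing_round(adj, h):
    """One round: each node averages its state with its neighbours' mean."""
    out = {}
    for v, nbrs in adj.items():
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs)
                   for i in range(len(h[v]))]
        else:
            agg = h[v]  # isolated node keeps its own state
        out[v] = [(a + b) / 2.0 for a, b in zip(h[v], agg)]
    return out

# Structural prompts (e.g. [degree, clustering]) as initial states
# on a 3-node path graph 0 - 1 - 2.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
h0 = {0: [1.0, 0.0], 1: [2.0, 0.0], 2: [1.0, 0.0]}
h1 = message_passing_round(adj, h0)
# h1[1][0] == (2 + (1 + 1) / 2) / 2 == 1.5
```

A precise published version of this step, with the actual aggregation and any learned weights, is what the minor comment asks for.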

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of our experimental reporting. We agree that more details are needed to fully support the claims and will revise the manuscript to include them. Our responses to the major comments are as follows.

Point-by-point responses
  1. Referee: [Abstract] The reported performance numbers (95.5% ROC-AUC and 21.8% gain on SagePPI) and zero/few-shot generalization claims are presented without any experimental details on data splits, baseline definitions, hyperparameter choices, or ablation studies. This absence is load-bearing because the universality claim cannot be evaluated without knowing whether the comparison baselines incorporated node features while the proposed model used only structural prompts.

    Authors: We acknowledge the validity of this observation. The current manuscript version presents the key results in the abstract without accompanying details, which limits the ability to assess the claims. In the revised manuscript, we will add a dedicated experimental details subsection in the methods and expand the results section to specify data splits (e.g., train/validation/test ratios and how held-out graphs are selected), precise definitions of baselines (including their use of node features), hyperparameter search procedures, and additional ablation studies. We will explicitly state that our model relies solely on structural prompts and is feature-agnostic by design, whereas standard baselines utilize available node features where present. This will substantiate the universality and transfer claims. revision: yes

  2. Referee: [Results section (SagePPI evaluation)] The central claim that feature-agnostic structural prompts produce transferable representations across graphs differing in node features, topology, and label spaces is undermined by the lack of an ablation showing performance of a standard message-passing baseline when node features are also withheld, or confirmation that the backbone processes node features during fine-tuning. Without this, the 21.8% gain may reflect an unfair comparison rather than a foundation-model advantage.

    Authors: This is a fair critique. To strengthen the evidence, we will include in the revised results section an ablation experiment comparing our pretrained model against a standard message-passing GNN baseline trained without node features (using only structural prompts or constant features). We will also add text confirming that during both pretraining and fine-tuning, the message-passing backbone receives the structural prompts as node inputs rather than original node features. While the reported gain demonstrates the value of pretraining on diverse structural properties for transfer to new biomedical graphs, we agree that this additional control is necessary to rule out comparison artifacts and will report the results of the ablation. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on empirical pretraining and held-out evaluation

full rationale

The paper presents a descriptive architecture for pretraining a message-passing backbone on heterogeneous graphs using feature-agnostic structural prompts (degree statistics, centrality, community structure, diffusion signatures). Performance numbers such as the 95.5% ROC-AUC on SagePPI are reported as outcomes of supervised fine-tuning and zero/few-shot testing on unseen datasets, not as quantities derived by construction from fitted parameters or self-referential definitions. No equations appear that equate a prediction to its own input, no uniqueness theorem is invoked via self-citation, and no ansatz is smuggled through prior work. The derivation chain is therefore self-contained against external benchmarks rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the domain assumption that standard graph-theoretic properties suffice for transferable representations; no free parameters are explicitly quantified in the abstract.

axioms (1)
  • Domain assumption: feature-agnostic structural properties such as degree, centrality, and diffusion signatures capture sufficient information for cross-graph transfer.
    Invoked in the design of the structural prompts and in the claim of universality.
invented entities (1)
  • Structural prompts (no independent evidence)
    purpose: to encode graph properties for integration with the message-passing backbone
    A new encoding mechanism introduced to achieve feature-agnostic representations.

pith-pipeline@v0.9.0 · 5579 in / 1484 out tokens · 52357 ms · 2026-05-10T18:27:34.038399+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 8 canonical work pages · 3 internal anchors
