GraphVec: Cross-Domain Graph Vectorization for Graph-Level Representation Learning

Jicong Fan; Qi Feng

arxiv: 2602.04244 · v2 · submitted 2026-02-04 · 💻 cs.LG


Pith reviewed 2026-05-16 07:23 UTC · model grok-4.3


keywords: cross-domain graph learning · graph vectorization · spectral embeddings · few-shot graph classification · graph clustering · transfer learning · GIN transformer


The pith

GraphVec turns graphs from unrelated domains into fixed vectors by aligning spectral features from multi-scale global graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to learn graph representations that transfer across domains even when graphs differ in structure, node attribute types, feature dimensions, and availability. It avoids raw node attributes by building multi-scale global graphs over every node in a dataset, extracts spectral embeddings to capture domain-agnostic relations, and aligns those embeddings across datasets with a density-maximization mean alignment procedure that is proven to converge monotonically. The resulting vectors feed a GIN-Graph Transformer backbone augmented by a multi-layer reference distribution module that retains node-level distributional detail beyond ordinary pooling. A generalization bound is supplied for the full model. If the approach holds, pre-trained graph models can move knowledge between heterogeneous datasets such as molecules and citation networks without retraining from scratch.
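To make the remark about pooling concrete, here is a minimal sketch. It is not the paper's module: it assumes a Gaussian kernel and a handful of fixed reference points, and the names `mean_pool`, `reference_features`, `refs`, and `gamma` are hypothetical. The paper describes a multi-layer module built on reference distributions, whose details are not reproduced here. The sketch only shows why similarity to references can keep distributional information that mean pooling discards.

```python
import numpy as np

def mean_pool(H):
    """Ordinary mean pooling: one d-dimensional vector per graph."""
    return H.mean(axis=0)

def reference_features(H, refs, gamma=1.0):
    """Average Gaussian-kernel similarity of a graph's nodes to each reference.

    H:    (n, d) node embeddings of one graph
    refs: (R, d) reference points (an illustrative stand-in for references)
    Returns an R-dimensional vector.
    """
    d2 = ((H[:, None, :] - refs[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2).mean(axis=0)

# Two node sets with the same mean but very different spread.
H_tight = np.array([[0.1, 0.0], [-0.1, 0.0]])
H_wide = np.array([[2.0, 0.0], [-2.0, 0.0]])
refs = np.array([[0.0, 0.0], [2.0, 0.0]])

print(mean_pool(H_tight), mean_pool(H_wide))  # identical means
print(reference_features(H_tight, refs))      # mass near the origin
print(reference_features(H_wide, refs))       # mass near (2, 0)
```

Mean pooling maps both node sets to the same vector, while the reference features differ, which is the kind of node-level distributional detail the summary refers to.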

Core claim

GraphVec constructs multi-scale global graphs over all nodes in each dataset and extracts spectral embeddings to obtain domain-agnostic relational features. A density-maximization mean alignment algorithm over orthogonal transformations is introduced and shown to converge monotonically, rendering the spectral features comparable across domains. The model combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module that preserves node-level distributional information beyond standard pooling, and a generalization error bound is derived for the resulting architecture.
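The vectorization step can be illustrated with a rough sketch. The summary does not say how the global graphs are built, so everything specific here is an assumption: Gaussian-kernel affinity graphs at several bandwidths stand in for "multi-scale", and `spectral_embedding`, `multiscale_embedding`, `gammas`, and `k` are hypothetical names. What the sketch does show is the property the claim relies on: datasets with different feature dimensions end up with node embeddings of the same width.

```python
import numpy as np

def spectral_embedding(A, k):
    """Top-k eigenvectors of the symmetrically normalized affinity D^-1/2 A D^-1/2."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    return vecs[:, -k:][:, ::-1]       # largest eigenvalues first

def multiscale_embedding(X, gammas, k):
    """Concatenate spectral embeddings of Gaussian affinity graphs, one per scale."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    parts = [spectral_embedding(np.exp(-g * d2), k) for g in gammas]
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 5))    # dataset 1: 30 nodes, 5 raw features
X2 = rng.normal(size=(50, 12))   # dataset 2: 50 nodes, 12 raw features

E1 = multiscale_embedding(X1, gammas=(0.01, 0.1), k=3)
E2 = multiscale_embedding(X2, gammas=(0.01, 0.1), k=3)
print(E1.shape, E2.shape)        # both have width k * len(gammas)
```

Eigenvectors are only defined up to sign and rotation within eigenspaces, which is one reason embeddings computed separately per dataset are not directly comparable and an alignment step is needed.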

What carries the argument

Density-maximization mean alignment over orthogonal transformations applied to spectral embeddings extracted from multi-scale global graphs, which produces comparable fixed-dimensional vectors while discarding original node-attribute semantics.
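As an illustration of how a density objective over orthogonal transformations can be increased monotonically, the following sketch uses a generic minorize-maximize scheme. It is a swapped-in technique, not the paper's algorithm: it aligns one point set to another rather than several datasets to a mean, and `density`, `align`, and `gamma` are hypothetical names. Each step linearizes the Gaussian-kernel objective at the current rotation (a valid lower bound because the exponential is convex) and maximizes that bound in closed form with an SVD, so the objective never decreases.

```python
import numpy as np

def density(X, Y, R, gamma):
    """Sum of Gaussian-kernel similarities between rotated X and Y."""
    Z = X @ R
    d2 = ((Z[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2).sum()

def align(X, Y, gamma=0.5, iters=30):
    """Minorize-maximize updates of an orthogonal R that increase density(X @ R, Y)."""
    R = np.eye(X.shape[1])
    history = [density(X, Y, R, gamma)]
    for _ in range(iters):
        Z = X @ R
        d2 = ((Z[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-gamma * d2)            # kernel weights at the current R
        M = X.T @ W @ Y                    # the linear bound is <R, M>
        U, _, Vt = np.linalg.svd(M)
        R = U @ Vt                         # orthogonal maximizer of <R, M>
        history.append(density(X, Y, R, gamma))
    return R, history

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random orthogonal matrix
Y = X @ Q_true + 0.05 * rng.normal(size=(40, 3))     # rotated, noisy copy of X

R, history = align(X, Y)
print(history[0], history[-1])   # objective before and after alignment
```

Monotone increase alone says nothing about reaching a global optimum or about preserving task-relevant structure, which is the gap the referee report presses on.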

Load-bearing premise

Spectral embeddings from the multi-scale global graphs remain informative enough that the alignment step can produce truly comparable features even when node attributes differ completely in semantics and availability across domains.

What would settle it

An experiment that removes or randomizes all node attributes and shows that cross-domain few-shot classification accuracy falls to the level of random guessing or unaligned baselines would falsify the claim that the spectral features plus alignment suffice for transfer.

Figures

Figures reproduced from arXiv: 2602.04244 by Jicong Fan, Qi Feng.

**Figure 2.** The change of classification accuracy in ENZYMES when the number of datasets used in pre-training ...

**Figure 3.** Classification accuracy trends of our method GraphVec-FM with varying ...

**Figure 4.** t-SNE visualization of aligned node embeddings of datasets from different domains.

**Figure 5.** The few-shot graph classification accuracy in datasets with node attributes when the number of global ...

**Figure 6.** The few-shot graph classification accuracy in datasets without node attributes when the number of global ...

Original abstract

Learning universal graph representations across heterogeneous domains is difficult because graph datasets differ in topology, node-attribute semantics, feature dimensions, and even attribute availability. We propose GraphVec, a language-model-free graph vectorization model that maps diverse graphs into transferable fixed-dimensional embeddings for graph-level tasks. Instead of directly using incomparable raw node attributes, GraphVec constructs multi-scale global graphs over all nodes in each dataset and extracts spectral embeddings to obtain domain-agnostic relational features. To make these spectral features comparable across datasets, we introduce a density-maximization mean alignment algorithm over orthogonal transformations and prove its monotonic convergence. GraphVec further combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module, which preserves node-level distributional information beyond standard pooling. We also provide a generalization error bound for the proposed model. Experiments on 13 datasets with more than 15 comparison methods demonstrate that GraphVec consistently outperforms strong graph pretraining baselines in cross-domain few-shot graph classification and graph clustering. Beyond graph-level tasks, GraphVec also yields strong node-level representations, achieving competitive performance on few-shot node classification against representative graph prompt learning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GraphVec offers a spectral alignment pipeline for cross-domain graph embeddings that avoids language models, but the gains may hinge on unverified assumptions about preserved signal after alignment.


The main takeaway is that GraphVec builds multi-scale global graphs, extracts spectral embeddings to sidestep raw node attributes, aligns them across domains via density maximization over orthogonal transforms with a claimed monotonic convergence proof, and layers on a GIN-Graph Transformer plus a reference distribution module. It also states a generalization bound and tests the full pipeline on 13 datasets against more than 15 baselines for cross-domain few-shot classification and clustering.

Referee Report

3 major / 2 minor

Summary. The paper proposes GraphVec, a language-model-free graph vectorization model that constructs multi-scale global graphs per dataset to extract spectral embeddings as domain-agnostic relational features, applies a density-maximization mean alignment over orthogonal transformations (with a claimed monotonic convergence proof) to enable cross-domain comparability, and combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module. It provides a generalization error bound and reports that the resulting fixed-dimensional embeddings outperform strong graph pretraining baselines in cross-domain few-shot graph classification and clustering across 13 datasets, while also yielding competitive node-level representations.

Significance. If the spectral embeddings and alignment procedure reliably preserve task-relevant signal across domains with incompatible node-attribute semantics and availability, the approach could advance universal, transferable graph representations without relying on language models or raw feature alignment. The provision of a convergence proof and generalization bound strengthens the theoretical contribution relative to purely empirical cross-domain methods.

major comments (3)

[Abstract / Method] Abstract and method description: The generalization error bound does not address potential information loss from the density-maximization alignment step when original node attributes differ in semantics or are absent; this is load-bearing for the cross-domain transfer claim, as the bound appears to treat the aligned spectral features as given without bounding the alignment-induced distortion.
[Alignment Algorithm] Alignment procedure: The monotonic convergence proof for the density-maximization mean alignment over orthogonal transformations is presented as guaranteeing comparability, but it is unclear whether the procedure preserves downstream-task-relevant structure (as opposed to merely increasing density) when input graphs have heterogeneous topologies and attribute distributions; this assumption underpins the claim that gains come from the vectorization rather than the GIN-Graph Transformer backbone.
[Experiments] Experiments: The reported outperformance on 13 datasets against >15 baselines requires explicit ablation isolating the contribution of the spectral vectorization and alignment from the reference distribution module and backbone; without such controls, it remains possible that post-hoc dataset or hyperparameter choices drive the gains rather than the proposed components.

minor comments (2)

[Method] Clarify the precise construction of the multi-scale global graphs (e.g., how scales are chosen and how the graph is built from all nodes) to ensure reproducibility.
[Experiments] The abstract mentions 'more than 15 comparison methods' but the experimental section should list them explicitly with citations and note which are graph pretraining baselines versus others.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below. Where the comments identify areas for clarification or additional controls, we will revise the manuscript accordingly. Our responses focus on strengthening the theoretical grounding and experimental rigor without altering the core claims.


Referee: [Abstract / Method] Abstract and method description: The generalization error bound does not address potential information loss from the density-maximization alignment step when original node attributes differ in semantics or are absent; this is load-bearing for the cross-domain transfer claim, as the bound appears to treat the aligned spectral features as given without bounding the alignment-induced distortion.

Authors: We agree that the generalization bound is stated for the model after alignment and does not explicitly bound distortion introduced by the alignment step itself. The bound focuses on the downstream GIN-Graph Transformer with reference distribution module operating on the aligned embeddings. Because the alignment uses orthogonal transformations (isometries), intra-domain distances and norms are preserved, but cross-domain semantic shifts are not theoretically quantified in the current bound. In the revision we will add a dedicated paragraph in the theoretical section acknowledging this limitation and framing it as an avenue for future work on distortion-aware bounds. The empirical cross-domain results remain the primary support for transferability. revision: partial
Referee: [Alignment Algorithm] Alignment procedure: The monotonic convergence proof for the density-maximization mean alignment over orthogonal transformations is presented as guaranteeing comparability, but it is unclear whether the procedure preserves downstream-task-relevant structure (as opposed to merely increasing density) when input graphs have heterogeneous topologies and attribute distributions; this assumption underpins the claim that gains come from the vectorization rather than the GIN-Graph Transformer backbone.

Authors: The convergence proof establishes monotonic increase in the density objective under orthogonal transformations, which are distance-preserving isometries; thus relative structure within each domain's spectral embedding is retained while the means are aligned to maximize overlap. We will augment the alignment section with a short analysis (including a controlled synthetic experiment) demonstrating that task-relevant metrics such as class separability are not degraded post-alignment. To isolate the vectorization contribution from the backbone, we will also expand the existing ablation tables to report performance with and without the alignment step while keeping the backbone fixed. revision: partial
Referee: [Experiments] Experiments: The reported outperformance on 13 datasets against >15 baselines requires explicit ablation isolating the contribution of the spectral vectorization and alignment from the reference distribution module and backbone; without such controls, it remains possible that post-hoc dataset or hyperparameter choices drive the gains rather than the proposed components.

Authors: We accept that the current ablation presentation can be made more explicit. The manuscript already contains component-wise ablations, but we will reorganize them into a single dedicated subsection that systematically removes (i) the spectral vectorization, (ii) the alignment procedure, and (iii) the reference distribution module while holding the GIN-Graph Transformer backbone constant. Additional controls for hyperparameter sensitivity across the 13 datasets will be included. These revisions will be presented with the same evaluation protocol to directly address the concern. revision: yes
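The isometry argument used in the first two responses is easy to check numerically. The sketch below is an illustration only (the helper `pairwise` and the random data are assumptions, not taken from the paper): multiplying one domain's embeddings by an orthogonal matrix leaves that domain's pairwise distances and norms unchanged, while distances to a second, untransformed domain do change.

```python
import numpy as np

def pairwise(A, B=None):
    """Euclidean distance matrix between rows of A and rows of B (default: A)."""
    B = A if B is None else B
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))                    # embeddings of one domain
Y = rng.normal(size=(15, 4))                    # embeddings of another domain
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # random orthogonal matrix

intra_same = np.allclose(pairwise(X), pairwise(X @ Q))
norms_same = np.allclose(np.linalg.norm(X, axis=1), np.linalg.norm(X @ Q, axis=1))
cross_same = np.allclose(pairwise(X, Y), pairwise(X @ Q, Y))

print(intra_same, norms_same, cross_same)
```

This supports the authors' point that within-domain geometry survives alignment, and also the referee's point that the cross-domain relationship is exactly what the transformation changes and what the bound does not quantify.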

Circularity Check

0 steps flagged

Derivation chain self-contained; no reductions to fitted inputs or self-citations


The paper defines GraphVec via explicit algorithmic steps: multi-scale global graph construction followed by spectral embedding extraction, a separately proven density-maximization alignment over orthogonal transformations, a GIN-Graph Transformer backbone augmented by a reference distribution module, and an independent generalization error bound. None of these steps are shown to reduce by construction to parameters fitted on the evaluation data or to prior self-citations that carry the uniqueness claim. The experimental results are presented as downstream validation rather than as the definitional basis for the method itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the assumption that spectral embeddings from multi-scale global graphs capture domain-agnostic relational structure and that the alignment procedure preserves enough information for downstream few-shot tasks; no free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: Spectral embeddings of multi-scale global graphs yield domain-agnostic relational features.
Invoked when the method replaces raw node attributes with these embeddings to achieve cross-domain comparability.
standard math: The density-maximization mean alignment over orthogonal transformations converges monotonically.
Stated as a proved property of the alignment algorithm.

pith-pipeline@v0.9.0 · 5492 in / 1466 out tokens · 77956 ms · 2026-05-16T07:23:06.594013+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: constructs multi-scale global graphs ... extracts spectral embeddings ... density-maximization mean alignment algorithm over orthogonal transformations and prove its monotonic convergence

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: Theorem 4.1 ... $L(\{R_j^{(t)}\})$ is non-decreasing ... $R_j^{(t)} - R_j^{(t-1)} \to 0$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

A family of tractable graph distances

Jose Bento and Stratis Ioannidis. A family of tractable graph distances. InProceedings of the 2018 SIAM International Conference on Data Mining, pages 333–341. SIAM,

work page 2018

[2] [2]

Towards foundation models for knowledge graph reasoning.arXiv preprint arXiv:2310.04562,

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, and Zhaocheng Zhu. Towards foundation models for knowledge graph reasoning.arXiv preprint arXiv:2310.04562,

work page arXiv

[3] [3]

You only transfer what you share: Intersection-induced graph transfer learning for link prediction.arXiv preprint arXiv:2302.14189,

Wenqing Zheng, Edward W Huang, Nikhil Rao, Zhangyang Wang, and Karthik Subbian. You only transfer what you share: Intersection-induced graph transfer learning for link prediction.arXiv preprint arXiv:2302.14189,

work page arXiv

[4] [4]

Zhang, X

Duo Zhang, Xinzijian Liu, Xiangyu Zhang, Chengqian Zhang, Chun Cai, Hangrui Bi, Yiming Du, Xuejian Qin, Jiameng Huang, Bowen Li, et al. Dpa-2: Towards a universal large atomic model for molecular and material simulation.arXiv preprint arXiv:2312.15492,

work page arXiv

[5] [5]

Zero-shot logical query reasoning on any knowledge graph.arXiv preprint arXiv:2404.07198,

Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, and Zhaocheng Zhu. Zero-shot logical query reasoning on any knowledge graph.arXiv preprint arXiv:2404.07198,

work page arXiv

[6] [6]

Graphany: A foundation model for node classification on any graph.arXiv preprint arXiv:2405.20445, 2024a

Jianan Zhao, Hesham Mostafa, Michael Galkin, Michael Bronstein, Zhaocheng Zhu, and Jian Tang. Graphany: A foundation model for node classification on any graph.arXiv preprint arXiv:2405.20445, 2024a. Divyansha Lachi, Mehdi Azabou, Vinam Arora, and Eva Dyer. Graphfm: A scalable framework for multi-graph pretraining.arXiv preprint arXiv:2407.11907,

work page arXiv

[7] [7]

Cross- domain graph data scaling: A showcase with diffusion models.arXiv preprint arXiv:2406.01899, 2024a

Wenzhuo Tang, Haitao Mao, Danial Dervovic, Ivan Brugere, Saumitra Mishra, Yuying Xie, and Jiliang Tang. Cross- domain graph data scaling: A showcase with diffusion models.arXiv preprint arXiv:2406.01899, 2024a. Zhilin Yang, William Cohen, and Ruslan Salakhudinov. Revisiting semi-supervised learning with graph embeddings. InInternational conference on mach...

work page arXiv

[8] [8]

Fatemi, J

Bahare Fatemi, Jonathan Halcrow, and Bryan Perozzi. Talk like a graph: Encoding graphs for large language models. arXiv preprint arXiv:2310.04560,

work page arXiv

[9] [9]

One for all: Towards training one graph model for all classification tasks.arXiv preprint arXiv:2310.00149, 2023

Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, and Muhan Zhang. One for all: Towards training one graph model for all classification tasks.arXiv preprint arXiv:2310.00149, 2023a. Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. Graphgpt: Graph instruction tuning for large language mod...

work page arXiv

[10] [10]

Samgpt: Text-free graph foundation model for multi-domain pre-training and cross-domain adaptation

Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, and Hui Zhang. Samgpt: Text-free graph foundation model for multi-domain pre-training and cross-domain adaptation. InProceedings of the ACM on Web Conference 2025, pages 1142–1153, 2025a. Xingbo Fu, Yinhan He, and Jundong Li. Edge prompt tuning for graph neural networks. InThe Thirteenth International Conf...

work page 2025

[11] [11]

Xingtong Yu, Chang Zhou, Zhongwei Kuai, Xinming Zhang, and Yuan Fang

URLhttps://openreview.net/forum?id=92vMaHotTM. Xingtong Yu, Chang Zhou, Zhongwei Kuai, Xinming Zhang, and Yuan Fang. Gcot: Chain-of-thought prompt learning for graphs.arXiv preprint arXiv:2502.08092, 2025b. Shuo Wang, Bokui Wang, Zhixiang Shen, Boyan Deng, and Zhao Kang. Multi-domain graph foundation models: Robust knowledge transfer via topology alignmen...


[12]

Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000,

Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, and Jian Tang. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv preprint arXiv:1908.01000,


[13]

Simgrace: A simple framework for graph contrastive learning without data augmentation

Jun Xia, Lirong Wu, Jintao Chen, Bozhen Hu, and Stan Z Li. Simgrace: A simple framework for graph contrastive learning without data augmentation. In Proceedings of the ACM web conference 2022, pages 1070–1079,


[14]

Graphmae2: A decoding-enhanced masked self-supervised graph learner

Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, and Jie Tang. Graphmae2: A decoding-enhanced masked self-supervised graph learner. In Proceedings of the ACM web conference 2023, pages 737–746,


[15]

Graph pooling for graph neural networks: Progress, challenges, and opportunities. arXiv preprint arXiv:2204.07321,

Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, and Jihong Guan. All in one: Multi-task prompting for graph neural networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2120–2131, 2023b. Zemin Liu, Xingtong Yu, Yuan Fang, and Xinming Zhang. Graphprompt: Unifying pre-training and downstream tasks for graph neural...


[16]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748,


[17]

Gft: Graph foundation model with transferable tree vocabulary. Advances in Neural Information Processing Systems, 37:107403–107443, 2024b

Zehong Wang, Zheyuan Zhang, Nitesh Chawla, Chuxu Zhang, and Yanfang Ye. Gft: Graph foundation model with transferable tree vocabulary. Advances in Neural Information Processing Systems, 37:107403–107443, 2024b. Li Sun, Zhenhao Huang, Suyang Zhou, Qiqi Wan, Hao Peng, and Philip Yu. Riemanngfm: Learning a graph foundation model fro...


[18]

Tudataset: A collection of benchmark datasets for learning with graphs

Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. Tudataset: A collection of benchmark datasets for learning with graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020),


[19]

Graver: Generative graph vocabularies for robust graph foundation models fine-tuning. arXiv preprint arXiv:2511.05592, 2025b

Haonan Yuan, Qingyun Sun, Junhua Shi, Xingcheng Fu, Bryan Hooi, Jianxin Li, and Philip S Yu. Graver: Generative graph vocabularies for robust graph foundation models fine-tuning. arXiv preprint arXiv:2511.05592, 2025b. Xingtong Yu, Chang Zhou, Yuan Fang, and Xinming Zhang. Multigprompt for multi-task pre-training and prompting on graphs. In Proceedings of t...


[20]

pre-train and adaptation

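A Notations

Symbol          Meaning
$x$             a real number
$\mathbf{x}$    a vector
$\mathbf{X}$    a matrix
$\mathbf{I}_n$  identity matrix of size $n \times n$
$G$             a graph
$\mathbf{g}$    vector representation of $G$
$\mathcal{G}$   a set of graphs
$\mathcal{D}$   a dataset
$\|\mathbf{x}\|$      the Euclidean norm of $\mathbf{x}$
$\|\mathbf{x}\|_1$    the $\ell_1$ norm of $\mathbf{x}$
$[M]$           the set $\{1, 2, \ldots, M\}$
$\mathbf{X} \| \mathbf{Y}$ or $[\mathbf{X}, \mathbf{Y}]$   vertical concatenation
$\|\mathbf{X}\|_F$    Frobenius norm of matrix
$\|\mathbf{X}\|_2$    spectra...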


[21]

These graph-level models are usually small and not general

adopts a meta-learning approach to learn model initialization for few-shot graph classification. These graph-level models are usually small and not general. Some GFMs can be adapted to graph-level tasks. For instance, GraphPrompt [Liu et al., 2023b], GraphPrompt+ [Yu et al., 2024b], and EdgePrompt [Fu et al., 2025] use learnable...


[22]

The output of the layer is $S$, for which we have $\|S-S'\|_2 = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{R}|s_{ij}-s'_{ij}|^2} \le 4\sqrt{\frac{\theta}{n}}\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{R}\|H_i-H'_i\|_F^2} = 4\sqrt{\frac{\theta R}{n}}\,\|H-H'\|_F$. This finishes the proof

≤ |x−y| for any x, y ≥ 0, (b) holds due to the triangle inequality, and (c) holds by the Cauchy–Schwarz inequality. The output of the layer is $S$, for which we have $\|S-S'\|_2 = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{R}|s_{ij}-s'_{ij}|^2} \le 4\sqrt{\frac{\theta}{n}}\sqrt{\sum_{i=1}^{N}\sum_{j=1}^{R}\|H_i-H'_i\|_F^2} = 4\sqrt{\frac{\theta R}{n}}\,\|H-H'\|_F$. This finishes the proof. Lemma E.5. Suppose the GIN f has Q layers and each layer has an...
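The last equality in the bound on ∥S−S′∥ above can be spelled out in one step. The sketch below assumes that H stacks the per-graph blocks H_1, ..., H_N, so that its squared Frobenius norm is the sum of the blocks' squared Frobenius norms; that stacking is inferred from the notation and is not stated in this excerpt.

```latex
% The summand does not depend on j, so the inner sum contributes a factor R.
\sum_{i=1}^{N}\sum_{j=1}^{R}\|H_i-H_i'\|_F^2
  \;=\; R\sum_{i=1}^{N}\|H_i-H_i'\|_F^2
  \;=\; R\,\|H-H'\|_F^2
\quad\Longrightarrow\quad
4\sqrt{\frac{\theta}{n}}\,\sqrt{R\,\|H-H'\|_F^2}
  \;=\; 4\sqrt{\frac{\theta R}{n}}\;\|H-H'\|_F .
```

Read this way, the dependence on R enters only through the factor $\sqrt{R}$.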


[23]

Moreover, LF is not very sensitive to γ, which is learned adaptively

• Since LF scales with O(√γR), we could use a relatively large R to enrich the final vector representation for each graph, thereby improving the expressiveness. Moreover, LF is not very sensitive to γ, which is learned adaptively.

Table 6: Dataset Statistics.
Dataset Domain #Graphs #Avg.Nodes #Features #Classes Task
ENZYMES Bioi...


[24]

The Gaussian kernel parameter γ in the reference layer uses a separate learning rate α2 = 0.1

In the pre-training stage, all modules are optimized with the Adam optimizer [Kinga et al., 2015] using a fixed learning rate α1 = 0.0005 and a weight decay factor of 10^{-5}, and are trained for 50 epochs. The Gaussian kernel parameter γ in the reference layer uses a separate learning rate α2 = 0.1. The batch size for all datasets is fixed to


[25]

Regarding data splitting, we randomly choose 50 graphs in each class for training, and the remaining samples are used for testing

Few-shot learning settings. In the downstream tasks of few-shot graph classification, the classifier is a softmax classifier, which follows the setting in EdgePrompt [Fu et al., 2025]. Regarding data splitting, we randomly choose 50 graphs in each class for training, and the remaining samples are used for testing. The number of epochs is set to 500, and the...
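The per-class split described above (50 random graphs per class for training, the rest for testing) could be sketched as follows. This is only an illustration, not the authors' code: the function name `few_shot_split`, the `seed` argument, and the choice to raise an error when a class has too few graphs are all assumptions made here.

```python
import random
from collections import defaultdict


def few_shot_split(labels, shots_per_class=50, seed=0):
    """Split graph indices into (train, test) with a fixed number of
    randomly chosen training graphs per class.

    `labels[i]` is the class label of graph i. Hypothetical helper,
    not taken from the paper's code.
    """
    rng = random.Random(seed)  # local RNG so the split is reproducible

    # Group graph indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < shots_per_class:
            # Assumption: refuse to split rather than silently under-sample.
            raise ValueError(
                f"class {label!r} has only {len(members)} graphs, "
                f"fewer than {shots_per_class}"
            )
        train_idx.extend(rng.sample(members, shots_per_class))

    # Every graph not selected for training is used for testing.
    train_set = set(train_idx)
    test_idx = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train_idx), test_idx


if __name__ == "__main__":
    toy_labels = [0] * 80 + [1] * 70  # toy dataset with two classes
    train, test = few_shot_split(toy_labels, shots_per_class=50, seed=1)
    print(len(train), len(test))
```

Repeating the split with several seeds and averaging the test accuracy is a common way to reduce the variance of such a small training set; whether the paper does this is not stated in this excerpt.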


[26]

H.4 The Impact of Nyström Approximation. To address the computational complexity associated with large-scale graphs, we employ the Nyström approximation during pre-training

This adaptation demonstrates that GraphVec-FM can also be effectively extended to node-level tasks.

H.4 The Impact of Nyström Approximation

To address the computational complexity associated with large-scale graphs, we employ the Nyström approximation during pre-training. To systematically evaluate its impact, we pre-train GraphVec-FM using the Nyström me...
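For orientation, the textbook form of the Nyström approximation is sketched below. It is not necessarily the exact variant used in GraphVec-FM, which this excerpt does not show. All symbols are assumptions introduced here for illustration: $K$ is an $n \times n$ positive semidefinite kernel (or affinity) matrix over all nodes, $\mathcal{I}$ is a set of $m \ll n$ sampled landmark nodes, and $C$, $W$ are the corresponding sub-blocks.

```latex
% Low-rank approximation from m sampled columns of K.
K \;\approx\; \tilde{K} \;=\; C\,W^{\dagger}\,C^{\top},
\qquad
C = K_{[:,\,\mathcal{I}]} \in \mathbb{R}^{n\times m},
\quad
W = K_{[\mathcal{I},\,\mathcal{I}]} \in \mathbb{R}^{m\times m}.

% Approximate leading eigenpairs of K from the small eigenproblem W = U \Lambda U^{\top}:
\tilde{\Lambda} = \frac{n}{m}\,\Lambda,
\qquad
\tilde{U} = \sqrt{\frac{m}{n}}\; C\,U\,\Lambda^{-1}.
```

Only the $m \times m$ block is eigendecomposed, so the cost drops from $O(n^3)$ for the full matrix to roughly $O(nm^2 + m^3)$. That saving is the usual reason for applying the approximation to spectral embeddings of large graphs.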


[27]

Downstream performance improves as more datasets are used in pre-training. This result indicates that the generalization ability of graph embeddings generated by our GraphVec-FM can benefit

Table 8: 50-shot graph classification performance comparison with different pre-trained models. We color t...


[28]

The comparison numbers are from AMGC [Yang et al., 2025]

We also provide the full version of Table 2 with NMI in Table 13.

Table 12: Graph clustering results on PTC-MM, MUTAG, COX2 and BZR

Dataset     PTC-MM                              MUTAG                               COX2                                BZR
            ACC         NMI         ARI         ACC         NMI         ARI         ACC         NMI         ARI         ACC         NMI         ARI
GraphCL+SC  62.09±0.56  2.14±0.43   3.36±0.87   73.22±2.66  32.19±2.05  23.44±2.45  75.01±2.12  1.24±0.37   2.39±2.28   72.88±1.66  1.90±0.38   3.47±0.59
GWF [Xu et al., 2...
