GraphVec: Cross-Domain Graph Vectorization for Graph-Level Representation Learning
Pith reviewed 2026-05-16 07:23 UTC · model grok-4.3
The pith
GraphVec turns graphs from unrelated domains into fixed vectors by aligning spectral features from multi-scale global graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GraphVec constructs multi-scale global graphs over all nodes in each dataset and extracts spectral embeddings to obtain domain-agnostic relational features. A density-maximization mean alignment algorithm over orthogonal transformations is introduced and shown to converge monotonically, rendering the spectral features comparable across domains. The model combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module that preserves node-level distributional information beyond standard pooling, and a generalization error bound is derived for the resulting architecture.
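To make the notion of a spectral embedding concrete, here is a minimal Python sketch on a toy graph. It is not the paper's multi-scale global graph construction, whose details this page does not give. The toy graph, the helper names `laplacian` and `fiedler_vector`, and the choice of power iteration are all assumptions made for illustration. The sketch computes a single coordinate of a Laplacian spectral embedding (the Fiedler vector) from graph structure alone, with no node attributes involved.

```python
import math

def laplacian(n, edges):
    """Dense combinatorial Laplacian L = D - A of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def fiedler_vector(n, edges, iters=200):
    """Power iteration on (c*I - L), projected off the constant vector.

    Hypothetical helper: it returns the eigenvector of the second-smallest
    Laplacian eigenvalue, i.e. a one-dimensional spectral embedding.
    """
    L = laplacian(n, edges)
    # 2 * max degree bounds the largest Laplacian eigenvalue (Gershgorin).
    c = 2.0 * max(L[i][i] for i in range(n))
    x = [math.sin(i + 1.0) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        m = sum(x) / n
        x = [xi - m for xi in x]  # remove the trivial constant eigenvector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Toy graph (an assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
embedding = fiedler_vector(6, edges)
```

On this toy graph the sign of each coordinate separates the two triangles, which is the kind of purely relational signal the core claim depends on.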
What carries the argument
Density-maximization mean alignment over orthogonal transformations applied to spectral embeddings extracted from multi-scale global graphs, which produces comparable fixed-dimensional vectors while discarding original node-attribute semantics.
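The alignment step searches over orthogonal transformations, which are isometries: they can move a point cloud, and therefore its mean, without changing any pairwise distance inside it. The following Python sketch checks this on a toy 2-D example. The points, the rotation angle, and the helper names are illustrative assumptions. This is not the paper's density-maximization algorithm, only the invariance property that such an alignment relies on.

```python
import math

def rotate(points, theta):
    """Apply a 2-D rotation (an orthogonal transformation) to each point."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Euclidean distances over all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

def mean(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Toy "spectral features" for one domain (made-up values).
points = [(1.0, 0.0), (2.0, 1.0), (0.5, 3.0), (-1.0, 2.0)]
rotated = rotate(points, math.pi / 3)

# Largest change in any pairwise distance: zero up to rounding error.
max_gap = max(
    abs(a - b)
    for a, b in zip(pairwise_distances(points), pairwise_distances(rotated))
)
# The mean, by contrast, moves with the rotation.
mean_shift = math.dist(mean(points), mean(rotated))
```

The design point is that the transformation has freedom to reposition a domain's mean relative to other domains while leaving that domain's internal geometry untouched.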
Load-bearing premise
Spectral embeddings from the multi-scale global graphs remain informative enough that the alignment step can produce truly comparable features even when node attributes differ completely in semantics and availability across domains.
What would settle it
An experiment that removes or randomizes all node attributes and shows that cross-domain few-shot classification accuracy falls to the level of random guessing or unaligned baselines would falsify the claim that the spectral features plus alignment suffice for transfer.
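One way such a control could be set up is sketched below in Python. The dictionary-based graph format and the function names `shuffle_node_attributes` and `chance_accuracy` are hypothetical and are not taken from the paper. The sketch permutes attribute rows across nodes while leaving the edge list untouched, and computes the random-guessing level against which accuracy would be compared.

```python
import random

def shuffle_node_attributes(graph, seed=0):
    """Return a copy whose attribute rows are permuted across nodes.

    The edge list is copied unchanged, so only the pairing between
    nodes and attributes is destroyed.
    """
    rng = random.Random(seed)
    attrs = [list(row) for row in graph["attrs"]]
    rng.shuffle(attrs)
    return {"edges": list(graph["edges"]), "attrs": attrs}

def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing on balanced classes."""
    return 1.0 / num_classes

# Toy graph in a hypothetical format: an edge list plus one attribute row per node.
graph = {
    "edges": [(0, 1), (1, 2), (2, 3)],
    "attrs": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]],
}
control = shuffle_node_attributes(graph, seed=42)
```

Fixing the seed keeps the control reproducible across the compared methods.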
Original abstract
Learning universal graph representations across heterogeneous domains is difficult because graph datasets differ in topology, node-attribute semantics, feature dimensions, and even attribute availability. We propose GraphVec, a language-model-free graph vectorization model that maps diverse graphs into transferable fixed-dimensional embeddings for graph-level tasks. Instead of directly using incomparable raw node attributes, GraphVec constructs multi-scale global graphs over all nodes in each dataset and extracts spectral embeddings to obtain domain-agnostic relational features. To make these spectral features comparable across datasets, we introduce a density-maximization mean alignment algorithm over orthogonal transformations and prove its monotonic convergence. GraphVec further combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module, which preserves node-level distributional information beyond standard pooling. We also provide a generalization error bound for the proposed model. Experiments on 13 datasets with more than 15 comparison methods demonstrate that GraphVec consistently outperforms strong graph pretraining baselines in cross-domain few-shot graph classification and graph clustering. Beyond graph-level tasks, GraphVec also yields strong node-level representations, achieving competitive performance on few-shot node classification against representative graph prompt learning methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes GraphVec, a language-model-free graph vectorization model that constructs multi-scale global graphs per dataset to extract spectral embeddings as domain-agnostic relational features, applies a density-maximization mean alignment over orthogonal transformations (with a claimed monotonic convergence proof) to enable cross-domain comparability, and combines a GIN-Graph Transformer backbone with a multi-layer reference distribution module. It provides a generalization error bound and reports that the resulting fixed-dimensional embeddings outperform strong graph pretraining baselines in cross-domain few-shot graph classification and clustering across 13 datasets, while also yielding competitive node-level representations.
Significance. If the spectral embeddings and alignment procedure reliably preserve task-relevant signal across domains with incompatible node-attribute semantics and availability, the approach could advance universal, transferable graph representations without relying on language models or raw feature alignment. The provision of a convergence proof and generalization bound strengthens the theoretical contribution relative to purely empirical cross-domain methods.
major comments (3)
- [Abstract / Method] Abstract and method description: The generalization error bound does not address potential information loss from the density-maximization alignment step when original node attributes differ in semantics or are absent; this is load-bearing for the cross-domain transfer claim, as the bound appears to treat the aligned spectral features as given without bounding the alignment-induced distortion.
- [Alignment Algorithm] Alignment procedure: The monotonic convergence proof for the density-maximization mean alignment over orthogonal transformations is presented as guaranteeing comparability, but it is unclear whether the procedure preserves downstream-task-relevant structure (as opposed to merely increasing density) when input graphs have heterogeneous topologies and attribute distributions; this assumption underpins the claim that gains come from the vectorization rather than the GIN-Graph Transformer backbone.
- [Experiments] Experiments: The reported outperformance on 13 datasets against >15 baselines requires explicit ablation isolating the contribution of the spectral vectorization and alignment from the reference distribution module and backbone; without such controls, it remains possible that post-hoc dataset or hyperparameter choices drive the gains rather than the proposed components.
minor comments (2)
- [Method] Clarify the precise construction of the multi-scale global graphs (e.g., how scales are chosen and how the graph is built from all nodes) to ensure reproducibility.
- [Experiments] The abstract mentions 'more than 15 comparison methods' but the experimental section should list them explicitly with citations and note which are graph pretraining baselines versus others.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment point by point below. Where the comments identify areas for clarification or additional controls, we will revise the manuscript accordingly. Our responses focus on strengthening the theoretical grounding and experimental rigor without altering the core claims.
-
Referee: [Abstract / Method] Abstract and method description: The generalization error bound does not address potential information loss from the density-maximization alignment step when original node attributes differ in semantics or are absent; this is load-bearing for the cross-domain transfer claim, as the bound appears to treat the aligned spectral features as given without bounding the alignment-induced distortion.
Authors: We agree that the generalization bound is stated for the model after alignment and does not explicitly bound distortion introduced by the alignment step itself. The bound focuses on the downstream GIN-Graph Transformer with reference distribution module operating on the aligned embeddings. Because the alignment uses orthogonal transformations (isometries), intra-domain distances and norms are preserved, but cross-domain semantic shifts are not theoretically quantified in the current bound. In the revision we will add a dedicated paragraph in the theoretical section acknowledging this limitation and framing it as an avenue for future work on distortion-aware bounds. The empirical cross-domain results remain the primary support for transferability.
Revision: partial
-
Referee: [Alignment Algorithm] Alignment procedure: The monotonic convergence proof for the density-maximization mean alignment over orthogonal transformations is presented as guaranteeing comparability, but it is unclear whether the procedure preserves downstream-task-relevant structure (as opposed to merely increasing density) when input graphs have heterogeneous topologies and attribute distributions; this assumption underpins the claim that gains come from the vectorization rather than the GIN-Graph Transformer backbone.
Authors: The convergence proof establishes monotonic increase in the density objective under orthogonal transformations, which are distance-preserving isometries; thus relative structure within each domain's spectral embedding is retained while the means are aligned to maximize overlap. We will augment the alignment section with a short analysis (including a controlled synthetic experiment) demonstrating that task-relevant metrics such as class separability are not degraded post-alignment. To isolate the vectorization contribution from the backbone, we will also expand the existing ablation tables to report performance with and without the alignment step while keeping the backbone fixed.
Revision: partial
-
Referee: [Experiments] Experiments: The reported outperformance on 13 datasets against >15 baselines requires explicit ablation isolating the contribution of the spectral vectorization and alignment from the reference distribution module and backbone; without such controls, it remains possible that post-hoc dataset or hyperparameter choices drive the gains rather than the proposed components.
Authors: We accept that the current ablation presentation can be made more explicit. The manuscript already contains component-wise ablations, but we will reorganize them into a single dedicated subsection that systematically removes (i) the spectral vectorization, (ii) the alignment procedure, and (iii) the reference distribution module while holding the GIN-Graph Transformer backbone constant. Additional controls for hyperparameter sensitivity across the 13 datasets will be included. These revisions will be presented with the same evaluation protocol to directly address the concern.
Revision: yes
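The ablation design described in the last response can be written down as a small configuration grid. The Python sketch below is an illustration only: the component keys and function names are hypothetical labels for items (i) to (iii), not identifiers from the paper's code. It enumerates every on/off combination of the three components with the backbone always enabled, then selects the runs that remove exactly one component.

```python
from itertools import product

# Hypothetical labels for components (i), (ii), and (iii) in the response above.
COMPONENTS = ("spectral_vectorization", "alignment", "reference_distribution")

def ablation_grid():
    """Enumerate every on/off combination with the backbone held fixed."""
    grid = []
    for flags in product((True, False), repeat=len(COMPONENTS)):
        config = dict(zip(COMPONENTS, flags))
        config["backbone"] = True  # the GIN-Graph Transformer is never ablated
        grid.append(config)
    return grid

def single_removals(grid):
    """Configurations that disable exactly one of the three components."""
    return [c for c in grid if sum(not c[name] for name in COMPONENTS) == 1]

grid = ablation_grid()
removals = single_removals(grid)
```

The full grid has eight configurations and three single-removal runs. Some combinations may not be meaningful in practice: alignment presumably has nothing to act on once the spectral features are removed.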
Circularity Check
Derivation chain self-contained; no reductions to fitted inputs or self-citations
The paper defines GraphVec via explicit algorithmic steps: multi-scale global graph construction followed by spectral embedding extraction, a separately proven density-maximization alignment over orthogonal transformations, a GIN-Graph Transformer backbone augmented by a reference distribution module, and an independent generalization error bound. None of these steps are shown to reduce by construction to parameters fitted on the evaluation data or to prior self-citations that carry the uniqueness claim. The experimental results are presented as downstream validation rather than as the definitional basis for the method itself.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Spectral embeddings of multi-scale global graphs yield domain-agnostic relational features
- standard math: The density-maximization mean alignment over orthogonal transformations converges monotonically
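The second entry can be unpacked with a standard argument. The sketch below assumes that the density objective $L$ is bounded above by some $L_{\max}$. That assumption is made here for illustration, since this page does not state the form of the objective, and $R^{(t)}$ denotes the orthogonal transformations at iteration $t$.

```latex
L\bigl(R^{(t)}\bigr) \le L\bigl(R^{(t+1)}\bigr) \le L_{\max} \quad \text{for all } t
\;\Longrightarrow\;
\lim_{t \to \infty} L\bigl(R^{(t)}\bigr) = L^{\ast} \le L_{\max}.
```

Convergence of the objective values is a weaker statement than convergence of the transformations themselves to a global maximizer, so the ledger entry should be read as covering only the former.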
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: constructs multi-scale global graphs ... extracts spectral embeddings ... density-maximization mean alignment algorithm over orthogonal transformations and prove its monotonic convergence
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: Theorem 4.1 ... $L(\{R_j^{(t)}\})$ is non-decreasing ... $R_j^{(t)} - R_j^{(t-1)} \to 0$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.