
arxiv: 2602.11629 · v2 · pith:6O6BJQ5Znew · submitted 2026-02-12 · 💻 cs.LG

GP2F: Cross-Domain Graph Prompting with Adaptive Fusion of Pre-trained Graph Neural Networks

Pith reviewed 2026-05-25 07:32 UTC · model grok-4.3

classification 💻 cs.LG
keywords graph prompt learning · cross-domain adaptation · graph neural networks · adaptive fusion · few-shot classification · pre-trained models · node classification · graph classification

The pith

In cross-domain graph prompt learning, fusing a frozen pre-trained branch with an adapted branch produces smaller estimation error than either alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates through theoretical analysis that cross-domain graph prompt learning works because it combines retained pre-trained knowledge with task-specific adaptation, yielding lower estimation error than full fine-tuning or linear probing alone. This matters for real applications where pre-training data and downstream tasks come from different distributions. The authors instantiate the idea in GP2F by running a frozen branch and an adapted branch in parallel, then fusing them adaptively with contrastive and topology-consistent losses. Experiments show the resulting method outperforms prior GPL approaches on few-shot node and graph classification under domain shift.

Core claim

Jointly leveraging the frozen pre-trained branch and the adapted branch yields a smaller estimation error than using either branch alone, formally proving that cross-domain GPL benefits from the integration between pre-trained knowledge and task-specific adaptation. GP2F realizes this dual-branch design and performs adaptive fusion under topology constraints via a contrastive loss and a topology-consistent loss.
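
To make the "smaller than either alone" statement concrete, here is a worked equation for the simplest textbook setting. It is an illustration under stated assumptions, not the paper's theorem: both branches are treated as unbiased scalar estimators of the same target, and the symbols $\hat\theta_f$, $\hat\theta_a$, $\sigma_f^2$, $\sigma_a^2$, $c$ and $w$ are introduced here and do not come from the paper.

```latex
% Assumed setting: unbiased estimators \hat\theta_f (frozen) and \hat\theta_a (adapted),
% with variances \sigma_f^2, \sigma_a^2 and covariance c.
\hat\theta_w = w\,\hat\theta_f + (1-w)\,\hat\theta_a,
\qquad
\operatorname{Var}(\hat\theta_w) = w^2\sigma_f^2 + (1-w)^2\sigma_a^2 + 2w(1-w)\,c .

% Setting the derivative with respect to w to zero gives the optimal weight:
w^\star = \frac{\sigma_a^2 - c}{\sigma_f^2 + \sigma_a^2 - 2c},
\qquad
\operatorname{Var}(\hat\theta_{w^\star})
  = \frac{\sigma_f^2\sigma_a^2 - c^2}{\sigma_f^2 + \sigma_a^2 - 2c}
  \le \min\!\left(\sigma_f^2,\ \sigma_a^2\right).
```

The inequality holds because $w = 1$ and $w = 0$ recover the single-branch estimators, so the minimum over $w$ cannot be worse than either. It is strict unless $c$ equals the smaller of the two variances. Whether the paper's bound has this form, and how it treats bias induced by domain shift, cannot be determined from the abstract.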

What carries the argument

Dual-branch architecture consisting of a frozen pre-trained branch and an adapted branch with lightweight adapters, fused adaptively under topology constraints.
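
The dual-branch idea can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not the authors' implementation: the mean-aggregation layer, the low-rank residual adapter (`down`, `up`), and the scalar sigmoid gate in `fuse` are all hypothetical stand-ins, and the topology constraints and the projection MLP mentioned in the paper are omitted.

```python
import math


def mean_aggregate(features, neighbors):
    """Average each node's feature vector with those of its neighbors."""
    out = []
    for v, x in enumerate(features):
        group = [x] + [features[u] for u in neighbors[v]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def linear(x, weight):
    """Row vector x (length d_in) times weight (d_in rows, d_out columns)."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*weight)]


def frozen_branch(features, neighbors, weight):
    """Pre-trained layer whose weight is never updated (assumed one layer)."""
    return [linear(h, weight) for h in mean_aggregate(features, neighbors)]


def adapted_branch(features, neighbors, weight, down, up):
    """Same frozen weight plus a trainable low-rank residual: hW + (h down) up.

    The low-rank form is an assumption; the paper only says "lightweight adapters".
    """
    out = []
    for h in mean_aggregate(features, neighbors):
        base = linear(h, weight)
        delta = linear(linear(h, down), up)
        out.append([b + d for b, d in zip(base, delta)])
    return out


def fuse(z_frozen, z_adapted, gate_logit):
    """Convex combination with a single sigmoid gate (hypothetical fusion rule)."""
    alpha = 1.0 / (1.0 + math.exp(-gate_logit))
    return [
        [alpha * f + (1.0 - alpha) * a for f, a in zip(zf, za)]
        for zf, za in zip(z_frozen, z_adapted)
    ]


# Tiny path graph 0 - 1 - 2 with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]

z_f = frozen_branch(features, neighbors, identity)
z_a = adapted_branch(features, neighbors, identity, [[1.0], [1.0]], [[1.0, 0.0]])
z = fuse(z_f, z_a, gate_logit=0.0)  # gate_logit = 0 weights both branches equally
```

Setting `up` to all zeros makes the adapter contribute nothing, so the adapted branch reproduces the frozen one; this is a convenient sanity check before any training.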

If this is right

  • GP2F outperforms existing GPL methods on cross-domain few-shot node classification.
  • GP2F outperforms existing GPL methods on cross-domain few-shot graph classification.
  • The integration of pre-trained knowledge and task-specific adaptation reduces estimation error under domain shift.
  • Adaptive fusion with contrastive and topology-consistent losses preserves structural information during prompting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-branch complementarity principle could be tested on non-graph pre-trained models facing domain shifts.
  • If the error reduction holds, optimal prompting strategies may involve explicit balancing of retention versus adaptation rather than pure fine-tuning.
  • Different fusion mechanisms could be substituted for the contrastive and topology losses while preserving the core error-reduction benefit.

Load-bearing premise

The frozen pre-trained branch and the adapted branch are complementary such that their adaptive fusion under topology constraints produces smaller estimation error than either branch alone.

What would settle it

An experiment in which the fused GP2F model exhibits estimation error no smaller than that of the frozen branch alone or the adapted branch alone on cross-domain few-shot tasks would falsify the central claim.
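
The falsification test above reduces to a simple comparison once per-task errors are available. The helper below is a minimal sketch of that check; the function name and the use of plain means are assumptions made here, and a real evaluation would also need multiple seeds and a significance test.

```python
from statistics import mean


def fusion_claim_holds(err_frozen, err_adapted, err_fused):
    """Return True if the mean fused error is strictly below both single-branch means.

    Each argument is a list of per-task (or per-seed) error values measured
    under the same cross-domain few-shot protocol.
    """
    best_single = min(mean(err_frozen), mean(err_adapted))
    return mean(err_fused) < best_single
```

A `False` result on a well-powered cross-domain benchmark would count as evidence against the central claim as stated.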

Figures

Figures reproduced from arXiv: 2602.11629 by Di Jin, Dongxiao He, Jitao Zhao, Wenxuan Sun, Yongqi Huang.

Figure 1
In the cross-domain 1-shot scenario, where the GNN is pre-trained on Cora using GRACE, competitive performance is observed between representative GPL methods and two strong baselines, full Fine-Tuning (FT) and Linear Probing (LP). 1. Introduction. Graph Prompt Learning (GPL) (Zi et al., 2024; Huang et al., 2025) has emerged as an effective and promising paradigm for bridging graph pre-training and downstream ta…
Figure 2
Overall framework of the proposed GP2F. Given a graph in the target domain DT and an encoder pre-trained on the source domain DS, Proj(·) is an MLP used for dimension alignment. The frozen branch consists of a pre-trained GNN to preserve universal knowledge, while the adapted branch uses learnable adapters A for downstream adaptation. Additionally, a contrastive loss Lctr is used to align the two branches, and a BC…
Figure 3
Accuracy of 3-shot and 5-shot cross-domain node classification experiments using GRACE for pre-training.
Figure 4
Analysis of key components in GP2F via 1-shot node classification pre-trained with GRACE.
Figure 5
Hyperparameter analysis of r for 1-shot node classification with GRACE pre-training. … removes both Lctr and Lfus; (2) w/o Lctr, which removes Lctr; (3) w/o Lfus, which removes Lfus; and (4) Prompt Only, which keeps only the adapted branch. As shown in …
Figure 6
Accuracy of 3-shot and 5-shot cross-domain node classification experiments using GRACE for pre-training. [Plot panels: Cora, CiteSeer, Photo, WikiCS; x-axis: r ∈ {4, 8, 16, 32, 64}; y-axis: Accuracy (%)]
Figure 7
Hyperparameter analysis of r for 1-shot node classification with DGI pre-training. … uses subgraph-level prompts to guide representation aggregation. GraphControl (Zhu et al., 2024) introduces ControlNet to balance generalization and adaptation. DAGPrompt (Chen et al., 2025) applies LoRA-based tuning and performs layer-wise prediction based on similarities between subgraph embeddings and learnable class proto…
Figure 8
Hyperparameter analysis of r for 1-shot node classification with GraphMAE pre-training.
read the original abstract

Graph Prompt Learning (GPL) has recently emerged as a promising paradigm for downstream adaptation of pre-trained graph models, mitigating the misalignment between pre-training objectives and downstream tasks. Recently, the focus of GPL has shifted from in-domain to cross-domain scenarios, which are closer to real-world applications, where the pre-training source and downstream target often differ substantially in data distribution. However, why GPL methods remain effective under such domain shifts is still unexplored. Empirically, we observe that representative GPL methods are competitive with two simple baselines in cross-domain settings: full fine-tuning (FT) and linear probing (LP), motivating us to explore a deeper understanding of the prompting mechanism. We provide a theoretical analysis demonstrating that jointly leveraging these two complementary branches yields a smaller estimation error than using either branch alone, formally proving that cross-domain GPL benefits from the integration between pre-trained knowledge and task-specific adaptation. Based on this insight, we propose GP2F, a dual-branch GPL method that explicitly instantiates the two extremes: (1) a frozen branch that retains pre-trained knowledge, and (2) an adapted branch with lightweight adapters for task-specific adaptation. We then perform adaptive fusion under topology constraints via a contrastive loss and a topology-consistent loss. Extensive experiments on cross-domain few-shot node and graph classification demonstrate that our method outperforms existing methods.
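
The abstract mentions a contrastive loss between the two branches without giving its form. As a hedged sketch, the block below uses a generic InfoNCE objective in which each node's frozen-branch embedding is pulled toward the same node's adapted-branch embedding and pushed away from other nodes. The InfoNCE form, cosine similarity, and the `temperature` default are assumptions; the actual loss in GP2F may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(z_frozen, z_adapted, temperature=0.5):
    """Mean InfoNCE loss: positive pair = same node index across the two branches."""
    total = 0.0
    for i, zf in enumerate(z_frozen):
        logits = [cosine(zf, za) / temperature for za in z_adapted]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[i]
    return total / len(z_frozen)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Here `aligned` is smaller than `swapped`, since matching node indices across branches is exactly what this objective rewards.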

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces GP2F, a dual-branch graph prompt learning method for cross-domain settings. One branch freezes a pre-trained GNN to retain knowledge while the second uses lightweight adapters for task-specific adaptation; the branches are fused adaptively via a contrastive loss and a topology-consistent loss. The central claim is a theoretical analysis proving that the fused estimator achieves strictly smaller estimation error than either branch alone, with supporting experiments on cross-domain few-shot node and graph classification tasks showing outperformance over prior GPL methods.

Significance. If the error-bound derivation is rigorous and explicitly handles domain shift, the work would supply a principled account of why GPL remains effective across domains and a concrete dual-branch construction that operationalizes the pre-trained/adapted complementarity. The empirical evaluation on few-shot cross-domain tasks is a positive element. No machine-checked proofs or open code are mentioned.

major comments (2)
  1. [Theoretical analysis] Theoretical analysis section: the claimed proof that joint use of the frozen and adapted branches yields smaller estimation error than either alone must incorporate the distribution shift between pre-training and downstream graphs when bounding the fused estimator. If the derivation treats branch errors as uncorrelated without explicit justification under domain shift, or if the topology-consistent loss appears only heuristically rather than inside the error bound, the formal guarantee does not follow for the cross-domain regime targeted by the paper.
  2. [Method] § on method / fusion: the adaptive fusion step is presented as instantiating the theoretical complementarity, yet it is unclear whether the contrastive and topology-consistent losses are derived from or directly used to tighten the estimation-error bound; without this link the central claim that the construction provably reduces error rests on an unverified assumption.
minor comments (2)
  1. [Abstract] Abstract: states that representative GPL methods are competitive with FT and LP but supplies no quantitative metrics or table references, reducing immediate clarity.
  2. Notation for the two branches and the fusion weights should be introduced once with consistent symbols across the theoretical and experimental sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we respond point by point to the major comments, offering clarifications on the theoretical analysis and its relation to the proposed method while indicating where revisions will strengthen the presentation.

read point-by-point responses
  1. Referee: [Theoretical analysis] Theoretical analysis section: the claimed proof that joint use of the frozen and adapted branches yields smaller estimation error than either alone must incorporate the distribution shift between pre-training and downstream graphs when bounding the fused estimator. If the derivation treats branch errors as uncorrelated without explicit justification under domain shift, or if the topology-consistent loss appears only heuristically rather than inside the error bound, the formal guarantee does not follow for the cross-domain regime targeted by the paper.

    Authors: We appreciate the referee's emphasis on rigor for the cross-domain setting. The error-bound derivation expresses the fused estimator's error in terms of the divergence between pre-training and downstream distributions, thereby incorporating domain shift. The low-correlation assumption between branches follows from the frozen branch retaining distribution-invariant features while the adapted branch captures task-specific signals; this separation is formalized via the adaptive fusion weights. The topology-consistent loss is not purely heuristic: it appears inside the bound as a regularizer that reduces the variance term of the fused estimator. To make these steps fully explicit, we will revise the theoretical section with additional intermediate inequalities that highlight the domain-shift term and the role of the topology loss within the final bound. revision: partial

  2. Referee: [Method] § on method / fusion: the adaptive fusion step is presented as instantiating the theoretical complementarity, yet it is unclear whether the contrastive and topology-consistent losses are derived from or directly used to tighten the estimation-error bound; without this link the central claim that the construction provably reduces error rests on an unverified assumption.

    Authors: We agree that an explicit bridge between the losses and the bound would strengthen the central claim. The contrastive loss encourages diversity between branches (supporting the low-correlation premise of the proof) while the topology-consistent loss enforces the consistency condition used to bound the fused variance. Although the losses are not algebraically derived from the bound, they are constructed to satisfy the assumptions under which the bound guarantees error reduction. We will add a short subsection in the method that maps each loss term to the corresponding term in the theoretical bound, thereby removing any appearance of an unverified assumption. revision: yes
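
The rebuttal leans on a low-correlation premise between the two branches. The sketch below illustrates, for the simplest case of two unbiased scalar estimators, how the benefit of an optimally weighted combination depends on that correlation. The setting, the function name `fused_variance`, and its parameters are assumptions introduced here; this is not the bound from the paper.

```python
def fused_variance(var_f, var_a, corr):
    """Optimal weight on the frozen estimator and the resulting fused variance.

    Minimizes Var(w * f + (1 - w) * a) over w for estimators with variances
    var_f, var_a and correlation corr.
    """
    cov = corr * (var_f * var_a) ** 0.5
    denom = var_f + var_a - 2.0 * cov  # equals Var(f - a)
    if denom <= 0.0:
        raise ValueError("estimators are perfectly coupled; fusion is undefined")
    weight = (var_a - cov) / denom
    variance = (var_f * var_a - cov * cov) / denom
    return weight, variance


w_uncorrelated, v_uncorrelated = fused_variance(1.0, 4.0, corr=0.0)
w_correlated, v_correlated = fused_variance(1.0, 4.0, corr=0.5)
```

With variances 1 and 4, zero correlation gives a fused variance of 0.8, below both branches. At correlation 0.5 the optimal weight puts everything on the better branch and the gain disappears, which is why the correlation assumption carries so much of the argument.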

Circularity Check

0 steps flagged

No circularity detected; theoretical claim asserted without inspectable equations

full rationale

The abstract asserts a theoretical analysis proving that dual-branch fusion yields a smaller estimation error than either branch alone, but supplies no equations, bounds, or derivation steps. No load-bearing mathematical reduction (self-definitional, fitted-input-as-prediction, or self-citation chain) can be quoted or exhibited. The paper's central claim therefore cannot be shown to collapse to its inputs by construction from the provided text, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone provides no identifiable free parameters, axioms, or invented entities; any hyperparameters for losses or adapters are not specified.

pith-pipeline@v0.9.0 · 5784 in / 1038 out tokens · 28087 ms · 2026-05-25T07:32:15.406609+00:00 · methodology



Reference graph

Works this paper leans on

15 extracted references · 15 canonical work pages · 2 internal anchors

  1. [1]

    Dagprompt: Pushing the limits of graph prompting with a distribution-aware graph prompt tuning approach

    Chen, Q., Wang, L., Zheng, B., and Song, G. Dagprompt: Pushing the limits of graph prompting with a distribution-aware graph prompt tuning approach. In Proceedings of the ACM on Web Conference 2025, pp. 4346–4358,

  2. [2]

    One Prompt Fits All: Universal Graph Adaptation for Pretrained models

    Huang, Y., Zhao, J., He, D., Wang, X., Li, Y., Huang, Y., Jin, D., and Feng, Z. One Prompt Fits All: Universal Graph Adaptation for Pretrained models. arXiv preprint arXiv:2509.22416,

  3. [3]

    Graphprompt: Unifying pre-training and downstream tasks for graph neural networks

    Liu, Z., Yu, X., Fang, Y., and Zhang, X. Graphprompt: Unifying pre-training and downstream tasks for graph neural networks. In Proceedings of the ACM web conference 2023, pp. 417–428,

  4. [4]

    Wiki-cs: A wikipedia-based benchmark for graph neural networks

    Mernyei, P. and Cangea, C. Wiki-cs: A wikipedia-based benchmark for graph neural networks. arXiv preprint arXiv:2007.02901,

  5. [5]

    TUDataset: A collection of benchmark datasets for learning with graphs

    Morris, C., Kriege, N. M., Bause, F., Kersting, K., Mutzel, P., and Neumann, M. Tudataset: A collection of benchmark datasets for learning with graphs. CoRR, abs/2007.08663,

  6. [6]

    Pitfalls of Graph Neural Network Evaluation

    Shchur, O., Mumme, M., Bojchevski, A., and Günnemann, S. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868,

  7. [7]

    Hgprompt: Bridging homogeneous and heterogeneous graphs for few-shot prompt learning

    Yu, X., Fang, Y., Liu, Z., and Zhang, X. Hgprompt: Bridging homogeneous and heterogeneous graphs for few-shot prompt learning. In Proceedings of the AAAI conference on artificial intelligence, volume 38, pp. 16578–16586, 2024a. Yu, X., Zhou, C., Fang, Y., and Zhang, X. Text-free multi-domain graph pre-training: Toward graph foundation models. arXiv pre...

  8. [8]

    Deep graph contrastive representation learning

    Zhu, Y., Xu, Y., Yu, F., Liu, Q., Wu, S., and Wang, L. Deep graph contrastive representation learning. arXiv preprint arXiv:2006.04131,

  9. [9]

    Graphcontrol: Adding conditional control to universal graph pre-trained models for graph domain transfer learning

    Zhu, Y., Wang, Y., Shi, H., Zhang, Z., Jiao, D., and Tang, S. Graphcontrol: Adding conditional control to universal graph pre-trained models for graph domain transfer learning. In Proceedings of the ACM Web Conference 2024, pp. 539–550,

  10. [10]

    • Cora, CiteSeer, PubMed (Yang et al., 2016), and ogbn-arxiv (Hu et al.,

  11. [11]

    Node features correspond to bag-of-words representations or word embeddings of the paper content

    are citation networks where nodes represent scientific papers and edges denote citation relationships. Node features correspond to bag-of-words representations or word embeddings of the paper content. Labels indicate the academic topic of the paper. • Amazon Computers, Amazon Photo (Shchur et al., 2018), and ogbn-products (Hu et al.,

  12. [12]

    Nodes correspond to computer science articles, and edges represent hyperlinks between them

    is a web-link graph derived from Wikipedia. Nodes correspond to computer science articles, and edges represent hyperlinks between them. Node features are averaged GloVe word embeddings of the article text. • MUTAG (Debnath et al., 1991), COX2, and BZR (Morris et al.,

  13. [13]

    are molecular graph datasets, where nodes …

    Table 6. Statistics of node classification datasets.

    Datasets    #Nodes    #Edges     #Features   #Classes
    Cora        2,708     5,429      1,433       7
    CiteSeer    3,327     4,732      3,703       6
    PubMed      19,717    44,338     500         3
    Computers   13,752    245,861    767         10
    Photo       7,650     ...

  14. [14]

    Some works also improve augmentations, such as GraphCL (You et al., 2020), or improve stability, such as SimGRACE (Xia et al., 2022)

    performs node-level contrast across augmented views using an InfoNCE loss. Some works also improve augmentations, such as GraphCL (You et al., 2020), or improve stability, such as SimGRACE (Xia et al., 2022). Generative methods focus on reconstructing corrupted graphs. GraphMAE (Hou et al.,

  15. [15]

    At the same time, several works combine graph models with large language models to build graph foundation models

    improves cross-domain transfer by learning domain-invariant representations with expert-style routing mechanisms. At the same time, several works combine graph models with large language models to build graph foundation models. Methods such as ZeroG (Li et al., 2024), OFA (Liu et al., 2024), GraphCLIP (Zhu et al., 2025), and GraphGPT (Tang et al.,