pith. sign in

arxiv: 2501.05614 · v2 · submitted 2025-01-09 · 💻 cs.CR · cs.AI

Watermarking Graph Neural Networks via Explanations for Ownership Protection

Pith reviewed 2026-05-23 05:37 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords watermarkinggraph neural networksownership protectionexplanationsintellectual propertyNP-hardrobustness to attacks
0
0 comments X

The pith

Watermarking GNN explanations makes them statistically distinct for black-box ownership verification, with locating the mark proven NP-hard.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes watermarking graph neural networks by altering their explanations so the watermarked ones stand out statistically from ordinary explanations. Ownership is then confirmed by running a statistical significance test on queried explanations rather than by checking for backdoor triggers. This sidesteps the data manipulation required by earlier backdoor watermarking, removing both misclassification risks and vulnerability to poisoning attacks that erase the mark. A proof shows that, even when an adversary knows the full procedure, recovering the specific watermarked explanations is NP-hard. Tests confirm the mark survives standard removal attempts such as fine-tuning and pruning.

Core claim

By watermarking GNN explanations to be statistically distinct from others, ownership claims can be verified through statistical significance testing in a black-box setting. The approach inherits verification convenience from backdoor methods yet avoids any training-data changes, thereby eliminating ownership ambiguity and data-poisoning vulnerabilities. Theoretical analysis establishes that locating the watermark remains NP-hard with full knowledge of the method, and empirical results show resilience to fine-tuning and pruning.

What carries the argument

Statistical distinction of watermarked GNN explanations, verified by significance testing, with an NP-hardness reduction showing location is computationally intractable even under full knowledge.

If this is right

  • Ownership verification works from black-box queries alone via a statistical test on explanations.
  • No training-data manipulation is required, removing sources of misclassification and poisoning attacks.
  • The watermark resists removal by fine-tuning or pruning.
  • Even an informed attacker cannot locate the mark in polynomial time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same statistical-distinction idea could be tested on explanation methods for other model families.
  • If the NP-hardness holds in practice, it raises the bar for automated watermark removal tools.
  • Industry GNN deployments could adopt explanation queries as a lightweight ownership check without retraining.

Load-bearing premise

Explanations can be modified so that the watermarked subset becomes reliably distinguishable by a statistical test without creating new performance problems or verification ambiguities.

What would settle it

A concrete polynomial-time algorithm that, given full knowledge of the watermarking procedure, recovers the watermarked explanations with non-negligible probability on standard GNN benchmarks.

Figures

Figures reproduced from arXiv: 2501.05614 by Binghui Wang, Jane Downer, Ren Wang, Yingdan Shi, Ziyan Liu.

Figure 1
Figure 1. Figure 1: Overview of our explanation-based GNN watermarking method. During embedding, [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Effect of pruning (top) and fine-tuning (bottom) [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The probability that a randomly-chosen subgraph overlaps with a watermarked subgraph. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Watermarking metrics for varied watermarked [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: Watermarking metrics for varied number of wa [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 6
Figure 6. Figure 6: Pruning GCN models [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Fine-tuned GCN models [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 10
Figure 10. Figure 10: Pruning and fine-tuning attacks against varied number of watermarked subgraphs ( [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Pruning and fine-tuning attacks against varied sizes of watermarked subgraphs ( [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Fine-tuning results at increased learning rates (SAGE architecture). [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗
read the original abstract

Graph Neural Networks (GNNs) are widely deployed in industry, making their intellectual property valuable. However, protecting GNNs from unauthorized use remains a challenge. Watermarking offers a solution by embedding ownership information into models. Existing watermarking methods have two limitations: First, they rarely focus on graph data or GNNs. Second, the de facto backdoor-based method relies on manipulating training data, which can introduce ownership ambiguity through misclassification and vulnerability to data poisoning attacks that can interrupt the backdoor mechanism. Our explanation-based watermarking inherits the strengths of backdoor-based methods (e.g., black-box verification) without data manipulation, eliminating ownership ambiguity and data dependencies. In particular, we watermark GNN explanations such that these explanations are statistically distinct from others, so ownership claims must be verified through statistical significance. We theoretically prove that, even with full knowledge of our method, locating the watermark is NP-hard. Empirically, our method demonstrates robustness to fine-tuning and pruning attacks. By addressing these challenges, our approach significantly advances GNN intellectual property protection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes an explanation-based watermarking scheme for GNNs that embeds ownership signals by altering selected explanations to make them statistically distinguishable from normal ones, enabling black-box verification via significance testing. It claims this avoids the data-manipulation and ambiguity problems of backdoor watermarking, proves that locating the watermark remains NP-hard even with full method knowledge, and reports empirical robustness to fine-tuning and pruning.

Significance. If the NP-hardness result and the statistical-distinctness mechanism hold under realistic GNN explanation variability, the work would provide a useful alternative to backdoor methods for GNN IP protection. The avoidance of training-data changes and the black-box verification property are clear strengths; however, the manuscript supplies no derivations, test statistics, or experimental tables to substantiate these claims.

major comments (3)
  1. [Abstract, §3] Abstract and §3 (method description): the central claim that watermarked explanations are statistically distinct enough for reliable significance testing is asserted without specifying the embedding procedure, the exact test statistic, or how multiple-testing across heterogeneous graphs is controlled; this is load-bearing for both the verification procedure and the claimed elimination of ownership ambiguity.
  2. [Abstract, theoretical section] Abstract and theoretical section: the NP-hardness proof for locating the watermark is stated but no reduction, hardness assumption, or proof sketch is supplied, preventing assessment of whether the result actually supports security against a fully informed adversary.
  3. [Empirical evaluation section] Empirical evaluation section: robustness to fine-tuning and pruning is claimed, yet no datasets, attack parameters, fidelity metrics, or statistical test outcomes are reported, so it is impossible to verify whether the distinctness survives the variability already present in GNNExplainer/PGExplainer outputs.
minor comments (1)
  1. [§3] Notation for the watermarked explanation set and the null distribution should be introduced once and used consistently.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify our work. We address each major comment below and will revise the manuscript to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Abstract, §3] Abstract and §3 (method description): the central claim that watermarked explanations are statistically distinct enough for reliable significance testing is asserted without specifying the embedding procedure, the exact test statistic, or how multiple-testing across heterogeneous graphs is controlled; this is load-bearing for both the verification procedure and the claimed elimination of ownership ambiguity.

    Authors: We agree that the abstract and §3 would benefit from greater explicitness. In the revised manuscript we will describe the embedding procedure in full, state the precise test statistic employed for significance testing, and explain the multiple-testing control applied across heterogeneous graphs. revision: yes

  2. Referee: [Abstract, theoretical section] Abstract and theoretical section: the NP-hardness proof for locating the watermark is stated but no reduction, hardness assumption, or proof sketch is supplied, preventing assessment of whether the result actually supports security against a fully informed adversary.

    Authors: The theoretical section contains the NP-hardness result. To enable direct assessment we will insert a proof sketch, identify the reduction, and state the hardness assumptions in the revised manuscript. revision: yes

  3. Referee: [Empirical evaluation section] Empirical evaluation section: robustness to fine-tuning and pruning is claimed, yet no datasets, attack parameters, fidelity metrics, or statistical test outcomes are reported, so it is impossible to verify whether the distinctness survives the variability already present in GNNExplainer/PGExplainer outputs.

    Authors: We will expand the empirical evaluation section to report the datasets, attack parameters, fidelity metrics, and statistical test outcomes so that readers can verify robustness under GNN explanation variability. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on independent theoretical and empirical steps

full rationale

The abstract presents the core method as watermarking explanations to achieve statistical distinctness (verified via significance testing) and states a separate theoretical proof that locating the watermark is NP-hard even with full knowledge. No equations, fitted parameters, or self-citations are shown reducing the NP-hardness result or robustness claims to the input design choices by construction. The statistical-distinctness property is introduced as an explicit design goal rather than a renamed fit or self-referential definition. Absent any quoted reduction in the provided text, the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are described. The method implicitly relies on standard statistical testing assumptions for verification.

pith-pipeline@v0.9.0 · 5723 in / 976 out tokens · 17748 ms · 2026-05-23T05:37:36.150039+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 2 internal anchors

  1. [1]

    Semi-Supervised Classification with Graph Convolutional Networks

    ISSN 00401706. Huang, Q., Yamada, M., Tian, Y ., Singh, D., and Chang, Y . Graphlime: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7):6968–6972, 2023. doi: 10.1109/TKDE.2 022.3187455. Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks, 20...

  2. [2]

    Szyller, S., Atli, B

    URL https://api.semanticscholar.org/ CorpusID:35426171. Szyller, S., Atli, B. G., Marchal, S., and Asokan, N. Dawn: Dy- namic adversarial watermarking of neural networks. In Pro- ceedings of the 29th ACM International Conference on Multi- media, MM ’21, pp. 4417–4425, New York, NY , USA, 2021. Association for Computing Machinery. ISBN 9781450386517. doi: ...

  3. [4]

    Embedding Watermarks into Deep Neural Networks

    URL http://arxiv.org/abs/1701.04082. Veliˇckovi´c, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y . Graph attention networks, 2018. URLhttps: //arxiv.org/abs/1710.10903. Virinchi, S. Using graph neural networks to recommend related products, 2022. URL https://www.amazon.science /blog/using-graph-neural-networks-to-rec ommend-related-pr...

  4. [5]

    IEEE, 2024. Wang, B. and Gong, N. Z. Attacking graph-based classification via manipulating the graph structure. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communi- cations Security, pp. 2023–2040, 2019. Wang, J., Wu, H., Zhang, X., and Yao, Y . Watermarking in deep neural networks via error back-propagation. Electronic Imag- ing, 202...

  5. [6]

    URL https://www.sciencedirect.com/science/ar ticle/pii/S2666651021000012

    doi: https://doi.org/10.1016/j.aiopen.2021.01.001. URL https://www.sciencedirect.com/science/ar ticle/pii/S2666651021000012. Zhou, Y ., Huo, H., Hou, Z., and Bu, F. A deep graph convo- lutional neural network architecture for graph classification. PLOS ONE, 18, 2023. URL https://api.semantic scholar.org/CorpusID:257428249. Zügner, D., Borchert, O., Akbarn...

  6. [7]

    sufficiently large

    Define subgraphs S = {𝐺𝑟 𝑎𝑛𝑑 1 , · · ·, 𝐺𝑟 𝑎𝑛𝑑 𝐷 }, where each subgraph is size 𝑛𝑠𝑢𝑏 = 𝑐𝑒𝑖𝑙 (𝑠 × |V 𝑡𝑟 |). Each subgraph 𝐺𝑟 𝑎𝑛𝑑 𝑖 is defined by randomly selecting 𝑛𝑠𝑢𝑏 nodes from V𝑡𝑟 . 𝐷 should be “sufficiently large” (𝐷 > 100) to approximate a population

  7. [8]

    Using Equation 6, collect binarized explanations, ˆe𝑟 𝑎𝑛𝑑 𝑖 , for 1 ≤ 𝑖 ≤ 𝐷

  8. [9]

    for i=1 to I simulations do Randomly select 𝑇 distinct indices 𝑖𝑑𝑥 1,

    Initialize empty list, 𝑚𝑎𝑡𝑐ℎ𝐶𝑜𝑢𝑛𝑡𝑠 = {}. for i=1 to I simulations do Randomly select 𝑇 distinct indices 𝑖𝑑𝑥 1, . . . , 𝑖𝑑𝑥𝑇 from the range {1, · · ·, 𝐷}. For each 𝑖𝑑𝑥 𝑖, let V𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 and X𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 be the nodes of 𝐺𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 and their features, respectively. Compute ˆe𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 = 𝑠𝑖𝑔𝑛 (𝑒𝑥 𝑝𝑙𝑎𝑖𝑛 (X𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 , 𝑓 (V 𝑟 𝑎𝑛𝑑 𝑖𝑑 𝑥𝑖 )) for each 𝑖 in 1...

  9. [10]

    For 1 ≤ 𝑖 ≤ 𝑇, let P𝑐𝑑𝑡 𝑖 = 𝑓 (V 𝑐𝑑𝑡 𝑖 ) and X𝑐𝑑𝑡 𝑖 be the corresponding features of V 𝑐𝑑𝑡 𝑖

  10. [11]

    Let the binarized explanation of the 𝑖𝑡 ℎ candidate subgraph be defined as: ˆe𝑐𝑑𝑡 𝑖 = 𝑠𝑖𝑔𝑛 𝑒𝑥 𝑝𝑙𝑎𝑖𝑛 (X𝑐𝑑𝑡 𝑖 , P𝑐𝑑𝑡 𝑖 )

  11. [12]

    Compute MI 𝑐𝑑𝑡 across tensors in {ˆe𝑐𝑑𝑡 𝑖 }𝑇 𝑖=1 using Equation 14

  12. [13]

    Compute the significance of this value as the p-value of a one-tailed 𝑧-test: 𝑧𝑡𝑒𝑠𝑡 = MI𝑐𝑑𝑡 − 𝜇𝑛𝑎𝑡𝑒 𝜎𝑛𝑎𝑡𝑒 𝑝𝑧𝑡𝑒𝑠𝑡 = 1 − Φ(𝑧𝑡𝑒𝑠𝑡 ), Where Φ (𝑧𝑡𝑒𝑠𝑡 ) is the cumulative distribution function of the standard normal distribution

  13. [14]

    If 𝑝𝑧𝑡𝑒𝑠𝑡 < 𝛼 𝑣 , the candidate subgraphs provide enough evidence of ownership to reject 𝐻0

    If 𝑝𝑧𝑡𝑒𝑠𝑡 ≥ 𝛼𝑣 , the candidate subgraphs do not provide adequate ownership evidence. If 𝑝𝑧𝑡𝑒𝑠𝑡 < 𝛼 𝑣 , the candidate subgraphs provide enough evidence of ownership to reject 𝐻0. A.2. Gaussian Kernel Matrices Define ¯K as a collection of matrices {¯K(1) , . . . , ¯K(𝐹 ) }, where ¯K(𝑘 ) (size 𝑁 × 𝑁) is the centered and normalized version of Gaussian kernel ...

  14. [15]

    Multiplying ˜K𝑇 ˜K (an 𝑂 (𝐹2𝑁2) operation, resulting in an 𝐹 × 𝐹 matrix)

  15. [16]

    Obtaining and adding 𝜆I𝐹 (an 𝑂 (𝐹2) operation, resulting in an 𝐹 × 𝐹 matrix)

  16. [17]

    Inverting the result (an 𝑂 (𝐹3) operation, resulting in an 𝐹 × 𝐹 matrix)

  17. [18]

    Multiplying by ˜K𝑇 (an 𝑂 (𝐹2𝑁2) operation, resulting in an 𝐹 × 𝑁2 matrix)

  18. [19]

    For obtaining explanations of 𝑇 subgraphs during a given epoch of watermark embedding, the complexity is therefore: 𝑂 (𝑇 (𝐹2𝑁2 + 𝐹3)) Total Complexity

    Multiplying the result by ˜L (an 𝑂 (𝐹2𝑁2) operation, resulting in an 𝑁2 × 1 vector) The total complexity of a single explanation is therefore 𝑂 (𝐹2𝑁2) + 𝑂 (𝐹2) + 𝑂 (𝐹3) + 𝑂 (𝐹2𝑁2) + 𝑂 (𝐹2𝑁2) = 𝑂 (𝐹2𝑁2 + 𝐹3). For obtaining explanations of 𝑇 subgraphs during a given epoch of watermark embedding, the complexity is therefore: 𝑂 (𝑇 (𝐹2𝑁2 + 𝐹3)) Total Complexit...

  19. [20]

    Subgraphs are formed by randomly selecting 𝑛 = 𝑠 · |V 𝑡𝑟 | nodes from the training set (where |V 𝑡𝑟 | is the number of training nodes and 𝑠 is a proportion of that size)

    Node Classification: The dataset is a single graph. Subgraphs are formed by randomly selecting 𝑛 = 𝑠 · |V 𝑡𝑟 | nodes from the training set (where |V 𝑡𝑟 | is the number of training nodes and 𝑠 is a proportion of that size). (Note: in this case, 𝑛 is equal to the value 𝑛𝑠𝑢𝑏 referenced previously in the paper.) For each subgraph: • The 𝑛 × 𝐹 node feature mat...

  20. [21]

    Subgraphs are formed by randomly selecting 𝑛 = 𝑠 · |E𝑡𝑟 | edges

    Link Prediction: Again, the dataset is a single graph. Subgraphs are formed by randomly selecting 𝑛 = 𝑠 · |E𝑡𝑟 | edges. For each subgraph: • Each row in the 𝑛 × 𝐹 feature matrix represents the features of a single link. These features are derived by combining the feature vectors of the two nodes defining the link, using methods such as concatenation or av...

  21. [22]

    teacher” model, and an untrained “student

    Graph Classification: For graph-level predictions, the dataset D𝑡𝑟 is a collection of graphs. We extend the above pattern to 𝑇 collections of 𝑛 = 𝑠 · |D𝑡𝑟 | subgraphs, where each subgraph is drawn from a different graph in the training set. Specifically: • Each subgraph in a collection is summarized by a feature vector of length 𝐹 (e.g., by averaging its ...