arxiv: 2605.06433 · v1 · submitted 2026-05-07 · 💻 cs.LG

Federated Cross-Client Subgraph Pattern Detection

Selin Ceydeli, Rui Wang, Kubilay Atasu

Pith reviewed 2026-05-08 12:41 UTC · model grok-4.3

keywords federated learning · graph neural networks · subgraph pattern detection · distributed graphs · embedding exchange · representation equivalence · structural observability

The pith

Per-step embedding exchange in federated GNNs recovers the same node representations as a centralized model for subgraph pattern detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how subgraph pattern detection with graph neural networks breaks down when the underlying graph is partitioned across clients instead of available in one place. Local computations on each client miss structures that cross partition lines, so the learned node representations diverge from what a single centralized GNN would produce. The authors introduce a mechanism that has clients exchange their intermediate node embeddings after every layer of the forward pass while keeping raw features and labels private. When the graph satisfies an extended-subgraph assumption and all clients share the same model parameters, the exchanged embeddings make the federated representations identical to the centralized ones. Experiments on synthetic directed multigraphs containing cycles, bicliques, and scatter-gather patterns show that fresh per-step exchanges combined with standard federated parameter averaging close most of the observed gap.

Core claim

Under an extended-subgraph assumption and shared model parameters across clients, this framework recovers the same node representations as a centralized GNN over the full graph.

What carries the argument

The per-step, layer-wise embedding exchange framework in which clients synchronize intermediate node representations at each layer of the forward pass.
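
To make the mechanism concrete, here is a minimal NumPy sketch of layer-wise embedding exchange on a directed 4-cycle split across two clients. It is an illustration under stated assumptions, not the authors' implementation: the sum-aggregation layer, the shared input encoder, the partition in `OWNER`, and the reading of the extended-subgraph assumption (each client knows every incoming edge of the nodes it owns) are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, NUM_LAYERS = 4, 3
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]   # directed 4-cycle as (src, dst)
OWNER = {0: "A", 1: "A", 2: "B", 3: "B"}   # hypothetical two-client partition

# Shared parameters: every client uses the same weights.
W_IN = rng.normal(scale=0.5, size=(DIM, DIM))
LAYERS = [(rng.normal(scale=0.5, size=(DIM, DIM)),
           rng.normal(scale=0.5, size=(DIM, DIM))) for _ in range(NUM_LAYERS)]
X = rng.normal(size=(4, DIM))              # raw features, never exchanged


def encode(x):
    """Assumed input encoder, so that only embeddings cross client boundaries."""
    return np.tanh(x @ W_IN)


def update(h_self, agg, w_self, w_nbr):
    """Assumed sum-aggregation message-passing update."""
    return np.tanh(h_self @ w_self + agg @ w_nbr)


def centralized():
    h = encode(X)
    for w_self, w_nbr in LAYERS:
        agg = np.zeros_like(h)
        for u, v in EDGES:
            agg[v] += h[u]
        h = update(h, agg, w_self, w_nbr)
    return h


def federated(exchange=True):
    clients = sorted(set(OWNER.values()))
    # Each client encodes only the nodes it owns.
    h = {c: {v: encode(X[v]) for v in OWNER if OWNER[v] == c} for c in clients}
    # Assumed extended subgraph: all in-edges of owned nodes, cross-client ones included.
    in_edges = {c: [(u, v) for u, v in EDGES if OWNER[v] == c] for c in clients}
    for w_self, w_nbr in LAYERS:
        # Exchange step: fetch current embeddings of remote in-neighbours.
        halo = {c: {} for c in clients}
        if exchange:
            for c in clients:
                for u, _ in in_edges[c]:
                    if OWNER[u] != c:
                        halo[c][u] = h[OWNER[u]][u]
        new_h = {}
        for c in clients:
            visible = {**h[c], **halo[c]}
            new_h[c] = {}
            for v in h[c]:
                msgs = [visible[u] for u, w in in_edges[c] if w == v and u in visible]
                agg = np.sum(msgs, axis=0) if msgs else np.zeros(DIM)
                new_h[c][v] = update(h[c][v], agg, w_self, w_nbr)
        h = new_h
    return np.stack([h[OWNER[v]][v] for v in range(len(OWNER))])


gap_with_exchange = np.abs(federated(True) - centralized()).max()
gap_local_only = np.abs(federated(False) - centralized()).max()
print(f"max gap with exchange: {gap_with_exchange:.2e}")
print(f"max gap local only:    {gap_local_only:.2e}")
```

With `exchange=False` each client silently drops messages from nodes it cannot see, which is one simple way to model the purely local baseline; the paper may define that baseline differently.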

If this is right

Cross-client subgraph patterns become locally identifiable without moving raw data between parties.
Representation equivalence to the centralized case holds when the extended-subgraph assumption is met.
Embedding exchange and federated parameter aggregation are complementary operations.
Fresh per-step exchanges recover more of the centralized behavior than stale per-epoch exchanges.
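
The fresh-versus-stale distinction in the last point can be illustrated with a small single-process simulation. This is a sketch under assumptions that do not come from the paper: "stale per-epoch" is modeled as cross-client messages read from a buffer filled under older parameters, the parameter change is a random perturbation standing in for local training steps, and the layer form is a plain sum-aggregation update.

```python
import numpy as np


def forward(x, edges, owner, params, stale=None):
    """Return per-layer embeddings.

    If `stale` is given, messages that cross a client boundary are read from
    that buffer (indexed by layer) instead of the current embeddings.
    """
    w_in, layers = params
    h = np.tanh(x @ w_in)
    history = [h]
    for layer, (w_self, w_nbr) in enumerate(layers):
        agg = np.zeros_like(h)
        for u, v in edges:
            crosses = owner[u] != owner[v]
            src = stale[layer][u] if (stale is not None and crosses) else h[u]
            agg[v] += src
        h = np.tanh(h @ w_self + agg @ w_nbr)
        history.append(h)
    return history


rng = np.random.default_rng(2)
dim = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]    # directed 4-cycle
owner = [0, 0, 1, 1]                        # hypothetical two-client partition
x = rng.normal(size=(4, dim))


def perturb(w):
    # Stand-in for a few local gradient steps (purely illustrative).
    return w + 0.1 * rng.normal(size=w.shape)


old = (rng.normal(scale=0.5, size=(dim, dim)),
       [(rng.normal(scale=0.5, size=(dim, dim)),
         rng.normal(scale=0.5, size=(dim, dim))) for _ in range(3)])
new = (perturb(old[0]), [(perturb(ws), perturb(wn)) for ws, wn in old[1]])

buffer = forward(x, edges, owner, old)        # embeddings cached under old parameters
target = forward(x, edges, owner, new)[-1]    # centralized result under current parameters
fresh = forward(x, edges, owner, new)[-1]     # fresh exchange: current embeddings cross the cut
stale = forward(x, edges, owner, new, stale=buffer)[-1]

fresh_gap = np.abs(fresh - target).max()
stale_gap = np.abs(stale - target).max()
print(f"fresh gap: {fresh_gap:.2e}, stale gap: {stale_gap:.2e}")
```

In this model the fresh gap is zero by construction, since fresh exchange is taken to deliver exactly the current embeddings; the point of the sketch is only that a buffer computed under older parameters leaves a nonzero gap.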

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could support collaborative analysis of interaction patterns across organizations that cannot pool their graphs.
Communication cost grows linearly with the number of layers, which may limit applicability to very deep GNNs.
Empirical tests on real partitioned graphs would reveal how often the extended-subgraph assumption actually holds.
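
The linear-in-depth communication estimate above can be made concrete with a back-of-the-envelope calculation. The function name, the one-exchange-per-layer cost model, and all numbers below are hypothetical; the paper's actual protocol may batch, compress, or otherwise reduce traffic.

```python
def exchange_bytes_per_forward(num_layers, boundary_nodes, embed_dim, bytes_per_value=4):
    """Bytes sent in one forward pass if each boundary-node embedding is shipped once per layer."""
    return num_layers * boundary_nodes * embed_dim * bytes_per_value


# Illustrative numbers only: 10,000 boundary nodes, 64-dimensional float32 embeddings.
for layers in (2, 4, 8):
    cost = exchange_bytes_per_forward(layers, boundary_nodes=10_000, embed_dim=64)
    print(f"{layers} layers: {cost / 1e6:.2f} MB per forward pass")
```

Under this model the cost also scales with the number of boundary nodes, so the quality of the partition matters as much as the depth of the network.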

Load-bearing premise

The extended-subgraph assumption that enables recovery of centralized representations via per-step embedding exchange.

What would settle it

A concrete graph partition violating the extended-subgraph assumption together with a measurement showing that node representations still diverge after embedding exchange.
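
A measurement of this kind could look like the sketch below. It rests on assumptions not taken from the paper: a violation is modeled as one cross-client edge that no client knows about, and the federated result with exchange is modeled as message passing over the edges that remain visible. The layer form and the choice of missing edge are hypothetical.

```python
import numpy as np


def embed(x, edges, w_in, layers):
    """Sum-aggregation message passing over directed (src, dst) edges."""
    h = np.tanh(x @ w_in)
    for w_self, w_nbr in layers:
        agg = np.zeros_like(h)
        for u, v in edges:
            agg[v] += h[u]
        h = np.tanh(h @ w_self + agg @ w_nbr)
    return h


rng = np.random.default_rng(1)
dim, num_layers = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]     # directed 4-cycle
x = rng.normal(size=(4, dim))
w_in = rng.normal(scale=0.5, size=(dim, dim))
layers = [(rng.normal(scale=0.5, size=(dim, dim)),
           rng.normal(scale=0.5, size=(dim, dim))) for _ in range(num_layers)]

reference = embed(x, edges, w_in, layers)    # centralized view of the full graph

# Hypothetical violation: the cross-client edge 3 -> 0 is unknown to every client.
missing = (3, 0)
observed = embed(x, [e for e in edges if e != missing], w_in, layers)

# Per-node maximum absolute deviation from the centralized embedding.
divergence = np.abs(reference - observed).max(axis=1)
for node, gap in enumerate(divergence):
    print(f"node {node}: max deviation {gap:.2e}")
```

With three layers the deviation starts at node 0 and spreads one hop downstream per layer, so node 3 is not yet affected; a fourth layer would close the cycle.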

Figures

Figures reproduced from arXiv: 2605.06433 by Kubilay Atasu, Rui Wang, Selin Ceydeli.

**Figure 1.** Three money laundering patterns spanning two institutions (Inst. 1 and Inst. 2). Solid black …

**Figure 2.** Layer-wise embedding exchange on a 4-cycle ( …

**Figure 3.** Macro PR-AUC as the federation scales from …

Original abstract

Subgraph pattern detection aims to uncover complex interaction structures in graphs. However, state-of-the-art graph neural network (GNN)-based solutions assume centralized access to the entire graph. When graphs are instead distributed across multiple parties, client-local GNN computations diverge from those of a centralized model, resulting in a representation-equivalence gap. We formalize this as a structural observability problem, where subgraph patterns crossing partition boundaries become locally unidentifiable. To bridge this gap, we propose a per-step, layer-wise embedding exchange framework in which clients synchronize intermediate node representations at each layer of the forward pass, without exposing raw features or labels. Under an extended-subgraph assumption and shared model parameters across clients, this framework recovers the same node representations as a centralized GNN over the full graph. Experiments on synthetic directed multigraphs with cycles, bicliques, and scatter-gather patterns show that embedding exchange and federated parameter aggregation are complementary rather than interchangeable: their combination recovers most of the representation gap, provided exchanged embeddings are fresh per-step rather than stale per-epoch.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main move is a per-step layer-wise embedding exchange that closes most of the federated-centralized gap for subgraph detection under an extended-subgraph assumption, with synthetic tests showing it complements rather than replaces parameter averaging. They frame the mismatch as a structural observability issue where patterns crossing client boundaries stay invisible to local GNNs. The fix keeps raw features and labels private while syncing intermediate node embeddings at every layer of the forward pass, plus the usual federated averaging of model weights.

On synthetic directed multigraphs containing cycles, bicliques, and scatter-gather patterns, the combination recovers most of the representation gap when the exchanged embeddings are fresh each step rather than stale per epoch. That timing distinction is a concrete, testable point and the experiments make it visible. The work is honest about conditioning the equivalence claim on the extended-subgraph assumption and shared parameters. This is useful for anyone already running GNNs on partitioned graphs who wants to reduce the representation drift without full data sharing.

The soft spots are straightforward. The assumption itself is stated but not given an explicit hop-distance or boundary-neighbor definition, so it is hard to judge how often it holds for typical real-world cuts. All reported results are on synthetic graphs; there is no error analysis or bound on how much the recovered representations deviate when the assumption is only approximately true. No real datasets or larger-scale runs appear in the abstract-level description.

The paper is aimed at researchers working on federated graph learning and privacy-sensitive network analysis. A reader who needs a practical tweak to existing federated GNN pipelines will find the complementarity result and the observability framing worth seeing.

It deserves peer review because the problem is real, the proposed mechanism is specific, and the synthetic evidence is clear enough to evaluate. Referees can ask for a sharper statement of the assumption and at least one non-synthetic experiment.

Referee Report

3 major / 2 minor

Summary. The manuscript claims to formalize the representation gap in federated GNNs for subgraph pattern detection as a structural observability problem. It proposes a per-step layer-wise embedding exchange mechanism that, under an 'extended-subgraph assumption' and with shared model parameters, recovers the node representations of a centralized GNN. This is supported by experiments on synthetic directed multigraphs showing that fresh per-step embedding exchange combined with federated parameter aggregation closes most of the gap.

Significance. If the extended-subgraph assumption holds for real-world graph partitions, this framework could significantly advance privacy-preserving subgraph pattern detection in distributed settings by enabling equivalent performance to centralized models without sharing raw data. The insight that embedding exchange and parameter aggregation are complementary is valuable, and the synthetic validation on patterns like cycles and bicliques provides initial evidence. However, the lack of a rigorous proof or broad empirical validation of the assumption limits the immediate impact.

major comments (3)

[Abstract and formalization] Abstract and formalization section: The extended-subgraph assumption is invoked as the condition for recovering centralized representations via per-step embedding exchange, but no explicit definition (such as required hop distance or boundary-neighbor inclusion) or proof that typical real-world partitions satisfy it is provided. This is load-bearing for the central equivalence claim.
[Method] Method section: The per-step embedding exchange is described at a high level, but there is no derivation, observability analysis, or mathematical argument showing how it exactly recovers the centralized GNN representations under the stated assumption, despite framing the gap as a structural observability problem.
[Experiments] Experiments section: Synthetic experiments claim the combination recovers most of the representation gap, but lack details on how partitions were generated to satisfy or test the extended-subgraph assumption, error bounds, or ablation cases where the assumption fails.

minor comments (2)

[Notation] Notation for local vs. exchanged embeddings and node representations should be made more explicit and consistent throughout to improve readability.
[Experiments] Experimental details such as number of clients, graph sizes, and exact metrics for measuring the representation gap should be expanded in tables or figure captions.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and insightful comments, which have helped us identify areas for improvement in our manuscript. We address each of the major comments below and outline the revisions we will make to strengthen the paper.

Referee: [Abstract and formalization] Abstract and formalization section: The extended-subgraph assumption is invoked as the condition for recovering centralized representations via per-step embedding exchange, but no explicit definition (such as required hop distance or boundary-neighbor inclusion) or proof that typical real-world partitions satisfy it is provided. This is load-bearing for the central equivalence claim.

Authors: We agree that the extended-subgraph assumption requires a more explicit definition to support the central claim. In the revised manuscript, we will add a formal definition in the formalization section, specifying the hop distance requirements and the inclusion of boundary neighbors. Regarding a proof for real-world partitions, we will include a discussion on how common partitioning methods (e.g., METIS or random) can be adapted to satisfy the assumption, along with illustrative examples. A comprehensive proof for arbitrary partitions is not feasible without specifying the partitioning algorithm, as the assumption is a sufficient condition rather than a necessary one for all cases.

Revision: yes

Referee: [Method] Method section: The per-step embedding exchange is described at a high level, but there is no derivation, observability analysis, or mathematical argument showing how it exactly recovers the centralized GNN representations under the stated assumption, despite framing the gap as a structural observability problem.

Authors: We acknowledge the need for a more rigorous mathematical treatment. The revised method section will include a detailed derivation using induction over the GNN layers. Starting from the structural observability problem formulation, we will show how the per-step exchange ensures that the local computation at each client incorporates the necessary information from neighboring clients' embeddings, thereby recovering the exact centralized representations under the extended-subgraph assumption. This will provide the missing observability analysis.

Revision: yes

Referee: [Experiments] Experiments section: Synthetic experiments claim the combination recovers most of the representation gap, but lack details on how partitions were generated to satisfy or test the extended-subgraph assumption, error bounds, or ablation cases where the assumption fails.

Authors: We will revise the experiments section to provide greater detail and rigor. Specifically, we will describe the partition generation process, ensuring it adheres to the extended-subgraph assumption, report quantitative error bounds on the representation differences, and add ablation experiments where the assumption is intentionally violated to highlight the performance degradation. These additions will better substantiate the claims.

Revision: yes
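
For orientation, the induction over GNN layers promised in the response to the Method comment might take roughly the following shape. This is a sketch, not the authors' derivation: the message-passing form, the symbols $\phi_\ell$, $\psi_\ell$, $\mathrm{enc}$, and the reading of the extended-subgraph assumption (client $k$ owns $V_k$ and knows every edge $(u,v) \in E$ with $v \in V_k$) are assumptions introduced here.

```latex
% Assumed message-passing form with a shared input encoder:
\[
h_v^{(\ell+1)} = \phi_\ell\Big(h_v^{(\ell)},\ \sum_{(u,v)\in E} \psi_\ell\big(h_u^{(\ell)}\big)\Big),
\qquad h_v^{(0)} = \mathrm{enc}(x_v).
\]

\textbf{Claim.} Let $\tilde h_v^{(\ell)}$ denote the embedding computed by the owner of $v$
under per-step exchange with shared parameters. Then
$\tilde h_v^{(\ell)} = h_v^{(\ell)}$ for all $v \in V$ and all $\ell \ge 0$.

\textbf{Base case.} $\tilde h_v^{(0)} = \mathrm{enc}(x_v) = h_v^{(0)}$, because the encoder
is shared and $x_v$ is local to the owner of $v$.

\textbf{Inductive step.} Suppose the claim holds at layer $\ell$ for every node.
Fix $v \in V_k$. Each in-neighbour $u$ of $v$ is either owned by client $k$, or owned by
some client $k'$ that sends $\tilde h_u^{(\ell)}$ before layer $\ell+1$ is evaluated.
In both cases the value used equals $h_u^{(\ell)}$ by the hypothesis, and client $k$
knows every edge $(u,v) \in E$, so
\[
\tilde h_v^{(\ell+1)}
= \phi_\ell\Big(\tilde h_v^{(\ell)},\ \sum_{(u,v)\in E} \psi_\ell\big(\tilde h_u^{(\ell)}\big)\Big)
= \phi_\ell\Big(h_v^{(\ell)},\ \sum_{(u,v)\in E} \psi_\ell\big(h_u^{(\ell)}\big)\Big)
= h_v^{(\ell+1)}.
\]
```

The step uses both conditions named in the core claim: shared parameters make $\phi_\ell$ and $\psi_\ell$ identical across clients, and the assumed edge visibility makes the sum range over the same edge set as in the centralized model. Architectures that also pass messages along reversed edges would need the corresponding out-edges to be visible as well.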

Circularity Check

0 steps flagged

No significant circularity; equivalence is conditional on an explicit assumption rather than reducing to inputs by construction.

The paper formalizes a structural observability gap and proposes per-step layer-wise embedding exchange, then states that under an 'extended-subgraph assumption' plus shared parameters the framework recovers centralized GNN node representations. This is presented as a conditional claim, not a derivation that loops back to fitted parameters or self-definitions. No equations in the provided text reduce the result to its inputs by construction, no load-bearing self-citations are invoked for uniqueness, and the assumption is openly required for the claim to hold. The derivation remains self-contained against external benchmarks once the assumption is granted.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Central claim rests on the extended-subgraph assumption and shared model parameters; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: extended-subgraph assumption
Assumes partitioned subgraphs are extended such that layer-wise embedding exchange recovers full centralized representations.

pith-pipeline@v0.9.0 · 5485 in / 1116 out tokens · 35645 ms · 2026-05-08T12:41:10.782938+00:00 · methodology
