arxiv: 2602.13485 · v2 · pith:RUU6CIJTnew · submitted 2026-02-13 · 💻 cs.LG · stat.ML

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

Pith reviewed 2026-05-21 12:17 UTC · model grok-4.3

keywords: federated learning, nonlinear dynamics, graph attention networks, temporal interdependencies, state space models, decentralized systems, interpretability

The pith

A federated framework uses latent states and graph attention to characterize cross-client temporal interdependencies in nonlinear decentralized systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern industrial networks rely on distributed sensors where subsystems interact through time series data, yet raw measurements stay local and client models cannot be altered. The paper shows how each client can compress its observations into low-dimensional latent states with an unchanged nonlinear state-space model. A central server then fits a graph attention network to model transitions across those latents. By linking the Jacobian of the server transition to the attention coefficients, the method supplies an explicit, interpretable description of how dynamics at one client influence others. This matters because it delivers both performance guarantees and the first such interpretability result under realistic privacy and heterogeneity constraints.

Core claim

The framework lets each client map its high-dimensional observations to low-dimensional latent states via a fixed nonlinear state-space model; the server learns a graph-structured neural transition over the communicated latents with a Graph Attention Network; cross-client temporal interdependencies are then interpreted by relating the Jacobian of that transition model to the resulting attention coefficients, yielding the first such characterization for decentralized nonlinear systems together with convergence guarantees to a centralized oracle.
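
As a rough illustration of the server-side step described above, the sketch below implements one single-head GAT-style transition over client latents in NumPy. It is not the paper's implementation: the fully connected client graph, the tanh output nonlinearity, the LeakyReLU slope of 0.2, and all names (`gat_transition`, `W`, `a_src`, `a_dst`) are assumptions made for this example.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gat_transition(H, W, a_src, a_dst):
    """One transition step over M client latents (hypothetical helper).

    H: (M, p) latent state of each client at time t
    W: (p, q) shared linear map; a_src, a_dst: (q,) attention vectors
    Returns (H_next, alpha), where alpha[i, j] is the weight client i puts on client j.
    """
    Z = H @ W                                                    # (M, q) transformed latents
    # e_ij = LeakyReLU(a_dst . z_i + a_src . z_j), the usual GAT score in split form
    scores = leaky_relu((Z @ a_dst)[:, None] + (Z @ a_src)[None, :])
    alpha = softmax(scores, axis=1)                              # each row sums to 1
    H_next = np.tanh(alpha @ Z)                                  # assumed output nonlinearity
    return H_next, alpha

rng = np.random.default_rng(0)
M, p = 3, 2                                                      # 3 clients, 2-dim latents (arbitrary)
H = rng.normal(size=(M, p))
W = rng.normal(size=(p, p))
a_src, a_dst = rng.normal(size=p), rng.normal(size=p)
H_next, alpha = gat_transition(H, W, a_src, a_dst)
```

Row i of `alpha` is the softmax-normalized weight that client i's next state places on each client's current latent; these are the coefficients the paper proposes to interpret.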

What carries the argument

The mapping from the Jacobian of the server-side state transition model to the attention coefficients produced by the Graph Attention Network, which supplies the interpretability of cross-client temporal influences.

If this is right

  • Theoretical convergence of the federated procedure to the performance of a centralized oracle is guaranteed.
  • Interpretability of interdependencies is obtained without retraining or inspecting any client model.
  • Scalability and privacy are preserved while matching the accuracy of decentralized baselines on real data.
  • The approach extends to heterogeneous observation spaces across clients.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same Jacobian-to-attention link could be tested in other attention-based federated models outside time-series settings.
  • Applications to smart-grid or transportation networks would require checking whether the latent dimensionality chosen locally remains sufficient when subsystem counts grow.
  • Noise in the communicated latents could be injected in future experiments to quantify degradation of both prediction and interpretability.

Load-bearing premise

Relating the Jacobian of the learned server-side transition model to attention coefficients provides a valid and interpretable characterization of cross-client temporal interdependencies in nonlinear systems.
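
One way to probe this premise numerically is to compare attention coefficients with the size of the corresponding Jacobian blocks. The sketch below does so with central finite differences for a toy single-head GAT-style transition. The transition form, the Frobenius norm as the block summary, and the names `gat_step` and `jacobian_block_norms` are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def gat_step(H, W, a_src, a_dst):
    """Toy single-head GAT-style transition (assumed form); returns (H_next, alpha)."""
    Z = H @ W
    s = (Z @ a_dst)[:, None] + (Z @ a_src)[None, :]
    s = np.where(s > 0, s, 0.2 * s)                  # LeakyReLU with slope 0.2
    s = s - s.max(axis=1, keepdims=True)
    alpha = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
    return np.tanh(alpha @ Z), alpha

def jacobian_block_norms(H, W, a_src, a_dst, eps=1e-6):
    """norms[m, n] = Frobenius norm of d f_m / d h_n, estimated by central differences."""
    M, p = H.shape
    norms = np.zeros((M, M))
    for n in range(M):
        cols = []
        for k in range(p):
            dH = np.zeros_like(H)
            dH[n, k] = eps                           # perturb coordinate k of client n
            f_plus, _ = gat_step(H + dH, W, a_src, a_dst)
            f_minus, _ = gat_step(H - dH, W, a_src, a_dst)
            cols.append((f_plus - f_minus) / (2 * eps))
        D = np.stack(cols, axis=-1)                  # (M, q, p); D[m] is the block J_mn
        norms[:, n] = np.linalg.norm(D, axis=(1, 2))
    return norms

rng = np.random.default_rng(0)
M, p = 3, 2
H = rng.normal(size=(M, p))
W = rng.normal(size=(p, p))
a_src, a_dst = rng.normal(size=p), rng.normal(size=p)

_, alpha = gat_step(H, W, a_src, a_dst)
norms = jacobian_block_norms(H, W, a_src, a_dst)
print("attention:\n", alpha)
print("Jacobian block norms:\n", norms)

# Sanity case: zero attention vectors give uniform attention, so block norms are constant along each row.
zeros = np.zeros(p)
uniform_norms = jacobian_block_norms(H, W, zeros, zeros)
```

In the general case the two matrices need not agree in ranking, because the Jacobian also picks up the dependence of the attention weights on the latents.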

What would settle it

A controlled synthetic experiment in which known cross-client coupling strengths are varied and the derived attention coefficients fail to rank or recover those strengths would falsify the interpretability claim.
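
The scoring step of such an experiment could be as simple as a rank correlation between the known coupling strengths and the recovered attention coefficients. The sketch below shows only that step; the helper `spearman` and every number in it are placeholders chosen for illustration, not data or results from the paper.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (hypothetical helper); assumes no tied values."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks 0..n-1
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Placeholder numbers for illustration only, not results from the paper.
true_coupling = np.array([0.1, 0.4, 0.7, 0.9])       # coupling strengths set in the simulator
recovered_attn = np.array([0.15, 0.20, 0.30, 0.35])  # coefficients a trained model might return
reversed_attn = recovered_attn[::-1]                 # a model that gets the order exactly wrong

rho_match = spearman(true_coupling, recovered_attn)
rho_reversed = spearman(true_coupling, reversed_attn)
print(rho_match, rho_reversed)
```

A rank correlation that stays near zero or negative as the couplings are swept would count against the interpretability claim; a value near one would be consistent with it.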

Figures

Figures reproduced from arXiv: 2602.13485 by Ayse Tursucular, Ayush Mohanty, Nagi Gebraeel, Nazal Mohamed.

Figure 1: Schematic of our proposed framework with communication between client
Figure 2: Element-wise ℓ2 norms of attention residuals relative to ground truth for the centralized and server GAT.
Adjacent paper text: in Eq. (1), where the state-transition function f is implemented as a one-layer GAT with predetermined attention coefficients (serving as ground truth), the measurement function is a client-specific nonlinear mapping defined as g_m(h_t^m) = tanh(W_m h_t^m + b_m), where W_m ∈ R^(d_m × p_m) and b_m ∈ R^(d_m) are fix…
Figure 3: Correlation between α of ground-truth GAT with centralized oracle’s GAT (left), and server GAT (right).
Adjacent paper text: Jacobian-Based Interpretability. To address Q2 of Section 6, we evaluate the Jacobian blocks J_mn(t) of the learned server-side dynamics
Figure 4: Element-wise Jacobian (standardized) residuals relative to ground truth for the centralized and server GAT.
Figure 5: Correlation between attention coefficients and Jacobian magnitudes of the server-side GAT (avg across runs).
Figure 6: Server model used for HAI dataset
Adjacent paper text: Experimental Setting. Unlike the synthetic setting, real-world systems can exhibit multi-step temporal dependencies in their latent states. To capture these effects, the server uses client-specific LSTM encoders that summarize historical
Figure 7: Correlation of (time-averaged) attention coefficients
Figure 8: Training loss curves for the server and three clients.
Figure 9: Validation residual norms (avg. across all clients)
Figure 10: Server loss L_s versus Gaussian noise during communication. Left: L_s as a function of server-to-client noise σ_g. Right: L_s as a function of client-to-server noise σ_ca.
Adjacent paper text: mechanism is disrupted, yielding a quasi-intervention scenario in which the client subsystems (P1, P2, P3) can be treated as independently evolving for the purpose of local model training. Specifically, for each client m, we train a local re…
Figure 11: Training loss for clients P1, P2, P3 and the server.
Original abstract

Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high-dimensional time series data. These subsystems are often interdependent, making it important to understand how temporal patterns at one subsystem relate to others. This is challenging in decentralized settings where raw measurements cannot be shared and client observations are heterogeneous. In practical deployments, each subsystem (client) operates a fixed proprietary model that cannot be modified or retrained, limiting existing approaches. Nonlinear dynamics further make cross-client temporal interdependencies difficult to interpret because they are embedded in nonlinear state transition functions. We present a federated framework for learning temporal interdependencies across clients under these constraints. Each client maps high-dimensional local observations to low-dimensional latent states using a nonlinear state-space model. A central server learns a graph-structured neural state transition model over the communicated latent states using a Graph Attention Network. For interpretability, we relate the Jacobian of the learned server-side transition model to attention coefficients, providing the first interpretable characterization of cross-client temporal interdependencies in decentralized nonlinear systems. We establish theoretical convergence guarantees to a centralized oracle and validate the framework through synthetic experiments demonstrating convergence, interpretability, scalability, and privacy. Additional real-world experiments show performance comparable to decentralized baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces a federated framework for learning nonlinear temporal dynamics across distributed clients with heterogeneous observations and fixed proprietary local models. Each client employs a nonlinear state-space model to map high-dimensional local time series to low-dimensional latent states. A central server then trains a Graph Attention Network (GAT) to model the state transitions over these latents in a graph-structured manner. For interpretability, the paper relates the Jacobian of the learned server-side transition model to the GAT attention coefficients, claiming this yields the first interpretable characterization of cross-client temporal interdependencies in decentralized nonlinear systems. Theoretical convergence guarantees to a centralized oracle are provided, along with synthetic experiments demonstrating convergence, interpretability, scalability, and privacy, plus real-world experiments showing performance comparable to decentralized baselines.

Significance. If the Jacobian-attention relation can be rigorously justified as an interpretable mapping, the work would represent a meaningful advance in federated learning for interdependent dynamical systems, particularly under privacy and proprietary-model constraints common in industrial monitoring. The explicit convergence guarantees to an oracle and the dual validation on synthetic and real data are strengths that support the framework's soundness. The approach integrates SSM encoders with GAT-based transitions in a way that addresses practical decentralization challenges.

major comments (1)
  1. [Abstract and interpretability discussion] Abstract and interpretability section: the central claim that relating the Jacobian of the server-side GAT transition model to attention coefficients provides a valid interpretable characterization of cross-client temporal interdependencies lacks a derivation. In a nonlinear state-transition function realized by GAT, the attention coefficients are obtained via softmax over a shared attention mechanism applied to node features; they are not required to equal or be monotonically related to the entries of the Jacobian J = ∂f/∂x evaluated at a latent state. No explicit proof or condition is given showing that A_ij ∝ J_ij (or any fixed relation) holds once client SSM nonlinearities and GAT layers are accounted for. This assumption is load-bearing for the headline contribution of 'first interpretable characterization' and requires either a supporting derivation or a clear statement of the (pot
minor comments (1)
  1. [Experiments] The description of the synthetic experiments would benefit from explicit specification of the nonlinear dynamics and graph topologies used to generate the data, to facilitate exact reproduction of the reported convergence behavior.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions. The feedback highlights important aspects of our interpretability claim that we will address to strengthen the manuscript. We respond to the major comment below.

Point-by-point responses
  1. Referee: [Abstract and interpretability discussion] Abstract and interpretability section: the central claim that relating the Jacobian of the server-side GAT transition model to attention coefficients provides a valid interpretable characterization of cross-client temporal interdependencies lacks a derivation. In a nonlinear state-transition function realized by GAT, the attention coefficients are obtained via softmax over a shared attention mechanism applied to node features; they are not required to equal or be monotonically related to the entries of the Jacobian J = ∂f/∂x evaluated at a latent state. No explicit proof or condition is given showing that A_ij ∝ J_ij (or any fixed relation) holds once client SSM nonlinearities and GAT layers are accounted for. This assumption is load-bearing for the headline contribution of 'first interpretable characterization' and requires either a supporting

    Authors: We thank the referee for pointing out this important gap in our presentation of the interpretability aspect. Upon reflection, the manuscript indeed does not include a complete mathematical derivation establishing a direct proportionality between the GAT attention coefficients and the Jacobian entries after accounting for the client-side nonlinear SSMs. The relation is motivated by the fact that the attention mechanism in the GAT is intended to capture the pairwise influences between clients' latent states in the transition dynamics, which aligns with the concept of sensitivity analysis via the Jacobian. To address this, we will revise the manuscript to include a supporting derivation in the interpretability section. Specifically, we will show that for a GAT with a single attention head and linear scoring function, the attention coefficients A_ij can be related to the partial derivatives ∂f_i / ∂x_j under the assumption that the attention is computed on the concatenated features and that higher-order nonlinearities are small. We will also explicitly state the conditions and limitations of this approximation, including the impact of the nonlinear encoders. This revision will clarify that the interpretability is approximate but still provides meaningful insights into cross-client dependencies. We believe this will solidify the contribution without overstating the exactness of the relation.

    revision: yes
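
To make the single-head case concrete, the following is a sketch of the chain-rule calculation, not the authors' derivation. It assumes a one-layer transition $f_i(x) = \sigma\big(\sum_k A_{ik}(x)\, W x_k\big)$ with softmax attention $A_{ik}$ computed from scores $e_{ik}$; the symbols $W$, $\sigma$, $e_{ik}$, $u_i$, and $D_i$ are notation introduced here for illustration.

```latex
\begin{aligned}
u_i &= \sum_k A_{ik}(x)\, W x_k, \qquad f_i(x) = \sigma(u_i), \qquad
D_i = \operatorname{diag}\!\big(\sigma'(u_i)\big), \\
\frac{\partial f_i}{\partial x_j}
  &= D_i \Big[ A_{ij}\, W \;+\; \sum_k (W x_k) \Big(\frac{\partial A_{ik}}{\partial x_j}\Big)^{\!\top} \Big], \\
\frac{\partial A_{ik}}{\partial x_j}
  &= A_{ik} \Big( \frac{\partial e_{ik}}{\partial x_j} - \sum_l A_{il}\, \frac{\partial e_{il}}{\partial x_j} \Big).
\end{aligned}
```

Under these assumptions the block reduces to $A_{ij} D_i W$ only when the second term is negligible, that is, when the attention weights are locally insensitive to $x_j$. Otherwise the Jacobian block and the attention coefficient need not be proportional, which is the gap the referee identifies.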

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes a federated setup with client-side nonlinear SSM encoders producing latent states, a server-side GAT for the transition model, and a methodological step that relates the Jacobian of that transition model to its attention coefficients for interpretability. This relation is introduced as an interpretive device rather than a quantity derived from or fitted to the same inputs by construction. Convergence guarantees to a centralized oracle are stated as an external theoretical benchmark. No equations or claims in the abstract reduce a central result to a self-definition, a renamed fit, or a load-bearing self-citation chain. The framework therefore remains self-contained against the described external benchmarks and does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to identify specific free parameters, axioms, or invented entities beyond the high-level framework description.

pith-pipeline@v0.9.0 · 5758 in / 1090 out tokens · 75699 ms · 2026-05-21T12:17:09.911005+00:00 · methodology


