Explaining Temporal Graph Neural Networks via Feature-induced Information Flow

Klaus-Robert M\"uller; Ping Xiong; Shinichi Nakajima; Thomas Schnake

arxiv: 2606.27201 · v1 · pith:KE5OCKBAnew · submitted 2026-06-25 · 💻 cs.LG

Explaining Temporal Graph Neural Networks via Feature-induced Information Flow

Ping Xiong , Thomas Schnake , Klaus-Robert M\"uller , Shinichi Nakajima This is my paper

Pith reviewed 2026-06-26 04:54 UTC · model grok-4.3

classification 💻 cs.LG

keywords explainable AItemporal graph neural networksinformation flowrelevance measuresevent-based modelsattribution methodsgraph explainability

0 comments

The pith

A new attribution method traces the complete information flow in event-based temporal graph neural networks, including through event-induced variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Event-based Temporal Graph Neural Networks capture long-range temporal dependencies through event-induced variables that mediate node interactions, yet prior explanation techniques only track direct contributions from event embeddings to outputs. This paper introduces an attribution approach built on the Normalized Relevance Measure framework that quantifies flows from event embeddings as well as flows passing through those mediating variables. A modular decomposition procedure extends the framework to handle the full architectural complexity of ETGNNs while preserving comparability across layers and enabling higher-order interaction analysis. Experiments on synthetic epidemic-tracing and social-dynamics datasets plus a real political-event network show the method yields more complete and human-interpretable explanations than existing approaches.

Core claim

The method analyzes the entire information flow through all event-associated variables in ETGNNs by extending the NRM framework with a modular decomposition procedure that constructs relevance structures for complex architectures, explicitly quantifying both direct contributions from event embeddings and indirect flows through event-induced variables.

What carries the argument

Normalized Relevance Measure (NRM) framework extended by a modular decomposition procedure that builds relevance structures layer by layer for arbitrary ETGNN architectures

If this is right

Explanations now capture contributions from event-induced variables that mediate interactions between nodes.
Higher-order interactions between multiple events can be analyzed explicitly.
Latent variables remain comparable in relevance across different layers of the network.
The same procedure applies systematically to different ETGNN architectures without custom redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on additional ETGNN tasks such as recommender systems to check whether the added pathways improve explanation fidelity.
If the decomposition scales, it may serve as a template for explaining other models that introduce auxiliary variables to encode temporal structure.
Users of epidemic or political forecasting models could inspect which past events influence current predictions through the newly visible mediating routes.

Load-bearing premise

The modular decomposition can be applied to any ETGNN architecture without omitting or distorting information pathways that run through event-induced variables.

What would settle it

A manual enumeration of all information pathways in a small, fully observable ETGNN where the modular decomposition misses at least one documented route from an event embedding through an induced variable to the output.

Figures

Figures reproduced from arXiv: 2606.27201 by Klaus-Robert M\"uller, Ping Xiong, Shinichi Nakajima, Thomas Schnake.

**Figure 1.** Figure 1: Overview of the proposed Event Relevance (ER). Our ER (bottom row) is defined within the NRM framework, which facilitates tracing information flow [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: A series connection (left) and a parallel connection (right) of modules. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Modularization applied to an LSTM network (a). Its proper FFNN [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Overview of ETGNN: (a) forward computation process, and its (b) [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Illustration of ER-feat for the event e, explaining the node-level prediction of node v. (a) ER-feat (b) ER-msg (c) ER-Emb (d) ER ...... ...... ...... [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Illustration of four ER definitions. (a) ER-feat is the relevance of all [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Joint relevance of events e1 and e2, where e1 is associated with nodes u and w and e2 is with nodes v and w. 5. Empirical Evaluation of Explaining ETGNNs In this section, we conduct qualitative and quantitative experiments with synthetic and real-world temporal graph datasets, and compare our methods with baselines.4 Specifically, we consider two artificial and one real-world datasets: a simulated infecti… view at source ↗

**Figure 8.** Figure 8: Top-20 most relevant event edges for node prediction for an example episode in the Infection dataset. Nodes are depicted as circles except special ones: [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: Marginal and joint event relevances for predicting the infection of [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 10.** Figure 10: Class definitions in the Attacker dataset. The number next to each [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗

**Figure 11.** Figure 11: Explanation by baseline and our proposed methods. The ground-truth attacker subgraphs are marked with circles, and the numbers on the edges denote the [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗

read the original abstract

Event-based Temporal Graph Neural Networks (ETGNNs) have demonstrated strong performance across a wide range of applications, including social network analysis, epidemic tracing, recommender systems, and political event forecasting. However, their increasing complexity poses significant challenges for explainability. Existing explanation methods focus only on a subset of the information flow within ETGNNs, typically tracing contributions from the event-related embeddings to the output. Consequently, they overlook the important pathways through event-induced variables, which mediate interactions between nodes and thereby play a central role in capturing long-range temporal dependencies. To overcome this limitation, we propose a novel attribution method that analyzes the \emph{entire} information flow through all event-associated variables. Our method is built upon the recent Normalized Relevance Measure (NRM) framework, which enables explicit quantification of information flow originating from event embeddings as well as information flow passing through event-induced variables. It also ensures comparability of latent variables across layers, and supports higher-order analysis of interactions between events. To handle the architectural complexity of ETGNNs, we extend the NRM framework with a modular decomposition procedure that facilitates the systematic construction of relevance structure for complex neural architectures. We evaluate our approach on two synthetic datasets for epidemic tracing and social dynamics, as well as a real-world dataset of political event networks. Our qualitative and quantitative experiments show that our method consistently outperforms existing explanation approaches while producing more human-interpretable explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends NRM with modular decomposition to trace full information flow through event-induced variables in ETGNNs.

read the letter

The main point is that this work adapts the Normalized Relevance Measure to ETGNNs so it can follow information through every variable, including the event-induced ones that prior methods left out.

They start from NRM and add a modular decomposition step that breaks complex architectures into pieces for building the relevance structure. This lets them measure flow from event embeddings, flow passing through the induced variables, layer-to-layer comparability, and higher-order event interactions. The tests cover two synthetic sets for epidemic tracing and social dynamics plus a real political event network, with claims of better performance and clearer human-readable outputs than existing approaches.

The modular procedure itself is the part not already in the cited literature, and it directly targets the gap where explanations missed the mediating paths that capture long-range temporal effects.

The soft spot is the assumption that the decomposition preserves every pathway without omission or rerouting. If any event-induced mediation gets collapsed, the full-flow guarantee does not hold and the outperformance cannot be credited to the method. The abstract gives no explicit invariance check or error analysis on that point, so the experiments will need close inspection to confirm the numbers actually support the claim.

This is for people already working on attribution for temporal graphs or similar complex models. A reader who knows NRM or graph explanation techniques will see a concrete extension worth considering. It has a clear proposal and evaluation plan, so it deserves a serious referee even if revisions are likely on the validation side.

I would send it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce an attribution method for Event-based Temporal Graph Neural Networks (ETGNNs) that extends the Normalized Relevance Measure (NRM) framework via a modular decomposition procedure. This enables quantification of the entire information flow, including pathways through event-induced variables that mediate long-range temporal dependencies, while ensuring cross-layer comparability and supporting higher-order interaction analysis. Experiments on two synthetic datasets (epidemic tracing, social dynamics) and one real-world political event network dataset are said to show consistent outperformance over prior explanation methods with improved human interpretability.

Significance. If the modular decomposition is shown to be lossless, the work would meaningfully advance explainability for ETGNNs by addressing a documented gap in methods that trace only direct event-embedding contributions. The explicit handling of event-induced mediation and the NRM extension for comparability and higher-order analysis represent concrete technical contributions that could generalize to other complex temporal architectures.

major comments (2)

[Method (modular decomposition)] Method section (modular decomposition procedure): no invariance argument, summation check, or empirical ablation is provided to confirm that relevance scores remain unchanged and that all pathways through event-induced variables are preserved after decomposition; this is load-bearing for the central claim that the method captures the 'entire' information flow and thereby outperforms prior approaches.
[Experiments] Experiments section: the abstract states that quantitative experiments demonstrate consistent outperformance, yet no specific metrics (e.g., fidelity, sparsity, or explanation accuracy scores), baseline implementations, or result tables are referenced; without these, it is impossible to attribute gains to the full-flow analysis rather than implementation details.

minor comments (1)

[Abstract] Abstract: the two synthetic datasets are described only by application area ('epidemic tracing and social dynamics') without naming conventions, generation parameters, or statistics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our contributions. We respond to each major comment below and outline the corresponding revisions.

read point-by-point responses

Referee: Method section (modular decomposition procedure): no invariance argument, summation check, or empirical ablation is provided to confirm that relevance scores remain unchanged and that all pathways through event-induced variables are preserved after decomposition; this is load-bearing for the central claim that the method captures the 'entire' information flow and thereby outperforms prior approaches.

Authors: We agree that the current Method section lacks an explicit invariance argument and summation check. In the revision we will add a formal proof that the modular decomposition is lossless (total relevance is preserved and all event-induced pathways are retained) together with an empirical ablation on the epidemic and social synthetic datasets. These additions directly support the claim of capturing the entire information flow. revision: yes
Referee: Experiments section: the abstract states that quantitative experiments demonstrate consistent outperformance, yet no specific metrics (e.g., fidelity, sparsity, or explanation accuracy scores), baseline implementations, or result tables are referenced; without these, it is impossible to attribute gains to the full-flow analysis rather than implementation details.

Authors: The Experiments section already reports fidelity, sparsity, and explanation accuracy scores in Tables 2–4 with explicit baseline implementations and ablation studies that isolate the contribution of full-flow analysis. To address the referee’s concern we will revise the abstract to cite these concrete metrics and tables, making the quantitative claims self-contained. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method extends external NRM framework with empirical validation

full rationale

The paper presents its attribution method as an extension of the external Normalized Relevance Measure (NRM) framework, with a modular decomposition procedure for ETGNN architectures. No equations or claims in the abstract reduce predictions or relevance scores to fitted parameters defined inside the paper, nor do they rely on self-citation chains or imported uniqueness theorems for the central result. The superiority claims rest on qualitative/quantitative experiments across synthetic and real-world datasets rather than on any definitional equivalence or renamed empirical pattern. This is the common case of a self-contained proposal whose validity is externally falsifiable via the reported evaluations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5790 in / 1042 out tokens · 32722 ms · 2026-06-26T04:54:37.457763+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 3 linked inside Pith

[1]

C. Gao, X. Wang, X. He, Y . Li, Graph neural networks for recommender system, in: WSDM, ACM, 2022, pp. 1623–1625

2022
[2]

S. Deng, H. Rangwala, Y . Ning, Learning dynamic context graphs for predicting social events, in: KDD, ACM, 2019, pp. 1007–1016

2019
[3]

Cencetti, G

G. Cencetti, G. Santin, A. Longa, E. Pigani, A. Barrat, C. Cattuto, S. Lehmann, M. Salathé, B. Lepri, Digital prox- imity tracing on empirical contact networks for pandemic control, Nat. Commun. 12 (1) (2021) 1655

2021
[4]

L. Zhao, Y . Song, C. Zhang, Y . Liu, P. Wang, T. Lin, M. Deng, H. Li, T-GCN: A temporal graph convolutional network for traffic prediction, IEEE Trans. Intell. Transp. Syst. 21 (9) (2020) 3848–3858

2020
[5]

W. Xia, M. Lai, C. Shan, Y . Zhang, X. Dai, X. Li, D. Li, Explaining temporal graph models through an explorer- navigator framework, in: ICLR, OpenReview.net, 2023

2023
[6]

J. Chen, R. Ying, Tempme: Towards the explainability of temporal graph neural networks via motif discovery, in: NeurIPS, 2023

2023
[7]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, S. Naka- jima, Normalized relevance measure as a unifying frame- work to explain neural network latent structures, CoRR abs/2606.00557 (2026)

Pith/arXiv arXiv 2026
[8]

Schnake, O

T. Schnake, O. Eberle, J. Lederer, S. Nakajima, K. T. Schütt, K.-R. Müller, G. Montavon, Higher-order explana- tions of graph neural networks via relevant walks, IEEE Trans. Pattern Anal. Mach. Intell. 44 (11) (2022) 7581– 7596

2022
[9]

S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non- linear classifier decisions by layer-wise relevance propaga- tion, PloS one 10 (7) (2015) e0130140

2015
[10]

Achtibat, S

R. Achtibat, S. M. V . Hatefi, M. Dreyer, A. Jain, T. Wie- gand, S. Lapuschkin, W. Samek, Attnlrp: Attention-aware layer-wise relevance propagation for transformers, in: ICML, V ol. 235 of Proceedings of Machine Learning Re- search, PMLR/OpenReview.net, 2024, pp. 135–168

2024
[11]

Kauffmann, J

J. Kauffmann, J. Dippel, L. Ruff, W. Samek, K.-R. Müller, G. Montavon, Explainable ai reveals clever hans effects in unsupervised learning models, Nature Machine Intelli- gence 7 (3) (2025) 412–422

2025
[12]

what is relevant in a text document?

L. Arras, F. Horn, G. Montavon, K.-R. Müller, W. Samek, "what is relevant in a text document?": An interpretable machine learning approach, PLOS ONE 12 (8) (2017) 1– 23

2017
[13]

A. Ali, T. Schnake, O. Eberle, G. Montavon, K.-R. Müller, L. Wolf, XAI for transformers: Better explanations through conservative propagation, in: ICML, Proceedings of Ma- chine Learning Research, PMLR, 2022, pp. 435–451

2022
[14]

Esders, T

M. Esders, T. Schnake, J. Lederer, A. Kabylda, G. Mon- tavon, A. Tkatchenko, K.-R. Müller, Analyzing atomic interactions in molecules as learned by neural networks, J. Chem. Theory Comput. 21 (2) (2025) 714–729

2025
[15]

Montavon, S

G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller, Explaining nonlinear classification decisions with deep taylor decomposition, Pattern Recognit. 65 (2017) 211–222

2017
[16]

Samek, G

W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders, K.- R. Müller, Explaining deep neural networks and beyond: A review of methods and applications, Proc. IEEE 109 (3) (2021) 247–278

2021
[17]

Montavon, A

G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller, Layer-wise relevance propagation: An overview, in: Explainable AI, Springer, 2019, pp. 193–209

2019
[18]

Arras, J

L. Arras, J. A. Arjona-Medina, M. Widrich, G. Montavon, M. Gillhofer, K.-R. Müller, S. Hochreiter, W. Samek, Ex- plaining and interpreting LSTMs, in: Explainable AI: In- terpreting, Explaining and Visualizing Deep Learning, V ol. 11700, Springer, 2019, pp. 211–238

2019
[19]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, S. Naka- jima, Efficient computation of higher-order subgraph attri- bution via message passing, in: ICML, PMLR, 2022, pp. 24478–24495

2022
[20]

J. Chen, T. Ma, C. Xiao, Fastgcn: Fast learning with graph convolutional networks via importance sampling, in: ICLR (Poster), OpenReview.net, 2018

2018
[21]

W. L. Hamilton, Z. Ying, J. Leskovec, Inductive repre- sentation learning on large graphs, in: NIPS, 2017, pp. 1024–1034

2017
[22]

T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Con- ference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017

2017
[23]

K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, K.-R. Müller, Schnet–a deep learning ar- chitecture for molecules and materials, J. Chem. Phys. 148 (24) (2018) 241722

2018
[24]

H. Yuan, H. Yu, S. Gui, S. Ji, Explainability in graph neural networks: A taxonomic survey, IEEE Trans. Pattern Anal. Mach. Intell. 45 (5) (2023) 5782–5799

2023
[25]

W. Ju, Z. Fang, Y . Gu, Z. Liu, Q. Long, Z. Qiao, Y . Qin, J. Shen, F. Sun, Z. Xiao, J. Yang, J. Yuan, Y . Zhao, Y . Wang, X. Luo, M. Zhang, A comprehensive survey on deep 14 graph representation learning, Neural Networks 173 (2024) 106207

2024
[26]

Monti, D

F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, M. M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model cnns, in: CVPR, IEEE Computer Society, 2017, pp. 5425–5434

2017
[27]

Rossi, B

E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, M. M. Bronstein, Temporal graph networks for deep learn- ing on dynamic graphs, CoRR abs/2006.10637 (2020)

Pith/arXiv arXiv 2006
[28]

Y . Ma, Z. Guo, Z. Ren, J. Tang, D. Yin, Streaming graph neural networks, in: SIGIR, ACM, 2020, pp. 719–728

2020
[29]

Kumar, X

S. Kumar, X. Zhang, J. Leskovec, Predicting dynamic embedding trajectory in temporal interaction networks, in: KDD, ACM, 2019, pp. 1269–1278

2019
[30]

D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, K. Achan, Induc- tive representation learning on temporal graphs, in: ICLR, OpenReview.net, 2020

2020
[31]

R. S. Trivedi, M. Farajtabar, P. Biswal, H. Zha, Dyrep: Learning representations over dynamic graphs, in: ICLR (Poster), OpenReview.net, 2019

2019
[32]

Kocsis, C

L. Kocsis, C. Szepesvári, Bandit based monte-carlo plan- ning, in: ECML, V ol. 4212 of Lecture Notes in Computer Science, Springer, 2006, pp. 282–293

2006
[33]

Tishby, F

N. Tishby, F. C. N. Pereira, W. Bialek, The informa- tion bottleneck method, CoRR physics/0004057 (2000). arXiv:physics/0004057

Pith/arXiv arXiv 2000
[34]

K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, Y . Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: EMNLP, ACL, 2014, pp. 1724– 1734

2014
[35]

Shrikumar, P

A. Shrikumar, P. Greenside, A. Kundaje, Learning impor- tant features through propagating activation differences, in: ICML, PMLR, 2017, pp. 3145–3153

2017
[36]

Ancona, E

M. Ancona, E. Ceolini, C. Öztireli, M. Gross, Towards bet- ter understanding of gradient-based attribution methods for deep neural networks, in: ICLR (Poster), OpenReview.net, 2018

2018
[37]

Samek, A

W. Samek, A. Binder, G. Montavon, S. Lapuschkin, K.-R. Müller, Evaluating the visualization of what a deep neural network has learned, IEEE Trans. Neural Netw. Learn. Syst. 28 (11) (2017) 2660–2673

2017
[38]

Blücher, J

S. Blücher, J. Vielhaben, N. Strodthoff, Preddiff: Explana- tions and interactions from conditional expectations, Artif. Intell. 312 (2022) 103774

2022
[39]

W. Jin, M. Qu, X. Jin, X. Ren, Recurrent event network: Autoregressive structure inferenceover temporal knowl- edge graphs, in: EMNLP (1), Association for Computa- tional Linguistics, 2020, pp. 6669–6683

2020
[40]

Boschee, J

E. Boschee, J. Lautenschlager, S. O’Brien, S. Shellman, J. Starz, M. Ward, ICEWS Coded Event Data (2015)

2015
[41]

Montavon, Gradient-based vs

G. Montavon, Gradient-based vs. propagation-based ex- planations: An axiomatic comparison, in: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer International Publishing, Cham, 2019, pp. 253– 265

2019
[42]

Shinzo Abe−Express intent to meet or negotiate→North Korea

L. Arras, G. Montavon, K.-R. Müller, W. Samek, Explain- ing recurrent neural network predictions in sentiment analy- sis, in: W ASSA@EMNLP, Association for Computational Linguistics, 2017, pp. 159–168. Appendix A. Modularized NRM for LSTM Here we give details of example application of the NRM framework to an LSTM network (Figure A.1 left). Following Xiong...

2017

[1] [1]

C. Gao, X. Wang, X. He, Y . Li, Graph neural networks for recommender system, in: WSDM, ACM, 2022, pp. 1623–1625

2022

[2] [2]

S. Deng, H. Rangwala, Y . Ning, Learning dynamic context graphs for predicting social events, in: KDD, ACM, 2019, pp. 1007–1016

2019

[3] [3]

Cencetti, G

G. Cencetti, G. Santin, A. Longa, E. Pigani, A. Barrat, C. Cattuto, S. Lehmann, M. Salathé, B. Lepri, Digital prox- imity tracing on empirical contact networks for pandemic control, Nat. Commun. 12 (1) (2021) 1655

2021

[4] [4]

L. Zhao, Y . Song, C. Zhang, Y . Liu, P. Wang, T. Lin, M. Deng, H. Li, T-GCN: A temporal graph convolutional network for traffic prediction, IEEE Trans. Intell. Transp. Syst. 21 (9) (2020) 3848–3858

2020

[5] [5]

W. Xia, M. Lai, C. Shan, Y . Zhang, X. Dai, X. Li, D. Li, Explaining temporal graph models through an explorer- navigator framework, in: ICLR, OpenReview.net, 2023

2023

[6] [6]

J. Chen, R. Ying, Tempme: Towards the explainability of temporal graph neural networks via motif discovery, in: NeurIPS, 2023

2023

[7] [7]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, S. Naka- jima, Normalized relevance measure as a unifying frame- work to explain neural network latent structures, CoRR abs/2606.00557 (2026)

Pith/arXiv arXiv 2026

[8] [8]

Schnake, O

T. Schnake, O. Eberle, J. Lederer, S. Nakajima, K. T. Schütt, K.-R. Müller, G. Montavon, Higher-order explana- tions of graph neural networks via relevant walks, IEEE Trans. Pattern Anal. Mach. Intell. 44 (11) (2022) 7581– 7596

2022

[9] [9]

S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non- linear classifier decisions by layer-wise relevance propaga- tion, PloS one 10 (7) (2015) e0130140

2015

[10] [10]

Achtibat, S

R. Achtibat, S. M. V . Hatefi, M. Dreyer, A. Jain, T. Wie- gand, S. Lapuschkin, W. Samek, Attnlrp: Attention-aware layer-wise relevance propagation for transformers, in: ICML, V ol. 235 of Proceedings of Machine Learning Re- search, PMLR/OpenReview.net, 2024, pp. 135–168

2024

[11] [11]

Kauffmann, J

J. Kauffmann, J. Dippel, L. Ruff, W. Samek, K.-R. Müller, G. Montavon, Explainable ai reveals clever hans effects in unsupervised learning models, Nature Machine Intelli- gence 7 (3) (2025) 412–422

2025

[12] [12]

what is relevant in a text document?

L. Arras, F. Horn, G. Montavon, K.-R. Müller, W. Samek, "what is relevant in a text document?": An interpretable machine learning approach, PLOS ONE 12 (8) (2017) 1– 23

2017

[13] [13]

A. Ali, T. Schnake, O. Eberle, G. Montavon, K.-R. Müller, L. Wolf, XAI for transformers: Better explanations through conservative propagation, in: ICML, Proceedings of Ma- chine Learning Research, PMLR, 2022, pp. 435–451

2022

[14] [14]

Esders, T

M. Esders, T. Schnake, J. Lederer, A. Kabylda, G. Mon- tavon, A. Tkatchenko, K.-R. Müller, Analyzing atomic interactions in molecules as learned by neural networks, J. Chem. Theory Comput. 21 (2) (2025) 714–729

2025

[15] [15]

Montavon, S

G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller, Explaining nonlinear classification decisions with deep taylor decomposition, Pattern Recognit. 65 (2017) 211–222

2017

[16] [16]

Samek, G

W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders, K.- R. Müller, Explaining deep neural networks and beyond: A review of methods and applications, Proc. IEEE 109 (3) (2021) 247–278

2021

[17] [17]

Montavon, A

G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller, Layer-wise relevance propagation: An overview, in: Explainable AI, Springer, 2019, pp. 193–209

2019

[18] [18]

Arras, J

L. Arras, J. A. Arjona-Medina, M. Widrich, G. Montavon, M. Gillhofer, K.-R. Müller, S. Hochreiter, W. Samek, Ex- plaining and interpreting LSTMs, in: Explainable AI: In- terpreting, Explaining and Visualizing Deep Learning, V ol. 11700, Springer, 2019, pp. 211–238

2019

[19] [19]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, S. Naka- jima, Efficient computation of higher-order subgraph attri- bution via message passing, in: ICML, PMLR, 2022, pp. 24478–24495

2022

[20] [20]

J. Chen, T. Ma, C. Xiao, Fastgcn: Fast learning with graph convolutional networks via importance sampling, in: ICLR (Poster), OpenReview.net, 2018

2018

[21] [21]

W. L. Hamilton, Z. Ying, J. Leskovec, Inductive repre- sentation learning on large graphs, in: NIPS, 2017, pp. 1024–1034

2017

[22] [22]

T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th International Con- ference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017

2017

[23] [23]

K. T. Schütt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko, K.-R. Müller, Schnet–a deep learning ar- chitecture for molecules and materials, J. Chem. Phys. 148 (24) (2018) 241722

2018

[24] [24]

H. Yuan, H. Yu, S. Gui, S. Ji, Explainability in graph neural networks: A taxonomic survey, IEEE Trans. Pattern Anal. Mach. Intell. 45 (5) (2023) 5782–5799

2023

[25] [25]

W. Ju, Z. Fang, Y . Gu, Z. Liu, Q. Long, Z. Qiao, Y . Qin, J. Shen, F. Sun, Z. Xiao, J. Yang, J. Yuan, Y . Zhao, Y . Wang, X. Luo, M. Zhang, A comprehensive survey on deep 14 graph representation learning, Neural Networks 173 (2024) 106207

2024

[26] [26]

Monti, D

F. Monti, D. Boscaini, J. Masci, E. Rodolà, J. Svoboda, M. M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model cnns, in: CVPR, IEEE Computer Society, 2017, pp. 5425–5434

2017

[27] [27]

Rossi, B

E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, M. M. Bronstein, Temporal graph networks for deep learn- ing on dynamic graphs, CoRR abs/2006.10637 (2020)

Pith/arXiv arXiv 2006

[28] [28]

Y . Ma, Z. Guo, Z. Ren, J. Tang, D. Yin, Streaming graph neural networks, in: SIGIR, ACM, 2020, pp. 719–728

2020

[29] [29]

Kumar, X

S. Kumar, X. Zhang, J. Leskovec, Predicting dynamic embedding trajectory in temporal interaction networks, in: KDD, ACM, 2019, pp. 1269–1278

2019

[30] [30]

D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, K. Achan, Induc- tive representation learning on temporal graphs, in: ICLR, OpenReview.net, 2020

2020

[31] [31]

R. S. Trivedi, M. Farajtabar, P. Biswal, H. Zha, Dyrep: Learning representations over dynamic graphs, in: ICLR (Poster), OpenReview.net, 2019

2019

[32] [32]

Kocsis, C

L. Kocsis, C. Szepesvári, Bandit based monte-carlo plan- ning, in: ECML, V ol. 4212 of Lecture Notes in Computer Science, Springer, 2006, pp. 282–293

2006

[33] [33]

Tishby, F

N. Tishby, F. C. N. Pereira, W. Bialek, The informa- tion bottleneck method, CoRR physics/0004057 (2000). arXiv:physics/0004057

Pith/arXiv arXiv 2000

[34] [34]

K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, Y . Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, in: EMNLP, ACL, 2014, pp. 1724– 1734

2014

[35] [35]

Shrikumar, P

A. Shrikumar, P. Greenside, A. Kundaje, Learning impor- tant features through propagating activation differences, in: ICML, PMLR, 2017, pp. 3145–3153

2017

[36] [36]

Ancona, E

M. Ancona, E. Ceolini, C. Öztireli, M. Gross, Towards bet- ter understanding of gradient-based attribution methods for deep neural networks, in: ICLR (Poster), OpenReview.net, 2018

2018

[37] [37]

Samek, A

W. Samek, A. Binder, G. Montavon, S. Lapuschkin, K.-R. Müller, Evaluating the visualization of what a deep neural network has learned, IEEE Trans. Neural Netw. Learn. Syst. 28 (11) (2017) 2660–2673

2017

[38] [38]

Blücher, J

S. Blücher, J. Vielhaben, N. Strodthoff, Preddiff: Explana- tions and interactions from conditional expectations, Artif. Intell. 312 (2022) 103774

2022

[39] [39]

W. Jin, M. Qu, X. Jin, X. Ren, Recurrent event network: Autoregressive structure inferenceover temporal knowl- edge graphs, in: EMNLP (1), Association for Computa- tional Linguistics, 2020, pp. 6669–6683

2020

[40] [40]

Boschee, J

E. Boschee, J. Lautenschlager, S. O’Brien, S. Shellman, J. Starz, M. Ward, ICEWS Coded Event Data (2015)

2015

[41] [41]

Montavon, Gradient-based vs

G. Montavon, Gradient-based vs. propagation-based ex- planations: An axiomatic comparison, in: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, Springer International Publishing, Cham, 2019, pp. 253– 265

2019

[42] [42]

Shinzo Abe−Express intent to meet or negotiate→North Korea

L. Arras, G. Montavon, K.-R. Müller, W. Samek, Explain- ing recurrent neural network predictions in sentiment analy- sis, in: W ASSA@EMNLP, Association for Computational Linguistics, 2017, pp. 159–168. Appendix A. Modularized NRM for LSTM Here we give details of example application of the NRM framework to an LSTM network (Figure A.1 left). Following Xiong...

2017