Normalized Relevance Measure as a Unifying Framework to Explain Neural Network Latent Structures

Gr\'egoire Montavon; Klaus-Robert M\"uller; Ping Xiong; Shinichi Nakajima; Thomas Schnake

arxiv: 2606.00557 · v1 · pith:4N3LMD7Rnew · submitted 2026-05-30 · 💻 cs.LG

Normalized Relevance Measure as a Unifying Framework to Explain Neural Network Latent Structures

Ping Xiong , Thomas Schnake , Gr\'egoire Montavon , Klaus-Robert M\"uller , Shinichi Nakajima This is my paper

Pith reviewed 2026-06-28 19:30 UTC · model grok-4.3

classification 💻 cs.LG

keywords normalized relevance measureexplainable AIneural network explanationslatent representationspropagation-based methodsVGG16

0 comments

The pith

The normalized relevance measure framework defines neuron importance as a signed measure that unifies prior explanation methods for neural networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the normalized relevance measure (NRM) as a general procedure for attributing importance to any collection of neurons in any layers of a neural network. Relevance is constructed explicitly through marginalization and conditioning steps that follow additive and multiplicative rules, mirroring the structure of probability measures, with an added normalization step that makes scores comparable across layers. The framework demonstrates that common propagation-based explanation techniques are instances of this same underlying quantity computed under different choices of operations. Application to VGG16 networks illustrates how joint analysis across layers can trace information paths inside the model.

Core claim

The normalized relevance measure (NRM) framework attributes relevance to arbitrary sets of neurons across layers of arbitrary architectures by defining relevance explicitly as a normalized signed measure constructed using marginalization and conditioning operations based on additive and multiplicative laws in direct analogy to probability measures. The normalization property guarantees comparability across layers, and the framework subsumes existing propagation-based explanation algorithms by identifying the underlying quantity being computed.

What carries the argument

The normalized relevance measure (NRM), a signed measure on neuron sets that is constructed via marginalization and conditioning and then normalized to enable cross-layer comparison.

If this is right

Existing propagation-based methods compute specific cases of the same relevance quantity under different marginalization choices.
Relevance scores become directly comparable between early and late layers because of the normalization step.
Joint relevance analysis across multiple layers can trace how information flows through internal representations in models such as VGG16.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same marginalization and conditioning construction might be applied to recurrent or transformer architectures to produce cross-layer relevance maps.
Choosing different conditioning operations within the NRM definition could generate families of new explanation techniques beyond those already known.
The framework's emphasis on sets of neurons rather than single units suggests a route to explanations that respect group structure in the latent space.

Load-bearing premise

Relevance of arbitrary neuron sets can be rigorously defined as a normalized signed measure built from marginalization and conditioning operations in the same way probability measures are built.

What would settle it

A calculation on a trained network where the NRM values for neuron sets in one layer cannot be meaningfully compared in scale to those in another layer would show the normalization property fails to hold.

Figures

Figures reproduced from arXiv: 2606.00557 by Gr\'egoire Montavon, Klaus-Robert M\"uller, Ping Xiong, Shinichi Nakajima, Thomas Schnake.

**Figure 2.** Figure 2: (a) Walk (full joint), (b) single neuron (marginal) , (c) conditional, and (d) substructure relevances. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: (a) Multiple input and output variables. (b) Additive (left) and multiplicative (right) forward [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Sum of joint contributions over the top- [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

**Figure 5.** Figure 5: Channel-level joint relevance explanation on three example input images. For each row from left to [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Sum of joint contributions over the top- [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Cluster-level joint relevance analysis. Top: Heatmap explanations of the clusters in block 1 and [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗

**Figure 8.** Figure 8: Left: A proper FFNN representation of RNN. Right: The relevance scores of input features at each time step t. x80 should receive the highest relevance score according to the data construction. Upper: NRM (normalized) assigns highest relevance to x80 (marked with red line). Lower: unnormalized relevance explodes in the earlier time steps. We synthesize a toy time-series dataset with T = 100 time steps for b… view at source ↗

read the original abstract

To understand how a neural network (NN) functions and makes predictions, it has become increasingly clear that analyzing only the input domain is insufficient -- one must also examine its internal inference mechanisms to capture the complete picture. To explain the internal inference mechanisms of such models, it is essential to analyze the importance of latent representations for a given task. In this paper, we propose the \emph{normalized relevance measure} (NRM) framework -- a novel general explanation procedure that attributes relevance to \emph{arbitrary sets of neurons across layers of arbitrary architectures}. In the NRM framework, relevance of selected neurons is explicitly defined as a normalized signed measure, constructed using simple operations -- marginalization and conditioning based on additive and multiplicative laws -- in analogy to the probability measures. The normalization property further guarantees comparability across layers. The NRM framework subsumes existing propagation-based explanation algorithms by explicitly identifying the underlying quantity being computed. We demonstrate the utility of the framework in computer vision applications, where joint relevance analysis across multiple layers reveals key information flows in VGG16 networks. Overall, the NRM framework provides a general, mathematically grounded approach to understanding how modern NNs propagate information, offering a versatile and broadly applicable foundation for explainable artificial intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

NRM frames neuron relevance as a normalized signed measure via marginalization and conditioning, but the subsumption of LRP-style methods rests on an identification that the abstract does not demonstrate.

read the letter

The core move here is to define relevance for any set of neurons as a normalized signed measure built from marginalization and conditioning steps, in direct parallel to probability. This gives a clean way to compare importance across layers and architectures, which is the practical payoff.

The paper does a decent job stating the generality: it targets arbitrary neuron groups in any network, not just input attributions, and the normalization is meant to make cross-layer numbers meaningful. The VGG16 example is a reasonable place to show joint analysis of information flow.

The soft spot is the subsumption claim. The abstract says the framework works by explicitly identifying what existing propagation rules compute, yet no mapping or derivation appears in the provided text. Without those steps it is difficult to verify whether the signed-measure and normalization properties survive when applied to actual LRP or DeepLIFT rules, or whether the construction stays at the level of analogy.

This is aimed at XAI researchers who already work with propagation methods and want a single mathematical language for latent explanations. A reader looking for new algorithms or large-scale empirical comparisons will find less here.

The idea is coherent enough on its own terms to warrant peer review, mainly to check whether the claimed identifications are carried through in the full derivations and whether the VGG results add insight beyond re-labeling existing quantities.

Referee Report

1 major / 2 minor

Summary. The paper proposes the normalized relevance measure (NRM) framework, which defines relevance of arbitrary sets of neurons across layers of arbitrary neural network architectures as a normalized signed measure constructed via marginalization and conditioning operations (additive and multiplicative laws) in direct analogy to probability measures. The normalization is claimed to enable cross-layer comparability. The central claim is that NRM subsumes existing propagation-based explanation methods (e.g., LRP, DeepLIFT) by explicitly identifying the underlying quantity each computes. Utility is illustrated via joint relevance analysis across layers in VGG16 for computer vision tasks.

Significance. If the subsumption claim is established via explicit mappings, NRM would offer a mathematically grounded unifying lens for propagation-based explanations, with the signed-measure normalization providing a concrete mechanism for cross-layer and cross-architecture comparison. The probability-measure analogy supplies a clear axiomatic structure that could facilitate further theoretical development in XAI.

major comments (1)

[Abstract, §3] Abstract and §3 (NRM definition): the central subsumption claim—that NRM subsumes LRP, DeepLIFT, etc., 'by explicitly identifying the underlying quantity being computed'—requires a concrete mapping showing that each standard rule set computes exactly the normalized signed measure under the paper's marginalization/conditioning operations. No such derivation, table, or proposition appears; without it the unification remains an asserted analogy rather than a demonstrated equivalence, which is load-bearing for the paper's primary contribution.

minor comments (2)

[§2] Notation for the signed measure (e.g., how the normalization constant is defined when the measure can be negative) should be stated explicitly in the main text rather than left to supplementary material.
[§4] The VGG16 experiments would benefit from a quantitative baseline comparison (e.g., against layer-wise relevance propagation alone) rather than purely qualitative information-flow visualizations.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The point raised about strengthening the subsumption claim is well-taken, and we address it directly below.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3 (NRM definition): the central subsumption claim—that NRM subsumes LRP, DeepLIFT, etc., 'by explicitly identifying the underlying quantity being computed'—requires a concrete mapping showing that each standard rule set computes exactly the normalized signed measure under the paper's marginalization/conditioning operations. No such derivation, table, or proposition appears; without it the unification remains an asserted analogy rather than a demonstrated equivalence, which is load-bearing for the paper's primary contribution.

Authors: We agree that the subsumption claim would be substantially strengthened by explicit derivations. Although the NRM definition in §3 is constructed precisely so that common propagation rules emerge as special cases of the marginalization and conditioning operations (with the normalization ensuring the signed-measure property), the current manuscript presents this at the level of the general framework rather than case-by-case mappings. In the revised version we will add a new proposition in §3 together with a summary table that derives the exact equivalence for the standard LRP rules (including the z-rule and αβ-rule) and the DeepLIFT rules, showing the specific marginalization/conditioning steps that recover each method as an instance of the normalized relevance measure. This addition will make the unification demonstrative rather than asserted. revision: yes

Circularity Check

0 steps flagged

No circularity: NRM definition and subsumption claim remain independent of inputs

full rationale

The abstract defines NRM explicitly via marginalization/conditioning operations producing a normalized signed measure, then asserts subsumption via identification of existing algorithms as computing that same quantity. No equations, self-citations, or fitted parameters are shown that would reduce the central claim to a tautology or prior author result by construction. The derivation chain is therefore self-contained against external benchmarks; the unification is presented as an explicit mapping rather than a renaming or self-referential fit.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only abstract available; the core construction rests on an unelaborated analogy to probability measures.

axioms (1)

domain assumption Relevance of neuron sets can be defined as a normalized signed measure using marginalization and conditioning based on additive and multiplicative laws in analogy to probability measures.
This is the foundational construction stated in the abstract.

pith-pipeline@v0.9.1-grok · 5764 in / 1107 out tokens · 18961 ms · 2026-06-28T19:30:06.163411+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Explaining Temporal Graph Neural Networks via Feature-induced Information Flow
cs.LG 2026-06 unverdicted novelty 6.0

A new attribution method for ETGNNs extends NRM with modular decomposition to analyze entire event-induced information flow and outperforms prior explainers on synthetic epidemic/social and real political event data.

Reference graph

Works this paper leans on

52 extracted references · 1 canonical work pages · cited by 1 Pith paper

[1]

Achtibat, M

R. Achtibat, M. Dreyer, I. Eisenbraun, S. Bosse, T. Wiegand, W. Samek, and S. Lapuschkin. From attribution maps to human-understandable explanations through concept relevance propagation.Nat. Mac. Intell., 5(9):1006–1019, 2023

2023
[2]

Achtibat, S

R. Achtibat, S. M. V . Hatefi, M. Dreyer, A. Jain, T. Wiegand, S. Lapuschkin, and W. Samek. Attnlrp: Attention-aware layer-wise relevance propagation for transformers. InICML, pages 135–168. PMLR/OpenReview.net, 2024

2024
[3]

Ancona, E

M. Ancona, E. Ceolini, C. Öztireli, and M. Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. InICLR (Poster). OpenReview.net, 2018. 23

2018
[4]

Arras, J

L. Arras, J. A. Arjona-Medina, M. Widrich, G. Montavon, M. Gillhofer, K.-R. Müller, S. Hochreiter, and W. Samek. Explaining and interpreting LSTMs. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700, pages 211–238. Springer, 2019

2019
[5]

S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.PloS one, 10(7):e0130140, 2015

2015
[6]

C. M. Bishop.Pattern Recognition and Machine Learning (Information Science and Statistics), pages 402–411. Springer-Verlag, Berlin, Heidelberg, 2006

2006
[7]

F. Bley, S. Lapuschkin, W. Samek, and G. Montavon. Explaining predictive uncertainty by exposing second-order effects.Pattern Recognition, 160:111171, 2025

2025
[8]

Blücher, J

S. Blücher, J. Vielhaben, and N. Strodthoff. Preddiff: Explanations and interactions from conditional expectations.Artif. Intell., 312:103774, 2022

2022
[9]

Chormai, J

P. Chormai, J. Herrmann, K.-R. Müller, and G. Montavon. Disentangled expla- nations of neural network predictions by finding relevant subspaces.IEEE Trans. Pattern Anal. Mach. Intell., 46(11):7283–7299, 2024

2024
[10]

Conmy, A

A. Conmy, A. N. Mavor-Parker, A. Lynch, S. Heimersheim, and A. Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. InNeurIPS, 2023

2023
[11]

T. Cui, P. Marttinen, and S. Kaski. Learning global pairwise interactions with bayesian neural networks. InECAI, Frontiers in Artificial Intelligence and Appli- cations, pages 1087–1094. IOS Press, 2020

2020
[12]

Dhamdhere, M

K. Dhamdhere, M. Sundararajan, and Q. Yan. How important is a neuron. In International Conference on Learning Representations, 2019

2019
[13]

Eberle, J

O. Eberle, J. Büttner, F. Kräutli, K.-R. Müller, M. Valleriani, and G. Montavon. Building and interpreting deep similarity models.IEEE Trans. Pattern Anal. Mach. Intell., 44(3):1149–1161, 2022

2022
[14]

Elhage, N

N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y . Bai, A. Chen, T. Conerly, N. DasSarma, D. Drain, D. Ganguli, Z. Hatfield- Dodds, D. Hernandez, A. Jones, J. Kernion, L. Lovitt, K. Ndousse, D. Amodei, T. Brown, J. Clark, J. Kaplan, S. McCandlish, and C. Olah. A mathematical framework for transformer circuits.Transformer Circu...

2021
[15]

Erhan, A

D. Erhan, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network.Technical Report, Univeristé de Montréal, 01 2009

2009
[16]

Ghorbani and J

A. Ghorbani and J. Zou. Neuron shapley: discovering the responsible neurons. In Proceedings of the 34th International Conference on Neural Information Process- ing Systems, Red Hook, NY , USA, 2020. Curran Associates Inc. 24

2020
[17]

Gunning, M

D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang. Xai—explainable artificial intelligence.Science Robotics, 4(37), 2019

2019
[18]

J. C. Harsanyi. 17. a bargaining model for the cooperative n-person game. In Contributions to the Theory of Games, Volume IV, pages 325–356. Princeton University Press, Princeton, 1959

1959
[19]

Holzinger, A

A. Holzinger, A. Saranti, C. Molnar, P. Biecek, and W. Samek. Explainable AI methods - A brief overview. InxxAI@ICML, pages 13–38. Springer, 2020

2020
[20]

J. D. Janizek, P. Sturmfels, and S.-I. Lee. Explaining explanations: axiomatic feature interactions for deep networks.J. Mach. Learn. Res., 22(1), Jan. 2021

2021
[21]

Kauffmann, J

J. Kauffmann, J. Dippel, L. Ruff, W. Samek, K.-R. Müller, and G. Montavon. Explainable ai reveals clever hans effects in unsupervised learning models.Nature Machine Intelligence, 7(3):412–422, 2025

2025
[22]

J. R. Kauffmann, M. Esders, L. Ruff, G. Montavon, W. Samek, and K.-R. Müller. From clustering to cluster explanations via neural networks.IEEE Trans. Neural Networks Learn. Syst., 35(2):1926–1940, 2024

1926
[23]

Lapuschkin, S

S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and K.-R. Müller. Unmasking clever hans predictors and assessing what machines really learn.Nature communications, 10(1):1096, 2019

2019
[24]

S. Lloyd. Least squares quantization in PCM.IEEE Trans. Inf. Theory, 28(2): 129–137, 1982

1982
[25]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. InNeurIPS, volume 30, page 4768–4777. Curran Associates, Inc., 2017

2017
[26]

S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable ai for trees.Nature Machine Intelligence, 2(1): 56–67, 2020

2020
[27]

Montavon, S

G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller. Explain- ing nonlinear classification decisions with deep taylor decomposition.Pattern Recognit., 65:211–222, 2017

2017
[28]

Montavon, W

G. Montavon, W. Samek, and K.-R. Müller. Methods for interpreting and under- standing deep neural networks.Digit. Signal Process., 73:1–15, 2018

2018
[29]

C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, and A. Mord- vintsev. The building blocks of interpretability.Distill, 2018

2018
[30]

D. Rai, Y . Zhou, S. Feng, A. Saparov, and Z. Yao. A practical review of mechanistic interpretability for transformer-based language models.ArXiv, abs/2407.02646, 2024. 25

work page arXiv 2024
[31]

why should I trust you?

M. T. Ribeiro, S. Singh, and C. Guestrin. "why should I trust you?": Explaining the predictions of any classifier. InKDD, pages 1135–1144. ACM, 2016

2016
[32]

G. C. Rota. On the foundations of combinatorial theory i. theory of möbius functions.Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 2(4): 340–368, Jan 1964

1964
[33]

Samek, G

W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K.-R. Müller, editors. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700 ofLecture Notes in Computer Science. Springer, 2019

2019
[34]

Samek, G

W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders, and K.-R. Müller. Explain- ing deep neural networks and beyond: A review of methods and applications.Proc. IEEE, 109(3):247–278, 2021

2021
[35]

Schnake, O

T. Schnake, O. Eberle, J. Lederer, S. Nakajima, K. T. Schütt, K.-R. Müller, and G. Montavon. Higher-order explanations of graph neural networks via relevant walks.IEEE Trans. Pattern Anal. Mach. Intell., 44(11):7581–7596, 2022

2022
[36]

Schnake, F

T. Schnake, F. R. Jafari, J. Lederer, P. Xiong, S. Nakajima, S. Gugler, G. Mon- tavon, and K.-R. Müller. Towards symbolic XAI - explanation through human understandable logical relationships between features.Inf. Fusion, 118:102923, 2025

2025
[37]

Strumbelj and I

E. Strumbelj and I. Kononenko. An efficient explanation of individual classifi- cations using game theory.J. Mach. Learn. Res., 11:1–18, Mar. 2010. ISSN 1532-4435

2010
[38]

Sundararajan, A

M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks. InICML- Volume 70, page 3319–3328. JMLR.org, 2017

2017
[39]

Tsang, D

M. Tsang, D. Cheng, and Y . Liu. Detecting statistical interactions from neural network weights. InICLR, 2018

2018
[40]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, and S. Nakajima. Efficient computation of higher-order subgraph attribution via message passing. InICML, pages 24478–24495. PMLR, 2022

2022
[41]

Xiong, T

P. Xiong, T. Schnake, M. Gastegger, G. Montavon, K.-R. Müller, and S. Nakajima. Relevant walk search for explaining graph neural networks. InICML, pages 38301–38324. PMLR, 2023

2023
[42]

Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. Gnnexplainer: Generat- ing explanations for graph neural networks. InNeurIPS 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 9240–9251, 2019

2019
[43]

H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji. On explainability of graph neural networks via subgraph explorations. InProceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139, pages 12241–12252. PMLR, 2021. 26

2021
[44]

B. Zhou, Y . Sun, D. Bau, and A. Torralba. Interpretable basis decomposition for visual explanation. InECCV (8), Lecture Notes in Computer Science, pages 122–138. Springer, 2018. 27 Appendix A. Generalization to Arbitrary Architectures In Section 3, we described the NRM framework for a proper FFNNs—a FFNN with no skip connection and no intermediate input ...

2018
[45]

For a target output neuron n(L), i.e., the target class in classification, identify concept neurons, i.e., relevant intermediate neurons {n(l)} in some layer l∈ {1, . . . ,L−1}
[46]

This process is called Relevance Maximization (RelMax)

For each concept neuron n(l), find input samples that maximize the concept relevance. This process is called Relevance Maximization (RelMax)
[47]

This algorithm is called CRP

With respect to each concept neuron n(l) and an output neuron n(L), compute the input relevances. This algorithm is called CRP. The concept relevance (ConRel) computed in Step 1 corresponds to the joint relevance ConReln(l),n(L) =R(n (l),n (L)).(F.3) The quantity (F.3) is also the objective function to be maximized with respect to the input samplex∈ Xin S...
[48]

If the given network is not proper FFNN, convert it into a proper FFNN, following the procedure in Appendix A
[49]

Feed the input sample, and then compute the unnormalized relevance propa- gation matrices {eT(l)}L l=1 and output relevance er(L), from which the consecutive conditional relevance (17) and the output relevance (18) are computed
[50]

Choose a set of neurons S(L) to which we attribute, and the target neuron n(l) to be explained for l>max(L) (typically l=L)
[51]

Compute the relevance R(S(L),n (l)) or R(S(L)|n(l)) by the sum-product message passing algorithm (Theorem 1)
[52]

With this procedure, one can analyze the decision-making mechanism of any type of NN with arbitrarily high-order explanation

If we search for most relevance information flow, iterate Step 4 for different set S(L) of neurons. With this procedure, one can analyze the decision-making mechanism of any type of NN with arbitrarily high-order explanation. 38 Appendix I. Evaluation Methods This appendix provides details of our evaluation metrics. Appendix I.1. Joint contribution The jo...

[1] [1]

Achtibat, M

R. Achtibat, M. Dreyer, I. Eisenbraun, S. Bosse, T. Wiegand, W. Samek, and S. Lapuschkin. From attribution maps to human-understandable explanations through concept relevance propagation.Nat. Mac. Intell., 5(9):1006–1019, 2023

2023

[2] [2]

Achtibat, S

R. Achtibat, S. M. V . Hatefi, M. Dreyer, A. Jain, T. Wiegand, S. Lapuschkin, and W. Samek. Attnlrp: Attention-aware layer-wise relevance propagation for transformers. InICML, pages 135–168. PMLR/OpenReview.net, 2024

2024

[3] [3]

Ancona, E

M. Ancona, E. Ceolini, C. Öztireli, and M. Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. InICLR (Poster). OpenReview.net, 2018. 23

2018

[4] [4]

Arras, J

L. Arras, J. A. Arjona-Medina, M. Widrich, G. Montavon, M. Gillhofer, K.-R. Müller, S. Hochreiter, and W. Samek. Explaining and interpreting LSTMs. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700, pages 211–238. Springer, 2019

2019

[5] [5]

S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation.PloS one, 10(7):e0130140, 2015

2015

[6] [6]

C. M. Bishop.Pattern Recognition and Machine Learning (Information Science and Statistics), pages 402–411. Springer-Verlag, Berlin, Heidelberg, 2006

2006

[7] [7]

F. Bley, S. Lapuschkin, W. Samek, and G. Montavon. Explaining predictive uncertainty by exposing second-order effects.Pattern Recognition, 160:111171, 2025

2025

[8] [8]

Blücher, J

S. Blücher, J. Vielhaben, and N. Strodthoff. Preddiff: Explanations and interactions from conditional expectations.Artif. Intell., 312:103774, 2022

2022

[9] [9]

Chormai, J

P. Chormai, J. Herrmann, K.-R. Müller, and G. Montavon. Disentangled expla- nations of neural network predictions by finding relevant subspaces.IEEE Trans. Pattern Anal. Mach. Intell., 46(11):7283–7299, 2024

2024

[10] [10]

Conmy, A

A. Conmy, A. N. Mavor-Parker, A. Lynch, S. Heimersheim, and A. Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. InNeurIPS, 2023

2023

[11] [11]

T. Cui, P. Marttinen, and S. Kaski. Learning global pairwise interactions with bayesian neural networks. InECAI, Frontiers in Artificial Intelligence and Appli- cations, pages 1087–1094. IOS Press, 2020

2020

[12] [12]

Dhamdhere, M

K. Dhamdhere, M. Sundararajan, and Q. Yan. How important is a neuron. In International Conference on Learning Representations, 2019

2019

[13] [13]

Eberle, J

O. Eberle, J. Büttner, F. Kräutli, K.-R. Müller, M. Valleriani, and G. Montavon. Building and interpreting deep similarity models.IEEE Trans. Pattern Anal. Mach. Intell., 44(3):1149–1161, 2022

2022

[14] [14]

Elhage, N

N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y . Bai, A. Chen, T. Conerly, N. DasSarma, D. Drain, D. Ganguli, Z. Hatfield- Dodds, D. Hernandez, A. Jones, J. Kernion, L. Lovitt, K. Ndousse, D. Amodei, T. Brown, J. Clark, J. Kaplan, S. McCandlish, and C. Olah. A mathematical framework for transformer circuits.Transformer Circu...

2021

[15] [15]

Erhan, A

D. Erhan, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network.Technical Report, Univeristé de Montréal, 01 2009

2009

[16] [16]

Ghorbani and J

A. Ghorbani and J. Zou. Neuron shapley: discovering the responsible neurons. In Proceedings of the 34th International Conference on Neural Information Process- ing Systems, Red Hook, NY , USA, 2020. Curran Associates Inc. 24

2020

[17] [17]

Gunning, M

D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang. Xai—explainable artificial intelligence.Science Robotics, 4(37), 2019

2019

[18] [18]

J. C. Harsanyi. 17. a bargaining model for the cooperative n-person game. In Contributions to the Theory of Games, Volume IV, pages 325–356. Princeton University Press, Princeton, 1959

1959

[19] [19]

Holzinger, A

A. Holzinger, A. Saranti, C. Molnar, P. Biecek, and W. Samek. Explainable AI methods - A brief overview. InxxAI@ICML, pages 13–38. Springer, 2020

2020

[20] [20]

J. D. Janizek, P. Sturmfels, and S.-I. Lee. Explaining explanations: axiomatic feature interactions for deep networks.J. Mach. Learn. Res., 22(1), Jan. 2021

2021

[21] [21]

Kauffmann, J

J. Kauffmann, J. Dippel, L. Ruff, W. Samek, K.-R. Müller, and G. Montavon. Explainable ai reveals clever hans effects in unsupervised learning models.Nature Machine Intelligence, 7(3):412–422, 2025

2025

[22] [22]

J. R. Kauffmann, M. Esders, L. Ruff, G. Montavon, W. Samek, and K.-R. Müller. From clustering to cluster explanations via neural networks.IEEE Trans. Neural Networks Learn. Syst., 35(2):1926–1940, 2024

1926

[23] [23]

Lapuschkin, S

S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and K.-R. Müller. Unmasking clever hans predictors and assessing what machines really learn.Nature communications, 10(1):1096, 2019

2019

[24] [24]

S. Lloyd. Least squares quantization in PCM.IEEE Trans. Inf. Theory, 28(2): 129–137, 1982

1982

[25] [25]

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions. InNeurIPS, volume 30, page 4768–4777. Curran Associates, Inc., 2017

2017

[26] [26]

S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable ai for trees.Nature Machine Intelligence, 2(1): 56–67, 2020

2020

[27] [27]

Montavon, S

G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. Müller. Explain- ing nonlinear classification decisions with deep taylor decomposition.Pattern Recognit., 65:211–222, 2017

2017

[28] [28]

Montavon, W

G. Montavon, W. Samek, and K.-R. Müller. Methods for interpreting and under- standing deep neural networks.Digit. Signal Process., 73:1–15, 2018

2018

[29] [29]

C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, and A. Mord- vintsev. The building blocks of interpretability.Distill, 2018

2018

[30] [30]

D. Rai, Y . Zhou, S. Feng, A. Saparov, and Z. Yao. A practical review of mechanistic interpretability for transformer-based language models.ArXiv, abs/2407.02646, 2024. 25

work page arXiv 2024

[31] [31]

why should I trust you?

M. T. Ribeiro, S. Singh, and C. Guestrin. "why should I trust you?": Explaining the predictions of any classifier. InKDD, pages 1135–1144. ACM, 2016

2016

[32] [32]

G. C. Rota. On the foundations of combinatorial theory i. theory of möbius functions.Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 2(4): 340–368, Jan 1964

1964

[33] [33]

Samek, G

W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K.-R. Müller, editors. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700 ofLecture Notes in Computer Science. Springer, 2019

2019

[34] [34]

Samek, G

W. Samek, G. Montavon, S. Lapuschkin, C. J. Anders, and K.-R. Müller. Explain- ing deep neural networks and beyond: A review of methods and applications.Proc. IEEE, 109(3):247–278, 2021

2021

[35] [35]

Schnake, O

T. Schnake, O. Eberle, J. Lederer, S. Nakajima, K. T. Schütt, K.-R. Müller, and G. Montavon. Higher-order explanations of graph neural networks via relevant walks.IEEE Trans. Pattern Anal. Mach. Intell., 44(11):7581–7596, 2022

2022

[36] [36]

Schnake, F

T. Schnake, F. R. Jafari, J. Lederer, P. Xiong, S. Nakajima, S. Gugler, G. Mon- tavon, and K.-R. Müller. Towards symbolic XAI - explanation through human understandable logical relationships between features.Inf. Fusion, 118:102923, 2025

2025

[37] [37]

Strumbelj and I

E. Strumbelj and I. Kononenko. An efficient explanation of individual classifi- cations using game theory.J. Mach. Learn. Res., 11:1–18, Mar. 2010. ISSN 1532-4435

2010

[38] [38]

Sundararajan, A

M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks. InICML- Volume 70, page 3319–3328. JMLR.org, 2017

2017

[39] [39]

Tsang, D

M. Tsang, D. Cheng, and Y . Liu. Detecting statistical interactions from neural network weights. InICLR, 2018

2018

[40] [40]

Xiong, T

P. Xiong, T. Schnake, G. Montavon, K.-R. Müller, and S. Nakajima. Efficient computation of higher-order subgraph attribution via message passing. InICML, pages 24478–24495. PMLR, 2022

2022

[41] [41]

Xiong, T

P. Xiong, T. Schnake, M. Gastegger, G. Montavon, K.-R. Müller, and S. Nakajima. Relevant walk search for explaining graph neural networks. InICML, pages 38301–38324. PMLR, 2023

2023

[42] [42]

Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. Gnnexplainer: Generat- ing explanations for graph neural networks. InNeurIPS 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 9240–9251, 2019

2019

[43] [43]

H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji. On explainability of graph neural networks via subgraph explorations. InProceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139, pages 12241–12252. PMLR, 2021. 26

2021

[44] [44]

B. Zhou, Y . Sun, D. Bau, and A. Torralba. Interpretable basis decomposition for visual explanation. InECCV (8), Lecture Notes in Computer Science, pages 122–138. Springer, 2018. 27 Appendix A. Generalization to Arbitrary Architectures In Section 3, we described the NRM framework for a proper FFNNs—a FFNN with no skip connection and no intermediate input ...

2018

[45] [45]

For a target output neuron n(L), i.e., the target class in classification, identify concept neurons, i.e., relevant intermediate neurons {n(l)} in some layer l∈ {1, . . . ,L−1}

[46] [46]

This process is called Relevance Maximization (RelMax)

For each concept neuron n(l), find input samples that maximize the concept relevance. This process is called Relevance Maximization (RelMax)

[47] [47]

This algorithm is called CRP

With respect to each concept neuron n(l) and an output neuron n(L), compute the input relevances. This algorithm is called CRP. The concept relevance (ConRel) computed in Step 1 corresponds to the joint relevance ConReln(l),n(L) =R(n (l),n (L)).(F.3) The quantity (F.3) is also the objective function to be maximized with respect to the input samplex∈ Xin S...

[48] [48]

If the given network is not proper FFNN, convert it into a proper FFNN, following the procedure in Appendix A

[49] [49]

Feed the input sample, and then compute the unnormalized relevance propa- gation matrices {eT(l)}L l=1 and output relevance er(L), from which the consecutive conditional relevance (17) and the output relevance (18) are computed

[50] [50]

Choose a set of neurons S(L) to which we attribute, and the target neuron n(l) to be explained for l>max(L) (typically l=L)

[51] [51]

Compute the relevance R(S(L),n (l)) or R(S(L)|n(l)) by the sum-product message passing algorithm (Theorem 1)

[52] [52]

With this procedure, one can analyze the decision-making mechanism of any type of NN with arbitrarily high-order explanation

If we search for most relevance information flow, iterate Step 4 for different set S(L) of neurons. With this procedure, one can analyze the decision-making mechanism of any type of NN with arbitrarily high-order explanation. 38 Appendix I. Evaluation Methods This appendix provides details of our evaluation metrics. Appendix I.1. Joint contribution The jo...