Playing the network backward: A Game Theoretic Attribution Framework

Georg Loho; Jakob Paul Zimmermann; Jim Berend; Sebastian Lapuschkin; Wojciech Samek

arxiv: 2605.06212 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-08 13:22 UTC · model grok-4.3

classification 💻 cs.LG cs.CV

keywords: attribution methods, explainable AI, game theory, LRP, vision transformers, neural networks, interpretability, gradients

The pith

Backward attribution methods arise as equilibria in a two-player game on the extended network graph, turning explanation design into strategy selection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework by modeling backward attribution passes as a two-player game on an extended network graph. In this model, standard techniques such as gradients and the full alpha-beta-LRP family emerge as integrals over game trajectories under particular equilibria, so that attribution maps appear as projections of trajectory distributions. Desired properties of explanations, including localisation focus and robustness, can be expressed as game-theoretic notions like policy regularization or risk aversion and converted directly into new backward rules. One such adapted alpha-beta-LRP rule is shown to outperform earlier transformer-specific methods on all tested localisation metrics for ViT-B/16. If the recasting holds, attribution research gains a common language for comparing methods and deriving targeted variants rather than developing them in isolation.

Core claim

Backward attribution calculations are equivalent to integrals over trajectories in a two-player game on the extended network graph. Gradients arise under one equilibrium while the alpha-beta-LRP family arises under others; the resulting attribution maps are projections of the trajectory distributions. Game concepts such as policy regularization and extended action sets translate into novel adaptations of the backward rules that preserve core properties while adding specified behaviors.
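To make "integrals over trajectories" concrete, here is a minimal sketch of the classical path decomposition of a gradient in a tiny bias-free ReLU network. It is not the paper's game construction: the weights `W1`, `W2`, the input `x`, and all function names are hypothetical, and the sum over paths only stands in for the trajectory integral described above.

```python
from itertools import product

# Hypothetical toy network: 2 inputs -> 2 hidden ReLU units -> 1 output, no biases.
W1 = [[1.0, -2.0],   # W1[j][i]: weight from input i to hidden unit j
      [0.5, 1.5]]
W2 = [2.0, -1.0]     # W2[j]: weight from hidden unit j to the output
x = [1.0, 1.0]

def hidden_pre(inp):
    """Pre-activations of the hidden layer."""
    return [sum(W1[j][i] * inp[i] for i in range(2)) for j in range(2)]

def forward(inp):
    """Network output: linear readout of the ReLU activations."""
    return sum(W2[j] * max(z, 0.0) for j, z in enumerate(hidden_pre(inp)))

# ReLU gates at the evaluation point: 1 if the unit is active, else 0.
gates = [1.0 if z > 0 else 0.0 for z in hidden_pre(x)]

# Path sum: each path (input i -> hidden j -> output) contributes the
# product of its weights times the gate it passes through.
path_grad = [0.0, 0.0]
for i, j in product(range(2), range(2)):
    path_grad[i] += W2[j] * gates[j] * W1[j][i]

def numeric_grad(inp, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for i in range(len(inp)):
        hi, lo = list(inp), list(inp)
        hi[i] += eps
        lo[i] -= eps
        grad.append((forward(hi) - forward(lo)) / (2 * eps))
    return grad

print(path_grad, numeric_grad(x))
```

At this input the first hidden unit is inactive, so every path through it contributes zero. The path sum and the finite-difference gradient agree, which is the kind of exact recovery the core claim asserts for the game trajectories.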

What carries the argument

The two-player game on the extended network graph, in which equilibria and trajectory distributions recover standard attribution rules and generate new ones.

If this is right

Gradients and the full alpha-beta-LRP family are recovered as integrals over trajectories under specific equilibria.
Attribution maps become projections of trajectory distributions rather than the primary object.
Explanation properties such as localisation focus or stable attention routing are specified as game concepts and translated into new backward rules.
A selected adaptation of alpha-beta-LRP outperforms prior transformer-specific methods across all considered localisation metrics on ViT-B/16.
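For readers unfamiliar with the rule family named in this list, the following is a minimal sketch of the standard alpha-beta-LRP redistribution on a single bias-free linear layer, with alpha - beta = 1. It shows the well-known rule only, not the paper's adapted variant or its game-theoretic derivation; the function name, activations, weights, and relevances are hypothetical.

```python
def alpha_beta_lrp(a, W, R_out, alpha=2.0, beta=1.0):
    """Redistribute output relevance to the inputs of a bias-free linear layer.

    a      : input activations, length n
    W[i][j]: weight from input i to output j
    R_out  : relevance of each output, length m
    Positive contributions z_ij = a_i * W[i][j] share alpha * R_out[j],
    negative contributions share -beta * R_out[j].
    """
    assert abs(alpha - beta - 1.0) < 1e-12, "the rule assumes alpha - beta = 1"
    n, m = len(a), len(R_out)
    R_in = [0.0] * n
    for j in range(m):
        z = [a[i] * W[i][j] for i in range(n)]
        pos = sum(v for v in z if v > 0)   # total positive contribution
        neg = sum(v for v in z if v < 0)   # total negative contribution
        for i in range(n):
            if z[i] > 0:
                R_in[i] += alpha * z[i] / pos * R_out[j]
            elif z[i] < 0:
                R_in[i] -= beta * z[i] / neg * R_out[j]
    return R_in

# Hypothetical example in which every output has both positive and
# negative contributions, so relevance is conserved exactly.
a = [1.0, 2.0, 0.5]
W = [[1.0, -1.0],
     [-0.5, 0.5],
     [2.0, -2.0]]
R_out = [1.0, 3.0]
R_in = alpha_beta_lrp(a, W, R_out)
print(R_in, sum(R_in), sum(R_out))
```

Conservation (the input relevances sum to the output relevances) holds here because each output receives contributions of both signs. If an output had no negative contributions and beta were non-zero, this sketch would pass on alpha times its relevance instead.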

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The game view could be used to combine multiple equilibria into hybrid attribution rules that trade off different properties.
Testing whether varying risk aversion parameters improves explanation stability under input noise would be a direct next experiment.
The framework suggests a route to import solution concepts from game theory to design attributions for architectures beyond vision transformers.
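The stability experiment suggested above needs a distance between distributions. The figure captions refer to a Hellinger trajectory distance, so here is a minimal sketch of the Hellinger distance between two discrete distributions. How the paper builds its trajectory distributions is not shown on this page; the helper names and the toy attribution values are hypothetical.

```python
import math

def normalise(values):
    """Turn non-negative scores into a probability distribution."""
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must have positive total mass")
    return [v / total for v in values]

def hellinger(p, q):
    """Hellinger distance H(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i)).

    Ranges from 0 (identical distributions) to 1 (disjoint supports).
    """
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding above 1

# Hypothetical attribution mass over four input regions, clean vs. noisy input.
clean = normalise([4.0, 3.0, 2.0, 1.0])
noisy = normalise([3.5, 3.0, 2.5, 1.0])
print(hellinger(clean, noisy))
```

A stability test along these lines would compute this distance between the clean and perturbed distributions across a sweep of the risk-aversion parameter and check whether it shrinks.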

Load-bearing premise

The original backward attribution calculations can be recast as equilibria and trajectory distributions in the two-player game without distorting their mathematical properties or introducing artifacts that change explanation quality.

What would settle it

Showing that a newly derived game adaptation of alpha-beta-LRP produces attribution maps mathematically inconsistent with the known alpha-beta-LRP formulas, or that it fails to improve localisation metrics on ViT-B/16 relative to prior rules, would falsify the claim of faithful recovery and useful extension.

Figures

Figures reproduced from arXiv: 2605.06212 by Georg Loho, Jakob Paul Zimmermann, Jim Berend, Sebastian Lapuschkin, Wojciech Samek.

**Figure 1.** We lift the backward pass through a network into a two-player game on an extended …

**Figure 2.** Hellinger trajectory distance under cascading parameter randomisation [Adebayo et al., …

**Figure 3.** Temperature sweeps aligned with Table 3(a) and (c) show the focus of attribution at lower …

**Figure 4.** Stopping Game: trajectory distribution and local stopping decisions on a toy subnetwork.

**Figure 5.** Routing Game: local routing subgame around …

**Figure 6.** Dense qualitative comparison on six ImageNet-S examples (ViT-B/16). Columns: Original; …

**Figure 7.** Input-noise Hellinger trajectory distance. Solid: plain …

**Figure 8.** Per-image standard deviation of the Hellinger trajectory distance vs. cascading random …

Original abstract

Attribution methods explain which input features drive a model's prediction, making them central to model debugging and mechanistic interpretability. Yet backward attribution methods, including gradients, LRP, and transformer-specific rules, lack a shared framework in which to compare the underlying backward calculations. We introduce such a framework by recasting backward attribution as a two-player game on an extended network graph, building on Gaubert and Vlassopoulos' ReLU Net Game. Gradients and the full alpha-beta-LRP family arise as integrals over game trajectories under specific equilibria, so attribution maps become projections of trajectory distributions rather than the primary object. Desired explanation properties, such as localisation focus, robustness to input noise, or stable attention routing, can be specified as game-theoretic concepts, including policy regularization, risk aversion, and extended action sets, and translate directly into novel adaptations of the well-known backward rules. On ViT-B/16, one such selected adaptation of alpha-beta-LRP outperforms prior transformer-specific backward methods across all considered localisation metrics.
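Policy regularization, mentioned in the abstract, has a standard closed form in the simplest case: maximising expected payoff minus a temperature-weighted KL penalty against a reference policy gives a Gibbs policy proportional to reference times exp(payoff / temperature). The sketch below shows only that textbook fact. The payoffs, the reference policy, and the function name are hypothetical, and the paper's exact regulariser may differ.

```python
import math

def gibbs_policy(payoffs, reference, temperature):
    """Maximiser of sum_k pi[k] * payoffs[k] - temperature * KL(pi || reference).

    The solution is pi[k] proportional to reference[k] * exp(payoffs[k] / temperature).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(payoffs)  # subtract the maximum for numerical stability
    weights = [mu * math.exp((r - top) / temperature)
               for r, mu in zip(payoffs, reference)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical payoffs for three successor states and a uniform reference policy.
payoffs = [1.0, 2.0, 0.5]
reference = [1.0 / 3.0] * 3

for temperature in (1000.0, 1.0, 0.05):
    policy = gibbs_policy(payoffs, reference, temperature)
    print(temperature, [round(p, 4) for p in policy])
```

At high temperature the policy stays close to the reference; at low temperature it concentrates on the highest-payoff successor. This matches the qualitative reading of the Figure 3 caption, where attribution focus is tied to lower temperatures.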

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper recasts backward attribution as trajectories in a two-player game on an extended network graph, derives gradients and alpha-beta-LRP as special cases, and reports better localization on ViT-B/16 with one new adaptation.

The core move here is recasting backward attribution as a two-player game on an extended network graph, where gradients and the full alpha-beta-LRP family come out as integrals over trajectories under chosen equilibria. Attribution maps then become projections of those trajectory distributions rather than the starting point. The authors use this to turn desired properties like localization focus or robustness into game concepts such as policy regularization or extended action sets, which directly produce modified backward rules. On ViT-B/16 one selected adaptation beats prior transformer-specific methods on all the localization metrics they tested. That gives the work a practical anchor.

What is new is the extension of the earlier ReLU Net Game to general networks including attention and softmax layers, plus the explicit derivation of standard methods as special cases inside the same setup. The framework supplies a shared language that lets people compare and design attribution rules more systematically instead of tweaking them in isolation. The empirical gains are concrete and the citation pattern builds cleanly on the prior game and standard LRP papers.

The soft spot is whether the mapping stays exact. For the claim to hold without distortion, the payoff structure and equilibria must recover the original rules for every layer type, including non-ReLU operations. If equilibria are non-unique or the trajectory measure adds implicit smoothing, the reported improvements could come from the tweaks themselves rather than the game framing. The abstract states the equivalence but the details on how the extended graph is constructed and how equilibria are solved for each layer would need close inspection to rule out artifacts.

This is aimed at interpretability researchers who already work with LRP or game-theoretic views of networks and want a principled way to generate new variants. Readers focused on transformer explanations will find the most immediate use in the adaptations and results. The idea is fresh enough and the evidence specific enough that it deserves a serious referee who can check the derivations and run further controls on the transformer cases. I would send it out for peer review.

Referee Report

3 major / 2 minor

Summary. The paper introduces a game-theoretic framework for backward attribution by recasting it as a two-player game on an extended network graph, extending the ReLU Net Game. It claims that gradients and the full alpha-beta-LRP family arise exactly as integrals over game trajectories under specific equilibria, allowing desired properties (localisation, robustness) to be encoded as game concepts such as policy regularization or extended action sets. Novel adaptations of alpha-beta-LRP are derived and shown to outperform prior transformer-specific backward rules on ViT-B/16 across localisation metrics.

Significance. If the claimed exact equivalences hold without distortion for all layer types, the framework supplies a unifying lens that could systematize the design of attribution methods and translate explanation desiderata into game-theoretic primitives. The empirical gains on ViT-B/16 provide concrete evidence of utility for transformer interpretability. The absence of free parameters in the core mapping and the machine-checkable nature of the special-case recoveries (if supplied) would strengthen the contribution.

major comments (3)

[§3] §3 (derivation of equilibria): The central claim that gradients and alpha-beta-LRP arise as integrals over trajectories requires explicit equilibrium conditions, payoff matrices, and a proof sketch showing that the original relevance-propagation rules (including alpha/beta weighting and gradient cases) are recovered exactly for every layer type, especially attention and softmax operations in ViT. Without these, it is impossible to verify that the trajectory measure introduces no implicit smoothing or artifacts.
[§4.2] §4.2 (extended graph construction): The translation of standard backward rules into action sets and payoffs on the extended graph must be shown to be faithful; any layer-specific choice of equilibria risks non-uniqueness or distortion that would undermine the assertion that attribution maps are merely projections of trajectory distributions.
[§5] §5 (empirical evaluation): The reported outperformance uses one selected adaptation of alpha-beta-LRP; the manuscript should clarify whether this adaptation was chosen after seeing the results and whether the full family of game-theoretic adaptations was evaluated to support the claim that the framework enables principled improvements.

minor comments (2)

[Figure 1] The notation for game trajectories and their distributions would benefit from an explicit diagram relating the extended graph to the original network layers.
[Abstract] Several sentences in the abstract and introduction repeat the unification claim without distinguishing between the theoretical mapping and the empirical adaptations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Referee: [§3] §3 (derivation of equilibria): The central claim that gradients and alpha-beta-LRP arise as integrals over trajectories requires explicit equilibrium conditions, payoff matrices, and a proof sketch showing that the original relevance-propagation rules (including alpha/beta weighting and gradient cases) are recovered exactly for every layer type, especially attention and softmax operations in ViT. Without these, it is impossible to verify that the trajectory measure introduces no implicit smoothing or artifacts.

Authors: We agree that greater explicitness will improve verifiability. In the revised manuscript we will expand §3 with the equilibrium conditions and payoff matrices for each layer type. We will also supply a proof sketch that recovers the original gradient and alpha-beta-LRP rules exactly, with dedicated treatment of attention and softmax layers in ViT, confirming that the trajectory integrals introduce no smoothing or other artifacts.

Revision: yes

Referee: [§4.2] §4.2 (extended graph construction): The translation of standard backward rules into action sets and payoffs on the extended graph must be shown to be faithful; any layer-specific choice of equilibria risks non-uniqueness or distortion that would undermine the assertion that attribution maps are merely projections of trajectory distributions.

Authors: We will revise §4.2 (and add an appendix if space is needed) to present the explicit translation of each standard backward rule into action sets and payoffs on the extended graph. The revision will specify the equilibrium selection rule per layer type and demonstrate that the resulting attribution maps are faithful projections of the trajectory distributions, thereby removing any ambiguity about non-uniqueness or distortion.

Revision: yes

Referee: [§5] §5 (empirical evaluation): The reported outperformance uses one selected adaptation of alpha-beta-LRP; the manuscript should clarify whether this adaptation was chosen after seeing the results and whether the full family of game-theoretic adaptations was evaluated to support the claim that the framework enables principled improvements.

Authors: We will clarify in the revised §5 that the reported adaptation was derived from the game-theoretic desiderata (policy regularization for localization) before the experiments were run. We will also report results for the other adaptations considered under the framework, thereby supporting the claim that the framework enables principled improvements rather than post-hoc selection.

Revision: partial

Circularity Check

0 steps flagged

No circularity: unification via external ReLU Net Game

The paper builds its framework explicitly on the external Gaubert and Vlassopoulos ReLU Net Game and presents gradients plus the alpha-beta-LRP family as special cases arising under chosen equilibria. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided abstract and description. The central claim is a recasting that treats attribution maps as projections of trajectory distributions; this remains an independent modeling choice rather than a reduction to the paper's own inputs by construction. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Only the abstract is available, so the ledger is necessarily incomplete. The framework rests on the prior ReLU Net Game and on the assumption that backward passes correspond to game trajectories.

axioms (1)

domain assumption: Backward attribution calculations can be represented as equilibria and trajectory distributions in a two-player game on an extended network graph.
Central modeling step stated in the abstract; if false, the unification and new adaptations lose their foundation.

invented entities (2)

Extended network graph for the game (no independent evidence)
purpose: To host the two-player game whose trajectories yield attribution maps.
Introduced to recast backward passes; no independent evidence provided in abstract.

Game trajectories and their distributions (no independent evidence)
purpose: To serve as the underlying object from which attribution maps are projected.
Core new object in the framework; no falsifiable handle given in abstract.

