pith.

arxiv: 2605.19607 · v1 · pith:GEHZLP3C · submitted 2026-05-19 · 💻 cs.CV · cs.AI · cs.LG

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution

Pith reviewed 2026-05-20 05:27 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords Spectral Integrated Gradients · feature attribution · Integrated Gradients · singular value decomposition · coarse-to-fine · explainable AI · image classification · path-based attribution
Add pith:GEHZLP3C to your LaTeX paper:
\usepackage{pith}
\pithnumber{GEHZLP3C}

Prints a linked pith:GEHZLP3C badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files.

The pith

Spectral Integrated Gradients builds integration paths via SVD to introduce global structure before fine details, yielding cleaner attributions than straight-line paths.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Spectral Integrated Gradients (SIG) to address a key weakness of standard Integrated Gradients: the straight-line path from baseline to input turns on all features at once and accumulates noisy gradients along the way. SIG instead decomposes the baseline-to-input difference with singular value decomposition and follows a path that activates the largest singular components first, then progressively smaller ones. This creates a coarse-to-fine ordering that supplies global image structure before local details. Evaluations across multiple image classification datasets show that the resulting attribution maps contain less noise and score higher on quantitative metrics than other path-based methods. The approach keeps the axiomatic properties that make Integrated Gradients attractive and changes only the choice of path.
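A minimal NumPy sketch of the path construction described above, assuming a single 2-D channel and a hypothetical staggered schedule in which the k-th largest singular component ramps from 0 to 1 on the interval [k/r, (k+1)/r]. The function names (`schedule`, `spectral_path`) are illustrative only; the released code at https://github.com/leekwoon/sig/ may parameterize the path differently (for example, with overlapping or smooth weights, or with its own handling of RGB channels).

```python
import numpy as np


def schedule(t: float, r: int) -> np.ndarray:
    """Hypothetical staggered weights: component k (0-indexed, largest sigma first)
    ramps linearly from 0 to 1 on [k/r, (k+1)/r] and stays at 1 afterwards."""
    k = np.arange(r)
    return np.clip(r * t - k, 0.0, 1.0)


def spectral_path(baseline: np.ndarray, x: np.ndarray, t: float) -> np.ndarray:
    """Point gamma(t) on a coarse-to-fine path from `baseline` (t=0) to `x` (t=1).

    Assumes 2-D inputs (one image channel). The difference D = x - baseline is
    decomposed as D = sum_k sigma_k u_k v_k^T; numpy returns sigma in descending
    order, so a lower index k means a larger singular value (coarser structure).
    The SVD is recomputed on every call for clarity; a real implementation would cache it.
    """
    U, S, Vt = np.linalg.svd(x - baseline, full_matrices=False)
    w = schedule(t, len(S))
    # (U * (w * S)) scales column k of U by w_k * sigma_k; "@ Vt" sums the rank-1 terms.
    return baseline + (U * (w * S)) @ Vt
```

At t = 1/r only the rank-1 term σ₁u₁v₁ᵀ has been added, so the first stretch of the path carries only the dominant global structure of the difference; finer components enter later.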

Core claim

Path-based Integrated Gradients satisfies key axiomatic properties, such as completeness, for any integration path, yet the conventional straight-line path from baseline to input activates every feature simultaneously and therefore collects noisy gradients along the way. Spectral Integrated Gradients replaces that path with one derived from the singular value decomposition of the baseline-to-input difference: the path is parameterized so that singular components are added in descending order of singular value. The resulting attributions therefore receive global structure before fine-grained details, producing maps with visibly reduced noise and higher quantitative scores on standard image-classification benchmarks while retaining those path-independent axioms.
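For reference, the general path-method form of Integrated Gradients (Sundararajan et al., 2017) that this claim relies on: for a baseline x′, an input x, and a piecewise-smooth path γ with γ(0) = x′ and γ(1) = x, the attribution to feature i and the completeness property read

```latex
\mathrm{IG}^{\gamma}_i(x) = \int_0^1 \frac{\partial f(\gamma(t))}{\partial \gamma_i}\,\frac{d\gamma_i(t)}{dt}\,dt,
\qquad
\sum_i \mathrm{IG}^{\gamma}_i(x) = \int_0^1 \frac{d}{dt} f(\gamma(t))\,dt = f(x) - f(x').
```

The straight-line path γ(t) = x′ + t(x − x′), with dγᵢ/dt = xᵢ − x′ᵢ, recovers standard IG; SIG changes only γ. Completeness follows from the chain rule and the fundamental theorem of calculus, whatever the path; symmetry preservation, by contrast, is specific to the straight line.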

What carries the argument

Spectral Integrated Gradients, which constructs the integration path by ordering singular components from largest to smallest singular value so that global structure precedes fine details.

If this is right

  • Attribution maps exhibit visibly lower noise levels on image data.
  • Quantitative metrics for attribution quality improve over existing path-based baselines.
  • Axiomatic guarantees of Integrated Gradients remain intact under the new path choice.
  • The coarse-to-fine ordering applies across diverse image classification datasets without retraining.
  • The method requires only an SVD on the input-baseline difference and no change to the underlying model.
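One way to see why largest-first ordering is "coarse-to-fine" is the Eckart–Young theorem: the first k singular components give the best rank-k approximation of the difference, so the remaining error shrinks as fast as possible when components are added in descending order. The sketch below uses a hypothetical helper name and assumes a single 2-D channel; it measures pixel-space reconstruction error only and is not the frequency-domain analysis shown in the paper's Figure 2.

```python
import numpy as np


def partial_reconstruction_errors(baseline: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Relative Frobenius error after adding the k largest singular components, k = 0..r."""
    D = x - baseline
    U, S, Vt = np.linalg.svd(D, full_matrices=False)  # S is sorted in descending order
    total = np.linalg.norm(D)  # Frobenius norm of the full difference
    errors = []
    for k in range(len(S) + 1):
        approx = (U[:, :k] * S[:k]) @ Vt[:k]  # rank-k truncation (all zeros when k = 0)
        errors.append(np.linalg.norm(D - approx) / total)
    return np.array(errors)
```

By Eckart–Young, entry k equals the square root of the sum of the squared discarded singular values divided by the total norm, so the curve is non-increasing and reaches zero once every component has been added.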

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same SVD ordering idea could be tested on non-image domains where a natural low-rank decomposition exists.
  • If the singular-value ordering aligns with human visual perception, it might explain why certain explanations feel more intuitive.
  • Future work could compare the learned singular directions against known dataset biases or model failure modes.
  • The approach suggests that other path-based methods might also benefit from ordering features by some measure of global importance rather than uniform activation.

Load-bearing premise

Ordering singular components from largest to smallest singular value will reduce noise and improve attribution quality without introducing new artifacts or violating the axiomatic properties of Integrated Gradients.

What would settle it

If quantitative noise metrics and performance scores on standard image datasets show no consistent improvement over straight-line Integrated Gradients or other path variants, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.19607 by Jaesik Choi, Kyowoon Lee, Seongwoo Lim, Soyeon Kim.

Figure 1: Overview of Spectral Integrated Gradients (SIG). (A) In the logit surface landscape, the linear path (gray line) of IG …
Figure 2: Frequency-domain analysis of integration paths.
Figure 3: Path analysis on ImageNet using the VGG16 clas…
Figure 4: Qualitative comparison of attribution maps on ImageNet (InceptionV3), Oxford-IIIT Pet (ResNet18), and Oxford 102 …
Figure 5: RemOve And Retrain (ROAR) test on CIFAR10. SIG …
Figure 7: Extended runtime analysis on faithfulness metrics.
Figure 8: Additional examples of frequency-domain analysis of integration paths. Supplementary to Figure 2, these examples …
Figure 9: Additional path analysis examples. Supplementary to Figure 3, these results on ResNet18 (left) and VGG16 (right) …
Figure 10: Extended path analysis comparing IG, BIG, EIG, MIG, GIG, and SIG. For each example, the top row displays the …
Figure 11: Qualitative comparison against baselines on ImageNet 2012 using three classifiers. Bold test labels indicate the …
Figure 12: Qualitative comparison on Oxford 102 Flower using three classifiers. Bold test labels indicate the predicted class, and …
Figure 13: Qualitative comparison on Oxford-IIIT Pet using three classifiers. Bold test labels indicate the predicted class, and …
Original abstract

Integrated Gradients (IG) is a widely adopted feature attribution method that satisfies desirable axiomatic properties. However, the choice of integration path significantly affects the quality of attributions, and the standard straight-line path introduces all input features simultaneously, often accumulating noisy gradients along the way. To address this limitation, we propose Spectral Integrated Gradients, which constructs integration paths based on singular value decomposition (SVD) of the baseline-to-input difference. By progressively activating singular components from largest to smallest, SIG introduces global structure before fine-grained details, naturally following a coarse-to-fine progression. Through extensive evaluation across diverse image classification datasets, we demonstrate that SIG produces cleaner attribution maps with reduced noise and achieves improved quantitative performance compared to existing path-based attribution methods. Our code is available at https://github.com/leekwoon/sig/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Spectral Integrated Gradients (SIG), a path-based variant of Integrated Gradients for feature attribution in image classification models. SIG constructs the integration path by applying SVD to the baseline-to-input difference and progressively activating singular components ordered from largest to smallest singular value, aiming for a coarse-to-fine progression that reduces noise compared to the standard straight-line path. The authors report that this yields cleaner attribution maps and improved quantitative performance across diverse datasets, with code released for reproducibility.

Significance. If the central claims hold, SIG could provide a practical enhancement to attribution quality in computer vision without violating IG axioms, potentially aiding interpretability of CNNs. The open-source code is a clear strength for reproducibility and further testing.

major comments (3)
  1. §3 (Path Construction): The SVD-based path must be explicitly shown to be a continuous, differentiable curve parameterized by a single scalar t ∈ [0,1] from baseline (zero components) to full input. It is unclear whether the progressive activation of singular components (largest to smallest) forms a monotonic trajectory that satisfies the fundamental theorem of calculus underlying the IG proof, or if it introduces piecewise behavior or interpolation artifacts.
  2. §4 or §5 (Evaluation): The claim of improved quantitative performance and reduced noise requires concrete metrics (e.g., insertion/deletion scores, faithfulness metrics), statistical tests, dataset names/sizes, and direct comparisons to baselines such as standard IG and other path variants. The abstract asserts superiority after 'extensive evaluation' but the provided details do not allow verification of effect sizes or controls for path-dependent biases.
  3. Axioms section: Completeness (sum of attributions equals f(x) − f(baseline)) must be verified numerically for the SVD path, as any deviation from a valid IG curve could make attributions incomplete or path-dependent in uncontrolled ways. The paper should include a proof sketch or empirical check that the ordering from largest to smallest singular values preserves this property.
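As an illustration of the kind of metric the second major comment asks for, here is a minimal sketch of deletion and insertion scores in the style of RISE (Petsiuk et al., 2018): features are moved between input and baseline in order of decreasing attribution, and the area under the model-score curve is recorded (lower deletion AUC and higher insertion AUC are better). The function name, the scalar-valued model `f`, and the step scheme are assumptions; the paper's exact protocol may differ.

```python
import numpy as np
from typing import Callable, Tuple


def deletion_insertion_auc(
    f: Callable[[np.ndarray], float],
    x: np.ndarray,
    baseline: np.ndarray,
    attribution: np.ndarray,
    n_steps: int = 20,
) -> Tuple[float, float]:
    """Return (deletion AUC, insertion AUC) over the fraction of features moved."""
    order = np.argsort(-attribution.ravel())  # most important feature first
    n = x.size
    cuts = np.linspace(0, n, n_steps + 1).round().astype(int)
    x_flat, b_flat = x.ravel(), baseline.ravel()
    deletion, insertion = [], []
    for c in cuts:
        idx = order[:c]
        deleted = x_flat.copy()
        deleted[idx] = b_flat[idx]  # replace the top-c features with baseline values
        inserted = b_flat.copy()
        inserted[idx] = x_flat[idx]  # reveal the top-c features on top of the baseline
        deletion.append(f(deleted.reshape(x.shape)))
        insertion.append(f(inserted.reshape(x.shape)))
    frac = cuts / n

    def auc(y) -> float:
        # trapezoidal rule written out, so it does not depend on np.trapz / np.trapezoid naming
        y = np.asarray(y, dtype=float)
        return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(frac)))

    return auc(deletion), auc(insertion)
```

For a linear model with exact attributions, the two curves add up to f(x) at every step, so a good ordering pushes deletion below and insertion above the diagonal by the same amount.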
minor comments (2)
  1. Abstract: Briefly list the specific datasets and at least one quantitative metric to support the performance claim, improving readability for readers scanning the paper.
  2. Notation: Define the parameterization of the SVD path (e.g., how components are scaled with t) with an equation early in the method section to avoid ambiguity in later derivations.
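To make the notation request concrete, one candidate parameterization is sketched below. It is an assumption for illustration, not the authors' definition: a piecewise-linear staggered schedule in which the k-th component is switched on during the k-th of r equal sub-intervals of [0,1].

```latex
x - x' = \sum_{k=1}^{r} \sigma_k\, u_k v_k^{\top}, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0,

\gamma(t) = x' + \sum_{k=1}^{r} w_k(t)\, \sigma_k\, u_k v_k^{\top},
\qquad w_k(t) = \min\bigl(1, \max(0,\, r t - (k-1))\bigr),

\gamma(0) = x', \quad \gamma(1) = x, \quad
\gamma'(t) = r\, \sigma_k\, u_k v_k^{\top} \ \text{for } t \in \bigl(\tfrac{k-1}{r}, \tfrac{k}{r}\bigr).
```

Under this choice γ is continuous and piecewise linear, hence piecewise smooth, which is enough for the fundamental theorem of calculus, and therefore completeness, to hold along the path.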

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments point by point below, providing clarifications and indicating revisions where necessary to strengthen the presentation of Spectral Integrated Gradients.

Point-by-point responses
  1. Referee: §3 (Path Construction): The SVD-based path must be explicitly shown to be a continuous, differentiable curve parameterized by a single scalar t ∈ [0,1] from baseline (zero components) to full input. It is unclear whether the progressive activation of singular components (largest to smallest) forms a monotonic trajectory that satisfies the fundamental theorem of calculus underlying the IG proof, or if it introduces piecewise behavior or interpolation artifacts.

    Authors: We appreciate the referee's emphasis on rigorously defining the integration path. In the original manuscript, Section 3 describes the construction via SVD and progressive activation, but we acknowledge that the explicit parameterization as a function of t could be stated more formally. In the revised manuscript, we will add the definition of the path γ(t) for t ∈ [0,1] as a continuous curve in the spectral domain that monotonically activates components from largest to smallest singular value. This construction ensures differentiability and satisfies the conditions for the path integral underlying Integrated Gradients, with a derivation showing that the fundamental theorem of calculus holds without introducing piecewise artifacts or uncontrolled interpolation. revision: yes

  2. Referee: §4 or §5 (Evaluation): The claim of improved quantitative performance and reduced noise requires concrete metrics (e.g., insertion/deletion scores, faithfulness metrics), statistical tests, dataset names/sizes, and direct comparisons to baselines such as standard IG and other path variants. The abstract asserts superiority after 'extensive evaluation' but the provided details do not allow verification of effect sizes or controls for path-dependent biases.

    Authors: We agree that additional concrete details on the evaluation protocol would improve verifiability. The manuscript reports results across multiple image classification datasets with quantitative comparisons to standard Integrated Gradients and other path-based methods, including metrics such as insertion and deletion scores. To address the concern directly, we will revise the evaluation section to include an expanded summary table with dataset sizes, specific metric values, and controls for path-dependent effects, allowing clearer assessment of the reported improvements. revision: yes

  3. Referee: Axioms section: Completeness (sum of attributions equals f(x) − f(baseline)) must be verified numerically for the SVD path, as any deviation from a valid IG curve could make attributions incomplete or path-dependent in uncontrolled ways. The paper should include a proof sketch or empirical check that the ordering from largest to smallest singular values preserves this property.

    Authors: We thank the referee for this observation on axiomatic guarantees. For any piecewise-smooth path connecting the baseline to the input, completeness follows directly from the chain rule and the fundamental theorem of calculus applied to the gradient integral, independent of the particular ordering of singular components. The SVD ordering shapes the trajectory but preserves path validity. In the revised manuscript, we will add a short proof sketch to the axioms discussion and include numerical verification in the experiments section confirming that attribution sums match f(x) − f(baseline) up to the discretization error of the numerical integration. revision: yes
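A small numerical version of the completeness check promised in the third response can be written with a toy differentiable model whose gradient is known in closed form. Everything here is an assumption for illustration: the model f(X) = Σ tanh(W ⊙ X), the helper names, and the one-component-at-a-time, piecewise-linear path. Because each path segment is integrated with a midpoint rule, the attribution sum matches f(x) − f(x′) only up to a small discretization error that shrinks as `steps_per_component` grows.

```python
import numpy as np


def toy_model(X: np.ndarray, W: np.ndarray) -> float:
    """Hypothetical scalar model: f(X) = sum(tanh(W * X)) with an elementwise product."""
    return float(np.sum(np.tanh(W * X)))


def toy_grad(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Closed-form gradient of toy_model with respect to X."""
    return W * (1.0 - np.tanh(W * X) ** 2)


def spectral_ig(X: np.ndarray, baseline: np.ndarray, W: np.ndarray,
                steps_per_component: int = 64) -> np.ndarray:
    """Path-integrated gradients along a largest-first SVD path (midpoint rule per segment)."""
    U, S, Vt = np.linalg.svd(X - baseline, full_matrices=False)
    attribution = np.zeros_like(X, dtype=float)
    start = baseline.astype(float)
    s = (np.arange(steps_per_component) + 0.5) / steps_per_component  # midpoints in (0, 1)
    for k in range(len(S)):
        component = S[k] * np.outer(U[:, k], Vt[k])  # sigma_k u_k v_k^T: this segment's direction
        for sj in s:
            # gradient at gamma(s) times d(gamma)/ds, weighted by the step size 1/m
            attribution += toy_grad(start + sj * component, W) * component / steps_per_component
        start = start + component  # move to the end of this segment
    return attribution
```

Comparing `spectral_ig(X, baseline, W).sum()` with `toy_model(X, W) - toy_model(baseline, W)` gives the completeness gap; for a real network the closed-form gradient would be replaced by autograd.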

Circularity Check

0 steps flagged

No circularity: SIG path is independently defined and evaluated empirically

full rationale

The paper defines Spectral Integrated Gradients by constructing an integration path via the SVD of the baseline-to-input difference and ordering singular components from largest to smallest. This construction is presented as a direct proposal to achieve a coarse-to-fine progression, independent of any evaluation outcomes or fitted parameters. The claimed improvements in attribution quality are supported solely by empirical results across image classification datasets, not by any reduction of the method to its own inputs or by self-referential definitions. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked in the provided text to justify the central path choice. The derivation chain is therefore self-contained: the path definition stands apart from the quantitative performance claims, which are tested against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method relies on standard linear-algebra SVD and the existing IG axiomatic framework.

pith-pipeline@v0.9.0 · 5675 in / 1063 out tokens · 39289 ms · 2026-05-20T05:27:40.519487+00:00 · methodology


