Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Jaesik Choi; Kyowoon Lee; Seongwoo Lim; Soyeon Kim

arXiv: 2605.02167 · v3 · pith:HRFP2HBBnew · submitted 2026-05-04 · cs.LG · cs.AI · cs.CV

Soyeon Kim, Seongwoo Lim, Kyowoon Lee, Jaesik Choi

Pith reviewed 2026-05-20 23:47 UTC · model grok-4.3

classification cs.LG · cs.AI · cs.CV

keywords feature attribution · integrated gradients · manifold alignment · variational autoencoder · explainable AI · deep neural networks · reliable explanations

The pith

Manifold-Aligned Guided Integrated Gradients improves attribution reliability by keeping integration paths close to the data manifold using a pre-trained VAE.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard Integrated Gradients and its guided variant can produce unreliable feature attributions because their straight-line or adaptive paths in input space often pass through regions far from the data manifold where gradients are noisy or meaningless. MA-GIG addresses this by performing the path construction in the latent space of a variational autoencoder, then decoding each intermediate point back to input space so that the path stays biased toward plausible data. A sympathetic reader would care because trustworthy explanations are needed to diagnose and trust deep models in high-stakes settings, and off-manifold noise undermines that trust. The method aggregates gradients only along these manifold-proximal points, yielding explanations that the authors show are more faithful than prior path-based techniques.
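The straight-line path that this paragraph criticizes can be sketched in a few lines. This is a minimal illustration of standard Integrated Gradients on a toy function, not the authors' code: the function `f`, its analytic gradient, the zero baseline, and the step count are all assumptions chosen so the example runs with the Python standard library alone.

```python
import math

def f(x):
    """Toy differentiable stand-in for a classifier logit (hypothetical)."""
    return math.tanh(x[0] * x[1]) + 0.5 * x[2] ** 2

def grad_f(x):
    """Analytic gradient of the toy function f."""
    s = 1.0 - math.tanh(x[0] * x[1]) ** 2
    return [s * x[1], s * x[0], x[2]]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of IG along the straight line
    from `baseline` to `x` in input space."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Intermediate input on the straight line; for real images such
        # points are blends that may lie off the data manifold.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(len(x)):
            total[i] += g[i]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(x, baseline)

# Completeness axiom: attributions sum (approximately) to f(x) - f(baseline).
gap = abs(sum(attr) - (f(x) - f(baseline)))
```

The completeness check in the last line is the axiomatic property that makes IG attractive; the concern raised above is about where the intermediate `point` values lie, not about this property.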

Core claim

MA-GIG constructs attribution paths by sampling points in the latent space of a pre-trained VAE, decoding them to input space, and applying guided updates that keep features proximal to the original input; this alignment with the learned generative manifold reduces exposure to implausible regions and produces more faithful explanations than standard IG or Guided IG.

What carries the argument

Manifold-Aligned Guided Integrated Gradients (MA-GIG), which builds the integration path in VAE latent space and decodes intermediates to enforce proximity to the data manifold.
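The core idea of building the path in latent space and decoding each intermediate point can be sketched as follows. This is a simplified illustration under stated assumptions, not MA-GIG itself: `encode` and `decode` are deterministic toy stand-ins for a pre-trained VAE, the toy manifold is the surface x2 = x0 * x1, the latent path is a plain linear interpolation, and the guided feature-selection step is omitted entirely.

```python
import math

def f(x):
    """Toy differentiable stand-in for a classifier logit (hypothetical)."""
    return math.tanh(x[0] * x[1]) + 0.5 * x[2] ** 2

def grad_f(x):
    """Analytic gradient of the toy function f."""
    s = 1.0 - math.tanh(x[0] * x[1]) ** 2
    return [s * x[1], s * x[0], x[2]]

def encode(x):
    """Hypothetical encoder: keep the first two coordinates as the latent."""
    return [x[0], x[1]]

def decode(z):
    """Hypothetical decoder: map the latent onto the surface x2 = x0 * x1."""
    return [z[0], z[1], z[0] * z[1]]

def latent_path_attribution(x, baseline, steps=400):
    """Path-integral attribution along the decoded latent path.

    Interpolates linearly between the two latent codes, decodes every
    intermediate latent, and accumulates gradient times input-space
    increment over each segment of the decoded path."""
    z_x, z_b = encode(x), encode(baseline)

    def point(alpha):
        z = [(1.0 - alpha) * b + alpha * t for b, t in zip(z_b, z_x)]
        return decode(z)

    attr = [0.0] * len(x)
    prev = point(0.0)
    for k in range(1, steps + 1):
        cur = point(k / steps)
        mid = [0.5 * (p + c) for p, c in zip(prev, cur)]
        g = grad_f(mid)
        for i in range(len(attr)):
            attr[i] += g[i] * (cur[i] - prev[i])
        prev = cur
    return attr, point(0.0), point(1.0)

x = [1.0, 2.0, 2.0]          # lies on the toy manifold, since 1 * 2 = 2
baseline = [0.0, 0.0, 0.0]
attr, start, end = latent_path_attribution(x, baseline)

# Completeness holds between the decoded endpoints, not the raw inputs.
gap = abs(sum(attr) - (f(end) - f(start)))
```

One design consequence worth noting: the path ends at `decode(encode(x))`, so the attributions account for the classifier output at the reconstruction rather than at `x` itself. With a real VAE the two differ by the reconstruction error, which is why reconstruction quality matters for this kind of method.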

If this is right

MA-GIG yields higher-fidelity explanations than prior path-based methods across multiple datasets and classifiers.
The method reduces off-manifold noise by restricting gradient aggregation to decoded points near the input.
Explanations become more reliable for diagnosing model behavior because paths avoid regions with meaningless gradients.
The approach extends the axiomatic benefits of Integrated Gradients while mitigating a known practical failure mode.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent-space alignment idea could be tested with other generative models such as diffusion models or GANs to see if the benefit is specific to VAEs.
High-stakes applications like medical imaging or autonomous driving might see more stable debugging if this manifold constraint is adopted.
If VAE training data differs substantially from the classifier training data, the benefit could shrink, suggesting a need for joint training or domain-matched VAEs.

Load-bearing premise

Decoding intermediate latent states from the pre-trained VAE produces inputs that lie sufficiently close to the true data manifold without adding significant artifacts or gradient biases.

What would settle it

A direct comparison showing that MA-GIG attributions have equal or lower fidelity than Guided IG when evaluated on a dataset where the VAE reconstruction error is large would falsify the claim that manifold alignment reliably improves explanations.

Figures

Figures reproduced from arXiv: 2605.02167 by Jaesik Choi, Kyowoon Lee, Seongwoo Lim, Soyeon Kim.

**Figure 1.** Overview of Manifold-Aligned Guided Integrated Gradients (MA-GIG). (a) Noise-Robust Gradient Guidance: The visualization compares integration paths from the baseline to the input on the classifier’s logit surface f(x). The MA-GIG path (green solid line) traverses noise-robust regions, avoiding the high-frequency regions traversed by the linear IG path (pink dotted line). This is achieved by a gradient magni…

**Figure 2.** Qualitative comparison of attribution maps on ImageNet (InceptionV1), Oxford-IIIT Pet (ResNet18), and Oxford 102 Flower (VGG16) against baselines. Labels indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 3.** LPIPS-based manifold alignment path analysis on ImageNet (ResNet18). The LPIPS distance from each intermediate sample γ(α) to the input. Plot axes: LPIPS versus path position (0 = baseline, 1 = input); series: GIG (input-space) and MA-GIG (latent-space).

Adjacent text from the paper: reconstruction MSE shows only a moderate correlation with DiffID (average Pearson r = 0.406) and Insertion AUC (r = 0.530), suggesting that domain alignment and task-relevant latent structure matter beyond pixel-level reconstruction fidelity. Effect of Slerp. We investi…

**Figure 4.** Hyperparameter sensitivity and ablation analysis. (a) The model performance remains consistent across varying feature selection fractions. (b) We compare different pre-trained VAE backbones and the effect of Spherical Linear Interpolation (Slerp). Slerp (dashed lines) shows mixed changes relative to linear interpolation (solid lines), with no consistent DiffID gain. We therefore use linear interpolation as…

**Figure 5.** Qualitative comparison against baselines on ImageNet (InceptionV1). Left labels indicate the predicted class, and numbers in brackets denote confidence.

**Figure 6.** Qualitative comparison on Oxford 102 Flower (VGG16). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 7.** Qualitative comparison on Oxford-IIIT Pet (ResNet18). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 8.** Qualitative comparison on ImageNet2012 (ResNet18 & VGG16). The left and right panels display results for the ResNet18 and VGG16 classifiers, respectively. For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integra…

**Figure 9.** Qualitative comparison on ImageNet2012 (InceptionV1) and Oxford-IIIT Pet (ResNet18). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. …

**Figure 10.** Qualitative comparison on Oxford-IIIT Pet (VGG16 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confidence)

**Figure 11.** Qualitative comparison on Oxford 102 Flower (ResNet18 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confid…

**Figure 12.** LPIPS distance along the integration path on fine-grained datasets. We measure LPIPS-based perceptual deviation of each intermediate sample γ(α) for (a) Oxford-IIIT Pet and (b) Oxford 102 Flower using ResNet18.

**Figure 13.** Average classifier confidence along the integration path (α from 0 to 1) on three classifiers. We measure the softmax score of each intermediate sample γ(α) for (a) ResNet18, (b) VGG16, and (c) InceptionV1 on ImageNet2012.
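The Figure 4 caption contrasts spherical linear interpolation (Slerp) with plain linear interpolation of latent codes. The sketch below shows the textbook form of both on short Python lists. It is a generic illustration rather than the paper's implementation: the function names, the `eps` threshold, and the fallback to linear interpolation for degenerate inputs are assumptions.

```python
import math

def lerp(z0, z1, alpha):
    """Linear interpolation between two latent vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z0, z1)]

def slerp(z0, z1, alpha, eps=1e-8):
    """Spherical linear interpolation between two latent vectors.

    Falls back to lerp when either vector is (near) zero or when the two
    vectors are (near) parallel, where the spherical formula is ill-defined."""
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    if n0 < eps or n1 < eps:
        return lerp(z0, z1, alpha)
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp(z0, z1, alpha)
    w0 = math.sin((1.0 - alpha) * omega) / sin_omega
    w1 = math.sin(alpha * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

z0, z1 = [1.0, 0.0], [0.0, 1.0]
mid_lerp = lerp(z0, z1, 0.5)    # cuts through the interior of the circle
mid_slerp = slerp(z0, z1, 0.5)  # stays on the unit circle
```

For two unit vectors, the Slerp midpoint keeps unit norm while the linear midpoint shrinks toward the origin. That difference is the usual motivation for Slerp in Gaussian latent spaces; the caption reports that it did not translate into a consistent DiffID gain here.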

Abstract

Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose Manifold-Aligned Guided Integrated Gradients (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes Manifold-Aligned Guided Integrated Gradients (MA-GIG), which performs the integration path of Integrated Gradients inside the latent space of a pre-trained VAE rather than in input space. Intermediate latent points are decoded to input space to produce a path that is biased toward the learned generative manifold, with the goal of reducing exposure to off-manifold regions that produce noisy gradients. The authors claim that this yields more faithful attributions than Guided IG and other path-based baselines, supported by qualitative visualizations and quantitative comparisons across multiple datasets and classifiers.

Significance. If the central mechanism holds, the work offers a practical way to improve the reliability of axiomatic attribution methods by leveraging existing generative models to constrain paths to the data manifold. The code release supports reproducibility. However, the significance is limited by the method's dependence on VAE reconstruction quality, which is not isolated in the experiments and may not generalize to domains where VAEs exhibit posterior collapse or high reconstruction error.

major comments (3)

[§3.2] §3.2 (Method): The claim that decoded latent states remain 'proximal to the input' and thereby reduce off-manifold noise rests on the untested assumption that the pre-trained VAE produces faithful reconstructions. No reconstruction error, FID scores, or manifold-distance metrics are reported to quantify how close the decoded path points actually lie to the true data manifold.
[§4.3] §4.3 (Quantitative Evaluation): The reported outperformance over Guided IG is not accompanied by an ablation that disables the VAE decoding step while retaining the guidance rule. Without this isolation, it is impossible to determine whether gains derive from manifold alignment or from other implementation details of the path construction.
[Table 2] Table 2 (or equivalent results table): Performance differences are presented without error bars, standard deviations across runs, or statistical significance tests, making it difficult to assess whether the claimed superiority is robust or could be explained by variance in the experimental setup.

minor comments (2)

[Abstract] The abstract states that MA-GIG 'aggregates gradients on path features proximal to the input,' but the precise aggregation rule (e.g., weighting or selection of points) is not formalized with an equation in §3.
[§3.1] Notation for the latent-space path (e.g., z(t) vs. x(t)) is introduced without an explicit comparison table to the standard IG formulation, which would aid readability.
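
The side-by-side notation the last comment asks for might look like the sketch below. The first line is the standard IG definition and the second is the generic path-integral form. The third line is only one plausible reading of a latent-space path: the symbols E and D (VAE encoder mean and decoder), the baseline x', and the linear form of z(α) are assumptions, since the paper's own equations are not reproduced on this page, and a guided update would make the latent path piecewise rather than a single straight segment.

```latex
\begin{align}
\mathrm{IG}_i(x) &= (x_i - x'_i)\int_0^1
  \frac{\partial f\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\,d\alpha, \\
\mathrm{PathIG}^{\gamma}_i(x) &= \int_0^1
  \frac{\partial f\bigl(\gamma(\alpha)\bigr)}{\partial \gamma_i(\alpha)}\,
  \frac{d\gamma_i(\alpha)}{d\alpha}\,d\alpha, \\
\gamma(\alpha) &= D\bigl(z(\alpha)\bigr), \qquad
  z(\alpha) = (1-\alpha)\,E(x') + \alpha\,E(x).
\end{align}
```

Setting γ(α) = x' + α(x − x') in the second line recovers the first, which is the sense in which a latent-space path generalizes the standard formulation.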

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we will make to improve the clarity and rigor of our empirical evaluations.

Referee: [§3.2] §3.2 (Method): The claim that decoded latent states remain 'proximal to the input' and thereby reduce off-manifold noise rests on the untested assumption that the pre-trained VAE produces faithful reconstructions. No reconstruction error, FID scores, or manifold-distance metrics are reported to quantify how close the decoded path points actually lie to the true data manifold.

Authors: We agree that providing quantitative evidence of VAE reconstruction quality is important to support our claims. In the revised manuscript, we will report reconstruction errors (MSE) on held-out test data for each dataset and include FID scores where applicable to quantify the closeness of decoded points to the data manifold. This will help validate that the integration paths remain proximal to the input and reduce off-manifold exposure. revision: yes

Referee: [§4.3] §4.3 (Quantitative Evaluation): The reported outperformance over Guided IG is not accompanied by an ablation that disables the VAE decoding step while retaining the guidance rule. Without this isolation, it is impossible to determine whether gains derive from manifold alignment or from other implementation details of the path construction.

Authors: We acknowledge the value of such an ablation for isolating the contribution of manifold alignment. However, the guidance mechanism in MA-GIG is inherently tied to the latent space representation and the decoding step to compute gradients in input space. Disabling decoding while keeping the guidance rule would require a fundamentally different implementation that no longer aligns with the proposed method. We will add a detailed discussion in §4.3 explaining this design choice and include an alternative ablation, such as varying the VAE's reconstruction fidelity or comparing against a non-generative baseline with similar path guidance, to better isolate the effect. revision: partial

Referee: [Table 2] Table 2 (or equivalent results table): Performance differences are presented without error bars, standard deviations across runs, or statistical significance tests, making it difficult to assess whether the claimed superiority is robust or could be explained by variance in the experimental setup.

Authors: We appreciate this suggestion for improving the statistical robustness of our results. In the revised version, we will rerun the experiments with multiple random seeds (at least 5), report mean performance with standard deviations, add error bars to relevant tables and figures, and include statistical significance tests (e.g., Wilcoxon signed-rank tests) with p-values to confirm the observed improvements are significant. revision: yes
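
The reporting promised in this response (mean, standard deviation, and a Wilcoxon signed-rank test) can be sketched with the Python standard library. This is an illustrative sketch only: the per-seed scores are placeholder numbers, not results from the paper, and the helper names `average_ranks` and `wilcoxon_exact` are hypothetical. The p-value is computed by exact enumeration of sign assignments, which is practical only for small sample sizes.

```python
import itertools
import statistics

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_exact(a, b):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    if not diffs:
        return 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = abs(w_plus - total / 2.0)
    hits = 0
    # Enumerate every assignment of signs to the ranks (2**n cases).
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2.0) >= observed - 1e-12:
            hits += 1
    return hits / 2 ** len(ranks)

# Placeholder per-seed scores for two methods (made up for illustration).
ma_gig = [0.61, 0.63, 0.62, 0.64, 0.60]
gig = [0.55, 0.58, 0.54, 0.57, 0.56]

summary = {
    "MA-GIG": (statistics.mean(ma_gig), statistics.stdev(ma_gig)),
    "GIG": (statistics.mean(gig), statistics.stdev(gig)),
}
p_value = wilcoxon_exact(ma_gig, gig)  # 0.0625 for five all-positive differences
```

If the test is paired across seeds, five seeds cap the two-sided exact p-value at 2 / 2**5 = 0.0625 even when every seed favors one method, so a conventional 0.05 threshold would need at least six pairs or pairing at the per-image level.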

Circularity Check

0 steps flagged

No circularity: MA-GIG derivation is self-contained via external VAE and empirical validation

The paper proposes Manifold-Aligned Guided Integrated Gradients by constructing attribution paths in the latent space of a pre-trained variational autoencoder and decoding intermediate states to bias toward the generative manifold. No derivation step reduces a claimed prediction or result to its own inputs by construction, nor does any load-bearing premise rely on self-citation chains, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation. The central claims rest on qualitative/quantitative evaluations across datasets and classifiers rather than definitional equivalence or fitted-parameter renaming. This is the normal case of an independent methodological contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the VAE latent space alignment reduces off-manifold exposure effectively.

axioms (1)

domain assumption: The pre-trained variational autoencoder accurately captures the data manifold.
The method depends on the VAE providing a good representation of plausible inputs.

pith-pipeline@v0.9.0 · 5731 in / 1331 out tokens · 75319 ms · 2026-05-20T23:47:33.709769+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "constructs attribution paths in the latent space of a pre-trained variational autoencoder... By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: "the decoder is a smooth immersion... Im(J_D(z)) = T_{D(z)} M"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
cs.CV 2026-05 unverdicted novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrat...

