Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution

Jaesik Choi; Kyowoon Lee; Seongwoo Lim; Soyeon Kim

arXiv: 2605.02167 · v3 · pith:HRFP2HBB · submitted 2026-05-04 · 💻 cs.LG · cs.AI · cs.CV

Pith reviewed 2026-05-08 19:04 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.CV

keywords: gradients · attribution · integrated · ma-gig · guided · path · reduces · explanations

The pith

By constructing attribution paths in a variational autoencoder's latent space, MA-GIG produces more faithful feature attributions than standard path-based methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Feature attribution methods like Integrated Gradients can give unreliable results when their integration paths stray into unrealistic parts of the input space. The authors introduce Manifold-Aligned Guided Integrated Gradients to fix this by moving the path construction into the latent space of a pre-trained variational autoencoder. Intermediate points are decoded back to input space, which pulls the path toward the learned data manifold. This approach reduces noise from off-manifold regions and leads to explanations that better match how the model actually behaves near the input. A reader would care because accurate explanations help build trust in and debug complex neural network models.
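As a point of reference for the straight-line paths criticized here, standard Integrated Gradients can be sketched with a midpoint Riemann sum. This is a minimal illustration, not code from the paper or its repository: the quadratic toy logit `toy_logit`, its analytic gradient `toy_grad`, and the step count are hypothetical stand-ins for a real classifier and autograd.

```python
import numpy as np

def toy_logit(x, w):
    # Hypothetical stand-in for a classifier logit: f(x) = sum_i w_i * x_i**2.
    return float(np.sum(w * x ** 2))

def toy_grad(x, w):
    # Analytic gradient of toy_logit; a real classifier would use autograd.
    return 2.0 * w * x

def integrated_gradients(x, baseline, w, steps=32):
    # Midpoint Riemann sum along the straight line gamma(a) = baseline + a * (x - baseline).
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([toy_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.8, 0.3, -1.2])
b = np.zeros_like(x)
attr = integrated_gradients(x, b, w)
# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr.sum(), toy_logit(x, w) - toy_logit(b, w))
```

Because the toy integrand is linear in α, the midpoint rule is exact here and the two printed values agree to floating-point precision; for a real network, completeness holds only approximately and improves with more steps.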

Core claim

MA-GIG constructs attribution paths by interpolating in the latent space of a pre-trained variational autoencoder and decoding intermediate latent states to input space. This biases the path toward the learned generative manifold and reduces exposure to implausible regions with noisy gradients. Aggregating gradients along these manifold-aligned paths yields explanations that are more faithful to the model's predictions on features proximal to the input, and MA-GIG outperforms prior path-based methods across datasets and classifiers.
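The three steps in the core claim (interpolate in latent space, decode, aggregate gradients along the decoded path) can be illustrated with a toy model. This sketch assumes a hypothetical nonlinear decoder `decode(z) = tanh(A z)` in place of a pre-trained VAE decoder, given latent codes for the baseline and the input, and a quadratic toy logit as the classifier. It uses plain linear latent interpolation and omits MA-GIG's guided (adaptive) feature updates, so it is not the authors' algorithm.

```python
import numpy as np

def decode(z, A):
    # Hypothetical nonlinear decoder standing in for a pre-trained VAE decoder.
    return np.tanh(A @ z)

def toy_logit(x, w):
    # Hypothetical classifier logit f(x) = sum_i w_i * x_i**2.
    return float(np.sum(w * x ** 2))

def toy_grad(x, w):
    # Analytic gradient of toy_logit.
    return 2.0 * w * x

def latent_path_attributions(z_base, z_input, A, w, steps=200):
    # 1) Interpolate linearly in latent space; 2) decode every point to input space.
    alphas = np.linspace(0.0, 1.0, steps + 1)
    path = np.stack([decode(z_base + a * (z_input - z_base), A) for a in alphas])
    # 3) Accumulate gradient * displacement along the decoded (curved) path.
    attr = np.zeros(path.shape[1])
    for k in range(steps):
        delta = path[k + 1] - path[k]
        midpoint = 0.5 * (path[k] + path[k + 1])
        attr += toy_grad(midpoint, w) * delta
    return attr, path

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))        # hypothetical decoder weights: latent dim 3 -> input dim 6
w = rng.uniform(0.5, 2.0, size=6)
z_base = np.zeros(3)               # assumed latent code of the baseline
z_input = rng.normal(size=3)       # assumed latent code of the input (e.g. an encoder mean)
attr, path = latent_path_attributions(z_base, z_input, A, w)
print(attr.sum(), toy_logit(path[-1], w) - toy_logit(path[0], w))
```

Since the decoded path is curved, attributions are accumulated as gradient times displacement per segment rather than as (x - baseline) times the mean gradient. Completeness in this sketch holds relative to the decoded endpoints `path[0]` and `path[-1]`, which differ from the true baseline and input whenever the VAE reconstruction is imperfect; this page does not say how MA-GIG handles that gap.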

What carries the argument

The manifold-aligned integration path: latent-space interpolation in a variational autoencoder followed by decoding to input space, which is intended to keep gradient aggregation on plausible data points near the input.

Load-bearing premise

A pre-trained variational autoencoder must accurately capture the data manifold so that decoded latent-space paths yield gradient aggregations more faithful than those from input-space paths.

What would settle it

If quantitative faithfulness metrics on a dataset with high VAE reconstruction error show MA-GIG underperforming standard Guided Integrated Gradients, the claim that manifold alignment improves attributions would be falsified.
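Insertion AUC, one of the faithfulness metrics quoted in the figure text alongside DiffID, is the kind of quantity such a test would compare. A minimal sketch, assuming per-feature insertion into a zero baseline, a hypothetical linear `toy_score` in place of a classifier, and a trapezoidal area over the fraction of inserted features; the paper's exact protocol (pixel ordering, baseline image, step size) is not given on this page, and DiffID is not reproduced because it is not defined here.

```python
import numpy as np

def insertion_auc(x, baseline, attribution, score_fn):
    # Insert input features into the baseline from highest to lowest attribution,
    # recording the model score after each insertion.
    order = np.argsort(-attribution)
    current = np.asarray(baseline, dtype=float).copy()
    scores = [score_fn(current)]
    for idx in order:
        current[idx] = x[idx]
        scores.append(score_fn(current))
    scores = np.asarray(scores)
    # Trapezoidal area under the score curve; x-axis = fraction of features inserted.
    fractions = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum(0.5 * (scores[1:] + scores[:-1]) * np.diff(fractions)))

w = np.array([0.5, 2.0, -1.0, 1.0])

def toy_score(v):
    # Hypothetical linear classifier score.
    return float(w @ v)

x = np.ones(4)
b = np.zeros(4)
attr = w * (x - b)                              # exact attributions for a linear model
good = insertion_auc(x, b, attr, toy_score)     # 2.4375
bad = insertion_auc(x, b, -attr, toy_score)     # 0.0625
print(good, bad)
```

Ranking features by a correct attribution inserts the largest contributions first, so its AUC beats the reversed ranking; a more faithful attribution method should score higher on this curve.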

Figures

Figures reproduced from arXiv: 2605.02167 by Jaesik Choi, Kyowoon Lee, Seongwoo Lim, Soyeon Kim.

**Figure 1.** Overview of Manifold-Aligned Guided Integrated Gradients (MA-GIG). a) Noise-Robust Gradient Guidance: The visualization compares integration paths from the baseline to the input on the classifier’s logit surface f(x). The MA-GIG path (green solid line) traverses noise-robust regions, avoiding the high-frequency regions traversed by the linear IG path (pink dotted line). This is achieved by a gradient magni…

**Figure 2.** Qualitative comparison of attribution maps on ImageNet (InceptionV1), Oxford-IIIT Pet (ResNet18), and Oxford 102 Flower (VGG16) against baselines. Labels indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 3.** LPIPS-based manifold alignment path analysis on ImageNet (ResNet18): the LPIPS distance from each intermediate sample γ(α) to the input, for GIG (input-space) and MA-GIG (latent-space), plotted against path position (0 = baseline, 1 = input).

…reconstruction MSE shows only a moderate correlation with DiffID (average Pearson r = 0.406) and Insertion AUC (r = 0.530), suggesting that domain alignment and task-relevant latent structure matter beyond pixel-level reconstruction fidelity. Effect of Slerp. We investi…

**Figure 4.** Hyperparameter sensitivity and ablation analysis. (a) The model performance remains consistent across varying feature selection fractions. (b) We compare different pre-trained VAE backbones and the effect of Spherical Linear Interpolation (Slerp). Slerp (dashed lines) shows mixed changes relative to linear interpolation (solid lines), with no consistent DiffID gain. We therefore use linear interpolation as…

**Figure 5.** Qualitative comparison against baselines on ImageNet (InceptionV1). Left labels indicate the predicted class, and numbers in brackets denote confidence.

**Figure 6.** Qualitative comparison on Oxford 102 Flower (VGG16). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 7.** Qualitative comparison on Oxford-IIIT Pet (ResNet18). Labels on the left indicate predicted classes, and numbers in brackets denote prediction confidence.

**Figure 8.** Qualitative comparison on ImageNet2012 (ResNet18 & VGG16). The left and right panels display results for the ResNet18 and VGG16 classifiers, respectively. For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integra…

**Figure 9.** Qualitative comparison on ImageNet2012 (InceptionV1) and Oxford-IIIT Pet (ResNet18). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. …

**Figure 10.** Qualitative comparison on Oxford-IIIT Pet (VGG16 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confidence)

**Figure 11.** Qualitative comparison on Oxford 102 Flower (ResNet18 & InceptionV1). For each example, the top row presents the attribution maps of IG, EIG, MIG, GIG, and MA-GIG. The second and third rows visualize the evolution of path features and their corresponding gradients, sampled at nine equally spaced intervals along the integration path, demonstrating how MA-GIG aggregates relevant attributions. (Conf.: Confid…

**Figure 12.** LPIPS distance along the integration path on fine-grained datasets. We measure LPIPS-based perceptual deviation of each intermediate sample γ(α) for (a) Oxford-IIIT Pet and (b) Oxford 102 Flower using ResNet18.

**Figure 13.** Average classifier confidence along the integration path (α from 0 to 1) on three classifiers. We measure the softmax score of each intermediate sample γ(α) for (a) ResNet18, (b) VGG16, and (c) InceptionV1 on ImageNet2012.
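Figure 4(b) compares linear latent interpolation with Spherical Linear Interpolation (Slerp). Both rules can be sketched as follows; the zero-norm fallback to linear interpolation is an assumption of this sketch, since the page does not state how Slerp is applied when, for example, the baseline's latent code is zero.

```python
import numpy as np

def lerp(z0, z1, alpha):
    # Linear interpolation between two latent codes.
    return (1.0 - alpha) * z0 + alpha * z1

def slerp(z0, z1, alpha, eps=1e-8):
    # Spherical linear interpolation; falls back to lerp when the angle is
    # ill-defined (a zero-norm code) or the codes are (anti)parallel.
    n0, n1 = np.linalg.norm(z0), np.linalg.norm(z1)
    if n0 < eps or n1 < eps:
        return lerp(z0, z1, alpha)
    cos_omega = np.clip(np.dot(z0, z1) / (n0 * n1), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    if np.sin(omega) < eps:
        return lerp(z0, z1, alpha)
    return (np.sin((1.0 - alpha) * omega) * z0 + np.sin(alpha * omega) * z1) / np.sin(omega)

z0 = np.array([1.0, 0.0])
z1 = np.array([0.0, 1.0])
print(np.linalg.norm(lerp(z0, z1, 0.5)), np.linalg.norm(slerp(z0, z1, 0.5)))
```

The difference shows at the midpoint: linear interpolation between two unit-norm codes shrinks the norm to about 0.71, while Slerp stays on the unit sphere. Keeping norms on a shell is the usual motivation for Slerp with high-dimensional Gaussian latents; per Figure 4(b) it gave no consistent DiffID gain here, and the paper uses linear interpolation.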

Abstract

Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose Manifold-Aligned Guided Integrated Gradients (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.
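The abstract's description of Guided Integrated Gradients (adaptively moving low-gradient-magnitude features first) can be sketched as follows. This is a simplified illustration, not the reference Guided IG implementation: the fixed per-step L1 budget, the `fraction` of remaining features selected per step, the final closing segment, and accumulation with the segment-midpoint gradient are all assumptions, and `toy_grad` is a hypothetical quadratic-model gradient.

```python
import numpy as np

def guided_ig(x, baseline, grad_fn, steps=64, fraction=0.25):
    # Simplified Guided-IG-style path: remaining features with the smallest
    # |gradient| move toward the input first.
    x = np.asarray(x, dtype=float)
    current = np.asarray(baseline, dtype=float).copy()
    attr = np.zeros_like(current)
    budget = np.abs(x - current).sum() / steps   # L1 distance covered per step
    for _ in range(steps):
        remaining = np.flatnonzero(~np.isclose(current, x))
        if remaining.size == 0:
            break
        g = grad_fn(current)
        k = max(1, int(np.ceil(fraction * remaining.size)))
        chosen = remaining[np.argsort(np.abs(g[remaining]))[:k]]
        gap = x[chosen] - current[chosen]
        scale = min(1.0, budget / np.abs(gap).sum())
        nxt = current.copy()
        nxt[chosen] += scale * gap
        # Midpoint gradient times displacement for this straight segment.
        attr += grad_fn(0.5 * (current + nxt)) * (nxt - current)
        current = nxt
    # Close any leftover gap with one final straight segment ending at x.
    attr += grad_fn(0.5 * (current + x)) * (x - current)
    return attr

w = np.array([3.0, 0.1, 1.0, 0.5])

def toy_grad(v):
    # Gradient of the hypothetical toy logit f(v) = sum(w * v**2).
    return 2.0 * w * v

x = np.array([1.0, -2.0, 0.5, 1.5])
b = np.zeros(4)
attr = guided_ig(x, b, toy_grad)
print(attr, attr.sum(), np.sum(w * x ** 2))
```

Because every segment is straight and the toy logit is quadratic, midpoint accumulation makes completeness exact here. The path itself still stays in input space, which is the limitation MA-GIG addresses by constructing the path in a VAE latent space instead.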

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Manifold-Aligned Guided Integrated Gradients (MA-GIG), an extension of Integrated Gradients (IG) that constructs attribution paths by interpolating in the latent space of a pre-trained variational autoencoder (VAE) and decoding the points back to input space. This is intended to bias paths toward the learned data manifold, reducing exposure to off-manifold regions with noisy gradients compared to standard input-space straight-line paths or Guided IG. The authors report qualitative visualizations and quantitative metrics showing improved faithfulness and outperformance over prior path-based attribution methods on multiple datasets and classifiers, with code released at the provided GitHub link.

Significance. If the central claim holds, the method offers a practical way to improve the reliability of axiomatic feature attributions by leveraging generative models for path construction. This addresses a recognized weakness of IG variants in high-dimensional data where straight-line paths often traverse low-density regions. The open-source code is a clear strength for reproducibility. However, the significance is tempered by the dependence on VAE quality, which is not yet shown to be robust across varying manifold approximations.

major comments (2)

[§3] §3 (Method, latent-space path construction): The assertion that decoded latent interpolations yield gradients 'proximal to the input' and reduce off-manifold noise is load-bearing for the faithfulness claim, yet no metric (e.g., reconstruction error, density estimation, or gradient-norm comparison on vs. off manifold) is reported to quantify how faithfully the VAE manifold approximates the true data distribution on the evaluation sets.
[§4] §4 (Experiments, quantitative results): The reported outperformance over baselines is presented as evidence of reduced off-manifold noise, but the evaluation lacks an ablation on VAE fidelity (different architectures, training regimes, or reconstruction PSNR/SSIM on test data). Without this, it remains possible that gains arise from the specific VAE rather than a general manifold-alignment benefit, undermining the generalizability claim.
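The reconstruction-fidelity numbers the referee asks for are straightforward to compute per image. A minimal sketch of MSE and PSNR, assuming images scaled to [0, 1]; `x_rec` is a hypothetical reconstruction, and SSIM is omitted because it needs windowed local statistics (scikit-image provides it as `skimage.metrics.structural_similarity`).

```python
import numpy as np

def reconstruction_mse(x, x_rec):
    # Mean squared error between an image and its VAE reconstruction.
    return float(np.mean((np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float)) ** 2))

def psnr(x, x_rec, max_value=1.0):
    # Peak signal-to-noise ratio in dB for images scaled to [0, max_value].
    mse = reconstruction_mse(x, x_rec)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

x = np.zeros((8, 8, 3))
x_rec = x + 0.1          # hypothetical reconstruction, off by 0.1 everywhere
print(reconstruction_mse(x, x_rec), psnr(x, x_rec))   # about 0.01 and 20 dB
```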

minor comments (2)

[Abstract / §3] The phrasing 'aggregating gradients on path features proximal to the input' in the abstract and §3 could be made more precise by defining 'proximal' in terms of a distance or density threshold.
[Figures] Figure captions and axis labels in the qualitative results could explicitly state the VAE architecture and latent dimension used, to aid replication.
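One way to make "proximal to the input" precise, as the first minor comment asks, is to report the distance from each path point γ(α) to the input and the share of points within a chosen radius. The sketch below is hypothetical: it uses plain L2 distance as a stand-in for the LPIPS distance plotted in Figure 3, and the `radius` threshold is an arbitrary illustrative choice.

```python
import numpy as np

def distance_profile(path, x):
    # L2 distance from each path point gamma(alpha_k) to the input x
    # (a stand-in for the LPIPS distance used in Figure 3).
    flat = np.asarray(path, dtype=float).reshape(len(path), -1)
    return np.linalg.norm(flat - np.asarray(x, dtype=float).reshape(1, -1), axis=1)

def proximal_fraction(path, x, radius):
    # Hypothetical definition: share of path points within `radius` of the input.
    return float(np.mean(distance_profile(path, x) <= radius))

alphas = np.linspace(0.0, 1.0, 11)
baseline = np.zeros(4)
x = np.array([3.0, 4.0, 0.0, 0.0])        # ||x - baseline|| = 5
straight = np.stack([baseline + a * (x - baseline) for a in alphas])
print(distance_profile(straight, x))      # falls linearly from 5 to 0
print(proximal_fraction(straight, x, radius=2.6))   # 6 of 11 points
```

On a straight-line path the profile falls linearly to 0; a manifold-aligned path would be compared with the same profile, using LPIPS or a density estimate in place of L2.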

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point by point below, agreeing where the concerns are valid and outlining specific revisions to strengthen the presentation of our claims.

Point-by-point responses

Referee: [§3] §3 (Method, latent-space path construction): The assertion that decoded latent interpolations yield gradients 'proximal to the input' and reduce off-manifold noise is load-bearing for the faithfulness claim, yet no metric (e.g., reconstruction error, density estimation, or gradient-norm comparison on vs. off manifold) is reported to quantify how faithfully the VAE manifold approximates the true data distribution on the evaluation sets.

Authors: We agree that direct metrics quantifying VAE manifold fidelity would make the central claim more robust. In the revised manuscript we will report reconstruction errors (MSE) on the test sets for all datasets and include a brief comparison of gradient norms for decoded vs. straight-line path points. These additions will provide explicit evidence that decoded points remain closer to the data distribution. revision: yes
Referee: [§4] §4 (Experiments, quantitative results): The reported outperformance over baselines is presented as evidence of reduced off-manifold noise, but the evaluation lacks an ablation on VAE fidelity (different architectures, training regimes, or reconstruction PSNR/SSIM on test data). Without this, it remains possible that gains arise from the specific VAE rather than a general manifold-alignment benefit, undermining the generalizability claim.

Authors: The referee correctly identifies a gap in demonstrating that the benefit is due to manifold alignment in general rather than the particular VAE chosen. We will add an ablation study in the revision that varies VAE training duration and latent dimension on one dataset, reports the resulting reconstruction quality, and shows the corresponding change in MA-GIG faithfulness scores. This will directly link attribution performance to manifold approximation quality. revision: yes
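The proposed ablation would relate reconstruction quality to attribution faithfulness across VAE variants, which is what the Pearson correlations quoted with Figure 3 (average r = 0.406 with DiffID, r = 0.530 with Insertion AUC) summarize. A minimal sketch of that statistic; the input values below are synthetic placeholders, not results from the paper.

```python
import numpy as np

def pearson_r(a, b):
    # Pearson correlation between two 1-D sequences, e.g. per-VAE reconstruction
    # MSE and per-VAE faithfulness score.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(a_c @ b_c / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

# Synthetic placeholder values, not data from the paper.
recon_mse = [1.0, 2.0, 3.0, 4.0]
faithfulness = [4.0, 3.0, 2.0, 1.0]
print(pearson_r(recon_mse, faithfulness))   # -1.0 (perfectly anti-correlated)
```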

Circularity Check

0 steps flagged

No circularity: algorithmic construction with external VAE and empirical validation

full rationale

The paper presents MA-GIG as an algorithmic modification to Integrated Gradients that interpolates in the latent space of a separately pre-trained VAE, decodes points, and aggregates gradients along the resulting path. No equations, derivations, or self-citations are supplied that reduce the claimed reduction in off-manifold noise or the reported performance gains to a quantity defined by the method's own fitted parameters or prior outputs. The central claim rests on the external VAE manifold approximation plus qualitative/quantitative evaluations on multiple datasets, which are independent of any self-referential loop. This is a standard proposal of a new heuristic with external dependencies and benchmarks, not a closed derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that a pre-trained variational autoencoder faithfully represents the data manifold. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: a pre-trained variational autoencoder accurately models the underlying data manifold.
The method requires that decoded latent trajectories remain proximal to realistic inputs; the abstract does not discuss how the VAE is trained or validated.

pith-pipeline@v0.9.0 · 5500 in / 1340 out tokens · 87056 ms · 2026-05-08T19:04:06.763098+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
cs.CV 2026-05 unverdicted novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrat...