AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers

2026 · cs.LG · arXiv 2605.12816

1 Pith paper cites this work. Polarity classification is still being indexed.


abstract

The Average Gradient Outer Product (AGOP) governs feature learning in neural networks: the Neural Feature Ansatz states that weight Gram matrices at each layer align with the corresponding AGOP matrices computed over the training distribution. We ask a complementary question: can this same quantity serve as a post-hoc attribution method for explaining individual predictions? We introduce AGOP-Weighted: a novel attribution method that multiplies the per-sample gradient by sqrt(diag(M) / max diag(M)), a training-distribution prior that suppresses gradient noise and amplifies consistently important pixels -- a combination not present in any prior attribution method. We formalise two companion variants -- AGOP-Local (per-sample gradient, equivalent to VanillaGrad) and AGOP-Global (diag(M) directly as a zero-cost saliency map) -- and implement an efficient training-time accumulation hook; AGOP-Global then requires zero inference cost (disk lookup) while AGOP-Weighted requires only a single gradient pass. We conduct the first rigorous comparison of AGOP attribution against Integrated Gradients (IG), SmoothGrad, GradCAM, and VanillaGrad across two benchmarks with pixel-level ground truth: (i) the synthetic XAI-TRIS benchmark (four classification scenarios, 8x8 images, CNN8by8) and (ii) the photorealistic CLEVR-XAI benchmark (ResNet-18 fine-tuned from ImageNet). AGOP-Weighted achieves 44% higher mIoU than IG on linear tasks; AGOP-Global achieves 7x higher mIoU than IG on multiplicative tasks (where IG falls below random) at zero inference cost. Both findings generalise to ResNet-18 on CLEVR-XAI (+18% and +37% respectively). We further show that GradCAM fails on small-resolution images due to spatial resolution collapse, and that diag(M) quality improves monotonically throughout training even after classification accuracy has plateaued.
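The three variants named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation. It assumes per-sample input gradients are already available as a flattened array, and that diag(M) is the mean of squared per-sample gradients (the diagonal of the averaged outer product for a scalar output). The function names `agop_diag` and `agop_weighted` and the toy data are hypothetical.

```python
import numpy as np


def agop_diag(grads):
    """Diagonal of the average gradient outer product M = mean_i(g_i g_i^T).

    grads: array of shape (n_samples, n_features), one flattened input
    gradient per training sample. The diagonal of M is the per-feature
    mean of squared gradients, so the full matrix is never formed.
    """
    grads = np.asarray(grads, dtype=float)
    return np.mean(grads ** 2, axis=0)


def agop_weighted(grad, diag_m):
    """Per-sample gradient scaled by the prior sqrt(diag(M) / max diag(M))."""
    peak = diag_m.max()
    if peak == 0.0:
        # Assumption: an all-zero AGOP diagonal yields an all-zero map.
        return np.zeros_like(grad, dtype=float)
    prior = np.sqrt(diag_m / peak)
    return grad * prior


# Toy data (hypothetical): 16 "pixels", of which the first 4 carry signal.
rng = np.random.default_rng(0)
n_samples, n_features = 512, 16
signal = np.zeros(n_features)
signal[:4] = 1.0
train_grads = (
    signal * rng.normal(1.0, 0.1, size=(n_samples, 1))
    + 0.05 * rng.normal(size=(n_samples, n_features))
)

diag_m = agop_diag(train_grads)            # AGOP-Global: diag(M) as the saliency map
local_attr = train_grads[0]                # AGOP-Local: the raw per-sample gradient
weighted_attr = agop_weighted(local_attr, diag_m)  # AGOP-Weighted
prior = np.sqrt(diag_m / diag_m.max())
```

Because the prior lies in [0, 1], the weighted map never exceeds the raw gradient in magnitude. Features whose gradients are consistently small over the training set are scaled towards zero, which is the noise-suppression effect the abstract describes.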

representative citing papers

AGOP-IxG: A Gradient Covariance Filter for Local Feature Attribution on Tabular Data, with a Controlled Benchmark

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

AGOP-IxG filters per-sample gradients with a top-K truncated average gradient outer product matrix and outperforms SHAP, Integrated Gradients, InputXGradient, and LIME on Spearman correlation and noise mass across three synthetic tabular tasks while running 350-1650x faster.

