AGOP-IxG: A Gradient Covariance Filter for Local Feature Attribution on Tabular Data, with a Controlled Benchmark

Raj Kiran Gupta Katakam

arxiv: 2605.15700 · v1 · pith:FSZ4R55Rnew · submitted 2026-05-15 · 💻 cs.LG

Pith reviewed 2026-05-20 19:58 UTC · model grok-4.3

classification 💻 cs.LG

keywords feature attribution · tabular data · gradient outer product · local explanations · synthetic benchmarks · model interpretability · AutoML · SHAP comparison

The pith

AGOP-IxG filters per-sample gradients with a training-derived covariance matrix to improve local feature attributions on tabular classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AGOP-IxG as a method that pre-multiplies each input sample's gradient by a low-rank version of the average gradient outer product matrix computed over the training set. This produces feature attributions for tabular models that are compared against SHAP, Integrated Gradients, InputXGradient, and LIME on three synthetic datasets where ground-truth attributions can be calculated exactly. AGOP-IxG records the highest Spearman rank correlation with true attributions and assigns the least mass to noise features across linear, sparse nonlinear, and interaction-based tasks, while running several hundred times faster than SHAP. On two real tabular datasets the method yields global faithfulness scores within 1.7 percent of the other techniques when measured by the ROAR protocol. The design targets local per-sample explanations rather than global feature ranking.

Core claim

Pre-multiplying the per-sample gradient vector by the top-K rank-truncated Average Gradient Outer Product matrix produces attributions whose ordering and magnitude align more closely with analytically known ground-truth feature contributions on synthetic tabular classification tasks than the attributions returned by SHAP, Integrated Gradients, InputXGradient, or LIME, while requiring orders-of-magnitude less computation.
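In symbols, the claim can be sketched as follows. The definition of $M$ and its eigendecomposition follow the passage quoted from the paper on this page. The choice of scalar output $f$ (for example, the predicted-class logit), the notation $M_K$, $V_K$, $\Lambda_K$ for the top-$K$ truncation, and the final element-wise product with the input $x$ (inferred only from the "IxG" in the method's name) are assumptions, not statements taken from the paper.

```latex
\begin{aligned}
M &= \frac{1}{n}\sum_{i=1}^{n} g_i g_i^{\top}, \qquad g_i = \nabla_x f(x_i), \\
M &= V \Lambda V^{\top}, \qquad M_K = V_K \Lambda_K V_K^{\top}, \\
a(x) &= x \odot \bigl( M_K \, \nabla_x f(x) \bigr) \quad \text{(assumed form)}.
\end{aligned}
```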

What carries the argument

The top-K rank-truncated Average Gradient Outer Product (AGOP) matrix, which encodes the dominant directions of gradient variation across the training distribution and serves as a fixed linear filter applied to each test sample's gradient.
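A minimal NumPy sketch of such a filter, assuming the per-sample gradients have already been computed and stacked into an array. The function names `truncated_agop` and `agop_ixg`, the toy gradients, and the final element-wise multiplication by the input (inferred from the "IxG" in the method's name) are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def truncated_agop(train_grads: np.ndarray, k: int) -> np.ndarray:
    """Rank-k truncation of M = (1/n) * sum_i g_i g_i^T."""
    n = train_grads.shape[0]
    agop = train_grads.T @ train_grads / n       # (d, d), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(agop)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    v_k, lam_k = eigvecs[:, top], eigvals[top]
    return (v_k * lam_k) @ v_k.T                 # V_K diag(lam_K) V_K^T

def agop_ixg(x: np.ndarray, grad: np.ndarray, agop_k: np.ndarray) -> np.ndarray:
    """Filter one sample's gradient, then multiply element-wise by the input.
    ASSUMPTION: the input multiplication is inferred from the 'IxG' name."""
    return x * (agop_k @ grad)

rng = np.random.default_rng(0)
d, n = 6, 500
# Hypothetical training gradients: only the first two features carry signal.
scales = np.array([3.0, 2.0, 0.05, 0.05, 0.05, 0.05])
train_grads = rng.normal(size=(n, d)) * scales

agop_k = truncated_agop(train_grads, k=2)
x = np.ones(d)        # hypothetical test input
grad = np.ones(d)     # hypothetical test-point gradient
attr = agop_ixg(x, grad, agop_k)
```

In this toy setting the filter is computed once and reused for every test point, so each explanation costs one gradient evaluation plus one matrix-vector product.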

If this is right

On the linear synthetic task AGOP-IxG attains the highest Spearman correlation with the true linear coefficients.
Across all three synthetic tasks the method assigns lower total attribution mass to the injected noise features than any baseline.
On the interaction dataset AGOP-IxG records the best top-k precision for recovering the known interacting features.
Wall-clock time for a full test-set explanation is between 350 and 1,650 times lower than DeepExplainer SHAP under identical hardware.
Global ROAR AUC on Adult Income and Credit Card Default stays within roughly 1.7 percent relative difference of the other four methods.
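The three per-sample metrics named in the list above can be sketched as follows. This is one plausible reading of each metric, not the paper's evaluation code: the function names are hypothetical, the Spearman helper does not average tied ranks, and whether the paper ranks signed or absolute attributions is not stated on this page.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation as the Pearson correlation of ranks.
    Simplification: tied values are not assigned averaged ranks."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def noise_mass(attr, noise_idx):
    """Fraction of total absolute attribution placed on known-noise features."""
    mag = np.abs(attr)
    return float(mag[noise_idx].sum() / mag.sum())

def top_k_precision(attr, true_idx, k):
    """Share of the k largest-magnitude attributions that are truly relevant."""
    top = np.argsort(np.abs(attr))[::-1][:k]
    return len(set(top.tolist()) & set(true_idx)) / k

# Hypothetical values for illustration only (not results from the paper).
truth = np.array([0.9, 0.5, 0.3, 0.02, 0.01])
attr = np.array([0.8, 0.6, 0.1, 0.05, 0.02])

rho = spearman_rho(attr, truth)
mass = noise_mass(attr, [3, 4])
prec = top_k_precision(attr, [0, 1, 2], k=3)
```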

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fixed AGOP matrix could be reused across an entire model family or across successive retrainings without recomputation, further reducing cost in production pipelines.
Because the filter is derived from gradient statistics rather than from the loss surface directly, the approach may extend naturally to regression or ranking models where per-sample gradients are still available.
If the AGOP matrix is recomputed periodically on a sliding window of recent data, the method could adapt to mild distribution drift while retaining its speed advantage.
The observed clustering of global faithfulness scores suggests that local attribution quality and global feature ranking are partially orthogonal objectives that may require separate evaluation protocols.

Load-bearing premise

The average gradient outer product computed once on the training distribution remains a useful filter even for test points that lie far from the training distribution or inside highly nonlinear regions of the model.

What would settle it

Compute ground-truth attributions on a new synthetic dataset that deliberately places test samples in a region of feature space distant from the training distribution and measure whether AGOP-IxG's Spearman correlation drops below that of plain InputXGradient.

Figures

Figures reproduced from arXiv: 2605.15700 by Raj Kiran Gupta Katakam.

**Figure 1.** Per-sample attribution on the three synthetic datasets (rows: linear, sparse nonlinear, interaction) for three randomly selected test indices (columns: 654, 89, 773). Each panel's matplotlib title encodes the dataset, sample index, true and predicted class, prediction outcome (✓/×), and model confidence; the x-axis annotation of each non-True panel shows per-sample Spearman ρ against ground truth…

**Figure 2.** ROAR results on Adult and Credit datasets.

Original abstract

Automated machine learning pipelines increasingly produce models whose predictions must be explained to end users, auditors, and downstream decision systems. The most widely used feature attribution methods (SHAP, Integrated Gradients, LIME) are typically chosen by convention rather than measured fidelity, because rigorous evaluation is impeded by the absence of ground-truth attribution on real data. We propose AGOP-IxG, a fast per-sample attribution method for tabular classifiers that pre-multiplies the per-sample gradient by a top-$K$ rank-truncated Average Gradient Outer Product matrix, and evaluate it against four widely-used baselines on a controlled tabular benchmark designed for AutoML practitioners. In Part 1, we construct three synthetic multi-class tabular tasks (linear, sparse nonlinear, interaction-based) where ground-truth attribution per sample is analytically or numerically derivable, and compare five methods: AGOP-IxG, SHAP (DeepExplainer), Integrated Gradients, InputXGradient, and LIME. AGOP-IxG leads on Spearman rank correlation and noise feature mass on all three synthetic datasets, and on top-$k$ precision on the interaction dataset. Across all settings, AGOP-IxG is approximately $350\times$ to $1{,}650\times$ faster than SHAP. In Part 2, we evaluate global faithfulness on Adult Income and Credit Card Default using the ROAR protocol; the methods cluster within $\sim 1.7\%$ relative AUC, consistent with AGOP-IxG being optimized for per-sample local attribution rather than global feature ranking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AGOP-IxG, a local feature attribution method for tabular classifiers that pre-multiplies each per-sample gradient by a top-K rank-truncated Average Gradient Outer Product (AGOP) matrix computed from the training distribution. It constructs three synthetic multi-class tabular tasks (linear, sparse nonlinear, interaction-based) with analytically or numerically derivable ground-truth attributions and compares AGOP-IxG against SHAP (DeepExplainer), Integrated Gradients, InputXGradient, and LIME. AGOP-IxG leads on Spearman rank correlation and noise feature mass across all three tasks and on top-k precision for the interaction task, while being approximately 350× to 1,650× faster than SHAP. On real datasets (Adult Income, Credit Card Default) it shows comparable global faithfulness under the ROAR protocol.

Significance. The controlled benchmark with ground-truth attributions on synthetic tasks is a clear strength that enables rigorous, falsifiable evaluation of local attribution fidelity, addressing a persistent gap in the field. If the performance claims hold, AGOP-IxG supplies a practical, computationally lightweight alternative for per-sample explanations in tabular AutoML pipelines. The reported speed advantage over SHAP is directly relevant for deployment. The work also demonstrates awareness of the distinction between local and global faithfulness by separating the two evaluation regimes.

major comments (2)

[Part 1] Part 1, synthetic tasks: the reported gains on the sparse nonlinear and interaction datasets rest on the assumption that a single global AGOP matrix (top-K truncated) computed on the training distribution preserves or enhances the locally relevant directions of per-sample gradients at test points. No diagnostic is supplied that compares this global matrix to a locally estimated outer product (e.g., via neighborhood sampling around the test instances where headline metrics are measured). This is load-bearing because, in nonlinear regimes, sign cancellation across regions can produce a filter that attenuates rather than amplifies true local attributions.
[Method] Method description, top-K truncation: the top-K rank threshold is explicitly listed as a free parameter. The manuscript must specify the exact procedure used to choose K for each dataset and confirm that selection was performed without reference to the evaluation metrics (Spearman correlation, top-k precision, noise mass) to eliminate circularity risk in the central performance claims.
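The diagnostic requested in the first major comment could look roughly like the sketch below: estimate a local AGOP from the gradients of a test point's nearest training neighbours and measure how much its top-K eigenspace overlaps with the global one. Everything here is an assumption for illustration, including the function names, the Euclidean neighbourhood, the overlap score (mean squared cosine of the principal angles), and the toy two-regime gradients.

```python
import numpy as np

def top_eigvecs(grads, k):
    """Top-k eigenvectors of the average gradient outer product of `grads`."""
    m = grads.T @ grads / grads.shape[0]
    _, vecs = np.linalg.eigh(m)            # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]            # reorder so the largest come first

def subspace_overlap(v_a, v_b):
    """Mean squared cosine of principal angles: 1 = same subspace, 0 = orthogonal."""
    return float(np.linalg.norm(v_a.T @ v_b, "fro") ** 2 / v_a.shape[1])

def local_global_overlap(x_train, g_train, x_test, k_rank, n_neighbors):
    """Compare the global top-k AGOP subspace with one estimated near x_test."""
    dist = np.linalg.norm(x_train - x_test, axis=1)
    nearest = np.argsort(dist)[:n_neighbors]
    return subspace_overlap(top_eigvecs(g_train, k_rank),
                            top_eigvecs(g_train[nearest], k_rank))

# Toy model with two regimes (hypothetical): gradients point along feature 0
# when x[0] < 0 and along feature 1 otherwise, so one global filter cannot
# match both regions.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(400, 3))
left = x_train[:, 0] < 0
g_train = 0.01 * rng.normal(size=(400, 3))
g_train[left, 0] += 2.0
g_train[~left, 1] += 1.0

overlap_left = local_global_overlap(x_train, g_train, np.array([-2.0, 0.0, 0.0]), 1, 30)
overlap_right = local_global_overlap(x_train, g_train, np.array([2.0, 0.0, 0.0]), 1, 30)
```

In this toy case the global rank-1 filter aligns with the left region and is nearly orthogonal to the local direction on the right, which is the kind of attenuation the comment warns about.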

minor comments (2)

[Abstract] Abstract: the speedup range (350× to 1,650×) should be accompanied by the model architectures, batch sizes, and hardware used for the timing measurements to support reproducibility.
[Part 2] Part 2, ROAR results: the statement that methods cluster within ~1.7% relative AUC would be strengthened by reporting standard deviations across runs or statistical significance tests.
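A small sketch of the kind of reporting the second minor comment asks for: per-method mean and sample standard deviation across runs, plus a relative spread. The AUC values are placeholders rather than results from the paper, the method names are hypothetical, and `(max - min) / max` is only one possible reading of "relative AUC".

```python
import numpy as np

def relative_spread(mean_auc_by_method: dict) -> float:
    """(max - min) / max over per-method mean AUCs (assumed definition)."""
    vals = np.array(list(mean_auc_by_method.values()))
    return float((vals.max() - vals.min()) / vals.max())

# Placeholder ROAR AUCs, one value per random seed (NOT taken from the paper).
runs = {
    "method_a": [0.80, 0.81, 0.79],
    "method_b": [0.79, 0.80, 0.78],
}

# Mean and sample standard deviation (ddof=1) per method.
summary = {m: (float(np.mean(v)), float(np.std(v, ddof=1))) for m, v in runs.items()}
spread = relative_spread({m: s[0] for m, s in summary.items()})
```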

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. The points raised are substantive and we address each one directly below, outlining the revisions we will make to the manuscript.

Referee: Part 1, synthetic tasks: the reported gains on the sparse nonlinear and interaction datasets rest on the assumption that a single global AGOP matrix (top-K truncated) computed on the training distribution preserves or enhances the locally relevant directions of per-sample gradients at test points. No diagnostic is supplied that compares this global matrix to a locally estimated outer product (e.g., via neighborhood sampling around the test instances where headline metrics are measured). This is load-bearing because, in nonlinear regimes, sign cancellation across regions can produce a filter that attenuates rather than amplifies true local attributions.

Authors: We agree that an explicit diagnostic comparing the global AGOP to local estimates would strengthen the paper. In the revision we will add a new subsection that computes local AGOP matrices via neighborhood sampling (k-nearest neighbors in feature space) around each test point, compares the leading eigenvectors to those of the global matrix, and reports the resulting attribution metrics when the local filter is substituted for the global one. This will directly test whether sign cancellation attenuates performance on the nonlinear and interaction tasks. The current strong results with ground-truth labels provide supporting evidence, but we accept that the requested diagnostic is a valuable addition. (revision: yes)
Referee: Method description, top-K truncation: the top-K rank threshold is explicitly listed as a free parameter. The manuscript must specify the exact procedure used to choose K for each dataset and confirm that selection was performed without reference to the evaluation metrics (Spearman correlation, top-k precision, noise mass) to eliminate circularity risk in the central performance claims.

Authors: We will expand the method section to state the precise selection rule: K is the smallest integer such that the sum of the top-K eigenvalues of the training-set AGOP accounts for at least 95% of the trace. This threshold is computed once on the training distribution before any test-set evaluation or metric computation. We confirm that K was never tuned against Spearman correlation, top-k precision, or noise-mass values. The revision will also include a sensitivity table showing metric variation for K values bracketing the chosen threshold. (revision: yes)
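The selection rule described in the second response (smallest K whose top-K eigenvalues cover at least 95% of the trace) can be sketched in a few lines. The function name `select_rank` and the example matrix are hypothetical; only the 95% threshold comes from the rebuttal text.

```python
import numpy as np

def select_rank(agop: np.ndarray, energy: float = 0.95) -> int:
    """Smallest K whose top-K eigenvalues sum to at least `energy` of the trace."""
    eigvals = np.linalg.eigvalsh(agop)[::-1]       # descending order
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # searchsorted returns the first index whose cumulative share reaches `energy`.
    return int(np.searchsorted(cumulative, energy) + 1)

# Hypothetical AGOP with eigenvalues 6, 3, 0.6, 0.4 (cumulative shares 0.60, 0.90, 0.96, 1.00).
k = select_rank(np.diag([6.0, 3.0, 0.6, 0.4]))
```

Because the rule depends only on the training-set spectrum, it can be evaluated before any test-set metric is computed, which is the property the referee asks the authors to confirm.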

Circularity Check

0 steps flagged

No significant circularity in derivation or evaluation chain

The paper defines AGOP-IxG as a fixed procedure (per-sample gradient pre-multiplied by a top-K truncated average gradient outer product computed once on the training distribution) and then measures its performance against baselines on synthetic tasks whose ground-truth attributions are constructed independently of the method. No step in the method definition or the reported metrics reduces by construction to the evaluation targets; the top-K truncation is part of the method specification rather than a post-hoc fit to the headline Spearman or precision numbers. The benchmark uses analytically or numerically derivable ground truth on linear, sparse-nonlinear, and interaction datasets, and the real-data ROAR results are reported as a secondary global-faithfulness check. No self-citation is load-bearing for the central claim, and the derivation does not rename or smuggle in prior results. The paper is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that a single global gradient covariance matrix computed once on the training set provides a stable filter for every individual sample. The top-K truncation introduces one free parameter whose value is not stated in the abstract.

free parameters (1)

top-K rank truncation threshold
The number of leading eigenvectors retained from the average gradient outer product matrix is chosen to balance fidelity and speed; its specific value is not reported in the abstract.

axioms (1)

domain assumption: The gradient outer product averaged over the training distribution captures the dominant directions of feature influence for individual test samples.
Invoked when the method pre-multiplies the per-sample gradient by the truncated AGOP matrix.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: AGOP-IxG ... pre-multiplies the per-sample gradient by a top-K rank-truncated Average Gradient Outer Product matrix

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: M = 1/n ∑ g_i g_i^T ... eigendecompose M = V Λ V^T

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B. (2018). Sanity checks for saliency maps. In Advances in Neural Information Processing Systems, volume 31.
[2] Engstrom, L., Feldman, A., Zou, J., Madry, A., and Ilyas, A. (2023). DsDm: Model-aware dataset selection with datamodels. In International Conference on Machine Learning.
[3] Gijsbers, P., Bueno, M. L. P., Coors, S., LeDell, E., Poirier, S., Thomas, J., Bischl, B., and Vanschoren, J. (2022). AMLB: an AutoML benchmark. arXiv preprint arXiv:2207.12560.
[4] Hooker, S., Erhan, D., Kindermans, P.-J., and Kim, B. (2019). A benchmark for interpretability methods in deep neural networks. In Advances in Neural Information Processing Systems, volume 32.
[5] Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. In Advances in Neural Information Processing Systems, volume 31.
[6] Katakam, R. K. G. (2026). AGOP as explanation: From feature learning to per-sample attribution in image classifiers. arXiv preprint arXiv:2605.12816.
[7] Kohavi, R. (1996). Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202–207.
[8] Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, volume 30.
[9] Radhakrishnan, A., Stefanakis, G., Belkin, M., and Uhler, C. (2022). Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features. arXiv preprint arXiv:2212.13881.
[10] Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
[11] Samek, W., Binder, A., Montavon, G., Lapuschkin, S., and Müller, K.-R. (2017). Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 28(11):2660–2673.
[12] Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2:307–317.
[13] Shrikumar, A., Greenside, P., Shcherbina, A., and Kundaje, A. (2016). Not just a black box: Learning important features through propagating activation differences. In ICML Workshop on Human Interpretability in Machine Learning.
[14] Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Deep inside convolutional networks: Visualising image classification models and saliency maps. In International Conference on Learning Representations Workshop.
[15] Sundararajan, M., Taly, A., and Yan, Q. (2017). Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328.
[16] Yang, M. and Kim, B. (2019). Benchmarking attribution methods with relative feature importance. arXiv preprint arXiv:1907.09701.
[17] Yeh, I.-C. and Lien, C.-h. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2):2473–2480.
