AGOP-IxG: A Gradient Covariance Filter for Local Feature Attribution on Tabular Data, with a Controlled Benchmark
Pith reviewed 2026-05-20 19:58 UTC · model grok-4.3
The pith
AGOP-IxG filters per-sample gradients with a training-derived covariance matrix to improve local feature attributions on tabular classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Pre-multiplying the per-sample gradient vector by the top-K rank-truncated Average Gradient Outer Product matrix produces attributions whose ordering and magnitude align more closely with analytically known ground-truth feature contributions on synthetic tabular classification tasks than the attributions returned by SHAP, Integrated Gradients, InputXGradient, or LIME, while requiring orders-of-magnitude less computation.
What carries the argument
The top-K rank-truncated Average Gradient Outer Product (AGOP) matrix, which encodes the dominant directions of gradient variation across the training distribution and serves as a fixed linear filter applied to each test sample's gradient.
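A minimal sketch of the filter described above, assuming NumPy and assuming the per-sample training gradients are already available as an (n, d) array. The function names are hypothetical. Two further choices are illustrative assumptions rather than details confirmed by the paper: the truncated matrix is taken as V_K Λ_K V_K^T (not the projector V_K V_K^T), and the filtered gradient is optionally multiplied elementwise by the input, as the "IxG" name suggests.

```python
import numpy as np

def truncated_agop(train_grads, k):
    """Rank-k truncation of M = (1/n) * sum_i g_i g_i^T (assumed form)."""
    g = np.asarray(train_grads, dtype=float)      # shape (n, d)
    m = g.T @ g / g.shape[0]                      # (d, d), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(m)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    v_k = eigvecs[:, top]                         # (d, k)
    return (v_k * eigvals[top]) @ v_k.T           # sum_j lambda_j v_j v_j^T

def agop_filtered_attribution(m_k, grad, x=None):
    """Pre-multiply one test gradient by the fixed filter.

    Multiplying by the input x afterwards is an assumption based on the
    method's name, not something the summary above states.
    """
    filtered = m_k @ np.asarray(grad, dtype=float)
    return filtered if x is None else np.asarray(x, dtype=float) * filtered

# Random stand-in for real per-sample gradients (illustration only).
rng = np.random.default_rng(0)
train_grads = rng.normal(size=(200, 5))
m_k = truncated_agop(train_grads, k=2)
attribution = agop_filtered_attribution(m_k, train_grads[0])
```

Because the filter is computed once from training gradients, explaining a test sample under this sketch costs one gradient evaluation plus one d-by-d matrix-vector product.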
If this is right
- On the linear synthetic task AGOP-IxG attains the highest Spearman correlation with the true linear coefficients.
- Across all three synthetic tasks the method assigns lower total attribution mass to the injected noise features than any baseline.
- On the interaction dataset AGOP-IxG records the best top-k precision for recovering the known interacting features.
- Wall-clock time for a full test-set explanation is between 350 and 1,650 times lower than DeepExplainer SHAP under identical hardware.
- Global ROAR AUC on Adult Income and Credit Card Default stays within roughly 1.7 percent relative difference of the other four methods.
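The three local metrics named in the list above can be sketched as follows. The paper's exact definitions are not reproduced on this page, so these are common forms and should be read as assumptions: Spearman correlation on raw attribution values with average ranks for ties, noise mass as the share of total absolute attribution placed on noise features, and top-k precision as the fraction of the k largest-magnitude features that belong to the known relevant set. All function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def noise_mass(attr, noise_idx):
    """Share of total absolute attribution assigned to noise features."""
    total = sum(abs(a) for a in attr)
    return sum(abs(attr[i]) for i in noise_idx) / total

def top_k_precision(attr, true_idx, k):
    """Fraction of the k largest-magnitude features that are truly relevant."""
    top = sorted(range(len(attr)), key=lambda i: -abs(attr[i]))[:k]
    return len(set(top) & set(true_idx)) / k

# Toy values for illustration only; not taken from the paper.
attr = [0.5, -0.3, 0.15, 0.05]
truth = [2.0, -1.0, 0.5, 0.0]
rho = spearman(attr, truth)
mass = noise_mass(attr, noise_idx=[3])
prec = top_k_precision(attr, true_idx=[0, 1], k=2)
```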
Where Pith is reading between the lines
- The same fixed AGOP matrix could be reused across an entire model family or across successive retrainings without recomputation, further reducing cost in production pipelines.
- Because the filter is derived from gradient statistics rather than from the loss surface directly, the approach may extend naturally to regression or ranking models where per-sample gradients are still available.
- If the AGOP matrix is recomputed periodically on a sliding window of recent data, the method could adapt to mild distribution drift while retaining its speed advantage.
- The observed clustering of global faithfulness scores suggests that local attribution quality and global feature ranking are partially orthogonal objectives that may require separate evaluation protocols.
Load-bearing premise
The average gradient outer product computed once on the training distribution remains a useful filter even for test points that lie far from the training distribution or inside highly nonlinear regions of the model.
What would settle it
Compute ground-truth attributions on a new synthetic dataset that deliberately places test samples in a region of feature space distant from the training distribution and measure whether AGOP-IxG's Spearman correlation drops below that of plain InputXGradient.
Original abstract
Automated machine learning pipelines increasingly produce models whose predictions must be explained to end users, auditors, and downstream decision systems. The most widely used feature attribution methods (SHAP, Integrated Gradients, LIME) are typically chosen by convention rather than measured fidelity, because rigorous evaluation is impeded by the absence of ground-truth attribution on real data. We propose AGOP-IxG, a fast per-sample attribution method for tabular classifiers that pre-multiplies the per-sample gradient by a top-$K$ rank-truncated Average Gradient Outer Product matrix, and evaluate it against four widely-used baselines on a controlled tabular benchmark designed for AutoML practitioners. In Part 1, we construct three synthetic multi-class tabular tasks (linear, sparse nonlinear, interaction-based) where ground-truth attribution per sample is analytically or numerically derivable, and compare five methods: AGOP-IxG, SHAP (DeepExplainer), Integrated Gradients, InputXGradient, and LIME. AGOP-IxG leads on Spearman rank correlation and noise feature mass on all three synthetic datasets, and on top-$k$ precision on the interaction dataset. Across all settings, AGOP-IxG is approximately $350\times$ to $1{,}650\times$ faster than SHAP. In Part 2, we evaluate global faithfulness on Adult Income and Credit Card Default using the ROAR protocol; the methods cluster within $\sim 1.7\%$ relative AUC, consistent with AGOP-IxG being optimized for per-sample local attribution rather than global feature ranking.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces AGOP-IxG, a local feature attribution method for tabular classifiers that pre-multiplies each per-sample gradient by a top-K rank-truncated Average Gradient Outer Product (AGOP) matrix computed from the training distribution. It constructs three synthetic multi-class tabular tasks (linear, sparse nonlinear, interaction-based) with analytically or numerically derivable ground-truth attributions and compares AGOP-IxG against SHAP (DeepExplainer), Integrated Gradients, InputXGradient, and LIME. AGOP-IxG leads on Spearman rank correlation and noise feature mass across all three tasks and on top-k precision for the interaction task, while being 350x–1650x faster than SHAP. On real datasets (Adult Income, Credit Card Default) it shows comparable global faithfulness under the ROAR protocol.
Significance. The controlled benchmark with ground-truth attributions on synthetic tasks is a clear strength that enables rigorous, falsifiable evaluation of local attribution fidelity, addressing a persistent gap in the field. If the performance claims hold, AGOP-IxG supplies a practical, computationally lightweight alternative for per-sample explanations in tabular AutoML pipelines. The reported speed advantage over SHAP is directly relevant for deployment. The work also demonstrates awareness of the distinction between local and global faithfulness by separating the two evaluation regimes.
major comments (2)
- [Part 1] Part 1, synthetic tasks: the reported gains on the sparse nonlinear and interaction datasets rest on the assumption that a single global AGOP matrix (top-K truncated) computed on the training distribution preserves or enhances the locally relevant directions of per-sample gradients at test points. No diagnostic is supplied that compares this global matrix to a locally estimated outer product (e.g., via neighborhood sampling around the test instances where headline metrics are measured). This is load-bearing because, in nonlinear regimes, sign cancellation across regions can produce a filter that attenuates rather than amplifies true local attributions.
- [Method] Method description, top-K truncation: the top-K rank threshold is explicitly listed as a free parameter. The manuscript must specify the exact procedure used to choose K for each dataset and confirm that selection was performed without reference to the evaluation metrics (Spearman correlation, top-k precision, noise mass) to eliminate circularity risk in the central performance claims.
minor comments (2)
- [Abstract] Abstract: the speedup range (350× to 1,650×) should be accompanied by the model architectures, batch sizes, and hardware used for the timing measurements to support reproducibility.
- [Part 2] Part 2, ROAR results: the statement that methods cluster within ~1.7% relative AUC would be strengthened by reporting standard deviations across runs or statistical significance tests.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. The points raised are substantive and we address each one directly below, outlining the revisions we will make to the manuscript.
Referee: Part 1, synthetic tasks: the reported gains on the sparse nonlinear and interaction datasets rest on the assumption that a single global AGOP matrix (top-K truncated) computed on the training distribution preserves or enhances the locally relevant directions of per-sample gradients at test points. No diagnostic is supplied that compares this global matrix to a locally estimated outer product (e.g., via neighborhood sampling around the test instances where headline metrics are measured). This is load-bearing because, in nonlinear regimes, sign cancellation across regions can produce a filter that attenuates rather than amplifies true local attributions.
Authors: We agree that an explicit diagnostic comparing the global AGOP to local estimates would strengthen the paper. In the revision we will add a new subsection that computes local AGOP matrices via neighborhood sampling (k-nearest neighbors in feature space) around each test point, compares the leading eigenvectors to those of the global matrix, and reports the resulting attribution metrics when the local filter is substituted for the global one. This will directly test whether sign cancellation attenuates performance on the nonlinear and interaction tasks. The current strong results with ground-truth labels provide supporting evidence, but we accept that the requested diagnostic is a valuable addition. revision: yes
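One way the proposed diagnostic might look, as a sketch under stated assumptions: NumPy, precomputed training gradients, Euclidean k-nearest neighbors in feature space, and agreement measured as the mean squared cosine of the principal angles between the global and local top-K eigenvector subspaces. The agreement measure and the function names are illustrative choices, not the authors' stated procedure.

```python
import numpy as np

def top_eigvecs(grads, k):
    """Leading k eigenvectors of the average gradient outer product."""
    m = grads.T @ grads / grads.shape[0]
    vals, vecs = np.linalg.eigh(m)                 # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]     # (d, k), orthonormal columns

def local_global_overlap(train_x, train_grads, x_test, k_rank, n_neighbors):
    """Mean squared cosine of principal angles; 1.0 means identical subspaces."""
    dists = np.linalg.norm(train_x - x_test, axis=1)
    nearest = np.argsort(dists)[:n_neighbors]      # neighbors in feature space
    v_global = top_eigvecs(train_grads, k_rank)
    v_local = top_eigvecs(train_grads[nearest], k_rank)
    # Singular values of V_g^T V_l are the cosines of the principal angles.
    cosines = np.linalg.svd(v_global.T @ v_local, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Synthetic check: every gradient lies in the same 2-D subspace, so the
# local and global top-2 subspaces should coincide.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(100, 4))
basis = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
train_grads = rng.normal(size=(100, 2)) @ basis
overlap = local_global_overlap(train_x, train_grads, train_x[0], k_rank=2, n_neighbors=20)
```

An overlap well below 1 at the test points where headline metrics are measured would indicate that the global filter is discarding locally relevant directions.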
Referee: Method description, top-K truncation: the top-K rank threshold is explicitly listed as a free parameter. The manuscript must specify the exact procedure used to choose K for each dataset and confirm that selection was performed without reference to the evaluation metrics (Spearman correlation, top-k precision, noise mass) to eliminate circularity risk in the central performance claims.
Authors: We will expand the method section to state the precise selection rule: K is the smallest integer such that the sum of the top-K eigenvalues of the training-set AGOP accounts for at least 95% of the trace. This threshold is computed once on the training distribution before any test-set evaluation or metric computation. We confirm that K was never tuned against Spearman correlation, top-k precision, or noise-mass values. The revision will also include a sensitivity table showing metric variation for K values bracketing the chosen threshold. revision: yes
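The selection rule stated in this response can be sketched directly. The function name and the default threshold argument are hypothetical; the 95% figure is the one quoted above. The sketch assumes the eigenvalues are non-negative, as they are for an exact AGOP matrix, so small negative values from numerical error would need clipping beforehand.

```python
def select_rank(eigenvalues, threshold=0.95):
    """Smallest K whose top-K eigenvalues reach `threshold` of the trace."""
    vals = sorted((float(v) for v in eigenvalues), reverse=True)
    total = sum(vals)                      # trace of the AGOP matrix
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running >= threshold * total:
            return k
    return len(vals)

# Toy spectrum for illustration only; not taken from the paper.
k_chosen = select_rank([6.0, 3.0, 0.8, 0.15, 0.05])
```

Since the rule depends only on the training-set spectrum, it can be evaluated before any test-set metric is computed, which is the property the referee asks the authors to confirm.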
Circularity Check
No significant circularity in derivation or evaluation chain
The paper defines AGOP-IxG as a fixed procedure (per-sample gradient pre-multiplied by a top-K truncated average gradient outer product computed once on the training distribution) and then measures its performance against baselines on synthetic tasks whose ground-truth attributions are constructed independently of the method. No step in the method definition or the reported metrics reduces by construction to the evaluation targets; the top-K truncation is part of the method specification rather than a post-hoc fit to the headline Spearman or precision numbers. The benchmark uses analytically or numerically derivable ground truth on linear, sparse-nonlinear, and interaction datasets, and the real-data ROAR results are reported as a secondary global-faithfulness check. No self-citation is load-bearing for the central claim, and the derivation does not rename or smuggle in prior results. The paper is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- top-K rank truncation threshold
axioms (1)
- domain assumption: The gradient outer product averaged over the training distribution captures the dominant directions of feature influence for individual test samples.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "AGOP-IxG ... pre-multiplies the per-sample gradient by a top-K rank-truncated Average Gradient Outer Product matrix"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "M = 1/n ∑ g_i g_i^T ... eigendecompose M = V Λ V^T"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.