Fast and Accurate Explanations of Distance-Based Classifiers by Uncovering Latent Explanatory Structures
Pith reviewed 2026-05-19 00:02 UTC · model grok-4.3
The pith
Distance-based classifiers contain a hidden neural network of linear detectors and nonlinear pooling layers that lets standard XAI methods produce fast, accurate explanations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Any distance-based classifier can be exactly decomposed into linear detection units combined with nonlinear pooling layers without changing the model's output or introducing extra parameters. Once expressed in this form, Explainable AI methods such as layer-wise relevance propagation become applicable and deliver explanations that are both computationally efficient and faithful to the original decision rule.
What carries the argument
The latent neural network structure consisting of linear detection units followed by nonlinear pooling layers that exactly reproduces the distance-based decision rule.
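To make this structure concrete, here is a minimal sketch for the simplest case: a two-class 1-nearest-neighbor rule with squared Euclidean distance. It is an illustration under stated assumptions, not the paper's own construction, and the function names `one_nn_direct` and `one_nn_network` are hypothetical. It relies on the identity ||x - n_j||^2 - ||x - p_i||^2 = 2 x·(p_i - n_j) + ||n_j||^2 - ||p_i||^2, which is linear in x, so the difference between the two nearest-neighbor distances can be written as a min over negatives of a max over positives of linear units.

```python
import numpy as np

def one_nn_direct(x, pos, neg):
    """Signed 1-NN score: nearest-negative distance minus nearest-positive distance.

    A positive score means the closest training point is a positive one.
    """
    d_pos = ((pos - x) ** 2).sum(axis=1)
    d_neg = ((neg - x) ** 2).sum(axis=1)
    return d_neg.min() - d_pos.min()

def one_nn_network(x, pos, neg):
    """Same score from linear units h_ij(x) = w_ij . x + b_ij plus max/min pooling."""
    # Weights and biases are read off the training points; nothing is fitted.
    W = 2.0 * (pos[:, None, :] - neg[None, :, :])                      # (P, N, D)
    b = (neg ** 2).sum(axis=1)[None, :] - (pos ** 2).sum(axis=1)[:, None]  # (P, N)
    h = W @ x + b                                                      # (P, N)
    # Max over positives i (for each negative j), then min over negatives j.
    return h.max(axis=0).min()
```

Both functions return the same value for any query, which is the sense in which the rewriting is exact in this special case. Whether the same form carries over to other distance-based models is the paper's claim, not something this sketch establishes.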
If this is right
- Explanations for k-nearest neighbors and SVMs can be obtained in a single forward-backward pass rather than repeated model evaluations.
- The same decomposition applies to any distance-based model, extending the range of classifiers for which layer-wise relevance propagation works without modification.
- Quantitative comparisons show higher fidelity and lower runtime than perturbation-based and gradient-based baselines.
- The approach supports two concrete use cases where explanations help domain experts trust or debug predictions in scientific and industrial settings.
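The single-pass idea in the first bullet can be sketched for a two-class 1-nearest-neighbor rule. The attribution rule used here is an assumption chosen for illustration (winner-take-all through the pooling step, then a per-feature split of the squared-distance difference); it is not claimed to be the LRP rule the paper uses, and `explain_one_nn` is a hypothetical name. The point is only that one pass over the training set yields both the score and a per-feature attribution that sums to it, whereas an occlusion-style baseline needs at least one extra model evaluation per feature.

```python
import numpy as np

def explain_one_nn(x, pos, neg):
    """Return (score, relevance) for a two-class 1-NN rule in one pass.

    score     = ||x - nearest_neg||^2 - ||x - nearest_pos||^2
    relevance = per-feature terms of that difference (sums to score).
    """
    d_pos = ((pos - x) ** 2).sum(axis=1)
    d_neg = ((neg - x) ** 2).sum(axis=1)
    # "Forward" step: the pooling layer selects one positive and one negative.
    i, j = d_pos.argmin(), d_neg.argmin()
    score = d_neg[j] - d_pos[i]
    # "Backward" step: route all relevance to the winning unit, then split it
    # feature by feature (illustrative rule, an assumption of this sketch).
    relevance = (x - neg[j]) ** 2 - (x - pos[i]) ** 2
    return score, relevance
```

Because the per-feature terms add up to the score by construction, the attribution is conservative in the usual LRP sense.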
Where Pith is reading between the lines
- The same decomposition could let other neural-network analysis tools, such as network pruning or adversarial example generation, transfer directly to distance-based models.
- If the linear detection units correspond to human-interpretable features, the method might also improve the interpretability of the original distance metric itself.
- Hybrid models that combine learned distance functions with explicit pooling layers might inherit both the accuracy of distance-based classifiers and the explanation machinery developed for neural networks.
Load-bearing premise
The decision rule of any distance-based classifier can be exactly decomposed into linear detection units followed by nonlinear pooling without altering the model's output or requiring additional fitting parameters.
What would settle it
A distance-based classifier whose predictions cannot be reproduced exactly by any arrangement of linear detection units and nonlinear pooling layers, or whose LRP explanations on this structure fail to match the true feature contributions.
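One way to probe this on a specific model is a numerical equivalence check. The sketch below does so for a Gaussian-kernel classifier f(x) = sum_i c_i exp(-gamma ||x - s_i||^2) under simplifying assumptions that are mine, not the paper's: zero bias, and at least one positive and one negative coefficient. Under those assumptions log(S+) - log(S-), where S+ and S- are the positive and negative parts of the sum, has the same sign as f and can be written as a soft-min over a soft-max of units that are linear in x. The names `kernel_score`, `pooled_score`, and `_lse` are hypothetical.

```python
import numpy as np

def _lse(v, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = v.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(v - m).sum(axis=axis, keepdims=True)), axis=axis)

def kernel_score(x, sv, coef, gamma):
    """Bias-free Gaussian-kernel decision function."""
    d = ((sv - x) ** 2).sum(axis=1)
    return float((coef * np.exp(-gamma * d)).sum())

def pooled_score(x, sv, coef, gamma):
    """log(S+) - log(S-) via linear units and soft max/min pooling."""
    P, N = sv[coef > 0], sv[coef < 0]
    a, c = np.log(coef[coef > 0]), np.log(-coef[coef < 0])
    # Linear units: h_ij(x) = gamma * (||x - N_j||^2 - ||x - P_i||^2) + a_i - c_j
    W = 2.0 * gamma * (P[:, None, :] - N[None, :, :])
    b = gamma * ((N ** 2).sum(axis=1)[None, :] - (P ** 2).sum(axis=1)[:, None])
    h = W @ x + b + a[:, None] - c[None, :]
    inner = _lse(h, axis=0)               # soft-max over positive units
    return float(-_lse(-inner, axis=0))   # soft-min over negative units
```

Agreement of the two signs on sampled queries supports the decomposition for this restricted case only; a model with a nonzero bias or a different kernel would need its own check.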
Original abstract
Distance-based classifiers, such as k-nearest neighbors and support vector machines, continue to be a workhorse of machine learning, widely used in science and industry. In practice, to derive insights from these models, it is also important to ensure that their predictions are explainable. While the field of Explainable AI has supplied methods that are in principle applicable to any model, it has also emphasized the usefulness of latent structures (e.g. the sequence of layers in a neural network) to produce explanations. In this paper, we contribute by uncovering a hidden neural network structure in distance-based classifiers (consisting of linear detection units combined with nonlinear pooling layers) upon which Explainable AI techniques such as layer-wise relevance propagation (LRP) become applicable. Through quantitative evaluations, we demonstrate the advantage of our novel explanation approach over several baselines. We also show the overall usefulness of explaining distance-based models through two practical use cases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that distance-based classifiers such as kNN and SVM possess a latent neural-network structure consisting of linear detection units (computing distances to fixed prototypes or support vectors) followed by nonlinear pooling layers. By exposing this structure, standard XAI techniques like layer-wise relevance propagation become directly applicable, producing fast and faithful explanations that outperform several baselines in quantitative evaluations; the authors further illustrate utility via two practical use cases.
Significance. If the decomposition is exact, parameter-free, and preserves the original decision rule, the work would usefully extend LRP-style explanations to a class of models still common in scientific and industrial applications, providing an algebraic bridge between distance-based and neural-network interpretability methods.
major comments (2)
- [§3.2] §3.2 (kNN case): the decomposition into a static set of linear detection units plus a fixed nonlinear pooling layer cannot be exact for standard kNN, because neighbor selection is query-dependent; representing the rule with all training points as fixed units either changes the decision boundary or requires a query-dependent graph that standard LRP does not accommodate without additional approximation or parameters.
- [§4.1] §4.1 and Table 2: the reported accuracy and runtime advantages over baselines are presented without ablation on the choice of pooling function or verification that the LRP explanations remain faithful to the original distance-based decision rule when the decomposition is applied to non-parametric models.
minor comments (2)
- Notation for the linear detection units should be aligned more explicitly with the original classifier parameters (e.g., prototypes in kNN or support vectors in SVM) to avoid ambiguity.
- The use-case demonstrations would benefit from explicit dataset sizes, class balances, and quantitative metrics rather than qualitative description alone.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, offering clarifications and indicating planned revisions where appropriate to strengthen the presentation.
- Referee: [§3.2] §3.2 (kNN case): the decomposition into a static set of linear detection units plus a fixed nonlinear pooling layer cannot be exact for standard kNN, because neighbor selection is query-dependent; representing the rule with all training points as fixed units either changes the decision boundary or requires a query-dependent graph that standard LRP does not accommodate without additional approximation or parameters.
  Authors: We respectfully disagree that the decomposition cannot be exact. The kNN decision rule can be precisely expressed as a fixed neural network structure: the detection layer computes distances to all training points as fixed prototypes (linear units), and the pooling layer applies a nonlinear function that determines the k nearest neighbors based on these distances and aggregates their labels via majority vote. Although which specific neighbors are selected depends on the input values, the network architecture, including all connections and the pooling function, is static and query-independent. This allows direct application of LRP by propagating relevance through the fixed layers, with appropriate handling of the nonlinear pooling (as described in our method). We will revise the manuscript in §3.2 to better explain this equivalence and clarify how the query-dependent aspect of neighbor identity does not affect the fixed nature of the model graph.
  revision: no
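The authors' point that the pooling function is fixed while the selected neighbors vary with the query can be illustrated for the binary case. The sketch below is an illustration under stated assumptions, not the paper's construction: labels are +1/-1, k is odd, there are no distance ties, and each class has at least m = (k + 1) / 2 training points. Under those assumptions the majority vote among the k nearest points agrees with comparing the m-th smallest distance within each class, which is a fixed order-statistic pooling of the distance units. The names `knn_vote` and `knn_pooled` are hypothetical.

```python
import numpy as np

def knn_vote(x, X, y, k):
    """Reference rule: majority label (+1 or -1) among the k nearest points."""
    d = ((X - x) ** 2).sum(axis=1)
    nearest = np.argsort(d)[:k]
    return 1 if y[nearest].sum() > 0 else -1

def knn_pooled(x, X, y, k):
    """Same decision from a fixed pooling: compare m-th smallest class distances."""
    m = (k + 1) // 2
    d = ((X - x) ** 2).sum(axis=1)      # detection layer: one unit per training point
    d_pos = np.sort(d[y == 1])[m - 1]   # m-th nearest positive
    d_neg = np.sort(d[y == -1])[m - 1]  # m-th nearest negative
    # Positives win the vote exactly when their m-th nearest point is closer.
    return 1 if d_neg - d_pos > 0 else -1
```

The graph is the same for every query; only which units attain the order statistics changes. This does not settle the referee's separate question of whether standard LRP rules handle such pooling without approximation.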
- Referee: [§4.1] §4.1 and Table 2: the reported accuracy and runtime advantages over baselines are presented without ablation on the choice of pooling function or verification that the LRP explanations remain faithful to the original distance-based decision rule when the decomposition is applied to non-parametric models.
  Authors: We agree that additional analyses would enhance the robustness of our claims. In the revised manuscript, we will include an ablation study on the choice of pooling function, comparing different nonlinear pooling variants (such as exact kNN pooling versus differentiable approximations) and their impact on explanation quality and runtime. Furthermore, we will add verification of faithfulness for non-parametric models like kNN by evaluating how well the LRP explanations align with the original model's decisions, for instance through quantitative metrics like explanation faithfulness scores or by checking consistency with perturbation-based evaluations. These additions will be incorporated into §4.1 and updated in Table 2.
  revision: yes
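A perturbation-based consistency check of the kind mentioned in this response could look like the following sketch. It is a generic feature-deletion curve, not the evaluation protocol of the paper: `deletion_curve`, the `model` callable, and the `baseline` vector used to replace removed features are all hypothetical. Features are removed from most to least relevant and the model score is recorded after each removal; a faithful attribution should produce a steep early drop.

```python
import numpy as np

def deletion_curve(model, x, relevance, baseline):
    """Model scores as features are replaced by baseline values.

    Features are removed in order of decreasing relevance. Returns an array of
    length D + 1, starting with the score on the unmodified input.
    """
    order = np.argsort(-relevance)
    x_cur = np.asarray(x, dtype=float).copy()
    scores = [model(x_cur)]
    for d in order:
        x_cur[d] = baseline[d]
        scores.append(model(x_cur))
    return np.array(scores)
```

For a linear model with relevance w * x and a zero baseline, each step lowers the score by exactly the relevance of the removed feature, which makes a convenient sanity check before applying the curve to a distance-based model.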
Circularity Check
No significant circularity; derivation is algebraic rewriting of classifier definitions
The paper frames the latent structure (linear detection units + nonlinear pooling) as a direct algebraic decomposition of distance-based classifiers such as kNN and SVM that preserves the original decision rule without introducing fitted parameters or query-dependent restructuring inside the explanation step. No equations reduce a claimed prediction or LRP output to quantities defined by the explanation procedure itself. Self-citations, if present, are not load-bearing for the core rewriting claim, which rests on the models' explicit distance computations rather than prior author results. The approach therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Any distance-based classifier decision rule admits an exact rewriting as a composition of linear detection units and nonlinear pooling layers.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "neural network reformulation... preserves exactly the original decision boundary... no retraining"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.