
arXiv: 2508.03913 · v1 · submitted 2025-08-05 · cs.LG · cs.AI · stat.ML

Fast and Accurate Explanations of Distance-Based Classifiers by Uncovering Latent Explanatory Structures

Pith reviewed 2026-05-19 00:02 UTC · model grok-4.3

classification cs.LG · cs.AI · stat.ML
keywords distance-based classifiers · explainable AI · layer-wise relevance propagation · k-nearest neighbors · support vector machines · latent structure · model explanation

The pith

Distance-based classifiers contain a hidden neural network of linear detectors and nonlinear pooling layers that lets standard XAI methods produce fast, accurate explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Distance-based models such as k-nearest neighbors and support vector machines are widely used yet remain difficult to interpret. The paper shows these models can be rewritten exactly as a sequence of linear detection units followed by nonlinear pooling layers. This latent structure makes layer-wise relevance propagation and similar techniques directly applicable. The resulting explanations are faster to compute and more accurate than those obtained from common baselines. The authors illustrate the practical value through quantitative tests and two real-world use cases in science and industry.

Core claim

Any distance-based classifier can be exactly decomposed into linear detection units combined with nonlinear pooling layers without changing the model's output or introducing extra parameters. Once expressed in this form, Explainable AI methods such as layer-wise relevance propagation become applicable and deliver explanations that are both computationally efficient and faithful to the original decision rule.

What carries the argument

The latent neural network structure consisting of linear detection units followed by nonlinear pooling layers that exactly reproduces the distance-based decision rule.
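
To make the idea concrete, here is a minimal Python sketch for the simplest case, a two-class 1-nearest-neighbor classifier. It is an illustration under stated assumptions, not the paper's own construction: the function names, the toy data, and the restriction to 1-NN with squared Euclidean distance are all introduced here. The sketch relies on the fact that a difference of two squared distances is linear in x, so the 1-NN score can be written as min-pooling over max-pooling over linear units.

```python
import random


def sq_dist(x, u):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, u))


def linear_unit(x, u_pos, u_neg):
    """||x - u_neg||^2 - ||x - u_pos||^2 written as w.x + b (the x.x terms cancel)."""
    w = [2.0 * (p - n) for p, n in zip(u_pos, u_neg)]
    b = sum(n * n for n in u_neg) - sum(p * p for p in u_pos)
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def score_direct(x, pos, neg):
    """1-NN score: positive when the nearest positive prototype is closer."""
    return min(sq_dist(x, u) for u in neg) - min(sq_dist(x, u) for u in pos)


def score_network(x, pos, neg):
    """The same score as min-pooling over max-pooling over linear units."""
    return min(max(linear_unit(x, p, n) for p in pos) for n in neg)


# Toy data (assumption: two Gaussian blobs, 4 features, 6 prototypes per class).
random.seed(0)
dim = 4
pos = [[random.gauss(1, 1) for _ in range(dim)] for _ in range(6)]
neg = [[random.gauss(-1, 1) for _ in range(dim)] for _ in range(6)]
x = [random.gauss(0, 1) for _ in range(dim)]

gap = abs(score_direct(x, pos, neg) - score_network(x, pos, neg))
print(gap)  # difference between the two formulations, floating-point noise only
```

The two scores agree up to floating-point error, which is the sense in which the rewriting is exact for this special case: no parameter is fitted, and the weights and biases are read off the stored prototypes.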

If this is right

  • Explanations for k-nearest neighbors and SVMs can be obtained in a single forward-backward pass rather than repeated model evaluations.
  • The same decomposition applies to any distance-based model, extending the range of classifiers for which layer-wise relevance propagation works without modification.
  • Quantitative comparisons show higher fidelity and lower runtime than perturbation-based and gradient-based baselines.
  • The approach supports two concrete use cases where explanations help domain experts trust or debug predictions in scientific and industrial settings.
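
The single forward-backward pass mentioned in the first point can be sketched for a two-class 1-NN model as follows. This is a simplified illustration, not the propagation rules used in the paper: it assumes hard winner-take-all pooling, so all relevance is routed to the nearest prototype of each class, and it redistributes the score of the winning linear unit feature by feature. The function name `explain_one_nn` and the toy numbers are assumptions made for this example.

```python
def sq_dist(x, u):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, u))


def explain_one_nn(x, pos, neg):
    """Return (score, per-feature relevance) for a two-class 1-NN model."""
    # Forward pass: find the nearest prototype of each class.
    p = min(pos, key=lambda u: sq_dist(x, u))
    n = min(neg, key=lambda u: sq_dist(x, u))
    score = sq_dist(x, n) - sq_dist(x, p)
    # Backward pass (assumed winner-take-all): the score of the winning unit
    # splits exactly into one term per input feature.
    relevance = [(xd - nd) ** 2 - (xd - pd) ** 2 for xd, pd, nd in zip(x, p, n)]
    return score, relevance


score, relevance = explain_one_nn(
    x=[0.5, 1.0],
    pos=[[1.0, 0.0]],
    neg=[[-1.0, 0.0], [0.0, 3.0]],
)
print(score, relevance)  # 2.0 [2.0, 0.0]
```

The relevance values sum to the score, so the explanation is conservative, and its cost is one distance computation per prototype plus one pass over the features. Only the first feature separates the query from the nearest negative prototype here, and it receives all of the relevance.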

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same decomposition could let other neural-network analysis tools, such as network pruning or adversarial example generation, transfer directly to distance-based models.
  • If the linear detection units correspond to human-interpretable features, the method might also improve the interpretability of the original distance metric itself.
  • Hybrid models that combine learned distance functions with explicit pooling layers might inherit both the accuracy of distance-based classifiers and the explanation machinery developed for neural networks.

Load-bearing premise

The decision rule of any distance-based classifier can be exactly decomposed into linear detection units followed by nonlinear pooling without altering the model's output or requiring additional fitting parameters.

What would settle it

A distance-based classifier whose predictions cannot be reproduced exactly by any arrangement of linear detection units and nonlinear pooling layers, or whose LRP explanations on this structure fail to match the true feature contributions.
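
For two-class kNN with odd k, a small empirical check of this kind is easy to set up. The sketch below compares the usual majority vote with a score built from the q-th nearest neighbor of each class, q = (k + 1) / 2, which is the comparison described in the caption of the second Figure 1 on this page. The helper names, the toy data, and the choice k = 5 are assumptions for illustration, and ties between distances are assumed not to occur.

```python
import random


def sq_dist(x, u):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, u))


def knn_majority(x, pos, neg, k):
    """Plain kNN: True if most of the k nearest training points are positive."""
    labelled = [(sq_dist(x, u), 1) for u in pos] + [(sq_dist(x, u), 0) for u in neg]
    labelled.sort(key=lambda t: t[0])
    votes = sum(label for _, label in labelled[:k])
    return votes > k // 2


def qth_smallest(values, q):
    """q-th smallest value; q = 1 gives the minimum."""
    return sorted(values)[q - 1]


def knn_score(x, pos, neg, k):
    """Positive when the q-th nearest positive is closer than the q-th nearest negative."""
    q = (k + 1) // 2
    d_pos = [sq_dist(x, u) for u in pos]
    d_neg = [sq_dist(x, u) for u in neg]
    return qth_smallest(d_neg, q) - qth_smallest(d_pos, q)


# Toy data (assumption): 20 points per class in 3 dimensions, 200 random queries.
random.seed(1)
k, dim = 5, 3
pos = [[random.gauss(0.5, 1) for _ in range(dim)] for _ in range(20)]
neg = [[random.gauss(-0.5, 1) for _ in range(dim)] for _ in range(20)]
queries = [[random.gauss(0, 1.5) for _ in range(dim)] for _ in range(200)]

mismatches = sum(
    knn_majority(x, pos, neg, k) != (knn_score(x, pos, neg, k) > 0) for x in queries
)
print(mismatches)  # 0 when no two distances tie
```

A nonzero mismatch count on tie-free data would be a counterexample of the kind described above. The check covers only the decision rule and says nothing about whether the resulting explanations are faithful.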

Figures

Figures reproduced from arXiv: 2508.03913 by Florian Bley, Grégoire Montavon, Jacob Kauffmann, Klaus-Robert Müller, Simon León Krug.

Figure 1 (p. 5). Proposed neural network reformulation of the SVM to enhance its explainability.
Figure 2 (p. 10). Ablation study in which the neuralization and propagation steps of our LRP-SVM method are introduced one after …
Figure 3 (p. 11). Scatter plots assessing the relevance of chemical properties to high wine quality as inferred by our LRP-SVM method.
Figure 4 (p. 13). LRP explanations of KRR dipole moment predictions. The first row shows explanations using the Bag of Bonds repre…
Figure 1. A test sample x is classified to belong to the blue class because its qth neighbor (q = 3) is closer than the qth neighbor from the red class. The two sets N and P are constructed by adding the κ = 1 closer and further neighbors to the sets, as well as the qth neighbor itself.
Figure 2 (p. 24). LRP hyperparameter analysis for SVM and KNN. Left panels: schematic of relevance propagation through the network, …
Original abstract

Distance-based classifiers, such as k-nearest neighbors and support vector machines, continue to be a workhorse of machine learning, widely used in science and industry. In practice, to derive insights from these models, it is also important to ensure that their predictions are explainable. While the field of Explainable AI has supplied methods that are in principle applicable to any model, it has also emphasized the usefulness of latent structures (e.g. the sequence of layers in a neural network) to produce explanations. In this paper, we contribute by uncovering a hidden neural network structure in distance-based classifiers (consisting of linear detection units combined with nonlinear pooling layers) upon which Explainable AI techniques such as layer-wise relevance propagation (LRP) become applicable. Through quantitative evaluations, we demonstrate the advantage of our novel explanation approach over several baselines. We also show the overall usefulness of explaining distance-based models through two practical use cases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that distance-based classifiers such as kNN and SVM possess a latent neural-network structure consisting of linear detection units (computing distances to fixed prototypes or support vectors) followed by nonlinear pooling layers. By exposing this structure, standard XAI techniques like layer-wise relevance propagation become directly applicable, producing fast and faithful explanations that outperform several baselines in quantitative evaluations; the authors further illustrate utility via two practical use cases.

Significance. If the decomposition is exact, parameter-free, and preserves the original decision rule, the work would usefully extend LRP-style explanations to a class of models still common in scientific and industrial applications, providing an algebraic bridge between distance-based and neural-network interpretability methods.
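
For the SVM side of the summary, one way such a rewriting can look is sketched below for a Gaussian-kernel SVM. It is an illustration, not the reformulation shown in the paper's Figure 1: it assumes a zero bias term, positive dual coefficients, and soft-min and soft-max pooling with the kernel width gamma as stiffness, and every name in the code is hypothetical. Under those assumptions the sign of the SVM output equals the sign of a soft-min over soft-max over differences of shifted squared distances, each of which is linear in x.

```python
import math
import random


def sq_dist(x, u):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, u))


def class_sums(x, pos, neg, a_pos, a_neg, gamma):
    """Kernel sums of the positive and negative support vectors."""
    s_pos = sum(a * math.exp(-gamma * sq_dist(x, u)) for a, u in zip(a_pos, pos))
    s_neg = sum(a * math.exp(-gamma * sq_dist(x, u)) for a, u in zip(a_neg, neg))
    return s_pos, s_neg


def svm_output(x, pos, neg, a_pos, a_neg, gamma):
    """Gaussian-kernel SVM decision function, bias assumed to be zero."""
    s_pos, s_neg = class_sums(x, pos, neg, a_pos, a_neg, gamma)
    return s_pos - s_neg


def soft_min(values, gamma):
    return -math.log(sum(math.exp(-gamma * v) for v in values)) / gamma


def soft_max(values, gamma):
    return math.log(sum(math.exp(gamma * v) for v in values)) / gamma


def neural_score(x, pos, neg, a_pos, a_neg, gamma):
    """Soft-min over soft-max over linear units (differences of shifted distances)."""
    d_pos = [sq_dist(x, u) - math.log(a) / gamma for a, u in zip(a_pos, pos)]
    d_neg = [sq_dist(x, u) - math.log(a) / gamma for a, u in zip(a_neg, neg)]
    return soft_min([soft_max([dn - dp for dp in d_pos], gamma) for dn in d_neg], gamma)


# Toy model (assumption): random support vectors and dual coefficients.
random.seed(2)
dim, gamma = 3, 0.5
pos = [[random.gauss(0.5, 1) for _ in range(dim)] for _ in range(5)]
neg = [[random.gauss(-0.5, 1) for _ in range(dim)] for _ in range(5)]
a_pos = [random.uniform(0.1, 1.0) for _ in pos]
a_neg = [random.uniform(0.1, 1.0) for _ in neg]
queries = [[random.gauss(0, 1.5) for _ in range(dim)] for _ in range(100)]

agree = sum(
    (svm_output(x, pos, neg, a_pos, a_neg, gamma) > 0)
    == (neural_score(x, pos, neg, a_pos, a_neg, gamma) > 0)
    for x in queries
)
print(agree, "of", len(queries), "decisions agree")
```

The rewritten score equals the log-ratio of the two kernel sums divided by gamma, so it preserves the decision but not the raw SVM output value. Whether the paper preserves the output itself or only the decision is not stated on this page.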

major comments (2)
  1. §3.2 (kNN case): the decomposition into a static set of linear detection units plus a fixed nonlinear pooling layer cannot be exact for standard kNN, because neighbor selection is query-dependent; representing the rule with all training points as fixed units either changes the decision boundary or requires a query-dependent graph that standard LRP does not accommodate without additional approximation or parameters.
  2. §4.1 and Table 2: the reported accuracy and runtime advantages over baselines are presented without ablation on the choice of pooling function or verification that the LRP explanations remain faithful to the original distance-based decision rule when the decomposition is applied to non-parametric models.
minor comments (2)
  1. Notation for the linear detection units should be aligned more explicitly with the original classifier parameters (e.g., prototypes in kNN or support vectors in SVM) to avoid ambiguity.
  2. The use-case demonstrations would benefit from explicit dataset sizes, class balances, and quantitative metrics rather than qualitative description alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, offering clarifications and indicating planned revisions where appropriate to strengthen the presentation.

Point-by-point responses
  1. Referee: §3.2 (kNN case): the decomposition into a static set of linear detection units plus a fixed nonlinear pooling layer cannot be exact for standard kNN, because neighbor selection is query-dependent; representing the rule with all training points as fixed units either changes the decision boundary or requires a query-dependent graph that standard LRP does not accommodate without additional approximation or parameters.

    Authors: We respectfully disagree that the decomposition cannot be exact. The kNN decision rule can be precisely expressed as a fixed neural network structure: the detection layer computes distances to all training points as fixed prototypes (linear units), and the pooling layer applies a nonlinear function that determines the k nearest neighbors based on these distances and aggregates their labels via majority vote. Although which specific neighbors are selected depends on the input values, the network architecture, including all connections and the pooling function, is static and query-independent. This allows direct application of LRP by propagating relevance through the fixed layers, with appropriate handling of the nonlinear pooling (as described in our method). We will revise the manuscript in §3.2 to better explain this equivalence and clarify how the query-dependent aspect of neighbor identity does not affect the fixed nature of the model graph. revision: no

  2. Referee: §4.1 and Table 2: the reported accuracy and runtime advantages over baselines are presented without ablation on the choice of pooling function or verification that the LRP explanations remain faithful to the original distance-based decision rule when the decomposition is applied to non-parametric models.

    Authors: We agree that additional analyses would enhance the robustness of our claims. In the revised manuscript, we will include an ablation study on the choice of pooling function, comparing different nonlinear pooling variants (such as exact kNN pooling versus differentiable approximations) and their impact on explanation quality and runtime. Furthermore, we will add verification of faithfulness for non-parametric models like kNN by evaluating how well the LRP explanations align with the original model's decisions, for instance through quantitative metrics like explanation faithfulness scores or by checking consistency with perturbation-based evaluations. These additions will be incorporated into §4.1 and updated in Table 2. revision: yes
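
A perturbation-based consistency check of the kind mentioned in the second response can be sketched as a feature-flipping curve: features are replaced by a baseline value one at a time, and the model score is recorded after each step. The sketch below is an assumption-laden toy, not the paper's evaluation protocol: the model has a single prototype per class, the baseline is the midpoint between the two prototypes, and all names and numbers are hypothetical.

```python
def sq_dist(x, u):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, u))


def make_score(p, n):
    """Toy distance-based score with one positive and one negative prototype."""
    def score(x):
        return sq_dist(x, n) - sq_dist(x, p)
    return score


def relevance(x, p, n):
    """Per-feature split of the score (sums to the score)."""
    return [(xd - nd) ** 2 - (xd - pd) ** 2 for xd, pd, nd in zip(x, p, n)]


def flipping_curve(score_fn, x, order, baseline):
    """Score after replacing features by their baseline value, in the given order."""
    x = list(x)
    curve = [score_fn(x)]
    for d in order:
        x[d] = baseline[d]
        curve.append(score_fn(x))
    return curve


p, n = [2.0, 0.0, 1.0], [0.0, 0.0, -1.0]
x = [1.5, 0.3, 1.0]
baseline = [(pd + nd) / 2 for pd, nd in zip(p, n)]  # assumed neutral reference

score = make_score(p, n)
rel = relevance(x, p, n)
by_relevance = sorted(range(len(x)), key=lambda d: -rel[d])  # most relevant first
least_first = list(reversed(by_relevance))

curve_rel = flipping_curve(score, x, by_relevance, baseline)
curve_rev = flipping_curve(score, x, least_first, baseline)
print(sum(curve_rel), sum(curve_rev))  # smaller area means a faster score drop
```

Flipping the most relevant features first should make the score fall faster than any other order if the explanation is faithful, so the area under `curve_rel` is compared with that of a reference ordering. In a real evaluation the reference would typically be a random order or a competing explanation method, averaged over many test points.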

Circularity Check

0 steps flagged

No significant circularity; derivation is algebraic rewriting of classifier definitions

full rationale

The paper frames the latent structure (linear detection units + nonlinear pooling) as a direct algebraic decomposition of distance-based classifiers such as kNN and SVM that preserves the original decision rule without introducing fitted parameters or query-dependent restructuring inside the explanation step. No equations reduce a claimed prediction or LRP output to quantities defined by the explanation procedure itself. Self-citations, if present, are not load-bearing for the core rewriting claim, which rests on the models' explicit distance computations rather than prior author results. The approach therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence of an exact latent network decomposition for arbitrary distance-based classifiers. No free parameters are introduced in the abstract. The key axiom is the structural equivalence itself.

axioms (1)
  • domain assumption: Any distance-based classifier decision rule admits an exact rewriting as a composition of linear detection units and nonlinear pooling layers.
    This equivalence is the load-bearing premise that allows LRP to be applied directly; it is invoked in the abstract when the authors state that the hidden structure makes XAI techniques applicable.



