
arxiv: 2605.21646 · v1 · pith:ZOFGR7XInew · submitted 2026-05-20 · 💻 cs.LG

Alike Parts: A Feature-Informed Approach to Local and Global Prototype Explanations

Pith reviewed 2026-05-22 09:06 UTC · model grok-4.3

classification 💻 cs.LG
keywords prototype explanations, feature importance, local explanations, global explanations, black-box classifiers, surrogate models, interpretability, feature diversity

The pith

Integrating feature importance into prototype explanations adds local and global granularity without reducing surrogate fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that folds feature importance scores into prototype explanations for black-box classifiers to supply more detail than whole-instance comparisons allow. Locally, alike parts spotlight the most relevant shared feature subsets between an instance and its nearest prototype. Globally, the prototype selection objective gains a feature-importance term that favors diversity in the attributions of the chosen prototypes. Experiments on six benchmark datasets show the changes preserve or raise the surrogate model's prediction fidelity. This matters to users who want explanations that point to concrete features driving similarity rather than relying on entire example matches.

Core claim

The authors propose alike parts as a local explanation technique that leverages feature importance to highlight key shared feature subsets between a classified instance and its nearest prototype. They also augment the global prototype selection objective with a term based on feature importance to encourage diversity in the feature attributions of the prototypes. Experiments demonstrate that this augmented selection maintains or increases the prediction fidelity of the surrogate model on six benchmark datasets, suggesting that feature diversity does not compromise model fidelity.
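A minimal sketch of how the local alike-parts highlighting could work, under stated assumptions. The mean threshold and the use of absolute importance values echo options named in the figure captions, but the closeness test on feature values, the `tolerance` parameter, and every function name here are hypothetical and are not taken from the paper.

```python
from statistics import mean


def important_mask(importances):
    """Flag features whose absolute importance reaches the mean absolute importance."""
    magnitudes = [abs(v) for v in importances]
    threshold = mean(magnitudes)
    return [m >= threshold for m in magnitudes]


def alike_parts(x, prototype, fi_x, fi_proto, tolerance=0.1):
    """Indices of features that are important for both the instance and its
    prototype and whose values differ by at most `tolerance` (an assumed rule)."""
    mask_x = important_mask(fi_x)
    mask_p = important_mask(fi_proto)
    return [
        j
        for j in range(len(x))
        if mask_x[j] and mask_p[j] and abs(x[j] - prototype[j]) <= tolerance
    ]


# Toy instance, its nearest prototype, and made-up importance scores.
x = [0.80, 0.10, 0.55, 0.30]
prototype = [0.75, 0.90, 0.50, 0.35]
fi_x = [0.40, 0.30, 0.28, 0.02]
fi_proto = [0.35, 0.05, 0.45, 0.15]

print(alike_parts(x, prototype, fi_x, fi_proto))  # [0, 2]
```

Feature 1 is important for the instance but not for the prototype, so it is left out even before its values are compared; only features 0 and 2 are highlighted.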

What carries the argument

Alike parts, a local method that uses feature importance scores to highlight the most relevant shared feature subsets between an instance and its prototype, together with an augmented global prototype selection objective that adds a feature importance term to promote diversity in attributions.
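The augmented global objective can be illustrated with a greedy selector, offered only as a sketch. It assumes each candidate prototype already has a base score from the unaugmented objective and an attribution vector; the diversity term (distance to the closest already-selected attribution vector), the greedy strategy, and the names `select_prototypes`, `base_scores`, and `weight` are all assumptions. The page refers to the balancing weight as beta in the Figure 3 caption and as λ in the simulated rebuttal.

```python
import math


def l2(a, b):
    """Euclidean distance between two attribution vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def select_prototypes(base_scores, attributions, k, weight):
    """Greedily pick k candidate indices.

    gain(c) = base_scores[c]
              + weight * distance from c's attributions to the closest selected one
    """
    selected = []
    remaining = set(range(len(base_scores)))
    while remaining and len(selected) < k:

        def gain(c):
            if not selected:
                return base_scores[c]
            diversity = min(l2(attributions[c], attributions[s]) for s in selected)
            return base_scores[c] + weight * diversity

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


base_scores = [0.90, 0.85, 0.60]  # hypothetical scores from the original objective
attributions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical importance vectors

print(select_prototypes(base_scores, attributions, k=2, weight=0.0))  # [0, 1]
print(select_prototypes(base_scores, attributions, k=2, weight=0.5))  # [0, 2]
```

With the weight at zero the selector returns the two highest-scoring candidates, whose attributions are nearly identical; a positive weight swaps the second pick for a candidate that relies on a different feature.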

If this is right

  • Local explanations now guide attention to specific important features shared with the prototype rather than the whole instance.
  • Global prototype sets can cover a broader range of feature-based reasons for model decisions.
  • Surrogate model fidelity stays the same or rises even after the diversity-promoting change.
  • Feature diversity among prototypes does not trade off against explanation reliability.
  • Users obtain more granular, feature-level insight into why an instance matches a prototype.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same importance-driven selection logic could be added to other example-based explanation methods to increase their feature specificity.
  • Domains that need justifications at the feature level, such as medical or financial decisions, may find the local alike-parts highlighting especially practical.
  • Testing the approach on models with strong feature interactions would show whether the diversity term remains beneficial when features are not independent.

Load-bearing premise

Feature importance scores can be integrated into local subset highlighting and the global prototype selection objective without introducing post-hoc biases or requiring dataset-specific tuning that affects the reported fidelity gains.

What would settle it

Re-running the augmented prototype selection on a new benchmark dataset and finding a statistically significant drop in surrogate prediction fidelity relative to the standard selection would falsify the claim that fidelity is maintained or increased.
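One way to operationalize such a check is sketched below. It assumes paired fidelity scores from repeated runs (same seed and split for both selectors) and uses an exact one-sided sign-flip permutation test, chosen here because it needs only the standard library; it is not the Wilcoxon signed-rank test mentioned in the simulated rebuttal, and the numbers are invented for illustration.

```python
from itertools import product


def paired_sign_flip_pvalue(baseline, augmented):
    """Exact one-sided p-value for 'augmented fidelity is lower than baseline'.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so all 2**n sign patterns are enumerated (practical only for small n).
    """
    diffs = [a - b for a, b in zip(augmented, baseline)]
    observed = sum(diffs)
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point noise in the sums.
        if sum(s * d for s, d in zip(signs, diffs)) <= observed + 1e-12:
            as_extreme += 1
    return as_extreme / total


# Invented fidelity scores for six paired runs on one dataset.
baseline = [0.90, 0.90, 0.90, 0.90, 0.90, 0.90]
dropped = [0.85, 0.86, 0.84, 0.85, 0.83, 0.86]
maintained = [0.91, 0.90, 0.92, 0.91, 0.91, 0.90]

print(paired_sign_flip_pvalue(baseline, dropped))     # 0.015625
print(paired_sign_flip_pvalue(baseline, maintained))  # 1.0
```

A small p-value on a new dataset would count against the fidelity claim, while a large one is merely consistent with it.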

Figures

Figures reproduced from arXiv: 2605.21646 by Jacek Karolczak, Jerzy Stefanowski.

Figure 1. Comparison of prototypes (x-axis: prototype index) and important features (y-axis: feature index). The top row displays prototypes generated using the original raw algorithm, while the bottom row incorporates the augmented target function with feature importance (FI). The size of the inner circle represents the magnitude of the feature importance, and a gray highlight denotes features identified as importa…
Figure 2. Fidelity comparison of prototypes found with the augmented objective across various configurations. The boxplots illustrate the distribution of fidelity across datasets. Subfigure (a) compares the impact of similarity metrics for SM-A and Tree SHAP. Subfigure (b) compares the effect of ignoring the direction of feature importance (applying the absolute value) for A-PETE and Tree Interpreter.
Figure 3. Comparison of the frequency of feature highlighting between the original (raw) and feature importance (FI)-informed strategies (with two beta values) for the Apple Quality dataset. The prototypes were found using the A-PETE algorithm and Tree SHAP, with the mean threshold as the masking strategy.
Figure 4. Statistics for the number of features identified as alike parts across all datasets. The results were compiled from all algorithm and masking combinations, using the specific hyperparameters that maximized test set fidelity (as reported in …)
Original abstract

Prototype-based explanations offer an intuitive, example-based approach to support the interpretability of machine learning black box classifiers but often lack feature-level granularity. We introduce a framework that integrates feature importance at two levels to address this gap. First, for local explanations, we propose alike parts: a method that uses feature importance scores to highlight the most relevant, shared feature subsets between a classified instance and its nearest prototype, guiding user attention. Second, we augment the global prototype selection objective function with a feature importance term to actively promote diversity in the feature attributions of the selected prototypes. Experiments on six benchmark datasets show that this augmented selection process maintains or, in some cases, increases the prediction fidelity of the surrogate model, suggesting that feature diversity does not compromise model fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a framework called 'Alike Parts' for enhancing prototype-based explanations of black-box classifiers. It integrates feature importance scores in two ways: (1) locally, by highlighting the most relevant shared feature subsets ('alike parts') between a query instance and its nearest prototype; (2) globally, by augmenting the prototype selection objective with an additional feature-importance term that encourages diversity in the attributions of the selected prototypes. Experiments across six benchmark datasets are reported to show that the augmented selection maintains or increases surrogate-model prediction fidelity relative to the unaugmented baseline, leading to the suggestion that feature diversity need not compromise fidelity.

Significance. If the empirical results are robust, the work addresses a genuine gap in prototype explanations by adding feature-level granularity to both local highlighting and global selection. The multi-dataset evaluation is a positive feature. However, the central claim that 'feature diversity does not compromise model fidelity' rests on the behavior of the augmented objective; without clear evidence that the balancing weight is held fixed or chosen independently of the reported outcome, the result remains conditional rather than general.

major comments (2)
  1. [Global prototype selection objective] Global prototype selection objective (likely §3.2 or Eq. (3)–(5)): the manuscript must explicitly state the value or selection procedure for the hyperparameter that balances the original fidelity term against the new feature-importance diversity term. If this weight is tuned independently per dataset (or via validation that favors the reported fidelity), the observed maintenance of fidelity is consistent with an artifact of per-dataset optimization rather than evidence that diversity is harmless in general. An ablation across a fixed schedule of weights on all six datasets is required to support the claim.
  2. [Experimental results] Experimental results (likely §4 and Table 2): the claim that fidelity is 'maintained or, in some cases, increased' is load-bearing for the paper’s central suggestion. The current description supplies no information on the precise fidelity metric, the baseline prototype selector, statistical tests, or variance across runs. Without these controls, it is impossible to judge whether the reported gains are reliable or whether they depend on the same weighting choice raised above.
minor comments (2)
  1. [Local explanation method] Clarify whether the same feature-importance scores are used without modification for both the local 'alike parts' highlighting and the global objective, or whether any post-processing is applied.
  2. [Related work] Add a short paragraph contrasting the approach with prior prototype methods (e.g., ProtoDash, MMD-critic) that already incorporate some form of diversity or importance weighting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below and indicate the changes we will incorporate in the revised manuscript.

Point-by-point responses
  1. Referee: [Global prototype selection objective] Global prototype selection objective (likely §3.2 or Eq. (3)–(5)): the manuscript must explicitly state the value or selection procedure for the hyperparameter that balances the original fidelity term against the new feature-importance diversity term. If this weight is tuned independently per dataset (or via validation that favors the reported fidelity), the observed maintenance of fidelity is consistent with an artifact of per-dataset optimization rather than evidence that diversity is harmless in general. An ablation across a fixed schedule of weights on all six datasets is required to support the claim.

    Authors: We agree that the balancing hyperparameter requires explicit documentation. In the submitted manuscript the weight λ was fixed at 0.5 for every dataset; this value was chosen via a small preliminary grid search on the first dataset only and then held constant for all subsequent experiments. We will add a clear statement of this procedure to Section 3.2. In addition, we will perform the requested ablation by re-running the global selection with a fixed schedule of weights (λ ∈ {0.0, 0.25, 0.5, 0.75, 1.0}) on all six benchmarks and will report the resulting fidelity curves in a new appendix table. These additions directly address the concern that the reported fidelity maintenance might be an artifact of per-dataset tuning. revision: yes

  2. Referee: [Experimental results] Experimental results (likely §4 and Table 2): the claim that fidelity is 'maintained or, in some cases, increased' is load-bearing for the paper’s central suggestion. The current description supplies no information on the precise fidelity metric, the baseline prototype selector, statistical tests, or variance across runs. Without these controls, it is impossible to judge whether the reported gains are reliable or whether they depend on the same weighting choice raised above.

    Authors: We accept that the experimental reporting is currently underspecified. The fidelity metric is the test-set accuracy of a surrogate decision-tree model in reproducing the black-box classifier’s predictions. The baseline is the original prototype-selection objective without the feature-diversity term. We will revise Section 4 and Table 2 to state these definitions explicitly, to report mean fidelity ± one standard deviation over ten independent runs (different random seeds for prototype initialization and data splits), and to include Wilcoxon signed-rank tests comparing the augmented and baseline selectors. These controls will allow readers to evaluate both reliability and dependence on the weighting choice. revision: yes
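The fidelity definition and the mean ± standard deviation reporting described in this response can be sketched as follows. The helper names and the toy labels are hypothetical; only the definition (share of test instances on which the surrogate reproduces the black-box prediction) comes from the response itself.

```python
from statistics import mean, stdev


def fidelity(black_box_labels, surrogate_labels):
    """Fraction of test instances where the surrogate matches the black-box label."""
    if len(black_box_labels) != len(surrogate_labels):
        raise ValueError("label sequences must have equal length")
    matches = sum(b == s for b, s in zip(black_box_labels, surrogate_labels))
    return matches / len(black_box_labels)


def summarize(per_run_fidelity):
    """Mean and sample standard deviation of fidelity over independent runs."""
    return mean(per_run_fidelity), stdev(per_run_fidelity)


# Toy predictions for one run: the surrogate disagrees on one of four instances.
print(fidelity([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75

# Invented per-run fidelities standing in for runs with different seeds.
run_mean, run_std = summarize([0.90, 0.92, 0.88])
print(f"{run_mean:.3f} ± {run_std:.3f}")  # 0.900 ± 0.020
```

Note that fidelity is measured against the black-box predictions rather than the ground-truth labels, so a surrogate can score high fidelity while the underlying classifier is inaccurate.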

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on independent experiments

Full rationale

The paper introduces a framework for integrating feature importance into prototype-based explanations via 'alike parts' local highlighting and an augmented global selection objective. Its central claim—that the augmented process maintains or increases surrogate fidelity across six benchmarks—is presented as an empirical outcome rather than a derivation. No equations, fitted parameters, or self-citations reduce the reported results to quantities defined by the method itself; the experiments serve as external validation against benchmark datasets. The derivation chain is therefore self-contained and does not exhibit self-definitional, fitted-input, or self-citation load-bearing circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework implicitly assumes reliable feature importance scores and a well-behaved surrogate fidelity metric.



