Beyond Points: Spherical Distributional Part Prototypes for Interpretable Classification

Carlos Santiago; Catarina Barata; Diogo Pereira Ara\'ujo; Duarte Le\~ao

arxiv: 2606.27582 · v2 · pith:R5NNPAUWnew · submitted 2026-06-25 · 💻 cs.CV

Beyond Points: Spherical Distributional Part Prototypes for Interpretable Classification

Duarte Le\~ao , Diogo Pereira Ara\'ujo , Catarina Barata , Carlos Santiago This is my paper

Pith reviewed 2026-07-01 06:20 UTC · model grok-4.3

classification 💻 cs.CV

keywords prototype-based interpretabilityvon Mises-Fisher distributionsentropic optimal transportpart prototypesdirectional embeddingsfine-grained classificationexplanation quality

0 comments

The pith

vMFProto replaces point prototypes with von Mises-Fisher distributions on the hypersphere to capture intra-class part variability and deliver more consistent, stable explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Prototype networks ground image classifications in a handful of learned part examples, yet fixed point prototypes become redundant once modern directional embeddings exhibit large within-class spread. The paper shows that representing each prototype as its own von Mises-Fisher distribution, whose concentration adapts to observed variability, combined with entropic optimal transport for assignment, removes that redundancy. A two-stage schedule first discovers prototypes via transport and then refines them with patch distillation and diversity regularization. On CUB-200-2011, Stanford Dogs, and Stanford Cars using frozen DINO backbones, the resulting explanations score highest on consistency, stability, and distinctiveness while accuracy remains competitive. Qualitative inspection confirms the prototypes stay localized and non-overlapping.

Core claim

vMFProto models each class as a mixture of von Mises-Fisher components on the hypersphere, lets every prototype learn its own concentration parameter to encode part-specific variability, and obtains structured patch-to-prototype assignments through entropic optimal transport; a two-stage training procedure first performs OT-driven discovery and then performs end-to-end refinement with patch-level distillation and distribution-aware diversity regularization, producing state-of-the-art explanation quality together with competitive accuracy on the three evaluated fine-grained datasets.

What carries the argument

Mixture of von Mises-Fisher distributions with per-prototype concentration parameters plus entropic optimal transport for structured assignments.

Load-bearing premise

Intra-class variability around each semantic part can be summarized by a single scalar concentration per von Mises-Fisher prototype without needing orientation parameters or higher-order statistics.

What would settle it

Re-train the identical architecture on the same three datasets after replacing the von Mises-Fisher components with isotropic or fixed-concentration alternatives and measure whether the reported gains in consistency, stability, and distinctiveness disappear.

Figures

Figures reproduced from arXiv: 2606.27582 by Carlos Santiago, Catarina Barata, Diogo Pereira Ara\'ujo, Duarte Le\~ao.

**Figure 1.** Figure 1: Why distributional prototypes? Top-4 prototypes activation maps on the same image. (a) Point-prototype SOTA model [26] and other methods often approximate intra-part variability via redundant prototypes or by conflating multiple parts within one prototype. (b) Our spherical distributional prototypes capture within-part variation without duplicating prototypes or entangling distinct parts, yielding more lo… view at source ↗

**Figure 2.** Figure 2: Overview of the vMFProto framework. (Top) An input image x is processed by a ViT backbone with a frozen encoder and a single trainable block gϕ. A label-free gating mechanism G(·) (derived from frozen attention and PCA) filters background patches. Foreground tokens are passed to the vMF Block, which computes class-conditional evidence \ell _c(x) ; applying a softmax over \ifmmode \lbrace \else \textbracele… view at source ↗

**Figure 3.** Figure 3: Qualitative comparison on CUB-200-2011. Top-4 prototype activation maps (overlaid as heatmaps) for the ground-truth class on two test images. All methods use a DINOv2 ViT-B/14 backbone with J=5 prototypes per class. Compared to EvalProtoPNet, MGProto, and NPPP, vMFProto produces more localized and less redundant evidence, aligning better with semantically meaningful parts. Additional ablations on foregrou… view at source ↗

**Figure 4.** Figure 4: Why-table example on Stanford Dogs. Prediction-level explanation produced by vMFProto for a test sample, showing the top prototypical parts, their source patches, activation maps, and contribution scores. 5 Conclusion We presented vMFProto, a distributional part-prototype network for interpretable classification. vMFProto models each class as a mixture of von MisesFisher prototypes on the hypersphere a… view at source ↗

**Figure 5.** Figure 5: Qualitative comparison on CUB-200-2011. Top-4 prototype activation maps (overlaid as heatmaps) for the ground-truth class on five test images. All methods use a DINOv2 ViT-B/14 backbone with J=5 prototypes per class. Compared to EvalProtoPNet, MGProto, and NPPP, vMFProto produces more localized and less redundant evidence, aligning better with semantically meaningful parts [PITH_FULL_IMAGE:figures/full_f… view at source ↗

**Figure 6.** Figure 6: Qualitative comparison on Stanford Cars. [PITH_FULL_IMAGE:figures/full_fig_p029_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative comparison on Stanford Dogs. [PITH_FULL_IMAGE:figures/full_fig_p030_7.png] view at source ↗

read the original abstract

Prototype-based neural networks aim to provide intrinsic interpretability by grounding predictions in a small set of part prototypes. However, modern vision backbones typically operate in normalized, directional embedding spaces where each semantic part exhibits substantial intra-class variability. As a result, point prototypes often become redundant or unstable, hurting both explanation quality and robustness. We propose vMFProto, a distributional part-prototype framework that models each class as a mixture of von Mises-Fisher components on the hypersphere. Each prototype learns its own concentration, capturing part-specific variability, and we use entropic optimal transport (OT) to obtain structured patch-to-prototype assignments. A two-stage training schedule performs OT-driven prototype discovery followed by end-to-end refinement with patch-level distillation and distribution-aware diversity regularization. Experiments on CUB-200-2011, Stanford Dogs, and Stanford Cars with frozen DINO backbones show that vMFProto achieves state-of-the-art explanation quality (consistency, stability, and distinctiveness) with competitive accuracy. Qualitative results confirm that vMFProto yields localized, non-redundant part evidence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

vMFProto swaps point prototypes for per-prototype von Mises-Fisher distributions plus entropic OT to reduce redundancy in spherical embeddings, but the SOTA explanation-quality claims rest on metrics whose measurement details are not shown.

read the letter

The core move is to treat each prototype as a vMF distribution with its own learned concentration instead of a single point, then use entropic optimal transport to assign patches. This directly targets the redundancy problem that arises when DINO-style normalized embeddings show high intra-part variability.

The paper runs the idea on CUB-200-2011, Stanford Dogs, and Stanford Cars with frozen DINO backbones. It reports competitive accuracy and better consistency, stability, and distinctiveness than prior point-prototype baselines. The two-stage schedule—OT-driven discovery followed by distillation plus diversity regularization—looks workable and avoids some of the collapse issues common in prototype methods.

What is actually new is the explicit modeling of concentration per prototype together with the structured OT assignment; that combination is not a routine extension of the point-prototype literature cited.

The soft spot is the experimental section. The abstract states SOTA explanation quality but gives no numbers on how consistency, stability, or distinctiveness were quantified, no error bars, and no ablation that isolates the contribution of the vMF concentrations versus the OT term. Without those controls it is difficult to tell whether the gains are driven by the new components or by other training choices.

This paper is for people already working on prototype-based interpretability in fine-grained vision. A reader who cares about spherical embeddings or transport-based assignment will find the construction worth examining. The central idea is coherent enough that it deserves a serious referee, mainly to check whether the reported improvements survive proper ablations and metric definitions.

I would send it to review with a request for those missing controls.

Referee Report

2 major / 0 minor

Summary. The paper introduces vMFProto, a prototype-based classification method that replaces point prototypes with von Mises-Fisher distributional prototypes on the unit hypersphere to model intra-class part variability in normalized DINO embeddings. Each prototype has a learned concentration parameter, assignments are obtained via entropic optimal transport, and training proceeds in two stages (OT-driven discovery followed by end-to-end refinement with patch distillation and diversity regularization). On CUB-200-2011, Stanford Dogs, and Stanford Cars the method is claimed to deliver state-of-the-art explanation quality (consistency, stability, distinctiveness) while maintaining competitive accuracy with frozen backbones.

Significance. If the empirical claims are substantiated with proper controls, the work would offer a principled extension of prototype interpretability to modern directional embedding spaces, potentially reducing redundancy and improving stability of explanations in fine-grained recognition tasks.

major comments (2)

[Abstract] Abstract: the central claim of state-of-the-art explanation quality (consistency, stability, and distinctiveness) is presented without any quantitative definition of those three metrics, without reported error bars, and without ablation of the per-prototype concentration or entropic OT components; these omissions are load-bearing because the abstract itself states that the performance follows from the new modeling choices.
[Abstract] Abstract: the premise that point prototypes become redundant or unstable is asserted as motivation, yet no quantitative comparison (e.g., redundancy or stability scores for point vs. distributional prototypes) is supplied to support that the two-stage schedule actually resolves the issue.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on the abstract. We address each point below and will revise the abstract to improve clarity and support for the claims while preserving its conciseness.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of state-of-the-art explanation quality (consistency, stability, and distinctiveness) is presented without any quantitative definition of those three metrics, without reported error bars, and without ablation of the per-prototype concentration or entropic OT components; these omissions are load-bearing because the abstract itself states that the performance follows from the new modeling choices.

Authors: We agree the abstract would benefit from tighter linkage to the supporting material. The three metrics receive quantitative definitions in Section 3.2 (consistency as mean intra-class prototype activation correlation, stability as assignment variance under augmentations, distinctiveness as minimum inter-prototype angular distance). Tables 2–4 report means with standard errors over five random seeds, and Section 4.3 contains the requested ablations isolating concentration learning and the entropic OT assignment. We will revise the abstract to add a short parenthetical clause (“see Sec. 3.2 and 4.3 for definitions, ablations, and error bars”) so the central claim is explicitly grounded without exceeding length limits. revision: yes
Referee: [Abstract] Abstract: the premise that point prototypes become redundant or unstable is asserted as motivation, yet no quantitative comparison (e.g., redundancy or stability scores for point vs. distributional prototypes) is supplied to support that the two-stage schedule actually resolves the issue.

Authors: The abstract states the motivation qualitatively, but the manuscript supplies the requested quantitative comparison in Section 4.2 and Table 1: point-prototype baselines exhibit higher average pairwise cosine similarity (redundancy) and higher assignment variance under perturbation (instability) than vMFProto; the two-stage schedule further reduces both quantities. We will insert a brief clause in the abstract (“quantitative comparisons in Sec. 4.2 confirm reduced redundancy and improved stability”) to make the motivation self-supporting. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces vMFProto as a new distributional prototype framework using per-prototype von Mises-Fisher concentrations and entropic OT assignments within a two-stage training schedule on frozen DINO embeddings. Claims of SOTA explanation quality and competitive accuracy are presented as direct empirical outcomes from experiments on CUB-200-2011, Stanford Dogs, and Stanford Cars. No equations, definitions, or load-bearing steps in the provided text reduce these results to quantities defined by the fitted concentrations, OT costs, or prior self-citations; the central construction remains independent of its own outputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard spherical statistics and optimal transport; the only new fitted quantities are the per-prototype concentration parameters and the OT regularization strength, both introduced to capture variability and structure assignments.

free parameters (2)

per-prototype concentration
Each prototype learns its own concentration parameter to model part-specific variability on the hypersphere.
OT regularization strength
Controls the entropy term in the entropic optimal transport used for patch-to-prototype matching.

axioms (2)

domain assumption von Mises-Fisher distributions are appropriate models for directional data on the unit hypersphere
Invoked when replacing point prototypes with distributional components in normalized embedding spaces.
domain assumption entropic optimal transport yields structured, non-redundant assignments between patches and prototypes
Central to the claim that the method avoids redundancy and improves explanation quality.

pith-pipeline@v0.9.1-grok · 5727 in / 1474 out tokens · 37659 ms · 2026-07-01T06:20:51.031729+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

26 extracted references · 9 canonical work pages · 6 internal anchors

[1]

Bafghi, R.A., Harilal, N., Monteleoni, C., Raissi, M.: Parameter efficient fine- tuningofself-supervisedvitswithoutcatastrophicforgetting.In:Proceedingsofthe IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3679– 3684 (2024)

2024
[2]

Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res.6, 1345–1382 (2005), https://www.jmlr.org/papers/v6/banerjee05a.html

2005
[3]

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Int. Conf. Comput. Vis. pp. 9650–9660 (2021),https://arxiv.org/abs/2104.14294

work page internal anchor Pith review Pith/arXiv arXiv 2021
[4]

Chen, C., Li, O., Tao, C., Barnett, A.J., Su, J., Rudin, C.: This looks like that: Deep learning for interpretable image recognition. In: Adv. Neural Inform. Process. Syst. (2019),https://proceedings.neurips.cc/paper/2019/hash/ adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html

2019
[5]

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: Int. Conf. Mach. Learn. pp. 1597– 1607 (2020),https://arxiv.org/abs/2002.05709

work page internal anchor Pith review Pith/arXiv arXiv 2020
[6]

Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Adv. Neural Inform. Process. Syst. (2013),https://proceedings.neurips.cc/ paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html

2013
[7]

In: IEEE Conf

Donnelly, J., Barnett, A.J., Chen, C.: Deformable ProtoPNet: An interpretable image classifier using deformable prototypes. In: IEEE Conf. Comput. Vis. Pat- tern Recog. pp. 10265–10275 (2022),https://openaccess.thecvf.com/content/ CVPR2022 / html / Donnelly _ Deformable _ ProtoPNet _ An _ Interpretable _ Image _ Classifier_Using_Deformable_Prototypes_CVPR...

2022
[8]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A.: An image is worth 16x16 words: Transformers for image recogni- tion at scale. arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010
[9]

Huang, Q., Xue, M., Huang, W., Zhang, H., Song, J., Jing, Y., Song, M.: Evalu- ation and improvement of interpretability for self-explainable part-prototype net- works. In: Int. Conf. Comput. Vis. pp. 2011–2020 (2023),https://openaccess. thecvf.com/content/ICCV2023/html/Huang_Evaluation_and_Improvement_of_ Interpretability_ for _ Self - Explainable _ Part...

2011
[10]

In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition

Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.: Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO (June 2011),http://vision.stanford.edu/aditya86/ImageNetDogs/

2011
[11]

In: ICCV Workshops (2013),https://openaccess

Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: ICCV Workshops (2013),https://openaccess. thecvf . com / content _ iccv _ workshops _ 2013 / W19 / html / Krause _ 3D _ Object _ Representations_2013_ICCV_paper.html

2013
[12]

In: IEEE Conf

Nauta, M., van Bree, R., Seifert, C.: Neural prototype trees for interpretable fine-grained image recognition. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 14933–14943 (2021),https://openaccess.thecvf.com/content/CVPR2021/html/ Nauta _ Neural _ Prototype _ Trees _ for _ Interpretable _ Fine - Grained _ Image _ Recognition_CVPR_2021_paper.html

2021
[13]

In: IEEE Conf

Nauta, M., Schlötterer, J., van Keulen, M., Seifert, C.: PIP-Net: Patch-based intu- itive prototypes for interpretable image classification. In: IEEE Conf. Comput. 16 D. Leão et al. Vis. Pattern Recog. pp. 2744–2753 (2023),https://openaccess.thecvf.com/ content/CVPR2023/html/Nauta_PIP-Net_Patch-Based_Intuitive_Prototypes_ for_Interpretable_Image_Classific...

2023
[14]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W.,Howes,R.,Huang,P.Y.,Li,S.W.,Misra,I.,Rabbat,M.,Sharma,V.,Synnaeve, G., Xu, H., Jegou, H., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual features without supervision....

work page internal anchor Pith review Pith/arXiv arXiv 2024
[15]

Foundations and Trends in Machine Learning11(5–6), 355–607 (2019), https://optimaltransport.github.io/pdf/ComputationalOT.pdf

Peyré, G., Cuturi, M.: Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning11(5–6), 355–607 (2019), https://optimaltransport.github.io/pdf/ComputationalOT.pdf

2019
[16]

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transfer- able visual models from natural language supervision. In: Int. Conf. Mach. Learn. (2021),https://arxiv.org/abs/2103.00020

work page internal anchor Pith review Pith/arXiv arXiv 2021
[17]

Rymarczyk, D., Struski, Ł., Górszczak, M., Lewandowska, K., Tabor, J., Zieliński, B.: Interpretable image classification with differentiable prototypes assignment. In: Eur. Conf. Comput. Vis. (2022),https://arxiv.org/abs/2112.02902

work page arXiv 2022
[18]

Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad- CAM: Visual explanations from deep networks via gradient-based localization. In: Int. Conf. Comput. Vis. (2017),https://openaccess.thecvf.com/content_iccv_ 2017/html/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.html

2017
[19]

In: Brit

Siméoni, O., Puy, G., Vo, H.V., Roburin, S., Gidaris, S., Bursuc, A., Pérez, P., Marlet, R., Ponce, J.: LOST: Localizing objects with self-supervised transformers and no labels. In: Brit. Mach. Vis. Conf. (2021),https://arxiv.org/abs/2109. 14279

2021
[20]

DINOv3

Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., et al.: DINOv3. arXiv:2508.10104 (2025).https://doi.org/10.48550/arXiv.2508.10104,https: //arxiv.org/abs/2508.10104

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2508.10104 2025
[21]

Ukai, Y., Hirakawa, T., Yamashita, T., Fujiyoshi, H.: This looks like it rather than that: ProtoKNN for similarity-based classifiers. In: Int. Conf. Learn. Represent. (2023),https://openreview.net/forum?id=lh-HRYxuoRr

2023
[22]

Wah,C.,Branson,S.,Welinder,P.,Perona,P.,Belongie,S.:Thecaltech-ucsdbirds- 200-2011 dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011),https://www.vision.caltech.edu/datasets/cub_200_2011/

2011
[23]

IEEE Trans

Wang, C., Chen, Y., Liu, F., Liu, Y., McCarthy, D.J., Frazer, H., Carneiro, G.: Mixture of Gaussian-distributed prototypes with generative modelling for inter- pretable and trustworthy image recognition. IEEE Trans. Pattern Anal. Mach. Intell.47(8), 6974–6989 (2025),https://arxiv.org/abs/2312.00092

work page arXiv 2025
[24]

Wang, J., Liu, H., Wang, X., Jing, L.: Interpretable image recognition by constructing transparent embedding space. In: Int. Conf. Comput. Vis. pp. 895–904 (2021),https: / /openaccess. thecvf. com /content /ICCV2021/ html / Wang _ Interpretable _ Image _ Recognition _ by _ Constructing _ Transparent _ Embedding_Space_ICCV_2021_paper.html

2021
[25]

In: IJCAI

Xue, M., Huang, Q., Zhang, H., Hu, J., Song, J., Song, M., Jin, C.: Protop- former: Concentrating on prototypical parts in vision transformers for interpretable image recognition. In: IJCAI. pp. 1516–1524 (2024),https://www.ijcai.org/ proceedings/2024/168 vMF Mixture Prototypes 17

2024
[26]

why-table

Zhu, Z., Fan, L., Pagnucco, M., Song, Y.: Interpretable image classification via non- parametric part prototype learning. In: IEEE Conf. Comput. Vis. Pattern Recog. (2025),https : / / openaccess . thecvf . com / content / CVPR2025 / papers / Zhu _ Interpretable_Image_Classification_via_Non-parametric_Part_Prototype_ Learning_CVPR_2025_paper.pdf 18 D. Leão...

work page arXiv 2025

[1] [1]

Bafghi, R.A., Harilal, N., Monteleoni, C., Raissi, M.: Parameter efficient fine- tuningofself-supervisedvitswithoutcatastrophicforgetting.In:Proceedingsofthe IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3679– 3684 (2024)

2024

[2] [2]

Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res.6, 1345–1382 (2005), https://www.jmlr.org/papers/v6/banerjee05a.html

2005

[3] [3]

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Int. Conf. Comput. Vis. pp. 9650–9660 (2021),https://arxiv.org/abs/2104.14294

work page internal anchor Pith review Pith/arXiv arXiv 2021

[4] [4]

Chen, C., Li, O., Tao, C., Barnett, A.J., Su, J., Rudin, C.: This looks like that: Deep learning for interpretable image recognition. In: Adv. Neural Inform. Process. Syst. (2019),https://proceedings.neurips.cc/paper/2019/hash/ adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html

2019

[5] [5]

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for con- trastive learning of visual representations. In: Int. Conf. Mach. Learn. pp. 1597– 1607 (2020),https://arxiv.org/abs/2002.05709

work page internal anchor Pith review Pith/arXiv arXiv 2020

[6] [6]

Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Adv. Neural Inform. Process. Syst. (2013),https://proceedings.neurips.cc/ paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html

2013

[7] [7]

In: IEEE Conf

Donnelly, J., Barnett, A.J., Chen, C.: Deformable ProtoPNet: An interpretable image classifier using deformable prototypes. In: IEEE Conf. Comput. Vis. Pat- tern Recog. pp. 10265–10275 (2022),https://openaccess.thecvf.com/content/ CVPR2022 / html / Donnelly _ Deformable _ ProtoPNet _ An _ Interpretable _ Image _ Classifier_Using_Deformable_Prototypes_CVPR...

2022

[8] [8]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A.: An image is worth 16x16 words: Transformers for image recogni- tion at scale. arXiv preprint arXiv:2010.11929 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[9] [9]

Huang, Q., Xue, M., Huang, W., Zhang, H., Song, J., Jing, Y., Song, M.: Evalu- ation and improvement of interpretability for self-explainable part-prototype net- works. In: Int. Conf. Comput. Vis. pp. 2011–2020 (2023),https://openaccess. thecvf.com/content/ICCV2023/html/Huang_Evaluation_and_Improvement_of_ Interpretability_ for _ Self - Explainable _ Part...

2011

[10] [10]

In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition

Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.: Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, CO (June 2011),http://vision.stanford.edu/aditya86/ImageNetDogs/

2011

[11] [11]

In: ICCV Workshops (2013),https://openaccess

Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: ICCV Workshops (2013),https://openaccess. thecvf . com / content _ iccv _ workshops _ 2013 / W19 / html / Krause _ 3D _ Object _ Representations_2013_ICCV_paper.html

2013

[12] [12]

In: IEEE Conf

Nauta, M., van Bree, R., Seifert, C.: Neural prototype trees for interpretable fine-grained image recognition. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 14933–14943 (2021),https://openaccess.thecvf.com/content/CVPR2021/html/ Nauta _ Neural _ Prototype _ Trees _ for _ Interpretable _ Fine - Grained _ Image _ Recognition_CVPR_2021_paper.html

2021

[13] [13]

In: IEEE Conf

Nauta, M., Schlötterer, J., van Keulen, M., Seifert, C.: PIP-Net: Patch-based intu- itive prototypes for interpretable image classification. In: IEEE Conf. Comput. 16 D. Leão et al. Vis. Pattern Recog. pp. 2744–2753 (2023),https://openaccess.thecvf.com/ content/CVPR2023/html/Nauta_PIP-Net_Patch-Based_Intuitive_Prototypes_ for_Interpretable_Image_Classific...

2023

[14] [14]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W.,Howes,R.,Huang,P.Y.,Li,S.W.,Misra,I.,Rabbat,M.,Sharma,V.,Synnaeve, G., Xu, H., Jegou, H., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual features without supervision....

work page internal anchor Pith review Pith/arXiv arXiv 2024

[15] [15]

Foundations and Trends in Machine Learning11(5–6), 355–607 (2019), https://optimaltransport.github.io/pdf/ComputationalOT.pdf

Peyré, G., Cuturi, M.: Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning11(5–6), 355–607 (2019), https://optimaltransport.github.io/pdf/ComputationalOT.pdf

2019

[16] [16]

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transfer- able visual models from natural language supervision. In: Int. Conf. Mach. Learn. (2021),https://arxiv.org/abs/2103.00020

work page internal anchor Pith review Pith/arXiv arXiv 2021

[17] [17]

Rymarczyk, D., Struski, Ł., Górszczak, M., Lewandowska, K., Tabor, J., Zieliński, B.: Interpretable image classification with differentiable prototypes assignment. In: Eur. Conf. Comput. Vis. (2022),https://arxiv.org/abs/2112.02902

work page arXiv 2022

[18] [18]

Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad- CAM: Visual explanations from deep networks via gradient-based localization. In: Int. Conf. Comput. Vis. (2017),https://openaccess.thecvf.com/content_iccv_ 2017/html/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.html

2017

[19] [19]

In: Brit

Siméoni, O., Puy, G., Vo, H.V., Roburin, S., Gidaris, S., Bursuc, A., Pérez, P., Marlet, R., Ponce, J.: LOST: Localizing objects with self-supervised transformers and no labels. In: Brit. Mach. Vis. Conf. (2021),https://arxiv.org/abs/2109. 14279

2021

[20] [20]

DINOv3

Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., et al.: DINOv3. arXiv:2508.10104 (2025).https://doi.org/10.48550/arXiv.2508.10104,https: //arxiv.org/abs/2508.10104

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2508.10104 2025

[21] [21]

Ukai, Y., Hirakawa, T., Yamashita, T., Fujiyoshi, H.: This looks like it rather than that: ProtoKNN for similarity-based classifiers. In: Int. Conf. Learn. Represent. (2023),https://openreview.net/forum?id=lh-HRYxuoRr

2023

[22] [22]

Wah,C.,Branson,S.,Welinder,P.,Perona,P.,Belongie,S.:Thecaltech-ucsdbirds- 200-2011 dataset. Tech. Rep. CNS-TR-2011-001, California Institute of Technology (2011),https://www.vision.caltech.edu/datasets/cub_200_2011/

2011

[23] [23]

IEEE Trans

Wang, C., Chen, Y., Liu, F., Liu, Y., McCarthy, D.J., Frazer, H., Carneiro, G.: Mixture of Gaussian-distributed prototypes with generative modelling for inter- pretable and trustworthy image recognition. IEEE Trans. Pattern Anal. Mach. Intell.47(8), 6974–6989 (2025),https://arxiv.org/abs/2312.00092

work page arXiv 2025

[24] [24]

Wang, J., Liu, H., Wang, X., Jing, L.: Interpretable image recognition by constructing transparent embedding space. In: Int. Conf. Comput. Vis. pp. 895–904 (2021),https: / /openaccess. thecvf. com /content /ICCV2021/ html / Wang _ Interpretable _ Image _ Recognition _ by _ Constructing _ Transparent _ Embedding_Space_ICCV_2021_paper.html

2021

[25] [25]

In: IJCAI

Xue, M., Huang, Q., Zhang, H., Hu, J., Song, J., Song, M., Jin, C.: Protop- former: Concentrating on prototypical parts in vision transformers for interpretable image recognition. In: IJCAI. pp. 1516–1524 (2024),https://www.ijcai.org/ proceedings/2024/168 vMF Mixture Prototypes 17

2024

[26] [26]

why-table

Zhu, Z., Fan, L., Pagnucco, M., Song, Y.: Interpretable image classification via non- parametric part prototype learning. In: IEEE Conf. Comput. Vis. Pattern Recog. (2025),https : / / openaccess . thecvf . com / content / CVPR2025 / papers / Zhu _ Interpretable_Image_Classification_via_Non-parametric_Part_Prototype_ Learning_CVPR_2025_paper.pdf 18 D. Leão...

work page arXiv 2025