T-DuMpRa: Teacher-guided Dual-path Multi-prototype Retrieval Augmented framework for fine-grained medical image classification

Shen Zhao; Zixuan Tang

arxiv: 2604.17360 · v1 · submitted 2026-04-19 · 💻 cs.AI

Pith reviewed 2026-05-10 06:22 UTC · model grok-4.3

keywords fine-grained medical image classification · multi-prototype retrieval · teacher-guided learning · confidence-gated fusion · skin lesion classification · ambiguous cases · EMA teacher · contrastive embedding learning

The pith

A teacher-guided dual-path framework with multi-prototype retrieval and confidence-gated fusion improves accuracy on visually ambiguous medical images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops T-DuMpRa to address fine-grained medical image classification where subtle inter-class differences create visually ambiguous cases that produce uncertain predictions. It combines a standard discriminative classifier with a parallel retrieval path that matches embeddings against a bank of prototypes derived from clustered teacher-model representations. Training uses both cross-entropy and supervised contrastive losses to produce cosine-compatible embeddings, while an EMA teacher supplies stable representations for the memory bank. At inference a conservative gate fuses the two signals only when the classifier shows uncertainty and the prototype matches strongly conflict with it, leaving high-confidence outputs unchanged. Experiments on HAM10000 and ISIC2019 report modest gains across five backbones and visualizations indicate better separation of ambiguous examples.

Core claim

The T-DuMpRa framework jointly optimizes discriminative classification and multi-prototype retrieval during training by using an EMA teacher to build a clustered memory bank in embedding space, then at inference fuses the classifier distribution with cosine similarity to the prototypes through a conservative confidence gate that activates retrieval solely when the base prediction is uncertain and the retrieval evidence is decisive and conflicting.

What carries the argument

The confidence-gated fusion mechanism that selectively combines the base classifier output with cosine similarity scores to a multi-prototype memory bank constructed from EMA teacher embeddings, activating only on uncertain and conflicting cases.
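
The gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the thresholds `tau_cls` and `tau_ret`, the mixing weight `alpha`, the softmax `temperature`, the max-over-prototypes class score, and every function name are hypothetical choices made here, since the abstract does not specify them.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieval_distribution(embedding, prototypes, temperature=0.1):
    """prototypes maps class index -> list of prototype vectors.
    Assumption: a class is scored by its best-matching prototype."""
    scores = [max(cosine(embedding, p) for p in prototypes[c])
              for c in range(len(prototypes))]
    return softmax(scores, temperature)

def gated_fuse(p_cls, p_ret, tau_cls=0.6, tau_ret=0.8, alpha=0.5):
    """Blend the two distributions only when the classifier is uncertain
    and the retrieval path is both decisive and in disagreement.
    Returns (distribution, gate_fired)."""
    cls_top = max(range(len(p_cls)), key=p_cls.__getitem__)
    ret_top = max(range(len(p_ret)), key=p_ret.__getitem__)
    uncertain = p_cls[cls_top] < tau_cls      # classifier is unsure
    decisive = p_ret[ret_top] >= tau_ret      # retrieval is sure
    conflicting = cls_top != ret_top          # and they disagree
    if uncertain and decisive and conflicting:
        fused = [(1 - alpha) * a + alpha * b for a, b in zip(p_cls, p_ret)]
        return fused, True
    return list(p_cls), False                 # confident output left unchanged

# Toy usage: two classes, class 0 has two prototypes, class 1 has one.
prototypes = {0: [[1.0, 0.0], [0.9, 0.1]], 1: [[0.0, 1.0]]}
p_ret = retrieval_distribution([0.1, 1.0], prototypes)
fused, fired = gated_fuse([0.55, 0.45], p_ret)
```

In the toy call the classifier leans weakly toward class 0 while the embedding sits next to the class 1 prototype, so the gate fires and the fused prediction moves to class 1. A confident input such as `[0.95, 0.05]` would pass through untouched.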

If this is right

The framework can be attached to any existing backbone by adding a compact prototype bank without retraining the original model from scratch.
Joint cross-entropy and contrastive training produces embeddings that support both classification and reliable prototype matching.
The EMA teacher supplies smoother representations that enable stable clustering into multiple prototypes per class.
The conservative gate leaves confident correct predictions untouched while targeting only the ambiguous subset.
Visualization of activation patterns confirms the method focuses retrieval on visually similar inter-class examples.
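
The EMA teacher mentioned in the list above follows a standard update rule, sketched here for illustration. The flat dictionary-of-lists weight layout, the function name `ema_update`, and the decay values are assumptions; the paper's actual decay rate is not given in the abstract.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.
    Both arguments map a parameter name to a list of floats."""
    return {
        name: [decay * t + (1.0 - decay) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }

# Toy usage: the teacher drifts slowly toward a fixed student.
teacher = {"w": [0.0, 1.0]}
student = {"w": [1.0, 1.0]}
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
```

Because each step moves the teacher only a fraction `1 - decay` of the way toward the student, step-to-step noise in the student is averaged out, which is the property the prototype clustering relies on.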

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The selective activation logic could be tested on other fine-grained domains such as plant species or product variants where uncertainty also signals visual overlap.
Replacing the fixed prototype bank with an online-updating version might allow the method to adapt to distribution shift without full retraining.
Varying the uncertainty and conflict thresholds per dataset could reveal whether the reported gains are conservative or near-optimal.
The dual-path training might be extended by adding a third path that learns to predict when retrieval will be helpful, turning the gate into a learned component.

Load-bearing premise

The gated fusion will activate retrieval exactly when it resolves ambiguity without introducing errors on predictions that are already correct but uncertain.

What would settle it

On the HAM10000 or ISIC2019 test sets, identify the subset of cases where the base classifier is uncertain yet correct, apply the fusion unconditionally, and check whether accuracy falls relative to the base classifier alone.
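
The check proposed above could be scripted roughly as follows. This is a sketch under assumptions: each record is a hypothetical `(p_cls, p_ret, label)` triple of classifier distribution, retrieval distribution, and ground-truth class, and the uncertainty threshold `tau_cls` and mixing weight `alpha` are placeholders rather than values from the paper.

```python
def argmax(p):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(p)), key=p.__getitem__)

def uncertain_correct_check(records, tau_cls=0.6, alpha=0.5):
    """Select cases where the classifier is uncertain yet correct, fuse
    unconditionally, and report how many predictions survive."""
    subset = [(pc, pr, y) for pc, pr, y in records
              if max(pc) < tau_cls and argmax(pc) == y]
    if not subset:
        return {"n": 0, "fused_accuracy": None, "broken": 0}
    still_correct = 0
    for pc, pr, y in subset:
        fused = [(1 - alpha) * a + alpha * b for a, b in zip(pc, pr)]
        still_correct += argmax(fused) == y
    n = len(subset)
    # Base accuracy on this subset is 1.0 by construction, so any value
    # below 1.0 here is damage done by fusing without the gate.
    return {"n": n, "fused_accuracy": still_correct / n,
            "broken": n - still_correct}

records = [
    ([0.55, 0.45], [0.1, 0.9], 0),  # uncertain, correct, retrieval disagrees
    ([0.55, 0.45], [0.8, 0.2], 0),  # uncertain, correct, retrieval agrees
    ([0.90, 0.10], [0.1, 0.9], 0),  # confident: excluded from the subset
    ([0.55, 0.45], [0.1, 0.9], 1),  # uncertain but wrong: excluded
]
report = uncertain_correct_check(records)
```

A `fused_accuracy` well below 1.0 on real data would indicate that the conservative gate is doing necessary protective work; a value near 1.0 would suggest the gate's conflict condition matters less than claimed.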

Figures

Figures reproduced from arXiv: 2604.17360 by Shen Zhao, Zixuan Tang.

**Figure 1.** The challenges in fine-grained medical image classification and our method’s overview. (a) shows visually ambiguous cases where different categories share similar patterns, leading to classifier uncertainty. (b) highlights intra-class diversity, demonstrating the challenge of handling different appearances within the same category. (c) illustrates the shortcomings of the single-path framework, where predi…

**Figure 2.** The proposed teacher-guided prototype retrieval framework. (a)

**Figure 3.** Results of ablation experiments for classifier confidence threshold

**Figure 4.** Qualitative examples visualization. We visualized the results on HAM using the experimentally optimal hyperparameter setting with the ViT-B model. In this evaluation, we randomly selected four samples for analysis. fused prediction increases the BCC confidence, improving the overall accuracy. This shows that our gating mechanism effectively incorporates prototype retrieval when the classifier is uncertain…

Original abstract

Fine-grained medical image classification is challenged by subtle inter-class variations and visually ambiguous cases, where confidence estimates often exhibit uncertainty rather than being overconfident. In such scenarios, purely discriminative classifiers may achieve high overall accuracy yet still fail to distinguish between highly similar categories, leading to miscalibrated predictions. We propose T-DuMpRa, a teacher-guided dual-path multi-prototype retrieval-augmented framework, where discriminative classification and multi-prototype retrieval jointly drive both training and prediction. During training, we jointly optimize cross-entropy and supervised contrastive objectives to learn a cosine-compatible embedding geometry for reliable prototype matching. We further employ an exponential moving average (EMA) teacher to obtain smoother representations and build a multi-prototype memory bank by clustering teacher embeddings in the teacher embedding space. Our framework is plug-and-play: it can be easily integrated into existing classification models by constructing a compact prototype bank, thereby improving performance on visually ambiguous cases. At inference, we combine the classifier's predicted distribution with a similarity-based distribution computed via cosine matching to prototypes, and apply a conservative confidence-gated fusion that activates retrieval only when the classifier's prediction is uncertain and the retrieval evidence is decisive and conflicting, otherwise keeping confident predictions unchanged. On HAM10000 and ISIC2019, our method yields 0.68%-0.21% and 0.44%-2.69% improvements on 5 different backbones. And visualization analysis proves our model can enhance the model's ability to handle visually ambiguous cases.
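
For readers unfamiliar with the joint objective named in the abstract, the following sketch shows one common form of it: the supervised contrastive loss of Khosla et al. computed on L2-normalized embeddings, added to cross-entropy. The weighting factor `lam`, the `temperature`, and the function names are assumptions for illustration; the abstract does not state how the two terms are balanced.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have at
    least one same-class positive in the batch."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Cosine similarity to every other sample, scaled by temperature.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def joint_loss(ce, supcon, lam=0.5):
    """Assumed combination: CE plus a weighted contrastive term."""
    return ce + lam * supcon

batch = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
aligned = supcon_loss(batch, [0, 0, 1, 1])   # labels match the geometry
scrambled = supcon_loss(batch, [0, 1, 0, 1]) # labels cut across it
```

The contrastive term is small when same-class embeddings already point in the same direction and large when they do not, which is what makes cosine matching against prototypes meaningful at inference.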

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Modest gains on two skin lesion datasets from adding gated prototype retrieval to contrastive-trained classifiers, but the gate's specific contribution is not demonstrated.

The main point is a plug-and-play addition to existing backbones: an EMA teacher builds a multi-prototype bank in embedding space, supervised contrastive loss shapes the features for matching, and at test time a conservative gate blends in the prototype similarity distribution only when the classifier is uncertain and the retrieval signal conflicts. On HAM10000 and ISIC2019 this yields 0.2-2.7% lifts across five backbones, with some visualizations suggesting better handling of ambiguous cases.

The training setup is standard and the inference gate is a reasonable safety measure to avoid hurting confident predictions. That combination is the actual new element here: an application of retrieval augmentation tuned for medical fine-grained work rather than a new primitive. The plug-and-play claim is credible on paper since the prototype bank is compact and the fusion is post-hoc. The joint CE plus contrastive objective plus EMA is reproducible from the description.

The soft spots sit in the evaluation. The abstract gives no activation statistics for the gate, no ablation that isolates the retrieval path from the contrastive training, and no per-case accuracy deltas on activated versus non-activated examples. Without those, the reported lifts could stem from the embedding changes alone. The free parameters (EMA decay, prototype count, gating thresholds) also lack sensitivity results. The assumption that the gate fires only on helpful ambiguous cases and stays silent on reliable-uncertain ones remains untested in the provided text.

This paper is for groups already working on dermatology or similar medical classification who want a lightweight retrieval add-on. It is not foundational, but the framework is concrete enough that a reader could implement and test the gate themselves. I would send it to peer review after the authors add the missing ablations and activation analysis; the current version is too thin on evidence for the central mechanism to stand on its own.

Referee Report

3 major / 2 minor

Summary. The paper introduces T-DuMpRa, a teacher-guided dual-path multi-prototype retrieval-augmented framework for fine-grained medical image classification. It jointly trains a classifier with cross-entropy and supervised contrastive losses, uses an EMA teacher to build a multi-prototype memory bank from clustered embeddings, and at inference fuses the classifier's distribution with a prototype similarity distribution using a conservative confidence gate that only activates retrieval for uncertain and conflicting cases. The authors claim small but consistent improvements on HAM10000 (0.21-0.68%) and ISIC2019 (0.44-2.69%) across five backbones, with visualizations suggesting better handling of ambiguous cases.

Significance. If validated, this work could offer a lightweight, plug-and-play method to boost performance of standard backbones on medical datasets with high visual similarity between classes. The conservative gating strategy is a positive aspect to prevent degradation on easy cases. The gains are modest, so the significance would be in providing a practical tool rather than a breakthrough in accuracy.

major comments (3)

Abstract: The reported performance improvements are given as ranges without specifying per-backbone results, statistical significance, or number of runs, which is critical to evaluate if the gains are reliable and attributable to the proposed fusion mechanism rather than training variations.
Inference mechanism (as described in abstract): The confidence-gated fusion is presented qualitatively without quantitative analysis of activation frequency, false positive rate on non-ambiguous cases, or ablation removing the gate; this directly impacts whether the central claim that retrieval augmentation enhances ambiguous case handling holds.
Method description: No ablation studies are described to separate the contributions of the joint training objectives, EMA teacher, and the inference-time fusion, making it difficult to confirm that the dual-path aspect is responsible for the observed improvements on the two datasets.

minor comments (2)

Abstract: The improvement ranges are written as '0.68%-0.21%' which is non-standard ordering and unclear; it should be clarified if this is the range across backbones or something else.
The paper would benefit from including the exact values of free parameters such as EMA decay rate and number of prototypes per class in the main text or appendix for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help us improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating the revisions we will incorporate.

Referee: Abstract: The reported performance improvements are given as ranges without specifying per-backbone results, statistical significance, or number of runs, which is critical to evaluate if the gains are reliable and attributable to the proposed fusion mechanism rather than training variations.

Authors: We agree that the abstract summary could be more precise. The ranges (0.21-0.68% on HAM10000 and 0.44-2.69% on ISIC2019) are used for brevity to convey the consistent gains across backbones. Detailed per-backbone results are already provided in Tables 1 and 2 of the main text. In the revised manuscript, we will update the abstract to explicitly note that experiments were run with fixed random seeds for reproducibility and to reference the per-backbone values and any variance reported in the tables. This will allow readers to better assess reliability without lengthening the abstract excessively. revision: yes
Referee: Inference mechanism (as described in abstract): The confidence-gated fusion is presented qualitatively without quantitative analysis of activation frequency, false positive rate on non-ambiguous cases, or ablation removing the gate; this directly impacts whether the central claim that retrieval augmentation enhances ambiguous case handling holds.

Authors: The abstract necessarily presents the gating strategy at a high level. The full manuscript includes qualitative visualizations and case studies showing improved handling of ambiguous examples. We acknowledge that quantitative support would strengthen the central claim. In the revision, we will add: (1) the percentage of test samples where the gate activates, (2) an analysis of false-positive activations (cases where the gate triggers but the classifier prediction was correct), and (3) an ablation comparing performance with the gate disabled. These additions will be placed in the experimental or analysis section. revision: yes
Referee: Method description: No ablation studies are described to separate the contributions of the joint training objectives, EMA teacher, and the inference-time fusion, making it difficult to confirm that the dual-path aspect is responsible for the observed improvements on the two datasets.

Authors: The current manuscript emphasizes the integrated framework and its overall results. We agree that component-wise ablations would help isolate contributions and confirm the value of the dual-path design. In the revised version, we will add ablation experiments that separately evaluate: (i) cross-entropy only versus joint cross-entropy + supervised contrastive loss, (ii) prototype bank construction with versus without the EMA teacher, and (iii) inference with versus without the gated fusion. These will be reported on both datasets to directly address the concern. revision: yes
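
The activation analysis promised in the second response could be computed along these lines. This is a hypothetical sketch: the record format `(p_cls, p_ret, label)`, the gate thresholds, the mixing weight `alpha`, and the report field names are all assumptions introduced here, not quantities taken from the manuscript.

```python
def argmax(p):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(p)), key=p.__getitem__)

def gate_fires(p_cls, p_ret, tau_cls=0.6, tau_ret=0.8):
    """Assumed gate: uncertain classifier, decisive and conflicting retrieval."""
    return (max(p_cls) < tau_cls
            and max(p_ret) >= tau_ret
            and argmax(p_cls) != argmax(p_ret))

def gate_report(records, alpha=0.5, tau_cls=0.6, tau_ret=0.8):
    """Activation rate, activations on already-correct cases, and the
    number of predictions the fusion repaired or damaged."""
    fired = fired_on_correct = helped = harmed = 0
    correct_base = correct_gated = 0
    for p_cls, p_ret, y in records:
        base = argmax(p_cls)
        pred = base
        if gate_fires(p_cls, p_ret, tau_cls, tau_ret):
            fired += 1
            fired_on_correct += base == y
            fused = [(1 - alpha) * a + alpha * b for a, b in zip(p_cls, p_ret)]
            pred = argmax(fused)
            helped += base != y and pred == y
            harmed += base == y and pred != y
        correct_base += base == y
        correct_gated += pred == y
    n = len(records)
    return {
        "activation_rate": fired / n,
        "fired_on_correct": fired_on_correct,
        "helped": helped,
        "harmed": harmed,
        "base_accuracy": correct_base / n,
        "gated_accuracy": correct_gated / n,
    }

records = [
    ([0.55, 0.45], [0.1, 0.9], 1),  # gate fires and repairs a mistake
    ([0.55, 0.45], [0.1, 0.9], 0),  # gate fires and breaks a correct call
    ([0.90, 0.10], [0.1, 0.9], 0),  # confident: gate stays silent
    ([0.55, 0.45], [0.6, 0.4], 0),  # uncertain but retrieval agrees: silent
]
stats = gate_report(records)
```

Reporting `helped` and `harmed` separately, rather than only net accuracy, is what would let a reader judge whether the gate fires on the cases it is meant to target.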

Circularity Check

0 steps flagged

No circularity: empirical framework with independent experimental validation

The paper describes a plug-and-play empirical architecture (joint CE + supervised contrastive training, EMA teacher for prototype bank construction, and conservative confidence-gated fusion at inference) whose performance claims are presented as measured improvements on HAM10000 and ISIC2019 across backbones, supported by visualization. No mathematical derivation chain exists that reduces a claimed prediction or result to its own inputs by construction; there are no equations shown that equate fitted parameters to outputs, no self-definitional loops, and no load-bearing self-citations or uniqueness theorems invoked to force the method. The reported gains and ambiguity-handling claims rest on external dataset evaluation rather than tautological re-expression of training objectives.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The framework introduces several empirical design choices including the dual-path structure, EMA teacher, clustering for prototypes, and the specific conservative fusion rule, none of which are theoretically derived but validated through experiments on medical datasets.

free parameters (3)

EMA decay rate
Hyperparameter for updating the teacher model with exponential moving average; value not provided in abstract
Number of prototypes per class
Determined via clustering of teacher embeddings; affects the granularity of the memory bank
Gating thresholds
Confidence and conflict thresholds for deciding when to fuse retrieval output; not specified

axioms (2)

domain assumption: Joint optimization of cross-entropy and supervised contrastive losses yields cosine-compatible embeddings
Stated as the goal for reliable prototype matching
domain assumption: Clustering in teacher embedding space produces useful multi-prototypes for retrieval
Core to building the memory bank
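
To make the second assumption concrete, here is one way a multi-prototype bank could be built from teacher embeddings. The abstract says only that teacher embeddings are clustered; the choice of spherical k-means, the deterministic first-`k` initialization, the iteration count, and the names `spherical_kmeans` and `build_prototype_bank` are assumptions made for this sketch.

```python
import math

def normalize(v):
    """Scale to unit length; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def spherical_kmeans(points, k, iters=20):
    """k-means on the unit sphere using cosine similarity for assignment."""
    pts = [normalize(p) for p in points]
    centers = pts[:k]  # assumption: initialize from the first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            best = max(range(k),
                       key=lambda c: sum(a * b for a, b in zip(p, centers[c])))
            groups[best].append(p)
        new_centers = []
        for c, group in enumerate(groups):
            if group:
                mean = [sum(col) / len(group) for col in zip(*group)]
                new_centers.append(normalize(mean))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        centers = new_centers
    return centers

def build_prototype_bank(embeddings, labels, k=2):
    """Cluster each class separately; returns class -> list of prototypes."""
    bank = {}
    for y in sorted(set(labels)):
        pts = [e for e, l in zip(embeddings, labels) if l == y]
        bank[y] = spherical_kmeans(pts, min(k, len(pts)))
    return bank

# Toy usage: class 0 has two visual modes, class 1 has one.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9],
              [-1.0, 0.0], [-0.9, -0.1]]
labels = [0, 0, 0, 0, 1, 1]
bank = build_prototype_bank(embeddings, labels, k=2)
```

Keeping several prototypes per class lets a class with distinct appearances (the intra-class diversity shown in Figure 1b) be matched by whichever mode is closest, instead of by a single averaged center that may resemble none of them.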


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 1 internal anchor

[1]

International Journal of Intelligent Systems2025(1), 3164952 (2025)

Alam, F., Ullah, A., Shah, D., Ali, S., Tahir, M.: Artificial intelligence in melanoma detection: a review of current technologies and future directions. International Journal of Intelligent Systems2025(1), 3164952 (2025)

work page 2025
[2]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Aleem, S., Wang, F., Maniparambil, M., Arazo, E., Dietlmeier, J., Curran, K., Connor, N.E., Little, S.: Test-time adaptation with salip: A cascade of sam and clip for zero-shot medical image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5184–5193 (2024)

work page 2024
[3]

Sage Open5(4), 2158244015611451 (2015)

Bresciani,S.,Eppler,M.J.:The pitfallsofvisualrepresentations: Areviewandclas- sification of common errors made while designing and interpreting visualizations. Sage Open5(4), 2158244015611451 (2015)

work page 2015
[4]

Annals of translational medicine8(11), 713 (2020)

Cai, L., Gao, J., Zhao, D.: A review of the application of deep learning in medical image classification and segmentation. Annals of translational medicine8(11), 713 (2020)

work page 2020
[5]

IEEE Journal of Biomedical and Health Informatics (2025)

Cao, L., Li, H., Dong, Y., Liu, T., Li, J.: Few-shot class-incremental learning with dynamic prototype refinement for brain activity classification. IEEE Journal of Biomedical and Health Informatics (2025)

work page 2025
[6]

Computers in biology and medicine185, 109507 (2025)

Chen, C., Isa, N.A.M., Liu, X.: A review of convolutional neural network based methods for medical image classification. Computers in biology and medicine185, 109507 (2025)

work page 2025
[7]

In: International conference on medical image computing and computer-assisted intervention

Chen, W., Wang, P., Ren, H., Sun, L., Li, Q., Yuan, Y., Li, X.: Medical image synthesisviafine-grainedimage-textalignmentandanatomy-pathologyprompting. In: International conference on medical image computing and computer-assisted intervention. pp. 240–250. Springer (2024)

work page 2024
[8]

Advances in neural information processing systems 35, 23049–23062 (2022)

Chen, Z., Deng, Y., Wu, Y., Gu, Q., Li, Y.: Towards understanding the mixture-of- experts layer in deep learning. Advances in neural information processing systems 35, 23049–23062 (2022)

work page 2022
[9]

Medical Image Analysis76, 102313 (2022)

Cheng, J., Tian, S., Yu, L., Gao, C., Kang, X., Ma, X., Wu, W., Liu, S., Lu, H.: Resganet: Residual group attention network for medical image classification and segmentation. Medical Image Analysis76, 102313 (2022)

work page 2022
[10]

In: Proceedings of the IEEE/CVF international conference on computer vision

Cheng, P., Lin, L., Lyu, J., Huang, Y., Luo, W., Tang, X.: Prior: Prototype rep- resentation joint learning from medical images and reports. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 21361–21371 (2023)

work page 2023
[11]

The Lancet Digital Health4(5), e330–e339 (2022)

Combalia, M., Codella, N., Rotemberg, V., Carrera, C., Dusza, S., Gutman, D., Helba, B., Kittler, H., Kurtansky, N.R., Liopyris, K., et al.: Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images: the 2019 international skin imaging collaboration grand challenge. The Lancet Digital Health4(5), e330–e339 (2022)

work page 2019
[12]

In: International Conference on Machine Learning

Conti, J.R., Noiry, N., Clemencon, S., Despiegel, V., Gentric, S.: Mitigating gender bias in face recognition using the von mises-fisher mixture model. In: International Conference on Machine Learning. pp. 4344–4369. PMLR (2022)

work page 2022
[13]

Cochrane Database of Systematic Reviews (12) (2018)

Dinnes, J., Deeks, J.J., Chuchu, N., di Ruffano, L.F., Matin, R.N., Thomson, D.R., Wong, K.Y., Aldridge, R.B., Abbott, R., Fawzy, M., et al.: Dermoscopy, with and without visual inspection, for diagnosing melanoma in adults. Cochrane Database of Systematic Reviews (12) (2018)

work page 2018
[14]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 16 Z. Tang et al

work page internal anchor Pith review Pith/arXiv arXiv 2010
[15]

Advances in Neural Information Processing Systems34, 30284–30297 (2021)

Englesson, E., Azizpour, H.: Generalized jensen-shannon divergence loss for learn- ing with noisy labels. Advances in Neural Information Processing Systems34, 30284–30297 (2021)

work page 2021
[16]

Ad- vances in neural information processing systems30(2017)

Geifman, Y., El-Yaniv, R.: Selective classification for deep neural networks. Ad- vances in neural information processing systems30(2017)

work page 2017
[17]

Advances in Neural Information Processing Systems37, 111047–111073 (2024)

Goren, S., Galil, I., El-Yaniv, R.: Hierarchical selective classification. Advances in Neural Information Processing Systems37, 111047–111073 (2024)

work page 2024
[18]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Han, Z., Yang, F., Huang, J., Zhang, C., Yao, J.: Multimodal dynamics: Dynamical fusion for trustworthy multimodal classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20707–20717 (2022)

work page 2022
[19]

PET clinics 17(1), 1 (2022)

Hasani, N., Morris, M.A., Rhamim, A., Summers, R.M., Jones, E., Siegel, E., Saboury, B.: Trustworthy artificial intelligence in medical imaging. PET clinics 17(1), 1 (2022)

work page 2022
[20]

von mises-fisher mixture model-based deep learning: Application to face verification,

Hasnat, M.A., Bohné, J., Milgram, J., Gentric, S., Chen, L.: von mises-fisher mix- ture model-based deep learning: Application to face verification. arXiv preprint arXiv:1706.04264 (2017)

work page arXiv 2017
[21]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9729–9738 (2020)

work page 2020
[22]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016)

work page 2016
[23]

In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence

Hu, P., Qin, Y., Gou, Y., Li, Y., Yang, M., Peng, X.: Probabilistic multimodal learning with von mises-fisher distributions. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence. pp. 5390–5398 (2025)

work page 2025
[24]

Hu, X., Zeng, D., Xu, X., Shi, Y.: Semi-supervised contrastive learning for label- efficientmedicalimagesegmentation.In:Internationalconferenceonmedicalimage computing and computer-assisted intervention. pp. 481–490. Springer (2021)

work page 2021
[25]

IEEE Access (2025)

Hussain, T., Shouno, H., Hussain, A., Hussain, D., Ismail, M., Mir, T.H., Hsu, F.R., Alam, T., Akhy, S.A.: Effresnet-vit: A fusion-based convolutional and vision transformer model for explainable medical image classification. IEEE Access (2025)

work page 2025
[26]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Huy, T.D., Tran, S.K., Nguyen, P., Tran, N.H., Sam, T.B., Van Den Hengel, A., Liao, Z., Verjans, J.W., To, M.S., Phan, V.M.H.: Interactive medical image analysis with concept-based similarity reasoning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 30797–30806 (2025)

work page 2025
[27]

Kendall, A., Gal, Y.: What uncertainties do we need in bayesian deep learning for computer vision? Advances in neural information processing systems30(2017)

work page 2017
[28]

IEEE Access (2025)

Khan, A., Rauf, Z., Khan, A.R., Rathore, S., Khan, S.H., Shah, N., Farooq, U., Asif, H., Asif, A., Zahoora, U., et al.: A recent survey of vision transformers for medical image segmentation. IEEE Access (2025)

work page 2025
[29]

Advances in neural information processing systems33, 18661–18673 (2020)

Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. Advances in neural information processing systems33, 18661–18673 (2020)

work page 2020
[30]

BMC medical imaging22(1), 69 (2022)

Kim, H.E., Cosa-Linan, A., Santhanam, N., Jannesari, M., Maros, M.E., Gans- landt, T.: Transfer learning for medical image classification: a literature review. BMC medical imaging22(1), 69 (2022)

work page 2022
[31]

The lancet oncology3(3), 159–165 (2002)

Kittler, H., Pehamberger, H., Wolff, K., Binder, M.: Diagnostic accuracy of der- moscopy. The lancet oncology3(3), 159–165 (2002)

work page 2002
[32]

Multimedia Tools and Applications83(7), 19683– 19728 (2024) T-DuMpRa 17

Kumar, R., Kumbharkar, P., Vanam, S., Sharma, S.: Medical images classification using deep learning: a survey. Multimedia Tools and Applications83(7), 19683– 19728 (2024) T-DuMpRa 17

work page 2024
[33]

In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition

Li, T., Cao, P., Yuan, Y., Fan, L., Yang, Y., Feris, R.S., Indyk, P., Katabi, D.: Targeted supervised contrastive learning for long-tailed recognition. In: Proceed- ings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6918–6928 (2022)

work page 2022
[34]

IEEE Transactions on Neural Networks and Learning Systems (2025)

Li, W., Peng, Y., Zhang, M., Ding, L., Hu, H., Shen, L.: Deep model fusion: A survey. IEEE Transactions on Neural Networks and Learning Systems (2025)

work page 2025
[35]

IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

Li, X., Li, J., Du, Z., Zhu, L., Shen, H.T.: Unified modality separation: A vision- language framework for unsupervised domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2025)

work page 2025
[36]

IEEE Journal of Biomedical and Health Informatics29(5), 3587–3597 (2025)

Liang, X., Li, X., Li, F., Jiang, J., Dong, Q., Wang, W., Wang, K., Dong, S., Luo, G., Li, S.: Medfilip: Medical fine-grained language-image pre-training. IEEE Journal of Biomedical and Health Informatics29(5), 3587–3597 (2025)

work page 2025
[37]

In: Proceedings of the 33rd ACM International Conference on Multimedia

Liang,Y.,Chen,H.,Xiong,Y.,Zhou,Z.,Lyu,M.,Lin,Z.,Niu,S.,Zhao,S.,Han,J., Ding, G.: Advancing reliable test-time adaptation of vision-language models under visual variations. In: Proceedings of the 33rd ACM International Conference on Multimedia. pp. 4788–4797 (2025)

work page 2025
[38]

IEEE Transactions on Medical Imaging43(2), 674–685 (2023)

Ling, Y., Wang, Y., Dai, W., Yu, J., Liang, P., Kong, D.: Mtanet: Multi-task attention network for automatic medical image segmentation and classification. IEEE Transactions on Medical Imaging43(2), 674–685 (2023)

work page 2023
[39]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition

Liu, F., Tian, Y., Chen, Y., Liu, Y., Belagiannis, V., Carneiro, G.: Acpl: Anti- curriculum pseudo-labelling for semi-supervised medical image classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 20697–20706 (2022)

work page 2022
[40]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer:Hierarchicalvisiontransformerusingshiftedwindows.In:Proceedings of the IEEE/CVF international conference on computer vision. pp. 10012–10022 (2021)

work page 2021
[41]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S.: A convnet for the 2020s. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 11976–11986 (2022)

work page 2022
[42]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Long, A., Yin, W., Ajanthan, T., Nguyen, V., Purkait, P., Garg, R., Blair, A., Shen, C., Van den Hengel, A.: Retrieval augmented classification for long-tail visual recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6959–6969 (2022)

work page 2022
[43]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Manhardt, F., Arroyo, D.M., Rupprecht, C., Busam, B., Birdal, T., Navab, N., Tombari, F.: Explaining the ambiguity of object detection and 6d pose from visual data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 6841–6850 (2019)

work page 2019
[44]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Meng, M., Feng, D., Bi, L., Kim, J.: Correlation-aware coarse-to-fine mlps for de- formable medical image registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9645–9654 (2024)

work page 2024
[45]

In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference

Mildenberger, D., Hager, P., Rueckert, D., Menten, M.J.: A tale of two classes: adapting supervised contrastive learning to binary imbalanced datasets. In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference. pp. 10305– 10314 (2025)

work page 2025
[46] Nagrani, A., Yang, S., Arnab, A., Jansen, A., Schmid, C., Sun, C.: Attention bottlenecks for multimodal fusion. Advances in neural information processing systems 34, 14200–14213 (2021)

[47] Nguyen, T.T.D., Rezatofighi, H., Vo, B.N., Vo, B.T., Savarese, S., Reid, I.: How trustworthy are performance evaluations for basic vision tasks? IEEE Transactions on Pattern Analysis and Machine Intelligence 45(7), 8538–8552 (2022)

[48] Patrício, C., Neves, J.C., Teixeira, L.F.: Explainable deep learning methods in medical image classification: A survey. ACM Computing Surveys 56(4), 1–41 (2023)

[49] Pellicer, A.L., Mariucci, A., Angelov, P., Bukhari, M., Kerns, J.G.: Protomedx: Towards explainable multi-modal prototype learning for bone health classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7357–7366 (2025)
[50] Rao, B., Liao, H., Guan, Y., Wang, C., Wang, B., Zhang, J., Li, Z.: Amd: Adaptive momentum and decoupled contrastive learning framework for robust long-tail trajectory prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 28849–28858 (2025)

[51] Sacha, M., Rymarczyk, D., Struski, Ł., Tabor, J., Zieliński, B.: Protoseg: Interpretable semantic segmentation with prototypical parts. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1481–1492 (2023)

[52] Shao, R., Bi, X.J., Chen, Z.: Hybrid vit-cnn network for fine-grained image classification. IEEE Signal Processing Letters 31, 1109–1113 (2024)

[53] Sharma, S., Kumar, A., Chandra, J.: Confidence matters: Enhancing medical image classification through uncertainty-driven contrastive self-distillation. In: International conference on medical image computing and computer-assisted intervention. pp. 133–142. Springer (2024)
[54] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Advances in neural information processing systems 30 (2017)

[55] Song, W., Chen, D.: Posture-guided part learning for fine-grained image categorization. Journal of Electronic Imaging 33(3), 033013–033013 (2024)

[56] Spolaor, N., Lee, H.D., Mendes, A.I., Nogueira, C.V., Parmezan, A.R.S., Takaki, W.S.R., Coy, C.S.R., Wu, F.C., Fonseca-Pinto, R.: Fine-tuning pre-trained neural networks for medical image classification in small clinical datasets. Multimedia Tools and Applications 83(9), 27305–27329 (2024)

[57] Sutter, T., Daunhawer, I., Vogt, J.: Multimodal generative learning utilizing jensen-shannon-divergence. Advances in neural information processing systems 33, 6100–6110 (2020)
[58] Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. pp. 6105–6114. PMLR (2019)

[59] Tang, Z., Sun, B., He, S., Hong, Y., Yu, D., Liu, Z., Li, M., Chen, B., Zhao, S.: Mibf-net: Multi-modal information balanced fusion network for clinical diagnosis via patient narratives and lesion image. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 366–375. Springer (2025)

[60] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in neural information processing systems 30 (2017)

[61] Tsuneki, M.: Deep learning models in medical image analysis. Journal of Oral Biosciences 64(3), 312–320 (2022)
[62] Valmadre, J.: Hierarchical classification at multiple operating points. Advances in Neural Information Processing Systems 35, 18034–18045 (2022)

[63] Van der Velden, B.H., Kuijf, H.J., Gilhuijs, K.G., Viergever, M.A.: Explainable artificial intelligence (xai) in deep learning-based medical image analysis. Medical image analysis 79, 102470 (2022)

[64] Wang, K., Zhan, B., Zu, C., Wu, X., Zhou, J., Zhou, L., Wang, Y.: Tripled-uncertainty guided mean teacher model for semi-supervised medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 450–460. Springer (2021)

[65] Wen, D., Khan, S.M., Xu, A.J., Ibrahim, H., Smith, L., Caballero, J., Zepeda, L., de Blas Perez, C., Denniston, A.K., Liu, X., et al.: Characteristics of publicly available skin cancer image datasets: a systematic review. The Lancet Digital Health 4(1), e64–e74 (2022)
[66] Wen, Z., Li, Y.: Toward understanding the feature learning process of self-supervised contrastive learning. In: International Conference on Machine Learning. pp. 11112–11122. PMLR (2021)

[67] Xu, Y., Wang, D., Zhang, L., Zhang, L.: Dual selective fusion transformer network for hyperspectral image classification. Neural Networks 187, 107311 (2025)

[68] Yang, M., Zhou, Z., Gong, W.: Revisiting the representation learning in long-tailed medical image classification. Pattern Recognition p. 112683 (2025)

[69] Zadeh, S.G., Schmid, M.: Bias in cross-entropy-based training of deep survival networks. IEEE transactions on pattern analysis and machine intelligence 43(9), 3126–3137 (2020)
[70] Zhao, L., Chen, X., Chen, E.Z., Liu, Y., Chen, T., Sun, S.: Retrieval-augmented few-shot medical image segmentation with foundation models. IEEE Transactions on Neural Networks and Learning Systems (2025)

[71] Zhou, Y., Lei, T., Liu, H., Du, N., Huang, Y., Zhao, V., Dai, A.M., Le, Q.V., Laudon, J., et al.: Mixture-of-experts with expert choice routing. Advances in Neural Information Processing Systems 35, 7103–7114 (2022)

[72] Zhu, Y., Wang, S., Yu, H., Li, W., Tian, J.: Sfpl: Sample-specific fine-grained prototype learning for imbalanced medical image classification. Medical Image Analysis 97, 103281 (2024)

[73] Zhu, Z., Yu, K., Qi, G., Cong, B., Li, Y., Li, Z., Gao, X.: Lightweight medical image segmentation network with multi-scale feature-guided fusion. Computers in Biology and Medicine 182, 109204 (2024)

A Effectiveness Analysis of Confidence-Gated Prototype Retrieval

This appendix provides a theoretical justification for the proposed confide...

