IViT: A Novel Interpretable Visual Transformer for Skin Disease Detection

Di Lin; Haibiao Li; WeiWei Wu; Xue Jiang; Yanxi Li; Yugang Chi

arxiv: 2606.22892 · v1 · pith:XNUWIVSVnew · submitted 2026-06-22 · 📡 eess.IV · cs.CV

Haibiao Li, Di Lin, Xue Jiang, Weiwei Wu, Yanxi Li, Yugang Chi

Pith reviewed 2026-06-26 06:40 UTC · model grok-4.3

keywords: skin disease detection, interpretable vision transformer, quadratic programming, feature selection, few-shot medical imaging, activation map alignment, multi-objective loss

The pith

A quadratic programming constraint on vision transformers selects skin-disease features aligned with clinical logic while keeping accuracy within 0.21 percent of the baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents IViT as a vision transformer that incorporates quadratic programming to address black-box opacity and few-shot limitations in skin disease detection. It adapts pre-trained models for limited medical data and applies a discrete QP feature selection step plus a multi-objective loss to cut redundancy and align activations with lesion areas. Results across six datasets report 93.80 percent accuracy, 29.5 percent lower feature redundancy, and core activation regions that match areas clinicians examine. The approach aims to deliver both competitive performance and explanations that track diagnostic reasoning.

Core claim

IViT builds a discrete QP feature selection framework that screens generic and discriminative features consistent with clinical diagnostic logic. A multi-objective loss then reduces feature redundancy and optimizes activation distribution without degrading classification performance, yielding 93.80 percent accuracy on six standard datasets with core activations matching clinically relevant lesion regions.
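
One common way to write a multi-objective loss of this kind is as a weighted sum of a classification term and two regularizers. The form below is only a sketch: the symbols $\mathcal{L}_{\text{red}}$ (redundancy penalty), $\mathcal{L}_{\text{act}}$ (activation-distribution penalty) and the weights $\lambda_r$, $\lambda_a$ are assumptions introduced here, since this page does not reproduce the paper's equations.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{cls}}
  + \lambda_{r}\,\mathcal{L}_{\text{red}}
  + \lambda_{a}\,\mathcal{L}_{\text{act}},
\qquad \lambda_{r},\ \lambda_{a} \ge 0
```

Under this assumed form, setting $\lambda_r = \lambda_a = 0$ recovers the plain classification objective, so the reported 0.21 percent gap would be read as the cost of turning the two extra terms on.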

What carries the argument

The discrete quadratic programming feature selection framework that screens features for consistency with clinical logic while preserving classification performance.
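
As a rough illustration of what a discrete QP selection step can look like, the sketch below picks `k` features by minimizing pairwise redundancy minus weighted relevance over binary selection vectors. The objective, the `alpha` weight, the correlation-based scores, and the brute-force solver are all assumptions made for illustration; the paper's actual formulation is not given on this page and may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def select_features(columns, labels, k, alpha=0.5):
    """Pick k feature indices minimizing (1 - alpha) * redundancy - alpha * relevance.

    Hypothetical objective over a binary selection vector x with sum(x) == k:
        (1 - alpha) * sum_{i<j} |corr(f_i, f_j)| x_i x_j - alpha * sum_i |corr(f_i, y)| x_i
    Solved by exhaustive search, which is only practical for small feature counts.
    """
    d = len(columns)
    relevance = [abs(pearson(col, labels)) for col in columns]
    redundancy = [[abs(pearson(columns[i], columns[j])) for j in range(d)] for i in range(d)]

    def objective(subset):
        red = sum(redundancy[i][j] for i, j in combinations(subset, 2))
        rel = sum(relevance[i] for i in subset)
        return (1 - alpha) * red - alpha * rel

    return min(combinations(range(d), k), key=objective)


# Toy data (illustrative only): f0 and f1 are near-duplicates, f2 is a weaker but distinct signal.
labels = [0, 0, 0, 1, 1, 1]
f0 = [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]
f1 = [0.1, 0.2, 0.2, 0.9, 0.8, 0.8]  # almost a copy of f0
f2 = [0.5, 0.1, 0.3, 0.6, 0.9, 0.4]  # less relevant, but less correlated with f0
selected = select_features([f0, f1, f2], labels, k=2)
print(selected)  # (0, 2): the near-duplicate pair (0, 1) is avoided
```

The design point is that the quadratic term penalizes picking two features that carry the same information, which is the mechanism a redundancy-reduction claim would rest on. Whether the selected features also match clinical logic is a separate question that this objective alone does not answer.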

If this is right

The model supplies explanations that directly reference lesion locations used in clinical practice.
Reduced feature redundancy lowers storage and compute demands during deployment on limited hardware.
Transfer learning plus the QP step enables competitive results when only small numbers of labeled medical images are available.
The same constraint pattern could be applied to other transformer architectures that require both accuracy and built-in interpretability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the QP selection proves stable across imaging modalities, it could replace post-hoc explanation tools in other diagnostic pipelines.
The alignment between activations and clinical regions offers a direct test for whether learned features capture medically meaningful patterns rather than dataset artifacts.
Extending the framework to multi-label or longitudinal skin data would test whether the clinical-logic constraint generalizes beyond single-image classification.

Load-bearing premise

The quadratic programming step is assumed to identify features that remain consistent with clinical diagnostic logic without causing a meaningful drop in classification accuracy.

What would settle it

Running the model on the same six datasets and finding that core activation regions no longer overlap with clinically identified lesion areas, or that accuracy falls more than 1 percent below the baseline, would falsify the central claim.
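
The falsification test described above can be sketched as an overlap check plus an accuracy-gap check. The code assumes pixel-level lesion masks are available, which the simulated rebuttal on this page says the standard datasets lack; the 0.5 activation threshold and the `min_iou` cut-off are hypothetical choices, while the 1 percent accuracy bound comes from the text.

```python
def binarize(activation, threshold=0.5):
    """Turn a 2D activation map with values in [0, 1] into a 0/1 mask."""
    return [[1 if v >= threshold else 0 for v in row] for row in activation]


def overlap_scores(pred_mask, lesion_mask):
    """Return (IoU, Dice) for two equally sized 0/1 masks."""
    inter = sum(p & g for pr, gr in zip(pred_mask, lesion_mask) for p, g in zip(pr, gr))
    pred = sum(map(sum, pred_mask))
    gt = sum(map(sum, lesion_mask))
    union = pred + gt - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred + gt) if (pred + gt) else 1.0
    return iou, dice


def claim_survives(iou, accuracy_drop_pct, min_iou=0.5, max_drop_pct=1.0):
    """Hypothetical pass/fail rule: enough overlap and an accuracy drop within the bound."""
    return iou >= min_iou and accuracy_drop_pct <= max_drop_pct


# Toy 4x4 example (not data from the paper).
activation = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.2],
    [0.0, 0.1, 0.6, 0.1],
]
lesion = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
iou, dice = overlap_scores(binarize(activation), lesion)
print(round(iou, 3), round(dice, 3))  # 0.8 0.889
print(claim_survives(iou, accuracy_drop_pct=0.21))  # True
```

In practice the overlap would be averaged over a held-out set and reported per dataset, since a single well-aligned image says little about the claim.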

Figures

Figures reproduced from arXiv: 2606.22892 by Di Lin, Haibiao Li, WeiWei Wu, Xue Jiang, Yanxi Li, Yugang Chi.

**Figure 1.** Labels recovered from the figure: AI Inspection Model; Patient Database; AI-Assisted Medical Diagnosis System; Offline Medical Consultation; Medical Training; Remote Medical Consultation; Medical Image Analysis; Drug Discovery.

**Figure 3.** Transfer Learning ViT Training on the Few

**Figure 2.** Flowchart of the IViT Classification Algorithm Framework

**Figure 5.** Training accuracy comparison: Pre-trained weights vs. Random initialization

**Figure 6.** Comparison of ViT Model Performance

… which is determined by the appearance, color and morphology of the skin lesions. Acne is dominated by follicular papules, pustules and cysts, with red or dark red in color. The lesions are scattered, isolated and tend to occur around the follicular ostia. The core features focus on papules, pustules, oily skin and cysts. Psoriasis manifests as well-defined red plaques co…

Original abstract

The clinical diagnosis of skin diseases is susceptible to interference from inter-class similarity of skin lesions, and over-reliance on clinicians'experience easily leads to subjective bias. Although existing deep learning aided diagnosis methods achieve competitive accuracy, they suffer from the black-box opacity of Vision Transformer (ViT) and poor adaptability to medical few-shot scenarios. Moreover, mainstream explainable algorithms generally face the bottleneck of significant accuracy degradation when improving interpretability. This paper proposes an interpretable ViT (IViT) constrained by Quadratic Programming (QP). The introduced pre-trained transfer learning adapts to few-shot feature extraction. A discrete QP feature selection framework is constructed to screen generic and discriminative features consistent with clinical diagnostic logic. A multi-objective loss function is designed to reduce feature redundancy and optimize activation distribution while preserving classification performance. Experimental results on six standard skin disease datasets show that IViT achieves an accuracy of 93.80%, only 0.21% lower than the baseline, with feature redundancy reduced by 29.5%. Its core activation regions are consistent with clinically concerned lesion areas. The proposed model balances accuracy and interpretability, providing a reliable solution for the clinical deployment of few-shot intelligent skin disease diagnosis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

IViT adds QP-constrained feature selection to a transferred ViT for skin lesions, keeping accuracy close to baseline while cutting redundancy, but the clinical alignment claim rests on qualitative maps without quantitative checks.

The main things to know: the paper combines quadratic programming feature selection with a transfer-learned vision transformer for skin disease detection, and it reports accuracy within 0.21 percent of the baseline along with 29.5 percent less feature redundancy on six datasets. The activation regions are described as consistent with clinical lesion areas.

What is actually new is the discrete QP framework applied in this medical context to screen features that are both generic and discriminative. The multi-objective loss is designed to cut redundancy and improve activation distribution while keeping classification performance. The transfer learning part helps with few-shot scenarios, which is relevant for medical data.

The paper does well in providing a complete pipeline from pre-training to the constrained model and testing it across multiple standard datasets. That gives a sense of how the method behaves in practice.

The soft spots center on the interpretability side. The claim that the QP selections are consistent with clinical diagnostic logic lacks direct quantitative support such as overlap metrics against expert annotations. The abstract relies on accuracy preservation and qualitative maps; without those checks it is unclear whether the features truly align with how clinicians reason or are simply optimized for the loss. The circularity burden noted in the review is real here because the selection criteria and the loss are both defined inside the framework, with no external reference to check them against.

This paper is for people working on interpretable deep learning for dermatology or similar medical imaging tasks. A reader looking for extensions of ViTs in few-shot medical settings might get ideas from the QP approach and loss design.

It deserves a serious referee because the performance claims are specific and the architecture is described in enough detail to evaluate.

I would recommend sending it for peer review so the methods can be checked for the missing validation steps on clinical consistency.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an interpretable Vision Transformer (IViT) for skin disease detection using Quadratic Programming (QP) constraints. It incorporates pre-trained transfer learning for few-shot adaptation, a discrete QP feature selection framework to identify generic and discriminative features aligned with clinical logic, and a multi-objective loss to minimize feature redundancy while maintaining classification performance. On six standard datasets, IViT achieves 93.80% accuracy (0.21% below baseline) with 29.5% redundancy reduction, and core activation regions consistent with lesion areas.

Significance. If the QP-selected features can be shown to align with clinical reasoning without performance loss, the work would meaningfully advance interpretable models for few-shot medical imaging by addressing the accuracy-interpretability trade-off in ViTs. The reported metrics indicate only marginal accuracy degradation alongside substantial redundancy reduction, which is a positive indicator for practical deployment if the clinical consistency claim holds under quantitative scrutiny.

major comments (3)

[Abstract] The central claim that the discrete QP feature selection framework screens features 'consistent with clinical diagnostic logic' is supported only by qualitative activation-map consistency and the 0.21% accuracy gap. No quantitative overlap metrics (Dice, IoU) or correlations with dermatologist-annotated lesion attributes are reported on the six datasets, yet this claim is load-bearing for the interpretability guarantee in few-shot medical deployment.
[Experimental results] The accuracy of 93.80% and the 29.5% redundancy reduction are reported without derivation details, error bars, dataset splits, a cross-validation procedure, or statistical significance tests, which limits evaluation of whether the QP selections are robust or merely optimized to the internal multi-objective loss.
[Method, QP framework] The multi-objective loss and discrete QP feature selection are defined internally to optimize the reported metrics; without the full equations it remains unclear whether the claimed reductions are independent of the fitting choices or circular by construction.
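
To make the request for error bars concrete, the sketch below computes a paired percentile-bootstrap confidence interval for the accuracy gap between a baseline and a constrained model evaluated on the same test images. The function name, the resample count, and the correctness flags are all illustrative assumptions; the flags are synthetic and are not results from the paper.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for accuracy(a) - accuracy(b) on the same test samples."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same samples")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample image indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    low = diffs[int(round((1 - level) / 2 * n_resamples))]
    high = diffs[int(round((1 + level) / 2 * n_resamples)) - 1]
    return low, high


# Synthetic per-image correctness flags (1 = correct); NOT results from the paper.
baseline_correct = [1] * 940 + [0] * 60
constrained_correct = [1] * 935 + [0] * 5 + [1] * 3 + [0] * 57
observed_gap = (sum(baseline_correct) - sum(constrained_correct)) / len(baseline_correct)
ci_low, ci_high = paired_bootstrap_ci(baseline_correct, constrained_correct)
print(f"gap={observed_gap:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

If the interval contains zero, the gap is not distinguishable from resampling noise at that level. Reporting such an interval per dataset, alongside the splits used, would address most of the second major comment.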

minor comments (2)

[Abstract] Typo in 'clinicians'experience' (missing space after apostrophe).
Notation: The term 'QP' is introduced without an initial expansion or reference to the quadratic programming formulation used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be incorporated.

Referee: [Abstract] The central claim that the discrete QP feature selection framework screens features 'consistent with clinical diagnostic logic' is supported only by qualitative activation-map consistency and the 0.21% accuracy gap. No quantitative overlap metrics (Dice, IoU) or correlations with dermatologist-annotated lesion attributes are reported on the six datasets, yet this claim is load-bearing for the interpretability guarantee in few-shot medical deployment.

Authors: The manuscript supports the interpretability claim through qualitative activation-map consistency with lesion areas, as explicitly stated in the abstract and experimental results. No quantitative metrics such as Dice or IoU were computed because the standard datasets lack the required pixel-level dermatologist annotations. This qualitative demonstration aligns with common practices in medical imaging interpretability studies. We stand by the presented evidence and will not add unsubstantiated quantitative claims.

revision: no

Referee: [Experimental results] The accuracy of 93.80% and the 29.5% redundancy reduction are reported without derivation details, error bars, dataset splits, a cross-validation procedure, or statistical significance tests, which limits evaluation of whether the QP selections are robust or merely optimized to the internal multi-objective loss.

Authors: We agree that additional experimental details are needed for full evaluation. The revised manuscript will include dataset splits, cross-validation procedure, error bars from repeated runs, and statistical significance tests to substantiate the reported accuracy and redundancy reduction.

revision: yes

Referee: [Method, QP framework] The multi-objective loss and discrete QP feature selection are defined internally to optimize the reported metrics; without the full equations it remains unclear whether the claimed reductions are independent of the fitting choices or circular by construction.

Authors: The Method section provides the full equations for the discrete QP feature selection and multi-objective loss. The QP step is formulated as an independent optimization for feature discriminativeness and genericity prior to loss-based training, avoiding circularity. We will add explicit clarification and restate the key equations in the revision to address this concern.

revision: partial

Circularity Check

2 steps flagged

QP framework and multi-objective loss make clinical consistency and redundancy reduction claims tautological by design

specific steps

self-definitional [Abstract]
"A discrete QP feature selection framework is constructed to screen generic and discriminative features consistent with clinical diagnostic logic."

The framework is built with the explicit goal of producing features consistent with clinical logic; the later assertion that the selected features exhibit this consistency therefore restates the design objective rather than deriving it from data or external criteria.

fitted input called prediction [Abstract]
"A multi-objective loss function is designed to reduce feature redundancy and optimize activation distribution while preserving classification performance. Experimental results ... with feature redundancy reduced by 29.5%."

The loss is constructed to minimize redundancy; the reported 29.5% reduction is the direct numerical outcome of optimizing that loss on the datasets, rendering the reduction a fitted result rather than an a-priori prediction.

full rationale

The paper defines the discrete QP feature selection and multi-objective loss explicitly to enforce the properties later reported as results (consistency with clinical logic, 29.5% redundancy reduction). These outcomes therefore follow from the construction and fitting choices rather than constituting independent derivations or predictions. Accuracy preservation is shown empirically and is non-circular, but the interpretability guarantees rest on the internal definitions without external quantitative validation (e.g., Dice overlap with annotations). No load-bearing self-citations appear. This yields moderate circularity confined to the interpretability claims.
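
One way to loosen the circularity flagged above is to measure redundancy with a metric that is defined outside the training loss and computed on held-out features. The sketch below uses the mean absolute pairwise Pearson correlation as that metric; this choice is an assumption for illustration, since the page does not say how the paper defines redundancy, and the feature columns are toy values.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def mean_abs_correlation(columns):
    """Average |Pearson r| over all distinct feature pairs (0.0 with fewer than two features)."""
    pairs = list(combinations(range(len(columns)), 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)


def relative_reduction_pct(before, after):
    """Percentage drop from `before` to `after`; `before` must be positive."""
    return 100.0 * (before - after) / before


# Toy feature columns (not from the paper).
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]    # exact multiple of a, so fully redundant with it
c = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with both a and b
before = mean_abs_correlation([a, b, c])  # pair correlations: 1, 0, 0
after = mean_abs_correlation([a, c])      # b dropped by a hypothetical selection step
print(round(before, 4), after, relative_reduction_pct(before, after))  # 0.3333 0.0 100.0
```

A reduction measured this way on data the loss never saw would count as evidence in its own right, whereas a reduction measured by the same term the loss minimizes mostly confirms that the optimizer worked.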

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; full text would be required to enumerate them.


[1] [1]

Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

N. Codella et al., “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC),” Mar. 29, 2019, arXiv: arXiv:1902.03368. doi: 10.48550/arXiv.1902.03368

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1902.03368 2018

[2] [2]

The HAM10000 dataset, a large collection of multi-source dermatoscopic images of com- mon pigmented skin lesions.Scientific Data

P. Tschandl, C. Rosendahl, and H. Kittler, “The HAM10000 dataset, a large collection of multi -source dermatoscopic images of common pigmented skin lesions,” Sci. Data, vol. 5, no. 1, p. 180161, Aug. 2018, doi: 10.1038/sdata.2018.161

work page doi:10.1038/sdata.2018.161 2018

[3] [3]

Machine learning -based prediction models for atopic dermatitis diagnosis and evaluation,

S. Wu et al. , “Machine learning -based prediction models for atopic dermatitis diagnosis and evaluation,” Fundam. Res., vol. 5, no. 3, pp. 1313–1322, May 2025, doi: 10.1016/j.fmre.2023.02.021

work page doi:10.1016/j.fmre.2023.02.021 2025

[4] Y. Chen et al., “Advances in the study and application of digital technology in the clinical practice of atopic dermatitis,” Digit. Health, vol. 11, p. 20552076251377957, May 2025, doi: 10.1177/20552076251377957

[5] T.-M. Chiu, I.-C. Chi, Y.-C. Li, and M.-H. Tseng, “Deep Ensemble Learning for Multiclass Skin Lesion Classification,” Bioengineering, vol. 12, no. 9, p. 934, Aug. 2025, doi: 10.3390/bioengineering12090934

[6] Z. Jiang et al., “Accurate diagnosis of atopic dermatitis by combining transcriptome and microbiota data with supervised machine learning,” Sci. Rep., vol. 12, no. 1, p. 290, Jan. 2022, doi: 10.1038/s41598-021-04373-7

[7] F. Cao, Y. Yang, C. Guo, H. Zhang, Q. Yu, and J. Guo, “Advancements in artificial intelligence for atopic dermatitis: diagnosis, treatment, and patient management,” Ann. Med., vol. 57, no. 1, p. 2484665, Dec. 2025, doi: 10.1080/07853890.2025.2484665

[8] M. O. Sarı and K. Keser, “Classification of skin diseases with deep learning based approaches,” Sci. Rep., vol. 15, no. 1, p. 27506, Jul. 2025, doi: 10.1038/s41598-025-13275-x

[9] A. Maulana et al., “Evaluation of atopic dermatitis severity using artificial intelligence,” Narra J, vol. 3, no. 3, p. e511, Dec. 2023, doi: 10.52225/narra.v3i3.511

[10] N. Yodrabum et al., “Comparative performance of deep learning models and non-dermatologists in diagnosing psoriasis, dermatophytosis, and eczema,” Sci. Rep., vol. 16, no. 1, p. 245, Dec. 2025, doi: 10.1038/s41598-025-29562-6

[11] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Dec. 10, 2015, arXiv:1512.03385. doi: 10.48550/arXiv.1512.03385

[12] M. Khatri, Y. Yin, and J. Deogun, “Enhancing Interpretability in Medical Image Classification by Integrating Formal Concept Analysis with Convolutional Neural Networks,” Biomimetics, vol. 9, no. 7, p. 421, Jul. 2024, doi: 10.3390/biomimetics9070421

[13] A. Adadi and M. Berrada, “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI),” IEEE Access, vol. 6, pp. 52138–52160, 2018, doi: 10.1109/ACCESS.2018.2870052

[14] K. Hauser et al., “Explainable artificial intelligence in skin cancer recognition: A systematic review,” Eur. J. Cancer, vol. 167, pp. 54–69, May 2022, doi: 10.1016/j.ejca.2022.02.025

[15] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” Nov. 28, 2013, arXiv:1311.2901. doi: 10.48550/arXiv.1311.2901

[16] M. Fiaz et al., “An explainable hybrid deep learning framework for precise skin lesion segmentation and multi-class classification,” Front. Med., vol. 12, p. 1681542, Oct. 2025, doi: 10.3389/fmed.2025.1681542

[17] N. Nigar, M. Umar, M. K. Shahzad, S. Islam, and D. Abalo, “A Deep Learning Approach Based on Explainable Artificial Intelligence for Skin Lesion Classification,” IEEE Access, vol. 10, pp. 113715–113725, 2022, doi: 10.1109/ACCESS.2022.3217217

[18] A. Vaswani et al., “Attention Is All You Need,” Aug. 02, 2023, arXiv:1706.03762. doi: 10.48550/arXiv.1706.03762

[20] E. A. Taufik, A. Khondoker, A. F. Parsa, and S. A. M. Mostafa, “Visual Bias and Interpretability in Deep Learning for Dermatological Image Analysis,” Aug. 06, 2025, arXiv:2508.04573. doi: 10.48550/arXiv.2508.04573

[21] J. Chen, J. Chen, Z. Zhou, B. Li, A. Yuille, and Y. Lu, “MT-TransUNet: Mediating Multi-Task Tokens in Transformers for Skin Lesion Segmentation and Classification,” Dec. 03, 2021, arXiv:2112.01767. doi: 10.48550/arXiv.2112.01767

[22] X. Zhang et al., “DermViT: Diagnosis-Guided Vision Transformer for Robust and Efficient Skin Lesion Classification,” Bioengineering, vol. 12, no. 4, p. 421, Apr. 2025, doi: 10.3390/bioengineering12040421

[23] B. Li, H. Chen, and H. Duan, “Artificial intelligence-driven prognostic system for conception prediction and management in intrauterine adhesions following hysteroscopic adhesiolysis: a diagnostic study using hysteroscopic images,” Front. Bioeng. Biotechnol., vol. 12, p. 1327207, Apr. 2024, doi: 10.3389/fbioe.2024.1327207

[24] Y. Nie, P. Sommella, M. Carratù, M. O’Nils, and J. Lundgren, “A Deep CNN Transformer Hybrid Model for Skin Lesion Classification of Dermoscopic Images Using Focal Loss,” Diagnostics, vol. 13, no. 1, p. 72, Dec. 2022, doi: 10.3390/diagnostics13010072

[25] H. Wu, S. Chen, G. Chen, W. Wang, B. Lei, and Z. Wen, “FAT-Net: Feature adaptive transformers for automated skin lesion segmentation,” Med. Image Anal., vol. 76, p. 102327, 2022, doi: 10.1016/j.media.2021.102327

[26] S. Asif, M. Zhao, Y. Li, F. Tang, and Y. Zhu, “CFI-Net: A Choquet Fuzzy Integral Based Ensemble Network With PSO-Optimized Fuzzy Measures for Diagnosing Multiple Skin Diseases Including Mpox,” IEEE J. Biomed. Health Inform., vol. 28, no. 9, pp. 5573–5586, Sep. 2024, doi: 10.1109/JBHI.2024.3411658

[27] B. W.-Y. Hsu and V. S. Tseng, “HiTrace: Hierarchical Class Tracing Approach for Open-Set Recognition on Skin Lesions,” IEEE J. Biomed. Health Inform., vol. 29, no. 8, pp. 5700–5711, Aug. 2025, doi: 10.1109/JBHI.2025.3560555

[28] Md. N. Hossen, V. Panneerselvam, D. Koundal, K. Ahmed, F. M. Bui, and S. M. Ibrahim, “Federated Machine Learning for Detection of Skin Diseases and Enhancement of Internet of Medical Things (IoMT) Security,” IEEE J. Biomed. Health Inform., vol. 27, no. 2, pp. 835–841, Feb. 2023, doi: 10.1109/JBHI.2022.3149288

[29] X. Li, C. Desrosiers, and X. Liu, “Deep Neural Forest for Out-of-Distribution Detection of Skin Lesion Images,” IEEE J. Biomed. Health Inform., vol. 27, no. 1, pp. 157–165, Jan. 2023, doi: 10.1109/JBHI.2022.3171582

[30] A. Esteva et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, pp. 115–118, Feb. 2017, doi: 10.1038/nature21056

[31] H. Chefer, S. Gur, and L. Wolf, “Transformer Interpretability Beyond Attention Visualization,” Apr. 05, 2021, arXiv:2012.09838. doi: 10.48550/arXiv.2012.09838

[32] Zhao Xiaoyang, “ADIC: An Adaptive Disentangled CNN Classifier for Interpretable Image Recognition,” J. Comput. Res. Dev., vol. 60, no. 8, p. 1754, 2023, doi: 10.7544/issn1000-1239.202330231

[33] N. Ahmad et al., “A novel framework of multiclass skin lesion recognition from dermoscopic images using deep learning and explainable AI,” Front. Oncol., vol. 13, p. 1151257, Jun. 2023, doi: 10.3389/fonc.2023.1151257

[34] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization”

[35] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” Aug. 09, 2016, arXiv:1602.04938. doi: 10.48550/arXiv.1602.04938

[36] G. Schwalbe and B. Finzel, “A Comprehensive Taxonomy for Explainable Artificial Intelligence: A Systematic Survey of Surveys on Methods and Concepts,” Data Min. Knowl. Discov., vol. 38, no. 5, pp. 3043–3101, Sep. 2024, doi: 10.1007/s10618-022-00867-8

[37] Z. Liu et al., “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” Aug. 17, 2021, arXiv:2103.14030. doi: 10.48550/arXiv.2103.14030

[38] S. Zheng et al., “Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,” Jul. 25, 2021, arXiv:2012.15840. doi: 10.48550/arXiv.2012.15840

[39] Z. Shao et al., “TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification”

[40] A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Jun. 03, 2021, arXiv:2010.11929. doi: 10.48550/arXiv.2010.11929

[41] P. K. A. Vasu, J. Gabriel, J. Zhu, O. Tuzel, and A. Ranjan, “MobileOne: An Improved One millisecond Mobile Backbone,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada: IEEE, Jun. 2023, pp. 7907–7917. doi: 10.1109/CVPR52729.2023.00764