Cascade Classification of Dermoscopic Images of Skin Neoplasms with Controllable Sensitivity and External Clinical Validation

Aleksandr V. Kozachok; Elena S. Kozachok; Ilya P. Latyshev; Oleg I. Samovarov; Sergey S. Seregin

arxiv: 2606.13135 · v1 · pith:V7DMLLPUnew · submitted 2026-06-11 · 💻 cs.CV · cs.AI

Pith reviewed 2026-06-27 07:20 UTC · model grok-4.3

keywords: dermoscopic images · skin neoplasms · cascade classification · deep learning · generalization gap · sensitivity control · external validation · binary triage

The pith

A cascade of binary triage followed by three-class differentiation allows tunable sensitivity for skin neoplasm images that single-stage models cannot achieve.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four deep learning architectures on dermoscopic images of skin neoplasms using binary, single-stage four-class, and two-stage cascade schemes. It demonstrates that the cascade recovers malignant lesions often misassigned to the dominant benign class in single-stage setups and supplies an adjustable threshold for controlling sensitivity. Models trained on aggregated open ISIC data perform well internally but exhibit clear drops in ranking, sensitivity, and calibration on two independent Russian clinical datasets. The work concludes that the cascade better matches clinical differential-diagnosis logic yet requires external validation and recalibration prior to deployment.

Core claim

By evaluating ViT-B/16, Swin-S, ConvNeXt-S, and EfficientNetV2-S across binary, single-stage four-class, and cascade schemes on aggregated ISIC data, the paper shows that the cascade raises macro F1 over single-stage four-class classification for most architectures and significantly for ViT-B/16. The binary triage stage attains ROC-AUC 0.952-0.966 internally but drops to 0.797-0.893 on Sechenov University data, with sensitivity falling to 0.53-0.67 and ECE rising from 0.02 to 0.27-0.39. No architecture proves superior at the differentiation stage on clinical data, and direct 11-class classification on ISIC MILK10k yields mean-class sensitivity of 0.525.

What carries the argument

Two-stage cascade: binary malignant/benign triage with adjustable threshold, followed by three-class differentiation among malignant types (MEL, SCC, BCC).
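
A minimal sketch of how such a cascade could make a decision for one image, assuming the triage model outputs a malignancy probability and the differentiation model outputs three probabilities in the order MEL, SCC, BCC. The function name `cascade_predict`, the argument names, and the default threshold of 0.5 are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence

# Assumed output order of the stage-2 (differentiation) model.
MALIGNANT_CLASSES = ("MEL", "SCC", "BCC")


def cascade_predict(p_malignant: float,
                    stage2_probs: Sequence[float],
                    threshold: float = 0.5) -> str:
    """Return 'benign' or one of MEL/SCC/BCC for a single image."""
    if len(stage2_probs) != len(MALIGNANT_CLASSES):
        raise ValueError("stage2_probs needs one entry per malignant class")
    # Stage 1: binary triage. Lowering the threshold sends more lesions
    # to stage 2, trading specificity for sensitivity.
    if p_malignant < threshold:
        return "benign"
    # Stage 2: argmax over the malignant classes only.
    best = max(range(len(MALIGNANT_CLASSES)), key=lambda i: stage2_probs[i])
    return MALIGNANT_CLASSES[best]


if __name__ == "__main__":
    print(cascade_predict(0.3, [0.2, 0.1, 0.7]))                  # below 0.5
    print(cascade_predict(0.3, [0.2, 0.1, 0.7], threshold=0.25))  # above 0.25
```

In this sketch the threshold is the only quantity that changes the sensitivity of the triage step; the stage-2 model is never consulted for images that stay below it.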

If this is right

Cascade raises macro F1 over single-stage four-class classification for most architectures by recovering malignant lesions assigned to the benign class.
Tunable triage threshold supplies sensitivity control unattainable with standard single-stage argmax classification.
Binary stage ROC-AUC falls from 0.952-0.966 internally to 0.797-0.893 on external clinical data, with sensitivity declining to 0.53-0.67.
Calibration error rises sharply on external data, with malignancy underestimation quantified by ECE increasing to 0.27-0.39.
No architecture shows a proven advantage at the malignant differentiation stage on clinical data.
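
The calibration figures above can be made concrete with a small sketch of a binned expected calibration error (ECE) for the binary stage. It assumes ECE is computed by binning the predicted malignancy probability into equal-width bins and comparing the mean prediction with the observed malignant fraction in each bin; the paper's exact definition and bin count are not given here, so `binary_ece` and `n_bins=10` are assumptions.

```python
from typing import Sequence


def binary_ece(probs: Sequence[float], labels: Sequence[int],
               n_bins: int = 10) -> float:
    """Binned ECE for predicted malignancy probabilities (labels are 0 or 1)."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    sum_p = [0.0] * n_bins   # sum of predicted probabilities per bin
    sum_y = [0.0] * n_bins   # number of malignant cases per bin
    count = [0] * n_bins     # number of samples per bin
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        sum_p[b] += p
        sum_y[b] += y
        count[b] += 1
    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if count[b]:
            observed = sum_y[b] / count[b]    # observed malignant fraction
            predicted = sum_p[b] / count[b]   # mean predicted probability
            ece += (count[b] / n) * abs(observed - predicted)
    return ece


if __name__ == "__main__":
    # Predictions of 0.2 on a set where half the lesions are malignant:
    # the model underestimates malignancy by 0.3 in its only occupied bin.
    print(binary_ece([0.2, 0.2, 0.2, 0.2], [1, 1, 0, 0]))
```

A bin whose observed fraction exceeds its mean prediction corresponds to the underestimation of malignancy described above.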

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The persistent gap between internal and external performance implies that domain adaptation or target-population data collection may be required before reliable clinical use.
The cascade structure could be tested on other imbalanced medical imaging tasks where rare positive cases must be separated from a large negative background.
Incorporating additional patient metadata or multi-modal inputs might narrow the observed generalization gap between open international and local clinical datasets.
Regulatory pathways for similar diagnostic tools would likely need to require independent external validation on representative populations.

Load-bearing premise

Aggregated open ISIC Archive data with ImageNet-pretrained weights provides a sufficient basis for models that transfer meaningfully to independent Russian clinical datasets without domain adaptation.

What would settle it

Showing that adjusting the triage threshold on the Sechenov University or Melanoscope AI datasets produces no improvement in macro F1 or sensitivity control compared with single-stage argmax classification would falsify the claimed advantage of the cascade.
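
One way such a threshold adjustment could be run is sketched below: pick the largest triage threshold that still reaches a target sensitivity on a labelled set, then read off the resulting specificity. The helper names and the toy data are hypothetical. In practice the threshold would presumably be chosen on a validation split from the target population rather than on the evaluation set itself.

```python
import math
from typing import Sequence, Tuple


def threshold_for_sensitivity(probs: Sequence[float], labels: Sequence[int],
                              target: float) -> float:
    """Largest score t such that flagging p >= t gives sensitivity >= target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    pos = sorted((p for p, y in zip(probs, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("no malignant cases in the data")
    # Number of malignant cases that must be flagged; the small epsilon
    # guards against float error such as 0.9 * 10 == 9.000000000000002.
    k = max(1, math.ceil(target * len(pos) - 1e-9))
    return pos[k - 1]


def sensitivity_specificity(probs: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when p >= threshold is called malignant."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case of each class")
    return tp / n_pos, tn / n_neg


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]   # toy malignancy scores
    labels = [1, 1, 1, 1, 0, 0]              # 1 = malignant, 0 = benign
    t = threshold_for_sensitivity(probs, labels, target=0.75)
    print(t, sensitivity_specificity(probs, labels, t))
```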

Figures

Figures reproduced from arXiv: 2606.13135 by Aleksandr V. Kozachok, Elena S. Kozachok, Ilya P. Latyshev, Oleg I. Samovarov, Sergey S. Seregin.

**Figure 2.** Learning-rate schedule: linear warm-up followed by cosine annealing (AdamW, ...

**Figure 3.** Learning curves (stage 2, MEL / SCC / BCC): (a) training (solid) and ...

**Figure 4.** Binary-stage ROC curves (malignant / benign) for four architectures on three ...

**Figure 5.** Binary-stage reliability diagrams: observed malignant fraction vs predicted ...

**Figure 6.** Confusion matrix of ViT-B/16 in three-class differentiation ...

**Figure 7.** Comparison of macro F1 of single-stage four-class and cascade ...

**Figure 8.** End-to-end cascade confusion matrix, Sechenov University dataset (...
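
The schedule named in the Figure 2 caption (linear warm-up followed by cosine annealing) can be sketched as a plain function of the step index. The caption is truncated, so the paper's actual hyperparameters are not available here: the step counts, `base_lr`, and `min_lr` below are placeholder assumptions.

```python
import math


def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at a given step: linear warm-up, then cosine annealing."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Placeholder values: 100 steps in total, 10 of them warm-up.
    for s in (0, 9, 10, 55, 100):
        print(s, lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=1e-3))
```

With an optimizer such as AdamW, a function like this would typically be evaluated once per optimizer step and written into the optimizer's learning-rate field.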

Original abstract

Purpose. To compare deep learning architectures and classification schemes for dermoscopic images of skin neoplasms and assess their generalization on transfer from open international datasets to independent clinical datasets of Russian practice.

Methods. Four architectures (ViT-B/16, Swin-S, ConvNeXt-S, EfficientNetV2-S) were compared in three schemes: binary (malignant/benign), single-stage four-class (benign, MEL, SCC, BCC), and a two-stage cascade (binary triage, then three-class differentiation MEL/SCC/BCC). All models used ImageNet-pretrained weights and a single augmentation protocol on aggregated open ISIC Archive data, and were evaluated on an internal held-out sample and two clinical datasets (Melanoscope AI mobile system; Sechenov University).

Results. Internally the binary stage attains ROC-AUC 0.952-0.966; on Sechenov University it drops to 0.797-0.893, sensitivity to 0.53-0.67, and ECE rises from 0.02 to 0.27-0.39 with underestimation of malignancy, quantifying a generalization gap in ranking and calibration. Paired tests confirm one inter-architecture result on clinical data: the deficit of ViT-B/16 at the binary stage (p<0.05); at the differentiation stage no architecture has a proven advantage. The cascade raises macro F1 over single-stage four-class classification for most architectures, but significantly only for ViT-B/16, by recovering malignant lesions assigned to the dominant benign class. On ISIC MILK10k, direct 11-class classification yields mean-class sensitivity 0.525.

Conclusion. A tunable triage threshold gives sensitivity control not attainable in standard single-stage (argmax) classification and better reproduces clinical differential-diagnosis logic. The persistent generalization gap mandates external clinical validation and recalibration before deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Paper quantifies a clear ISIC-to-Russian generalization gap and shows modest macro-F1 lift from cascade over argmax single-stage for ViT, but the sensitivity-control claim rests on an incomplete baseline.

The main things to know are that the models drop sharply on the two external Russian clinical sets (AUC 0.95 internal to 0.80-0.89, sensitivity 0.53-0.67, ECE up to 0.39) and that the cascade recovers some malignant cases the single-stage four-class argmax misses, giving a statistically significant macro-F1 gain only for ViT-B/16.

The work does a solid job running the same four architectures through binary, single-stage, and cascade schemes on the same data splits, reporting concrete AUC/sensitivity/F1/ECE numbers, and applying paired tests on the clinical data. External validation on two independent sets is the clearest strength.

The soft spot is the central claim that a tunable triage threshold supplies sensitivity control unattainable in standard single-stage classification. The paper only pits the cascade against argmax single-stage; it does not test whether a single-stage model with a tuned malignancy-probability threshold can match the same sensitivity range. That missing comparison leaves the practical advantage of the cascade less settled than the abstract suggests.

The generalization gap itself is convincingly shown and already implies the need for recalibration. This paper is for groups working on dermoscopy deployment who want numbers on real-world transfer rather than new theory. It is coherent enough and the external results are useful enough that a serious editor should send it to referees.

Referee Report

1 major / 2 minor

Summary. The manuscript compares four deep learning architectures (ViT-B/16, Swin-S, ConvNeXt-S, EfficientNetV2-S) for dermoscopic skin neoplasm classification under three schemes: binary (malignant/benign), single-stage four-class (benign/MEL/SCC/BCC), and a two-stage cascade (binary triage then three-class differentiation). All models use ImageNet-pretrained weights and are trained on aggregated ISIC Archive data; evaluation occurs on an internal held-out sample plus two external Russian clinical datasets. Reported results include internal binary AUC of 0.952-0.966 dropping to 0.797-0.893 externally with sensitivity 0.53-0.67 and rising ECE, macro F1 gains for cascade over single-stage (significant only for ViT-B/16), and statistical tests confirming limited inter-architecture differences on clinical data. The conclusion states that a tunable triage threshold enables sensitivity control unattainable with standard single-stage argmax classification and better matches clinical logic, while the generalization gap requires external validation and recalibration.

Significance. If the central claims hold, the work supplies concrete empirical support for cascade schemes in medical image triage by quantifying sensitivity control and domain-shift effects via external validation on independent clinical data. Credit is due for reporting specific AUC/sensitivity/F1/ECE values, paired statistical tests, and the explicit quantification of the generalization gap (AUC drop and ECE rise from 0.02 to 0.27-0.39). These elements provide a falsifiable basis for the triage-threshold advantage and the call for recalibration.

major comments (1)

[Results] The central claim that 'a tunable triage threshold gives sensitivity control not attainable in standard single-stage (argmax) classification' rests solely on comparison to argmax single-stage four-class models. No results are shown for single-stage four-class models whose output probabilities are thresholded (e.g., malignancy probability or per-class operating points) to achieve the same external sensitivity range (0.53-0.67); this comparison is required to substantiate that the reported control is unavailable in any single-stage formulation.

minor comments (2)

[Abstract] The specific data subset (internal vs. external) on which the macro F1 improvement reaches statistical significance for ViT-B/16 is not stated.
[Methods] The manuscript does not detail whether the single-stage models were also evaluated under any form of probability thresholding, leaving the scope of the 'standard single-stage' baseline ambiguous.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on our results section. We address the point below and agree that additional comparisons will strengthen the manuscript.

Referee: [Results] The central claim that 'a tunable triage threshold gives sensitivity control not attainable in standard single-stage (argmax) classification' rests solely on comparison to argmax single-stage four-class models. No results are shown for single-stage four-class models whose output probabilities are thresholded (e.g., malignancy probability or per-class operating points) to achieve the same external sensitivity range (0.53-0.67); this comparison is required to substantiate that the reported control is unavailable in any single-stage formulation.

Authors: We agree that the referee's point is valid for fully substantiating the advantage of the cascade. While the manuscript explicitly frames its claim against the standard argmax single-stage four-class output (as stated in the conclusion), a comparison to single-stage models operated with probability thresholding is a natural extension. In the revised manuscript we will add results for single-stage four-class models where decision thresholds are adjusted on the output probabilities (both on the aggregated malignant probability and per-class operating points) to target the same external sensitivity range of 0.53-0.67. We will report the resulting macro F1, specificity, and calibration metrics alongside the cascade results. This will clarify whether the cascade provides sensitivity control that cannot be replicated by post-hoc thresholding in a single-stage formulation.

revision: yes
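
The comparison discussed in this exchange can be illustrated with a hedged sketch: a single-stage four-class output decided once by plain argmax and once by thresholding the aggregated malignant probability (the sum of the MEL, SCC, and BCC probabilities). The class order, function names, and example probabilities are assumptions for illustration only, and per-class operating points are not covered.

```python
from typing import Sequence

# Assumed output order of the single-stage four-class model.
CLASSES = ("benign", "MEL", "SCC", "BCC")


def single_stage_argmax(probs: Sequence[float]) -> str:
    """Standard decision: the class with the highest probability."""
    return CLASSES[max(range(len(CLASSES)), key=lambda i: probs[i])]


def single_stage_thresholded(probs: Sequence[float], threshold: float) -> str:
    """Threshold the aggregated malignant probability, then pick a subtype."""
    p_malignant = sum(probs[1:])  # P(MEL) + P(SCC) + P(BCC)
    if p_malignant < threshold:
        return "benign"
    # Most probable malignant subtype among indices 1..3.
    best = max(range(1, len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best]


if __name__ == "__main__":
    # Benign is the single most likely class, yet the malignant classes
    # together hold more than half of the probability mass.
    probs = [0.45, 0.30, 0.15, 0.10]
    print(single_stage_argmax(probs))
    print(single_stage_thresholded(probs, threshold=0.5))
```

Under these assumptions the thresholded rule exposes the same kind of sensitivity knob as the triage stage, which is why the referee asks for it as a baseline.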

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons on held-out and external datasets

The paper reports training and evaluation of four architectures under three classification schemes (binary, single-stage four-class, cascade) using ImageNet-pretrained weights on aggregated ISIC data, with metrics on internal held-out and two external clinical datasets. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear; the central claim about tunable triage thresholds is an empirical observation from direct comparisons to argmax baselines, not a reduction to inputs by construction. The generalization gap is quantified via explicit AUC/ECE drops rather than assumed away.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard supervised learning assumptions and the representativeness of the datasets used.

free parameters (1)

triage threshold
Tunable parameter for sensitivity control in the cascade scheme.

axioms (2)

domain assumption: ImageNet-pretrained weights and a single augmentation protocol are sufficient for a fair comparison across architectures.
Used for all models in the study.
domain assumption: The clinical datasets from Melanoscope AI and Sechenov University are independent and representative of Russian practice.
Used for external validation.
