Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
Benchmark of twelve models finds hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.
citing papers explorer
- Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening
  Empirical benchmark finds that attention-based models (SwinTiny, CoAtNet0, MaxViTTiny) achieve the highest AUC (above 84%) on RFMiD binary screening and the best F1 scores on the multi-label task, with VLMs competitive but not superior; external Messidor-2 AUC ranges from 66.8% to 84.7%.
- CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection
  Benchmark of twelve models finds that hybrid CNN-transformer architectures and a SigLIP vision-language model deliver the strongest overall performance on skin cancer detection using the PAD-UFES-20 dataset.