Skin-R1: Clinical Knowledge-Guided Dermatological Diagnosis Using Vision-Language Models

Jingxi Zhu; Jipeng Zhang; Tianxiang Zhao; Vasant G Honavar; Weijieying Ren; Xiaoting Li; Zehao Liu

arxiv: 2511.14900 · v2 · pith:A2GAQFDTnew · submitted 2025-11-18 · 💻 cs.CV · cs.AI · cs.CL

Zehao Liu, Weijieying Ren, Jipeng Zhang, Tianxiang Zhao, Jingxi Zhu, Xiaoting Li, Vasant G Honavar

classification 💻 cs.CV cs.AI cs.CL

keywords reasoning, diagnostic, clinical, grounded, datasets, dermatological, skin-r1, supervision

Vision-language models (VLMs) have recently shown promise for assisting clinical reasoning in dermatological diagnosis. However, their trustworthiness and clinical utility remain limited by three key challenges: heterogeneous datasets with inconsistent diagnostic labels and concept annotations, the lack of grounded diagnostic rationales for reliable reasoning supervision, and limited scalability when transferring knowledge from small, densely annotated datasets to large collections with sparse labels. To address these challenges, we propose Skin-R1, a dermatology-oriented VLM that integrates textbook-grounded clinical reasoning supervision with reinforcement learning (RL) to improve the accuracy and robustness of diagnostic prediction. First, we construct a textbook-based reasoning generator that synthesizes hierarchy-aware and differential-diagnosis (DDx) diagnostic trajectories derived from authoritative dermatology knowledge. Second, these trajectories are used for supervised fine-tuning (SFT), establishing a clinically grounded reasoning foundation for the model. Finally, we introduce an RL training framework that incorporates the hierarchical structure of dermatological diseases into the reward design, enabling the model to generalize grounded diagnostic reasoning to large-scale datasets with sparse annotations. Extensive experiments across multiple dermatology benchmarks demonstrate that Skin-R1 consistently improves diagnostic accuracy and robustness compared to state-of-the-art Med-VLM baselines. Ablation studies further highlight the critical role of grounded reasoning supervision introduced during the SFT stage.
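
The abstract says the RL reward incorporates the hierarchical structure of dermatological diseases but does not give its form. The following is only a minimal sketch of one way such a reward could look, assuming partial credit proportional to how much of the target's taxonomy path the prediction shares. The `TAXONOMY` entries, the function names, and the scoring rule are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, Tuple

# Hypothetical root-to-leaf taxonomy paths, for illustration only
# (not taken from the paper or from any clinical ontology).
TAXONOMY: Dict[str, Tuple[str, ...]] = {
    "melanoma": ("neoplasm", "malignant", "melanoma"),
    "basal cell carcinoma": ("neoplasm", "malignant", "basal cell carcinoma"),
    "melanocytic nevus": ("neoplasm", "benign", "melanocytic nevus"),
    "psoriasis": ("inflammatory", "papulosquamous", "psoriasis"),
}


def shared_prefix_len(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Count how many leading taxonomy levels two paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def hierarchical_reward(predicted: str, target: str) -> float:
    """Assumed reward: fraction of the target's path matched by the prediction.

    An exact match scores 1.0, a sibling under the same parent gets partial
    credit, and a prediction outside the taxonomy scores 0.0.
    """
    if predicted not in TAXONOMY:
        return 0.0
    target_path = TAXONOMY[target]
    return shared_prefix_len(TAXONOMY[predicted], target_path) / len(target_path)
```

Under this assumed rule, a near miss within the same branch (for example, predicting basal cell carcinoma for a melanoma case) earns more than a prediction from an unrelated branch, which is one way a reward could stay informative on datasets with sparse labels.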

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis
cs.CV · 2026-05 · unverdicted · novelty 5.0

TIF-GRPO uses integral feedback on pseudo-temporal trajectories to regulate anatomy-aware rewards in RL for clinical faithfulness in volumetric CT analysis.