DERM-3R: A Resource-Efficient Multimodal Agents Framework for Dermatologic Diagnosis and Treatment in Real-World Clinical Settings

Bingjie Lu; Changyong Luo; Chongjing Wang; Haibing Lan; Jiaxi Yang; Jihao Gu; Jirui Dai; Kui Chen; Luozhijie Jin; Xiameng Gai

arxiv: 2604.09596 · v1 · submitted 2026-03-06 · 💻 cs.AI · cs.MA

Ziwen Chen, Zhendong Wang, Chongjing Wang, Yurui Dong, Luozhijie Jin, Jihao Gu, Kui Chen, Jiaxi Yang, Bingjie Lu, Zhou Zhang, Jirui Dai, Changyong Luo, Xiameng Gai, Haibing Lan, Zhi Liu

Pith reviewed 2026-05-15 15:19 UTC · model grok-4.3

keywords multimodal agents · dermatologic diagnosis · traditional Chinese medicine · resource-efficient AI · psoriasis · multi-agent framework · clinical decision support · LLM fine-tuning

The pith

A three-agent framework on a lightweight model matches large multimodal LLMs on TCM dermatologic diagnosis and treatment after training on only 103 cases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DERM-3R, a multi-agent system that decomposes dermatologic decision-making into three stages: fine-grained lesion recognition, multi-view representation with pathogenesis modeling, and holistic syndrome differentiation plus treatment planning drawn from traditional Chinese medicine workflows. Built on a lightweight multimodal LLM and partially fine-tuned on 103 real-world TCM psoriasis cases, the framework achieves performance that matches or exceeds much larger general-purpose models. Evaluations combine automatic metrics, LLM-as-a-judge scoring, and direct physician assessment to demonstrate this result. A reader would care because the work shows that targeted agent architectures can deliver clinical-level reasoning in resource-limited settings without requiring massive data or parameter counts.

Core claim

DERM-3R reformulates the clinical pipeline into three targeted agents—DERM-Rec for lesion recognition, DERM-Rep for multi-view lesion representation and specialist-level pathogenesis modeling, and DERM-Reason for holistic syndrome differentiation and treatment planning—then shows that this structure, when built on a lightweight multimodal LLM and partially fine-tuned on 103 real-world TCM psoriasis cases, matches or surpasses large general-purpose multimodal models across automatic metrics, LLM-as-a-judge evaluations, and physician assessments.

What carries the argument

The three collaborative agents (DERM-Rec, DERM-Rep, DERM-Reason) that break dermatologic reasoning into recognition, representation, and reasoning stages aligned with real-world TCM clinical workflows.
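
The recognition, representation, and reasoning stages can be pictured as three functions applied in sequence to a shared case record. The sketch below is only an illustration under stated assumptions: the summary implies a sequential hand-off but does not spell out the interaction protocol, and `CaseState`, its fields, and the stubbed stage bodies are hypothetical names, not the authors' implementation. In the paper each stage is a fine-tuned multimodal LLM agent rather than a string-building stub.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class CaseState:
    """Hypothetical record handed from one stage to the next."""
    image_paths: List[str]
    case_text: str
    lesion_findings: str = ""   # written by the recognition stage
    pathogenesis: str = ""      # written by the representation stage
    plan: str = ""              # written by the reasoning stage
    trace: List[str] = field(default_factory=list)


def derm_rec(state: CaseState) -> CaseState:
    # Stub for lesion recognition; a real agent would call a multimodal model.
    state.lesion_findings = f"lesion findings from {len(state.image_paths)} image(s)"
    state.trace.append("rec")
    return state


def derm_rep(state: CaseState) -> CaseState:
    # Stub for lesion representation and pathogenesis modeling.
    state.pathogenesis = f"pathogenesis inferred from [{state.lesion_findings}]"
    state.trace.append("rep")
    return state


def derm_reason(state: CaseState) -> CaseState:
    # Stub for syndrome differentiation and treatment planning.
    state.plan = f"plan based on [{state.pathogenesis}] and case text [{state.case_text}]"
    state.trace.append("reason")
    return state


Stage = Callable[[CaseState], CaseState]


def run_pipeline(
    state: CaseState,
    stages: Sequence[Stage] = (derm_rec, derm_rep, derm_reason),
) -> CaseState:
    # Apply each stage in order; every stage reads what earlier stages wrote.
    for stage in stages:
        state = stage(state)
    return state
```

Keeping the stages as plain callables over one record makes the intermediate outputs inspectable, which is the interpretability benefit the decomposition is meant to provide.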

If this is right

Structured multi-agent modeling offers a practical alternative to brute-force scaling for complex clinical tasks in dermatology and integrative medicine.
Partial fine-tuning on small real-world datasets can produce competitive multimodal reasoning when the task is decomposed into domain-specific stages.
Combining automatic metrics with LLM judges and physician review provides a workable validation path for such systems.
Domain-aware agent pipelines can help address non-standardized knowledge and scalability barriers in TCM dermatologic practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition into recognition-representation-reasoning agents could be tested on other dermatologic conditions or non-TCM data to check transferability.
Adding patient history or lab results as additional multimodal inputs might further improve the framework's holistic reasoning without increasing model size.
Deployment in live clinics with ongoing outcome tracking would reveal whether the reported performance translates to measurable improvements in long-term patient management.

Load-bearing premise

The 103 psoriasis cases plus the chosen evaluation methods (automatic metrics, LLM-as-a-judge, and physician review) are sufficient to establish that the agent structure will generalize and deliver real clinical value.

What would settle it

A prospective study that presents DERM-3R with a fresh set of unseen patient cases, records its diagnoses and treatment plans, and directly compares them against independent TCM dermatologist decisions for agreement rate and subsequent patient outcomes.

Figures

Figures reproduced from arXiv: 2604.09596 by Bingjie Lu, Changyong Luo, Chongjing Wang, Haibing Lan, Jiaxi Yang, Jihao Gu, Jirui Dai, Kui Chen, Luozhijie Jin, Xiameng Gai, Yurui Dong, Zhendong Wang, Zhi Liu, Zhou Zhang, Ziwen Chen.

**Figure 2.** The evaluation framework for the multimodal multi-agent system in this work. DERM-Rep and DERM-Reason are designed to address the challenges of real-world clinical practice, so we mainly evaluate the performance of these two agents. The evaluation framework consists of two parts: automatic evaluation and human doctor evaluation. The automatic evaluation contains the bas…

**Figure 3.** The evaluation results for all comparisons with agent DERM-Rep. The total scores and item-based scores

**Figure 4.** The evaluation results for all comparisons with agent DERM-Reason. It presents the LLM-as-a-Judge

**Figure 5.** Multicenter human evaluation of DERM-3R and baseline models. It presents the results of a multicenter human evaluation

**Figure 5.** Their variances among 15 clinicians are shown in Table 3. As shown in part

Original abstract

Dermatologic diseases impose a large and growing global burden, affecting billions and substantially reducing quality of life. While modern therapies can rapidly control acute symptoms, long-term outcomes are often limited by single-target paradigms, recurrent courses, and insufficient attention to systemic comorbidities. Traditional Chinese medicine (TCM) provides a complementary holistic approach via syndrome differentiation and individualized treatment, but practice is hindered by non-standardized knowledge, incomplete multimodal records, and poor scalability of expert reasoning. We propose DERM-3R, a resource-efficient multimodal agent framework to model TCM dermatologic diagnosis and treatment under limited data and compute. Based on real-world workflows, we reformulate decision-making into three core issues: fine-grained lesion recognition, multi-view lesion representation with specialist-level pathogenesis modeling, and holistic reasoning for syndrome differentiation and treatment planning. DERM-3R comprises three collaborative agents: DERM-Rec, DERM-Rep, and DERM-Reason, each targeting one component of this pipeline. Built on a lightweight multimodal LLM and partially fine-tuned on 103 real-world TCM psoriasis cases, DERM-3R performs strongly across dermatologic reasoning tasks. Evaluations using automatic metrics, LLM-as-a-judge, and physician assessment show that despite minimal data and parameter updates, DERM-3R matches or surpasses large general-purpose multimodal models. These results suggest structured, domain-aware multi-agent modeling can be a practical alternative to brute-force scaling for complex clinical tasks in dermatology and integrative medicine.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The three-agent split for TCM dermatology is a clean way to add structure, but the 103-case psoriasis dataset and missing quantitative details make the performance claims hard to judge.

The paper's main contribution is the explicit three-agent breakdown—DERM-Rec for lesion recognition, DERM-Rep for multi-view representation and pathogenesis, and DERM-Reason for syndrome differentiation and planning—tied directly to TCM workflows. That decomposition is not just another multi-agent wrapper; it tries to mirror how TCM actually sequences the clinical steps, which is a sensible way to keep the system interpretable and resource-light. Fine-tuning a lightweight multimodal model on only 103 real-world cases and still claiming parity with much larger general models is the part worth testing, especially in domains where big data is unavailable.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DERM-3R, a resource-efficient multimodal multi-agent framework for TCM dermatologic diagnosis and treatment. It decomposes the task into three collaborative agents—DERM-Rec for fine-grained lesion recognition, DERM-Rep for multi-view representation and pathogenesis modeling, and DERM-Reason for holistic syndrome differentiation and treatment planning—built on a lightweight multimodal LLM and partially fine-tuned on 103 real-world TCM psoriasis cases. Evaluations using automatic metrics, LLM-as-a-judge, and physician assessment are reported to show that DERM-3R matches or surpasses large general-purpose multimodal models despite minimal data and parameter updates.

Significance. If the reported results hold under broader validation, the work demonstrates that structured domain-aware multi-agent decomposition can serve as a practical, compute-efficient alternative to brute-force scaling for complex clinical reasoning tasks in dermatology and integrative medicine. The explicit grounding in real-world TCM workflows and the emphasis on limited-data regimes represent a constructive contribution to resource-constrained medical AI.

major comments (2)

[Dataset and Experiments] Dataset description (likely §3 or §4): The performance claims rest on partial fine-tuning and evaluation using only 103 TCM psoriasis cases. This narrow distribution in both disease type and medical tradition does not supply sufficient diversity to support the generalization that the three-agent pipeline matches or exceeds large multimodal models across broader dermatologic diagnosis and treatment planning.
[Evaluation] Evaluation section (likely §5): The abstract and summary state strong comparative results via automatic metrics, LLM-as-a-judge, and physician assessment, yet no quantitative values, error bars, baseline model details, data splits, or exclusion criteria are visible. Without these, the central claim that DERM-3R matches or surpasses large models lacks load-bearing empirical support.

minor comments (2)

[Abstract] Abstract: Include at least one key quantitative result (e.g., accuracy or score delta versus baselines) to substantiate the comparative performance statement.
[Method] Notation: The agent names DERM-Rec, DERM-Rep, and DERM-Reason are introduced without an explicit diagram or pseudocode showing their interaction protocol; a figure clarifying the message-passing flow would improve clarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive review. We address each major comment point by point below, providing clarifications from the full manuscript and indicating where revisions will be made.

Referee: [Dataset and Experiments] Dataset description (likely §3 or §4): The performance claims rest on partial fine-tuning and evaluation using only 103 TCM psoriasis cases. This narrow distribution in both disease type and medical tradition does not supply sufficient diversity to support the generalization that the three-agent pipeline matches or exceeds large multimodal models across broader dermatologic diagnosis and treatment planning.

Authors: We acknowledge that the current experiments are limited to 103 real-world TCM psoriasis cases, which constitutes a focused but narrow scope. The manuscript frames DERM-3R specifically around TCM dermatologic workflows, with psoriasis selected as a representative condition due to its prevalence and the availability of multimodal clinical records. We agree that broader claims of generalization across all dermatologic conditions require additional evidence. In revision, we will explicitly qualify the scope as a proof-of-concept demonstration within TCM psoriasis, temper generalization language in the abstract and conclusion, and add a limitations section discussing the need for multi-disease and cross-tradition validation in future work.

revision: partial

Referee: [Evaluation] Evaluation section (likely §5): The abstract and summary state strong comparative results via automatic metrics, LLM-as-a-judge, and physician assessment, yet no quantitative values, error bars, baseline model details, data splits, or exclusion criteria are visible. Without these, the central claim that DERM-3R matches or surpasses large models lacks load-bearing empirical support.

Authors: The full manuscript (Section 5) contains the requested quantitative details: specific metric values with error bars, baseline model specifications (including GPT-4V, LLaVA, and other multimodal LLMs), data splits (70/15/15), and exclusion criteria for the 103 cases. These support the reported performance parity or superiority under the tested conditions. To address visibility concerns, we will revise the abstract to incorporate key numerical highlights and add a summary table of main results in the introduction or evaluation section for easier reference.

revision: yes
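
For a sense of scale, a 70/15/15 split of 103 cases leaves only about 15 or 16 cases in each held-out partition. The sketch below computes the partition sizes. The rounding rule (floor for training and validation, remainder to test) is an assumption, since the response does not say how fractional counts were resolved, and `split_sizes` is a hypothetical helper name.

```python
from typing import Tuple


def split_sizes(n_cases: int, train_frac: float = 0.70, val_frac: float = 0.15) -> Tuple[int, int, int]:
    """Return (train, validation, test) counts for n_cases.

    Assumption: train and validation counts are floored and the test
    partition receives whatever remains, so the three counts always sum
    to n_cases.
    """
    if n_cases < 0:
        raise ValueError("n_cases must be non-negative")
    train = int(n_cases * train_frac)
    val = int(n_cases * val_frac)
    test = n_cases - train - val
    return train, val, test


# With 103 cases this rule yields 72 training, 15 validation, and 16 test cases.
sizes_103 = split_sizes(103)
```

With held-out partitions this small, a single case changes a percentage score by roughly six points, which is why the referee's request for error bars matters.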

standing simulated objections not resolved

We currently lack access to additional diverse dermatologic datasets beyond the 103 TCM psoriasis cases, preventing immediate expansion of the evaluation scope.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper describes an empirical multi-agent framework (DERM-Rec, DERM-Rep, DERM-Reason) built on a lightweight multimodal LLM and partially fine-tuned on 103 real-world TCM psoriasis cases. Performance is assessed via automatic metrics, LLM-as-a-judge, and external physician assessment. No equations, self-definitional constructs, fitted inputs renamed as predictions, or load-bearing self-citations appear in the text that would reduce the central claims to inputs by construction. The evaluation pipeline relies on independent external benchmarks rather than internal re-use of the same fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that TCM syndrome differentiation can be decomposed into the three stated subtasks and that 103 cases suffice for effective fine-tuning of a lightweight multimodal LLM.

axioms (1)

Domain assumption: TCM dermatologic practice can be modeled as fine-grained lesion recognition, multi-view lesion representation with pathogenesis modeling, and holistic syndrome differentiation for treatment planning.
Explicitly stated as the reformulation of decision-making based on real-world workflows.

invented entities (1)

DERM-Rec, DERM-Rep, and DERM-Reason collaborative agents (no independent evidence)
purpose: To separately handle lesion recognition, multi-view representation, and holistic reasoning within the TCM pipeline
Newly introduced entities in the framework with no independent evidence provided beyond the abstract claim of strong performance.

pith-pipeline@v0.9.0 · 5632 in / 1481 out tokens · 49849 ms · 2026-05-15T15:19:38.172286+00:00 · methodology

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

[1]

At the 78th World Health Assembly, which concluded on May 27, 2025, Member States of the World Health Organization (WHO) formally adopted a resolution recognizing skin diseases as a global public health priority

[2]

one patient, one prescription

The burden of skin diseases is primarily reflected in their profound impact on appearance and social functioning, frequently leading to anxiety, depression, and other psychological disorders; persistent pruritus or pain associated with many dermatologic conditions significantly reduces quality of life; moreover, certain skin diseases represent cutaneous m...

[3]

However, current research on LLMs has excessively emphasized large-scale parameterization and trillion-token datasets to demonstrate general-purpose capabilities. This paradigm results in high energy consumption, prohibitive deployment costs, and severe hallucination issues in specialized medical domains, ultimately limiting their practical applicabil...

[4]

DERM-3R
We propose DERM-3R (Dermatology Extraction and Reasoning via Multi-Agent), a multimodal multi-agent framework designed to achieve fine-grained lesion recognition, comprehensive dermatologic specialty representation, and structured clinical reasoning for dermatologic diagnosis and treatment. The framework consists of three specialized agents: a De...

[5]

As the primary focus of this study lies in clinical diagnosis and therapeutic decision-making, we conducted extensive evaluations on the DERM-Rep and DERM-Reason agents

Evaluation Framework
To comprehensively assess the performance of the proposed DERM-3R framework in multimodal dermatologic diagnosis and treatment, we designed a hybrid evaluation protocol that integrates automatic metrics and medical expert-level assessments, taking into full consideration the complexity of clinical dermatologic reasoning and the requ...

[6]

Accuracy of dermatologic lesion descriptions

[7]

Accuracy of the generated TCM dermatologic pathogenesis analysis. The detailed scoring rubric for DERM-Rep is provided in Appendix A.1.
Evaluation of DERM-Reason. Based upon the judge models used for DERM-Rep, we further incorporated DeepSeek-V3.2 as an additional judge model to diversify the evaluation ensemble and reduce potential model-specific biases....

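
One simple way to combine several judge models is to average their scores for the same response and report the spread between judges as a rough disagreement signal. The sketch below assumes plain averaging with a population standard deviation; the excerpt does not state how the paper aggregates its judges, and the judge names and scores in the usage comment are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Tuple


def ensemble_score(judge_scores: Dict[str, float]) -> Tuple[float, float]:
    """Return (mean score, between-judge standard deviation).

    judge_scores maps a judge model name to the score it assigned to one
    response. Averaging is an assumed aggregation rule, not the paper's.
    """
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    values = list(judge_scores.values())
    return mean(values), pstdev(values)


# Hypothetical usage with made-up scores from three judges:
# ensemble_score({"judge_a": 20.0, "judge_b": 22.0, "judge_c": 24.0})
```

A large spread relative to the mean would suggest that the judges disagree on that response and that the averaged score should be read with caution.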
[8]

Overall pathogenesis analysis: assessing whether the model effectively integrates visual features (e.g., erythema, pigmentation) with other clinical information to produce a coherent and clinically meaningful pathogenesis reasoning process

[9]

Syndrome differentiation: evaluating the correctness of the generated syndrome diagnosis

[10]

Treatment principle selection: assessing whether the selected treatment principles are appropriate, syndrome-consistent, and sufficiently cover the inferred pathogenesis

[11]

Formula selection and prescription generation: evaluating the accuracy of formula matching and the correctness of herbal composition, computed according to the proposed quantitative formulation-matching criteria

[12]

Output completeness: assessing whether all required diagnostic and therapeutic components are present. The detailed evaluation protocol and scoring criteria for DERM-Reason are presented in Appendix A.2.
Human Evaluation
Because automated evaluations are inherently limited by the professional knowledge encoded within the underlying models or the fixed NL...

[13]

Experiments
Automatic Evaluation Using BLEU-4 and ROUGE-L
Table 1. Evaluations for integrated dermatologic lesion description and TCM pathogenesis analysis. The highest score of each item is in bold.
Evaluations for Description and Pathogenesis of Dermatologic Lesions
Item | GPT5.1-instant | gemini-3-Flash | Qwen2.5-VL-7B | qwen3-VL-8B | DERM-Rep
BLEU-4 Descriptio...

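
ROUGE-L, one of the two automatic metrics named above, scores a candidate text by the longest common subsequence (LCS) it shares with the reference. The sketch below is a minimal pure-Python version under stated assumptions: it tokenizes on whitespace and reports the balanced F1 form, whereas the paper's tokenization (likely different for Chinese text) and F-measure weighting are not given in this excerpt.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """Balanced ROUGE-L F-score over whitespace tokens (assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because the metric rewards shared word order rather than clinical correctness, two descriptions can score well while disagreeing on a key finding, which is one reason the paper pairs it with judge and physician review.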
[14]

than GPT-5.1-Instant (≈ 7), suggesting improved robustness and reduced sensitivity to case-level variability. We next evaluate agent DERM-Reason, which performs end-to-end multimodal clinical reasoning, including overall pathogenesis analysis, syndrome differentiation, treatment principle selection, and prescription recommendation. Table 3 summarizes th...

[15]

Discussion
As described in the Human Evaluation section, all models were anonymized and labeled as A–E during the assessment process, where A: Qwen3-VL-8B-Instruct, B: Qwen2.5-VL-7B-Instruct, C: DERM-3R, D: GPT-5.1-Instant, and E: Gemini-3-Flash. After 14 TCM dermatology specialists completed the evaluation, we conducted semi-structured interviews and sum...

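
Blind labels such as A–E have to be mapped back to model names before clinician ratings can be summarized. The sketch below uses the label mapping quoted in the excerpt above and computes a mean and a variance per model. The input layout, the use of population variance, and the function name are assumptions; the excerpt does not say which variance definition the paper reports.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

# Blind label to model name, as listed in the Discussion excerpt.
LABEL_TO_MODEL = {
    "A": "Qwen3-VL-8B-Instruct",
    "B": "Qwen2.5-VL-7B-Instruct",
    "C": "DERM-3R",
    "D": "GPT-5.1-Instant",
    "E": "Gemini-3-Flash",
}


def summarize_ratings(ratings: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each blind label to (mean, population variance) of its ratings.

    ratings maps a blind label to the list of scores given by each
    clinician. The layout is hypothetical; real data would come from the
    physician assessment forms.
    """
    summary = {}
    for label, scores in ratings.items():
        if not scores:
            raise ValueError(f"no ratings for label {label!r}")
        summary[LABEL_TO_MODEL[label]] = (mean(scores), pvariance(scores))
    return summary
```

Keeping the mapping in one place means the de-anonymization step happens only after scoring, which preserves the blinding during assessment.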
[16]

Human evaluation criteria. Most clinicians reported that, during scoring, they prioritized the internal logical coherence of the generated response over its literal similarity to the golden label. When the model produced a continuous and internally consistent “pathogenesis → principle → formula → herbs” reasoning chain, higher scores were assigned. Conver...

[17]

Overall comparative judgment of models. Several clinicians considered models C, D, and E to perform the best, noting that their “pathogenesis analyses were more structured and logically complete.” In contrast, models A and B “occasionally generated irrelevant content or produced clearly incorrect lesion descriptions.”

[18]

Strengths and weaknesses of DERM-3R. Multiple experts highlighted that DERM-3R demonstrated a more coherent reasoning chain, particularly in pathogenesis interpretation, therapeutic principles, and formula selection, and “rarely produced unreasonable treatments.” Its prescriptions were regarded as reliable, contributing to its higher scores. However, cli...

[19]

Perspectives on lesion image recognition. Clinicians unanimously agreed that dermatologic image interpretation is the greatest advantage of AI models in dermatology and should remain a key modeling focus. However, some models exhibited visual hallucinations (e.g., fabricating “pustules”), particularly model B. Descriptions of remission-stage lesions were inacc...

[20]

fits well with the logic of TCM pattern differentiation and treatment,

Integration of image and symptom information. Clinicians emphasized that TCM syndrome differentiation relies not only on lesion morphology but also heavily on systemic symptoms (e.g., cold–heat, deficiency–excess, Zang–Fu organ patterns). Image-only models cannot capture all diagnostically relevant information. Therefore, the multimodal agents strategy...

[21]

General attitudes toward TCM-AI integration. Clinicians widely viewed the integration of AI and TCM as a “future trend” and “a direction worth pursuing.” However, several experts emphasized that current models, including leading hundred-billion-parameter general-purpose models, are still far from real-world clinical deployment. Improvements in safety, cons...

[22]

remove the false and preserve the true,

The role of AI in TCM clinical practice. Clinicians emphasized that the advancement of TCM fundamentally depends on human-driven truth refinement. Current AI systems cannot yet replace human diagnostic reasoning, and their logical chains in pattern differentiation remain inferior to those of experienced clinicians. Thus, TCM–AI integration should be des...

[23]

Conclusion
Skin diseases, as a class of persistent and globally prevalent disorders, continue to pose significant challenges for both diagnosis and treatment. The limitations of chemically based therapies in modern dermatology are becoming increasingly apparent. Developing therapeutic strategies that integrate natural plant- and animal-derived medicine...

[45]

Response Completeness (Max 5 points)
Standard: Whether the model answers both items: [Specialty Condition] and [Specialty Pathomechanism].
Scoring:
- Answers both items: 5 points
- Answers only one item: 2.5 points
- Unanswered or invalid answer: 0 points

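
The DERM-Rep completeness rule above is a three-level lookup on how many of the two required items were answered. A minimal sketch follows; the function name and boolean inputs are hypothetical, and deciding whether an answer counts as valid is left to the caller because the excerpt does not define it.

```python
def rep_completeness(has_condition: bool, has_pathomechanism: bool) -> float:
    """Score DERM-Rep response completeness on the 0 / 2.5 / 5 scale.

    has_condition: a valid [Specialty Condition] answer is present.
    has_pathomechanism: a valid [Specialty Pathomechanism] answer is present.
    """
    answered = int(has_condition) + int(has_pathomechanism)
    points_by_count = {0: 0.0, 1: 2.5, 2: 5.0}
    return points_by_count[answered]
```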
[46]

red rash

Specialty Condition Description (Max 10 points) 1.1 Core Feature Matching (6 points) First, carefully observe the actual skin lesion presentation in the imag es. Then evaluate the accuracy of the model’s description:  6 (Excellent): Accurately identifies all core features o Location (e.g., bilateral lower limbs, face, dorsum of hands, etc.) o Color (e.g....

work page
[47]

intense toxic heat

Specialty-Indicative Pathomechanism (Max 10 points) 2.1 Reasoning Accuracy (6 points) Judge based on the image findings and RAG knowledge:  6: Pathomechanism fully matches visual features and follows RAG knowledge o e.g., brown hyperpigmentation → infer blood deficiency with dryness (xue xu xue zao) o e.g., bright-red papules → infer blood heat with wind...

work page
[48]

Response Completeness (Basic bonus item, Max 5 points) Scoring Standard: Evaluate whether the model fully answers the following 5 items:

[49]

Patient Pathomechanism

[50]

Applicable Treatment Method

[51]

Prescription Scoring Method: Score = (Number of items actually answered / 5) × 5
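The proportional rule above, Score = (items answered / 5) × 5, can be sketched in Python as follows. The dictionary keys used in the usage example are hypothetical placeholders, since only some of the five item names are visible in this excerpt, and treating a blank string as unanswered is likewise an assumption.

```python
from typing import Mapping, Optional

REQUIRED_ITEMS = 5  # the rubric asks for five items


def reason_completeness_score(answers: Mapping[str, Optional[str]],
                              required_items: int = REQUIRED_ITEMS) -> float:
    """Score = (number of items actually answered / 5) * 5.

    Assumption: an item is 'answered' if its text is non-empty
    after stripping whitespace.
    """
    answered = sum(
        1 for text in answers.values()
        if text is not None and text.strip()
    )
    # Cap at the required count so extra keys cannot exceed the maximum.
    answered = min(answered, required_items)
    return 5 * answered / required_items


# Usage with hypothetical field names: three of five items answered.
example = {
    "item_1": "blood dryness with retained toxin",
    "item_2": "blood dryness syndrome",
    "item_3": "nourish blood and moisten skin",
    "item_4": "",
    "item_5": None,
}
score = reason_completeness_score(example)
```

Multiplying before dividing keeps the result exact for integer counts, which avoids small floating-point drift when scores are later summed.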

[52]

Evidence chain is complete

Patient Pathomechanism Analysis (Max 10 points)
1.1 Evidence Extraction Accuracy (4 points)
- 4: Accurately identifies image-based visual features (e.g., brown hyperpigmented patches, erythematous papules) and case-based signs (e.g., pale-red tongue, deep pulse). Evidence chain is complete.
- 2: Extracts only textual information and ignores image featur...
[53]

blood dryness syndrome

Syndrome Differentiation (Max 10 points)
2.1 Syndrome Accuracy (6 points)
- 6: Syndrome name (e.g., “blood dryness syndrome”) is fully consistent with the Label or RAG, or is a standardized equivalent term in TCM.
- 3: Only half correct in disease location or nature (e.g., diagnoses “blood deficiency” but misses “blood dryness”).
- 0: Completely incorrect...
[54]

nourish blood and moisten skin; clear toxin and stop itching

Applicable Treatment Method (Max 10 points)
3.1 Therapeutic Principle Targeting (6 points)
- 6: Treatment method (e.g., “nourish blood and moisten skin; clear toxin and stop itching”) perfectly matches the pathomechanism and syndrome, reflecting RAG-recommended strategies.
- 3: Directionally correct but not precise enough (e.g., only “stop itching” whil...
[55]

Yangxue Jiedu Decoction

Formula Selection & Prescription (Max 10 points)
4.1 Formula Name Match (2 points)
- 2: Formula name (e.g., “Yangxue Jiedu Decoction”) is exactly correct.
- 1: Formula name differs, but the therapeutic intent/core efficacy is similar.
4.2 Herb/Medicinal Matching (7 points)
Formula: Score = (Number of identical medicinals / Total number of medicinals in th...
[56]

Original Patient Case (Text): {text_case}

[57]

RAG Retrieved Knowledge: {rag_content}

[58]

Standard Answer (Label): {label_content}

[59]

Model Output (Model Response): {model_output}

B. The Showcase of DERM-3R on the Real-World Clinical Cases

We show some TCM psoriasis cases collected from the clinic in XXX hospital; the results are generated by DERM-3R.

Case 1
Input Information
Diffuse erythema and desquamation over the entire body, accompanied by a small number of pustules. ...

[1] [1]

At the 78th World Health Assembly, which concluded on May 27, 2025, Member States of the World Health Organization (WHO) formally adopted a resolution recognizing skin diseases as a global public health priority

[2] [2]

one patient, one prescription

The burden of skin diseases is primarily reflected in their profound impact on appearance and social functioning, frequently leading to anxiety, depression, and other psychological disorders; persistent pruritus or pain associated with many dermatologic conditions significantly reduces quality of life; moreover, certain skin diseases represent cutaneous m...

[3] [3]

However, current research on LLMs has excessively emphasized large-scale parameterization and trillion-token datasets to demonstrate general-purpose capabilities. This paradigm results in high energy consumption, prohibitive deployment costs, and severe hallucination issues in specialized medical domains, ultimately limiting their practical applicabil...

[4] [4]

DERM-3R We propose DERM-3R (Dermatology Extraction and Reasoning via Multi-Agent), a multimodal multi-agent framework designed to achieve fine-grained lesion recognition, comprehensive dermatologic specialty representation, and structured clinical reasoning for dermatologic diagnosis and treatment. The framework consists of three specialized agents: a De...

[5] [5]

As the primary focus of this study lies in clinical diagnosis and therapeutic decision-making, we conducted extensive evaluations on the DERM-Rep and DERM-Reason agents

Evaluation Framework To comprehensively assess the performance of the proposed DERM-3R framework in multimodal dermatologic diagnosis and treatment, we designed a hybrid evaluation protocol that integrates automatic metrics and medical expert-level assessments, taking into full consideration the complexity of clinical dermatologic reasoning and the requ...

[6] [6]

Accuracy of dermatologic lesion descriptions

[7] [7]

The detailed scoring rubric for DERM-Rep is provided in Appendix A.1

Accuracy of the generated TCM dermatologic pathogenesis analysis. The detailed scoring rubric for DERM-Rep is provided in Appendix A.1. Evaluation of DERM-Reason. Based upon the judge models used for DERM-Rep, we further incorporated DeepSeek-V3.2 as an additional judge model to diversify the evaluation ensemble and reduce potential model-specific biases....

[8] [8]

Overall pathogenesis analysis: assessing whether the model effectively integrates visual features (e.g., erythema, pigmentation) with other clinical information to produce a coherent and clinically meaningful pathogenesis reasoning process

[9] [9]

Syndrome differentiation: evaluating the correctness of the generated syndrome diagnosis

[10] [10]

Treatment principle selection: assessing whether the selected treatment principles are appropriate, syndrome-consistent, and sufficiently cover the inferred pathogenesis

[11] [11]

Formula selection and prescription generation: evaluating the accuracy of formula matching and the correctness of herbal composition, computed according to the proposed quantitative formulation-matching criteria

[12] [12]

The detailed evaluation protocol and scoring criteria for DERM-Reason are presented in Appendix A.2

Output completeness: assessing whether all required diagnostic and therapeutic components are present. The detailed evaluation protocol and scoring criteria for DERM-Reason are presented in Appendix A.2. Human Evaluation Because automated evaluations are inherently limited by the professional knowledge encoded within the underlying models or the fixed NL...

[13] [13]

Evaluations for integrated dermatologic lesion description and TCM pathogenesis analysis

Experiments Automatic Evaluation Using BLEU-4 and ROUGE-L Table 1. Evaluations for integrated dermatologic lesion description and TCM pathogenesis analysis. The highest score of each item is in bold. Evaluations for Description and Pathogenesis of Dermatologic Lesions Item GPT-5.1-Instant Gemini-3-Flash Qwen2.5-VL-7B Qwen3-VL-8B DERM-Rep BLEU-4 Descriptio...

[14] [14]

than GPT-5.1-Instant (≈ 7), suggesting improved robustness and reduced sensitivity to case-level variability. We next evaluate the DERM-Reason agent, which performs end-to-end multimodal clinical reasoning, including overall pathogenesis analysis, syndrome differentiation, treatment principle selection, and prescription recommendation. Table 3 summarizes th...

[15] [15]

After 14 TCM dermatology specialists completed the evaluation, we conducted semi-structured interviews and summarized the feedback

Discussion As described in the Human Evaluation section, all models were anonymized and labeled as A–E during the assessment process, where A: Qwen3-VL-8B-Instruct, B: Qwen2.5-VL-7B-Instruct, C: DERM-3R, D: GPT-5.1-Instant, and E: Gemini-3-Flash. After 14 TCM dermatology specialists completed the evaluation, we conducted semi-structured interviews and sum...

[16] [16]

pathogenesis → principle → formula → herbs

Human evaluation criteria. Most clinicians reported that, during scoring, they prioritized the internal logical coherence of the generated response over its literal similarity to the golden label. When the model produced a continuous and internally consistent “pathogenesis → principle → formula → herbs” reasoning chain, higher scores were assigned. Conver...

[17] [17]

pathogenesis analyses were more structured and logically complete

Overall comparative judgment of models. Several clinicians considered models C, D, and E to perform the best, noting that their “pathogenesis analyses were more structured and logically complete.” In contrast, models A and B “occasionally generated irrelevant content or produced clearly incorrect lesion descriptions.”

[18] [18]

rarely produced unreasonable treatments

Strengths and weaknesses of DERM-3R. Multiple experts highlighted that DERM-3R demonstrated a more coherent reasoning chain, particularly in pathogenesis interpretation, therapeutic principles, and formula selection, and “rarely produced unreasonable treatments.” Its prescriptions were regarded as reliable, contributing to its higher scores. However, cli...

[19] [19]

pustules

Perspectives on lesion image recognition. Clinicians unanimously agreed that dermatologic image interpretation is the greatest advantage of AI models in dermatology and should remain a key modeling focus. However, some models exhibited visual hallucinations (e.g., fabricating “pustules”), particularly model B. Descriptions of remission-stage lesions were inacc...

[20] [20]

fits well with the logic of TCM pattern differentiation and treatment,

Integration of image and symptom information. Clinicians emphasized that TCM syndrome differentiation relies not only on lesion morphology but also heavily on systemic symptoms (e.g., cold–heat, deficiency–excess, Zang–Fu organ patterns). Image-only models cannot capture all diagnostically relevant information. Therefore, the multimodal agents strategy...

[21] [21]

future trend

General attitudes toward TCM-AI integration. Clinicians widely viewed the integration of AI and TCM as a “future trend” and “a direction worth pursuing.” However, several experts emphasized that current models, including leading hundred-billion-parameter general-purpose models, are still far from real-world clinical deployment. Improvements in safety, cons...

[22] [22]

remove the false and preserve the true,

The role of AI in TCM clinical practice. Clinicians emphasized that the advancement of TCM fundamentally depends on human-driven truth refinement. Current AI systems cannot yet replace human diagnostic reasoning, and their logical chains in pattern differentiation remain inferior to those of experienced clinicians. Thus, TCM–AI integration should be des...

[23] [23]

The limitations of chemically based therapies in modern dermatology are becoming increasingly apparent

Conclusion Skin diseases, as a class of persistent and globally prevalent disorders, continue to pose significant challenges for both diagnosis and treatment. The limitations of chemically based therapies in modern dermatology are becoming increasingly apparent. Developing therapeutic strategies that integrate natural plant- and animal-derived medicine...

[24] [24]

Global burden of skin and subcutaneous diseases: an update from the Global Burden of Disease Study 2021

Huai P, Xing P, Yang Y, Kong Y, Zhang F. Global burden of skin and subcutaneous diseases: an update from the Global Burden of Disease Study 2021. British Journal of Dermatology 2025; 192(6): 1136-8

[25] [25]

Skin Health at Risk? Examining the Implications of a United States Exit from the World Health Organization

Freeman E, Anwar S, Ahmad N, Fuller LC. Skin Health at Risk? Examining the Implications of a United States Exit from the World Health Organization. Journal of Investigative Dermatology 2025

[26] [26]

Paradoxical eruptions to targeted therapies in dermatology: a systematic review and analysis

Murphy MJ, Cohen JM, Vesely MD, Damsky W. Paradoxical eruptions to targeted therapies in dermatology: a systematic review and analysis. Journal of the American Academy of Dermatology 2022; 86(5): 1080-91

[27] [27]

Beyond the blockade: unmet needs in systemic targeted atopic dermatitis therapy

Zhang L, Peng G, Wang M, Niyonsaba F, Gao X. Beyond the blockade: unmet needs in systemic targeted atopic dermatitis therapy. Frontiers in Immunology 2025; 16: 1712757

[28] [28]

Closing the gap between possibilities and reality in psoriasis management

Gyulai R. Closing the gap between possibilities and reality in psoriasis management. Journal of the European Academy of Dermatology and Venereology 2025; 39(3): 449

[29] [29]

Beyond the dichotomy: understanding the overlap between atopic dermatitis and psoriasis

Li M, Wang J, Liu Q, et al. Beyond the dichotomy: understanding the overlap between atopic dermatitis and psoriasis. Frontiers in Immunology 2025; 16: 1541776

[30] [30]

Extracellular matrix in skin diseases: The road to new therapies

Malta M, Cerqueira MT, Marques A. Extracellular matrix in skin diseases: The road to new therapies. Journal of Advanced Research 2023; 51: 149-60

[31] [31]

Immunomodulatory plant natural products as therapeutics against inflammatory skin diseases

Sampath Kumar N, Reddy N, Kumar H, Vemireddy S. Immunomodulatory plant natural products as therapeutics against inflammatory skin diseases. Current Topics in Medicinal Chemistry 2024; 24(12): 1013-34

[32] [32]

Adding Chinese herbal medicine bath therapy to conventional therapies for psoriasis vulgaris: a systematic review with meta-analysis of randomised controlled trials

Wang J, Zhang CS, Zhang AL, Chen H, Xue CC, Lu C. Adding Chinese herbal medicine bath therapy to conventional therapies for psoriasis vulgaris: a systematic review with meta-analysis of randomised controlled trials. Phytomedicine 2024; 128: 155381

[33] [33]

Patients-oriented treatments for chronic inflammatory skin diseases

Mastorino L, Ribero S, Burlando M, Mendes-Bastos P. Patients-oriented treatments for chronic inflammatory skin diseases. Frontiers Media SA; 2024. p. 1473753

[34] [34]

Exploring the mechanism of Notopterygii rhizoma et radix in the treatment of psoriasis using a network pharmacology approach and experimental validation

Liu J, Tan C, Shi J, et al. Exploring the mechanism of Notopterygii rhizoma et radix in the treatment of psoriasis using a network pharmacology approach and experimental validation. Scientific Reports 2025; 15(1): 40422

[35] [35]

Chinese herbal medicine for atopic dermatitis: a systematic review

Tan HY, Zhang AL, Chen D, Xue CC, Lenon GB. Chinese herbal medicine for atopic dermatitis: a systematic review. Journal of the American Academy of Dermatology 2013; 69(2): 295-304

[36] [36]

Jo H-G, Kim H, Baek E, Seo J, Lee D. Efficacy and safety of orally administered East Asian herbal medicine combined with narrowband ultraviolet B against psoriasis: a Bayesian network meta-analysis and network analysis. Nutrients 2024; 16(16): 2690

[37] [37]

Can GPTs accelerate the development of intelligent diagnosis and treatment in traditional Chinese Medicine? A survey and empirical analysis

Guo Y, Wang H, Ren X, et al. Can GPTs accelerate the development of intelligent diagnosis and treatment in traditional Chinese Medicine? A survey and empirical analysis. Journal of Evidence‐Based Medicine 2025; 18(1): e70004

[38] [38]

Traditional Chinese medicine for further categorization of atopic dermatitis subtypes

Chau CA, Lio P. Traditional Chinese medicine for further categorization of atopic dermatitis subtypes. Journal of Integrative Dermatology 2024

[39] [39]

Traditional Chinese medicine in dermatology

Ho J, Ong PH. Traditional Chinese medicine in dermatology. Pediatric Skin of Color 2015: 427-37.
17. Ren Y, Luo X, Wang Y, et al. Large language models in traditional Chinese medicine: A scoping review. Journal of Evidence‐Based Medicine 2025; 18(1): e12658

[40] [40]

Tianyi: A traditional Chinese medicine all-rounder language model and its real-world clinical practice

Liu Z, Yang T, Wang J, et al. Tianyi: A traditional Chinese medicine all-rounder language model and its real-world clinical practice. Information Fusion 2025: 103663

[41] [41]

Yang A, Yang B, Zhang B, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024

[42] [42]

Lora: Low-rank adaptation of large language models

Hu EJ, Shen Y, Wallis P, et al. Lora: Low-rank adaptation of large language models. ICLR 2022; 1(2): 3
