Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning
Pith reviewed 2026-05-10 03:11 UTC · model grok-4.3
The pith
A 4B-parameter vision-language model classifies chronic wound infections from photographs at 86.8% accuracy and generates rationales that experts rate as correct or partially correct in 94.2% of cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Infection-Reasoner achieves 86.8% accuracy, 86.4% sensitivity, and 87.1% specificity on a held-out heterogeneous wound dataset, outperforming GPT-5.1 and other baselines, while producing rationales that receive visual-support agreement scores of 0.722-0.903 from MLLM judges and are rated Correct by experts in 61.8% of cases and Partially Correct in 32.4% of cases.
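The three headline numbers are linked through a single binary confusion matrix. A minimal sketch with illustrative counts chosen only to show how the metrics interact (not the paper's actual confusion matrix):

```python
# Illustrative confusion-matrix counts (NOT the paper's data): a balanced
# 1000-image test set with 500 infected and 500 non-infected wounds.
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity for a binary classifier."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on infected wounds
    specificity = tn / (tn + fp)   # recall on non-infected wounds
    return accuracy, sensitivity, specificity

acc, sens, spec = classification_metrics(tp=432, fp=64, tn=436, fn=68)
# acc = 0.868, sens = 0.864, spec = 0.872
```

On a balanced split like this one, accuracy is exactly the mean of sensitivity and specificity; the paper's near-identical sensitivity and specificity suggest its errors are roughly balanced across the two classes.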
What carries the argument
The two-stage training pipeline that first distills GPT-5.1 chain-of-thought rationales into the Qwen3-VL-4B-Thinking student model and then applies Group Relative Policy Optimization (GRPO) reinforcement learning on a small labeled infection dataset.
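The defining step of GRPO is computing advantages relative to a sampled group of responses rather than a learned value function. A minimal sketch of that normalization, with invented reward values (the paper's reward design is not reproduced here):

```python
from statistics import mean, pstdev

# Sketch of GRPO's group-relative advantage (after Shao et al., DeepSeekMath):
# sample several responses per wound image, score each with a scalar
# reward, and normalize within the group instead of using a critic.
# Reward values below are invented for illustration.
def group_relative_advantages(rewards, eps=1e-8):
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled rationales for one image; reward 1.0 = correct label.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses receive positive advantage and incorrect ones negative, so the policy update pushes probability mass toward rationales that end in the right classification without needing a separate value model.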
If this is right
- The model supplies both a classification decision and an explicit visual reasoning trace suitable for point-of-care review.
- Performance exceeds that of the larger GPT-5.1 model despite using far fewer parameters.
- Rationale quality remains high across heterogeneous wound etiologies, locations, and imaging conditions.
- The pipeline reduces reliance on large volumes of expert-annotated reasoning data.
Where Pith is reading between the lines
- The same distillation-plus-RL recipe could be tested on other medical image classification tasks that currently lack reasoning annotations.
- Deployment on mobile devices becomes feasible because the final model is only 4B parameters.
- If rationale quality holds in prospective clinical use, the outputs could serve as training material for human clinicians.
Load-bearing premise
GPT-5.1-generated rationales on unlabeled wound images supply accurate and unbiased supervision that the small labeled dataset can then refine without inheriting teacher errors or causing overfitting.
What would settle it
Retraining the same 4B base model on the identical small labeled set but without the GPT-5.1 distillation stage and measuring whether accuracy falls below 86.8% or expert-rated rationale correctness drops below 60%.
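Because both variants would be evaluated on the same held-out images, an exact McNemar test on the discordant predictions is a natural way to decide whether any accuracy drop is statistically meaningful. A sketch with invented discordant counts:

```python
from math import comb

# Sketch of an exact two-sided McNemar test for the proposed ablation:
# both models label the same held-out images, so significance hinges on
# the discordant pairs. b = images only the distilled model gets right,
# c = images only the no-distillation model gets right. Counts invented.
def mcnemar_exact(b, c):
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = mcnemar_exact(b=30, c=12)  # p < 0.05 for this invented split
```

A small p-value here would indicate the distillation stage contributes real accuracy, not just run-to-run noise; balanced discordant counts (b ≈ c) would return p near 1.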
read the original abstract
Assessing chronic wound infection from photographs is challenging because visual appearance varies across wound etiologies, anatomical locations, and imaging conditions. Prior image-based deep learning methods have mainly focused on classification with limited interpretability, despite the need for evidence-grounded explanations to support point-of-care decision making. We present Infection-Reasoner, a compact 4B-parameter reasoning vision-language model for chronic wound infection classification and rationale generation. To address the scarcity of expert-labeled wound images with reasoning annotations, Infection-Reasoner is trained using a two-stage pipeline: (1) reasoning distillation, in which GPT-5.1 generates chain-of-thought rationales for unlabeled wound images to initialize wound-specific reasoning in a smaller student model (Qwen3-VL-4B-Thinking), and (2) reinforcement learning post-training with Group Relative Policy Optimization on a small labeled infection dataset to refine classification reasoning. On a held-out heterogeneous wound dataset, Infection-Reasoner achieved 86.8% accuracy, 86.4% sensitivity, and 87.1% specificity, outperforming several strong baselines, including GPT-5.1. Rationale quality was further evaluated using both multimodal large language model (MLLM) judges and wound expert review. Across four MLLM judges, visual-support agreement scores ranged from 0.722 to 0.903, while expert review rated 61.8% of rationales as Correct and 32.4% as Partially Correct.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Infection-Reasoner, a 4B-parameter vision-language model for chronic wound infection classification from photographs together with evidence-grounded rationale generation. Training proceeds in two stages: (1) reasoning distillation in which GPT-5.1 produces chain-of-thought rationales on unlabeled wound images to initialize a Qwen3-VL-4B-Thinking student, and (2) Group Relative Policy Optimization (GRPO) reinforcement learning on a small labeled infection dataset. On a held-out heterogeneous wound dataset the model reports 86.8% accuracy, 86.4% sensitivity and 87.1% specificity, outperforming several baselines including GPT-5.1. Rationale quality is assessed by four MLLM judges (visual-support agreement 0.722–0.903) and by wound-expert review (61.8% rated Correct, 32.4% Partially Correct).
Significance. If the empirical claims hold after the requested clarifications, the work supplies a compact, interpretable model that directly addresses the interpretability gap in prior image-only deep-learning wound classifiers. The distillation-plus-RL recipe for injecting clinical reasoning under limited labeled data is technically interesting and the reported outperformance of GPT-5.1 is a notable result. The combination of MLLM and expert rationale evaluation is a positive step toward grounded assessment.
major comments (2)
- [Abstract] Abstract: the reported performance figures (86.8% accuracy, 86.4% sensitivity, 87.1% specificity) and the claim of outperforming GPT-5.1 are presented without dataset size, diversity statistics, statistical significance tests, exact baseline implementations, or ablation results. These omissions are load-bearing for any claim that the two-stage pipeline yields reliable clinical reasoning.
- [Abstract] Abstract (two-stage pipeline description): no quantitative validation (expert agreement, error rate, or inter-rater reliability) is supplied for the GPT-5.1-generated chain-of-thought rationales on the unlabeled distillation images. Because the final expert review (61.8% Correct) occurs only after GRPO and does not isolate the distillation contribution, it is impossible to determine whether the reported rationale quality reflects genuine clinical reasoning or inherited teacher artifacts.
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review. We address each major comment below and have revised the manuscript to improve clarity and transparency regarding our experimental details and limitations.
read point-by-point responses
-
Referee: [Abstract] Abstract: the reported performance figures (86.8% accuracy, 86.4% sensitivity, 87.1% specificity) and the claim of outperforming GPT-5.1 are presented without dataset size, diversity statistics, statistical significance tests, exact baseline implementations, or ablation results. These omissions are load-bearing for any claim that the two-stage pipeline yields reliable clinical reasoning.
Authors: We agree that the abstract, as a concise summary, omits supporting details that are important for evaluating the claims. The full manuscript provides the held-out dataset description (size, diversity across etiologies, anatomical locations, and imaging conditions) in the experimental setup, reports statistical significance tests for outperformance over baselines including GPT-5.1 in the results, details exact baseline implementations and hyperparameters in the methods, and presents ablation studies on the two-stage pipeline. To address the concern, we have revised the abstract to include a brief reference to dataset scale and statistical significance while maintaining length constraints. revision: partial
-
Referee: [Abstract] Abstract (two-stage pipeline description): no quantitative validation (expert agreement, error rate, or inter-rater reliability) is supplied for the GPT-5.1-generated chain-of-thought rationales on the unlabeled distillation images. Because the final expert review (61.8% Correct) occurs only after GRPO and does not isolate the distillation contribution, it is impossible to determine whether the reported rationale quality reflects genuine clinical reasoning or inherited teacher artifacts.
Authors: We acknowledge the value of isolating the distillation stage's contribution through separate quantitative validation of the GPT-5.1 rationales. This was not performed due to the substantial expert annotation costs involved. The expert review was conducted on the final post-GRPO outputs to evaluate the end-to-end system, while MLLM judge scores offer additional supporting evidence for rationale quality. In the revised manuscript, we have added a dedicated limitations paragraph in the discussion that explicitly notes this gap, explains how GRPO refines initial reasoning, and outlines plans for future work on granular distillation ablations. revision: yes
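The rebuttal leans on MLLM judge scores as corroborating evidence, but the paper's exact visual-support agreement formula is not reproduced on this page. Its appendix prompt fragments show judges returning per-finding TEXT_CLAIM and IMAGE_EVIDENCE labels (POS|NEG|UNC), so one plausible reading is a label-match rate; the sketch below assumes that definition, with invented labels:

```python
# Hypothetical scoring sketch: assumes the visual-support agreement score
# is the fraction of rubric findings where the rationale's TEXT_CLAIM
# matches the judge's IMAGE_EVIDENCE verdict. The paper may aggregate
# differently; labels below are invented for illustration.
def visual_support_agreement(judgments):
    """judgments: list of (text_claim, image_evidence) label pairs."""
    matches = sum(1 for claim, evidence in judgments if claim == evidence)
    return matches / len(judgments)

score = visual_support_agreement([
    ("POS", "POS"),  # purulence claimed and visible
    ("NEG", "NEG"),  # no spreading redness claimed or seen
    ("POS", "UNC"),  # necrosis claimed, judge uncertain
    ("NEG", "NEG"),  # no maceration claimed or seen
])  # -> 0.75
```

Under this reading, the reported 0.722–0.903 range would mean judges find roughly three quarters to nine tenths of rationale claims visually supported, with the spread reflecting differing judge strictness.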
Circularity Check
No circularity: results from empirical training on external data with held-out evaluation
full rationale
The paper presents a standard two-stage ML pipeline (reasoning distillation from GPT-5.1 on unlabeled wound images, followed by GRPO RL refinement on a small labeled set) and reports accuracy/sensitivity/specificity on a held-out heterogeneous dataset plus independent rationale quality checks by MLLM judges and wound experts. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the derivation chain. All performance numbers derive from external data splits and external judges rather than reducing to the training inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- GRPO hyperparameters
axioms (1)
- domain assumption: GPT-5.1 chain-of-thought rationales on unlabeled wound images constitute high-quality supervision for clinical reasoning
Appendix fragments (recovered from a garbled extraction of Section 8.1, MLLM-as-a-Judge Prompt Templates): the system prompt casts the judge as a meticulous wound-rationale evaluator given a wound image and a model-generated rationale. The rubric covers findings such as purulence_pus, cellulitis_spreading_redness, necrotic_tissue_eschar, and maceration. The rules forbid using the image to change TEXT_CLAIM or using the text to invent visual evidence, instruct the judge to prefer UNC over guessing, and require strict JSON output (no markdown, prose, or code fences) with per-finding TEXT_CLAIM (POS|NEG|UNC|NOT_MENTIONED), a supporting text_span, and an IMAGE_EVIDENCE field, plus a parsed_think string.