arxiv: 2511.17652 · v2 · submitted 2025-11-20 · 🧬 q-bio.QM · cs.CV

TeamPath: Building MultiModal Pathology Experts with Reasoning AI Copilots

Pith reviewed 2026-05-17 20:26 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.CV
keywords computational pathology · multimodal AI · reinforcement learning · reasoning paths · AI copilots · histopathology · cross-modality generation

The pith

TeamPath uses reinforcement learning and routing to build pathology AI copilots that generate rigorous reasoning paths for diagnosis and cross-modal tasks while correcting expert errors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TeamPath as an AI system built on reinforcement learning and router-enhanced components trained over large histopathology multimodal datasets. It positions the system as a virtual assistant that performs expert-level disease diagnosis, patch-level summarization, and cross-modality generation that incorporates transcriptomic data. Collaboration with pathologists shows the system can identify and correct mistakes in expert conclusions and reasoning paths, backed by human evaluation of reasoning quality. Current pathology visual language models lack such rigorous reasoning and task flexibility, so a working system would enable more reliable AI support in real clinical workflows. If the approach holds, it points toward AI that integrates with human experts rather than replacing them.

Core claim

TeamPath is an AI system powered by reinforcement learning and router-enhanced solutions based on large-scale histopathology multimodal datasets that serves as a virtual assistant for expert-level disease diagnosis, patch-level information summarization, and cross-modality generation to integrate transcriptomic information for clinical usage. It demonstrates assistance to pathologists by identifying and correcting their conclusions and reasoning paths, with human evaluation supporting the reasoning quality.

What carries the argument

The TeamPath system, which applies reinforcement learning and router-enhanced components to select and generate reasoning paths across multimodal pathology tasks.
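
To make the routing idea concrete, the following is a minimal sketch of a task router, assuming a simple keyword-count dispatch. The three task labels mirror the tasks named in the paper; the keyword lists, the `route` function, and the scoring rule are hypothetical illustrations and are not taken from TeamPath, whose routing logic the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical keyword lists; the paper does not describe its routing logic.
TASK_KEYWORDS: Dict[str, List[str]] = {
    "diagnosis": ["diagnosis", "diagnose", "disease", "tumor"],
    "patch_summary": ["summarize", "summary", "caption", "describe"],
    "transcriptomic_generation": ["gene", "expression", "transcriptomic"],
}


def route(query: str, default: str = "diagnosis") -> str:
    """Return the task whose keywords occur most often in the query.

    Falls back to `default` when no keyword matches at all.
    """
    words = query.lower().split()
    scores = {
        task: sum(words.count(keyword) for keyword in keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

A learned router would replace the keyword counts with a classifier score, but the dispatch step (score each task, pick the highest, fall back to a default) would keep the same shape.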

If this is right

  • Pathologists can work more efficiently when the AI flags and corrects errors in their conclusions and reasoning.
  • The system flexibly switches between diagnosis, patch summarization, and transcriptomic integration depending on the clinical need.
  • Cross-modality outputs become available for direct clinical use without separate manual integration steps.
  • Human evaluations indicate that the generated reasoning paths meet quality standards for expert review.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread adoption could reduce variability in pathology diagnoses by supplying consistent second-check reasoning.
  • The same reinforcement-plus-router pattern might transfer to other multimodal medical domains such as radiology or oncology.
  • Detailed quantitative benchmarks and ablation studies would be required to confirm advantages over prior models.

Load-bearing premise

The reinforcement learning and router components produce reasoning paths that are genuinely rigorous and generalizable enough to outperform prior pathology models on divergent tasks.

What would settle it

A side-by-side comparison on a diverse held-out set of pathology cases showing no gain in diagnostic accuracy or reasoning correctness versus existing visual language models would disprove the central claim.
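
One way such a side-by-side comparison could be scored is a paired test on per-case correctness. The sketch below uses an exact two-sided McNemar test; the function name `mcnemar_exact` and the choice of test are assumptions made for illustration, not a protocol described in the paper.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-case correctness.

    correct_a[i] and correct_b[i] say whether model A and model B
    answered held-out case i correctly.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have equal length")
    # Discordant pairs: cases where exactly one of the two models is correct.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a gap
    k = min(only_a, only_b)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A large p-value on a sufficiently large and diverse held-out set would correspond to the "no gain" outcome described above; a small one would indicate a real difference in either direction, so the sign of the gap still has to be checked.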

Figures

Figures reproduced from arXiv: 2511.17652 by Alfonso Garcia Tan, Chen Liu, Fang Wu, Hao Wu, Heli Qi, Hongyu Zhao, Hua Xu, Irene Li, Marcello DiStasio, Mohamed Kahila, Nan Liu, Peter Humphrey, Qingyu Chen, Rui Yang, Simeng Han, Tianyu Liu, Tinglin Huang, Weihao Xuan.

Figure 1: Landscape of TeamPath (a) Steps of dataset curation. We extract image-text pairs from a ...
Figure 2: Benchmarking results with PathMMU for the pathology VQA task. We note that since we ...
Figure 3: Case study (topic: synaptophysin, which is a precursor cell that develops into an adipocyte ...
Figure 4: Results of using TeamPath as the answer corrector/reason corrector. TeamPath can work ...
Figure 5: Benchmarking results of the caption summary task. (a) Performances of different meth...
Figure 6: Evaluation of model performances for transcriptomic profile generation. (a) SPCC (higher ...
Original abstract

Advances in AI have introduced several strong models in computational pathology to usher it into the era of multi-modal diagnosis, analysis, and interpretation. However, the current pathology-specific visual language models still lack capacities in making the diagnosis with rigorous reasoning paths as well as handling divergent tasks, and thus, challenges of building AI Copilots for real scenarios still exist. Here we introduce TeamPath, an AI system powered by reinforcement learning and router-enhanced solutions based on large-scale histopathology multimodal datasets, to work as a virtual assistant for expert-level disease diagnosis, patch-level information summarization, and cross-modality generation to integrate transcriptomic information for clinical usage. We also collaborate with pathologists from Yale School of Medicine to demonstrate that TeamPath can assist them in working more efficiently by identifying and correcting expert conclusions and reasoning paths. We also discuss the human evaluation results to support the reasoning quality from TeamPath. Overall, TeamPath can flexibly choose the best settings according to the needs, and serve as an innovative and reliable system for information communication across different modalities and experts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces TeamPath, a multimodal AI system for pathology powered by reinforcement learning and router-enhanced components trained on large-scale histopathology datasets. It supports expert-level disease diagnosis, patch-level information summarization, and cross-modality generation to integrate transcriptomic data. The central claim is that collaboration with Yale School of Medicine pathologists shows TeamPath assists experts by identifying and correcting their conclusions and reasoning paths, with human evaluation results discussed to support reasoning quality. The system is presented as flexible for real-world clinical scenarios.

Significance. If the human evaluation were strengthened with quantitative metrics, controls, and ablations demonstrating that the RL and router components causally improve diagnostic accuracy or efficiency over baselines, this could advance development of reasoning-capable AI copilots in computational pathology. The multimodal integration and focus on divergent tasks address documented limitations in existing pathology VLMs. Currently, the absence of performance numbers or rigorous study design limits the assessed impact to a preliminary system description.

major comments (2)
  1. [Human evaluation results] Human evaluation with Yale pathologists: the claim that TeamPath identifies and corrects expert conclusions to improve efficiency rests on this evaluation, yet the manuscript provides no case count, blinding protocol, quantitative accuracy/efficiency metrics, inter-rater reliability, statistical tests, or comparison to non-TeamPath baselines. This leaves open whether reported benefits derive from the RL/router reasoning or from confirmation bias and non-specific effects.
  2. [Abstract and system architecture] System description (abstract and methods): the abstract states that reinforcement learning and router-enhanced solutions produce rigorous reasoning paths outperforming prior pathology VLMs, but supplies no reward function details, routing thresholds, dataset scale, ablation studies, or quantitative benchmarks on divergent tasks. Without these, the contribution of the proposed components to generalizable reasoning cannot be isolated.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one concrete quantitative result or efficiency metric from the human evaluation to ground the qualitative claims.
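
As an illustration of the inter-rater reliability statistic the first major comment asks for, here is a minimal sketch of Cohen's kappa for two raters. The function name and the rating labels in the usage note are hypothetical; the manuscript does not state which agreement measure, if any, was used.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater1)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    expected = sum(counts1[label] * counts2[label] for label in counts1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, two pathologists rating four reasoning paths as `["good", "good", "poor", "poor"]` and `["good", "poor", "poor", "poor"]` agree on three of four items, but half of that agreement is expected by chance, which gives a kappa of 0.5.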

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. The comments identify important gaps in the presentation of our human evaluation and technical specifications that we will address to strengthen the paper. Below we respond point by point to the major comments.

  1. Referee: Human evaluation with Yale pathologists: the claim that TeamPath identifies and corrects expert conclusions to improve efficiency rests on this evaluation, yet the manuscript provides no case count, blinding protocol, quantitative accuracy/efficiency metrics, inter-rater reliability, statistical tests, or comparison to non-TeamPath baselines. This leaves open whether reported benefits derive from the RL/router reasoning or from confirmation bias and non-specific effects.

    Authors: We agree that the current description of the human evaluation is insufficiently detailed to support the claims robustly. In the revised manuscript we will report the exact number of cases evaluated, the blinding protocol employed, quantitative metrics for accuracy and efficiency (including time-to-diagnosis and error-correction rates), inter-rater reliability coefficients, appropriate statistical tests, and direct comparisons against non-TeamPath baselines. These additions will allow readers to better evaluate whether the observed benefits are attributable to the RL and router components.

    Revision: yes

  2. Referee: System description (abstract and methods): the abstract states that reinforcement learning and router-enhanced solutions produce rigorous reasoning paths outperforming prior pathology VLMs, but supplies no reward function details, routing thresholds, dataset scale, ablation studies, or quantitative benchmarks on divergent tasks. Without these, the contribution of the proposed components to generalizable reasoning cannot be isolated.

    Authors: We acknowledge that additional technical details are required to isolate the contributions of the proposed components. In the revised manuscript we will expand the methods section to describe the reward function used for reinforcement learning, the routing thresholds and logic, the scale of the training datasets, ablation studies that quantify the impact of the RL and router modules, and quantitative benchmarks on the divergent tasks of diagnosis, patch-level summarization, and cross-modality transcriptomic generation, with explicit comparisons to prior pathology VLMs.

    Revision: yes

Circularity Check

0 steps flagged

No circularity in system description or evaluation claims

The paper introduces TeamPath as a multimodal pathology AI system built on reinforcement learning and router components trained on large histopathology datasets, with supporting human evaluation from Yale pathologists. No equations, fitted parameters, predictions, or first-principles derivations are present that could reduce outputs to inputs by construction. Claims rest on external collaboration and evaluation results rather than self-referential definitions, self-citations as load-bearing premises, or renamed empirical patterns. The derivation chain is therefore self-contained with independent content from the described datasets and human assessments.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the effectiveness of reinforcement learning and router mechanisms for producing rigorous reasoning, plus the representativeness of the large-scale histopathology multimodal datasets used for training. These are domain assumptions rather than derived results.

free parameters (2)
  • RL reward function and hyperparameters
    Reinforcement learning training requires reward shaping and many tunable parameters whose specific values are not stated.
  • Router architecture and routing thresholds
    The router-enhanced solution implies learned or hand-chosen routing logic that affects task selection.
axioms (1)
  • domain assumption: Large-scale histopathology multimodal datasets contain sufficient signal for rigorous reasoning across divergent tasks.
    The system is explicitly built on these datasets; their quality and coverage are taken as given.
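
To show where free parameters of this kind typically enter, the sketch below implements a rule-based reward with a format term and an answer term, followed by a group-wise normalization of rewards. Everything here is an assumption for illustration: the `<think>`/`<answer>` tag layout, the two weights, and the function names are hypothetical, and the paper does not state its reward function.

```python
import re
from statistics import mean, pstdev
from typing import List

# Hypothetical free parameters; the paper does not report its values.
FORMAT_WEIGHT = 0.2
ANSWER_WEIGHT = 1.0

# Assumed response layout: a reasoning block followed by a final answer block.
_PATTERN = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)


def reward(response: str, gold: str) -> float:
    """Format credit for a well-formed response, plus credit for a correct answer."""
    match = _PATTERN.fullmatch(response.strip())
    if match is None:
        return 0.0  # malformed responses earn nothing
    correct = match.group(1).strip().lower() == gold.strip().lower()
    return FORMAT_WEIGHT + (ANSWER_WEIGHT if correct else 0.0)


def group_advantages(rewards: List[float]) -> List[float]:
    """Standardize rewards within a group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)  # identical rewards carry no learning signal
    return [(r - mu) / sigma for r in rewards]
```

Even in this small sketch, the ratio of `FORMAT_WEIGHT` to `ANSWER_WEIGHT` changes which responses are favored, which is why the ledger counts the unreported reward design as a free parameter.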

pith-pipeline@v0.9.0 · 5538 in / 1260 out tokens · 29318 ms · 2026-05-17T20:26:27.323625+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization

    cs.AI 2026-05 unverdicted novelty 6.0

    Hygieia is a router-based multi-modal AI system that outperforms physicians in rare disease diagnosis benchmarks and assists with real-world medical records.

  2. A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization

    cs.AI 2026-05 unverdicted novelty 6.0

    Hygieia is a new AI agent system that integrates phenotypes, genetics, and records to achieve superior rare disease diagnosis and gene prioritization with confidence scores.
