Pith · machine review for the scientific record

arXiv: 2603.16876 · v2 · submitted 2026-02-17 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:48 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords radiology report generation · multi-agent reinforcement learning · multi-modal learning · chest X-ray · clinical efficacy · MIMIC-CXR · laterality consistency

The pith

Multi-agent reinforcement learning optimizes region-specific and global agents to generate radiology reports with better clinical accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MARL-Rad, a framework that applies multi-modal multi-agent reinforcement learning directly to radiology report generation. It decomposes chest X-ray interpretation into region-specific agents plus one global integrating agent, then jointly optimizes the full system on-policy using clinically verifiable rewards. This replaces post-hoc assembly of fixed language models with role-specific training inside the deployed workflow. Experiments on MIMIC-CXR and IU X-ray show gains on RadGraph, CheXbert, and GREEN scores plus improved laterality consistency and detail, with blinded clinicians rating the outputs comparable to ground-truth reports.

Core claim

MARL-Rad trains the entire agentic system on-policy within the radiology workflow. Chest X-ray interpretation is decomposed into region-specific agents and a global integrating agent whose outputs are jointly optimized by reinforcement learning driven by clinically verifiable reward signals. On the MIMIC-CXR and IU X-ray datasets the method reaches state-of-the-art clinical efficacy on RadGraph, CheXbert, and GREEN metrics, raises laterality consistency, produces more accurate and detailed reports, and yields outputs that a blinded clinician evaluation finds clinically comparable to ground-truth reports.

What carries the argument

Decomposition into region-specific multi-modal agents coordinated by a global integrating agent, jointly optimized on-policy via reinforcement learning with clinically verifiable rewards.
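The decomposition can be illustrated with a minimal sketch, assuming region-specific agents, one integrating agent, and a token-overlap stand-in for the clinically verifiable reward; every name and function here is an illustrative assumption, not the paper's implementation.

```python
# Toy sketch of the region-plus-global decomposition: each region agent
# reports on its assigned anatomy, a global agent integrates, and a
# verifiable reward scores the result. The region list, agent functions,
# and token-overlap reward are illustrative stand-ins for MARL-Rad's
# actual design.

REGIONS = ["lungs", "heart", "pleura"]  # assumed region split

def region_agent(region: str, image_features: dict) -> str:
    """Each region-specific agent emits a finding for its region."""
    return f"{region}: {image_features.get(region, 'no acute finding')}"

def global_agent(regional_findings: list) -> str:
    """The global agent integrates regional outputs into one report."""
    return "FINDINGS: " + " ".join(regional_findings)

def verifiable_reward(report: str, reference: str) -> float:
    """Stand-in for a clinically verifiable reward (token overlap here;
    the paper uses clinically grounded signals instead)."""
    pred, ref = set(report.lower().split()), set(reference.lower().split())
    return len(pred & ref) / max(len(ref), 1)

features = {"lungs": "right lower lobe opacity", "heart": "normal size"}
report = global_agent([region_agent(r, features) for r in REGIONS])
reward = verifiable_reward(report, "right lower lobe opacity normal size")
```

On-policy training would then push this scalar reward back through the policies of all agents jointly, rather than freezing each model after supervised pre-training.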

If this is right

  • Achieves state-of-the-art clinical efficacy scores on RadGraph, CheXbert, and GREEN for MIMIC-CXR and IU X-ray.
  • Improves laterality consistency in generated reports.
  • Produces more accurate and detailed radiology reports.
  • Yields outputs judged clinically comparable to ground-truth reports in blinded clinician evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same region-plus-global decomposition may improve agentic systems for other medical imaging modalities where local detail and global coherence must be balanced.
  • Joint policy optimization could reduce the inconsistencies often seen when fixed language models are assembled into medical report pipelines after training.
  • Scaling the number or granularity of region agents offers a testable route to finer report quality on complex or multi-finding cases.

Load-bearing premise

Clinically verifiable rewards can be defined to accurately guide joint optimization of the multi-agent system without introducing biases or failing to capture key aspects of report quality.

What would settle it

A large-scale blinded study in which expert radiologists rate MARL-Rad reports no better than non-optimized agent baselines on diagnostic utility and error rate would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2603.16876 by Kaito Baba, Risa Kishikawa, Satoshi Kodera.

Figure 1. Comparison with previous state-of-the-art (SOTA) meth… (figures/full_fig_p001_1.png)
Figure 2. Overview of the proposed multi-agent RL framework. Region-specific agents and global integrating agent collaboratively… (figures/full_fig_p004_2.png)
Figure 3. Example output from MARL-Rad. Region-specific agents consistently focus on their assigned regions and generate regional… (figures/full_fig_p008_3.png)
read the original abstract

We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointly optimizes them using clinically verifiable rewards. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art clinical efficacy performance. Further analyses show that MARL-Rad improves laterality consistency and produces more accurate and detailed reports. A blinded clinician evaluation further suggests that MARL-Rad produces reports clinically comparable to ground-truth reports.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation from chest X-rays. It decomposes interpretation into region-specific agents plus a global integrating agent and jointly optimizes the system end-to-end using clinically verifiable rewards. Experiments on MIMIC-CXR and IU X-ray report state-of-the-art results on RadGraph, CheXbert, and GREEN scores, plus gains in laterality consistency, report detail, and blinded clinician equivalence to ground-truth reports.

Significance. If the experimental claims are substantiated, the work would be significant as one of the first demonstrations of end-to-end multi-agent RL optimization for medical report generation, moving beyond post-hoc LLM agent workflows. The use of region-specific agents, clinically grounded rewards, and clinician evaluation are positive elements that could influence future agentic systems in radiology.

major comments (3)
  1. [Abstract / Methods] The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.
  2. [Experiments] The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.
  3. [Experiments] Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.
minor comments (2)
  1. Clarify the precise multi-modal fusion mechanism between image features and text tokens inside each agent.
  2. Add dataset split statistics, preprocessing steps, and hyper-parameter tables to support reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Revisions will be incorporated into the next version of the manuscript to improve clarity, statistical rigor, and discussion of potential limitations.

read point-by-point responses
  1. Referee: [Abstract / Methods] The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.

    Authors: We agree that additional detail on the reward formulation is necessary. In the revised Methods section we will explicitly define each component of the clinically verifiable rewards (entity-level matching from RadGraph, label accuracy from CheXbert, and GREEN score contributions), specify the weighting coefficients used to combine them, describe the reward-shaping functions applied to address sparsity, and explain how the composite reward is back-propagated through the multi-agent policy gradient updates. These additions will allow readers to evaluate whether the reported gains reflect genuine clinical improvement. revision: yes

  2. Referee: [Experiments] The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.

    Authors: We accept this criticism. The revised Experiments section will include paired statistical significance tests (with p-values), 95% confidence intervals for all metrics on both MIMIC-CXR and IU X-ray, and a set of ablation studies that isolate the multi-agent RL component against single-agent RL and supervised-learning baselines. These results will be presented in new tables and will directly support the SOTA and laterality-consistency claims. revision: yes

  3. Referee: [Experiments] Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.

    Authors: We acknowledge the need for explicit discussion of this issue. The revised manuscript will add a dedicated paragraph in the Experiments section that analyzes the overlap between reward signals and evaluation metrics and explains why the clinical grounding of the rewards reduces the risk of pure metric hacking. We will also expand the existing blinded clinician evaluation (already performed on a held-out set) to serve as an independent validation metric not used during reward computation. revision: partial
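The reward decomposition promised in response 1 could, for instance, take the form of a convex combination of metric-specific terms. This is a minimal sketch with assumed weights that the paper would need to specify; the component scores are illustrative inputs.

```python
# Weighted composite of clinically grounded reward terms, as the
# rebuttal outlines. The weights and the three component scores are
# illustrative assumptions, not MARL-Rad's actual reward.

def composite_reward(radgraph_f1, chexbert_acc, green_score,
                     weights=(0.4, 0.3, 0.3)):
    """Convex combination of metric-specific reward terms."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    w_rg, w_cx, w_gr = weights
    return w_rg * radgraph_f1 + w_cx * chexbert_acc + w_gr * green_score

r = composite_reward(radgraph_f1=0.55, chexbert_acc=0.80, green_score=0.60)
# 0.4*0.55 + 0.3*0.80 + 0.3*0.60 = 0.64
```

The referee's concern is exactly about such choices: the weights, any shaping applied before combination, and how sparse components are handled all change what policy the agents converge to.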

Circularity Check

0 steps flagged

No significant circularity; claims rest on external empirical validation

full rationale

The paper's derivation chain consists of a multi-agent RL framework whose policy is optimized via clinically verifiable rewards and then evaluated on standard public datasets (MIMIC-CXR, IU X-ray) using independent automated metrics (RadGraph, CheXbert, GREEN) plus blinded clinician review. No equation or step reduces a claimed prediction to a fitted input by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified. The reward design is presented as an external modeling choice rather than a tautological re-expression of the evaluation scores.
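A minimal version of such a circularity audit, assuming per-report training rewards and an independent held-out evaluation score are available (all values below are illustrative), is a rank correlation between the two: a near-perfect correlation would suggest the evaluation largely re-measures the reward.

```python
# Minimal circularity probe: rank-correlate per-report training rewards
# with a held-out evaluation score. The data are placeholders.

def spearman(xs, ys):
    """Spearman rank correlation (no tie correction; fine for a sketch)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        out = [0.0] * len(vs)
        for rank, i in enumerate(order):
            out[i] = float(rank)
        return out
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

train_rewards = [0.50, 0.62, 0.58, 0.71, 0.66]  # per-report reward
heldout_green = [0.48, 0.60, 0.63, 0.70, 0.64]  # independent eval score
rho = spearman(train_rewards, heldout_green)
```

A moderate correlation is expected (both track report quality); the flag would be a correlation so high that the "independent" metric adds no information beyond the optimized reward.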

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that clinically meaningful rewards can be constructed to optimize report generation. No free parameters or invented entities are explicitly described in the abstract.

axioms (1)
  • domain assumption: Clinically verifiable rewards can be defined to measure and optimize report quality for the multi-agent system.
    This underpins the joint training of agents described in the abstract.

pith-pipeline@v0.9.0 · 5450 in / 1258 out tokens · 29644 ms · 2026-05-15T21:48:48.315451+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

149 extracted references · 149 canonical work pages · 14 internal anchors

  1. [1]

    ARDGen: Augmentation regularization for domain- generalized medical report generation

    Syed Bilal Ahsan, Muhammad Ikhalas, Muhammad Muza- mil Khan, Sana Ullah, and Muhammad Zaigham Za- heer. ARDGen: Augmentation regularization for domain- generalized medical report generation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition Work- shops, pages 6526–6535, 2025. 1, 3, 4

  2. [2]

    A review on detection of pneumonia in chest X- ray images using neural networks.Journal of Biomedical Physics and Engineering, 12(6):551–558, 2022

    Daniel Joseph Alapat, Malavika Venu Menon, and Sharmila Ashok. A review on detection of pneumonia in chest X- ray images using neural networks.Journal of Biomedical Physics and Engineering, 12(6):551–558, 2022. 1

  3. [3]

    Multi-resolution pathology-language pre-training model with text-guided visual representation

    Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Gana- pathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, and Arif Mahmood. Multi-resolution pathology-language pre-training model with text-guided visual representation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 25907–25919, 2025. 3

  4. [4]

    JRadiEvo: A japanese radiology report generation model enhanced by evolu- tionary optimization of model merging.arXiv preprint arXiv:2411.09933, 2024

    Kaito Baba, Ryota Yagi, Junichiro Takahashi, Risa Kishikawa, and Satoshi Kodera. JRadiEvo: A japanese radiology report generation model enhanced by evolu- tionary optimization of model merging.arXiv preprint arXiv:2411.09933, 2024. 1, 4

  5. [5]

    Prover Agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923,

    Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi San- nai. Prover Agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923,

  6. [6]

    METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments

    Satanjeev Banerjee and Alon Lavie. METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments. InProceedings of the ACL Work- shop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72. Association for Computational Linguistics, 2005. 4

  7. [7]

    Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, An- ton Schwaighofer, Anja Thieme, Sam Bond-Taylor, Max- imilian Ilse, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Harshita Sharma, Felix Meissen, Mercy Ranjit, Shaury Sri- vastav, Julia Gong, Noel C. F. Codella, Fabian Falck, Ozan Oktay, Matthew P. Lungren, Maria Teodora Wetscherek, Javier Alvarez-Valle...

  8. [8]

    Cross-counter-repeat attention for enhanced understanding of visual semantics in radiology report generation

    Xiaolei Bo, Feiyang Yang, Feilong Xu, and Xiaoli Zhang. Cross-counter-repeat attention for enhanced understanding of visual semantics in radiology report generation. InPro- ceedings of the 33rd ACM International Conference on Multimedia, pages 4242–4250. Association for Computing Machinery, 2025. 3

  9. [9]

    Baselines for chest X-ray report generation

    William Boag, Tzu-Ming Harry Hsu, Matthew Mcdermott, Gabriela Berner, Emily Alesentzer, and Peter Szolovits. Baselines for chest X-ray report generation. InProceed- ings of the Machine Learning for Health NeurIPS Work- shop, pages 126–140. PMLR, 2020. 5, 6

  10. [10]

    G. W. L. Boland, A. S. Guimaraes, and P. R. Mueller. Ra- diology report turnaround: expectations and solutions.Eu- ropean Radiology, 18(7):1326–1328, 2008. 1

  11. [11]

    Imaging the chest: The chest radiograph

    Joshua Broder. Imaging the chest: The chest radiograph. InDiagnostic Imaging for the Emergency Physician, pages 185–296. Elsevier, 2011. 1

  12. [12]

    A review on lung boundary detection in chest X-rays.International Journal of Computer Assisted Radiology and Surgery, 14(4):563– 576, 2019

    Sema Candemir and Sameer Antani. A review on lung boundary detection in chest X-rays.International Journal of Computer Assisted Radiology and Surgery, 14(4):563– 576, 2019. 1

  13. [13]

    Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities

    Boyuan Chen, Zhuo Xu, Sean Kirmani, Brain Ichter, Dorsa Sadigh, Leonidas Guibas, and Fei Xia. Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14455–14465, 2024. 7

  14. [14]

    Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen

    Mingyang Chen, Linzhuang Sun, Tianpeng Li, sunhaoze, ZhouYijie, Chenzheng Zhu, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen. ReSearch: Learning to reason with search for LLMs via reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys- tems, 2025. 2

  15. [15]

    Generating radiology reports via memory- driven transformer

    Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xi- ang Wan. Generating radiology reports via memory- driven transformer. InProceedings of the 2020 Confer- ence on Empirical Methods in Natural Language Process- ing (EMNLP), pages 1439–1449. Association for Compu- tational Linguistics, 2020. 2, 4, 5, 6

  16. [16]

    Cross-modal memory networks for radiology report gener- ation

    Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report gener- ation. InProceedings of the 59th Annual Meeting of the As- sociation for Computational Linguistics and the 11th Inter- national Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5904–5914. Association for Computational L...

  17. [17]

    CheXa- gent: Towards a foundation model for chest X-ray interpre- tation

    Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Mag- dalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Co- hen, Eduardo Pontes Reis, Emily Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Ga- tidis, Akshay S Chaudhari, and Curtis Langlotz. CheXa- gent: Towards a foundation model ...

  18. [18]

    OraPO: Oracle-educated rein- forcement learning for data-efficient and factual radiology report generation.arXiv preprint arXiv:2509.18600, 2025

    Zhuoxiao Chen, Hongyang Yu, Ying Xu, Yadan Luo, Long Duong, and Yuan-Fang Li. OraPO: Oracle-educated rein- forcement learning for data-efficient and factual radiology report generation.arXiv preprint arXiv:2509.18600, 2025. 1, 3

  19. [19]

    SpatialRGPT: Grounded spatial reasoning in vision- language models

    An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, and Sifei Liu. SpatialRGPT: Grounded spatial reasoning in vision- language models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 7

  20. [20]

    Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Hols...

  21. [21]

    Cowan, Sharyn L

    Ian A. Cowan, Sharyn L. S. MacDonald, and Richard A. Floyd. Measuring and managing radiologist workload: measuring radiologist reporting times using data from a ra- diology information system.Journal of Medical Imaging and Radiation Oncology, 57(5):558–566, 2013. 1

  22. [22]

    Daniel Coelho de Castro, Aurelia Bustos, Shruthi Ban- nur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores S ´anchez-Valverde, Lara Jaques-P´erez, Lourdes P ´erez-Rodr´ıguez, Kenji Takeda, Jos´e Mar´ıa Salinas-Serrano, Javier Alvarez-Valle, Joaqu´ın Galant-Herrero, and Antonio Pertusa. PadChest-GR: A bilingual chest X-ray datase...

  23. [23]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning ca- pability in LLMs via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025. 1

  24. [24]

    Automated structured radiology report generation

    Jean-Benoit Delbrouck, Justin Xu, Johannes Moll, Alois Thomas, Zhihong Chen, Sophie Ostmeier, Asfandyar Azhar, Kelvin Zhenghao Li, Andrew Johnston, Christian Bluethgen, Eduardo Pontes Reis, Mohamed S Muneer, Maya Varma, and Curtis Langlotz. Automated structured radiology report generation. InProceedings of the 63rd An- nual Meeting of the Association for ...

  25. [25]

    Kohli, Marc B

    Dina Demner-Fushman, Marc D. Kohli, Marc B. Rosen- man, Sonya E. Shooshan, Laritza Rodriguez, Sameer An- tani, George R. Thoma, and Clement J. McDonald. Prepar- ing a collection of radiology examinations for distribution and retrieval.Journal of the American Medical Informatics Association, 23(2):304–310, 2015. 1, 3, 4, 6, 7

  26. [26]

    Keyword-based ai assistance in the generation of radiology reports: A pilot study.npj Digital Medicine, 8 (1):490, 2025

    Fei Dong, Shouping Nie, Manling Chen, Fangfang Xu, and Qian Li. Keyword-based ai assistance in the generation of radiology reports: A pilot study.npj Digital Medicine, 8 (1):490, 2025. 1, 3

  27. [27]

    Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, et al

    Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, and Ji-Rong Wen. Tool-Star: Empowering LLM- brained multi-tool reasoner via reinforcement learning. arXiv preprint arXiv:2505.16410, 2025. 2

  28. [28]

    Elboardy, Ghada Khoriba, and Essam A

    Ahmed T. Elboardy, Ghada Khoriba, and Essam A. Rashed. Medical AI consensus: A multi-agent framework for ra- diology report generation and evaluation.arXiv preprint arXiv:2509.17353, 2025. 1, 3

  29. [29]

    ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025

    Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yu- jia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025. 2

  30. [30]

    Anna Fink, Alexander Rau, Marco Reisert, Fabian Bam- berg, and Maximilian F. Russe. Retrieval-augmented gen- eration with large language models in radiology: From the- ory to practice.Radiology: Artificial Intelligence, 7(4): e240790, 2025. 3

  31. [31]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Google. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next genera- tion agentic capabilities.arXiv preprint arXiv:2507.06261,

  32. [32]

    FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models

    Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, and Pranav Rajpurkar. FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 30787–30796, 2025. 3

  33. [33]

    MetaGPT: Meta programming for a multi- agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and J ¨urgen Schmidhuber. MetaGPT: Meta programming for a multi- agent collaborative framework. InThe Twelfth Interna- tional Conference on Learning Representations...

  34. [34]

    RADAR: Enhancing radiology report generation with supplementary knowledge injection

    Wenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li, Yan Hu, Wenjie Li, and Jiang Liu. RADAR: Enhancing radiology report generation with supplementary knowledge injection. InProceedings of the 63rd Annual Meeting of the Associ- ation for Computational Linguistics (Volume 1: Long Pa- pers), pages 26366–26381. Association for Computational Linguistics, 2025. 3

  35. [35]

    RRG-Mamba: Efficient radiology report gener- ation with state space model

    Xiaodi Hou, Xiaobo Li, Mingyu Lu, Simiao Wang, and Yi- jia Zhang. RRG-Mamba: Efficient radiology report gener- ation with state space model. InProceedings of the Thirty- Fourth International Joint Conference on Artificial Intel- ligence, IJCAI-25, pages 7410–7418. International Joint Conferences on Artificial Intelligence Organization, 2025

  36. [36]

    Knowledge- driven query network with adaptive cross-view attention for structured radiology report generation

    Xuege Hou, Yali Li, and Shengjin Wang. Knowledge- driven query network with adaptive cross-view attention for structured radiology report generation. InIEEE/CVF Inter- national Conference on Computer Vision Workshops, pages 1234–1243, 2025. 3, 4, 5, 6

  37. [37]

    OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation

    Mengkang Hu, Yuhang Zhou, Wendong Fan, Yuzhou Nie, Ziyu Ye, Bowei Xia, Tao Sun, Zhaoxuan Jin, Yingru Li, Zeyu Zhang, Yifeng Wang, Qianshuo Ye, Bernard Ghanem, Ping Luo, and Guohao Li. OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys...

  38. [38]

    Lungren, and Serena Yeung

    Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, and Serena Yeung. GLoRIA: A multimodal global-local rep- resentation learning framework for label-efficient medical image recognition. In2021 IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 3922–3931,

  39. [39]

    DAMPER: A dual-stage medical report generation framework with coarse-grained mesh alignment and fine-grained hypergraph matching

    Xiaofei Huang, Wenting Chen, Jie Liu, Qisheng Lu, Xi- aoling Luo, and Linlin Shen. DAMPER: A dual-stage medical report generation framework with coarse-grained mesh alignment and fine-grained hypergraph matching. AAAI Conference on Artificial Intelligence, 39(4):3769– 3778, 2025. 4, 5, 6

  40. [40]

    CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation

    Xiyang Huang, Yingjie Han, Yx L, Runzhi Li, Pengcheng Wu, and Kunli Zhang. CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation. In 10 Proceedings of the 31st International Conference on Com- putational Linguistics, pages 8546–8556. Association for Computational Linguistics, 2025. 3

  41. [41]

    Kiut: Knowledge-injected u-transformer for radiology re- port generation

    Zhongzhen Huang, Xiaofan Zhang, and Shaoting Zhang. Kiut: Knowledge-injected u-transformer for radiology re- port generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19809–19818, 2023. 4, 5, 6

  42. [42]

    Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C

    Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Shaury Sri- vastav, Anja Thieme, Noel Codella, Matthew P. Lun- gren, Maria Teodora Wetscherek, Ozan Oktay, and Javier Alvarez-Valle. MAIRA-1: A specialised large multimodal model for radiology report genera...

  43. [43]

    RadGraph: Extracting clinical entities and relations from radiology reports

    Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven Truong, Du Nguyen Duong Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar, and Pranav Rajpurkar. RadGraph: Extracting clinical entities and relations from radiology reports. InProceedings of the Neural Information Processing Systems Track on...

  44. [44]

    Dobre, and Merouane Debbah

    Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, and Merouane Debbah. From large AI models to agentic AI: A tutorial on future intelligent communications. arXiv preprint arXiv:2505.22311, 2025. 1, 2

  45. [45]

    Advanc- ing medical radiograph representation learning: A hybrid pre-training paradigm with multilevel semantic granularity

    Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, and Ruimao Zhang. Advanc- ing medical radiograph representation learning: A hybrid pre-training paradigm with multilevel semantic granularity. InEuropean Conference on Computer Vision Workshops, pages 16–33, 2025. 3

  46. [46]

    CoMT: Chain-of-medical-thought reduces hallucination in medical report generation

    Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, and Lihua Zhang. CoMT: Chain-of-medical-thought reduces hallucination in medical report generation. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2025. 3

  47. [47]

    Reason like a radiol- ogist: Chain-of-thought and reinforcement learning for ver- ifiable report generation.arXiv preprint arXiv:2504.18453,

    Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan, Zhifan Gao, Lei Zhu, Giorgos Papanasta- siou, Yingying Fang, and Guang Yang. Reason like a radiol- ogist: Chain-of-thought and reinforcement learning for ver- ifiable report generation.arXiv preprint arXiv:2504.18453,

  48. [48]

    Alistair E. W. Johnson, Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radio- graphs with free-text reports.Scientific Data, 6(1):317,

  49. [49]

    CT-GRAPH: Hierarchical graph attention network for anatomy-guided CT report gen- eration.arXiv preprint arXiv:2508.05375, 2025

    Hamza Kalisch, Fabian H ¨orst, Jens Kleesiek, Ken Her- rmann, and Constantin Seibold. CT-GRAPH: Hierarchical graph attention network for anatomy-guided CT report gen- eration.arXiv preprint arXiv:2508.05375, 2025. 3

  50. [50]

    MDA- gents: An adaptive collaboration of LLMs for medical decision-making

    Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. MDA- gents: An adaptive collaboration of LLMs for medical decision-making. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 1

  51. [51]

    Yunsoo Kim, Jinge Wu, Su Hwan Kim, Pardeep Vasudev, Jiashu Shen, and Honghan Wu. Look & mark: Leveraging radiologist eye fixations and bounding boxes in multimodal large language models for chest X-ray report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17680–17694. Association for Computational Linguistics, 2025. 3

  52. [52]

    Anis Koubaa. From pre-trained language models to agentic AI: Evolution and architectures for autonomous intelligence. Preprints, 2025. 1, 2

  53. [53]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626. Association for Computing Machinery, 2023. 5

  54. [54]

    Yuxiang Lai, Jike Zhong, Ming Li, Shitian Zhao, Yuheng Li, Konstantinos Psounis, and Xiaofeng Yang. Med-R1: Reinforcement learning for generalizable medical reasoning in vision-language models. arXiv preprint arXiv:2503.13939, 2025. 3

  55. [55]

    Kyeongkyu Lee, Seonghwan Yoon, and Hongki Lim. CLARIFID: Improving radiology report generation by reinforcing clinically accurate impressions and enforcing detailed findings. arXiv preprint arXiv:2507.17234, 2025. 1, 3, 4, 5

  56. [56]

    Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, and Soon Ho Yoon. CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images. European Radiology, 35(7):4374–4386, 2025. 3

  57. [57]

    Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. 1, 2

  58. [58]

    Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, and Xiaojun Chang. Contrastive learning with counterfactual explanations for radiology report generation. In European Conference on Computer Vision, pages 162–180, 2024. 4, 5, 6

  59. [59]

    Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges. Vicinagearth, 1(1):9, 2024. 1, 2

  60. [60]

    Xuefeng Li, Haoyang Zou, and Pengfei Liu. ToRL: Scaling tool-integrated RL. arXiv preprint arXiv:2503.23383, 2025.

  61. [61]

    Yilin Li, Chao Kong, Guosheng Zhao, and Zijian Zhao. Automatic radiology report generation with deep learning: a comprehensive review of methods and advances. Artificial Intelligence Review, 58(11):344, 2025. 2

  62. [62]

    Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, and Luping Zhou. S-RRG-Bench: Structured radiology report generation with fine-grained evaluation framework. Meta-Radiology, page 100171,

  63. [63]

    Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, and Pan Lu. In-the-flow agentic system optimization for effective planning and tool use. arXiv preprint arXiv:2510.05592, 2025.

  64. [64]

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17889–17904. Association for Computational Linguistics, 2024. 2

  65. [65]

    Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81. Association for Computational Linguistics, 2004.

  66. [66]

    Qika Lin, Yifan Zhu, Bin Pu, Ling Huang, Haoran Luo, Jingying Ma, Zhen Peng, Tianzhe Zhao, Fangzhi Xu, Jian Zhang, Kai He, Zhonghong Ou, Swapnil Mishra, and Mengling Feng. A foundation model for chest X-ray interpretation with grounded reasoning via online reinforcement learning. arXiv preprint arXiv:2509.03906, 2025. 1, 3, 4, 5

  67. [67]

    Chang Liu, Yuanhe Tian, Weidong Chen, Yan Song, and Yongdong Zhang. Bootstrapping large language models for radiology report generation. AAAI Conference on Artificial Intelligence, 38(17):18635–18643, 2024. 4, 5, 6

  68. [68]

    Fenglin Liu, Shen Ge, and Xian Wu. Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3001–3012. Association for Computational Linguistics, 2021.

  69. [69]

    Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, and Yuexian Zou. Exploring and distilling posterior and prior knowledge for radiology report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13748–13757, 2021. 5, 6

  70. [70]

    Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, and Marzyeh Ghassemi. Clinically accurate chest X-ray report generation. In Proceedings of the 4th Machine Learning for Healthcare Conference, pages 249–269. PMLR, 2019. 5, 6

  71. [71]

    Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, and Qiguang Miao. Structural entities extraction and patient indications incorporation for chest X-ray report generation. In Proceedings of Medical Image Computing and Computer Assisted Intervention. Springer Nature Switzerland, 2024. 4, 5

  72. [72]

    Kang Liu, Zhuoqi Ma, Xiaolu Kang, Yunan Li, Kun Xie, Zhicheng Jiao, and Qiguang Miao. Enhanced contrastive learning with multi-view longitudinal data for chest X-ray report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10348–10359, 2025. 1, 3, 5

  73. [73]

    Rui Liu, Mingjie Li, Shen Zhao, Ling Chen, Xiaojun Chang, and Lina Yao. In-context learning for zero-shot medical report generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 8721–8730. Association for Computing Machinery, 2024. 4, 5, 6

  74. [74]

    Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, and Baocai Yin. HC-LLM: Historical-constrained large language models for radiology report generation. In AAAI Conference on Artificial Intelligence, pages 5595–5603, 2025. 3

  75. [75]

    Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, and Guangyu Wang. A generalist medical language model for disease diagnosis assistance. Nature Medicine, 31(3):932–942, 2025.

  76. [76]

    Zhizhe Liu, Zhenfeng Zhu, Shuai Zheng, Yawei Zhao, Kunlun He, and Yao Zhao. From observation to concept: A flexible multi-view paradigm for medical report generation. IEEE Transactions on Multimedia, 26:5987–5995, 2024. 4, 5, 6

  77. [77]

    Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding R1-Zero-like training: A critical perspective. In Second Conference on Language Modeling, 2025. 2

  78. [78]

    Zihe Liu, Jiashun Liu, Yancheng He, Weixun Wang, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Johan Obando-Ceron, Siran Yang, Jiamang Wang, Wenbo Su, and Bo Zheng. Part I: Tricks or traps? A deep dive into RL for LLM reasoning. arXiv preprint arXiv:2508.08221, 2025. 2

  79. [79]

    Jinhui Lou, Yan Yang, Zhou Yu, Zhenqi Fu, Weidong Han, Qingming Huang, and Jun Yu. CXRAgent: Director-orchestrated multi-stage reasoning for chest X-ray interpretation. arXiv preprint arXiv:2510.21324, 2025. 1, 3

  80. [80]

    Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, and Xiang Li. Eye-gaze guided multi-modal alignment for medical representation learning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 3

Showing first 80 references.