Pith · machine review for the scientific record

arXiv: 2603.16876 · v2 · submitted 2026-02-17 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:48 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords radiology report generation · multi-agent reinforcement learning · multi-modal learning · chest X-ray · clinical efficacy · MIMIC-CXR · laterality consistency

The pith

Multi-agent reinforcement learning optimizes region-specific and global agents to generate radiology reports with better clinical accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MARL-Rad, a framework that applies multi-modal multi-agent reinforcement learning directly to radiology report generation. It decomposes chest X-ray interpretation into region-specific agents plus one global integrating agent, then jointly optimizes the full system on-policy using clinically verifiable rewards. This replaces post-hoc assembly of fixed language models with role-specific training inside the deployed workflow. Experiments on MIMIC-CXR and IU X-ray show gains on RadGraph, CheXbert, and GREEN scores plus improved laterality consistency and detail, with blinded clinicians rating the outputs comparable to ground-truth reports.

Core claim

MARL-Rad trains the entire agentic system on-policy within the radiology workflow. Chest X-ray interpretation is decomposed into region-specific agents and a global integrating agent whose outputs are jointly optimized by reinforcement learning driven by clinically verifiable reward signals. On the MIMIC-CXR and IU X-ray datasets the method reaches state-of-the-art clinical efficacy on RadGraph, CheXbert, and GREEN metrics, raises laterality consistency, produces more accurate and detailed reports, and yields outputs that a blinded clinician evaluation finds clinically comparable to ground-truth reports.

What carries the argument

Decomposition into region-specific multi-modal agents coordinated by a global integrating agent, jointly optimized on-policy via reinforcement learning with clinically verifiable rewards.
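The decomposition can be illustrated with a minimal sketch, assuming region-specific agents, one integrating agent, and a token-overlap stand-in for the clinically verifiable reward; every name and function here is an illustrative assumption, not the paper's implementation.

```python
# Toy sketch of the region-plus-global decomposition: each region agent
# reports on its assigned anatomy, a global agent integrates, and a
# verifiable reward scores the result. The region list, agent functions,
# and token-overlap reward are illustrative stand-ins for MARL-Rad's
# actual design.

REGIONS = ["lungs", "heart", "pleura"]  # assumed region split

def region_agent(region: str, image_features: dict) -> str:
    """Each region-specific agent emits a finding for its region."""
    return f"{region}: {image_features.get(region, 'no acute finding')}"

def global_agent(regional_findings: list) -> str:
    """The global agent integrates regional outputs into one report."""
    return "FINDINGS: " + " ".join(regional_findings)

def verifiable_reward(report: str, reference: str) -> float:
    """Stand-in for a clinically verifiable reward (token overlap here;
    the paper uses clinically grounded signals instead)."""
    pred, ref = set(report.lower().split()), set(reference.lower().split())
    return len(pred & ref) / max(len(ref), 1)

features = {"lungs": "right lower lobe opacity", "heart": "normal size"}
report = global_agent([region_agent(r, features) for r in REGIONS])
reward = verifiable_reward(report, "right lower lobe opacity normal size")
```

On-policy training would then push this scalar reward back through the policies of all agents jointly, rather than freezing each model after supervised pre-training.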

If this is right

  • Achieves state-of-the-art clinical efficacy scores on RadGraph, CheXbert, and GREEN for MIMIC-CXR and IU X-ray.
  • Improves laterality consistency in generated reports.
  • Produces more accurate and detailed radiology reports.
  • Yields outputs judged clinically comparable to ground-truth reports in blinded clinician evaluation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same region-plus-global decomposition may improve agentic systems for other medical imaging modalities where local detail and global coherence must be balanced.
  • Joint policy optimization could reduce the inconsistencies often seen when fixed language models are assembled into medical report pipelines after training.
  • Scaling the number or granularity of region agents offers a testable route to finer report quality on complex or multi-finding cases.

Load-bearing premise

Clinically verifiable rewards can be defined to accurately guide joint optimization of the multi-agent system without introducing biases or failing to capture key aspects of report quality.

What would settle it

A large-scale blinded study in which expert radiologists rate MARL-Rad reports no better than non-optimized agent baselines on diagnostic utility and error rate would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2603.16876 by Kaito Baba, Risa Kishikawa, Satoshi Kodera.

Figure 1. Comparison with previous state-of-the-art (SOTA) meth… (figures/full_fig_p001_1.png)
Figure 2. Overview of the proposed multi-agent RL framework. Region-specific agents and global integrating agent collaboratively… (figures/full_fig_p004_2.png)
Figure 3. Example output from MARL-Rad. Region-specific agents consistently focus on their assigned regions and generate regional… (figures/full_fig_p008_3.png)
read the original abstract

We propose MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation that trains the entire agentic system on policy within its deployed radiology workflow. MARL-Rad addresses the limitation of post-hoc agentization, where fixed LLMs are organized into hand-designed agentic workflows without being optimized for their assigned roles. Our framework decomposes chest X-ray interpretation into region-specific agents and a global integrating agent, and jointly optimizes them using clinically verifiable rewards. Experiments on the MIMIC-CXR and IU X-ray datasets show that MARL-Rad consistently improves clinical efficacy metrics such as RadGraph, CheXbert, and GREEN scores, achieving state-of-the-art clinical efficacy performance. Further analyses show that MARL-Rad improves laterality consistency and produces more accurate and detailed reports. A blinded clinician evaluation further suggests that MARL-Rad produces reports clinically comparable to ground-truth reports.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MARL-Rad, a multi-modal multi-agent reinforcement learning framework for radiology report generation from chest X-rays. It decomposes interpretation into region-specific agents plus a global integrating agent and jointly optimizes the system end-to-end using clinically verifiable rewards. Experiments on MIMIC-CXR and IU X-ray report state-of-the-art results on RadGraph, CheXbert, and GREEN scores, plus gains in laterality consistency, report detail, and blinded clinician equivalence to ground-truth reports.

Significance. If the experimental claims are substantiated, the work would be significant as one of the first demonstrations of end-to-end multi-agent RL optimization for medical report generation, moving beyond post-hoc LLM agent workflows. The use of region-specific agents, clinically grounded rewards, and clinician evaluation are positive elements that could influence future agentic systems in radiology.

major comments (3)
  1. [Abstract / Methods] The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.
  2. [Experiments] The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.
  3. [Experiments] Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.
minor comments (2)
  1. Clarify the precise multi-modal fusion mechanism between image features and text tokens inside each agent.
  2. Add dataset split statistics, preprocessing steps, and hyper-parameter tables to support reproducibility.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Revisions will be incorporated into the next version of the manuscript to improve clarity, statistical rigor, and discussion of potential limitations.

read point-by-point responses
  1. Referee: [Abstract / Methods] The central claim that clinically verifiable rewards enable joint optimization of region-specific and global agents rests on unspecified reward definitions, weighting, shaping, and handling of sparse signals. Without these details it is impossible to assess whether reported gains on RadGraph/CheXbert/GREEN reflect genuine clinical improvement or metric-specific optimization.

    Authors: We agree that additional detail on the reward formulation is necessary. In the revised Methods section we will explicitly define each component of the clinically verifiable rewards (entity-level matching from RadGraph, label accuracy from CheXbert, and GREEN score contributions), specify the weighting coefficients used to combine them, describe the reward-shaping functions applied to address sparsity, and explain how the composite reward is back-propagated through the multi-agent policy gradient updates. These additions will allow readers to evaluate whether the reported gains reflect genuine clinical improvement. revision: yes

  2. Referee: [Experiments] The manuscript does not report statistical significance tests, confidence intervals, or ablation studies isolating the contribution of the multi-agent RL component versus single-agent or supervised baselines, undermining the SOTA and laterality-consistency claims.

    Authors: We accept this criticism. The revised Experiments section will include paired statistical significance tests (with p-values), 95% confidence intervals for all metrics on both MIMIC-CXR and IU X-ray, and a set of ablation studies that isolate the multi-agent RL component against single-agent RL and supervised-learning baselines. These results will be presented in new tables and will directly support the SOTA and laterality-consistency claims. revision: yes

  3. Referee: [Experiments] Potential circularity between reward signals and evaluation metrics (both drawn from entity-extraction and label-accuracy tools) is not addressed; explicit discussion or an independent held-out clinical metric is required to rule out reward hacking.

    Authors: We acknowledge the need for explicit discussion of this issue. The revised manuscript will add a dedicated paragraph in the Experiments section that analyzes the overlap between reward signals and evaluation metrics and explains why the clinical grounding of the rewards reduces the risk of pure metric hacking. We will also expand the existing blinded clinician evaluation (already performed on a held-out set) to serve as an independent validation metric not used during reward computation. revision: partial
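The reward decomposition promised in response 1 could, for instance, take the form of a convex combination of metric-specific terms. This is a minimal sketch with assumed weights that the paper would need to specify; the component scores are illustrative inputs.

```python
# Weighted composite of clinically grounded reward terms, as the
# rebuttal outlines. The weights and the three component scores are
# illustrative assumptions, not MARL-Rad's actual reward.

def composite_reward(radgraph_f1, chexbert_acc, green_score,
                     weights=(0.4, 0.3, 0.3)):
    """Convex combination of metric-specific reward terms."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    w_rg, w_cx, w_gr = weights
    return w_rg * radgraph_f1 + w_cx * chexbert_acc + w_gr * green_score

r = composite_reward(radgraph_f1=0.55, chexbert_acc=0.80, green_score=0.60)
# 0.4*0.55 + 0.3*0.80 + 0.3*0.60 = 0.64
```

The referee's concern is exactly about such choices: the weights, any shaping applied before combination, and how sparse components are handled all change what policy the agents converge to.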

Circularity Check

0 steps flagged

No significant circularity; claims rest on external empirical validation

full rationale

The paper's derivation chain consists of a multi-agent RL framework whose policy is optimized via clinically verifiable rewards and then evaluated on standard public datasets (MIMIC-CXR, IU X-ray) using independent automated metrics (RadGraph, CheXbert, GREEN) plus blinded clinician review. No equation or step reduces a claimed prediction to a fitted input by construction, nor does any load-bearing premise collapse to a self-citation whose content is itself unverified. The reward design is presented as an external modeling choice rather than a tautological re-expression of the evaluation scores.
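A minimal version of such a circularity audit, assuming per-report training rewards and an independent held-out evaluation score are available (all values below are illustrative), is a rank correlation between the two: a near-perfect correlation would suggest the evaluation largely re-measures the reward.

```python
# Minimal circularity probe: rank-correlate per-report training rewards
# with a held-out evaluation score. The data are placeholders.

def spearman(xs, ys):
    """Spearman rank correlation (no tie correction; fine for a sketch)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        out = [0.0] * len(vs)
        for rank, i in enumerate(order):
            out[i] = float(rank)
        return out
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

train_rewards = [0.50, 0.62, 0.58, 0.71, 0.66]  # per-report reward
heldout_green = [0.48, 0.60, 0.63, 0.70, 0.64]  # independent eval score
rho = spearman(train_rewards, heldout_green)
```

A moderate correlation is expected (both track report quality); the flag would be a correlation so high that the "independent" metric adds no information beyond the optimized reward.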

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that clinically meaningful rewards can be constructed to optimize report generation. No free parameters or invented entities are explicitly described in the abstract.

axioms (1)
  • domain assumption: Clinically verifiable rewards can be defined to measure and optimize report quality for the multi-agent system.
    This underpins the joint training of agents described in the abstract.

pith-pipeline@v0.9.0 · 5450 in / 1258 out tokens · 29644 ms · 2026-05-15T21:48:48.315451+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

149 extracted references · 149 canonical work pages · 14 internal anchors

  1. [1]

    ARDGen: Augmentation regularization for domain- generalized medical report generation

    Syed Bilal Ahsan, Muhammad Ikhalas, Muhammad Muza- mil Khan, Sana Ullah, and Muhammad Zaigham Za- heer. ARDGen: Augmentation regularization for domain- generalized medical report generation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition Work- shops, pages 6526–6535, 2025. 1, 3, 4

  2. [2]

    A review on detection of pneumonia in chest X- ray images using neural networks.Journal of Biomedical Physics and Engineering, 12(6):551–558, 2022

    Daniel Joseph Alapat, Malavika Venu Menon, and Sharmila Ashok. A review on detection of pneumonia in chest X- ray images using neural networks.Journal of Biomedical Physics and Engineering, 12(6):551–558, 2022. 1

  3. [3]

    Multi-resolution pathology-language pre-training model with text-guided visual representation

    Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Gana- pathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, and Arif Mahmood. Multi-resolution pathology-language pre-training model with text-guided visual representation. InIEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 25907–25919, 2025. 3

  4. [4]

    JRadiEvo: A japanese radiology report generation model enhanced by evolu- tionary optimization of model merging.arXiv preprint arXiv:2411.09933, 2024

    Kaito Baba, Ryota Yagi, Junichiro Takahashi, Risa Kishikawa, and Satoshi Kodera. JRadiEvo: A japanese radiology report generation model enhanced by evolu- tionary optimization of model merging.arXiv preprint arXiv:2411.09933, 2024. 1, 4

  5. [5]

    Prover Agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923,

    Kaito Baba, Chaoran Liu, Shuhei Kurita, and Akiyoshi San- nai. Prover Agent: An agent-based framework for formal mathematical proofs.arXiv preprint arXiv:2506.19923,

  6. [6]

    METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments

    Satanjeev Banerjee and Alon Lavie. METEOR: An auto- matic metric for MT evaluation with improved correlation with human judgments. InProceedings of the ACL Work- shop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72. Association for Computational Linguistics, 2005. 4

  7. [7]

    Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, An- ton Schwaighofer, Anja Thieme, Sam Bond-Taylor, Max- imilian Ilse, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Harshita Sharma, Felix Meissen, Mercy Ranjit, Shaury Sri- vastav, Julia Gong, Noel C. F. Codella, Fabian Falck, Ozan Oktay, Matthew P. Lungren, Maria Teodora Wetscherek, Javier Alvarez-Valle...

  8. [8]

    Cross-counter-repeat attention for enhanced understanding of visual semantics in radiology report generation

    Xiaolei Bo, Feiyang Yang, Feilong Xu, and Xiaoli Zhang. Cross-counter-repeat attention for enhanced understanding of visual semantics in radiology report generation. InPro- ceedings of the 33rd ACM International Conference on Multimedia, pages 4242–4250. Association for Computing Machinery, 2025. 3

  9. [9]

    Baselines for chest X-ray report generation

    William Boag, Tzu-Ming Harry Hsu, Matthew Mcdermott, Gabriela Berner, Emily Alesentzer, and Peter Szolovits. Baselines for chest X-ray report generation. InProceed- ings of the Machine Learning for Health NeurIPS Work- shop, pages 126–140. PMLR, 2020. 5, 6

  10. [10]

    G. W. L. Boland, A. S. Guimaraes, and P. R. Mueller. Ra- diology report turnaround: expectations and solutions.Eu- ropean Radiology, 18(7):1326–1328, 2008. 1

  11. [11]

    Imaging the chest: The chest radiograph

    Joshua Broder. Imaging the chest: The chest radiograph. InDiagnostic Imaging for the Emergency Physician, pages 185–296. Elsevier, 2011. 1

  12. [12]

    A review on lung boundary detection in chest X-rays.International Journal of Computer Assisted Radiology and Surgery, 14(4):563– 576, 2019

    Sema Candemir and Sameer Antani. A review on lung boundary detection in chest X-rays.International Journal of Computer Assisted Radiology and Surgery, 14(4):563– 576, 2019. 1

  13. [13]

    Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities

    Boyuan Chen, Zhuo Xu, Sean Kirmani, Brain Ichter, Dorsa Sadigh, Leonidas Guibas, and Fei Xia. Spatialvlm: Endow- ing vision-language models with spatial reasoning capabil- ities. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14455–14465, 2024. 7

  14. [14]

    Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen

    Mingyang Chen, Linzhuang Sun, Tianpeng Li, sunhaoze, ZhouYijie, Chenzheng Zhu, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, and Weipeng Chen. ReSearch: Learning to reason with search for LLMs via reinforcement learning. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys- tems, 2025. 2

  15. [15]

    Generating radiology reports via memory- driven transformer

    Zhihong Chen, Yan Song, Tsung-Hui Chang, and Xi- ang Wan. Generating radiology reports via memory- driven transformer. InProceedings of the 2020 Confer- ence on Empirical Methods in Natural Language Process- ing (EMNLP), pages 1439–1449. Association for Compu- tational Linguistics, 2020. 2, 4, 5, 6

  16. [16]

    Cross-modal memory networks for radiology report gener- ation

    Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. Cross-modal memory networks for radiology report gener- ation. InProceedings of the 59th Annual Meeting of the As- sociation for Computational Linguistics and the 11th Inter- national Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5904–5914. Association for Computational L...

  17. [17]

    CheXa- gent: Towards a foundation model for chest X-ray interpre- tation

    Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Mag- dalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Co- hen, Eduardo Pontes Reis, Emily Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Ga- tidis, Akshay S Chaudhari, and Curtis Langlotz. CheXa- gent: Towards a foundation model ...

  18. [18]

    OraPO: Oracle-educated rein- forcement learning for data-efficient and factual radiology report generation.arXiv preprint arXiv:2509.18600, 2025

    Zhuoxiao Chen, Hongyang Yu, Ying Xu, Yadan Luo, Long Duong, and Yuan-Fang Li. OraPO: Oracle-educated rein- forcement learning for data-efficient and factual radiology report generation.arXiv preprint arXiv:2509.18600, 2025. 1, 3

  19. [19]

    SpatialRGPT: Grounded spatial reasoning in vision- language models

    An-Chieh Cheng, Hongxu Yin, Yang Fu, Qiushan Guo, Ruihan Yang, Jan Kautz, Xiaolong Wang, and Sifei Liu. SpatialRGPT: Grounded spatial reasoning in vision- language models. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 7

  20. [20]

    Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Hols...

  21. [21]

    Cowan, Sharyn L

    Ian A. Cowan, Sharyn L. S. MacDonald, and Richard A. Floyd. Measuring and managing radiologist workload: measuring radiologist reporting times using data from a ra- diology information system.Journal of Medical Imaging and Radiation Oncology, 57(5):558–566, 2013. 1

  22. [22]

    Daniel Coelho de Castro, Aurelia Bustos, Shruthi Ban- nur, Stephanie L. Hyland, Kenza Bouzid, Maria Teodora Wetscherek, Maria Dolores S ´anchez-Valverde, Lara Jaques-P´erez, Lourdes P ´erez-Rodr´ıguez, Kenji Takeda, Jos´e Mar´ıa Salinas-Serrano, Javier Alvarez-Valle, Joaqu´ın Galant-Herrero, and Antonio Pertusa. PadChest-GR: A bilingual chest X-ray datase...

  23. [23]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    DeepSeek-AI. DeepSeek-R1: Incentivizing reasoning ca- pability in LLMs via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025. 1

  24. [24]

    Automated structured radiology report generation

    Jean-Benoit Delbrouck, Justin Xu, Johannes Moll, Alois Thomas, Zhihong Chen, Sophie Ostmeier, Asfandyar Azhar, Kelvin Zhenghao Li, Andrew Johnston, Christian Bluethgen, Eduardo Pontes Reis, Mohamed S Muneer, Maya Varma, and Curtis Langlotz. Automated structured radiology report generation. InProceedings of the 63rd An- nual Meeting of the Association for ...

  25. [25]

    Kohli, Marc B

    Dina Demner-Fushman, Marc D. Kohli, Marc B. Rosen- man, Sonya E. Shooshan, Laritza Rodriguez, Sameer An- tani, George R. Thoma, and Clement J. McDonald. Prepar- ing a collection of radiology examinations for distribution and retrieval.Journal of the American Medical Informatics Association, 23(2):304–310, 2015. 1, 3, 4, 6, 7

  26. [26]

    Keyword-based ai assistance in the generation of radiology reports: A pilot study.npj Digital Medicine, 8 (1):490, 2025

    Fei Dong, Shouping Nie, Manling Chen, Fangfang Xu, and Qian Li. Keyword-based ai assistance in the generation of radiology reports: A pilot study.npj Digital Medicine, 8 (1):490, 2025. 1, 3

  27. [27]

    Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, et al

    Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, and Ji-Rong Wen. Tool-Star: Empowering LLM- brained multi-tool reasoner via reinforcement learning. arXiv preprint arXiv:2505.16410, 2025. 2

  28. [28]

    Elboardy, Ghada Khoriba, and Essam A

    Ahmed T. Elboardy, Ghada Khoriba, and Essam A. Rashed. Medical AI consensus: A multi-agent framework for ra- diology report generation and evaluation.arXiv preprint arXiv:2509.17353, 2025. 1, 3

  29. [29]

    ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025

    Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yu- jia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, and Wanjun Zhong. ReTool: Reinforcement learning for strate- gic tool use in LLMs, 2025. 2

  30. [30]

    Anna Fink, Alexander Rau, Marco Reisert, Fabian Bam- berg, and Maximilian F. Russe. Retrieval-augmented gen- eration with large language models in radiology: From the- ory to practice.Radiology: Artificial Intelligence, 7(4): e240790, 2025. 3

  31. [31]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Google. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next genera- tion agentic capabilities.arXiv preprint arXiv:2507.06261,

  32. [32]

    FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models

    Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, and Pranav Rajpurkar. FactCheXcker: Mitigating measurement hallucinations in chest X-ray report genera- tion models. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 30787–30796, 2025. 3

  33. [33]

    MetaGPT: Meta programming for a multi- agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and J ¨urgen Schmidhuber. MetaGPT: Meta programming for a multi- agent collaborative framework. InThe Twelfth Interna- tional Conference on Learning Representations...

  34. [34]

    RADAR: Enhancing radiology report generation with supplementary knowledge injection

    Wenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li, Yan Hu, Wenjie Li, and Jiang Liu. RADAR: Enhancing radiology report generation with supplementary knowledge injection. InProceedings of the 63rd Annual Meeting of the Associ- ation for Computational Linguistics (Volume 1: Long Pa- pers), pages 26366–26381. Association for Computational Linguistics, 2025. 3

  35. [35]

    RRG-Mamba: Efficient radiology report gener- ation with state space model

    Xiaodi Hou, Xiaobo Li, Mingyu Lu, Simiao Wang, and Yi- jia Zhang. RRG-Mamba: Efficient radiology report gener- ation with state space model. InProceedings of the Thirty- Fourth International Joint Conference on Artificial Intel- ligence, IJCAI-25, pages 7410–7418. International Joint Conferences on Artificial Intelligence Organization, 2025

  36. [36]

    Knowledge- driven query network with adaptive cross-view attention for structured radiology report generation

    Xuege Hou, Yali Li, and Shengjin Wang. Knowledge- driven query network with adaptive cross-view attention for structured radiology report generation. InIEEE/CVF Inter- national Conference on Computer Vision Workshops, pages 1234–1243, 2025. 3, 4, 5, 6

  37. [37]

    OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation

    Mengkang Hu, Yuhang Zhou, Wendong Fan, Yuzhou Nie, Ziyu Ye, Bowei Xia, Tao Sun, Zhaoxuan Jin, Yingru Li, Zeyu Zhang, Yifeng Wang, Qianshuo Ye, Bernard Ghanem, Ping Luo, and Guohao Li. OWL: Optimized workforce learning for general multi-agent assistance in real-world task automation. InThe Thirty-ninth Annual Conference on Neural Information Processing Sys...

  38. [38]

    Lungren, and Serena Yeung

    Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, and Serena Yeung. GLoRIA: A multimodal global-local rep- resentation learning framework for label-efficient medical image recognition. In2021 IEEE/CVF International Con- ference on Computer Vision (ICCV), pages 3922–3931,

  39. [39]

    DAMPER: A dual-stage medical report generation framework with coarse-grained mesh alignment and fine-grained hypergraph matching

    Xiaofei Huang, Wenting Chen, Jie Liu, Qisheng Lu, Xi- aoling Luo, and Linlin Shen. DAMPER: A dual-stage medical report generation framework with coarse-grained mesh alignment and fine-grained hypergraph matching. AAAI Conference on Artificial Intelligence, 39(4):3769– 3778, 2025. 4, 5, 6

  40. [40]

    CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation

    Xiyang Huang, Yingjie Han, Yx L, Runzhi Li, Pengcheng Wu, and Kunli Zhang. CmEAA: Cross-modal enhancement and alignment adapter for radiology report generation. In 10 Proceedings of the 31st International Conference on Com- putational Linguistics, pages 8546–8556. Association for Computational Linguistics, 2025. 3

  41. [41]

    Kiut: Knowledge-injected u-transformer for radiology re- port generation

    Zhongzhen Huang, Xiaofan Zhang, and Shaoting Zhang. Kiut: Knowledge-injected u-transformer for radiology re- port generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19809–19818, 2023. 4, 5, 6

  42. [42]

    Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C

    Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando P ´erez-Garc´ıa, Valentina Salvatelli, Shaury Sri- vastav, Anja Thieme, Noel Codella, Matthew P. Lun- gren, Maria Teodora Wetscherek, Ozan Oktay, and Javier Alvarez-Valle. MAIRA-1: A specialised large multimodal model for radiology report genera...

  43. [43]

    RadGraph: Extracting clinical entities and relations from radiology reports

    Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven Truong, Du Nguyen Duong Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew Lungren, Andrew Ng, Curtis Langlotz, Pranav Rajpurkar, and Pranav Rajpurkar. RadGraph: Extracting clinical entities and relations from radiology reports. InProceedings of the Neural Information Processing Systems Track on...

  44. [44]

    Dobre, and Merouane Debbah

    Feibo Jiang, Cunhua Pan, Li Dong, Kezhi Wang, Octavia A. Dobre, and Merouane Debbah. From large AI models to agentic AI: A tutorial on future intelligent communications. arXiv preprint arXiv:2505.22311, 2025. 1, 2

  45. [45]

    Advanc- ing medical radiograph representation learning: A hybrid pre-training paradigm with multilevel semantic granularity

    Hanqi Jiang, Xixuan Hao, Yuzhou Huang, Chong Ma, Jiaxun Zhang, Yi Pan, and Ruimao Zhang. Advanc- ing medical radiograph representation learning: A hybrid pre-training paradigm with multilevel semantic granularity. InEuropean Conference on Computer Vision Workshops, pages 16–33, 2025. 3

  46. [46]

    CoMT: Chain-of-medical-thought reduces hallucination in medical report generation

    Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, and Lihua Zhang. CoMT: Chain-of-medical-thought reduces hallucination in medical report generation. InIEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2025. 3

  47. [47]

    Reason like a radiol- ogist: Chain-of-thought and reinforcement learning for ver- ifiable report generation.arXiv preprint arXiv:2504.18453,

    Peiyuan Jing, Kinhei Lee, Zhenxuan Zhang, Huichi Zhou, Zhengqing Yuan, Zhifan Gao, Lei Zhu, Giorgos Papanasta- siou, Yingying Fang, and Guang Yang. Reason like a radiol- ogist: Chain-of-thought and reinforcement learning for ver- ifiable report generation.arXiv preprint arXiv:2504.18453,

  48. [48]

    Alistair E. W. Johnson, Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. MIMIC-CXR, a de-identified publicly available database of chest radio- graphs with free-text reports.Scientific Data, 6(1):317,

  49. [49]

    CT-GRAPH: Hierarchical graph attention network for anatomy-guided CT report gen- eration.arXiv preprint arXiv:2508.05375, 2025

    Hamza Kalisch, Fabian H ¨orst, Jens Kleesiek, Ken Her- rmann, and Constantin Seibold. CT-GRAPH: Hierarchical graph attention network for anatomy-guided CT report gen- eration.arXiv preprint arXiv:2508.05375, 2025. 3

  50. [50]

    MDA- gents: An adaptive collaboration of LLMs for medical decision-making

    Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. MDA- gents: An adaptive collaboration of LLMs for medical decision-making. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 1

  51. [51]

    Yunsoo Kim, Jinge Wu, Su Hwan Kim, Pardeep Vasudev, Jiashu Shen, and Honghan Wu. Look & mark: Leveraging radiologist eye fixations and bounding boxes in multimodal large language models for chest X-ray report generation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17680–17694. Association for Computational Linguistics, 2025. 3

  52. [52]

    Anis Koubaa. From pre-trained language models to agentic AI: Evolution and architectures for autonomous intelligence. Preprints, 2025. 1, 2

  53. [53]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles, pages 611–626. Association for Computing Machinery, 2023. 5

  54. [54]

    Yuxiang Lai, Jike Zhong, Ming Li, Shitian Zhao, Yuheng Li, Konstantinos Psounis, and Xiaofeng Yang. Med-R1: Reinforcement learning for generalizable medical reasoning in vision-language models. arXiv preprint arXiv:2503.13939, 2025. 3

  55. [55]

    Kyeongkyu Lee, Seonghwan Yoon, and Hongki Lim. CLARIFID: Improving radiology report generation by reinforcing clinically accurate impressions and enforcing detailed findings. arXiv preprint arXiv:2507.17234, 2025. 1, 3, 4, 5

  56. [56]

    Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, and Soon Ho Yoon. CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images. European Radiology, 35(7):4374–4386, 2025. 3

  57. [57]

    Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. 1, 2

  58. [58]

    Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, and Xiaojun Chang. Contrastive learning with counterfactual explanations for radiology report generation. In European Conference on Computer Vision, pages 162–180, 2024. 4, 5, 6

  59. [59]

    Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, and Yi Yang. A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges. Vicinagearth, 1(1):9, 2024. 1, 2

  60. [60]

    Xuefeng Li, Haoyang Zou, and Pengfei Liu. ToRL: Scaling tool-integrated RL. arXiv preprint arXiv:2503.23383, 2025.

  61. [61]

    Yilin Li, Chao Kong, Guosheng Zhao, and Zijian Zhao. Automatic radiology report generation with deep learning: a comprehensive review of methods and advances. Artificial Intelligence Review, 58(11):344, 2025. 2

  62. [62]

    Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, and Luping Zhou. S-RRG-Bench: Structured radiology report generation with fine-grained evaluation framework. Meta-Radiology, page 100171,

  63. [63]

    Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, and Pan Lu. In-the-flow agentic system optimization for effective planning and tool use. arXiv preprint arXiv:2510.05592, 2025.

  64. [64]

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17889–17904. Association for Computational Linguistics, 2024. 2

  65. [65]

    Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81. Association for Computational Linguistics, 2004.

  66. [66]

    Qika Lin, Yifan Zhu, Bin Pu, Ling Huang, Haoran Luo, Jingying Ma, Zhen Peng, Tianzhe Zhao, Fangzhi Xu, Jian Zhang, Kai He, Zhonghong Ou, Swapnil Mishra, and Mengling Feng. A foundation model for chest X-ray interpretation with grounded reasoning via online reinforcement learning. arXiv preprint arXiv:2509.03906, 2025. 1, 3, 4, 5

  67. [67]

    Chang Liu, Yuanhe Tian, Weidong Chen, Yan Song, and Yongdong Zhang. Bootstrapping large language models for radiology report generation. AAAI Conference on Artificial Intelligence, 38(17):18635–18643, 2024. 4, 5, 6

  68. [68]

    Fenglin Liu, Shen Ge, and Xian Wu. Competence-based multimodal curriculum learning for medical report generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3001–3012. Association for Computational Linguistics, 2021.

  69. [69]

    Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, and Yuexian Zou. Exploring and distilling posterior and prior knowledge for radiology report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13748–13757, 2021. 5, 6

  70. [70]

    Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, and Marzyeh Ghassemi. Clinically accurate chest X-ray report generation. In Proceedings of the 4th Machine Learning for Healthcare Conference, pages 249–269. PMLR, 2019. 5, 6

  71. [71]

    Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, and Qiguang Miao. Structural entities extraction and patient indications incorporation for chest X-ray report generation. In Proceedings of Medical Image Computing and Computer Assisted Intervention. Springer Nature Switzerland, 2024. 4, 5

  72. [72]

    Kang Liu, Zhuoqi Ma, Xiaolu Kang, Yunan Li, Kun Xie, Zhicheng Jiao, and Qiguang Miao. Enhanced contrastive learning with multi-view longitudinal data for chest X-ray report generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10348–10359, 2025. 1, 3, 5

  73. [73]

    Rui Liu, Mingjie Li, Shen Zhao, Ling Chen, Xiaojun Chang, and Lina Yao. In-context learning for zero-shot medical report generation. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 8721–8730. Association for Computing Machinery, 2024. 4, 5, 6

  74. [74]

    Tengfei Liu, Jiapu Wang, Yongli Hu, Mingjie Li, Junfei Yi, Xiaojun Chang, Junbin Gao, and Baocai Yin. HC-LLM: Historical-constrained large language models for radiology report generation. In AAAI Conference on Artificial Intelligence, pages 5595–5603, 2025. 3

  75. [75]

    Xiaohong Liu, Hao Liu, Guoxing Yang, Zeyu Jiang, Shuguang Cui, Zhaoze Zhang, Huan Wang, Liyuan Tao, Yongchang Sun, Zhu Song, Tianpei Hong, Jin Yang, Tianrun Gao, Jiangjiang Zhang, Xiaohu Li, Jing Zhang, Ye Sang, Zhao Yang, Kanmin Xue, Song Wu, Ping Zhang, Jian Yang, Chunli Song, and Guangyu Wang. A generalist medical language model for disease diagnosis assistance. Nature Medicine, 31(3):932–942, 2025.

  76. [76]

    Zhizhe Liu, Zhenfeng Zhu, Shuai Zheng, Yawei Zhao, Kunlun He, and Yao Zhao. From observation to concept: A flexible multi-view paradigm for medical report generation. IEEE Transactions on Multimedia, 26:5987–5995, 2024. 4, 5, 6

  77. [77]

    Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding R1-Zero-like training: A critical perspective. In Second Conference on Language Modeling, 2025. 2

  78. [78]

    Zihe Liu, Jiashun Liu, Yancheng He, Weixun Wang, Jiaheng Liu, Ling Pan, Xinyu Hu, Shaopan Xiong, Ju Huang, Jian Hu, Shengyi Huang, Johan Obando-Ceron, Siran Yang, Jiamang Wang, Wenbo Su, and Bo Zheng. Part I: Tricks or traps? A deep dive into RL for LLM reasoning. arXiv preprint arXiv:2508.08221, 2025. 2

  79. [79]

    Jinhui Lou, Yan Yang, Zhou Yu, Zhenqi Fu, Weidong Han, Qingming Huang, and Jun Yu. CXRAgent: Director-orchestrated multi-stage reasoning for chest X-ray interpretation. arXiv preprint arXiv:2510.21324, 2025. 1, 3

  80. [80]

    Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, and Xiang Li. Eye-gaze guided multi-modal alignment for medical representation learning. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. 3

Showing first 80 references.