ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs

Baixiang Huang; Hongxin Ding; Jinyang Zhang; Junfeng Zhao; Liantao Ma; Weibin Liao; Xinke Jiang; Yasha Wang; Yinghao Zhu; Yue Fang

arXiv: 2508.13514 · v2 · pith:RCL4MOEHnew · submitted 2025-08-19 · cs.CL · cs.AI

Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Jinyang Zhang, Yinghao Zhu, Zheng Li, Liantao Ma, Junfeng Zhao, Yasha Wang

keywords: promed, information, medical, gain, llms, paradigm, shapley, before

Interactive medical questioning is essential in clinical consultations, where physicians must actively gather the patient information they need. Yet existing medical Large Language Models (LLMs) predominantly follow a reactive paradigm: they answer before seeking sufficient details, which risks diagnostic errors. To bridge this gap, we propose ProMed, a reinforcement learning framework that moves LLMs toward a proactive paradigm, enabling them to ask clinically valuable questions before making a decision. Central to ProMed is the Shapley Information Gain (SIG) reward, which quantifies a question's clinical utility as the amount of newly acquired information, weighted by the contextual importance of that information via Shapley values. We integrate SIG into a two-stage training pipeline: (1) SIG-Guided Model Initialization, which uses Monte Carlo Tree Search to construct high-reward interaction trajectories for supervision, and (2) SIG-Augmented Policy Optimization, which introduces a novel SIG-guided Reward Distribution Mechanism that prioritizes informative questions for fine-grained optimization. Experiments on partial-information medical benchmarks show that ProMed significantly outperforms state-of-the-art methods by 6.29% on average, delivers a 54.45% gain over the reactive paradigm, and generalizes robustly to out-of-domain cases. Our code is available at https://github.com/hxxding/ProMed.
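
To make the idea of a Shapley-weighted information reward concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not give the SIG formula, so the value function `toy_value`, the fact names, and the helper `question_reward` are all hypothetical. The sketch computes exact Shapley values for a small set of patient facts and scores a question by summing the Shapley weights of the facts it newly reveals.

```python
from itertools import combinations
from math import factorial


def shapley_values(facts, value):
    """Exact Shapley value of each fact under the set function `value`."""
    n = len(facts)
    phi = {}
    for fact in facts:
        others = [g for g in facts if g != fact]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {fact}) - value(s))
        phi[fact] = total
    return phi


def question_reward(revealed, known, phi):
    """Hypothetical reward: Shapley mass of facts the question newly reveals."""
    return sum(phi[f] for f in set(revealed) - set(known))


# Hypothetical toy case: the value of a fact set stands in for how much it
# supports the correct diagnosis. "travel" only matters together with "fever".
FACTS = ["fever", "cough", "travel"]


def toy_value(s):
    score = 0.2 * ("fever" in s) + 0.1 * ("cough" in s)
    if "fever" in s and "travel" in s:
        score += 0.4
    return score


phi = shapley_values(FACTS, toy_value)
reward = question_reward(["fever", "travel"], known=["fever"], phi=phi)
```

In this toy setup the interaction term is split evenly between "fever" and "travel", so a question that reveals "travel" when "fever" is already known is rewarded for only the share of value attributable to the new fact. Exact enumeration is exponential in the number of facts; a real system would likely need sampling-based approximation.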

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Survey of Reinforcement Learning for Large Reasoning Models
cs.CL 2025-09 accept novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.