DUCX: Decomposing Unfairness in Tool-Using Chest X-ray Agents

Ruinan Jin; Xiaoxiao Li; Zikang Xu


Fairness in medical agents is becoming critical as tool-using clinical AI systems orchestrate specialized vision and language modules for tasks such as chest X-ray question answering. While these medical AI agents can improve flexibility, their added pipeline complexity also creates new pathways for demographic bias beyond those of standalone models. We present DUCX, Decomposing Unfairness in Chest X-ray agents, a systematic audit of fairness in tool-using chest X-ray agents instantiated with MedRAX. To localize where disparities arise, we introduce a stage-wise fairness decomposition that separates end-to-end bias from three agent-specific sources: tool exposure bias, or utility gaps conditioned on tool presence; tool transition bias, or subgroup differences in tool-routing patterns; and model reasoning bias, or subgroup differences in synthesis behaviors. Extensive experiments on tool-using agentic frameworks across five driver backbones reveal that demographic gaps persist in end-to-end performance, with equalized-odds gaps of up to 20.79% and a fairness-utility tradeoff as low as 28.65%. Intermediate behaviors, including tool usage, transition patterns, and reasoning traces, exhibit distinct subgroup disparities that are not predictable from end-to-end evaluation alone. For example, conditioned on segmentation-tool availability, the subgroup utility gap reaches as high as 50%. Our findings underscore the need for process-level fairness auditing and debiasing to ensure the equitable deployment of clinical agentic systems. Code: https://github.com/Nanboy-Ronan/DUCK.
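As an illustration of two of the quantities named in the abstract, the sketch below computes an end-to-end equalized-odds gap and a utility gap conditioned on tool presence from logged agent runs. It is a minimal sketch under stated assumptions, not the authors' implementation: the record layout (`group`, `label`, `pred`, `tools`), the function names, the toy data, and the specific gap definitions (largest cross-group difference in true-positive or false-positive rate; accuracy as the utility measure) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def _rate(numerator, denominator):
    # Guard against empty strata; an empty stratum contributes a rate of 0.0.
    return numerator / denominator if denominator else 0.0


def equalized_odds_gap(records):
    """Largest cross-group difference in TPR or FPR (one common definition).

    Each record is a dict with hypothetical keys:
    'group' (subgroup name), 'label' (0/1), 'pred' (0/1).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for r in records:
        c = counts[r["group"]]
        if r["label"] == 1:
            c["tp" if r["pred"] == 1 else "fn"] += 1
        else:
            c["fp" if r["pred"] == 1 else "tn"] += 1
    tprs = [_rate(c["tp"], c["tp"] + c["fn"]) for c in counts.values()]
    fprs = [_rate(c["fp"], c["fp"] + c["tn"]) for c in counts.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


def tool_conditioned_accuracy_gap(records, tool):
    """Accuracy gap across groups, restricted to runs whose trace used `tool`.

    Assumes each record also carries 'tools', the list of tools the agent called.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        if tool in r["tools"]:
            total[r["group"]] += 1
            correct[r["group"]] += int(r["pred"] == r["label"])
    accuracies = [correct[g] / total[g] for g in total]
    return max(accuracies) - min(accuracies) if accuracies else 0.0


# Toy, made-up records for two subgroups; not data from the paper.
records = [
    {"group": "A", "label": 1, "pred": 1, "tools": ["segmentation"]},
    {"group": "A", "label": 1, "pred": 1, "tools": ["classifier"]},
    {"group": "A", "label": 0, "pred": 0, "tools": ["segmentation"]},
    {"group": "A", "label": 0, "pred": 1, "tools": ["classifier"]},
    {"group": "B", "label": 1, "pred": 1, "tools": ["segmentation"]},
    {"group": "B", "label": 1, "pred": 0, "tools": ["segmentation"]},
    {"group": "B", "label": 0, "pred": 0, "tools": ["classifier"]},
    {"group": "B", "label": 0, "pred": 0, "tools": ["classifier"]},
]

print(equalized_odds_gap(records))
print(tool_conditioned_accuracy_gap(records, "segmentation"))
```

Reporting the end-to-end gap next to the tool-conditioned gap is one way to see the abstract's point that intermediate disparities need not be predictable from the final metric: a single aggregate number cannot show which tool's presence is associated with the subgroup difference.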
