Graph-Augmented Topological Internalization with Dual-Stream Classifiers for Medical Report Generation

Chupei Tang; Di Wang; Junxiao Kong; Moyu Tang; Tianchi Lu

arxiv: 2605.02376 · v1 · submitted 2026-05-04 · 💻 cs.CV

Pith reviewed 2026-05-08 19:23 UTC · model grok-4.3

classification 💻 cs.CV

keywords: medical report generation, graph convolutional network, dual-stream classifier, topological internalization, chest X-ray, zero-shot generalization, clinical efficacy, disease co-occurrence

The pith

A graph convolutional network turns disease co-occurrence patterns into explicit weights that guide more accurate medical report generation from chest images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard approaches to automated medical report generation treat each chest abnormality as an isolated target, which overlooks how diseases naturally co-occur and limits reasoning on subtle or combined lesions. The paper proposes embedding these co-occurrence patterns directly into the model by using a graph convolutional network to create a learnable weight matrix from global priors, then feeding that structure into a dual-stream classifier. One stream produces diagnostic prompts under the topological constraints while the second adjusts decision boundaries for rare cases, and a diagnosis-guided attention layer uses the resulting clinical semantics to focus the visual features. If the approach works, reports would become both clinically more reliable and linguistically fluent without depending on external data retrieval, and the same structure would transfer to new datasets.

Core claim

The Topological Knowledge Internalization module uses a Graph Convolutional Network to convert global disease co-occurrence priors into an explicit parameterized weight matrix that injects topological structure into the classification process. This matrix constrains a main diagnostic branch to generate discrete prompts while an auxiliary branch applies asymmetric optimization to handle class imbalance; a Diagnosis-Guided Spatial Attention mechanism then closes the loop by using those diagnostics to recalibrate the visual encoder and reduce feature hallucinations. Experiments show the resulting GDMRG model reaches competitive clinical efficacy scores on the MIMIC-CXR dataset while preserving natural language fluency.

What carries the argument

The Topological Knowledge Internalization module, which employs a Graph Convolutional Network to generate an explicit parameterized weight matrix from disease co-occurrence priors and injects it as topological constraints into the dual-stream classifier.
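
To make the mechanism concrete, here is a minimal NumPy sketch of how a GCN can turn a label co-occurrence matrix into a per-label classifier weight matrix. It follows the generic Kipf and Welling propagation rule and is not the paper's TKI implementation, whose exact layers are not given in the material above. The random co-occurrence matrix, the embedding sizes, and the names `normalize_adjacency` and `gcn_classifier_weights` are all assumptions for illustration; only the node count of 18 comes from the paper.

```python
import numpy as np


def normalize_adjacency(cooc: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (Kipf and Welling)."""
    a = cooc + np.eye(cooc.shape[0])  # add self-loops so every node keeps its own signal
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_classifier_weights(cooc, node_emb, w1, w2):
    """Two GCN layers mapping label embeddings to one weight vector per label."""
    a_hat = normalize_adjacency(cooc)
    hidden = np.maximum(a_hat @ node_emb @ w1, 0.0)  # ReLU
    return a_hat @ hidden @ w2  # shape: (num_labels, feat_dim)


rng = np.random.default_rng(0)
num_labels, emb_dim, hidden_dim, feat_dim = 18, 16, 32, 64

# Assumption: a symmetric, non-negative stand-in for the real co-occurrence prior.
cooc = rng.random((num_labels, num_labels))
cooc = (cooc + cooc.T) / 2
np.fill_diagonal(cooc, 0.0)

node_emb = rng.normal(size=(num_labels, emb_dim))    # hypothetical label embeddings
w1 = rng.normal(size=(emb_dim, hidden_dim)) * 0.1    # learnable in a real model
w2 = rng.normal(size=(hidden_dim, feat_dim)) * 0.1   # learnable in a real model

weights = gcn_classifier_weights(cooc, node_emb, w1, w2)
image_feat = rng.normal(size=(4, feat_dim))          # hypothetical pooled visual features
logits = image_feat @ weights.T                      # one logit per label per image
```

Because each label's weight vector is a graph-smoothed mixture of its neighbours' embeddings, labels that co-occur often end up with correlated classifiers. That is the sense in which the prior is injected without any retrieval step at inference time.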

If this is right

The main branch produces discrete diagnostic prompts that respect the learned topological constraints from disease co-occurrences.
Asymmetric optimization in the auxiliary branch dynamically adjusts decision boundaries for highly imbalanced abnormality classes.
Diagnosis-Guided Spatial Attention uses high-dimensional clinical semantics to recalibrate visual features and reduce hallucinations.
The integrated system maintains natural language fluency while achieving competitive clinical efficacy on MIMIC-CXR.
The same internalized structure supports robust zero-shot generalization to the IU X-Ray dataset.
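
The third consequence above, diagnosis-guided recalibration of visual features, can be pictured with a small sketch. The material above does not specify the form of DGSA, so this is one plausible reading under stated assumptions: a diagnosis embedding acts as the query in scaled dot-product attention over patch features, and the attention map re-weights those patches through a residual gate. The function name, the projection matrices, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()


def diagnosis_guided_attention(patches, diag_emb, w_q, w_k):
    """Re-weight patch features with attention driven by a diagnosis embedding.

    patches:  (P, D) visual patch features
    diag_emb: (E,)   embedding of the predicted diagnoses (assumed input)
    Returns the recalibrated patches (P, D) and the attention map (P,).
    """
    q = diag_emb @ w_q                        # (K,) query from clinical semantics
    k = patches @ w_k                         # (P, K) keys from visual features
    attn = softmax(k @ q / np.sqrt(q.shape[0]))
    recalibrated = patches * (1.0 + attn[:, None])  # residual gate keeps the original signal
    return recalibrated, attn


rng = np.random.default_rng(1)
patches = rng.normal(size=(49, 64))   # e.g. a 7x7 feature map, flattened
diag_emb = rng.normal(size=(32,))
w_q = rng.normal(size=(32, 16)) * 0.1
w_k = rng.normal(size=(64, 16)) * 0.1

recal, attn = diagnosis_guided_attention(patches, diag_emb, w_q, w_k)
```

The residual form is a design choice of this sketch: it lets the diagnosis emphasise regions without zeroing out patches the classifier happened to ignore.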

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Because the model avoids external retrieval steps, it could run with lower latency and stronger privacy guarantees in hospital workflows.
The same co-occurrence internalization technique could be tested on other imaging modalities such as CT or MRI where relational disease patterns matter.
Periodic retraining of the GCN priors on updated hospital data might be needed to keep the topology current as disease patterns shift.

Load-bearing premise

Global disease co-occurrence priors can be turned into an explicit parameterized weight matrix via GCN that accurately captures topological structures and improves reasoning on complex lesions without introducing bias or requiring external retrieval.
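
For readers unfamiliar with what a "co-occurrence prior" looks like in practice, the sketch below estimates one from multi-hot training labels as conditional frequencies P(j | i). This is a common construction in label-graph classifiers, not necessarily the one the authors use, and the toy label matrix and function name are assumptions.

```python
import numpy as np


def cooccurrence_prior(labels: np.ndarray) -> np.ndarray:
    """Estimate P(label j present | label i present) from multi-hot labels.

    labels: (num_studies, num_labels) array of 0/1 values.
    Returns a (num_labels, num_labels) matrix with a zero diagonal.
    """
    labels = labels.astype(float)
    joint = labels.T @ labels              # joint[i, j] = studies containing both i and j
    counts = np.diag(joint).copy()         # counts[i]   = studies containing i
    cond = np.divide(
        joint,
        counts[:, None],
        out=np.zeros_like(joint),
        where=counts[:, None] > 0,         # labels that never occur get an all-zero row
    )
    np.fill_diagonal(cond, 0.0)
    return cond


# Toy data: 4 studies, 3 labels (hypothetical).
toy_labels = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
])
prior = cooccurrence_prior(toy_labels)
```

Note that the resulting matrix is asymmetric: a rare finding can almost always appear alongside a common one without the reverse holding. This is also where the premise is most exposed, since the estimate inherits whatever label bias the training corpus carries.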

What would settle it

A controlled test set of complex or rare lesion combinations whose co-occurrence statistics deviate from the training priors, on which the model shows no gain or a drop in clinical metrics such as CheXbert F1 compared with non-graph baselines.

Figures

Figures reproduced from arXiv: 2605.02376 by Chupei Tang, Di Wang, Junxiao Kong, Moyu Tang, Tianchi Lu.

**Figure 1.** The overall architecture of the proposed GDMRG framework. The system consists of five cohesive modules: …

**Figure 2.** Detailed architecture of the proposed Topologi…

**Figure 3.** (a) The 18-dimensional prior co-occurrence…

**Figure 4.** Comparison of absolute F1 scores on the ex…

**Figure 5.** Qualitative comparison of visual grounding and text generation. While the baseline model (w/o TKI) exhibits diffuse attention and hallucinates pulmonary edema, the complete GDMRG leverages topological priors to focus its attention on the basal and retrocardiac regions. This spatial alignment helps capture concurrent morbidities (e.g., pleural effusion and atelectasis) and suppresses feature hallucinations…

Original abstract

Automated medical report generation (MRG) holds substantial value for alleviating radiologist workload and enhancing diagnostic efficiency. However, mainstream approaches typically treat diverse chest abnormalities as isolated classification targets. This paradigm often overlooks inherent disease co-occurrences and struggles to translate medical topological structures into explicit data correlations, constraining the model's reasoning capacity on complex or subtle lesions. To address this, we propose Graph-Augmented Dual-Stream Medical Report Generation with Topological Internalization (GDMRG). Our framework introduces a Topological Knowledge Internalization (TKI) module, which leverages a Graph Convolutional Network (GCN) to generate an explicit parameterized weight matrix based on global disease co-occurrence priors. This facilitates efficient topological knowledge injection without relying on external retrieval mechanisms. Building upon this, we construct a dual-stream classification system: the main branch generates discrete diagnostic prompts under topological constraints, while the auxiliary branch employs an asymmetric optimization strategy to dynamically calibrate decision boundaries for highly imbalanced samples. Concurrently, to establish a logical closed loop between diagnosis and visual grounding, we design a Diagnosis-Guided Spatial Attention (DGSA) mechanism that utilizes high-dimensional clinical semantics to recalibrate the visual encoder, mitigating feature hallucinations. Comprehensive experiments on the MIMIC-CXR dataset demonstrate that GDMRG achieves competitive clinical efficacy (CE) while maintaining natural language fluency. Furthermore, our model exhibits robust zero-shot generalization on the IU X-Ray dataset. In summary, this work presents an integrated and interpretable paradigm for medical report generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GDMRG adds a GCN-based topological module and a dual-stream setup to medical report generation, but the abstract gives no numbers to back its performance claims.

The paper's main move is to build a Topological Knowledge Internalization module that turns global disease co-occurrence statistics into a GCN-derived weight matrix, then feeds that into a dual-stream classifier (one branch for prompts under topological constraints, the other for imbalance calibration) plus a Diagnosis-Guided Spatial Attention that uses diagnostic semantics to adjust the visual encoder. This is a direct response to the common problem of treating chest findings as independent labels, and the closed diagnosis-to-visual loop is a clean way to try to reduce hallucinations. The zero-shot transfer claim to IU X-Ray is also a reasonable target for this kind of prior-injection work. Those pieces are new enough as a combination that the subfield might want to see the details.

The soft spot is exactly what the stress-test flags: the abstract asserts competitive clinical efficacy on MIMIC-CXR and robust zero-shot generalization on IU X-Ray without any numbers, ablations, or checks on whether the GCN actually learns structure beyond the raw co-occurrence matrix. No error analysis or comparison against a non-GCN version of the same priors appears either. If the full paper supplies those controls and shows the learned adjacency adds measurable value, the contribution becomes clearer; right now the central mechanism stays unverified.

This is for groups already working on graph priors or imbalance handling in medical vision-language models. It deserves a serious referee to check the experiments rather than a desk reject, because the framing is coherent and the problem it targets is real.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes GDMRG for automated medical report generation. It introduces a Topological Knowledge Internalization (TKI) module that uses a Graph Convolutional Network (GCN) to convert global disease co-occurrence priors into an explicit parameterized weight matrix for topological knowledge injection without external retrieval. This supports a dual-stream classifier (main branch for topology-constrained diagnostic prompts; auxiliary branch with asymmetric optimization for imbalanced samples) and a Diagnosis-Guided Spatial Attention (DGSA) mechanism to link clinical semantics with visual features. The paper claims competitive clinical efficacy (CE) on MIMIC-CXR while preserving language fluency, plus robust zero-shot generalization on IU X-Ray.

Significance. If validated, the framework provides an integrated approach to embedding disease topology directly via GCN-derived weights, potentially improving reasoning on complex lesions and enabling better cross-dataset transfer without retrieval modules. The dual-stream design and DGSA could enhance both diagnostic accuracy and interpretability in MRG. Credit is due for the explicit attempt to close the diagnosis-visual grounding loop and avoid external dependencies, though significance hinges on demonstrating that the GCN step yields non-trivial gains.

major comments (2)

[TKI module] TKI module (method section): The central claim that the GCN produces a parameterized weight matrix capturing topological structures (beyond raw co-occurrence priors) and improves reasoning on complex lesions lacks supporting ablations. No comparison to a direct (non-GCN) use of the same priors, no statistics or visualizations of the learned adjacency, and no isolation of TKI's contribution are described, leaving open whether the internalization step adds value or merely propagates dataset-specific bias that could undermine zero-shot transfer.
[Experimental results] Experimental results (results section): The abstract asserts 'comprehensive experiments' with competitive CE on MIMIC-CXR and robust zero-shot generalization on IU X-Ray, yet no quantitative metrics, baseline tables, ablation details on TKI/DGSA, or error analysis are referenced. This makes it impossible to evaluate the load-bearing claims of competitiveness and generalization; post-hoc selection of 'competitive' cannot be assessed without full results.

minor comments (2)

[Abstract] Abstract: The full expansion of GDMRG is lengthy; a shorter acronym or clearer phrasing would improve readability.
[Method] Notation: The distinction between 'main branch' and 'auxiliary branch' in the dual-stream system could be clarified with explicit equations or pseudocode for the asymmetric optimization.
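
The second minor comment asks for pseudocode for the asymmetric optimization. The abstract does not define it, so as a reference point here is a sketch of the standard Asymmetric Loss for multi-label classification (Ridnik et al., which the paper cites). It is not the paper's Truncated Asymmetric Loss (T-ASL), whose bidirectional gradient truncation is not described in the material above. The default hyperparameters below are assumptions for illustration.

```python
import numpy as np


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Standard Asymmetric Loss for multi-label classification.

    Positives use a focal term (1 - p)^gamma_pos. Negatives use a shifted
    probability p_m = max(p - margin, 0) with a stronger focal term
    p_m^gamma_neg, so very easy negatives contribute exactly zero loss.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))
    p_neg = np.clip(p - margin, 0.0, None)  # probability shifting for negatives

    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, None))
    loss_neg = (1.0 - targets) * p_neg ** gamma_neg * np.log(np.clip(1.0 - p_neg, eps, None))
    return float(-(loss_pos + loss_neg).mean())


easy_negative = asymmetric_loss([[-5.0]], [[0.0]])   # p is below the margin
missed_positive = asymmetric_loss([[-3.0]], [[1.0]])
found_positive = asymmetric_loss([[3.0]], [[1.0]])
```

The asymmetry matters for chest X-ray labels because most findings are absent in most studies: down-weighting the flood of easy negatives keeps the gradient focused on the rare positives.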

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. These observations highlight areas where we can improve clarity and provide stronger empirical support for our claims. We address each major comment point by point below, indicating the specific revisions we will incorporate in the next version.

Referee: [TKI module] TKI module (method section): The central claim that the GCN produces a parameterized weight matrix capturing topological structures (beyond raw co-occurrence priors) and improves reasoning on complex lesions lacks supporting ablations. No comparison to a direct (non-GCN) use of the same priors, no statistics or visualizations of the learned adjacency, and no isolation of TKI's contribution are described, leaving open whether the internalization step adds value or merely propagates dataset-specific bias that could undermine zero-shot transfer.

Authors: We agree that dedicated ablations are required to substantiate the added value of the GCN within TKI. In the revised manuscript we will insert a new ablation subsection (Section 4.3) that directly compares (i) the full TKI module against (ii) a non-GCN baseline that injects the raw co-occurrence matrix as fixed weights. We will also add visualizations of the learned adjacency matrices before and after GCN propagation, together with quantitative metrics such as spectral gap and edge-weight entropy to illustrate the emergence of higher-order topological structure. To address the zero-shot concern, we will report the IU X-Ray zero-shot scores with and without TKI, showing that the learned parameterization improves rather than harms cross-dataset transfer. These additions will isolate TKI's contribution without altering the core method.
Revision: yes

Referee: [Experimental results] Experimental results (results section): The abstract asserts 'comprehensive experiments' with competitive CE on MIMIC-CXR and robust zero-shot generalization on IU X-Ray, yet no quantitative metrics, baseline tables, ablation details on TKI/DGSA, or error analysis are referenced. This makes it impossible to evaluate the load-bearing claims of competitiveness and generalization; post-hoc selection of 'competitive' cannot be assessed without full results.

Authors: We apologize that the quantitative grounding was not made sufficiently explicit. The full manuscript already contains Section 4 with Table 1 (MIMIC-CXR main results reporting BLEU-4, METEOR, ROUGE-L, CheXpert F1, and RadGraph F1 against prior baselines), Table 2 (zero-shot IU X-Ray results), and Table 3 (ablation on TKI and DGSA). However, to eliminate any ambiguity we will (a) add explicit forward references from the abstract and introduction to these tables, (b) include a new error-analysis subsection with qualitative examples of complex-lesion cases, and (c) report all numerical values inline when claims of competitiveness are made. These changes will allow readers to directly verify the reported performance.
Revision: yes
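
The first response promises "spectral gap and edge-weight entropy" for the learned adjacency without defining either. Both terms have more than one convention, so the sketch below fixes one reading as an assumption: the spectral gap is the second-smallest eigenvalue of the symmetric normalized Laplacian, and the entropy is the Shannon entropy of the normalized off-diagonal edge weights. The function names are hypothetical.

```python
import numpy as np


def spectral_gap(adj: np.ndarray) -> float:
    """Second-smallest eigenvalue of the symmetric normalized Laplacian.

    It is zero when the graph is disconnected and grows as the graph
    becomes better connected.
    """
    a = (adj + adj.T) / 2                      # symmetrize so eigvalsh applies
    deg = a.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)     # avoid dividing by zero for isolated nodes
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    lap = np.eye(len(a)) - a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigenvalues = np.linalg.eigvalsh(lap)      # returned in ascending order
    return float(eigenvalues[1])


def edge_weight_entropy(adj: np.ndarray) -> float:
    """Shannon entropy (in nats) of the normalized off-diagonal edge weights."""
    off_diag = ~np.eye(len(adj), dtype=bool)
    w = np.abs(adj[off_diag])
    total = w.sum()
    if total == 0:
        return 0.0
    p = w[w > 0] / total
    return float(-(p * np.log(p)).sum())


triangle = np.ones((3, 3)) - np.eye(3)        # fully connected, uniform weights
two_pairs = np.array([                         # two disconnected edges
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
```

Comparing these two numbers for the raw prior and for the post-GCN adjacency would show whether propagation concentrates weight on fewer edges (lower entropy) or changes connectivity (different gap). Whether either change counts as "higher-order topological structure" is something the authors would still need to argue.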

Circularity Check

0 steps flagged

No circularity: derivation relies on external priors and independent validation

The paper's central mechanism (TKI module) takes global disease co-occurrence priors as input and applies a GCN to produce a parameterized weight matrix for topological injection. This step is not self-definitional, as the priors are stated to be external and the GCN output is not equated to the input by construction. No equations or claims in the abstract reduce a prediction to a fitted parameter or rename a known result. No self-citations are invoked as load-bearing uniqueness theorems. The reported experiments on MIMIC-CXR and zero-shot IU X-Ray are presented as empirical outcomes rather than tautological consequences of the modeling choices. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on the existence of stable global disease co-occurrence statistics that can be pre-computed and injected via GCN without domain shift; no free parameters or invented entities are quantified in the abstract.

axioms (1)

Domain assumption: Disease co-occurrence priors form a useful topological structure that improves diagnostic reasoning when internalized via GCN.
Invoked in the description of the TKI module as the basis for the parameterized weight matrix.

invented entities (2)

Topological Knowledge Internalization (TKI) module (no independent evidence)
Purpose: Generate an explicit parameterized weight matrix from co-occurrence priors using a GCN.
New module introduced to inject topological knowledge without external retrieval.

Diagnosis-Guided Spatial Attention (DGSA) (no independent evidence)
Purpose: Recalibrate the visual encoder using high-dimensional clinical semantics to mitigate feature hallucinations.
New attention mechanism to close the loop between diagnosis and visual grounding.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation/AlexanderDuality.lean (genuine topological forcing): alexander_duality_circle_linking
Tag: unclear
Paper passage: "Topological Knowledge Internalization (TKI) module ... leverages a Graph Convolutional Network (GCN) to generate an explicit parameterized weight matrix based on global disease co-occurrence priors."

Foundation/ArithmeticFromLogic.lean (forced structural counts): n/a (the paper's '18' is a dataset annotation choice, not a forced count)
Tag: unclear
Paper passage: "we define N_d = 18 specific concept nodes ... 14 core chest abnormalities and 4 auxiliary anatomical attributes"

Cost/FunctionalEquation.lean (J-cost uniqueness): washburn_uniqueness_aczel
Tag: unclear
Paper passage: "Truncated Asymmetric Loss (T-ASL) ... bidirectional gradient truncation mechanism"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages

[1]

On the automatic gen- eration of medical imaging reports,

B. Jing, P. Xie, and E. Xing, “On the automatic gen- eration of medical imaging reports,” inProceedings of ACL, 2018, pp. 2577–2586

work page 2018
[2]

Gen- erating radiology reports via memory-driven trans- former,

Z. Chen, Y. Song, T. H. Chang, and X. Wan, “Gen- erating radiology reports via memory-driven trans- former,” inProceedings of EMNLP, 2020, pp. 1439– 1449

work page 2020
[3]

Error and discrepancy in radiology: In- evitable or avoidable?

A. Brady, “Error and discrepancy in radiology: In- evitable or avoidable?”Insights into Imaging, vol. 8, no. 1, pp. 171–182, 2017

work page 2017
[4]

C2FNAS: Coarse-to-fine neural architecture search for 3D medical image segmen- tation,

Q. Yu, D. Yang, H. Roth, Y. Bai, Y. Zhang, A. L. Yuille, and D. Xu, “C2FNAS: Coarse-to-fine neural architecture search for 3D medical image segmen- tation,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 4126–4135

work page 2020
[5]

Clinically accurate chest X-ray re- port generation,

G. Liu et al., “Clinically accurate chest X-ray re- port generation,” inProceedings of MLHC, 2019, pp. 249–269

work page 2019
[6]

Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification,

H. Chen, S. Miao, D. Xu, G. D. Hager, and A. P. Harrison, “Label co-occurrence learning with graph convolutional networks for multi-label chest X-ray image classification,”IEEE J. Biomed. Health Inform., vol. 24, no. 8, pp. 2292–2302, 2020

work page 2020
[7]

GloVe: Global vectors for word representation,

J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of EMNLP, 2014, pp. 1532–1543

work page 2014
[8]

Learning to prompt for vision- language models,

K. Zhou et al., “Learning to prompt for vision- language models,”International Journal of Com- puter Vision, vol. 130, no. 9, pp. 2337–2348, 2022

work page 2022
[9]

Improving factual completeness and consis- tency of image-to-text radiology report generation,

Y. Miura, Y. Zhang, E. Tsai, C. Langlotz, and D. Ju- rafsky, “Improving factual completeness and consis- tency of image-to-text radiology report generation,” inProceedings of NAACL, 2021, pp. 5288–5304

work page 2021
[10]

Class-balanced loss based on effective number of samples,

Y. Cui, M. Jia, T. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” inProceedings of CVPR, 2019, pp. 9268– 9277

work page 2019
[11]

Asym- metric loss for multi-label classification,

T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, and L. Zelnik-Manor, “Asym- metric loss for multi-label classification,” inProceed- ings of ICCV, 2021, pp. 82–91. 10

work page 2021
[12]

TieNet: Text-image embedding net- work for common thorax disease classification and reporting in chest X-rays,

X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “TieNet: Text-image embedding net- work for common thorax disease classification and reporting in chest X-rays,” inProceedings of CVPR, 2018, pp. 9049–9058

work page 2018
[13]

Multimodal recurrent model with at- tention for automated radiology report generation,

Y. Xue et al., “Multimodal recurrent model with at- tention for automated radiology report generation,” inProceedings of MICCAI, 2018, pp. 457–466

work page 2018
[14]

Hybrid retrieval-generation reinforced agent for medical im- age report generation,

Y. Li, X. Liang, Z. Hu, and E. P. Xing, “Hybrid retrieval-generation reinforced agent for medical im- age report generation,” inAdvances in Neural Infor- mation Processing Systems, 2018

work page 2018
[15]

High-performance medicine: The con- vergence of human and artificial intelligence,

E. J. Topol, “High-performance medicine: The con- vergence of human and artificial intelligence,”Na- ture Medicine, vol. 25, no. 1, pp. 44–56, 2019

work page 2019
[16]

PromptMRG: Diagnosis-driven prompts for med- ical report generation,

H. Jin, H. Che, Y. Lin, H. Chen et al., “PromptMRG: Diagnosis-driven prompts for med- ical report generation,” inProceedings of AAAI, 2024, pp. 15432–15440

work page 2024
[17]

Scaling up visual and vision-language representation learning with noisy text supervision,

C. Jia et al., “Scaling up visual and vision-language representation learning with noisy text supervision,” inProceedings of ICML, 2021, pp. 4904–4916

work page 2021
[18]

Pre-train, prompt, and predict: A systematic survey of prompting methods in natu- ral language processing,

P. Liu et al., “Pre-train, prompt, and predict: A systematic survey of prompting methods in natu- ral language processing,”ACM Computing Surveys, vol. 55, no. 9, pp. 1–35, 2023

work page 2023
[19]

Cross- modal memory networks for radiology report gener- ation,

Z. Chen, Y. Shen, Y. Song, and X. Wan, “Cross- modal memory networks for radiology report gener- ation,” inProceedings of ACL, 2021, pp. 5904–5914

work page 2021
[20]

RadGraph: Extracting clinical en- tities and relations from radiology reports,

S. Jain et al., “RadGraph: Extracting clinical en- tities and relations from radiology reports,” inAd- vances in Neural Information Processing Systems, 2021

work page 2021
[21]

Semi-supervised classi- fication with graph convolutional networks,

T. N. Kipf and M. Welling, “Semi-supervised classi- fication with graph convolutional networks,” inPro- ceedings of ICLR, 2017

work page 2017
[22]

Graph attention networks,

P. Veliˇ ckovi´ c et al., “Graph attention networks,” in Proceedings of ICLR, 2018

work page 2018
[23]

Ratchet: Medical transformer for chest X-ray diagnosis and reporting,

B. Hou et al., “Ratchet: Medical transformer for chest X-ray diagnosis and reporting,” inProceedings of MICCAI, 2021, pp. 293–303

work page 2021
[24]

When radiology report generation meets knowledge graph,

Y. Zhang et al., “When radiology report generation meets knowledge graph,” inProceedings of AAAI, 2020, pp. 12910–12917

work page 2020
[25]

Retrieval-based chest X-ray report generation using a pre-trained contrastive language- image model,

M. Endo et al., “Retrieval-based chest X-ray report generation using a pre-trained contrastive language- image model,” inProceedings of MLHC, 2021, pp. 209–230

work page 2021
[26]

Knowledge-enhanced visual- language pre-training on chest radiology images,

X. Zhang, C. Wu, Z. Zhao, W. Lin, Y. Zhang, Y. Wang, and W. Xie, “Knowledge-enhanced visual- language pre-training on chest radiology images,” Nature Communications, vol. 14, no. 1, p. 4542, 2023

work page 2023
[27]

Learning imbalanced datasets with label- distribution-aware margin loss,

K. Cao, C. Wei, A. Gaidon, N. Arechiga, and T. Ma, “Learning imbalanced datasets with label- distribution-aware margin loss,” inAdvances in Neural Information Processing Systems, 2019

work page 2019
[28]

SMOTE: Synthetic minority over- sampling technique,

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over- sampling technique,”Journal of Artificial Intelli- gence Research, vol. 16, pp. 321–357, 2002

work page 2002
[29]

Focal loss for dense object detection,

T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Doll´ ar, “Focal loss for dense object detection,” in Proceedings of ICCV, 2017, pp. 2980–2988

work page 2017
[30]

Decoupling represen- tation and classifier for long-tailed recognition,

B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng, and Y. Kalantidis, “Decoupling represen- tation and classifier for long-tailed recognition,” in Proceedings of ICLR, 2020

work page 2020
[31]

Long-tail learning via logit adjustment,

A. K. Menon, S. Jayasumana, A. S. Rawat, H. Jain, A. Veit, and S. Kumar, “Long-tail learning via logit adjustment,” inProceedings of ICLR, 2021

work page 2021
[32]

On calibration of modern neural networks,

C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” inPro- ceedings of ICML, 2017, pp. 1321–1330

work page 2017
[33]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of CVPR, 2016, pp. 770–778

work page 2016
[34]

Squeeze-and-excitation networks,

J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” inProceedings of CVPR, 2018, pp. 7132– 7141

work page 2018
[35]

Attention is all you need,

A. Vaswani et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017

work page 2017
[36]

BERT: Pre-training of deep bidirectional trans- formers for language understanding,

J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional trans- formers for language understanding,” inProceedings of NAACL, 2019, pp. 4171–4186

work page 2019
[37]

Decoupled weight de- cay regularization,

I. Loshchilov and F. Hutter, “Decoupled weight de- cay regularization,” inProceedings of ICLR, 2019

work page 2019
[38]

MIMIC-CXR, a de- identified publicly available database of chest radio- graphs with free-text reports,

A. E. W. Johnson et al., “MIMIC-CXR, a de- identified publicly available database of chest radio- graphs with free-text reports,”Scientific Data, vol. 6, no. 1, p. 317, 2019

work page 2019
[39]

Preparing a collection of radiology examinations for distribution and re- trieval,

D. Demner-Fushman et al., “Preparing a collection of radiology examinations for distribution and re- trieval,”J. Am. Med. Inform. Assoc., vol. 23, no. 2, pp. 304–310, 2016

work page 2016
[40]

CheXpert: A large chest radiograph dataset with uncertainty labels and expert compar- ison,

J. Irvin et al., “CheXpert: A large chest radiograph dataset with uncertainty labels and expert compar- ison,” inProceedings of AAAI, 2019, pp. 590–597

work page 2019
[41]

BLEU: A method for automatic evaluation of ma- chine translation,

K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, “BLEU: A method for automatic evaluation of ma- chine translation,” inProceedings of ACL, 2002, pp. 311–318. 11

work page 2002
[42]

METEOR: An auto- matic metric for MT evaluation with improved cor- relation with human judgments,

S. Banerjee and A. Lavie, “METEOR: An auto- matic metric for MT evaluation with improved cor- relation with human judgments,” inProceedings of ACL, 2005, pp. 65–72

work page 2005
[43]

ROUGE: A package for automatic eval- uation of summaries,

C.-Y. Lin, “ROUGE: A package for automatic eval- uation of summaries,” inProceedings of ACL, 2004

work page 2004
[44]

Progressive transformer-based generation of radiology reports,

F. Nooralahzadeh, N. P. Gonzalez, T. Frauenfelder, K. Fujimoto, and M. Krauthammer, “Progressive transformer-based generation of radiology reports,” inFindings of the Association for Computational Linguistics: EMNLP, 2021, pp. 2824–2832

work page 2021
[45]

Contrastive attention for automatic chest X-ray report generation,

F. Liu, C. Yin, X. Wu, S. Ge, P. Zhang, and X. Sun, “Contrastive attention for automatic chest X-ray report generation,” inFindings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 269–280

work page 2021
[46]

Im- proving chest X-ray report generation by leverag- ing warm-starting,

A. Nicolson, J. Dowling, and B. Koopman, “Im- proving chest X-ray report generation by leverag- ing warm-starting,”Artif. Intell. Med., vol. 144, p. 102633, 2023

work page 2023
[47]

Radiology report generation with a learned knowledge base and multi-modal alignment,

S. Yang, X. Wu, S. Ge, Z. Zheng, S. K. Zhou, and L. Xiao, “Radiology report generation with a learned knowledge base and multi-modal alignment,”Medi- cal Image Anal., vol. 86, p. 102798, 2023

work page 2023
[48]

ME Trans- former: Radiology report generation by transformer with multiple learnable expert tokens,

Z. Wang, L. Liu, L. Wang, and L. Zhou, “ME Trans- former: Radiology report generation by transformer with multiple learnable expert tokens,” inProceed- ings of CVPR, 2023, pp. 11558–11567

work page 2023
[49]

Dynamic graph enhanced contrastive learn- ing for chest X-ray report generation,

M. Li, B. Lin, Z. Chen, H. Lin, X. Liang, and X. Chang, “Dynamic graph enhanced contrastive learn- ing for chest X-ray report generation,” inProceed- ings of CVPR, 2023, pp. 3334–3343

work page 2023
[50]

Interactive and explainable region-guided radiology report generation,

T. Tanida, P. M¨ uller, G. Kaissis, and D. Rueckert, “Interactive and explainable region-guided radiology report generation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR), 2023, pp. 7433–7442

[51] J. Wang, A. Bhalerao, T. Yin, S. See, and Y. He, “CAMANet: Class activation map guided attention network for radiology report generation,” IEEE J. Biomed. Health Inform., vol. 28, no. 4, pp. 2199–2210, 2024.

[52] J. Zhao, Y. Zhou, Z. Chen, H. Fu, and L. Wan, “Topicwise separable sentence retrieval for medical report generation,” IEEE Trans. Med. Imaging, vol. 44, no. 3, pp. 1505–1514, 2025.

[53] W. Chen et al., “Cross-modal causal representation learning for radiology report generation,” IEEE Trans. Image Process., vol. 34, pp. 2970–2985, 2025.

Supplementary Material for “Graph-Augmented Topological Internalization with Dual-Stream Classifiers for Medical Report Generation”

S-I Supplementary Results on Sample-level Clinical Efficacy

We report...


