
arxiv: 2605.02376 · v1 · submitted 2026-05-04 · 💻 cs.CV

Graph-Augmented Topological Internalization with Dual-Stream Classifiers for Medical Report Generation

Pith reviewed 2026-05-08 19:23 UTC · model grok-4.3

classification 💻 cs.CV
keywords medical report generation · graph convolutional network · dual-stream classifier · topological internalization · chest X-ray · zero-shot generalization · clinical efficacy · disease co-occurrence

The pith

A graph convolutional network turns disease co-occurrence patterns into explicit weights that guide more accurate medical report generation from chest images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard approaches to automated medical report generation treat each chest abnormality as an isolated target, which overlooks how diseases naturally co-occur and limits reasoning on subtle or combined lesions. The paper proposes embedding these co-occurrence patterns directly into the model by using a graph convolutional network to create a learnable weight matrix from global priors, then feeding that structure into a dual-stream classifier. One stream produces diagnostic prompts under the topological constraints while the second adjusts decision boundaries for rare cases, and a diagnosis-guided attention layer uses the resulting clinical semantics to focus the visual features. If the approach works, reports would become both clinically more reliable and linguistically fluent without depending on external data retrieval, and the same structure would transfer to new datasets.

Core claim

The Topological Knowledge Internalization module uses a Graph Convolutional Network to convert global disease co-occurrence priors into an explicit parameterized weight matrix that injects topological structure into the classification process. This matrix constrains a main diagnostic branch to generate discrete prompts while an auxiliary branch applies asymmetric optimization to handle class imbalance; a Diagnosis-Guided Spatial Attention mechanism then closes the loop by using those diagnostics to recalibrate the visual encoder and reduce feature hallucinations. Experiments show the resulting GDMRG model reaches competitive clinical efficacy scores on the MIMIC-CXR dataset while preserving natural language fluency.

What carries the argument

The Topological Knowledge Internalization module, which employs a Graph Convolutional Network to generate an explicit parameterized weight matrix from disease co-occurrence priors and injects it as topological constraints into the dual-stream classifier.
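
To make the mechanism concrete, here is a minimal sketch, assuming a standard two-layer GCN (symmetric normalization with self-loops) whose output node embeddings are used as per-class classifier weights. The summary does not give the paper's layer sizes, node features, or normalization, so the function names, dimensions, and the toy symmetric co-occurrence matrix below are all hypothetical.

```python
import numpy as np

def normalize_adjacency(cooc: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in standard GCNs."""
    a_hat = cooc + np.eye(cooc.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                   # weighted node degrees (all > 0)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_classifier_weights(cooc, node_feats, w1, w2):
    """Two GCN layers; the result has one weight vector per disease label."""
    a_norm = normalize_adjacency(cooc)
    hidden = np.maximum(a_norm @ node_feats @ w1, 0.0)  # ReLU
    return a_norm @ hidden @ w2                         # (num_labels, feat_dim)

# Hypothetical sizes and a toy prior over 4 labels (assumption, not from the paper).
rng = np.random.default_rng(0)
num_labels, embed_dim, hidden_dim, feat_dim = 4, 8, 16, 32
cooc = np.array([[0.0, 0.6, 0.1, 0.0],
                 [0.6, 0.0, 0.3, 0.0],
                 [0.1, 0.3, 0.0, 0.2],
                 [0.0, 0.0, 0.2, 0.0]])
node_feats = rng.normal(size=(num_labels, embed_dim))   # e.g. label embeddings
w1 = rng.normal(size=(embed_dim, hidden_dim)) * 0.1     # learnable in practice
w2 = rng.normal(size=(hidden_dim, feat_dim)) * 0.1

class_weights = gcn_classifier_weights(cooc, node_feats, w1, w2)
image_feat = rng.normal(size=(feat_dim,))               # stand-in visual feature
logits = class_weights @ image_feat                     # one logit per label
```

In this reading, the "explicit parameterized weight matrix" is `class_weights`: related labels share information through `a_norm` before any image is seen, which is why no retrieval step is needed at inference time.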

If this is right

  • The main branch produces discrete diagnostic prompts that respect the learned topological constraints from disease co-occurrences.
  • Asymmetric optimization in the auxiliary branch dynamically adjusts decision boundaries for highly imbalanced abnormality classes.
  • Diagnosis-Guided Spatial Attention uses high-dimensional clinical semantics to recalibrate visual features and reduce hallucinations.
  • The integrated system maintains natural language fluency while achieving competitive clinical efficacy on MIMIC-CXR.
  • The same internalized structure supports robust zero-shot generalization to the IU X-Ray dataset.
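
The Diagnosis-Guided Spatial Attention bullet can be pictured with a short sketch, assuming a single-head dot-product attention in which a diagnosis embedding queries the spatial patch features and the resulting map reweights them residually. The summary does not specify the actual DGSA design, so the function name, projection matrices, and the residual form are assumptions for illustration only.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

def diagnosis_guided_attention(patch_feats, diag_embed, w_q, w_k):
    """Reweight spatial patches by their relevance to the diagnosis embedding."""
    query = diag_embed @ w_q                          # (d_attn,)
    keys = patch_feats @ w_k                          # (num_patches, d_attn)
    scores = keys @ query / np.sqrt(query.shape[0])   # scaled dot product
    attn = softmax(scores)                            # sums to 1 over patches
    # Residual reweighting (assumed): attended patches are amplified, none are zeroed.
    recalibrated = patch_feats * (1.0 + attn[:, None])
    return recalibrated, attn

# Hypothetical shapes: a 7x7 patch grid with 32-dim features, 16-dim diagnosis vector.
rng = np.random.default_rng(1)
patch_feats = rng.normal(size=(49, 32))
diag_embed = rng.normal(size=(16,))
w_q = rng.normal(size=(16, 8)) * 0.1
w_k = rng.normal(size=(32, 8)) * 0.1

recalibrated, attn = diagnosis_guided_attention(patch_feats, diag_embed, w_q, w_k)
```

The attention map `attn` is also the natural object to visualize when checking whether the model attends to clinically plausible regions.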

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the model avoids external retrieval steps, it could run with lower latency and stronger privacy guarantees in hospital workflows.
  • The same co-occurrence internalization technique could be tested on other imaging modalities such as CT or MRI where relational disease patterns matter.
  • Periodic retraining of the GCN priors on updated hospital data might be needed to keep the topology current as disease patterns shift.

Load-bearing premise

Global disease co-occurrence priors can be turned into an explicit parameterized weight matrix via GCN that accurately captures topological structures and improves reasoning on complex lesions without introducing bias or requiring external retrieval.
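
For readers unfamiliar with what a "global co-occurrence prior" looks like in practice, the following is a minimal sketch, assuming the prior is the conditional probability P(label j | label i) estimated from a binary training label matrix. This is a common construction in multi-label graph models; the summary does not state that the paper uses exactly this estimator, and the function name and toy labels are hypothetical.

```python
import numpy as np

def cooccurrence_prior(labels: np.ndarray) -> np.ndarray:
    """Row i holds P(label j present | label i present), with a zero diagonal."""
    labels = labels.astype(float)
    joint = labels.T @ labels                  # joint[i, j] = samples with both i and j
    counts = np.diag(joint).copy()             # counts[i] = samples with label i
    prior = joint / np.maximum(counts, 1.0)[:, None]   # guard against empty labels
    np.fill_diagonal(prior, 0.0)
    return prior

# Toy label matrix: 4 studies x 3 findings (assumption for illustration).
labels = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
prior = cooccurrence_prior(labels)
```

Note that this estimator is asymmetric (`prior[0, 1]` is 2/3 while `prior[1, 0]` is 1), and it is computed entirely from the training set, which is where the premise's exposure to dataset-specific bias comes from.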

What would settle it

A controlled test set of complex or rare lesion combinations whose co-occurrence statistics deviate from the training priors, on which the model shows no gain or a drop in clinical metrics such as CheXbert F1 compared with non-graph baselines.
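
One way such a test could be assembled is sketched below, assuming each test study is scored by how unlikely its label combination is under the training prior and that micro-averaged F1 is then computed only on the atypical subset. The atypicality score, the threshold, and all names are hypothetical choices for illustration, not a protocol from the paper.

```python
import numpy as np

def atypicality(sample_labels: np.ndarray, prior: np.ndarray) -> float:
    """1 minus the mean prior over all ordered pairs of positive labels."""
    pos = np.flatnonzero(sample_labels)
    if len(pos) < 2:
        return 0.0                              # no pair, nothing to deviate from
    pair_probs = [prior[i, j] for i in pos for j in pos if i != j]
    return 1.0 - float(np.mean(pair_probs))

def micro_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def f1_on_atypical(y_true, y_pred, prior, threshold=0.5):
    """Micro-F1 restricted to studies whose label set deviates from the prior."""
    scores = np.array([atypicality(row, prior) for row in y_true])
    mask = scores >= threshold
    return micro_f1(y_true[mask], y_pred[mask]), int(mask.sum())

# Toy data: labels 0 and 1 usually co-occur; label 2 rarely joins either.
prior = np.array([[0.0, 0.9, 0.1],
                  [0.9, 0.0, 0.1],
                  [0.1, 0.1, 0.0]])
y_true = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
y_pred = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1]])
f1, n_atypical = f1_on_atypical(y_true, y_pred, prior)
```

Running the same function on predictions from the graph model and from a non-graph baseline, over the same mask, gives the comparison described above.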

Figures

Figures reproduced from arXiv: 2605.02376 by Chupei Tang, Di Wang, Junxiao Kong, Moyu Tang, Tianchi Lu.

Figure 1: The overall architecture of the proposed GDMRG framework. The system consists of five cohesive modules: …
Figure 2: Detailed architecture of the proposed Topologi…
Figure 3: (a) The 18-dimensional prior co-occurrence …
Figure 4: Comparison of absolute F1 scores on the ex…
Figure 5: Qualitative comparison of visual grounding and text generation. While the baseline model (w/o TKI) exhibits diffuse attention and hallucinates pulmonary edema, the complete GDMRG leverages topological priors to focus its attention on the basal and retrocardiac regions. This spatial alignment helps capture concurrent morbidities (e.g., pleural effusion and atelectasis) and suppresses feature hallucinations…
Original abstract

Automated medical report generation (MRG) holds substantial value for alleviating radiologist workload and enhancing diagnostic efficiency. However, mainstream approaches typically treat diverse chest abnormalities as isolated classification targets. This paradigm often overlooks inherent disease co-occurrences and struggles to translate medical topological structures into explicit data correlations, constraining the model's reasoning capacity on complex or subtle lesions. To address this, we propose a Graph-Augmented Dual-Stream Medical Report Generation with Topological Internalization (GDMRG). Our framework introduces a Topological Knowledge Internalization module (TKI), which leverages a Graph Convolutional Network (GCN) to generate an explicit parameterized weight matrix based on global disease co-occurrence priors. This facilitates efficient topological knowledge injection without relying on external retrieval mechanisms. Building upon this, we construct a dual-stream classification system: the main branch generates discrete diagnostic prompts under topological constraints, while the auxiliary branch employs an asymmetric optimization strategy to dynamically calibrate decision boundaries for highly imbalanced samples. Concurrently, to establish a logical closed loop between diagnosis and visual grounding, we design a diagnostic-driven Diagnosis-Guided Spatial Attention (DGSA) that utilizes high-dimensional clinical semantics to recalibrate the visual encoder, mitigating feature hallucinations. Comprehensive experiments on the MIMIC-CXR dataset demonstrate that GDMRG achieves competitive clinical efficacy (CE) while maintaining natural language fluency. Furthermore, our model exhibits robust zero-shot generalization on the IU X-Ray dataset. In summary, this work presents an integrated and interpretable paradigm for medical report generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes GDMRG for automated medical report generation. It introduces a Topological Knowledge Internalization (TKI) module that uses a Graph Convolutional Network (GCN) to convert global disease co-occurrence priors into an explicit parameterized weight matrix for topological knowledge injection without external retrieval. This supports a dual-stream classifier (main branch for topology-constrained diagnostic prompts; auxiliary branch with asymmetric optimization for imbalanced samples) and a Diagnosis-Guided Spatial Attention (DGSA) mechanism to link clinical semantics with visual features. The paper claims competitive clinical efficacy (CE) on MIMIC-CXR while preserving language fluency, plus robust zero-shot generalization on IU X-Ray.

Significance. If validated, the framework provides an integrated approach to embedding disease topology directly via GCN-derived weights, potentially improving reasoning on complex lesions and enabling better cross-dataset transfer without retrieval modules. The dual-stream design and DGSA could enhance both diagnostic accuracy and interpretability in MRG. Credit is due for the explicit attempt to close the diagnosis-visual grounding loop and avoid external dependencies, though significance hinges on demonstrating that the GCN step yields non-trivial gains.

major comments (2)
  1. [TKI module] TKI module (method section): The central claim that the GCN produces a parameterized weight matrix capturing topological structures (beyond raw co-occurrence priors) and improves reasoning on complex lesions lacks supporting ablations. No comparison to a direct (non-GCN) use of the same priors, no statistics or visualizations of the learned adjacency, and no isolation of TKI's contribution are described, leaving open whether the internalization step adds value or merely propagates dataset-specific bias that could undermine zero-shot transfer.
  2. [Experimental results] Experimental results (results section): The abstract asserts 'comprehensive experiments' with competitive CE on MIMIC-CXR and robust zero-shot generalization on IU X-Ray, yet no quantitative metrics, baseline tables, ablation details on TKI/DGSA, or error analysis are referenced. This makes it impossible to evaluate the load-bearing claims of competitiveness and generalization; post-hoc selection of 'competitive' cannot be assessed without full results.
minor comments (2)
  1. [Abstract] Abstract: The full expansion of GDMRG is lengthy; a shorter acronym or clearer phrasing would improve readability.
  2. [Method] Notation: The distinction between 'main branch' and 'auxiliary branch' in the dual-stream system could be clarified with explicit equations or pseudocode for the asymmetric optimization.
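
As an indication of the kind of equation the second minor comment asks for, one published form of asymmetric optimization for multi-label classification is the asymmetric loss of Ridnik et al. (2021). Whether GDMRG's auxiliary branch uses this exact form is an assumption; the symbols below (focusing exponents \gamma_+ and \gamma_-, probability margin m) follow that loss, not anything stated in this summary.

```latex
% Per-label asymmetric loss for predicted probability p_k and target y_k \in \{0, 1\}.
% Negatives are down-weighted more aggressively (\gamma_- > \gamma_+) and
% easy negatives with p_k \le m contribute nothing.
\begin{aligned}
  p_{k}^{(m)} &= \max(p_k - m,\; 0), \\
  \mathcal{L}_{\mathrm{ASL}} &= -\sum_{k=1}^{K} \Big[\, y_k \,(1 - p_k)^{\gamma_+} \log p_k
      \;+\; (1 - y_k)\, \big(p_{k}^{(m)}\big)^{\gamma_-} \log\!\big(1 - p_{k}^{(m)}\big) \Big].
\end{aligned}
```

Setting \gamma_+ = \gamma_- = 0 and m = 0 recovers ordinary binary cross-entropy, which makes the asymmetry an explicit, ablatable choice.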

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. These observations highlight areas where we can improve clarity and provide stronger empirical support for our claims. We address each major comment point by point below, indicating the specific revisions we will incorporate in the next version.

Point-by-point responses
  1. Referee: [TKI module] TKI module (method section): The central claim that the GCN produces a parameterized weight matrix capturing topological structures (beyond raw co-occurrence priors) and improves reasoning on complex lesions lacks supporting ablations. No comparison to a direct (non-GCN) use of the same priors, no statistics or visualizations of the learned adjacency, and no isolation of TKI's contribution are described, leaving open whether the internalization step adds value or merely propagates dataset-specific bias that could undermine zero-shot transfer.

    Authors: We agree that dedicated ablations are required to substantiate the added value of the GCN within TKI. In the revised manuscript we will insert a new ablation subsection (Section 4.3) that directly compares (i) the full TKI module against (ii) a non-GCN baseline that injects the raw co-occurrence matrix as fixed weights. We will also add visualizations of the learned adjacency matrices before and after GCN propagation, together with quantitative metrics such as spectral gap and edge-weight entropy to illustrate the emergence of higher-order topological structure. To address the zero-shot concern, we will report the IU X-Ray zero-shot scores with and without TKI, showing that the learned parameterization improves rather than harms cross-dataset transfer. These additions will isolate TKI's contribution without altering the core method. revision: yes

  2. Referee: [Experimental results] Experimental results (results section): The abstract asserts 'comprehensive experiments' with competitive CE on MIMIC-CXR and robust zero-shot generalization on IU X-Ray, yet no quantitative metrics, baseline tables, ablation details on TKI/DGSA, or error analysis are referenced. This makes it impossible to evaluate the load-bearing claims of competitiveness and generalization; post-hoc selection of 'competitive' cannot be assessed without full results.

    Authors: We apologize that the quantitative grounding was not made sufficiently explicit. The full manuscript already contains Section 4 with Table 1 (MIMIC-CXR main results reporting BLEU-4, METEOR, ROUGE-L, CheXpert F1, and RadGraph F1 against prior baselines), Table 2 (zero-shot IU X-Ray results), and Table 3 (ablation on TKI and DGSA). However, to eliminate any ambiguity we will (a) add explicit forward references from the abstract and introduction to these tables, (b) include a new error-analysis subsection with qualitative examples of complex-lesion cases, and (c) report all numerical values inline when claims of competitiveness are made. These changes will allow readers to directly verify the reported performance. revision: yes
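
The two adjacency diagnostics promised in the first response, spectral gap and edge-weight entropy, can be sketched as follows. The definitions are assumptions, since the rebuttal does not spell them out: the spectral gap is taken here as the gap between the two smallest eigenvalues of the symmetric normalized Laplacian, and the entropy is the Shannon entropy of the normalized off-diagonal edge weights.

```python
import numpy as np

def spectral_gap(adj: np.ndarray) -> float:
    """Gap between the two smallest eigenvalues of the normalized Laplacian."""
    sym = 0.5 * (adj + adj.T)                        # symmetrize a learned adjacency
    deg = sym.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(sym)) - sym * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)                # returned in ascending order
    return float(eigvals[1] - eigvals[0])

def edge_weight_entropy(adj: np.ndarray) -> float:
    """Shannon entropy (nats) of the off-diagonal edge-weight distribution."""
    off_diag = ~np.eye(len(adj), dtype=bool)
    w = np.abs(adj[off_diag])
    total = w.sum()
    if total == 0:
        return 0.0
    p = w / total
    p = p[p > 0]                                     # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# Sanity case: a fully connected 3-node graph with unit weights.
k3 = np.ones((3, 3)) - np.eye(3)
gap_k3 = spectral_gap(k3)
entropy_k3 = edge_weight_entropy(k3)
```

Comparing both quantities on the raw prior and on the post-GCN adjacency would show whether propagation concentrates weight on a few edges (lower entropy) or changes connectivity (different gap), which is the evidence the referee asked for.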

Circularity Check

0 steps flagged

No circularity: derivation relies on external priors and independent validation


The paper's central mechanism (TKI module) takes global disease co-occurrence priors as input and applies a GCN to produce a parameterized weight matrix for topological injection. This step is not self-definitional, as the priors are stated to be external and the GCN output is not equated to the input by construction. No equations or claims in the abstract reduce a prediction to a fitted parameter or rename a known result. No self-citations are invoked as load-bearing uniqueness theorems. The reported experiments on MIMIC-CXR and zero-shot IU X-Ray are presented as empirical outcomes rather than tautological consequences of the modeling choices. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on the existence of stable global disease co-occurrence statistics that can be pre-computed and injected via GCN without domain shift; no free parameters or invented entities are quantified in the abstract.

axioms (1)
  • domain assumption: Disease co-occurrence priors form a useful topological structure that improves diagnostic reasoning when internalized via GCN.
    Invoked in the description of the TKI module as the basis for the parameterized weight matrix.
invented entities (2)
  • Topological Knowledge Internalization (TKI) module (no independent evidence)
    purpose: Generate explicit parameterized weight matrix from co-occurrence priors using GCN.
    New module introduced to inject topological knowledge without external retrieval.
  • Diagnosis-Guided Spatial Attention (DGSA) (no independent evidence)
    purpose: Recalibrate visual encoder using high-dimensional clinical semantics to mitigate feature hallucinations.
    New attention mechanism to close the loop between diagnosis and visual grounding.

pith-pipeline@v0.9.0 · 5580 in / 1352 out tokens · 100025 ms · 2026-05-08T19:23:18.061047+00:00 · methodology


