Graph-of-Differences: Anatomy-Structured Difference Alignment for Medical Image Re-Identification
Pith reviewed 2026-06-26 14:34 UTC · model grok-4.3
The pith
Representing medical images as anatomy graphs and aligning differences over matched nodes improves re-identification accuracy and generalization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GoD represents each image as an anatomy graph, establishes soft node correspondence for image pairs, computes differences over matched anatomy, and uses a graph-level difference alignment objective to tie these to the global backbone difference, ensuring the retrieval signal is anchored in homologous structures.
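The pipeline described above can be sketched in a few lines of plain Python (lists only, to stay dependency-free). This is an illustration, not the authors' implementation: the cosine-softmax form of the correspondence, the `temperature` default, the mean pooling of node differences, the `1 - cosine` alignment loss, and all function names are assumptions. The sketch also assumes node features and global backbone features share the same dimensionality.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def soft_correspondence(nodes_a, nodes_b, temperature=0.1):
    """Row-stochastic matrix P; P[i][j] is how strongly node i of image A
    matches node j of image B (assumed: softmax over cosine similarity)."""
    P = []
    for u in nodes_a:
        logits = [cosine(u, v) / temperature for v in nodes_b]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        P.append([e / z for e in exps])
    return P

def matched_differences(nodes_a, nodes_b, P):
    """Difference between each node of A and its soft match in B."""
    dim = len(nodes_a[0])
    diffs = []
    for u, row in zip(nodes_a, P):
        match = [sum(w * v[k] for w, v in zip(row, nodes_b)) for k in range(dim)]
        diffs.append([u[k] - match[k] for k in range(dim)])
    return diffs

def alignment_loss(diffs, global_a, global_b):
    """Assumed alignment objective: 1 - cosine between the mean node
    difference and the global backbone difference."""
    dim = len(global_a)
    graph_diff = [sum(d[k] for d in diffs) / len(diffs) for k in range(dim)]
    global_diff = [a - b for a, b in zip(global_a, global_b)]
    return 1.0 - cosine(graph_diff, global_diff)
```

A lower `temperature` pushes each row of `P` toward a hard one-to-one match, while a higher value spreads the match across nodes; the value actually used by the paper is not stated in the material reviewed here.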
What carries the argument
The anatomy graph with soft node correspondence, together with the graph-level difference alignment objective that anchors differences to named structures.
If this is right
- On internal benchmarks, Rank-1 accuracy rises by 7.1 percentage points on fundus images and 3.1 percentage points on chest X-ray (CXR) images over a frozen-backbone baseline.
- Gains extend to zero-shot external transfers, indicating better generalization.
- Explanations become verifiable through node insertion and deletion tests on named graph nodes.
- The method reduces vulnerability to shortcut learning from non-anatomical features.
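For readers unfamiliar with the metric behind the headline numbers, the following is a minimal sketch of how Rank-1 accuracy is commonly computed in re-identification. The function name, the Euclidean distance, and the query/gallery split are assumptions for illustration; the paper's exact evaluation protocol is not described in the material reviewed here.

```python
import math

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery item has the same patient ID."""
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        # Index of the gallery embedding closest to this query.
        best = min(range(len(gallery_feats)),
                   key=lambda j: math.dist(q, gallery_feats[j]))
        hits += gallery_ids[best] == qid
    return hits / len(query_feats)

# Toy embeddings, invented purely to exercise the function.
queries = [[0.0, 0.0], [5.0, 5.0]]
gallery = [[0.1, 0.0], [5.0, 4.9], [9.0, 9.0]]
acc = rank1_accuracy(queries, ["p1", "p2"], gallery, ["p1", "p2", "p3"])
```

A gain reported in percentage points is then `100 * (acc_method - acc_baseline)`, an absolute difference rather than a relative one.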
Where Pith is reading between the lines
- The approach may apply to other tasks requiring anatomical consistency, such as segmentation or registration.
- Structured explanations could support regulatory requirements for AI in healthcare.
- Performance might further improve if node correspondences incorporate domain-specific priors.
Load-bearing premise
Soft node correspondence between anatomy graphs from different images can be established reliably enough that the resulting differences are meaningful and not dominated by correspondence errors.
What would settle it
The central claim would be falsified if the anatomy graph and alignment components, isolated by ablation, yield no improvement over (or perform worse than) the frozen-backbone baseline, or if correspondence errors make the matched differences meaningless.
Original abstract
Medical image re-identification (MedReID) enables longitudinal patient linkage but remains vulnerable to shortcut learning and often produces decisions that clinicians cannot audit against named anatomy. We propose Graph-of-Differences (GoD), which grounds identity comparisons in explicit anatomical structure. Each image is represented as an anatomy graph whose nodes correspond to named anatomical regions; given an image pair, soft node correspondence is established, and differences are computed over matched anatomy. A graph-level difference alignment objective ties these anatomy-matched differences to the global backbone difference, ensuring the retrieval signal is anchored in homologous structures rather than arbitrary spatial tokens. Explanations are defined over named graph nodes and quantitatively audited via node insertion/deletion tests, replacing unstable pixel heatmaps with verifiable structure-level evidence. On internal benchmarks, GoD improves Rank-1 by +7.1 pp on fundus and +3.1 pp on CXR over a strong frozen-backbone baseline, with further gains on zero-shot external transfers confirming that anatomy grounding improves both accuracy and generalization. Code is available at https://github.com/GenMI-Lab/GoD.git.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Graph-of-Differences (GoD) for medical image re-identification (MedReID). Images are represented as anatomy graphs with nodes for named anatomical regions. For pairs, soft node correspondence is established and differences computed over matched nodes; a graph-level alignment objective ties these anatomy-specific differences to the global backbone difference. This is claimed to reduce shortcut learning, improve Rank-1 by +7.1 pp (fundus) and +3.1 pp (CXR) over a frozen-backbone baseline, yield further zero-shot external gains, and enable auditable explanations via named nodes with insertion/deletion tests. Code is released.
Significance. If the results hold after addressing validation gaps, the contribution would be significant for MedReID by replacing opaque spatial-token comparisons with explicit, named anatomical structure. The reported accuracy and generalization improvements, combined with structure-level explanations and public code, would advance both performance and clinical auditability in longitudinal patient linkage tasks.
Major comments (2)
- [Method] Method section (description of soft node correspondence and graph-level alignment): the central claim attributes performance gains to anatomy-grounded differences, yet no quantitative validation of correspondence quality (e.g., precision/recall on held-out annotated node pairs) or ablation replacing learned correspondence with random/uniform matching is provided; without these, it is impossible to confirm that reported improvements arise from meaningful anatomical variation rather than the auxiliary alignment loss or backbone features.
- [Experiments] Experiments section (internal benchmarks and zero-shot transfers): the +7.1 pp and +3.1 pp Rank-1 gains and external-transfer results are presented without error bars, statistical significance tests, or explicit baseline definitions; this weakens the ability to assess whether the anatomy-grounding component is the load-bearing driver of the claimed generalization benefit.
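The correspondence checks requested in the first major comment could be set up roughly as follows. This is a hedged sketch: `uniform_correspondence`, `random_correspondence`, and `top1_match_accuracy` are hypothetical helpers, and `gold` stands for annotated node matches that, per the report, do not currently exist in the datasets.

```python
import random

def uniform_correspondence(n_a, n_b):
    """Ablation baseline: every node of A matches every node of B equally."""
    return [[1.0 / n_b] * n_b for _ in range(n_a)]

def random_correspondence(n_a, n_b, seed=0):
    """Ablation baseline: each node of A is hard-matched to a random node of B."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_a):
        j = rng.randrange(n_b)
        P.append([1.0 if k == j else 0.0 for k in range(n_b)])
    return P

def top1_match_accuracy(P, gold):
    """Fraction of A-nodes whose strongest match equals the annotated match.
    gold[i] is the index in B annotated as homologous to node i of A."""
    hits = 0
    for row, g in zip(P, gold):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == g
    return hits / len(gold)
```

Swapping the learned matrix for either baseline while keeping the rest of the model fixed isolates how much of the Rank-1 gain depends on correspondence quality rather than on the auxiliary loss alone.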
Minor comments (2)
- [Abstract] Abstract and Experiments: the description of the frozen-backbone baseline and the temperature parameter in soft correspondence lack implementation specifics that would aid reproducibility.
- [Explanations] Explanations section: while node insertion/deletion tests are mentioned, the quantitative auditing protocol (e.g., how many nodes, how deletion is performed) should be detailed with pseudocode or equations for clarity.
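One way the requested auditing protocol could be written down is sketched below. It is an assumption about the general shape of node insertion/deletion tests, not the paper's protocol: `score_fn` is a hypothetical callable that returns the pair-matching score when only the given set of node indices is active, and how a node is "deleted" (masked, zeroed, dropped from the graph) is left to that callable.

```python
def deletion_curve(score_fn, num_nodes, order):
    """Score after removing nodes one at a time, in the given order."""
    active = set(range(num_nodes))
    curve = [score_fn(frozenset(active))]
    for i in order:
        active.discard(i)
        curve.append(score_fn(frozenset(active)))
    return curve

def insertion_curve(score_fn, num_nodes, order):
    """Score after adding nodes one at a time to an empty graph."""
    active = set()
    curve = [score_fn(frozenset(active))]
    for i in order:
        active.add(i)
        curve.append(score_fn(frozenset(active)))
    return curve

def area_under_curve(curve):
    """Trapezoidal area with the x axis normalized to [0, 1]."""
    steps = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(steps)) / steps

# Toy example with invented, additive node contributions.
contrib = [4, 2, 1]
toy_score = lambda active: sum(contrib[i] for i in active)
by_importance = deletion_curve(toy_score, 3, [0, 1, 2])  # [7, 3, 1, 0]
reverse_order = deletion_curve(toy_score, 3, [2, 1, 0])  # [7, 6, 4, 0]
```

If the explanation ranks nodes faithfully, deleting them in ranked order should drive the score down faster than a random or reversed order, giving a smaller deletion area (2.5 versus 4.5 in the toy case) and, symmetrically, a larger insertion area.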
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments, which highlight important aspects for validating the method and strengthening the experimental presentation. We provide point-by-point responses below and will make revisions to the manuscript as indicated.
- Referee: [Method] Method section (description of soft node correspondence and graph-level alignment): the central claim attributes performance gains to anatomy-grounded differences, yet no quantitative validation of correspondence quality (e.g., precision/recall on held-out annotated node pairs) or ablation replacing learned correspondence with random/uniform matching is provided; without these, it is impossible to confirm that reported improvements arise from meaningful anatomical variation rather than the auxiliary alignment loss or backbone features.
  Authors: We agree that additional validation of the soft node correspondence would be beneficial. While the manuscript uses insertion/deletion tests on named nodes to audit explanations, we did not include direct metrics like precision/recall on annotated correspondences or an ablation with random matching. In the revised version, we will include an ablation study comparing the learned correspondence against random and uniform matching baselines, reporting the impact on Rank-1 accuracy. This will help confirm that the gains stem from the anatomy-structured differences. Note that creating held-out annotated node pairs would require new annotations not present in the original datasets, so we focus on the ablation instead.
  Revision: yes
- Referee: [Experiments] Experiments section (internal benchmarks and zero-shot transfers): the +7.1 pp and +3.1 pp Rank-1 gains and external-transfer results are presented without error bars, statistical significance tests, or explicit baseline definitions; this weakens the ability to assess whether the anatomy-grounding component is the load-bearing driver of the claimed generalization benefit.
  Authors: The current manuscript presents the gains without error bars or significance tests, which is a valid observation. We will revise the experiments section to include results averaged over multiple random seeds with standard deviations (error bars), perform statistical significance tests (e.g., t-tests) on the improvements, and provide explicit definitions of the baselines in the text, tables, and captions. This will better demonstrate the contribution of the anatomy-grounding component.
  Revision: yes
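The multi-seed reporting promised in the second response might look like the sketch below. It uses only the Python standard library; the helper names are hypothetical, and pairing the method and baseline by seed is an assumption about how the runs would be organized.

```python
import math
import statistics

def summarize(scores):
    """Mean and sample standard deviation across seeds (for error bars)."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(method_scores, baseline_scores):
    """Paired t statistic and degrees of freedom for per-seed differences.
    Undefined (division by zero) if all differences are identical."""
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Placeholder numbers, invented only to exercise the functions;
# they are not results from the paper.
method = [3.0, 5.0, 4.0]
baseline = [1.0, 2.0, 3.0]
t_stat, dof = paired_t_statistic(method, baseline)
```

Turning the statistic into a p-value requires the Student t distribution, which the standard library does not provide; `scipy.stats.ttest_rel` computes both in one call when SciPy is available.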
Circularity Check
No significant circularity in derivation chain
The paper defines GoD via anatomy graphs, soft node correspondence, per-node differences, and a graph-level alignment loss that anchors to backbone features; these are design choices and training objectives, not self-definitions or fitted inputs renamed as predictions. Reported Rank-1 gains (+7.1 pp fundus, +3.1 pp CXR) and zero-shot transfers are empirical measurements against a frozen-backbone baseline, not reductions by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the abstract or described method. The derivation remains self-contained against external benchmarks.