pith.

arXiv: 2507.15777 · v3 · pith:TRQ6Q2DT · submitted 2025-07-21 · 💻 cs.CV

Label tree semantic losses for rich multi-class medical image segmentation

Pith reviewed 2026-05-22 13:16 UTC · model grok-4.3

classification 💻 cs.CV
keywords: medical image segmentation · semantic loss · label hierarchy · tree-based loss · sparse annotations · brain parcellation · hyperspectral imaging

The pith

Tree-based semantic losses that respect label hierarchies improve multi-class medical image segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two loss functions organized around a tree of labels, so that mistakes between unrelated classes cost more than mistakes within the same branch. The losses are also incorporated into a recent training method for sparse annotations that lack background labels. Experiments cover fully supervised whole-brain parcellation on head MRI and sparsely annotated scene understanding on neurosurgical hyperspectral images. Results show consistent gains over task-specific baselines, with the Wasserstein-based compound loss strongest in the MRI case and hierarchy-weighted top-level supervision strongest in the sparse setting. A reader would care because ordinary losses penalize all errors equally and so ignore the semantic structure of rich clinical label sets: confusing two unrelated structures costs no more than confusing two closely related subregions.

Core claim

Two tree-based semantic loss functions exploit a hierarchical label organization to weight errors by semantic distance. Applied with full supervision to whole-brain parcellation and, combined with sparse-annotation training, to neurosurgical hyperspectral imaging, they produce consistent accuracy gains over baselines, with the Wasserstein-based compound loss and hierarchy-weighted top-level supervision each proving most effective in their respective regimes.
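The phrase "hierarchy-weighted top-level supervision" is not defined in this summary. As a hedged illustration only, one plausible reading is a loss that sums leaf probabilities into coarse top-level groups and adds a weighted cross-entropy term at that level, so that a coarse-only annotation can still supervise the network. The toy tree, the function names, and the weights `w_leaf` and `w_top` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

# Hypothetical toy label tree stored as child -> parent; NOT the paper's hierarchy.
PARENT: Dict[str, str] = {
    "supratentorial": "root",
    "infratentorial": "root",
    "left_cortex": "supratentorial",
    "right_cortex": "supratentorial",
    "cerebellum": "infratentorial",
    "brainstem": "infratentorial",
}
LEAVES: List[str] = ["left_cortex", "right_cortex", "cerebellum", "brainstem"]


def top_level(node: str, parent: Dict[str, str], root: str = "root") -> str:
    """Return the ancestor of `node` that is a direct child of the root."""
    while parent[node] != root:
        node = parent[node]
    return node


def aggregate_top(probs: List[float], leaves: List[str],
                  parent: Dict[str, str]) -> Dict[str, float]:
    """Sum leaf probabilities into their top-level groups."""
    out: Dict[str, float] = {}
    for leaf, p in zip(leaves, probs):
        group = top_level(leaf, parent)
        out[group] = out.get(group, 0.0) + p
    return out


def level_weighted_ce(probs: List[float], target: str, leaves: List[str],
                      parent: Dict[str, str], w_leaf: float = 1.0,
                      w_top: float = 1.0, eps: float = 1e-12) -> float:
    """Hypothetical two-level loss: weighted CE on top-level groups, plus
    weighted CE on leaves when `target` is a fine (leaf) label.

    `target` may be a leaf (fine annotation) or a top-level node (coarse annotation).
    """
    top = aggregate_top(probs, leaves, parent)
    loss = -w_top * math.log(top[top_level(target, parent)] + eps)
    if target in leaves:  # fine-grained label available: add the leaf-level term
        loss -= w_leaf * math.log(probs[leaves.index(target)] + eps)
    return loss
```

Under this reading, predicting `right_cortex` when the truth is `left_cortex` is penalized only at the leaf level, while predicting `cerebellum` also drains mass from the `supratentorial` group and is penalized at the top level as well.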

What carries the argument

Tree-based semantic loss functions that penalize prediction errors according to distance within a pre-specified label hierarchy.
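To make "distance within a label hierarchy" concrete, the sketch below computes the path length between two labels through their lowest common ancestor, with optional per-edge weights. It assumes a toy tree stored as a child-to-parent map and unit edge weights by default; the labels, weights, and function names are hypothetical, and the paper's own distance definition may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy label tree stored as child -> parent; NOT the paper's hierarchy.
PARENT: Dict[str, str] = {
    "supratentorial": "root",
    "infratentorial": "root",
    "left_cortex": "supratentorial",
    "right_cortex": "supratentorial",
    "cerebellum": "infratentorial",
    "brainstem": "infratentorial",
}
LEAVES: List[str] = ["left_cortex", "right_cortex", "cerebellum", "brainstem"]


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, str],
                  weight: Optional[Dict[str, float]] = None) -> float:
    """Sum of edge weights on the path a -> lowest common ancestor -> b.

    `weight[v]` is the weight of the edge from v to its parent (default 1.0).
    """
    weight = weight or {}
    up_b = set(ancestors(b, parent))
    lca = next(n for n in ancestors(a, parent) if n in up_b)

    def cost_to(node: str) -> float:
        # Accumulate edge weights while climbing from `node` to the LCA.
        total = 0.0
        while node != lca:
            total += weight.get(node, 1.0)
            node = parent[node]
        return total

    return cost_to(a) + cost_to(b)


def distance_matrix(leaves: List[str], parent: Dict[str, str],
                    weight: Optional[Dict[str, float]] = None) -> List[List[float]]:
    """Pairwise tree distances between all leaf classes."""
    return [[tree_distance(a, b, parent, weight) for b in leaves] for a in leaves]
```

With unit weights, siblings such as `left_cortex` and `right_cortex` are 2 apart while classes in different branches are 4 apart. Raising the weight on an upper edge (for example `{"supratentorial": 3.0}`) makes cross-branch confusions more expensive while leaving within-branch distances unchanged; such weights are the kind of free parameter whose values the referee asks to see swept.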

If this is right

  • The Wasserstein-based compound loss yields the largest gains on fully supervised whole-brain parcellation.
  • Hierarchy-weighted top-level supervision performs best when annotations are sparse and background-free in hyperspectral imaging.
  • The losses can be dropped into existing sparse-annotation frameworks without changing the underlying network architecture.
  • Rich multi-class medical segmentation becomes more practical once inter-class semantics are explicitly penalized.
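One standard way to turn a tree metric into a loss, shown here only as an assumption-laden sketch, is the 1-Wasserstein distance between the predicted class distribution and the target, with tree path length as the ground cost. On a tree this has a closed form: sum, over every edge, the edge weight times the absolute difference in probability mass lying below that edge. This does not reproduce the paper's compound loss, whose other terms are not described here; the toy tree and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy label tree stored as child -> parent; NOT the paper's hierarchy.
PARENT: Dict[str, str] = {
    "supratentorial": "root",
    "infratentorial": "root",
    "left_cortex": "supratentorial",
    "right_cortex": "supratentorial",
    "cerebellum": "infratentorial",
    "brainstem": "infratentorial",
}
LEAVES: List[str] = ["left_cortex", "right_cortex", "cerebellum", "brainstem"]


def subtree_mass(dist: List[float], leaves: List[str],
                 parent: Dict[str, str]) -> Dict[str, float]:
    """Probability mass below each non-root node (sum over leaves in its subtree)."""
    mass: Dict[str, float] = {}
    for leaf, p in zip(leaves, dist):
        node = leaf
        while node in parent:  # the root has no parent edge, so it is skipped
            mass[node] = mass.get(node, 0.0) + p
            node = parent[node]
    return mass


def tree_wasserstein(p: List[float], q: List[float], leaves: List[str],
                     parent: Dict[str, str],
                     weight: Optional[Dict[str, float]] = None) -> float:
    """Exact 1-Wasserstein distance under the tree metric: the sum over edges
    (v -> parent[v]) of weight[v] * |mass of p below v - mass of q below v|."""
    weight = weight or {}
    mp = subtree_mass(p, leaves, parent)
    mq = subtree_mass(q, leaves, parent)
    return sum(weight.get(v, 1.0) * abs(mp.get(v, 0.0) - mq.get(v, 0.0))
               for v in set(mp) | set(mq))
```

For a one-hot target this equals the expected tree distance from the true class. Under unit weights, putting all mass on a sibling (`right_cortex`) costs 2, while putting it on a class in the other branch (`cerebellum`) costs 4, which is the behavior the bullets above rely on.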

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same tree-loss idea could be tested on other clinical tasks whose labels form natural taxonomies, such as multi-organ CT segmentation with substructures.
  • If the hierarchy could be inferred from data instead of supplied by experts, the method would require less manual curation.
  • Better segmentation of fine anatomical classes may directly improve downstream clinical metrics such as surgical navigation precision or post-operative outcome prediction.

Load-bearing premise

The label hierarchy is specified correctly in advance and its branch distances genuinely reflect semantic dissimilarity among the classes.

What would settle it

Replacing the given label tree with a random or flat hierarchy on the same two tasks and observing no remaining improvement over standard losses would show that the semantic structure itself is not responsible for the reported gains.
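The control described here can be set up mechanically. The sketch below, assuming the hierarchy is stored as a child-to-parent map, builds a flat hierarchy (every class attached directly to the root, so all distinct classes are equidistant) and a label-shuffled hierarchy (same tree shape, classes randomly reassigned to leaf positions). The function names and the toy tree are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List

# Hypothetical toy label tree stored as child -> parent; NOT the paper's hierarchy.
PARENT: Dict[str, str] = {
    "supratentorial": "root",
    "infratentorial": "root",
    "left_cortex": "supratentorial",
    "right_cortex": "supratentorial",
    "cerebellum": "infratentorial",
    "brainstem": "infratentorial",
}
LEAVES: List[str] = ["left_cortex", "right_cortex", "cerebellum", "brainstem"]


def flat_tree(leaves: List[str], root: str = "root") -> Dict[str, str]:
    """Attach every class directly to the root: a hierarchy with no semantic grouping."""
    return {leaf: root for leaf in leaves}


def shuffled_tree(parent: Dict[str, str], leaves: List[str], seed: int = 0) -> Dict[str, str]:
    """Keep the tree shape but randomly reassign which class occupies each leaf slot."""
    rng = random.Random(seed)
    perm = leaves[:]
    rng.shuffle(perm)
    new_parent = {k: v for k, v in parent.items() if k not in leaves}  # internal edges
    for slot, cls in zip(leaves, perm):
        new_parent[cls] = parent[slot]  # class `cls` now sits where `slot` was
    return new_parent


def hop_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of edges on the path between a and b."""
    def path(n: str) -> List[str]:
        out = [n]
        while out[-1] in parent:
            out.append(parent[out[-1]])
        return out

    pa, pb = path(a), path(b)
    return min(pa.index(c) + pb.index(c) for c in set(pa) & set(pb))


def pairwise(parent: Dict[str, str], leaves: List[str]) -> List[int]:
    """Sorted distances over all unordered class pairs."""
    return sorted(hop_distance(a, b, parent) for a, b in combinations(leaves, 2))
```

The shuffled tree preserves the multiset of pairwise distances, so any loss built on it has the same overall "shape" as the original; only the assignment of semantics changes. Training with `flat_tree` and several `shuffled_tree` seeds in place of the true hierarchy would therefore help separate the contribution of semantic structure from that of the loss form itself.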

Figures

Figures reproduced from arXiv: 2507.15777 by Aaron Kujawa, Jonathan Shapey, Junwen Wang, Oscar MacCormac, Tom Vercauteren, William Rochford.

Figure 1: The neuro-anatomical label hierarchy of the Mindboggle dataset. From […]
Figure 2: Visual comparison of the baseline loss and the proposed Wasserstein-based loss on the AOMIC dataset. Each column shows the predicted segmentation …
Figure 3: Qualitative result on top-level classes. We show the result of the same image using different methods at confidence threshold …
Figure 4: Confusion matrices for the WBP and HSI tasks. For WBP, the evaluation is on 11 hard classes of the AOMIC dataset. Class names from […]
Original abstract

Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting precise post-operative assessment. However, commonly used learning methods for medical and surgical imaging segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the label space. This becomes particularly problematic as the cardinality and richness of labels increase to include subtly different classes. In this work, we propose two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels. We further incorporate our losses in a recently proposed approach for training with sparse, background-free annotations to extend the applicability of our proposed losses. Extensive experiments are reported on two medical and surgical imaging segmentation tasks, namely head MRI for whole-brain parcellation with full supervision and neurosurgical hyperspectral imaging for scene understanding with sparse annotations. Results demonstrate consistent improvements over the evaluated task-specific baselines, with the strongest support for the Wasserstein-based compound loss in whole-brain parcellation and for hierarchy-weighted top-level supervision in the sparse HSI setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two tree-based semantic loss functions that exploit a hierarchical organization of labels to penalize errors according to semantic distances rather than treating all misclassifications equally. These losses are integrated into a sparse, background-free annotation training framework and evaluated on two tasks: fully supervised whole-brain parcellation from head MRI and sparsely annotated neurosurgical hyperspectral imaging for scene understanding. The abstract reports consistent improvements over task-specific baselines, with the strongest support for the Wasserstein-based compound loss in the first task and hierarchy-weighted top-level supervision in the second.

Significance. If the empirical gains prove robust, the work offers a practical way to inject label semantics into segmentation training for rich medical label spaces, which could improve utility for pre-operative planning and intra-operative guidance. The combination with sparse-annotation methods broadens applicability. The paper supplies reproducible experimental protocols on two distinct clinical tasks and reports gains for specific loss variants, which are positive attributes.

major comments (2)
  1. Abstract and experimental description: the central claim of consistent improvements attributable to semantic hierarchy is load-bearing, yet the manuscript provides no ablation on alternative hierarchies, no flat (non-hierarchical) baseline comparison, and no sensitivity checks on tree edge weights or scaling factors. Without these, it is unclear whether the reported gains stem from the semantic structure or from generic regularization effects.
  2. Methods and results sections: the hierarchy is treated as a fixed, correctly specified input whose distances match clinical importance, but no quantitative validation or clinical-expert agreement study is described. If the tree does not align with task-relevant semantics (e.g., functional vs. gross anatomical grouping), the Wasserstein and hierarchy-weighted losses lose their claimed advantage.
minor comments (2)
  1. Add error bars, statistical significance tests, and full ablation tables (including per-class metrics) to the experimental results so that robustness to post-hoc choices can be assessed.
  2. Clarify the precise mathematical definitions of the two proposed losses (including any weighting schemes or distance metrics on the tree) with explicit equations and pseudocode for reproducibility.
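For the error bars requested in minor comment 1, one common option (not something the paper is stated to use) is a paired bootstrap over test cases: resample cases with replacement and take percentiles of the mean per-case difference in a metric such as Dice between two methods. A minimal standard-library sketch; the function and parameter names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_ci(scores_a: List[float], scores_b: List[float],
                        n_boot: int = 2000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Point estimate and percentile CI for mean(scores_a - scores_b).

    Cases are resampled with replacement, keeping each (a, b) pair together,
    so per-case difficulty is shared by both methods.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per case")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # fixed seed for a reproducible interval
    n = len(diffs)
    boots = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), (lo, hi)
```

An interval that excludes zero would indicate that the improvement is unlikely to be explained by which cases happened to be in the test set; per-class versions of the same procedure would address the request for per-class metrics.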

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and describe the revisions we will make to improve the clarity and robustness of the claims.

  1. Referee: Abstract and experimental description: the central claim of consistent improvements attributable to semantic hierarchy is load-bearing, yet the manuscript provides no ablation on alternative hierarchies, no flat (non-hierarchical) baseline comparison, and no sensitivity checks on tree edge weights or scaling factors. Without these, it is unclear whether the reported gains stem from the semantic structure or from generic regularization effects.

    Authors: We agree that the absence of these controls leaves the source of the gains open to interpretation. In the revised manuscript we will add (i) a flat baseline that replaces the tree-based losses with standard cross-entropy, (ii) a sensitivity analysis sweeping the edge-weight and scaling hyperparameters, and (iii) a short discussion of the clinical rationale for the chosen hierarchies together with a note that systematic exploration of alternative trees remains future work. These additions will directly test whether the observed improvements are attributable to the semantic structure. revision: yes

  2. Referee: Methods and results sections: the hierarchy is treated as a fixed, correctly specified input whose distances match clinical importance, but no quantitative validation or clinical-expert agreement study is described. If the tree does not align with task-relevant semantics (e.g., functional vs. gross anatomical grouping), the Wasserstein and hierarchy-weighted losses lose their claimed advantage.

    Authors: The label trees were constructed from established anatomical and functional parcellation schemes reported in the neuroimaging and neurosurgical literature, as stated in the Methods section. We acknowledge that a formal inter-expert agreement study would provide stronger quantitative support. Because such a study would require new expert consultations and data collection outside the present scope, we will instead add a dedicated limitations paragraph that (a) details the literature sources used to define the hierarchies, (b) discusses the risk of misalignment with alternative clinical groupings, and (c) qualifies the claims accordingly. This revision will make the dependence on hierarchy specification explicit without overstating the current evidence. revision: partial
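The flat baseline and the hyperparameter sweep promised in response 1 can be framed as one family of losses. In the hedged sketch below, a cross-entropy term is combined additively with a weight `lam` times a tree-Wasserstein term, so `lam = 0` recovers the flat cross-entropy baseline. For a one-hot target, the Wasserstein term reduces to the expected tree distance from the true class. The distance matrix comes from a hypothetical four-class toy tree, and this additive form is an assumption, not the paper's stated compound loss.

```python
import math
from typing import List

# Hypothetical tree distances for a toy 4-class hierarchy with unit edge weights:
# classes 0 and 1 share one parent, classes 2 and 3 share another.
D: List[List[float]] = [
    [0.0, 2.0, 4.0, 4.0],
    [2.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 2.0],
    [4.0, 4.0, 2.0, 0.0],
]


def compound_loss(probs: List[float], y: int, dist: List[List[float]],
                  lam: float, eps: float = 1e-12) -> float:
    """Cross-entropy plus lam times the 1-Wasserstein distance to a one-hot target y."""
    ce = -math.log(probs[y] + eps)
    wass = sum(p * d for p, d in zip(probs, dist[y]))  # expected tree distance from y
    return ce + lam * wass


def sweep(probs: List[float], y: int, dist: List[List[float]],
          lams: List[float]) -> List[float]:
    """Loss value for each candidate lam; lam = 0 is the flat cross-entropy baseline."""
    return [compound_loss(probs, y, dist, lam) for lam in lams]
```

With the true class 0, a prediction splitting its mass between class 0 and its sibling (class 1) and one splitting it between class 0 and class 2 in the other branch receive identical losses at `lam = 0`; any positive `lam` penalizes the cross-branch confusion more. Reporting results over a grid of `lam` values is one way to present the promised sensitivity analysis.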

Circularity Check

0 steps flagged

No significant circularity; empirical gains from proposed losses on pre-specified hierarchy


The paper defines two new loss functions (Wasserstein-based and hierarchy-weighted) that operate on an externally provided label tree. These are incorporated into training and evaluated via experiments on whole-brain parcellation and hyperspectral imaging tasks, with reported improvements over task-specific baselines. No equation reduces a claimed prediction to a fitted parameter or input by construction. The hierarchy is treated as a fixed input rather than derived from the same data or equations. A citation to a prior sparse-annotation method exists but is not load-bearing for the central empirical claim and does not form a self-referential chain. The central claim is tested against external benchmarks rather than against quantities derived from the method itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach depends on an externally provided label hierarchy and on the validity of the sparse annotation training framework it extends; no new entities are postulated.

free parameters (1)
  • tree edge weights or hierarchy scaling factors
    Likely parameters that control how much semantic distance affects the loss; their values are not stated in the abstract.
axioms (1)
  • domain assumption: a pre-defined hierarchical organization of the label space exists and reflects clinically meaningful semantic relationships.
    Invoked when the losses are defined to exploit inter-class semantics.



