Label tree semantic losses for rich multi-class medical image segmentation

Aaron Kujawa; Jonathan Shapey; Junwen Wang; Oscar MacCormac; Tom Vercauteren; William Rochford

arxiv: 2507.15777 · v3 · pith:TRQ6Q2DTnew · submitted 2025-07-21 · 💻 cs.CV

Junwen Wang, Oscar MacCormac, William Rochford, Aaron Kujawa, Jonathan Shapey, Tom Vercauteren

Pith reviewed 2026-05-22 13:16 UTC · model grok-4.3

classification 💻 cs.CV

keywords: medical image segmentation, semantic loss, label hierarchy, tree-based loss, sparse annotations, brain parcellation, hyperspectral imaging

The pith

Tree-based semantic losses that respect label hierarchies improve multi-class medical image segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two loss functions organized around a tree of labels so that mistakes between unrelated classes cost more than mistakes within the same branch. These losses are added to a recent training method that works with sparse annotations lacking background labels. Experiments cover full-supervision whole-brain parcellation on head MRI and sparse-annotation scene understanding on neurosurgical hyperspectral images. Results show steady gains over ordinary task-specific baselines, with the Wasserstein compound loss strongest in the MRI case and hierarchy-weighted supervision strongest in the sparse setting. A reader would care because ordinary losses ignore the natural semantic structure of rich clinical label sets and therefore waste capacity on implausible errors.

Core claim

Two tree-based semantic loss functions exploit a hierarchical label organization to weight errors by semantic distance; when combined with sparse-annotation training, they produce consistent accuracy gains over baselines on whole-brain parcellation and neurosurgical hyperspectral imaging, with the Wasserstein-based compound loss and hierarchy-weighted top-level supervision each proving most effective in their respective regimes.

What carries the argument

Tree-based semantic loss functions that penalize prediction errors according to distance within a pre-specified label hierarchy.
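
As a rough illustration of what "distance within a label hierarchy" can mean, the sketch below counts the edges on the path between two labels in a small tree. The toy hierarchy, the names `PARENT`, `LEAVES`, `path_to_root` and `tree_distance`, and the unit edge weights are all assumptions made for this example; the page does not state the paper's actual trees or weights.

```python
from typing import Dict, List, Optional


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(a: str, b: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of edges on the unique path between a and b (unit edge weights)."""
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for steps_a, n in enumerate(path_to_root(a, parent)):
        if n in depth_in_b:  # first shared ancestor is the lowest common one
            return steps_a + depth_in_b[n]
    raise ValueError("nodes are not in the same tree")


# Hypothetical toy hierarchy, NOT the label tree used in the paper.
PARENT = {
    "root": None,
    "tissue": "root",
    "instrument": "root",
    "grey_matter": "tissue",
    "white_matter": "tissue",
    "forceps": "instrument",
}
LEAVES = ["grey_matter", "white_matter", "forceps"]

# Pairwise leaf-to-leaf distances, usable as a ground cost for a loss.
DIST = [[tree_distance(a, b, PARENT) for b in LEAVES] for a in LEAVES]
```

In this toy tree, confusing grey matter with white matter costs 2, while confusing either with forceps costs 4, which is the kind of ordering the losses are meant to encode.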

If this is right

The Wasserstein-based compound loss yields the largest gains on fully supervised whole-brain parcellation.
Hierarchy-weighted top-level supervision performs best when annotations are sparse and background-free in hyperspectral imaging.
The losses can be dropped into existing sparse-annotation frameworks without changing the underlying network architecture.
Rich multi-class medical segmentation becomes more practical once inter-class semantics are explicitly penalized.
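
The two loss families named above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: `expected_tree_distance` uses the standard fact that the Wasserstein distance to a one-hot target equals the expected ground distance to that target, and `top_level_cross_entropy` assumes that top-level supervision sums leaf probabilities within each coarse class. The function names, `DIST`, and `LEAF_TO_TOP` are hypothetical, and the compound terms and weighting used in the paper are not given on this page.

```python
import math
from typing import Sequence


def expected_tree_distance(
    probs: Sequence[float], target: int, dist: Sequence[Sequence[float]]
) -> float:
    """Wasserstein distance between a prediction and a one-hot target.

    All probability mass has to move to `target`, so the transport cost
    is sum_l probs[l] * dist[l][target].
    """
    return sum(p * dist[l][target] for l, p in enumerate(probs))


def top_level_cross_entropy(
    probs: Sequence[float], target: int, leaf_to_top: Sequence[int],
    eps: float = 1e-12,
) -> float:
    """Cross-entropy on coarse classes, obtained by summing leaf probabilities."""
    top_target = leaf_to_top[target]
    mass = sum(p for l, p in enumerate(probs) if leaf_to_top[l] == top_target)
    return -math.log(max(mass, eps))


# Hypothetical 3-leaf example: leaves 0 and 1 are siblings, leaf 2 sits in
# another branch (assumed distances, not taken from the paper).
DIST = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
LEAF_TO_TOP = [0, 0, 1]

near_miss = [0.1, 0.9, 0.0]  # most mass on the sibling of the true class
far_miss = [0.1, 0.0, 0.9]   # most mass on an unrelated branch

wass_near = expected_tree_distance(near_miss, 0, DIST)
wass_far = expected_tree_distance(far_miss, 0, DIST)
top_near = top_level_cross_entropy(near_miss, 0, LEAF_TO_TOP)
top_far = top_level_cross_entropy(far_miss, 0, LEAF_TO_TOP)
```

Both predictions give the true class the same probability, so plain cross-entropy would score them identically; the two sketched terms penalise the far miss more than the near miss.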

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tree-loss idea could be tested on other clinical tasks whose labels form natural taxonomies, such as multi-organ CT segmentation with substructures.
If the hierarchy could be inferred from data instead of supplied by experts, the method would require less manual curation.
Better segmentation of fine anatomical classes may directly improve downstream clinical metrics such as surgical navigation precision or post-operative outcome prediction.

Load-bearing premise

The label hierarchy is specified correctly in advance and its branch distances genuinely reflect semantic dissimilarity among the classes.

What would settle it

Replacing the given label tree with a random or flat hierarchy on the same two tasks and observing no remaining improvement over standard losses would show that the semantic structure itself is not responsible for the reported gains.
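
One way such a control could be set up, sketched here under assumptions: keep the training code fixed and swap only the class-to-class distance matrix, once for a flat matrix and once for the original tree with classes randomly reassigned to leaves. The helper names `flat_distance` and `shuffled_distance` and the toy matrix `DIST` are hypothetical and do not come from the paper.

```python
import random
from typing import List, Sequence


def flat_distance(n: int) -> List[List[int]]:
    """Every pair of distinct classes is equally far apart (0-1 cost)."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def shuffled_distance(dist: Sequence[Sequence[int]], seed: int) -> List[List[int]]:
    """Keep the tree shape but assign classes to leaves at random."""
    n = len(dist)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return [[dist[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Hypothetical tree distances for three classes (assumed, not from the paper).
DIST = [[0, 2, 4], [2, 0, 4], [4, 4, 0]]

FLAT = flat_distance(len(DIST))
SHUFFLED = shuffled_distance(DIST, seed=0)
```

With the flat matrix, the expected-distance penalty for a pixel reduces to one minus the probability of the true class, so any gain that survives this swap cannot be credited to the hierarchy. The shuffled matrix keeps the same set of distances but breaks their link to anatomy, which separates "some structure" from "the right structure".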

Figures

Figures reproduced from arXiv: 2507.15777 by Aaron Kujawa, Jonathan Shapey, Junwen Wang, Oscar MacCormac, Tom Vercauteren, William Rochford.

**Figure 2.** Visual comparison of the baseline loss and the proposed Wasserstein-based loss on the AOMIC dataset. Each column shows the predicted segmentation ...

**Figure 3.** Qualitative result on top-level classes. We show the result of the same image using different methods at confidence threshold ...

**Figure 4.** Confusion matrices for the WBP and HSI tasks. For WBP, the evaluation is on 11 hard classes of the AOMIC dataset. Class names from ...

Original abstract

Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting precise post-operative assessment. However, commonly used learning methods for medical and surgical imaging segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the label space. This becomes particularly problematic as the cardinality and richness of labels increases to include subtly different classes. In this work, we propose two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels. We further incorporate our losses in a recently proposed approach for training with sparse, background-free annotations to extend the applicability of our proposed losses. Extensive experiments are reported on two medical and surgical imaging segmentation tasks, namely head MRI for whole brain parcellation with full supervision and neurosurgical hyperspectral imaging for scene understanding with sparse annotations. Results demonstrate consistent improvements over the evaluated task-specific baselines, with the strongest support for the Wasserstein-based compound loss in whole-brain parcellation and for hierarchy-weighted top-level supervision in the sparse HSI setting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Tree-based losses give some gains on medical tasks but the results hinge on an untested label hierarchy.

The key point is that the authors introduce two tree-based losses that use a label hierarchy to make some misclassifications cost more than others, and they show that these help in both fully supervised brain parcellation and sparsely annotated hyperspectral imaging. What is new is the medical context combined with the sparse-annotation twist. Standard losses treat every wrong label equally, which wastes the structure when you have many similar classes. Their approach lets you encode that confusing two nearby structures is less bad than mixing up unrelated ones. They do a decent job of testing on two different medical tasks, which adds some credibility. The Wasserstein version seems to stand out in the fully supervised case.

The soft spot is the lack of checks on the hierarchy itself. The method assumes the tree is correctly specified and that the distances match what matters clinically. Without ablations on different trees or a direct flat baseline comparison, it is not clear whether the gains come from the semantics or just from the extra structure in the loss. The abstract mentions consistent improvements but skips details such as variance or full ablation tables. The math looks like a reasonable extension of existing hierarchical ideas, and the citation pattern covers the relevant prior work without obvious gaps.

This paper is for people working on rich multi-class segmentation in medicine or surgery who already have, or can create, a label tree. A reader could take the losses and apply them if their labels have natural groupings. It deserves a serious referee because the problem it targets is real and the proposed fix is straightforward to try. More experiments on hierarchy sensitivity would make it stronger, but the current version is worth reviewing. I would recommend sending it out for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes two tree-based semantic loss functions that exploit a hierarchical organization of labels to penalize errors according to semantic distances rather than treating all misclassifications equally. These losses are integrated into a sparse, background-free annotation training framework and evaluated on two tasks: fully supervised whole-brain parcellation from head MRI and sparsely annotated neurosurgical hyperspectral imaging for scene understanding. The abstract reports consistent improvements over task-specific baselines, with the strongest support for the Wasserstein-based compound loss in the first task and hierarchy-weighted top-level supervision in the second.

Significance. If the empirical gains prove robust, the work offers a practical way to inject label semantics into segmentation training for rich medical label spaces, which could improve utility for pre-operative planning and intra-operative guidance. The combination with sparse-annotation methods broadens applicability. The paper supplies reproducible experimental protocols on two distinct clinical tasks and reports gains for specific loss variants, which are positive attributes.

major comments (2)

Abstract and experimental description: the central claim of consistent improvements attributable to semantic hierarchy is load-bearing, yet the manuscript provides no ablation on alternative hierarchies, no flat (non-hierarchical) baseline comparison, and no sensitivity checks on tree edge weights or scaling factors. Without these, it is unclear whether the reported gains stem from the semantic structure or from generic regularization effects.
Methods and results sections: the hierarchy is treated as a fixed, correctly specified input whose distances match clinical importance, but no quantitative validation or clinical-expert agreement study is described. If the tree does not align with task-relevant semantics (e.g., functional vs. gross anatomical grouping), the Wasserstein and hierarchy-weighted losses lose their claimed advantage.

minor comments (2)

Add error bars, statistical significance tests, and full ablation tables (including per-class metrics) to the experimental results so that robustness to post-hoc choices can be assessed.
Clarify the precise mathematical definitions of the two proposed losses (including any weighting schemes or distance metrics on the tree) with explicit equations and pseudocode for reproducibility.
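
For orientation, one generic way such definitions could be written is sketched below. The notation is an assumption made for this illustration ($T$ for the label tree, $w_e$ for the weight of edge $e$, $p_x$ for the predicted class probabilities at pixel $x$, $g_x$ for the annotated class, $\Omega$ for the set of annotated pixels, $L$ for the number of leaf classes); it is not the paper's own statement of its losses, which this page does not reproduce.

```latex
% Illustrative notation only; every symbol here is an assumption.
% Tree distance: sum of edge weights on the unique path between two labels.
\[
  d_T(l, l') = \sum_{e \,\in\, \mathrm{path}_T(l,\, l')} w_e
\]
% Wasserstein-type term for one-hot annotations: expected tree distance
% between the predicted distribution and the annotated class.
\[
  \mathcal{L}_{\mathrm{wass}}
  = \frac{1}{|\Omega|} \sum_{x \in \Omega} \sum_{l=1}^{L} p_x(l)\, d_T(l, g_x)
\]
```

Restricting the sum to annotated pixels $\Omega$ is what would let a term of this shape be used with sparse annotations. If $d_T(l, l') = 1$ for every pair of distinct labels, the inner sum becomes $1 - p_x(g_x)$, which is the flat case a referee would want as a baseline.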

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and describe the revisions we will make to improve the clarity and robustness of the claims.

Referee: Abstract and experimental description: the central claim of consistent improvements attributable to semantic hierarchy is load-bearing, yet the manuscript provides no ablation on alternative hierarchies, no flat (non-hierarchical) baseline comparison, and no sensitivity checks on tree edge weights or scaling factors. Without these, it is unclear whether the reported gains stem from the semantic structure or from generic regularization effects.

Authors: We agree that the absence of these controls leaves the source of the gains open to interpretation. In the revised manuscript we will add (i) a flat baseline that replaces the tree-based losses with standard cross-entropy, (ii) a sensitivity analysis sweeping the edge-weight and scaling hyperparameters, and (iii) a short discussion of the clinical rationale for the chosen hierarchies together with a note that systematic exploration of alternative trees remains future work. These additions will directly test whether the observed improvements are attributable to the semantic structure.

Revision: yes

Referee: Methods and results sections: the hierarchy is treated as a fixed, correctly specified input whose distances match clinical importance, but no quantitative validation or clinical-expert agreement study is described. If the tree does not align with task-relevant semantics (e.g., functional vs. gross anatomical grouping), the Wasserstein and hierarchy-weighted losses lose their claimed advantage.

Authors: The label trees were constructed from established anatomical and functional parcellation schemes reported in the neuroimaging and neurosurgical literature, as stated in the Methods section. We acknowledge that a formal inter-expert agreement study would provide stronger quantitative support. Because such a study would require new expert consultations and data collection outside the present scope, we will instead add a dedicated limitations paragraph that (a) details the literature sources used to define the hierarchies, (b) discusses the risk of misalignment with alternative clinical groupings, and (c) qualifies the claims accordingly. This revision will make the dependence on hierarchy specification explicit without overstating the current evidence.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical gains from proposed losses on pre-specified hierarchy

The paper defines two new loss functions (Wasserstein-based and hierarchy-weighted) that operate on an externally provided label tree. These are incorporated into training and evaluated via experiments on whole-brain parcellation and hyperspectral imaging tasks, with reported improvements over task-specific baselines. No equation reduces a claimed prediction to a fitted parameter or input by construction. The hierarchy is treated as a fixed input rather than derived from the same data or equations. A citation to a prior sparse-annotation method exists but is not load-bearing for the central empirical claim and does not form a self-referential chain. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach depends on an externally provided label hierarchy and on the validity of the sparse annotation training framework it extends; no new entities are postulated.

free parameters (1)

tree edge weights or hierarchy scaling factors
Likely parameters that control how much semantic distance affects the loss; their values are not stated in the abstract.

axioms (1)

domain assumption: A pre-defined hierarchical organization of the label space exists and reflects clinically meaningful semantic relationships.
Invoked when the losses are defined to exploit inter-class semantics.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We propose two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels... Wasserstein distance-based segmentation loss that penalises mis-classifications based on the path length between the predicted and ground-truth labels in the tree
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

label hierarchy... derived from pre-existing guidelines... DKT protocol

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages

[1]

Imaging-based parcellations of the human brain,

S. B. Eickhoff, B. T. T. Yeo, and S. Genon, “Imaging-based parcellations of the human brain,” Nature Reviews Neuroscience, vol. 19, pp. 672–686, Nov. 2018

work page 2018
[2]

Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies,

J. Shapey, Y. Xie, E. Nabavi, R. Bradford, S. R. Saeed, S. Ourselin, and T. Vercauteren, “Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies,” Journal of Biophotonics, vol. 12, p. e201800455, 2019

work page 2019
[3]

Robust deep learning-based semantic organ segmentation in hyperspectral images,

S. Seidlitz, J. Sellner, J. Odenthal, B. Özdemir, A. Studier-Fischer, S. Knödler, L. Ayala, T. J. Adler, H. G. Kenngott et al., “Robust deep learning-based semantic organ segmentation in hyperspectral images,” Medical Image Analysis, vol. 80, p. 102488, Aug. 2022

work page 2022
[4]

Hyperspectral image segmentation: A preliminary study on the oral and dental spectral image database (odsi-db),

L. C. Garcia Peraza Herrera, C. Horgan, S. Ourselin, M. Ebner, and T. Vercauteren, “Hyperspectral image segmentation: A preliminary study on the oral and dental spectral image database (odsi-db),” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 11, pp. 1290–1298, Jul. 2023

work page 2023
[5]

Machine learning performance trends: A comparative study of independent hyperspectral human brain cancer databases,

A. Martín-Pérez, B. Martinez-Vega, M. Villa, R. Leon, A. Martinez de Ternero, H. Fabelo, S. Ortega, E. Quevedo, G. M. Callico et al., “Machine learning performance trends: A comparative study of independent hyperspectral human brain cancer databases,” Rochester, NY, Aug. 2024

work page 2024
[6]

Learning with a wasserstein loss,

C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio, “Learning with a wasserstein loss,” in Advances in Neural Information Processing Systems, vol. 28, 2015

work page 2015
[7]

Tree-sliced variants of wasserstein distances,

T. Le, M. Yamada, K. Fukumizu, and M. Cuturi, “Tree-sliced variants of wasserstein distances,” in Advances in Neural Information Processing Systems, vol. 32, 2019

work page 2019
[8]

Making better mistakes: Leveraging class hierarchies with deep networks,

L. Bertinetto, R. Mueller, K. Tertikas, S. Samangooei, and N. A. Lord, “Making better mistakes: Leveraging class hierarchies with deep networks,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, Jun. 2020, pp. 12 503–12 512

work page 2020
[9]

Heiporspectral - the heidelberg porcine hyperspectral imaging dataset of 20 physiological organs,

A. Studier-Fischer, S. Seidlitz, J. Sellner, M. Bressan, B. Özdemir, L. Ayala, J. Odenthal, S. Knoedler, K.-F. Kowalewski et al., “Heiporspectral - the heidelberg porcine hyperspectral imaging dataset of 20 physiological organs,” Scientific Data, vol. 10, p. 414, Jun. 2023

work page 2023
[10]

Oral and dental spectral image database—odsi-db,

J. Hyttinen, P. Fält, H. Jäsberg, A. Kullaa, and M. Hauta-Kasari, “Oral and dental spectral image database—odsi-db,” Applied Sciences, vol. 10, p. 7246, Jan. 2020

work page 2020
[11]

The dresden surgical anatomy dataset for abdominal organ segmentation in surgical data science,

M. Carstens, F. M. Rinner, S. Bodenstedt, A. C. Jenke, J. Weitz, M. Distler, S. Speidel, and F. R. Kolbinger, “The dresden surgical anatomy dataset for abdominal organ segmentation in surgical data science,” Scientific Data, vol. 10, p. 3, Jan. 2023

work page 2023
[12]

Ood-seg: Out-of-distribution detection for image segmentation with sparse multi-class positive-only annotations,

J. Wang, Z. Wang, O. MacCormac, J. Shapey, and T. Vercauteren, “Ood-seg: Out-of-distribution detection for image segmentation with sparse multi-class positive-only annotations,” arXiv, 2024

work page 2024
[13]

101 labeled brain images and a consistent human cortical labeling protocol,

A. Klein and J. Tourville, “101 labeled brain images and a consistent human cortical labeling protocol,” Frontiers in Neuroscience, vol. 6, Dec. 2012

work page 2012
[14]

Tree-based semantic losses: Application to sparsely-supervised large multi-class hyperspectral segmentation,

J. Wang, O. Maccormac, W. Rochford, A. Kujawa, J. Shapey, and T. Vercauteren, “Tree-based semantic losses: Application to sparsely-supervised large multi-class hyperspectral segmentation,” Jun. 2025

work page 2025
[15]

Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain,

B. Fischl, D. H. Salat, E. Busa, M. Albert, M. Dieterich, C. Haselgrove, A. van der Kouwe, R. Killiany, D. Kennedy et al., “Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain,” Neuron, vol. 33, pp. 341–355, Jan. 2002

work page 2002
[16]

Geodesic information flows: Spatially-variant graphs and their application to segmentation and fusion,

M. J. Cardoso, M. Modat, R. Wolz, A. Melbourne, D. Cash, D. Rueckert, and S. Ourselin, “Geodesic information flows: Spatially-variant graphs and their application to segmentation and fusion,” IEEE Transactions on Medical Imaging, vol. 34, pp. 1976–1988, Sep. 2015

work page 2015
[17]

Fastsurfer - a fast and accurate deep learning based neuroimaging pipeline,

L. Henschel, S. Conjeti, S. Estrada, K. Diers, B. Fischl, and M. Reuter, “Fastsurfer - a fast and accurate deep learning based neuroimaging pipeline,” NeuroImage, vol. 219, p. 117012, Oct. 2020

work page 2020
[18]

Quicknat: A fully convolutional network for quick and accurate segmentation of neuroanatomy,

A. Guha Roy, S. Conjeti, N. Navab, and C. Wachinger, “Quicknat: A fully convolutional network for quick and accurate segmentation of neuroanatomy,” NeuroImage, vol. 186, pp. 713–727, Feb. 2019

work page 2019
[19]

Error corrective boosting for learning fully convolutional networks with limited data,

A. G. Roy, S. Conjeti, D. Sheet, A. Katouzian, N. Navab, and C. Wachinger, “Error corrective boosting for learning fully convolutional networks with limited data,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2017, Cham, 2017, pp. 231–239

work page 2017
[20]

Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation,

A. Kujawa, R. Dorent, S. Ourselin, and T. Vercauteren, “Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, Cham, 2024, pp. 350–360

work page 2024
[21]

Hierarchical brain parcellation with uncertainty,

M. S. Graham, C. H. Sudre, T. Varsavsky, P.-D. Tudosiu, P. Nachev, S. Ourselin, and M. J. Cardoso, “Hierarchical brain parcellation with uncertainty,” in Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Graphs in Biomedical Image Analysis, Cham, 2020, pp. 23–31

work page 2020
[22]

Manifold embedding and semantic segmentation for intraoperative guidance with hyperspectral brain imaging,

D. Ravì, H. Fabelo, G. M. Callic, and G.-Z. Yang, “Manifold embedding and semantic segmentation for intraoperative guidance with hyperspectral brain imaging,” IEEE Transactions on Medical Imaging, vol. 36, pp. 1845–1857, Sep. 2017

work page 2017
[23]

Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations,

H. Fabelo, S. Ortega, D. Ravi, B. R. Kiran, C. Sosa, D. Bulters, G. M. Callicó, H. Bulstrode, A. Szolna et al., “Spatio-spectral classification of hyperspectral images for brain cancer detection during surgical operations,” PLOS ONE, vol. 13, p. e0193721, Mar. 2018

work page 2018
[24]

Uncertainty-aware organ classification for surgical data science applications in laparoscopy,

S. Moccia, S. J. Wirkert, H. Kenngott, A. S. Vemuri, M. Apitz, B. Mayer, E. De Momi, L. S. Mattos, and L. Maier-Hein, “Uncertainty-aware organ classification for surgical data science applications in laparoscopy,” IEEE Transactions on Biomedical Engineering, vol. 65, pp. 2649–2659, Nov. 2018

work page 2018
[25]

Trends in deep learning for medical hyperspectral image analysis,

U. Khan, S. Paheding, C. P. Elkin, and V. K. Devabhaktuni, “Trends in deep learning for medical hyperspectral image analysis,” IEEE Access, vol. 9, pp. 79 534–79 548, 2021

work page 2021
[26]

Tongue tumor detection in hyperspectral images using deep learning semantic segmentation,

S. Trajanovski, C. Shan, P. J. C. Weijtmans, S. G. B. de Koning, and T. J. M. Ruers, “Tongue tumor detection in hyperspectral images using deep learning semantic segmentation,” IEEE Transactions on Biomedical Engineering, vol. 68, pp. 1330–1340, Apr. 2021

work page 2021
[27]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham, 2015, pp. 234–241

work page 2015
[28]

The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation,

S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 11–19

work page 2017
[29]

What does classifying more than 10,000 image categories tell us?

J. Deng, A. C. Berg, K. Li, and L. Fei-Fei, “What does classifying more than 10,000 image categories tell us?” in Computer Vision – ECCV 2010, Berlin, Heidelberg, 2010, pp. 71–84

work page 2010
[30]

Large-scale category structure aware image categorization,

B. Zhao, F. Li, and E. Xing, “Large-scale category structure aware image categorization,” in Advances in Neural Information Processing Systems, vol. 24, 2011

work page 2011
[31]

Learning hierarchical similarity metrics,

N. Verma, D. Mahajan, S. Sellamanickam, and V. Nair, “Learning hierarchical similarity metrics,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2012, pp. 2280–2287

work page 2012
[32]

Generalised wasserstein dice score for imbalanced multi-class segmentation using holistic convolutional networks,

L. Fidon, W. Li, L. C. Garcia-Peraza-Herrera, J. Ekanayake, N. Kitchen, S. Ourselin, and T. Vercauteren, “Generalised wasserstein dice score for imbalanced multi-class segmentation using holistic convolutional networks,” in 9th International MICCAI Brainlesion Workshop, Cham, 2018, pp. 64–76

work page 2018
[33]

The multimodal brain tumor image segmentation benchmark (brats),

B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom et al., “The multimodal brain tumor image segmentation benchmark (brats),” IEEE Transactions on Medical Imaging, vol. 34, pp. 1993–2024, Oct. 2015

work page 2015
[34]

3d u-net: Learning dense volumetric segmentation from sparse annotation,

Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: Learning dense volumetric segmentation from sparse annotation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Cham, 2016, pp. 424–432

work page 2016
[35]

Weakly supervised learning for multi-class medical image segmentation via feature decomposition,

Z. Kuang, Z. Yan, and L. Yu, “Weakly supervised learning for multi-class medical image segmentation via feature decomposition,” Computers in Biology and Medicine, vol. 171, p. 108228, Mar. 2024

work page 2024
[36]

Bounding box tightness prior for weakly supervised image segmentation,

J. Wang and B. Xia, “Bounding box tightness prior for weakly supervised image segmentation,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Cham, 2021, pp. 526–536

work page 2021
[37]

Interactive medical image segmentation using deep learning with image-specific fine tuning,

G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest et al., “Interactive medical image segmentation using deep learning with image-specific fine tuning,” IEEE Transactions on Medical Imaging, vol. 37, pp. 1562–1573, Jul. 2018

work page 2018
[38]

Deepigeos: A deep interactive geodesic framework for medical image segmentation,

G. Wang, M. A. Zuluaga, W. Li, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest et al., “Deepigeos: A deep interactive geodesic framework for medical image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, pp. 1559–1572, Jul. 2019

work page 2019
[39]

Vertebrae localization in pathological spine ct via dense classification from sparse annotations,

B. Glocker, D. Zikic, E. Konukoglu, D. R. Haynor, and A. Criminisi, “Vertebrae localization in pathological spine ct via dense classification from sparse annotations,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, Berlin, Heidelberg, 2013, pp. 262–270

work page 2013
[40]

Inter extreme points geodesics for end-to-end weakly supervised image segmentation,

R. Dorent, S. Joutard, J. Shapey, A. Kujawa, M. Modat, S. Ourselin, and T. Vercauteren, “Inter extreme points geodesics for end-to-end weakly supervised image segmentation,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Cham, 2021, pp. 615–624

work page 2021
[41]

3d medical image segmentation with sparse annotation via cross-teaching between 3d and 2d networks,

H. Cai, L. Qi, Q. Yu, Y. Shi, and Y. Gao, “3d medical image segmentation with sparse annotation via cross-teaching between 3d and 2d networks,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2023: 26th International Conference, Berlin, Heidelberg, Aug. 2023, pp. 614–624

work page 2023
[42]

Weakly supervised histopathology cancer image segmentation and classification,

Y. Xu, J.-Y. Zhu, E. I.-C. Chang, M. Lai, and Z. Tu, “Weakly supervised histopathology cancer image segmentation and classification,” Medical Image Analysis, vol. 18, pp. 591–604, Apr. 2014

work page 2014
[43]

A baseline for detecting misclassified and out-of-distribution examples in neural networks,

D. Hendrycks and K. Gimpel, “A baseline for detecting misclassified and out-of-distribution examples in neural networks,” in International Conference on Learning Representations, Feb. 2017, pp. 1–12

work page 2017
[44]

Enhancing the reliability of out-of-distribution image detection in neural networks,

S. Liang, Y. Li, and R. Srikant, “Enhancing the reliability of out-of-distribution image detection in neural networks,” in International Conference on Learning Representations, Feb. 2018, pp. 1–12

work page 2018
[45] K. Lee, K. Lee, H. Lee, and J. Shin, “A simple unified framework for detecting out-of-distribution samples and adversarial attacks,” in Advances in Neural Information Processing Systems, vol. 31, 2018, pp. 1–12.
[46] Y.-C. Hsu, Y. Shen, H. Jin, and Z. Kira, “Generalized ODIN: Detecting out-of-distribution image without learning from out-of-distribution data,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, pp. 10948–10957.
[47] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in Proceedings of the 34th International Conference on Machine Learning, Jul. 2017, pp. 1321–1330.
[48] B. Lambert, F. Forbes, S. Doyle, H. Dehaene, and M. Dojat, “Trustworthy clinical AI solutions: A unified review of uncertainty quantification in deep learning models for medical image analysis,” Artificial Intelligence in Medicine, vol. 150, p. 102830, Apr. 2024.
[49] S. Bulusu, B. Kailkhura, B. Li, P. K. Varshney, and D. Song, “Anomalous example detection in deep learning: A survey,” IEEE Access, vol. 8, pp. 132330–132347, 2020.
[50] D. Karimi and A. Gholipour, “Improving calibration and out-of-distribution detection in deep models for medical image segmentation,” IEEE Transactions on Artificial Intelligence, vol. 4, pp. 383–397, Apr. 2023.
[51] C. González, K. Gotkowski, M. Fuchs, A. Bucher, A. Dadras, R. Fischbach, I. J. Kaltenborn, and A. Mukhopadhyay, “Distance-based detection of out-of-distribution silent failures for COVID-19 lung lesion segmentation,” Medical Image Analysis, vol. 82, p. 102596, Nov. 2022.
[52] J. Ma, J. Chen, M. Ng, R. Huang, Y. Li, C. Li, X. Yang, and A. L. Martel, “Loss odyssey in medical image segmentation,” Medical Image Analysis, vol. 71, p. 102035, Jul. 2021.
[53] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, pp. 203–211, Feb. 2021.
[54] M. J. Cardoso, M. Modat, R. Wolz, A. Melbourne, D. Cash, D. Rueckert, and S. Ourselin, “Geodesic information flows: Spatially-variant graphs and their application to segmentation and fusion,” IEEE Transactions on Medical Imaging, vol. 34, pp. 1976–1988, Sep. 2015.
[55] L. Snoek, M. M. van der Miesen, T. Beemsterboer, A. van der Leij, A. Eigenhuis, and H. Steven Scholte, “The Amsterdam Open MRI Collection, a set of multimodal MRI datasets for individual difference analyses,” Scientific Data, vol. 8, p. 85, Mar. 2021.
[56] P. Li, M. Ebner, P. Noonan, C. Horgan, A. Bahl, S. Ourselin, J. Shapey, and T. Vercauteren, “Deep learning approach for hyperspectral image demosaicking, spectral correction and high-resolution RGB reconstruction,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 10, pp. 409–417, Jul. 2022.
[57] P. Li, O. MacCormac, J. Shapey, and T. Vercauteren, “A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging,” in 35th British Machine Vision Conference 2024, BMVC 2024, Glasgow, UK, November 25-28, 2024.
[58] C. Budd, L. C. Garcia-Peraza Herrera, M. Huber, S. Ourselin, and T. Vercauteren, “Rapid and robust endoscopic content area estimation: A lean GPU-based pipeline and curated benchmark dataset,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, vol. 11, pp. 1215–1224, Jul. 2023.
[59] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the 36th International Conference on Machine Learning, May 2019, pp. 6105–6114.
[60] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.
[61] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations (ICLR), San Diego, 2015, Jan. 2017.
[62] A. Bahl, C. C. Horgan, M. Janatka, O. J. MacCormac, P. Noonan, Y. Xie, J. Qiu, N. Cavalcanti, P. Fürnstahl et al., “Synthetic white balancing for intra-operative hyperspectral imaging,” Journal of Medical Imaging, vol. 10, p. 046001, Jul. 2023.

VII. APPENDIX

[Figure: label tree; recoverable node names: Root, Supra Tentorial, Infra Tentorial, WM, CSF, Supra B...]
