Learning to Look Closer: A New Instance-Wise Loss for Small Cerebral Lesion Segmentation
Pith reviewed 2026-05-17 20:46 UTC · model grok-4.3
The pith
CC-DiceCE loss raises detection recall for small cerebral lesions with little impact on segmentation quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CC-DiceCE loss, based on the CC-Metrics framework, increases detection (recall) with minimal to no degradation in segmentation performance compared to a DiceCE baseline, though with dataset-dependent trade-offs in precision. The paper's multi-dataset study also shows that CC-DiceCE generally outperforms blob loss.
What carries the argument
The CC-DiceCE loss function, which uses connected-component metrics to evaluate and penalize segmentation errors on a per-lesion basis.
If this is right
- CC-DiceCE increases lesion detection recall compared with the DiceCE baseline.
- Segmentation performance experiences minimal or no degradation.
- Precision shows dataset-dependent trade-offs.
- CC-DiceCE generally outperforms blob loss across the tested datasets.
Where Pith is reading between the lines
- The same per-lesion loss approach could be tested on small-object segmentation tasks outside cerebral imaging, such as lung nodules or retinal vessels.
- Models trained with CC-DiceCE might generalize better to rare or tiny structures when data are highly imbalanced.
- Combining CC-DiceCE with post-processing steps that merge or filter connected components could further reduce false positives without retraining.
Load-bearing premise
That evaluating segmentation on a per-lesion basis reliably improves clinical usefulness and that nnU-Net comparisons remain fair without hidden dataset-specific effects.
What would settle it
A replication on the same datasets in which CC-DiceCE produces no gain in recall or a clear drop in Dice scores relative to the DiceCE baseline would falsify the central claim.
Original abstract
Traditional loss functions in medical image segmentation, such as Dice, often under-segment small lesions because their small relative volume contributes negligibly to the overall loss. To address this, instance-wise loss functions and metrics have been proposed to evaluate segmentation quality on a per-lesion basis. We introduce CC-DiceCE, a loss function based on the CC-Metrics framework, and compare it with the existing blob loss. Both are benchmarked against a DiceCE baseline within the nnU-Net framework, which provides a robust and standardized setup. We find that CC-DiceCE loss increases detection (recall) with minimal to no degradation in segmentation performance, though with dataset-dependent trade-offs in precision. Furthermore, our multi-dataset study shows that CC-DiceCE generally outperforms blob loss.
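The abstract's motivating point, that a small lesion contributes negligibly to a volume-based overlap score, can be illustrated with a short worked example. This is a minimal sketch under stated assumptions: the lesion sizes (1000 and 5 voxels), the one-dimensional voxel indices, and the helper name `dice` are hypothetical and are not taken from the paper.

```python
def dice(pred, truth):
    """Dice overlap of two voxel sets; defined as 1.0 when both are empty."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Hypothetical ground truth: one large lesion and one small lesion,
# represented as sets of voxel indices (sizes chosen for illustration only).
large = set(range(0, 1000))      # 1000 voxels
small = set(range(2000, 2005))   # 5 voxels
truth = large | small
lesions = [large, small]

# A prediction that segments the large lesion perfectly but misses the small one.
pred_misses_small = set(large)

# Global (voxel-wise) Dice barely notices the missed lesion ...
dice_miss = dice(pred_misses_small, truth)

# ... while lesion-wise recall (fraction of lesions touched by the prediction)
# drops to one half.
recall_miss = sum(bool(pred_misses_small & l) for l in lesions) / len(lesions)

print(f"global Dice = {dice_miss:.4f}, lesion recall = {recall_miss:.2f}")
```

In this toy case the global Dice is 2000/2005, so missing one of the two lesions costs about 0.0025 Dice while halving lesion-wise recall.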
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces CC-DiceCE, an instance-wise loss derived from the CC-Metrics framework, to mitigate under-segmentation of small cerebral lesions by emphasizing per-lesion detection. It benchmarks CC-DiceCE against a DiceCE baseline and the blob loss within the standardized nnU-Net framework across multiple datasets, reporting gains in lesion recall with little or no Dice degradation and general outperformance over blob loss, albeit with dataset-dependent precision trade-offs.
Significance. If the reported recall improvements prove robust, the work could provide a practical tool for clinical tasks where missing small lesions is costly. The choice of nnU-Net supplies a reproducible baseline that strengthens cross-loss comparisons. The empirical focus and multi-dataset evaluation are appropriate for the claim, though the absence of statistical tests or variance estimates reduces the strength of the performance assertions.
major comments (2)
- [Results / Experiments] Results section (and abstract): recall and Dice values are given as single-run point estimates with no standard deviations, confidence intervals, or repeated random seeds. Because nnU-Net training incorporates stochastic augmentation and initialization, and small-lesion recall is known to be sensitive to these factors, the observed deltas cannot be distinguished from training noise without additional runs or statistical tests.
- [Results] Table(s) reporting per-dataset metrics: the claim that CC-DiceCE 'generally outperforms blob loss' is qualified by 'dataset-dependent trade-offs in precision,' yet no quantitative measure of consistency (e.g., win rate across datasets or lesion-size strata) is supplied to support the 'generally' qualifier.
minor comments (2)
- [Abstract] Abstract: exact dataset sizes, number of lesions, and lesion-size definitions are omitted; these details should appear at least in the first results table or methods paragraph.
- [Methods] Notation: the precise mathematical definition of CC-DiceCE (how the CC-Metrics per-lesion scores are folded into the DiceCE term) should be given an equation number and contrasted explicitly with blob loss.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments. We address each of the major comments below and describe the revisions planned for the manuscript.
Point-by-point responses
- Referee: [Results / Experiments] Results section (and abstract): recall and Dice values are given as single-run point estimates with no standard deviations, confidence intervals, or repeated random seeds. Because nnU-Net training incorporates stochastic augmentation and initialization, and small-lesion recall is known to be sensitive to these factors, the observed deltas cannot be distinguished from training noise without additional runs or statistical tests.
  Authors: We agree that single-run results limit the ability to assess variability due to stochastic elements in nnU-Net training. To address this, we will rerun the experiments with multiple random seeds for each configuration and report means along with standard deviations for recall, Dice, and precision metrics. We will also consider adding statistical significance tests in the revised manuscript.
  Revision: yes
- Referee: [Results] Table(s) reporting per-dataset metrics: the claim that CC-DiceCE 'generally outperforms blob loss' is qualified by 'dataset-dependent trade-offs in precision,' yet no quantitative measure of consistency (e.g., win rate across datasets or lesion-size strata) is supplied to support the 'generally' qualifier.
  Authors: We acknowledge the value of a quantitative consistency measure to support the 'generally outperforms' statement. In the revision, we will add a summary analysis computing win rates for CC-DiceCE versus blob loss across the evaluated datasets and lesion-size groups for the primary metrics. This will provide objective support while retaining the discussion of precision trade-offs.
  Revision: yes
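The two analyses promised in the responses above (mean and standard deviation over repeated seeds, and a win rate across datasets) could be computed along the following lines. This is a rough sketch, not the authors' analysis code: the function names `summarize` and `win_rate`, the dataset keys `D1` to `D4`, and every number are placeholders invented for illustration and are not results from the paper.

```python
from statistics import mean, stdev

def summarize(runs):
    """Mean and sample standard deviation of one metric over repeated seeds."""
    return mean(runs), stdev(runs)

def win_rate(a, b):
    """Fraction of shared datasets on which method a scores strictly higher than b."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no datasets in common")
    wins = sum(a[name] > b[name] for name in shared)
    return wins / len(shared)

# PLACEHOLDER values for illustration only; NOT taken from the paper.
placeholder_recall_seeds = [0.70, 0.72, 0.71]   # one metric, three seeds
m, s = summarize(placeholder_recall_seeds)

# PLACEHOLDER per-dataset scores for two hypothetical methods.
placeholder_a = {"D1": 0.60, "D2": 0.55, "D3": 0.40, "D4": 0.71}
placeholder_b = {"D1": 0.58, "D2": 0.57, "D3": 0.35, "D4": 0.70}
rate = win_rate(placeholder_a, placeholder_b)

print(f"recall = {m:.3f} +/- {s:.3f}, win rate = {rate:.2f}")
```

Counting only strict wins is one possible design choice; ties could instead be reported separately, and the same function can be applied per lesion-size stratum by passing per-stratum score dictionaries.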
Circularity Check
Empirical loss comparison with no reduction to self-defined quantities
Full rationale
The paper introduces the CC-DiceCE loss by building on the CC-Metrics framework and reports empirical performance gains versus DiceCE and blob loss baselines inside the nnU-Net pipeline across multiple datasets. All central claims (recall increase with limited Dice degradation, general outperformance of blob loss) rest on observed training/evaluation metrics rather than any derivation, prediction, or uniqueness result that reduces by the paper's own equations to fitted parameters or prior self-citations. No load-bearing step collapses to a self-definitional or fitted-input pattern; the work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: nnU-Net framework provides a robust and standardized setup for fair comparison
invented entities (1)
- CC-DiceCE loss: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
CC-Metrics computes the Voronoi region for every connected component (lesion) and then scores each region individually. This assigns the same weight to every lesion, regardless of size, in the final score. $V_m(P, K) = \frac{1}{|\mathcal{K}|} \sum_{C \in \mathcal{K}} m(P \cap R_C, C)$
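The quoted definition can be made concrete with a small pure-Python sketch of the evaluation metric (not the training loss). It follows the definitions given in the paper excerpt under "Reference graph": ground-truth components are maximal 26-connected sets, each lattice point is assigned to its nearest component, and the base metric is averaged over components. The function names, the brute-force nearest-component search, the tie-breaking rule (first component wins), and the toy volume are assumptions made for illustration; a practical implementation would more likely use a distance transform on arrays.

```python
from collections import deque
from itertools import product

def components(voxels):
    """Split a set of 3-D integer voxels into maximal 26-connected components."""
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    remaining, comps = set(voxels), []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:                      # breadth-first flood fill
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    comp.add(n)
                    queue.append(n)
        comps.append(comp)
    return comps

def dice(a, b):
    """Dice overlap of two voxel sets; 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def sq_dist(t, comp):
    """Squared Euclidean distance from voxel t to the nearest voxel of comp."""
    return min(sum((p - q) ** 2 for p, q in zip(t, u)) for u in comp)

def cc_metric(pred, truth, shape, metric=dice):
    """Average of metric(pred within R_C, C) over ground-truth components C."""
    comps = components(truth)
    if not comps:
        raise ValueError("ground truth has no foreground component")
    regions = [set() for _ in comps]
    for t in product(*(range(n) for n in shape)):
        dists = [sq_dist(t, c) for c in comps]
        regions[dists.index(min(dists))].add(t)   # ties: first component wins
    return sum(metric(pred & r, c) for r, c in zip(regions, comps)) / len(comps)

# Toy volume (hypothetical): a 36-voxel lesion and a single-voxel lesion.
shape = (12, 3, 3)
large = set(product(range(4), range(3), range(3)))
truth = large | {(10, 1, 1)}
pred = set(large)                       # the small lesion is missed

global_dice = dice(pred, truth)
cc_dice = cc_metric(pred, truth, shape)
print(f"global Dice = {global_dice:.3f}, CC-Dice = {cc_dice:.3f}")
```

On this toy volume the global Dice is 72/73 while the per-component average is 0.5, because the missed single-voxel lesion scores 0 in its own region and counts as much as the large lesion.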
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Instance Awareness of Multi-class Semantic Segmentation Loss Functions
  Multi-class blob and CC losses via one-vs-rest decomposition and per-component weighting improve foreground Dice, rare-class Dice, and Panoptic Quality on BraTS-METS 2025 compared to baseline.
Reference graph
Works this paper leans on
- [1] INTRODUCTION. Automated segmentation of small cerebral lesions in brain MRI enables scalable detection and quantification [1]. However, conventional voxel-overlap losses, such as the standard Dice with cross-entropy (DiceCE) used in nnU-Net [2], overweight large structures. This can reduce small-lesion detection when lesion sizes vary widely [3, 4, 5],...
- [2] baseline (Dice 0.659) is substantially lower than standard nnU-Net performance (Dice 0.801) [3, 9]. This concern is echoed by [9], which notes that many segmentation studies fail to configure baselines properly or evaluate on too few datasets, which is a notable problem given the high heterogeneity of medical data. We address these limitations by using ...
- [3] We investigate the potential of CC-Metrics as a loss function for small-lesion segmentation
- [4] We provide a rigorous evaluation of instance-aware losses (CC-Metrics and blob loss) against a strong, standardized baseline (nnU-Net) across multiple heterogeneous datasets. The code for the experiments can be found at https://github.com/TIO-IKIM/Learning-to-Look-Closer. © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must...
- [5] RELATED WORK 2.1. Losses. Let $T \subset \mathbb{Z}^3$ be the lattice, $K \subset T$ the ground-truth foreground, $\mathcal{K}$ the set of its maximal 26-connected components, and $m : \mathcal{P}(T) \times \mathcal{P}(T) \to \mathbb{R}$ a base set metric (e.g., Dice). For $t \in T$ and $A \subseteq T$, define $d(t, A) = \min_{u \in A} \lVert t - u \rVert_2$. For each $C \in \mathcal{K}$, the Voronoi region is (ties broken arbitrarily) $R_C = \{\, t \in T : d(t, C) < d(t, C'),\ \forall C' \in \mathcal{K} \setminus \{C\} \,\}. \quad (1)$ CC-Metrics computes...
- [6] EXPERIMENTAL SETUP 3.1. Training. We evaluated all loss functions within the nnU-Net framework. We retain all default nnU-Net parameters, with one exception: we use the non-smooth variant of Dice loss with $\epsilon = 0$, as we observed training instability on the CMB dataset with the default smooth Dice. The base metric $m$ for the instance-wise functions is DiceCE. ...
- [7] RESULTS. As summarized in Tab. 2, replacing the baseline DiceCE with CC-DiceCE maintained the global Dice score within the typical 5-fold variation for each cohort, while improving CC-Dice and recall in most of them. On LAC and CMB, CC-DiceCE increased lesion-wise performance (higher CC-Dice and F1) with a trade-off in global Dice on LAC and a consistent...
- [8] DISCUSSION. We find that CC-DiceCE improves detection rates (recall) while the change in segmentation performance (Dice) is minimal (at worst −0.011) across all five datasets. We also observe increases in CC-Dice in four of five datasets, with a small decrease only on WMH. We hypothesize that the inclusion of the global DiceCE loss term helps maintain t...
- [9] CONCLUSION. We studied instance-aware objectives for small cerebral lesion segmentation within a strong and standardized nnU-Net setup across five heterogeneous MRI cohorts. Replacing conventional DiceCE with CC-DiceCE consistently improved instance-aware detection (higher recall and CC-Dice) in four of five datasets while having negligible effect on g...
- [10] COMPLIANCE WITH ETHICAL STANDARDS. This research study was conducted retrospectively using human subject data made available in open access [11, 12, 6, 13, 14, 15, 16]. Ethical approval was not required as confirmed by the license attached with the open access data.
- [11] Ahmed W Moawad, Anastasia Janas, Ujjwal Baid, Divya Ramakrishnan, Rachit Saluja, Nader Ashraf, Nazanin Maleki, Leon Jekel, Nikolay Yordanov, Pascal Fehringer, et al., “The brain tumor segmentation-metastases (brats-mets) challenge 2023: Brain metastasis segmentation on pre-treatment mri,” ArXiv, pp. arXiv–2306, 2024
- [12] Fabian Isensee, Paul F Jaeger, Simon AA Kohl, Jens Petersen, and Klaus H Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021
- [13] Florian Kofler, Suprosanna Shit, Ivan Ezhov, Lucas Fidon, Izabela Horvath, Rami Al-Maskari, Hongwei Bran Li, Harsharan Bhatia, Timo Loehr, Marie Piraud, et al., “Blob loss: Instance imbalance aware loss functions for semantic segmentation,” in International Conference on Information Processing in Medical Imaging. Springer, 2023, pp. 755–767
- [14] Alexander Jaus, Constantin Marc Seibold, Simon Reiß, Zdravko Marinov, Keyi Li, Zeling Ye, Stefan Krieg, Jens Kleesiek, and Rainer Stiefelhagen, “Every component counts: Rethinking the measure of success for medical semantic segmentation in multi-instance segmentation tasks,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2025, vol. 39,...
- [15] Muhammad Febrian Rachmadi, Michal Byra, and Henrik Skibbe, “A new family of instance-level loss functions for improving instance-level segmentation and detection of white matter hyperintensities in routine clinical brain mri,” Computers in Biology and Medicine, vol. 174, pp. 108414, 2024
- [16] Endre Grøvik, Darvin Yi, Michael Iv, Elizabeth Tong, Daniel Rubin, and Greg Zaharchuk, “Deep learning enables automatic detection and segmentation of brain metastases on multisequence mri,” Journal of Magnetic Resonance Imaging, vol. 51, no. 1, pp. 175–182, 2020
- [17] Febrian Rachmadi, Charissa Poon, and Henrik Skibbe, “Improving segmentation of objects with varying sizes in biomedical images using instance-wise and center-of-instance segmentation loss function,” in Medical Imaging with Deep Learning. PMLR, 2024, pp. 286–300
- [18] Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, et al., “The liver tumor segmentation benchmark (lits),” Medical image analysis, vol. 84, pp. 102680, 2023
- [19] Fabian Isensee, Tassilo Wald, Constantin Ulrich, Michael Baumgartner, Saikat Roy, Klaus Maier-Hein, and Paul F Jaeger, “nnu-net revisited: A call for rigorous validation in 3d medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 488–498
- [20] Yifeng Ling and Hugues Chabriat, “Incident cerebral lacunes: a review,” Journal of Cerebral Blood Flow & Metabolism, vol. 40, no. 5, pp. 909–921, 2020
- [21] Carole H Sudre, Kimberlin Van Wijnen, Florian Dubost, Hieab Adams, David Atkinson, Frederik Barkhof, Mahlet A Birhanu, Esther E Bron, Robin Camarasa, Nish Chaturvedi, et al., “Where is valdo? vascular lesions detection and segmentation challenge at miccai 2021,” Medical Image Analysis, vol. 91, pp. 103029, 2024
- [22] Hugo J Kuijf, J Matthijs Biesbroek, Jeroen De Bresser, Rutger Heinen, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, M Jorge Cardoso, Adria Casamitjana, et al., “Standardized assessment of automatic segmentation of white matter hyperintensities and results of the wmh segmentation challenge,” IEEE transactions on medical imaging, vol. 38, ...
- [23] Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al., “The multimodal brain tumor image segmentation benchmark (brats),” IEEE transactions on medical imaging, vol. 34, no. 10, pp. 1993–2024, 2014
- [24] Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin S Kirby, John B Freymann, Keyvan Farahani, and Christos Davatzikos, “Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features,” Scientific data, vol. 4, no. 1, pp. 1–13, 2017
- [25] Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, et al., “Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge,” arXiv preprint...
- [26] Spyridon Bakas, Hamed Akbari, Aristeidis Sotiras, Michel Bilello, Martin Rozycki, Justin Kirby, John Freymann, Keyvan Farahani, and Christos Davatzikos, “Segmentation labels for the pre-operative scans of the tcga-lgg collection,” The cancer imaging archive, 2017