Continuous Dice Coefficient: a Method for Evaluating Probabilistic Segmentations
Pith reviewed 2026-05-25 15:52 UTC · model grok-4.3
The pith
The continuous Dice coefficient directly compares binary ground truth to probabilistic segmentation maps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors extend the classical Dice coefficient by replacing its binary intersection and set-size counts with sums that incorporate the continuous probability values from a segmentation map. They establish that the resulting continuous Dice coefficient is bounded above by one, equals one if and only if overlap is complete, and decreases monotonically with reduced overlap. Partial-volume simulations on the thalamus and subthalamic nucleus, together with an automatic STN segmentation example, indicate that the continuous measure exhibits smaller size bias and greater stability under partial-volume conditions than the discrete Dice coefficient.
What carries the argument
The continuous Dice coefficient (cDC): an extension of the Dice overlap formula that sums probability values against a binary ground truth.
If this is right
- Probabilistic segmentation outputs can be scored without an intermediate thresholding step.
- Overlap scores become less dependent on the physical size of the target structure.
- Partial-volume blurring produces smaller distortions in the reported overlap value.
- Evaluation results can guide the design of segmentation algorithms that output calibrated probabilities.
- Automatic segmentations of small nuclei such as the subthalamic nucleus receive more stable numerical assessments.
Where Pith is reading between the lines
- The same summation approach could be used to adapt other binary overlap indices to continuous inputs.
- Loss functions inside neural-network training could replace the discrete Dice with its continuous counterpart to encourage calibrated probability maps.
- Clinical workflows that rely on automated labels for small or variable structures may adopt the measure to reduce size-related scoring artifacts.
- Longitudinal tracking of segmentation quality across patients or scanners could become more consistent when partial-volume effects vary.
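The first speculation above, adapting another binary overlap index by the same summation, can be illustrated with the Jaccard index. This is a minimal sketch under a stated assumption: set sizes are replaced by sums and the intersection by a sum of products. The function name `soft_jaccard` is hypothetical and the construction is not taken from the paper.

```python
import numpy as np

def soft_jaccard(p, g):
    """Sum-based Jaccard: sum(p*g) / (sum(p) + sum(g) - sum(p*g)).

    Assumption (not from the paper): p holds values in [0, 1], g is binary.
    For binary p this reduces to the classical |A & B| / |A | B|.
    """
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    inter = (p * g).sum()
    union = p.sum() + g.sum() - inter
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

g = np.array([0, 1, 1, 1, 0, 0])
binary_pred = np.array([0, 0, 1, 1, 1, 0])           # shifted by one voxel
soft_pred = np.array([0.0, 0.5, 1.0, 1.0, 0.5, 0.0])  # blurred boundary

print(soft_jaccard(binary_pred, g))  # classical Jaccard on binary input
print(soft_jaccard(soft_pred, g))    # same index on a probabilistic map
```

Whether such an adapted index inherits the size and partial-volume behaviour reported for the cDC would need to be checked separately.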
Load-bearing premise
The values in a probabilistic segmentation map can be treated as calibrated probabilities and summed directly against binary ground truth without extra calibration or thresholding.
What would settle it
Apply the partial-volume simulation to a new set of structures with known true overlap fractions, threshold the probability maps at 0.5, and check whether the classical Dice then matches the robustness and size-independence reported for the continuous version.
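The shape of that check can be sketched on a toy problem. The code below is an assumption-laden stand-in, not the paper's simulation: it uses a 1D grid, models partial volume as the fraction of each voxel covered by a sub-voxel-shifted interval, and scores the map with the sum-based formula quoted in the simulated rebuttal further down this page. All function names are hypothetical.

```python
import numpy as np

def dice_binary(a, b):
    """Classical Dice on two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def dice_sum_based(p, g):
    """2*sum(p*g) / (sum(p) + sum(g)), the formula as quoted on this page."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else 2.0 * (p * g).sum() / denom

def volume_fractions(start, length, n_voxels):
    """Fraction of each unit voxel [i, i+1) covered by [start, start+length)."""
    edges = np.arange(n_voxels + 1, dtype=float)
    lo = np.maximum(edges[:-1], start)
    hi = np.minimum(edges[1:], start + length)
    return np.clip(hi - lo, 0.0, 1.0)

def compare(length, shifts, n_voxels=64, start=10):
    """Return one (thresholded Dice, sum-based Dice) row per sub-voxel shift."""
    g = volume_fractions(start, length, n_voxels)  # integer extent, so binary
    rows = []
    for s in shifts:
        p = volume_fractions(start + s, length, n_voxels)
        rows.append((dice_binary(p >= 0.5, g), dice_sum_based(p, g)))
    return np.array(rows)

shifts = np.linspace(0.0, 0.9, 10)
for name, length in [("small", 4), ("large", 40)]:
    r = compare(length, shifts)
    print(name,
          "thresholded mean/SD:", r[:, 0].mean(), r[:, 0].std(),
          "sum-based mean/SD:", r[:, 1].mean(), r[:, 1].std())
```

The outcome of this toy depends entirely on the chosen partial-volume model, so it neither confirms nor refutes the paper's numbers; it only shows the kind of side-by-side comparison the test calls for.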
Original abstract
Objective: Overlap measures are often used to quantify the similarity between two binary regions. However, modern segmentation algorithms output a probability or confidence map with continuous values in the zero-to-one interval. Moreover, these binary overlap measures are biased by structure size. Addressing these challenges is the objective of this work. Methods: We extend the definition of the classical Dice coefficient (DC) overlap to facilitate the direct comparison of a ground truth binary image with a probabilistic map. We call the extended method continuous Dice coefficient (cDC) and show that 1) cDC is less than or equal to 1, and cDC = 1 if and only if the structures' overlap is complete, and 2) cDC is monotonically decreasing with the amount of overlap. We compare the classical DC and the cDC in a simulation of partial volume effects that incorporates segmentations of common targets for deep brain stimulation. Lastly, we investigate the cDC for an automatic segmentation of the subthalamic nucleus. Results: Partial volume effect simulation on the thalamus (large structure) resulted in DC and cDC averages (SD) of 0.98 (0.006) and 0.99 (0.001), respectively. For the subthalamic nucleus (small structure), DC and cDC were 0.86 (0.025) and 0.97 (0.006), respectively. The DC and cDC for automatic STN segmentation were 0.66 and 0.80, respectively. Conclusion: The cDC is well defined for probabilistic segmentation, less biased by structure size and more robust to partial volume effects in comparison to DC. Significance: The proposed method facilitates a better evaluation of segmentation algorithms. As a better measurement tool, it opens the door for the development of better segmentation methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the continuous Dice coefficient (cDC) as an extension of the classical binary Dice coefficient (DC) for directly comparing probabilistic segmentation maps (values in [0,1]) against binary ground truth. It asserts two properties: cDC ≤ 1 with equality if and only if overlap is complete, and cDC is monotonically decreasing with the amount of overlap. Simulations of partial-volume effects on thalamus and subthalamic nucleus (STN) segmentations yield DC/cDC averages (SD) of 0.98(0.006)/0.99(0.001) for the large structure and 0.86(0.025)/0.97(0.006) for the small structure; an automatic STN segmentation example gives DC=0.66 and cDC=0.80. The conclusion is that cDC is well-defined for probabilistic outputs, less size-biased, and more robust to partial-volume effects than DC.
Significance. If the two stated properties hold for the chosen formula and the simulation results are reproducible, cDC would supply a practical, size-robust metric for evaluating modern probabilistic segmenters in medical imaging. The reported simulation numbers already illustrate a concrete reduction in variance for small structures, which is a tangible strength of the work.
major comments (3)
- [Methods / Definition of cDC] The two mathematical properties are asserted in the abstract and presumably proved in the text, but the exact formula, the derivation steps establishing cDC=1 iff complete overlap, and the precise sense in which monotonicity holds are not visible. Without these steps the central claim that cDC is “well defined” cannot be verified.
- [Partial volume effect simulation] The comparison of DC (thresholded) versus cDC (raw probabilities) rests on the assumption that the probabilistic map values p_i may be summed directly against binary g_i as calibrated volume fractions. No calibration step, sensitivity analysis, or description of how the probabilistic maps were synthesized appears. This assumption is load-bearing for both the size-bias and PV-robustness claims.
- [Automatic STN segmentation experiment] The single reported pair (DC=0.66, cDC=0.80) is presented without error bars, multiple runs, or a statement of how the probabilistic map was obtained, so it is impossible to judge whether the observed difference is stable or merely an artifact of the particular output calibration.
minor comments (1)
- [Abstract / Results] The abstract reports averages and SDs but does not state the number of simulation realizations or the exact partial-volume model; adding these details would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major comment below, providing clarifications and indicating planned revisions.
Point-by-point responses
- Referee: [Methods / Definition of cDC] The two mathematical properties are asserted in the abstract and presumably proved in the text, but the exact formula, the derivation steps establishing cDC=1 iff complete overlap, and the precise sense in which monotonicity holds are not visible. Without these steps the central claim that cDC is “well defined” cannot be verified.
  Authors: The cDC formula appears in the Methods section: cDC = 2 * sum(p_i * g_i) / (sum(p_i) + sum(g_i)), with p_i in [0,1] and g_i binary. cDC = 1 iff overlap is complete: because p_i * g_i <= p_i and p_i * g_i <= g_i for every voxel, the numerator can equal the denominator only when p_i = g_i everywhere. Monotonicity holds in the following sense: any increase in mismatch (removing probability mass inside the ground truth or adding false-positive mass outside it) strictly decreases the ratio, as can be shown by considering incremental changes to p. We will revise to include the explicit formula and full derivation steps. revision: yes
- Referee: [Partial volume effect simulation] The comparison of DC (thresholded) versus cDC (raw probabilities) rests on the assumption that the probabilistic map values p_i may be summed directly against binary g_i as calibrated volume fractions. No calibration step, sensitivity analysis, or description of how the probabilistic maps were synthesized appears. This assumption is load-bearing for both the size-bias and PV-robustness claims.
  Authors: The simulation models partial-volume effects by treating p_i as linear volume fractions within each voxel based on anatomical priors for thalamus and STN. We will add an explicit description of the map synthesis procedure, state the calibration assumption, and include a sensitivity analysis over partial-volume parameters to support the robustness claims. revision: yes
- Referee: [Automatic STN segmentation experiment] The single reported pair (DC=0.66, cDC=0.80) is presented without error bars, multiple runs, or a statement of how the probabilistic map was obtained, so it is impossible to judge whether the observed difference is stable or merely an artifact of the particular output calibration.
  Authors: This pair is from a single illustrative automatic segmentation of clinical MRI data using a standard probabilistic method. We will expand the description of how the map was generated and clarify that the example demonstrates the metric difference rather than providing statistical validation. Additional runs are not available from the original experiment. revision: partial
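The formula quoted in the first response can be written out directly. The sketch below implements only that quoted expression; the definition in the published paper may differ (for example, by including a normalization term), and the function name `continuous_dice` and the empty-mask convention are assumptions made here for illustration.

```python
import numpy as np

def continuous_dice(p, g):
    """2 * sum(p_i * g_i) / (sum(p_i) + sum(g_i)), as quoted in the rebuttal.

    p: probabilistic map with values in [0, 1]; g: binary ground truth.
    """
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("p must lie in [0, 1]")
    if not np.all((g == 0) | (g == 1)):
        raise ValueError("g must be binary")
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # assumption: two empty maps count as a perfect match
    return 2.0 * (p * g).sum() / denom

g = np.array([0, 1, 1, 1, 0])
perfect = continuous_dice(g, g)                          # p equals g
false_pos = continuous_dice([0.5, 1.0, 1.0, 1.0, 0.0], g)  # mass outside g
false_neg = continuous_dice([0.0, 1.0, 1.0, 0.5, 0.0], g)  # mass missing inside g

print(perfect, false_pos, false_neg)
```

In this toy case the perfect match scores exactly 1, and both kinds of mismatch pull the score below 1, which is the behaviour the response describes.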
Circularity Check
No significant circularity; cDC is a direct definitional extension with independent algebraic properties
full rationale
The paper defines cDC by direct algebraic extension of the classical Dice formula, replacing binary intersection and set-size counts with summed products of the probabilistic map values against binary ground truth. The claimed properties (cDC ≤ 1 with equality iff complete overlap, and monotonic decrease with overlap) are shown to follow from this definition without any fitted parameters, self-citations, or imported uniqueness results. The partial-volume simulation and STN segmentation experiments supply separate empirical comparisons rather than deriving the measure from its own outputs. No step reduces the claimed result to its inputs by construction.
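For the formula as quoted in the simulated rebuttal above, the bound and the equality condition can be checked in a few lines. This is a sketch for that quoted expression only, assuming a nonzero denominator; it is not a reproduction of the paper's own proof.

```latex
\[
\mathrm{cDC} \;=\; \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i},
\qquad p_i \in [0,1],\quad g_i \in \{0,1\}.
\]
% Since 0 <= p_i <= 1 and g_i is 0 or 1, each product is bounded by both factors:
\[
p_i g_i \le p_i
\quad\text{and}\quad
p_i g_i \le g_i
\;\;\Longrightarrow\;\;
2\sum_i p_i g_i \;\le\; \sum_i p_i + \sum_i g_i
\;\;\Longrightarrow\;\;
\mathrm{cDC} \le 1 .
\]
% Equality forces both bounds to be tight at every voxel:
\[
\mathrm{cDC} = 1
\iff
p_i g_i = p_i \ \text{and}\ p_i g_i = g_i \ \ \forall i
\iff
p_i = g_i \ \ \forall i .
\]
```

The last step uses the two cases of a binary g_i: where g_i = 1 the second equality gives p_i = 1, and where g_i = 0 the first gives p_i = 0.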
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Probabilistic segmentation outputs are real numbers in the closed interval [0,1].
- standard math: The classical Dice coefficient is defined as twice the cardinality of the set intersection divided by the sum of the two set cardinalities.