Continuous Dice Coefficient: a Method for Evaluating Probabilistic Segmentations

Guillermo Sapiro; Jinyoung Kim; Noam Harel; Reuben R Shamir; Yuval Duchin

arxiv: 1906.11031 · v1 · pith:TSLNLM7Fnew · submitted 2019-06-26 · 💻 cs.CV · eess.IV

Reuben R Shamir, Yuval Duchin, Jinyoung Kim, Guillermo Sapiro, Noam Harel

Pith reviewed 2026-05-25 15:52 UTC · model grok-4.3

classification 💻 cs.CV eess.IV

keywords: dice coefficient · continuous dice coefficient · probabilistic segmentation · overlap measure · partial volume effects · medical image segmentation · brain structure analysis

The pith

The continuous Dice coefficient directly compares binary ground truth to probabilistic segmentation maps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Modern segmentation methods output probability maps with values between zero and one, yet standard overlap tools like the Dice coefficient demand binary inputs and favor larger structures. This paper defines the continuous Dice coefficient to compare a binary ground truth image straight against a continuous probability map. It proves the new measure stays at or below one, reaches one only with complete overlap, and falls steadily as overlap worsens. Simulations that add partial volume effects to brain targets show the continuous version produces higher scores and lower variation than the classical Dice, especially for small structures. The result supplies a more stable yardstick for judging probabilistic segmentations used in medical imaging.

Core claim

The authors extend the classical Dice coefficient by replacing its binary intersection and set-size counts with sums that incorporate the continuous probability values from a segmentation map. They establish that the resulting continuous Dice coefficient is bounded above by one, equals one if and only if overlap is complete, and decreases monotonically with reduced overlap. Partial-volume simulations on the thalamus and subthalamic nucleus (STN), together with an automatic STN segmentation example, indicate that the continuous measure exhibits smaller size bias and greater stability under partial-volume conditions than the discrete Dice coefficient.

What carries the argument

The continuous Dice coefficient (cDC): an extension of the Dice overlap formula that sums probability values against a binary ground truth.

If this is right

Probabilistic segmentation outputs can be scored without an intermediate thresholding step.
Overlap scores become less dependent on the physical size of the target structure.
Partial-volume blurring produces smaller distortions in the reported overlap value.
Evaluation results can guide the design of segmentation algorithms that output calibrated probabilities.
Automatic segmentations of small nuclei such as the subthalamic nucleus receive more stable numerical assessments.
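
The size dependence behind the second point can be seen in a toy case. The sketch below is a minimal illustration and is not taken from the paper: it assumes a hypothetical 1D "image" and hypothetical helper names (`dice`, `segment`), applies the same one-voxel shift to a small and a large structure, and scores each with the classical Dice coefficient.

```python
def dice(a, b):
    """Classical Dice coefficient for two equal-length binary sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    intersection = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0


def segment(length, start, size):
    """Binary 1D mask of `length` voxels with `size` ones beginning at `start`."""
    return [1 if start <= i < start + size else 0 for i in range(length)]


# Same one-voxel translation error applied to a small and a large structure.
dice_small = dice(segment(100, 10, 4), segment(100, 11, 4))    # 2*3 / (4+4)
dice_large = dice(segment(100, 10, 40), segment(100, 11, 40))  # 2*39 / (40+40)

print(dice_small, dice_large)
```

In this toy setting the identical error costs the 4-voxel structure far more (0.75) than the 40-voxel structure (0.975), which is the kind of size bias the measure is meant to reduce.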

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same summation approach could be used to adapt other binary overlap indices to continuous inputs.
Loss functions inside neural-network training could replace the discrete Dice with its continuous counterpart to encourage calibrated probability maps.
Clinical workflows that rely on automated labels for small or variable structures may adopt the measure to reduce size-related scoring artifacts.
Longitudinal tracking of segmentation quality across patients or scanners could become more consistent when partial-volume effects vary.

Load-bearing premise

The values in a probabilistic segmentation map can be treated as calibrated probabilities and summed directly against binary ground truth without extra calibration or thresholding.

What would settle it

Apply the partial-volume simulation to a new set of structures with known true overlap fractions, threshold the probability maps at 0.5, and check whether the classical Dice then matches the robustness and size-independence reported for the continuous version.
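
The thresholding step in that test can be sketched in a few lines. This is an illustrative example under stated assumptions, not the paper's experiment: the probability values are made up, and `threshold` and `dice` are hypothetical helper names. It shows how the classical score of one fixed map moves with the chosen cut-off.

```python
def dice(a, b):
    """Classical Dice coefficient for two equal-length binary sequences."""
    intersection = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * intersection / total if total else 1.0


def threshold(prob_map, level):
    """Binarize a probability map: 1 where the value is >= level, else 0."""
    return [1 if p >= level else 0 for p in prob_map]


truth = [0, 1, 1, 1, 0]
prob_map = [0.1, 0.45, 1.0, 0.6, 0.35]  # made-up values for illustration

# Classical Dice of the same map at three different cut-offs.
scores = {level: dice(truth, threshold(prob_map, level)) for level in (0.3, 0.5, 0.7)}

for level, score in scores.items():
    print(f"threshold {level}: DC = {score:.3f}")
```

On these made-up values the score ranges from 0.5 to about 0.86 depending only on the cut-off, which is why a fixed 0.5 threshold is a meaningful but not neutral baseline for the comparison.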

Figures

Figures reproduced from arXiv: 1906.11031 by Guillermo Sapiro, Jinyoung Kim, Noam Harel, Reuben R Shamir, Yuval Duchin.

**Figure 1.** Empirical illustration of the proposed cDC. (a) A probabilistic map was simulated with a Gaussian distribution over a manually segmented image of the subthalamic nucleus (green line marks its boundaries at a selected plane). (b) Then, the probabilistic map was shifted to simulate a simple segmentation error (2 mm in this example). The proposed cDC was computed under the various translation errors to empiric…

Original abstract

Objective: Overlapping measures are often utilized to quantify the similarity between two binary regions. However, modern segmentation algorithms output a probability or confidence map with continuous values in the zero-to-one interval. Moreover, these binary overlapping measures are biased to structure size. Addressing these challenges is the objective of this work.

Methods: We extend the definition of the classical Dice coefficient (DC) overlap to facilitate the direct comparison of a ground truth binary image with a probabilistic map. We call the extended method continuous Dice coefficient (cDC) and show that 1) cDC is less than or equal to 1 and cDC = 1 if-and-only-if the structures' overlap is complete, and 2) cDC is monotonically decreasing with the amount of overlap. We compare the classical DC and the cDC in a simulation of partial volume effects that incorporates segmentations of common targets for deep-brain stimulation. Lastly, we investigate the cDC for an automatic segmentation of the subthalamic nucleus (STN).

Results: Partial volume effect simulation on thalamus (large structure) resulted in DC and cDC averages (SD) of 0.98 (0.006) and 0.99 (0.001), respectively. For subthalamic nucleus (small structure), DC and cDC were 0.86 (0.025) and 0.97 (0.006), respectively. The DC and cDC for automatic STN segmentation were 0.66 and 0.80, respectively.

Conclusion: The cDC is well defined for probabilistic segmentation, less biased to structure size and more robust to partial volume effects in comparison to DC.

Significance: The proposed method facilitates a better evaluation of segmentation algorithms. As a better measurement tool, it opens the door for the development of better segmentation methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a direct soft-Dice extension for evaluating probability maps against binary ground truth, with simulations showing it reduces size bias on small structures, but it treats raw outputs as calibrated fractions without checking that assumption.

The main thing to know is that this paper replaces the binary counts in the classic Dice formula with sums over probability values to create a continuous Dice coefficient, asserts that it stays at most 1 with equality only on full overlap and decreases monotonically with less overlap, and then runs partial-volume simulations on thalamus and subthalamic-nucleus segmentations to show cDC stays higher than DC for the small structure. The automatic STN segmentation example also reports a gap of 0.66 versus 0.80. Those targeted numbers are the concrete part that could matter for people ranking algorithms on low-contrast or small targets. The simulation design itself is reasonable because it starts from real segmentations and adds controlled partial voluming, which makes the size-bias comparison easy to see.

The soft spot is the direct-use assumption flagged in the stress-test note. The definition inserts the network's raw p_i values straight into the overlap formula as if they already represent expected volume fractions, yet nothing in the reported results tests calibration, sensitivity to over- or under-confident outputs, or what happens after common post-processing steps. If that assumption does not hold, the claimed robustness to partial volumes becomes harder to interpret.

The two mathematical properties are stated clearly, but the derivation steps and edge-case handling are not visible in the abstract, so a referee would need to verify them in the full text.

This is for the medical-image segmentation community that evaluates probabilistic outputs on brain structures. A reader who needs a size-independent overlap score for small targets would get a usable alternative to try. It is worth sending for peer review because the core definition is simple, the simulation evidence is focused, and the calibration question is a fixable gap rather than a load-bearing flaw.

Referee Report

3 major / 1 minor

Summary. The paper proposes the continuous Dice coefficient (cDC) as an extension of the classical binary Dice coefficient (DC) for directly comparing probabilistic segmentation maps (values in [0,1]) against binary ground truth. It asserts two properties: cDC ≤ 1 with equality if and only if overlap is complete, and cDC is monotonically decreasing with the amount of overlap. Simulations of partial-volume effects on thalamus and subthalamic nucleus (STN) segmentations yield DC/cDC averages (SD) of 0.98(0.006)/0.99(0.001) for the large structure and 0.86(0.025)/0.97(0.006) for the small structure; an automatic STN segmentation example gives DC=0.66 and cDC=0.80. The conclusion is that cDC is well-defined for probabilistic outputs, less size-biased, and more robust to partial-volume effects than DC.

Significance. If the two stated properties hold for the chosen formula and the simulation results are reproducible, cDC would supply a practical, size-robust metric for evaluating modern probabilistic segmenters in medical imaging. The reported simulation numbers already illustrate a concrete reduction in variance for small structures, which is a tangible strength of the work.

major comments (3)

[Methods / Definition of cDC] Methods (definition of cDC): the two mathematical properties are asserted in the abstract and presumably proved in the text, but the exact formula, the derivation steps establishing cDC=1 iff complete overlap, and the precise sense in which monotonicity holds are not visible; without these steps the central claim that cDC is “well defined” cannot be verified.
[Partial volume effect simulation] Partial-volume simulation section: the comparison of DC (thresholded) versus cDC (raw probabilities) rests on the assumption that the network outputs p_i may be summed directly against binary g_i as calibrated volume fractions. No calibration step, sensitivity analysis, or description of how the probabilistic maps were synthesized appears; this assumption is load-bearing for both the size-bias and PV-robustness claims.
[Automatic STN segmentation experiment] STN experiment: the single reported pair (DC=0.66, cDC=0.80) is presented without error bars, multiple runs, or a statement of how the probabilistic map was obtained, so it is impossible to judge whether the observed difference is stable or merely an artifact of the particular output calibration.
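
To make the second comment concrete, here is one common way a partial-volume map can be synthesized from a binary mask. It is a sketch under stated assumptions and not necessarily the procedure used in the paper: the fine-resolution mask is made up, the image is 1D for brevity, and `partial_volume` is a hypothetical helper that averages blocks of fine voxels so each coarse voxel holds the fraction of the structure it contains.

```python
def partial_volume(mask, factor):
    """Average consecutive blocks of `factor` fine voxels into one coarse voxel.

    Each output value is the fraction of the coarse voxel covered by the
    structure, so it lies in [0, 1].
    """
    if factor <= 0 or len(mask) % factor != 0:
        raise ValueError("mask length must be a positive multiple of factor")
    return [sum(mask[i:i + factor]) / factor for i in range(0, len(mask), factor)]


# Made-up fine-resolution binary mask (12 voxels), downsampled by a factor of 4.
fine_mask = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
fractions = partial_volume(fine_mask, 4)

print(fractions)  # [0.25, 1.0, 0.5]
```

Values produced this way are true volume fractions by construction, which is exactly the calibration property the comment notes cannot be taken for granted in a network's raw output.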

minor comments (1)

[Abstract / Results] The abstract reports averages and SDs but does not state the number of simulation realizations or the exact partial-volume model; adding these details would improve reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. We address each major comment below, providing clarifications and indicating planned revisions.

Referee: [Methods / Definition of cDC] Methods (definition of cDC): the two mathematical properties are asserted in the abstract and presumably proved in the text, but the exact formula, the derivation steps establishing cDC=1 iff complete overlap, and the precise sense in which monotonicity holds are not visible; without these steps the central claim that cDC is “well defined” cannot be verified.

Authors: The cDC formula appears in the Methods section: cDC = 2 * sum(p_i * g_i) / (sum(p_i) + sum(g_i)), with p_i in [0,1] and g_i binary. Proof that cDC=1 iff complete overlap: equality holds precisely when p_i = g_i everywhere (numerator equals denominator only on perfect match). Monotonicity follows because any increase in mismatch (reducing overlap sum or adding false positive mass) strictly decreases the ratio, as can be shown by considering incremental changes to p. We will revise to include the explicit formula and full derivation steps.
Revision: yes

Referee: [Partial volume effect simulation] Partial-volume simulation section: the comparison of DC (thresholded) versus cDC (raw probabilities) rests on the assumption that the network outputs p_i may be summed directly against binary g_i as calibrated volume fractions. No calibration step, sensitivity analysis, or description of how the probabilistic maps were synthesized appears; this assumption is load-bearing for both the size-bias and PV-robustness claims.

Authors: The simulation models partial-volume effects by treating p_i as linear volume fractions within each voxel based on anatomical priors for thalamus and STN. We will add an explicit description of the map synthesis procedure, state the calibration assumption, and include a sensitivity analysis over partial-volume parameters to support the robustness claims.
Revision: yes

Referee: [Automatic STN segmentation experiment] STN experiment: the single reported pair (DC=0.66, cDC=0.80) is presented without error bars, multiple runs, or a statement of how the probabilistic map was obtained, so it is impossible to judge whether the observed difference is stable or merely an artifact of the particular output calibration.

Authors: This pair is from a single illustrative automatic segmentation of clinical MRI data using a standard probabilistic method. We will expand the description of how the map was generated and clarify that the example demonstrates the metric difference rather than providing statistical validation. Additional runs are not available from the original experiment.
Revision: partial
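
The formula quoted in the first response can be checked numerically. The sketch below implements it exactly as written in that simulated response, which may not match the paper's own equation in every detail; `soft_dice` is a hypothetical name, the example maps are made up, and returning 1.0 for two empty inputs is a convention assumed here.

```python
def soft_dice(p, g):
    """2 * sum(p_i * g_i) / (sum(p_i) + sum(g_i)), as quoted in the rebuttal.

    p: probabilities in [0, 1]; g: binary ground truth of the same length.
    """
    if len(p) != len(g):
        raise ValueError("inputs must have the same length")
    denominator = sum(p) + sum(g)
    if denominator == 0:
        return 1.0  # assumed convention: two empty maps match perfectly
    return 2 * sum(pi * gi for pi, gi in zip(p, g)) / denominator


g = [0, 1, 1, 0]

perfect = soft_dice([0.0, 1.0, 1.0, 0.0], g)         # p equals g everywhere
false_positive = soft_dice([0.5, 1.0, 1.0, 0.0], g)  # extra mass outside g
underconfident = soft_dice([0.0, 0.5, 0.5, 0.0], g)  # right shape, low confidence

print(perfect, false_positive, underconfident)
```

On these inputs the score is 1 only for the exact match and drops when false-positive mass is added, in line with the stated properties. Under the formula as quoted, the map with the right shape but uniformly low confidence also scores below 1 (2/3 here), which is one way the calibration concern from the report shows up in practice.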

Circularity Check

0 steps flagged

No significant circularity; cDC is a direct definitional extension with independent algebraic properties

The paper defines cDC by direct algebraic extension of the classical Dice formula, replacing the binary intersection and set-size counts with sums involving the probabilistic map values and the binary ground truth. The claimed properties (cDC ≤ 1 with equality iff complete overlap, and monotonic decrease with overlap) are shown to follow from this definition without any fitted parameters, self-citations, or imported uniqueness results. The partial-volume simulation and STN segmentation experiments supply separate empirical comparisons rather than deriving the measure from its own outputs. No step reduces the claimed result to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The construction rests on the standard arithmetic properties of summation and the assumption that probability values lie in [0,1] and can be directly compared to binary labels.

axioms (2)

Domain assumption: Probabilistic segmentation outputs are real numbers in the closed interval [0,1].
Required for the weighted sums that replace the binary intersection and set-size counts.
Standard math: The classical Dice coefficient is defined via the cardinality of the set intersection and the cardinalities of the two sets.
The cDC is obtained by direct substitution of those cardinalities with probability-weighted sums.
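
For reference, the substitution described in this ledger can be written out. The classical definition below is standard. The continuous form is the one quoted in the simulated authors' rebuttal earlier on this page, so it should be read as that summary's rendering and not as a confirmed transcription of the paper's equation. The symbols are assumptions of this note: $g_i \in \{0,1\}$ is the ground truth at voxel $i$ and $p_i \in [0,1]$ is the probabilistic map.

```latex
\mathrm{DC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
\mathrm{cDC}(p,g) = \frac{2\sum_i p_i\, g_i}{\sum_i p_i + \sum_i g_i}.

% Upper bound: since p_i g_i \le p_i and p_i g_i \le g_i for every voxel i,
2\sum_i p_i\, g_i \;\le\; \sum_i p_i + \sum_i g_i
\quad\Longrightarrow\quad
\mathrm{cDC}(p,g) \le 1,

% with equality only if p_i g_i = p_i = g_i for all i, i.e. p_i = g_i everywhere.
```

When $p$ is itself binary, $\sum_i p_i g_i$ is the intersection count and $\sum_i p_i$, $\sum_i g_i$ are the two set sizes, so the continuous form reduces to the classical one.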
