As easy as 1, 2... 4? Uncertainty in counting tasks for medical imaging
Pith reviewed 2026-05-24 15:59 UTC · model grok-4.3
The pith
A multi-task network outputs narrow predictive intervals for counts, with the intervals intended to contain the true count for a target percentage of medical images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The network is trained to predict both the count value and the bounds of a predictive interval in a single forward pass. The interval loss is constructed to minimize width subject to a coverage constraint, and the claim is that the resulting intervals are calibrated on unseen data for the two counting tasks without a separate recalibration step.
What carries the argument
Multi-task network whose auxiliary heads predict interval bounds; the joint loss balances count accuracy against a term that shrinks interval width while maintaining the target coverage probability.
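The text reviewed here does not give the paper's loss, so the following is only a minimal sketch of one way a "narrow but covering" interval term could be written. The function name `interval_loss`, the soft coverage indicator, and the parameters `penalty` and `softness` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def interval_loss(lower, upper, target, coverage=0.9, penalty=10.0, softness=50.0):
    """Hypothetical interval term: mean width plus a coverage-shortfall penalty.

    lower, upper, target are equal-length sequences of per-image values.
    This is an illustrative assumption, not the loss used in the paper.
    """
    n = len(target)
    # Width term: narrower intervals give a lower loss.
    mean_width = sum(u - l for l, u in zip(lower, upper)) / n
    # Soft indicator of "target lies inside [lower, upper]"; a smooth
    # stand-in for the hard 0/1 indicator so gradients could flow.
    soft_hits = sum(
        _sigmoid(softness * (y - l)) * _sigmoid(softness * (u - y))
        for l, u, y in zip(lower, upper, target)
    )
    soft_coverage = soft_hits / n
    # Penalise only when coverage falls below the requested level.
    shortfall = max(0.0, coverage - soft_coverage)
    return mean_width + penalty * shortfall ** 2
```

In a multi-task setting a term like this would be added to the count regression loss with some weighting. In practice it would be written in a differentiable framework rather than plain Python.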
If this is right
- Counts reported with these intervals can be used directly in clinical decision rules without an extra calibration stage.
- The same network architecture can be applied to other dense-prediction counting problems in imaging once the loss is re-weighted for the desired coverage level.
- Existing single-task counting networks can be extended by adding the interval heads rather than replaced entirely.
- The approach removes the need to choose between separate uncertainty techniques such as bootstrapping or Bayesian approximations for this task.
Where Pith is reading between the lines
- The same joint-optimization idea could be tested on non-count regression targets such as volume or length measurements where interval calibration is also required.
- If the method generalizes, it reduces reliance on large ensembles or Monte-Carlo sampling at inference time for uncertainty in medical imaging.
- A natural next check is whether the intervals remain reliable when the test distribution shifts in scanner vendor or staining protocol.
Load-bearing premise
Jointly optimizing count accuracy and interval width plus coverage inside one network will produce intervals that remain well-calibrated on data the network has never seen.
What would settle it
On a new test set of the same imaging modalities, measure whether the predicted intervals cover the stated percentage of ground-truth counts; if coverage falls substantially below the target or intervals are wider than those from a well-tuned post-hoc method, the claim fails.
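The check described above reduces to two summary statistics on a held-out set: empirical coverage and mean interval width. A minimal sketch follows; the function name `evaluate_intervals` and the toy numbers are hypothetical and are not results from the paper.

```python
def evaluate_intervals(lower, upper, truth):
    """Return (empirical coverage, mean width) for predicted intervals.

    lower, upper, truth are equal-length sequences, one entry per test image.
    """
    n = len(truth)
    # An image is covered when its ground-truth count lies inside the interval.
    covered = sum(1 for l, u, y in zip(lower, upper, truth) if l <= y <= u)
    mean_width = sum(u - l for l, u in zip(lower, upper)) / n
    return covered / n, mean_width


# Toy example with made-up counts: three of four intervals contain the truth.
coverage, width = evaluate_intervals(
    lower=[2, 5, 0, 10],
    upper=[6, 9, 3, 14],
    truth=[4, 10, 1, 12],
)
```

Comparing `coverage` with the nominal target, and `width` with the width obtained from a post-hoc method on the same images, gives the two quantities the test above depends on.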
Original abstract
Counting is a fundamental task in biomedical imaging and count is an important biomarker in a number of conditions. Estimating the uncertainty in the measurement is thus vital to making definite, informed conclusions. In this paper, we first compare a range of existing methods to perform counting in medical imaging and suggest ways of deriving predictive intervals from these. We then propose and test a method for calculating intervals as an output of a multi-task network. These predictive intervals are optimised to be as narrow as possible, while also enclosing a desired percentage of the data. We demonstrate the effectiveness of this technique on histopathological cell counting and white matter hyperintensity counting. Finally, we offer insight into other areas where this technique may apply.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares existing methods for object counting in medical images and proposes deriving predictive intervals from them. It then introduces a multi-task network whose outputs include both a count estimate and interval bounds; these bounds are jointly optimized to minimize width subject to a target coverage level. The approach is demonstrated on histopathological cell counting and white matter hyperintensity counting tasks.
Significance. If the reported intervals prove well-calibrated on held-out data without post-hoc recalibration, the multi-task formulation would supply a practical, end-to-end route to uncertainty quantification for count-based biomarkers. The absence of any quantitative results, loss definitions, or calibration diagnostics in the provided text prevents assessment of whether this benefit is realized.
major comments (2)
- [Abstract] The central claim that the multi-task network produces intervals 'optimised to be as narrow as possible, while also enclosing a desired percentage of the data' cannot be evaluated because no loss function, coverage target, training procedure, or empirical coverage statistics are supplied.
- [Abstract] No experiment is described that tests whether the learned intervals maintain nominal coverage on data whose count distribution differs from the training set, which is required to substantiate the claim that post-hoc adjustment is unnecessary.
Simulated Author's Rebuttal
We thank the referee for their comments. We address each major comment below and indicate where revisions will be made to the manuscript.
- Referee: [Abstract] The central claim that the multi-task network produces intervals 'optimised to be as narrow as possible, while also enclosing a desired percentage of the data' cannot be evaluated because no loss function, coverage target, training procedure, or empirical coverage statistics are supplied.
  Authors: The full manuscript (Methods and Results sections) specifies the joint loss (counting regression plus a coverage-constrained width penalty), the target coverage level used in experiments, the end-to-end training procedure, and the resulting empirical coverage on held-out test sets. The abstract is intentionally concise; we will revise it to include a short reference to these elements so the central claim can be evaluated directly from the abstract.
  Revision: yes
- Referee: [Abstract] No experiment is described that tests whether the learned intervals maintain nominal coverage on data whose count distribution differs from the training set, which is required to substantiate the claim that post-hoc adjustment is unnecessary.
  Authors: The reported experiments use standard held-out test sets drawn from the same distribution as the training data and show that nominal coverage is achieved without post-hoc recalibration. The manuscript does not contain explicit tests on data with substantially shifted count distributions. We will add a clarifying sentence noting that the current evaluation is in-distribution and that robustness to strong distribution shift remains untested.
  Revision: partial
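One inexpensive way to probe the concern about shifted count distributions, short of collecting new data, is to report coverage separately for low-count and high-count images. The sketch below assumes this stratified check; the function name `coverage_by_bin` and the bin edges are hypothetical, and nothing here is taken from the manuscript.

```python
def coverage_by_bin(lower, upper, truth, edges):
    """Empirical coverage per ground-truth count bin.

    edges is an ascending list of bin edges; bin i holds images whose
    true count y satisfies edges[i] <= y < edges[i + 1].
    Returns one coverage value per bin, or None for an empty bin.
    """
    n_bins = len(edges) - 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for l, u, y in zip(lower, upper, truth):
        for i in range(n_bins):
            if edges[i] <= y < edges[i + 1]:
                totals[i] += 1
                hits[i] += int(l <= y <= u)
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Made-up example: intervals cover low counts but miss high counts.
per_bin = coverage_by_bin(
    lower=[1, 6, 40, 80],
    upper=[5, 12, 45, 90],
    truth=[3, 8, 50, 70],
    edges=[0, 10, 100],
)
```

Overall coverage can meet the target while individual bins fall well short of it, which is the pattern a shift toward rarer count ranges would expose.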
Circularity Check
No circularity; derivation self-contained with no equations or self-referential reductions shown
The provided abstract and context contain no equations, fitting procedures, or derivation steps that could be inspected for self-definition, fitted-input predictions, or self-citation load-bearing. The described multi-task network for interval optimization is presented as a proposal without any reduction to its own inputs by construction. Per the rules, absence of inspectable circular steps requires score 0 and empty steps list; the method is treated as self-contained on the given text.