Deep Active Learning for Axon-Myelin Segmentation on Histology Data
Pith reviewed 2026-05-24 23:13 UTC · model grok-4.3
The pith
A U-Net reaches peak myelin segmentation accuracy after annotating just three uncertainty-selected histology images instead of fifteen random ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Experiments on spinal cord and brain microscopic histology samples showed that the method reached a maximum Dice value after adding 3 uncertainty-selected samples to the initial training set, versus 15 randomly-selected samples, thereby significantly reducing the annotation effort.
What carries the argument
An overall uncertainty measure, obtained by drawing Monte Carlo samples from the U-Net with Dropout kept active, is used to select which samples to annotate.
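As a rough illustration of the idea, the sketch below runs several stochastic forward passes and collapses the per-pixel spread into one score per image. The aggregation used here (mean per-pixel variance), the default of 20 passes, and the names `mc_dropout_uncertainty` and `make_toy_model` are assumptions made for the example; the page does not state the paper's exact formula or number of passes, and the toy model is only a stand-in for a U-Net with Dropout active.

```python
import random


def mc_dropout_uncertainty(stochastic_predict, image, n_passes=20):
    """Image-level uncertainty from repeated stochastic forward passes.

    stochastic_predict(image) must return a flat list of per-pixel
    foreground probabilities and must be random (dropout left on),
    so that repeated calls on the same image differ.
    """
    passes = [stochastic_predict(image) for _ in range(n_passes)]
    n_pixels = len(passes[0])
    total_variance = 0.0
    for i in range(n_pixels):
        values = [p[i] for p in passes]
        mean = sum(values) / n_passes
        total_variance += sum((v - mean) ** 2 for v in values) / n_passes
    # Assumed aggregation: average the per-pixel variances over the image.
    return total_variance / n_pixels


def make_toy_model(drop_prob, rng):
    """Hypothetical stand-in for a network: randomly zeroes pixel outputs."""
    def predict(image):
        return [0.0 if rng.random() < drop_prob else px for px in image]
    return predict
```

With `drop_prob=0.0` the passes are identical and the score is essentially zero; any non-zero dropout probability makes the passes disagree and raises the score.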
If this is right
- The framework achieves high segmentation performance with very few labelled samples on realistic small datasets.
- It works across different acquisition settings, including Serial Block-Face Electron Microscopy and Transmission Electron Microscopy.
- Annotation effort for experts is significantly reduced for axon-myelin segmentation tasks.
- The straightforward implementation supports fast and accurate segmentation on new biomedical datasets.
Where Pith is reading between the lines
- Similar uncertainty sampling could lower labeling costs for other pixel-level biomedical segmentation problems beyond myelin.
- The approach may enable labs with limited annotation resources to apply deep models to their own histology data more readily.
- Further checks on whether uncertainty scores predict actual error reduction across additional modalities would test broader applicability.
Load-bearing premise
The uncertainty measure from Monte Carlo dropout samples reliably identifies the samples that most improve the segmentation model on these histology datasets.
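The premise amounts to trusting a ranking: at each round, the unlabelled image with the highest uncertainty score is sent to the annotator. A minimal sketch of one such acquisition round follows, assuming the scores have already been computed; the function names and the dictionary-based pool are hypothetical and are not taken from the paper's code.

```python
def select_most_uncertain(pool_scores, k=1):
    """Return the k pool identifiers with the highest uncertainty score."""
    ranked = sorted(pool_scores, key=pool_scores.get, reverse=True)
    return ranked[:k]


def active_learning_step(labelled, pool_scores, k=1):
    """Move the k most uncertain pool items into the labelled set.

    Returns the extended labelled list and the remaining pool scores.
    In a real loop the model would be retrained and the remaining pool
    re-scored before the next call.
    """
    chosen = select_most_uncertain(pool_scores, k)
    new_labelled = labelled + chosen
    remaining = {name: score for name, score in pool_scores.items()
                 if name not in chosen}
    return new_labelled, remaining
```

If the scores do not track how much an image would actually help the model, this ranking is no better than a random draw, which is exactly what the premise has to rule out.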
What would settle it
A test showing that randomly selected samples produce equivalent or greater Dice score gains than uncertainty-selected samples when added in equal numbers to the same initial training sets.
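One way to frame such a test is to compare the two Dice trajectories at equal annotation budgets. The helpers below are a sketch under that framing; the function names are hypothetical, and the convention that index i of a curve holds the Dice after adding i samples is an assumption.

```python
def samples_to_peak(dice_curve, tolerance=0.0):
    """Smallest number of added samples that reaches the curve's maximum.

    dice_curve[i] is assumed to be the Dice score after adding i samples.
    A positive tolerance treats values within that margin of the maximum
    as having reached the peak.
    """
    peak = max(dice_curve)
    for added, value in enumerate(dice_curve):
        if value >= peak - tolerance:
            return added


def gains_at_equal_budget(uncertainty_curve, random_curve):
    """Per-budget Dice difference, uncertainty selection minus random."""
    return [u - r for u, r in zip(uncertainty_curve, random_curve)]
```

The claim would be undermined if the differences returned by `gains_at_equal_budget` were zero or negative across budgets once run-to-run variance is taken into account.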
Original abstract
Semantic segmentation is a crucial task in biomedical image processing, which recent breakthroughs in deep learning have allowed to improve. However, deep learning methods in general are not yet widely used in practice since they require large amount of data for training complex models. This is particularly challenging for biomedical images, because data and ground truths are a scarce resource. Annotation efforts for biomedical images come with a real cost, since experts have to manually label images at pixel-level on samples usually containing many instances of the target anatomy (e.g. in histology samples: neurons, astrocytes, mitochondria, etc.). In this paper we provide a framework for Deep Active Learning applied to a real-world scenario. Our framework relies on the U-Net architecture and overall uncertainty measure to suggest which sample to annotate. It takes advantage of the uncertainty measure obtained by taking Monte Carlo samples while using Dropout regularization scheme. Experiments were done on spinal cord and brain microscopic histology samples to perform a myelin segmentation task. Two realistic small datasets of 14 and 24 images were used, from different acquisition settings (Serial Block-Face Electron Microscopy and Transmitting Electron Microscopy) and showed that our method reached a maximum Dice value after adding 3 uncertainty-selected samples to the initial training set, versus 15 randomly-selected samples, thereby significantly reducing the annotation effort. We focused on a plausible scenario and showed evidence that this straightforward implementation achieves a high segmentation performance with very few labelled samples. We believe our framework may benefit any biomedical researcher willing to obtain fast and accurate image segmentation on their own dataset. The code is freely available at https://github.com/neuropoly/deep-active-learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a deep active learning framework that combines a U-Net architecture with Monte Carlo Dropout-based uncertainty sampling to select histology images for annotation in an axon-myelin segmentation task. On two small datasets (14 spinal-cord and 24 brain images acquired under different EM modalities), the method is reported to reach its maximum Dice score after the addition of only 3 uncertainty-selected samples to an initial training set, versus 15 randomly selected samples.
Significance. If the performance gap is shown to be robust, the work would offer a practical, low-cost route to high-accuracy myelin segmentation when expert pixel-level labels are scarce. The public release of the code is a clear strength that aids reproducibility and adoption in biomedical imaging.
major comments (3)
- [Results / Experiments] Results section (and abstract): the central claim that uncertainty sampling reaches peak Dice after 3 samples versus 15 for random selection is presented without any report of multiple random initializations, standard deviations, or statistical tests. With total dataset sizes of only 14 and 24 images, performance curves are known to be sensitive to the composition of the initial labeled pool; absence of variance measures leaves the reported reduction in annotation effort unverified.
- [Methods / Experiments] Experimental protocol: the manuscript gives no description of how the initial training set is chosen, the size of the unlabeled pool at each iteration, the precise stopping criterion for the active-learning loop, or the full hyper-parameter settings used for the U-Net and MC-Dropout sampling. These omissions make it impossible to reproduce or assess the reliability of the 3-versus-15 comparison.
- [Results] Evaluation: only a single overall Dice value trajectory is shown; no per-class (axon vs. myelin) metrics, no comparison against other established active-learning acquisition functions (e.g., BALD, core-set), and no baseline using a non-Dropout uncertainty estimator are provided. This limits the ability to attribute the observed gain specifically to the MC-Dropout uncertainty measure.
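To make the distinction between acquisition functions concrete, the sketch below contrasts predictive entropy with BALD (mutual information) for a single pixel, given its foreground probabilities across MC passes. This uses the standard definitions for a binary output; it is not a description of what the paper computes, and the function names are chosen for the example.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def predictive_entropy(samples):
    """Entropy of the mean prediction over the MC passes."""
    return binary_entropy(sum(samples) / len(samples))


def bald(samples):
    """Mutual information: entropy of the mean minus the mean entropy."""
    expected_entropy = sum(binary_entropy(p) for p in samples) / len(samples)
    return predictive_entropy(samples) - expected_entropy
```

A pixel predicted at 0.5 on every pass has high predictive entropy but zero BALD, whereas passes that disagree confidently (0.0 versus 1.0) score high on both. A comparison of this kind is what would show whether the reported gain depends on the particular measure chosen.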
minor comments (2)
- [Abstract] The abstract claims that the method is 'significantly reducing the annotation effort' without supplying the actual Dice curves, the number of MC samples, or any quantitative measure of significance.
- [Figures] Figure captions and axis labels should explicitly state the number of MC forward passes and the exact uncertainty aggregation formula used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects for strengthening the experimental validation and reproducibility of our active learning framework. We address each major comment point-by-point below.
- Referee: [Results / Experiments] Results section (and abstract): the central claim that uncertainty sampling reaches peak Dice after 3 samples versus 15 for random selection is presented without any report of multiple random initializations, standard deviations, or statistical tests. With total dataset sizes of only 14 and 24 images, performance curves are known to be sensitive to the composition of the initial labeled pool; absence of variance measures leaves the reported reduction in annotation effort unverified.
  Authors: We agree that variance reporting and statistical analysis are essential for small datasets. In the revised manuscript we will rerun all experiments across multiple random initializations (minimum 5 seeds), report mean Dice trajectories with standard deviations, and include paired statistical tests (e.g., Wilcoxon signed-rank) between uncertainty and random selection curves.
  Revision: yes
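The promised analysis could look roughly like the sketch below: a mean and standard deviation over seeds, plus an exact two-sided Wilcoxon signed-rank test implemented by enumeration. Pairing the two strategies by seed is an assumption (the rebuttal does not say what is paired), and the function names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def summarize(dice_per_seed):
    """Mean and sample standard deviation of Dice over seeds."""
    return mean(dice_per_seed), stdev(dice_per_seed)


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) with W = min(W+, W-). Zero differences are dropped and
    tied absolute differences share an average rank. All 2**n sign
    patterns are enumerated, so this is only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Count sign patterns at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** n
```

Under seed-wise pairing, five pairs allow a smallest attainable two-sided p-value of 2/32 = 0.0625, so the stated minimum of 5 seeds could not reach the conventional 0.05 threshold with this test even if uncertainty sampling won on every seed.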
- Referee: [Methods / Experiments] Experimental protocol: the manuscript gives no description of how the initial training set is chosen, the size of the unlabeled pool at each iteration, the precise stopping criterion for the active-learning loop, or the full hyper-parameter settings used for the U-Net and MC-Dropout sampling. These omissions make it impossible to reproduce or assess the reliability of the 3-versus-15 comparison.
  Authors: We acknowledge the protocol details were insufficiently specified. The revised Methods section will explicitly state: initial training set selection procedure and size, unlabeled pool composition at each step, stopping criterion (performance plateau or fixed budget), and complete hyper-parameter values for the U-Net (architecture, optimizer, learning rate, epochs) together with MC-Dropout settings (dropout probability, number of forward passes).
  Revision: yes
- Referee: [Results] Evaluation: only a single overall Dice value trajectory is shown; no per-class (axon vs. myelin) metrics, no comparison against other established active-learning acquisition functions (e.g., BALD, core-set), and no baseline using a non-Dropout uncertainty estimator are provided. This limits the ability to attribute the observed gain specifically to the MC-Dropout uncertainty measure.
  Authors: We will add per-class (axon and myelin) Dice scores to the results. However, systematic comparisons against BALD, core-set, and non-Dropout estimators would require extensive new experiments that exceed the scope of the current work, which demonstrates a simple, reproducible MC-Dropout baseline. We will note this limitation explicitly and indicate that such comparisons are left for future investigation.
  Revision: partial
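Per-class Dice can be computed directly from integer label maps, as in the sketch below. The label encoding (1 for myelin, 2 for axon, 0 for background) and the function name are assumptions made for the example, not the paper's convention.

```python
def per_class_dice(pred, truth, classes):
    """Dice score for each class in two flat integer label maps.

    classes maps a class name to its integer label. A class absent from
    both maps scores 1.0 by the convention assumed here.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of pixels")
    scores = {}
    for name, label in classes.items():
        p = [int(v == label) for v in pred]
        t = [int(v == label) for v in truth]
        intersection = sum(a * b for a, b in zip(p, t))
        total = sum(p) + sum(t)
        scores[name] = 1.0 if total == 0 else 2.0 * intersection / total
    return scores
```

Reporting the two classes separately shows whether a good overall score hides a weaker result on one of them.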
Circularity Check
No circularity: purely empirical active-learning experiments
The paper reports experimental results comparing uncertainty sampling (MC Dropout on U-Net) versus random selection on two small histology datasets (14 and 24 images). No derivation chain, first-principles equations, fitted parameters renamed as predictions, or self-citation load-bearing steps exist. The headline claim (max Dice after +3 uncertainty samples vs +15 random) is a direct empirical outcome, not reduced to inputs by construction. This matches the default non-finding for experimental papers without mathematical derivations.