A Regularized Convolutional Neural Network for Semantic Image Segmentation

Fan Jia; Jun Liu; Xue-Cheng Tai

arxiv: 1907.05287 · v1 · pith:UBRYXAIAnew · submitted 2019-06-28 · 💻 cs.CV

Pith reviewed 2026-05-25 13:37 UTC · model grok-4.3

classification 💻 cs.CV

keywords semantic segmentation · convolutional neural networks · total variation · regularization · U-Net · SegNet · spatial regularity · noise robustness

The pith

Integrating total variation into the loss of U-Net and SegNet produces more regular and noise-robust semantic segmentations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes adding a total variation term to the loss function of established CNN segmentation models such as U-Net and SegNet. This regularization encourages spatial smoothness in the pixel predictions without altering the network architecture. Experiments on the white blood cell (WBC), CamVid, and SUN-RGBD datasets show improved segmentation quality and increased robustness to noise compared with the unregularized baselines. A sympathetic reader would care because standard CNNs often produce irregular segmentation boundaries, since nothing explicitly ties the predictions of neighboring pixels together.

Core claim

By incorporating a total variation regularization term into the training loss of convolutional neural networks for semantic segmentation, the method achieves smoother object boundaries and greater resilience to input noise while maintaining the original network structures of U-Net and SegNet.
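This page does not reproduce the paper's loss equation, so the following is only one plausible way to write the claim down, not the authors' formulation. It assumes an anisotropic discretization of total variation applied to the softmax probabilities $p_{c,i,j}$ (class $c$, pixel $(i,j)$ in the image domain $\Omega$), integer labels $y_{i,j}$, and a scalar weight $\lambda \ge 0$; all of these symbols are introduced here for illustration.

```latex
\[
\mathcal{L}(\theta)
  = \underbrace{-\frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \log p_{y_{i,j},\,i,j}(\theta)}_{\text{cross-entropy}}
  \;+\; \lambda \, \underbrace{\frac{1}{|\Omega|} \sum_{c=1}^{C} \sum_{(i,j)}
      \Bigl( \bigl| p_{c,i+1,j} - p_{c,i,j} \bigr| + \bigl| p_{c,i,j+1} - p_{c,i,j} \bigr| \Bigr)}_{\text{total variation}}
\]
```

With $\lambda = 0$ this reduces to the ordinary cross-entropy objective, which is what makes a comparison against the unregularized baseline meaningful.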

What carries the argument

The total variation term added to the segmentation loss function, which penalizes differences between neighboring pixel predictions to enforce spatial regularity.
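A minimal NumPy sketch of such a penalty may make the mechanism concrete. It is not the authors' implementation: the function names, the anisotropic (absolute-difference) form of TV, the per-pixel averaging, and the fixed weight `lam` are all assumptions made for illustration. The figure captions reproduced further down this page also mention a regularized softmax layer and a trainable λ, neither of which this sketch attempts to model.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the class axis (axis 0) of a (C, H, W) array."""
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy. probs: (C, H, W); labels: (H, W) integer class ids."""
    rows, cols = np.indices(labels.shape)
    true_class_prob = probs[labels, rows, cols]  # probability of the true class per pixel
    return float(-np.log(true_class_prob + eps).mean())

def total_variation(probs):
    """Anisotropic TV of the class-probability maps, averaged over pixels (assumed form)."""
    vertical = np.abs(probs[:, 1:, :] - probs[:, :-1, :]).sum()
    horizontal = np.abs(probs[:, :, 1:] - probs[:, :, :-1]).sum()
    return float((vertical + horizontal) / (probs.shape[1] * probs.shape[2]))

def regularized_loss(logits, labels, lam=0.1):
    """Cross-entropy plus lam * TV; lam is a hypothetical fixed weight."""
    probs = softmax(logits)
    return cross_entropy(probs, labels) + lam * total_variation(probs)

# Toy example: left half of a 4x4 image is class 0, right half is class 1.
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1
clean_logits = np.where(np.arange(2)[:, None, None] == labels[None], 4.0, -4.0)

# Same prediction with one isolated pixel flipped to the wrong class.
speckled_logits = clean_logits.copy()
speckled_logits[:, 1, 0] = clean_logits[::-1, 1, 0]

print("TV clean   :", total_variation(softmax(clean_logits)))
print("TV speckled:", total_variation(softmax(speckled_logits)))
print("loss clean   :", regularized_loss(clean_logits, labels))
print("loss speckled:", regularized_loss(speckled_logits, labels))
```

The isolated wrong pixel raises both terms: cross-entropy because the pixel is misclassified, and TV because it disagrees with three of its neighbors. A penalty on neighbor differences therefore discourages speckle even when the per-pixel loss alone would tolerate it.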

If this is right

The regularized models produce better and more spatially regular segmentations than the original ones.
The regularized networks show some robustness to noise.
This approach integrates spatial regularization without changing the network architecture.
The method is tested and shown effective on WBC, CamVid, and SUN-RGBD datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This could reduce reliance on separate post-processing for boundary smoothness in segmentation workflows.
The regularization approach may generalize to other pixel-wise prediction tasks in computer vision.
Further tests on diverse noisy environments could strengthen evidence for robustness.
The approach could potentially be combined with other regularization techniques for further gains.

Load-bearing premise

The total variation term can be integrated into the loss of U-Net and SegNet without requiring changes to the network architecture or training procedure that would invalidate the original models' learned features.

What would settle it

Experiments demonstrating no improvement in segmentation accuracy or no added robustness to noise on the tested datasets would disprove the benefits claimed.
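The shape of such a test can be sketched as a robustness curve: score a baseline and a regularized predictor on the same inputs at increasing noise levels and compare the two curves. Everything below is a hypothetical stand-in. The two "segmenters" are simple thresholding functions rather than U-Net or SegNet, the image is synthetic, the metric is plain pixel accuracy, and the noise levels are arbitrary; only the protocol is the point.

```python
import numpy as np

def threshold_segmenter(image):
    """Hypothetical stand-in for a baseline model: label 1 where the pixel is brighter than 0.5."""
    return (image > 0.5).astype(int)

def smoothed_segmenter(image):
    """Hypothetical stand-in for a regularized model: 3x3 mean filter, then threshold."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    acc = np.zeros_like(image, dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return (acc / 9.0 > 0.5).astype(int)

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the target."""
    return float((pred == target).mean())

def robustness_curve(segmenter, image, target, sigmas, seed=0):
    """Accuracy of `segmenter` on `image` corrupted by Gaussian noise at each sigma."""
    rng = np.random.default_rng(seed)
    curve = {}
    for sigma in sigmas:
        noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
        curve[float(sigma)] = pixel_accuracy(segmenter(noisy), target)
    return curve

# Synthetic test case: dark background (0.3) with a bright square (0.7).
target = np.zeros((32, 32), dtype=int)
target[8:24, 8:24] = 1
image = np.where(target == 1, 0.7, 0.3)

sigmas = [0.0, 0.05, 0.1, 0.2]
baseline = robustness_curve(threshold_segmenter, image, target, sigmas)
regularized = robustness_curve(smoothed_segmenter, image, target, sigmas)
for s in sigmas:
    print(f"sigma={s:.2f}  baseline={baseline[s]:.3f}  smoothed={regularized[s]:.3f}")
```

Using the same seed for both predictors means they see identical noisy inputs, so any gap between the curves comes from the predictor and not from the noise draw. If the regularized curve never rises above the baseline curve on the real datasets, the robustness claim is not supported.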

Figures

Figures reproduced from arXiv: 1907.05287 by Fan Jia, Jun Liu, Xue-Cheng Tai.

**Figure 1.** An example of segmentation results from the original Unet [23] and our proposed regularized Unet (RUnet) on the WBC dataset [33]. When noise is added to the image, the segmentation of the nucleus by Unet becomes messy (…

**Figure 2.** Unet1 and RUnet1 are trained on the clean WBC dataset; Unet2 and RUnet2 are trained on the noisy WBC dataset. We add Gaussian noise with zero mean and standard deviation σ from 0.01 to 0.1 to the WBC testing dataset. WBC Dataset 2 has a simple image structure and distinct details, which makes differences in detail easy to observe. We replace the original softmax layer with the regularized softmax layer,…

**Figure 3.** Segmentation results predicted by Unet and RUnet trained on the noisy dataset. Noise type from left to right: small-level salt-and-pepper (s&p) noise, large-level s&p noise, small-level Gaussian noise, medium-level Gaussian noise, medium-level Gaussian noise. … regularization may happen. Our trainable λ scheme helps avoid falling into such a problem. We can see obvious degradation in predictions on noisy images f…

**Figure 4.** Segnet1 and RSegnet1 are trained on the clean CamVid dataset; Segnet2 and RSegnet2 are trained on the noisy CamVid dataset. We add Gaussian noise with zero mean and standard deviation σ from 0.01 to 0.1 to the CamVid testing dataset. We replace the original softmax layer with the regularized softmax layer; other layers and parameters of Segnet and RSegnet remain the same. Both Segnet and RSegnet are trained for 80k iterations w…

**Figure 5.** Segmentation results of Segnet and RSegnet trained on the noisy dataset. Noise type from left to right: clean image, medium-level pepper noise, medium-level Gaussian noise, large-level Gaussian noise. 4.3. SUN-RGBD Dataset. The SUN-RGBD dataset [28] is a much more challenging dataset of indoor scenes with 10,355 images in total. We randomly select 5,285 images as our training dataset and the remaining images are use…

**Figure 6.** Segnet1 and RSegnet1 are trained on the clean SUN-RGBD dataset; Segnet2 and RSegnet2 are trained on the noisy SUN-RGBD dataset. We add Gaussian noise with zero mean and standard deviation σ from 0.01 to 0.1 to the SUN-RGBD testing dataset.

**Figure 7.** Segmentation results of Segnet and RSegnet trained on the clean dataset. Noise type from left to right: clean image, medium-level Gaussian noise, medium-level Gaussian noise, small-level salt noise.
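The captions above describe two kinds of test-time corruption: zero-mean Gaussian noise with σ between 0.01 and 0.1, and salt-and-pepper noise at unspecified "small", "medium", and "large" levels. A rough sketch of how such corruptions are commonly generated follows. It assumes images scaled to [0, 1], clipping after the noise is added, ten evenly spaced σ values, and a hypothetical `amount` parameter for the corrupted-pixel fraction; none of these details is given on this page.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`; image assumed in [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(image, amount, rng, salt_fraction=0.5):
    """Set a fraction `amount` of pixels to 1 (salt) or 0 (pepper).

    image is (H, W) or (H, W, C); a corrupted pixel is overwritten in every channel.
    salt_fraction=1.0 gives salt-only noise and salt_fraction=0.0 gives pepper-only noise.
    """
    noisy = image.copy()
    h, w = image.shape[:2]
    u = rng.random((h, w))
    salt = u < amount * salt_fraction
    pepper = (u >= amount * salt_fraction) & (u < amount)
    noisy[salt] = 1.0
    noisy[pepper] = 0.0
    return noisy

rng = np.random.default_rng(0)
image = np.full((64, 64, 3), 0.5)  # flat gray placeholder image

# Sweep sigma over the range quoted in the captions (the step size is an assumption).
for sigma in np.linspace(0.01, 0.1, 10):
    noisy = add_gaussian_noise(image, sigma, rng)
    print(f"sigma={sigma:.2f}  measured std={noisy.std():.3f}")

# An arbitrary 5% salt-and-pepper level, chosen only for illustration.
sp = add_salt_and_pepper(image, amount=0.05, rng=rng)
print("fraction of corrupted pixels:", float((sp[..., 0] != 0.5).mean()))
```

Passing the random generator in explicitly keeps the corruption reproducible, which matters when a baseline and a regularized network are meant to be scored on identical noisy inputs.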

Original abstract

Convolutional neural networks (CNNs) show outstanding performance in many image processing problems, such as image recognition, object detection and image segmentation. Semantic segmentation is a very challenging task that requires recognizing, understanding what's in the image in pixel level. Though the state of the art has been greatly improved by CNNs, there is no explicit connections between prediction of neighbouring pixels. That is, spatial regularity of the segmented objects is still a problem for CNNs. In this paper, we propose a method to add spatial regularization to the segmented objects. In our method, the spatial regularization such as total variation (TV) can be easily integrated into CNN network. It can help CNN find a better local optimum and make the segmentation results more robust to noise. We apply our proposed method to Unet and Segnet, which are well established CNNs for image segmentation, and test them on WBC, CamVid and SUN-RGBD datasets, respectively. The results show that the regularized networks not only could provide better segmentation results with regularization effect than the original ones but also have certain robustness to noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper adds a total variation term to the cross-entropy loss of U-Net and SegNet to enforce spatial regularity in segmentations, with tests on WBC, CamVid and SUN-RGBD.

The main contribution is a direct insertion of total variation into the training loss of two established segmentation networks. The TV term penalizes label differences between neighboring pixels and is meant to produce smoother object boundaries while keeping the original architectures and training pipelines intact. This is a practical extension rather than a new framework, but the implementation looks clean enough that it could be tried without major code changes. The datasets are standard and the claim of added noise robustness follows from the regularizer's known properties in classical image processing. If the full experiments include ablations that isolate the TV weight and report consistent gains in boundary metrics, the work supplies a usable engineering note for applications where jagged segmentations are a problem.

The soft spot is the abstract's reliance on qualitative assertions about better results without visible numbers or controls. Even if the full text supplies the exact loss equation, the absence of reported IoU or Dice deltas and the lack of comparison to other regularizers make it difficult to gauge the size of the effect. The integration itself does not appear to require hidden architectural adjustments, so that part of the argument holds.

This paper is for readers already running U-Net or SegNet who want a lightweight way to improve spatial coherence. It is not for people seeking new theoretical insights or large performance jumps. I would bring it to a reading group only if the group is focused on practical regularization tricks. I would not cite it in my own work. It deserves peer review because the idea is reproducible and the datasets are public, even though the evidence presented so far is limited.

Referee Report

2 major / 1 minor

Summary. The paper proposes integrating a total variation (TV) regularization term directly into the cross-entropy loss of unmodified U-Net and SegNet architectures for semantic image segmentation. It claims this yields improved segmentation accuracy with a regularization effect and greater robustness to noise, evaluated on the WBC, CamVid, and SUN-RGBD datasets.

Significance. If the empirical improvements hold under proper controls, the approach offers a lightweight, architecture-preserving method for enforcing spatial regularity in CNN segmentation outputs. This could be practically useful for noisy real-world imagery, building on standard models and datasets without requiring new network designs.

major comments (2)

[Abstract] The central claims of 'better segmentation results' and 'certain robustness to noise' are asserted without any quantitative metrics, tables, ablation studies, or statistical comparisons, so the magnitude and reliability of the reported gains cannot be assessed from the provided text.
[Method] Method description (paragraph on integration): the claim that the TV term integrates 'easily' into existing U-Net/SegNet losses without altering learned features or training procedures is load-bearing for the isolation of the regularization effect, yet no concrete loss equation, weighting schedule, or training-protocol details are supplied to verify this.

minor comments (1)

The abstract would be strengthened by including at least one key quantitative result (e.g., mIoU delta or noise-robustness metric) to ground the claims.
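For readers unfamiliar with the metric named here, the following sketch shows how mean IoU (and the closely related Dice score) can be computed from a confusion matrix, and how a delta between two sets of predictions would be reported. The function names and the tiny label arrays are hypothetical; this is not the paper's evaluation code, and the numbers are not results from the paper.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index the true class, columns the predicted class."""
    idx = target.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping classes absent from both maps."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

def mean_dice(cm):
    """Mean over classes of 2*TP / (2*TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm).astype(float)
    denom = cm.sum(axis=0) + cm.sum(axis=1)
    valid = denom > 0
    return float((2.0 * tp[valid] / denom[valid]).mean())

# Hypothetical 2x2 example with two classes.
target = np.array([[0, 0], [1, 1]])
pred_a = np.array([[0, 1], [1, 1]])  # one pixel wrong
pred_b = np.array([[0, 0], [1, 1]])  # all pixels right

miou_a = mean_iou(confusion_matrix(pred_a, target, 2))
miou_b = mean_iou(confusion_matrix(pred_b, target, 2))
print("mIoU A:", miou_a)
print("mIoU B:", miou_b)
print("mIoU delta (B - A):", miou_b - miou_a)
print("Dice A:", mean_dice(confusion_matrix(pred_a, target, 2)))
```

Reporting the delta at each noise level, not only on clean images, would address both halves of the claim at once.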

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will revise the manuscript to improve clarity and completeness.

Referee: [Abstract] The central claims of 'better segmentation results' and 'certain robustness to noise' are asserted without any quantitative metrics, tables, ablation studies, or statistical comparisons, so the magnitude and reliability of the reported gains cannot be assessed from the provided text.

Authors: We agree that the abstract would benefit from quantitative highlights to better convey the magnitude of improvements. The full manuscript reports accuracy and robustness metrics on WBC, CamVid, and SUN-RGBD, but these are not summarized in the abstract. We will revise the abstract to include key quantitative results (e.g., mIoU gains and noise-robustness deltas) drawn from the experimental tables. revision: yes

Referee: [Method] Method description (paragraph on integration): the claim that the TV term integrates 'easily' into existing U-Net/SegNet losses without altering learned features or training procedures is load-bearing for the isolation of the regularization effect, yet no concrete loss equation, weighting schedule, or training-protocol details are supplied to verify this.

Authors: We acknowledge that the current method section lacks an explicit loss equation and training details. In the revision we will add the precise combined loss formulation (cross-entropy plus weighted TV term), the schedule for the regularization weight, and confirmation that the network architecture and optimizer remain unchanged, thereby isolating the regularization effect. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical modification: a total variation term is added directly to the cross-entropy loss of unmodified U-Net and SegNet architectures, with training performed on external standard datasets (WBC, CamVid, SUN-RGBD). No derivation chain, uniqueness theorem, or fitted parameter is invoked whose output is definitionally equivalent to its input; reported improvements are measured against held-out test data rather than being forced by internal construction or self-citation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that standard CNN training assumptions (gradient descent convergence, dataset representativeness) continue to hold after the added term.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neural Flow Operators can Approximate any Operator: Abstract Frameworks and Universal Approximations
cs.LG 2026-05 unverdicted novelty 7.0

Neural flow operators with composition and separation structures are proven to universally approximate any operator in finite and infinite dimensions, recovering ResNet-type and plain architectures via time discretizations.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] V. Badrinarayanan, A. Kendall, and R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv preprint arXiv:1511.00561, (2015)

[2] L. Barghout and L. Lee, Perceptual information processing system, Mar. 25 2004. US Patent App. 10/618,543

[3] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla, Segmentation and recognition using structure from motion point clouds, in European conference on computer vision, Springer, 2008, pp. 44–57

[4] A. Chambolle, An algorithm for total variation minimization and applications, Journal of Mathematical imaging and vision, 20 (2004), pp. 89–97

[5] A. Chambolle and P.-L. Lions, Image recovery via total variation minimization and related problems, Numerische Mathematik, 76 (1997), pp. 167–188

[6] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs, arXiv preprint arXiv:1412.7062, (2014)

[7] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE transactions on pattern analysis and machine intelligence, 40 (2018), pp. 834–848

[8] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov, Scalable object detection using deep neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2147–2154

[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

[10] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, Simultaneous detection and segmentation, in European Conference on Computer Vision, Springer, 2014, pp. 297–312

[11] K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1026–1034

[12] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

[13] M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen, and R. Vasudevan, Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?, arXiv preprint arXiv:1610.01983, (2016)

[14] P. Krähenbühl and V. Koltun, Efficient inference in fully connected crfs with gaussian edge potentials, in Advances in neural information processing systems, 2011, pp. 109–117

[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in neural information processing systems, 2012, pp. 1097–1105

[16] L. Ladický, P. Sturgess, K. Alahari, C. Russell, and P. H. Torr, What, where and how many? combining object detectors and crfs, in European conference on computer vision, Springer, 2010, pp. 424–437

[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), pp. 2278–2324

[18] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, Ssd: Single shot multibox detector, in European conference on computer vision, Springer, 2016, pp. 21–37

[19] J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440

[20] H. Noh, S. Hong, and B. Han, Learning deconvolution network for semantic segmentation, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1520–1528

[21] P. Ochs, R. Ranftl, T. Brox, and T. Pock, Techniques for gradient-based bilevel optimization with non-smooth lower level problems, Journal of Mathematical Imaging and Vision, 56 (2016), pp. 175–194

[22] G. Papandreou, I. Kokkinos, and P.-A. Savalle, Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 390–399

[23] O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241

[24] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: nonlinear phenomena, 60 (1992), pp. 259–268

[25] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, Overfeat: Integrated recognition, localization and detection using convolutional networks, arXiv preprint arXiv:1312.6229, (2013)

[26] L. Shapiro and G. C. Stockman, Computer vision, Prentice Hall, (2001)

[27] J. Shotton, M. Johnson, and R. Cipolla, Semantic texton forests for image categorization and segmentation, in Computer vision and pattern recognition, 2008. CVPR 2008. IEEE Conference on, IEEE, 2008, pp. 1–8

[28] S. Song, S. P. Lichtenberg, and J. Xiao, Sun rgb-d: A rgb-d scene understanding benchmark suite, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 567–576

[29] P. Sturgess, K. Alahari, L. Ladicky, and P. H. Torr, Combining appearance and structure from motion features for road scene understanding, in BMVC-British Machine Vision Conference, BMVA, 2009

[30] C. Wu and X.-C. Tai, Augmented lagrangian method, dual methods, and split bregman iteration for rof, vectorial tv, and high order models, SIAM Journal on Imaging Sciences, 3 (2010), pp. 300–339

[31] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in European conference on computer vision, Springer, 2014, pp. 818–833

[32] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, Pyramid scene parsing network, in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2881–2890

[33] X. Zheng, Y. Wang, G. Wang, and J. Liu, Fast and robust segmentation of white blood cell images by self-supervised learning, Micron, 107 (2018), pp. 55–71, https://doi.org/10.1016/j.micron.2018.01.010, https://www.sciencedirect.com/science/article/pii/S0968432817303037
