A Comparative Study of High-Recall Real-Time Semantic Segmentation Based on Swift Factorized Network

Kailun Yang; Kaite Xiang; Kaiwei Wang

arxiv: 1907.11394 · v1 · pith:FEAGMDOInew · submitted 2019-07-26 · 💻 cs.CV


Pith reviewed 2026-05-24 16:04 UTC · model grok-4.3

classification 💻 cs.CV

keywords semantic segmentation · real-time · high recall · autonomous vehicles · traffic scenes · receptive field · loss function · decision rules


The pith

A Swift Factorized Network with enlarged receptive-field blocks and three targeted recall methods improves detection of traffic objects over its baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on making semantic segmentation usable in safety-critical settings such as autonomous driving, where failing to detect a car or pedestrian costs more than a false positive. It introduces the Swift Factorized Network (SFN), a real-time model adapted from SwiftNet with a U-shaped structure and lateral connections, and adds two blocks that enlarge the effective receptive field, that is, the image area each output pixel draws on. The work then tests three separate adjustments for pushing recall higher: one to the loss function, one to the classifier, and one to the final decision rule. On the CamVid and Cityscapes datasets, these changes produce clear recall gains over the baseline while preserving real-time speed.
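To make the loss-function lever concrete, the sketch below computes a class-weighted pixel-wise cross-entropy in which safety-critical classes carry larger weights, so a nearly missed pedestrian costs more than a nearly missed road pixel. It is a minimal illustration under stated assumptions: the class names, the values in `CLASS_WEIGHTS`, and the function name are hypothetical, and the paper's own loss variant (labelled IAL in Figures 7 and 8, presumably an importance-aware loss in the sense of [9], built on the class-importance groups of Figure 4) may weight classes differently.

```python
import math

# Hypothetical per-class weights: a larger weight means missing that class costs more.
# These values are illustrative assumptions, not the paper's settings.
CLASS_WEIGHTS = {"road": 1.0, "building": 1.0, "car": 3.0, "pedestrian": 5.0}


def weighted_cross_entropy(probs, labels, weights):
    """Mean class-weighted cross-entropy over pixels.

    probs   -- list of dicts mapping class name -> softmax probability, one per pixel
    labels  -- list of ground-truth class names, one per pixel
    weights -- dict mapping class name -> loss weight
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clamp to avoid log(0) when the model assigns zero probability to the truth.
        total += -w * math.log(max(p[y], 1e-12))
        norm += w
    # Normalise by the summed weights of the target pixels.
    return total / norm


if __name__ == "__main__":
    labels = ["pedestrian", "road"]
    # Scenario A: the pedestrian pixel is predicted well, the road pixel poorly.
    a = [{"pedestrian": 0.9, "road": 0.1}, {"pedestrian": 0.8, "road": 0.2}]
    # Scenario B: the reverse, so the pedestrian is nearly missed.
    b = [{"pedestrian": 0.2, "road": 0.8}, {"pedestrian": 0.1, "road": 0.9}]
    print(weighted_cross_entropy(a, labels, CLASS_WEIGHTS))  # ≈ 0.356
    print(weighted_cross_entropy(b, labels, CLASS_WEIGHTS))  # ≈ 1.359
```

With uniform weights the two scenarios would score identically; the weighting is what makes the optimizer prefer fixing the missed pedestrian. Normalizing by the summed target weights matches how PyTorch's `torch.nn.CrossEntropyLoss` reduces a weighted loss with `reduction='mean'`.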

Core claim

The Swift Factorized Network, which incorporates enlarged receptive-field blocks and applies three recall-enhancement methods through the loss function, the classifier, and decision rules, reaches excellent performance and significantly improves recall rates compared with the baseline network on the CamVid and Cityscapes datasets.

What carries the argument

Swift Factorized Network (SFN), a U-shaped real-time segmentation architecture with lateral connections plus enlarged receptive-field blocks and recall adjustments applied at loss, classifier, and decision stages.
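The efficiency argument for factorized and large-kernel blocks is largely parameter arithmetic. The sketch below counts weights (biases ignored) for a dense k×k convolution, an ERFNet-style factorized pair (k×1 then 1×k), and a GCN-style global convolution block (branches k×1→1×k and 1×k→k×1, outputs summed), and computes the receptive field of stacked stride-1 convolutions along one axis. The channel count and kernel sizes are illustrative assumptions; the exact SFN block configuration appears in the paper's Figure 3 and is not reproduced here.

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of weights in one kh x kw convolution (bias terms ignored)."""
    return kh * kw * c_in * c_out


def dense_params(k, c):
    """Standard k x k convolution with c input and c output channels."""
    return conv_params(k, k, c, c)


def factorized_params(k, c):
    """ERFNet-style factorization: a k x 1 convolution followed by a 1 x k one."""
    return conv_params(k, 1, c, c) + conv_params(1, k, c, c)


def global_conv_params(k, c_in, c_out):
    """GCN-style block: branches (k x 1 -> 1 x k) and (1 x k -> k x 1), summed."""
    left = conv_params(k, 1, c_in, c_out) + conv_params(1, k, c_out, c_out)
    right = conv_params(1, k, c_in, c_out) + conv_params(k, 1, c_out, c_out)
    return left + right


def receptive_field(layers):
    """Receptive field along one axis of stacked stride-1 convolutions.

    layers -- list of (kernel_size_along_axis, dilation) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf


if __name__ == "__main__":
    c = 128  # illustrative channel count, not taken from the paper
    print(dense_params(3, c), factorized_params(3, c))      # 147456 98304
    print(dense_params(7, c), global_conv_params(7, c, c))  # 802816 458752
    # Vertical axis of an ERFNet-style non-bottleneck-1D block:
    # 3x1, 1x3, then 3x1 and 1x3 dilated by 2.
    print(receptive_field([(3, 1), (1, 1), (3, 2), (1, 2)]))  # 7
```

With 128 channels, the factorized 3×3 pair needs two thirds of the weights of a dense 3×3 convolution, and a k = 7 global convolution block needs 4/7 (about 57%) of the weights of a dense 7×7 one, which is the kind of saving that lets such blocks widen the receptive field at modest cost.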

If this is right

Fewer traffic objects are missed during real-time operation.
The model remains fast enough for vehicle deployment while recall rises.
The three recall methods can be compared directly for their individual contributions.
The same blocks and adjustments can be inserted into other U-shaped segmentation networks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The recall adjustments may transfer to segmentation models that were not originally factorized.
Safety validation for new environments would still require fresh recall measurements rather than relying on the original datasets alone.
Designers of other real-time vision systems for hazard detection could adopt similar loss or decision changes without enlarging the network.

Load-bearing premise

Performance gains measured on CamVid and Cityscapes will continue when the same trained model faces new cameras, weather, or road layouts.

What would settle it

Measuring recall on a held-out collection of driving images recorded under different lighting or camera conditions and finding that the enhanced model no longer exceeds the baseline.
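The test described above reduces to computing per-class recall, TP / (TP + FN), for both models on the same held-out images and checking whether the enhanced model still beats the baseline on the safety-critical classes. A minimal sketch in pure Python, assuming flattened lists of per-pixel integer labels and an ignore id of 255 for unannotated pixels (a common Cityscapes convention); the class ids and tiny arrays are made up for illustration.

```python
IGNORE = 255  # assumed label id for unannotated pixels


def per_class_recall(gt, pred, num_classes, ignore=IGNORE):
    """Recall TP / (TP + FN) per class from flat lists of per-pixel label ids."""
    tp = [0] * num_classes
    support = [0] * num_classes  # TP + FN: ground-truth pixels of each class
    for g, p in zip(gt, pred):
        if g == ignore:
            continue
        support[g] += 1
        if p == g:
            tp[g] += 1
    return {c: (tp[c] / support[c] if support[c] else float("nan"))
            for c in range(num_classes)}


def recall_gains(gt, pred_base, pred_new, classes, num_classes):
    """Recall difference (new minus baseline) for the classes of interest."""
    r_base = per_class_recall(gt, pred_base, num_classes)
    r_new = per_class_recall(gt, pred_new, num_classes)
    return {c: r_new[c] - r_base[c] for c in classes}


if __name__ == "__main__":
    # Hypothetical ids: 0 = road, 1 = car, 2 = pedestrian.
    gt = [0, 0, 0, 1, 1, 2, 2, IGNORE]
    baseline = [0, 0, 0, 1, 0, 0, 2, 0]
    enhanced = [0, 1, 0, 1, 1, 2, 2, 0]
    print(recall_gains(gt, baseline, enhanced, [1, 2], 3))  # {1: 0.5, 2: 0.5}
```

In this toy case the enhanced model gains recall on cars and pedestrians while road recall drops from 1.0 to 2/3, which is the trade-off a high-recall design accepts: extra false positives on the important classes.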

Figures

Figures reproduced from arXiv: 1907.11394 by Kailun Yang, Kaite Xiang, Kaiwei Wang.

**Figure 1.** Basic procedure of the paper: a semantic segmentation network with three key methods towards high recall.

**Figure 2.** Basic structure of SFN: the box in yellow is the Encoder, the box in orange is the Upsampling Decoder, the …

**Figure 3.** The proposed Upsampling Decoder Blocks: (a) is the basic version of SwiftNet, (b) is the variation of ERFNet’s …

**Figure 4.** (a) illustrates the importance ranking of classes for CamVid, where G1 is the most important group. (b) and …

**Figure 5.** Overall framework of SFN based on GCN classifier: the bottom of the figure is GCN and the dotted box is the …

**Figure 6.** The pixel-wise priors for bicycle and rider.

**Figure 7.** The result comparison between baseline and IAL on CamVid. The white box area is the main difference area, …

**Figure 8.** The result comparison between baseline and IAL on Cityscapes.

**Figure 9.** The result comparison between baseline and GCN on Cityscapes.

**Figure 10.** t-SNE on the GCN classifier. The closer the items are, the closer the semantic meaning is. It shows the …

**Figure 11.** The result comparison between baseline and ML decision rule on Cityscapes.
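Figures 5, 9 and 10 refer to a GCN-based classifier. One common form of such a classifier, following the multi-label graph convolutional network of Chen et al. [27], propagates per-class embeddings over a class-correlation graph and uses the resulting node vectors as the weights of a linear classifier applied to each pixel's feature. The sketch below is a single linear graph-convolution layer in pure Python; the adjacency matrix, embeddings, and projection are hypothetical toy values, and we assume rather than know that SFN's classifier follows this pattern (a deeper version would stack layers with a nonlinearity such as LeakyReLU in between).

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Standard symmetric GCN normalisation: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
            for i in range(n)]


def gcn_classifier_weights(adj, embeddings, proj):
    """One linear graph-convolution layer: A_norm @ E @ P, one weight vector per class."""
    return matmul(matmul(normalize_adjacency(adj), embeddings), proj)


def pixel_logits(class_weights, feature):
    """Per-pixel class scores: dot product of the pixel feature with each class vector."""
    return [sum(w * f for w, f in zip(row, feature)) for row in class_weights]


if __name__ == "__main__":
    # Hypothetical classes 0 = bicycle, 1 = rider, 2 = road; bicycle and rider co-occur.
    adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy class embeddings
    proj = [[1.0, 0.0], [0.0, 1.0]]             # toy projection (identity)
    W = gcn_classifier_weights(adj, emb, proj)
    print(W)                           # approximately [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
    print(pixel_logits(W, [1.0, 2.0]))  # approximately [1.5, 1.5, 3.0]
```

The connected classes end up with similar classifier vectors, which is consistent with the intuition behind Figure 10 that semantically related classes sit close together.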

Original abstract

Semantic Segmentation (SS) is the task of assigning a semantic label to each pixel of an image, which is of crucial significance for autonomous vehicles, navigation assistance systems for the visually impaired, and augmented reality devices. However, SS still has a long way to go before it can be put into practice, as two essential challenges need to be addressed: efficiency, and evaluation criteria suited to practical applications. Different application scenarios call for different criteria. Recall is an important criterion for many tasks, such as autonomous driving. For autonomous vehicles, we need to focus on detecting traffic objects such as cars, buses, and pedestrians, which should be detected with high recall. In other words, a false detection is preferable to a missed one, because other traffic participants become dangerous if the algorithm misses them and segments them as safe roadway. In this paper, our main goal is to explore possible methods to attain a high recall rate. Firstly, we propose a real-time SS network named Swift Factorized Network (SFN). The proposed network is adapted from SwiftNet, whose structure is a typical U-shape with lateral connections. Inspired by ERFNet and Global Convolutional Networks (GCNet), we propose two different blocks to enlarge the valid receptive field. They require little additional computation but significantly enhance performance compared with the baseline network. Secondly, we explore three ways to achieve a higher recall rate: the loss function, the classifier, and the decision rules. We perform a comprehensive set of experiments on the standard benchmark datasets CamVid and Cityscapes. We demonstrate that our SS convolutional neural networks reach excellent performance. Furthermore, we make a detailed analysis and comparison of how the three proposed methods improve recall.
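The decision-rule lever can be illustrated with the maximum-likelihood (ML) rule discussed by Chan et al. [30]. Instead of taking the class with the largest softmax posterior (the Bayes, or maximum a posteriori, rule), each posterior is divided by the class prior before the argmax, which by Bayes' theorem selects the class with the largest likelihood p(x | c). Rare classes have small priors, so they win more often and their recall rises, at some cost in precision. The sketch assumes pixel-wise priors estimated as per-position label frequencies over training masks, in the spirit of Figure 6; the function names and toy arrays are hypothetical.

```python
def pixelwise_priors(masks, num_classes, eps=1e-6):
    """Per-position class frequencies over training label masks.

    masks -- list of flat per-pixel label lists, all the same length.
    Returns priors[z][c]. eps keeps every prior strictly positive so the ML rule
    never divides by zero (tiny priors still strongly favour rare classes).
    """
    n_pix = len(masks[0])
    counts = [[0] * num_classes for _ in range(n_pix)]
    for mask in masks:
        for z, c in enumerate(mask):
            counts[z][c] += 1
    total = len(masks) + num_classes * eps
    return [[(counts[z][c] + eps) / total for c in range(num_classes)]
            for z in range(n_pix)]


def bayes_decision(posteriors):
    """Maximum a posteriori: argmax_c p(c | x) at each pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def ml_decision(posteriors, priors):
    """Maximum likelihood: argmax_c p(c | x) / p(c), i.e. argmax_c p(x | c)."""
    return [max(range(len(p)), key=lambda c: p[c] / pr[c])
            for p, pr in zip(posteriors, priors)]


if __name__ == "__main__":
    # Hypothetical ids: 0 = road, 1 = rider; four training masks of two pixels each.
    masks = [[0, 0], [0, 1], [0, 0], [1, 1]]
    priors = pixelwise_priors(masks, 2)   # pixel 0: ~[0.75, 0.25]; pixel 1: ~[0.5, 0.5]
    post = [[0.6, 0.4], [0.6, 0.4]]       # identical softmax outputs at both pixels
    print(bayes_decision(post))           # [0, 0]
    print(ml_decision(post, priors))      # [1, 0]
```

The same posterior yields different ML decisions at the two positions because the prior differs: where riders are rare, a 0.4 rider score is evidence enough to flip the label.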

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a straightforward empirical comparison that shows concrete recall gains from SFN tweaks and three standard methods on CamVid and Cityscapes, with supporting ablations.


The paper adapts SwiftNet into SFN by adding receptive-field blocks drawn from ERFNet and GCNet, then runs a side-by-side test of three recall levers: loss weighting, classifier adjustment, and decision-rule changes. The experiments report higher recall on the two benchmarks while staying real-time, and the full text includes architecture diagrams, ablation tables, and direct baseline comparisons that line up with the claims. That internal evidence is the main value here; the methods are described clearly enough to reproduce from the text. The work is incremental rather than novel: the blocks and recall techniques already exist in the cited literature, and the gains are modest in scale. No new theoretical result or architectural paradigm appears. Generalization beyond the two datasets is not tested, which limits how far the numbers can be taken for new cameras or conditions, but that is a standard limitation rather than a flaw in the reported comparison. The paper is aimed at practitioners who already use real-time segmentation pipelines and need practical tuning options for safety-critical classes, and it is honest about its scope as a comparative study. I would bring the ablation tables to a reading group for the concrete numbers. It deserves peer review because the experiments are reproducible and the central comparison holds up without circularity or unsupported leaps.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes the Swift Factorized Network (SFN), adapted from SwiftNet with two enlarged receptive-field blocks inspired by ERFNet and GCNet. It further explores three recall-enhancement techniques (loss function, classifier, and decision rules) and evaluates the resulting models on the CamVid and Cityscapes benchmarks, claiming significantly higher recall than the SwiftNet baseline while preserving real-time inference speed.

Significance. If the internal comparisons hold, the work supplies concrete, reproducible techniques for improving recall in real-time semantic segmentation without sacrificing efficiency. The use of standard public benchmarks, ablation tables, and quantitative results on two datasets provides a verifiable empirical contribution to practical applications such as autonomous driving.

minor comments (2)

[Abstract] The abstract asserts performance gains and "excellent performance" but supplies no numerical metrics, error bars, or specific recall or accuracy figures; adding the key quantitative results would make it self-contained.
The manuscript would benefit from an explicit statement of the real-time FPS achieved by the final SFN variants on the target hardware, to directly support the efficiency claim.
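One way to produce the requested figure is a plain wall-clock benchmark: run a few warm-up passes, then time many forward passes and report frames per second. The sketch below uses only the Python standard library and a stand-in `dummy_model`; both function names are hypothetical, and it assumes one frame per call (batch size 1). For a GPU network one would also need to synchronize the device before reading the clock (for example `torch.cuda.synchronize()` in PyTorch), which is not shown.

```python
import time


def measure_fps(model, frame, warmup=5, runs=50):
    """Average frames per second of `model` over `runs` calls, after `warmup` calls.

    Assumes each call processes exactly one frame (batch size 1).
    """
    for _ in range(warmup):
        model(frame)  # warm-up: caches, lazy initialisation, etc.
    start = time.perf_counter()
    for _ in range(runs):
        model(frame)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from an extremely fast stand-in model.
    return runs / elapsed if elapsed > 0 else float("inf")


def dummy_model(frame):
    """Hypothetical stand-in for a segmentation network: a trivial per-pixel operation."""
    return [v + 1 for v in frame]


if __name__ == "__main__":
    print(f"{measure_fps(dummy_model, [0] * 10_000):.1f} FPS")
```

Reporting the hardware, input resolution, and whether timing includes pre- and post-processing alongside the number would make it comparable across papers.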

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments appear in the provided report, so we offer no point-by-point responses below.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a purely empirical study that proposes the SFN architecture (adapted from SwiftNet with receptive-field blocks) and three recall-enhancement techniques, then reports measured performance on the independent public benchmarks CamVid and Cityscapes. No mathematical derivation, first-principles prediction, or fitted parameter is presented as a result; all claims rest on direct experimental tables and ablations. No self-citation is load-bearing for any uniqueness claim, and no step reduces by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on experimental outcomes on two standard road-scene datasets. No explicit free parameters, axioms, or invented entities are stated in the abstract; the work is purely empirical.

pith-pipeline@v0.9.0 · 5858 in / 1155 out tokens · 23510 ms · 2026-05-24T16:04:38.750041+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 6 internal anchors

[1] Yang, K., Wang, K., Bergasa, L., Romera, E., Hu, W., Sun, D., Sun, J., Cheng, R., Chen, T., and López, E., “Unifying terrain awareness for the visually impaired through real-time semantic segmentation,” Sensors 18(5), 1506 (2018)

[2] Xiang, K., Wang, K., and Yang, K., “Importance-aware semantic segmentation with efficient pyramidal context network for navigational assistant systems,” in [2019 IEEE Intelligent Transportation Systems Conference (ITSC)], 1–7, IEEE (2019)

[3] Long, J., Shelhamer, E., and Darrell, T., “Fully convolutional networks for semantic segmentation,” in [2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 3431–3440, IEEE (2015)

[4] He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 770–778, IEEE (2016)

[5] Orsic, M., Kreso, I., Bevandic, P., and Segvic, S., “In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition], 12607–12616 (2019)

[6] Romera, E., Alvarez, J. M., Bergasa, L. M., and Arroyo, R., “ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation,” IEEE Transactions on Intelligent Transportation Systems 19(1), 263–272 (2018)

[7] Romera, E., Bergasa, L. M., Yang, K., Alvarez, J. M., and Barea, R., “Bridging the day and night domain gap for semantic segmentation,” in [2019 IEEE Intelligent Vehicles Symposium (IV)], 1184–1190, IEEE (2019)

[8] Peng, C., Zhang, X., Yu, G., Luo, G., and Sun, J., “Large kernel matters improve semantic segmentation by global convolutional network,” in [2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 1743–1751, IEEE (2017)

[9] Chen, B., Gong, C., and Yang, J., “Importance-aware semantic segmentation for autonomous vehicles,” IEEE Transactions on Intelligent Transportation Systems 20(1), 137–148 (2018)

[10] Brostow, G. J., Fauqueur, J., and Cipolla, R., “Semantic object classes in video: A high-definition ground truth database,” Pattern Recognition Letters 30(2), 88–97 (2009)

[11] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B., “The Cityscapes dataset for semantic urban scene understanding,” in [2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 3213–3223, IEEE (2016)

[12] Ronneberger, O., Fischer, P., and Brox, T., “U-Net: Convolutional networks for biomedical image segmentation,” in [International Conference on Medical Image Computing and Computer-Assisted Intervention], 234–241, Springer (2015)

[13] Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J., “Pyramid scene parsing network,” in [2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 6230–6239, IEEE (2017)

[14] Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H., “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587 (2017)

[15] Hu, X., Yang, K., Fei, L., and Wang, K., “ACNet: Attention based network to exploit complementary features for RGBD semantic segmentation,” arXiv preprint arXiv:1905.10089 (2019)

[16] Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E., “ENet: A deep neural network architecture for real-time semantic segmentation,” arXiv preprint arXiv:1606.02147 (2016)

[17] Yang, K., Bergasa, L. M., Romera, E., Cheng, R., Chen, T., and Wang, K., “Unifying terrain awareness through real-time semantic segmentation,” in [2018 IEEE Intelligent Vehicles Symposium (IV)], 1033–1038, IEEE (2018)

[18] Yang, K., Hu, X., Bergasa, L. M., Romera, E., Huang, X., Sun, D., and Wang, K., “Can we pass beyond the field of view? Panoramic annular semantic segmentation for real-world surrounding perception,” in [2019 IEEE Intelligent Vehicles Symposium (IV)], 374–381, IEEE (2019)

[19] Zhao, H., Qi, X., Shen, X., Shi, J., and Jia, J., “ICNet for real-time semantic segmentation on high-resolution images,” in [Proceedings of the European Conference on Computer Vision (ECCV)], 405–420 (2018)

[20] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N., “BiSeNet: Bilateral segmentation network for real-time semantic segmentation,” in [Proceedings of the European Conference on Computer Vision (ECCV)], 325–341 (2018)

[21] Zhang, X., Zhou, X., Lin, M., and Sun, J., “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” in [2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition], 6848–6856, IEEE (2018)

[22] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861 (2017)

[23] Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P., “Focal loss for dense object detection,” in [2017 IEEE International Conference on Computer Vision (ICCV)], 2999–3007, IEEE (2017)

[24] Yang, K., Bergasa, L. M., Romera, E., Huang, X., and Wang, K., “Predicting polarization beyond semantics for wearable robotics,” in [2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)], 96–103, IEEE (2018)

[25] Li, X., Liu, Z., Luo, P., Loy, C. C., and Tang, X., “Not all pixels are equal: Difficulty-aware semantic segmentation via deep layer cascade,” in [2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 6459–6468, IEEE (2017)

[26] Sun, R., Zhu, X., Wu, C., Huang, C., Shi, J., and Ma, L., “Not all areas are equal: Transfer learning for semantic segmentation via hierarchical region selection,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition], 4360–4369 (2019)

[27] Chen, Z.-M., Wei, X.-S., Wang, P., and Guo, Y., “Multi-label image recognition with graph convolutional networks,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition], 5177–5186 (2019)

[28] Qi, X., Liao, R., Jia, J., Fidler, S., and Urtasun, R., “3D graph neural networks for RGBD semantic segmentation,” in [2017 IEEE International Conference on Computer Vision (ICCV)], 5209–5218, IEEE (2017)

[29] Yang, K., Bergasa, L. M., Romera, E., and Wang, K., “Robustifying semantic cognition of traversability across wearable RGB-depth cameras,” Applied Optics 58(12), 3141–3155 (2019)

[30] Chan, R., Rottmann, M., Hüger, F., Schlicht, P., and Gottschalk, H., “Application of decision rules for handling class imbalance in semantic segmentation,” arXiv preprint arXiv:1901.08394 (2019)

[31] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision 115(3), 211–252 (2015)

[32] He, K., Zhang, X., Ren, S., and Sun, J., “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9), 1904–1916 (2015)

[33] Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014)

[34] Maaten, L. v. d. and Hinton, G., “Visualizing data using t-SNE,” Journal of Machine Learning Research 9(Nov), 2579–2605 (2008)

[35] Lin, S., Cheng, R., Wang, K., and Yang, K., “Visual localizer: Outdoor localization based on ConvNet descriptor and global optimization for visually impaired pedestrians,” Sensors 18(8), 2476 (2018)

[36] Cheng, R., Wang, K., Lin, S., Hu, W., Yang, K., Huang, X., Li, H., Sun, D., and Bai, J., “Panoramic annular localizer: Tackling the variation challenges of outdoor localization using panoramic annular images and active deep descriptors,” arXiv preprint arXiv:1905.05425 (2019)
