ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
The ability to perform pixel-wise semantic segmentation in real time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low-latency operation. ENet is up to 18$\times$ faster, requires 75$\times$ fewer FLOPs, has 79$\times$ fewer parameters, and provides accuracy similar to or better than that of existing models. We have tested it on the CamVid, Cityscapes and SUN datasets and report on comparisons with existing state-of-the-art methods, and on the trade-offs between the accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.
Forward citations
Cited by 17 Pith papers
-
Bidirectional Cross-Attention Fusion of High-Res RGB and Low-Res HSI for Multimodal Automated Waste Sorting
BCAF fuses native-grid high-res RGB and low-res HSI via bidirectional cross-attention in adapted Swin Transformers to reach state-of-the-art mIoU on SpectralWaste and a new industrial dataset while running at real-tim...
-
Accelerating Large-Kernel Convolution Using Summed-Area Tables
Learnable box filters and precomputed summed-area tables enable efficient arbitrarily large kernel convolutions in fully-convolutional networks while maintaining constant parameters per filter and competitive performa...
-
Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation
Patch-wise sparse MoE layers in CNNs for semantic segmentation yield architecture-dependent gains up to 3.9 mIoU on Cityscapes and BDD100K with low overhead, but show strong design sensitivity.
-
Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models
A pipeline uses Mask2Former flood masks and DEMs to compute a single water surface elevation then derives local depths under hydrostatic equilibrium.
-
Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation
Attention-Mamba uses parallel branches, Recursive Alignment Module, and Mamba-enhanced attention to report highest segmentation accuracy on Synapse, ACDC, ISIC-2018, and PH2 with 14.05M parameters and 8.94 GFLOPs.
-
Uncertainty in Real-Time Semantic Segmentation on Embedded Systems
Combines pre-trained features, Bayesian regression, and moment propagation to enable real-time epistemic uncertainty for semantic segmentation on embedded systems while preserving accuracy.
-
Associative Embedding for Game-Agnostic Team Discrimination
A lightweight segmentation network learns associative embeddings to assign consistent descriptors to unconnected pixels of same-team players for game-agnostic team discrimination in basketball videos.
-
DA-SegFormer: Damage-Aware Semantic Segmentation for Fine-Grained Disaster Assessment
DA-SegFormer reaches 74.61% mIoU on RescueNet by adding class-aware sampling and OHEM-Dice loss to SegFormer, delivering double-digit gains on minor and major damage classes.
-
FoR-Net: Learning to Focus on Hard Regions for Efficient Semantic Segmentation
FoR-Net improves efficiency in semantic segmentation by focusing on hard regions with a learned selector and multi-scale convolutions, achieving competitive results on Cityscapes.
-
From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving
A survey organizes synthetic data use, digital twin simulation, and domain adaptation techniques for autonomous driving while identifying open challenges like Sim2Real transfer.
-
TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving
TwinLiteNet+ is a hybrid-encoder multi-task segmentation model with new UCB, USB, and PCAA modules that reports 92.9% mIoU on drivable area and 34.2% IoU on lane segmentation on BDD100K while using 11x fewer FLOPs tha...
-
A Comparative Study of High-Recall Real-Time Semantic Segmentation Based on Swift Factorized Network
Adapts SwiftNet into SFN with ERFNet/GCNet-inspired blocks and compares loss functions, classifiers, and decision rules to increase recall in real-time semantic segmentation on CamVid and Cityscapes.
-
ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation
ESNet is a lightweight symmetric CNN using factorized residual units and parallel dilated convolutions that reaches over 62 FPS semantic segmentation on Cityscapes with 1.6M parameters.
-
Importance-Aware Semantic Segmentation with Efficient Pyramidal Context Network for Navigational Assistant Systems
Introduces importance-aware loss and BiERF-PSPNet extension for semantic segmentation tailored to navigational assistant systems, evaluated on CamVid and Cityscapes.
-
Real-time Vision-based Depth Reconstruction with NVidia Jetson
A comparison of FCNN architectures for monocular depth estimation yields a model suitable for real-time operation on NVidia Jetson hardware with evaluation in vSLAM.
-
Modern CNNs for IoT Based Farms
A survey of state-of-the-art CNN architectures for agricultural IoT applications that proposes a tailored classification taxonomy and reviews existing research to guide architecture selection.
-
Understanding Deep Learning Techniques for Image Segmentation
A 2019 survey that categorizes and intuitively explains major deep learning techniques for image segmentation, progressing from classical methods to modern neural architectures.