TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving

Duc-Khai Lam; Duc-Tri Le; Minh-Quan Pham; Quang-Huy Che; Vinh-Tiep Nguyen

arxiv: 2403.16958 · v6 · submitted 2024-03-25 · 💻 cs.CV


Pith reviewed 2026-05-24 03:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords: semantic segmentation · autonomous driving · drivable area segmentation · lane segmentation · lightweight neural network · multi-task learning · BDD100K · embedded inference


The pith

TwinLiteNet+ achieves higher drivable-area and lane segmentation accuracy than prior models while using 11 times fewer floating-point operations (FLOPs).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TwinLiteNet+ as a family of lightweight multi-task models for segmenting drivable areas and lane markings in autonomous driving images. It builds a hybrid encoder from stride-based dilated convolutions and depthwise separable dilated convolutions, then adds two new upsampling blocks and a partial class activation attention module to improve decoding precision. On the BDD100K dataset the largest version records 92.9 percent mean intersection-over-union for drivable areas and 34.2 percent intersection-over-union for lanes, exceeding published baselines while cutting floating-point operations by a factor of eleven. The work also reports fast quantized inference and low energy use on embedded hardware, directly addressing the requirement for real-time perception on vehicle-grade chips with limited compute and power budgets.

Core claim

TwinLiteNet+ employs a hybrid encoder architecture that integrates stride-based dilated convolutions and depthwise separable dilated convolutions, balancing representational capacity and computational cost. To improve task-specific decoding, it introduces two lightweight upsampling modules, the Upper Convolution Block (UCB) and the Upper Simple Block (USB), alongside a Partial Class Activation Attention (PCAA) mechanism. The model family ranges from 34K parameters in the Nano variant to 1.94M parameters in the Large variant. On the BDD100K dataset, TwinLiteNet+_Large reaches 92.9 percent mIoU for drivable area segmentation and 34.2 percent IoU for lane segmentation while requiring 11 times fewer FLOPs than existing state-of-the-art models.
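The headline numbers are intersection-over-union scores. As a reminder of what they measure, the NumPy sketch below computes per-class IoU, IoU_c = TP_c / (TP_c + FP_c + FN_c), and its mean from a confusion matrix. The function names are hypothetical, and the convention of averaging drivable-area mIoU over background plus foreground while reporting lane IoU for the lane class alone is common in this line of work but is assumed here, not taken from the paper's evaluation code.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    idx = target.astype(np.int64).ravel() * num_classes + pred.astype(np.int64).ravel()
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes absent from both maps."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as c, labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled c, predicted otherwise
    denom = tp + fp + fn
    return np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)

def mean_iou(cm):
    """Average of the per-class IoUs, skipping classes with no pixels at all."""
    return float(np.nanmean(per_class_iou(cm)))

# Toy 2x4 masks: 0 = background, 1 = drivable area.
target = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
pred = np.array([[0, 1, 1, 1], [0, 0, 1, 1]])
cm = confusion_matrix(pred, target, num_classes=2)
print(per_class_iou(cm), mean_iou(cm))
```

For these toy masks the background IoU is 3/4 and the drivable IoU is 4/5, so the mean IoU is 0.775.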

What carries the argument

Hybrid encoder that combines stride-based dilated convolutions with depthwise separable dilated convolutions to maintain feature quality at low computational cost for simultaneous drivable-area and lane segmentation.
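To see why this pairing is cheap, the back-of-the-envelope sketch below compares the weight count of a standard k x k convolution with a depthwise separable one, and shows how dilation widens the spatial extent of a kernel at no parameter cost. The 64-channel layer is a hypothetical example, not a layer taken from TwinLiteNet+, and FLOPs are counted as two per multiply-accumulate (some papers report MACs instead, so the convention matters when comparing tables).

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def effective_kernel(k, dilation):
    """Spatial extent covered by a dilated k x k kernel; the weight count is unchanged."""
    return k + (k - 1) * (dilation - 1)

def conv_flops(params, h_out, w_out):
    """Each weight is used once per output pixel: one MAC, counted here as 2 FLOPs."""
    return 2 * params * h_out * w_out

# Hypothetical layer: 3x3 kernel, 64 -> 64 channels, 64x64 output map.
std = standard_conv_params(3, 64, 64)   # 36864 weights
dws = dw_separable_params(3, 64, 64)    # 4672 weights
print(f"weight ratio: {std / dws:.1f}x")
print("extent at dilation 1, 2, 4:", [effective_kernel(3, d) for d in (1, 2, 4)])
print(f"GFLOPs: standard {conv_flops(std, 64, 64) / 1e9:.3f}, separable {conv_flops(dws, 64, 64) / 1e9:.3f}")
```

A per-layer saving like this is one plausible ingredient of the whole-model reduction the paper reports; the 11x figure itself compares complete networks and cannot be derived from this arithmetic alone.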

If this is right

The four size variants allow direct trade-offs between accuracy and parameter count for different hardware budgets.
Quantization to INT8 and FP16 preserves performance, enabling deployment on typical embedded accelerators.
Measured inference speed and energy efficiency on embedded devices exceed those of heavier models.
Simultaneous drivable-area and lane outputs support downstream path planning without separate networks.
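The INT8 claim refers to storing weights and activations as 8-bit integers. The sketch below shows the simplest form of this, symmetric per-tensor post-training quantization of a weight tensor in NumPy, and treats FP16 as a plain cast. It illustrates the technique only: the paper's deployment toolchain, calibration method, and per-tensor versus per-channel choices are not described in the text above, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8_symmetric(x):
    """Symmetric per-tensor INT8: map [-max|x|, max|x|] linearly onto [-127, 127]."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from INT8 codes and the shared scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 32)).astype(np.float32)  # stand-in weight tensor

q, scale = quantize_int8_symmetric(w)
int8_err = np.max(np.abs(dequantize(q, scale) - w))    # rounding error is at most scale / 2
fp16_err = np.max(np.abs(w.astype(np.float16).astype(np.float32) - w))
print(f"INT8 max error {int8_err:.4f} (scale/2 = {scale / 2:.4f}), FP16 max error {fp16_err:.2e}")
```

Whether segmentation accuracy survives this rounding is an empirical question, which is what the paper's INT8/FP16 evaluations on embedded devices are meant to answer.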

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the same modules transfer to other segmentation backbones, the approach could reduce compute for additional driving perception tasks such as object detection.
The partial class activation attention may improve boundary accuracy on narrow structures like lane markings when applied to different datasets.
Ultra-small variants could enable on-device segmentation for low-power microcontrollers in consumer robotics.
Combining the encoder with temporal fusion across video frames might further raise lane-segmentation IoU without added parameters.

Load-bearing premise

The accuracy and efficiency gains are produced by the hybrid encoder, UCB, USB, and PCAA modules rather than by differences in training schedule, data augmentation, or hyperparameter choices relative to the baselines.

What would settle it

Retrain the compared state-of-the-art models on the identical BDD100K splits, augmentation pipeline, optimizer schedule, and hyperparameters used for TwinLiteNet+ and check whether the reported accuracy and FLOP gaps remain.

Figures

Figures reproduced from arXiv: 2403.16958 by Duc-Khai Lam, Duc-Tri Le, Minh-Quan Pham, Quang-Huy Che, Vinh-Tiep Nguyen.

**Figure 2.** The TwinLiteNet+ architecture comprises two phases. During the Encode phase, the input image passes through an Encoder block followed by a Partial Class Activation Attention mechanism. In the Decode phase, the output from the Encoder is channeled through two identical yet independent Decoder blocks, transforming the feature maps into two separate segmentation maps.
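Structurally, Figure 2 describes one shared encoder whose features feed two decoders with the same architecture but separate weights. The NumPy sketch below illustrates only that data flow; the average-pooling encoder, 1x1 projections, nearest-neighbour upsampling, and all sizes are hypothetical stand-ins, not the paper's encoder, PCAA, or UCB/USB blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution on a (C_in, H, W) map with weights of shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def encoder(img, w_enc, stride=8):
    """Stand-in encoder: average-pool by `stride`, then a 1x1 projection and ReLU."""
    c, h, w = img.shape
    pooled = img.reshape(c, h // stride, stride, w // stride, stride).mean(axis=(2, 4))
    return np.maximum(conv1x1(pooled, w_enc), 0.0)

def decoder(feat, w_dec, scale=8):
    """Stand-in decoder: 1x1 projection to 2 classes, then nearest-neighbour upsampling."""
    logits = conv1x1(feat, w_dec)
    return logits.repeat(scale, axis=1).repeat(scale, axis=2)

# Hypothetical sizes: 3-channel 64x64 input, 32 encoder channels, 2 classes per task.
w_enc = rng.standard_normal((32, 3))
w_da = rng.standard_normal((2, 32))  # drivable-area decoder weights
w_ll = rng.standard_normal((2, 32))  # lane decoder: same shape, separate weights

img = rng.standard_normal((3, 64, 64))
feat = encoder(img, w_enc)                   # computed once, shared by both tasks
da_map = decoder(feat, w_da).argmax(axis=0)  # per-pixel drivable-area labels
ll_map = decoder(feat, w_ll).argmax(axis=0)  # per-pixel lane labels
print(feat.shape, da_map.shape, ll_map.shape)  # (32, 8, 8) (64, 64) (64, 64)
```

Sharing the encoder means its cost is paid once for both outputs, which is the efficiency argument for a multi-task layout over two separate networks.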

**Figure 3.** This figure presents variants of ESP blocks within the Encoder. The convolutional layers are designated as …

**Figure 4.** Comprehensive schematic of the Encoder in TwinLiteNet+.

**Figure 5.** Decoder block design in TwinLiteNet+. This illustrates the implementation of the Upper Convolution Block (UCB) and the Upper Simple Block (USB) within the decoder, specifically tailored for upsampling to generate segmentation maps for diverse tasks.

**Figure 6.** Qualitative results of TwinLiteNet and TwinLiteNet+.

**Figure 7.** Qualitative results of TwinLiteNet and TwinLiteNet+.

**Figure 8.** Qualitative results of TwinLiteNet and TwinLiteNet+.

**Figure 9.** Examples of ground-truth visualization for the Directly & Alternative Area segmentation task.

**Figure 10.** Results visualization of TwinLiteNet+ D&A for Directly & Alternative Area segmentation. Red regions are directly drivable area, blue regions are alternative area, and lanes are green.
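A visualization like Figure 10 can be produced by alpha-blending class colors onto the input frame. The sketch below follows the caption's color scheme (red for directly drivable, blue for alternative, green for lanes); the 0/1/2 label encoding of the area mask, the blending factor, and drawing lanes opaquely on top are assumptions, not details taken from the paper.

```python
import numpy as np

# Colors from the Figure 10 caption (RGB): red = directly drivable, blue = alternative.
AREA_COLORS = {1: (255, 0, 0), 2: (0, 0, 255)}
LANE_COLOR = (0, 255, 0)  # green lanes

def overlay(image, area_mask, lane_mask, alpha=0.5):
    """Blend class colors onto an (H, W, 3) uint8 image.

    area_mask: (H, W) ints, 0 = background, 1 = direct, 2 = alternative (assumed encoding).
    lane_mask: (H, W) bools, painted last so lanes stay visible over the area colors.
    """
    out = image.astype(np.float32)
    for cls, color in AREA_COLORS.items():
        m = area_mask == cls
        out[m] = (1 - alpha) * out[m] + alpha * np.array(color, dtype=np.float32)
    out[lane_mask] = np.array(LANE_COLOR, dtype=np.float32)
    return out.round().astype(np.uint8)

frame = np.zeros((2, 2, 3), dtype=np.uint8)
area = np.array([[1, 2], [0, 0]])
lanes = np.array([[False, False], [False, True]])
vis = overlay(frame, area, lanes)
```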

Original abstract

Semantic segmentation is a fundamental perception task in autonomous driving, particularly for identifying drivable areas and lane markings to enable safe navigation. However, most state-of-the-art (SOTA) models are computationally intensive and unsuitable for real-time deployment on resource-constrained embedded devices. In this paper, we introduce TwinLiteNet+, an enhanced multi-task segmentation model designed for real-time drivable area and lane segmentation with high efficiency. TwinLiteNet+ employs a hybrid encoder architecture that integrates stride-based dilated convolutions and depthwise separable dilated convolutions, balancing representational capacity and computational cost. To improve task-specific decoding, we propose two lightweight upsampling modules, the Upper Convolution Block (UCB) and the Upper Simple Block (USB), alongside a Partial Class Activation Attention (PCAA) mechanism that enhances segmentation precision. The model is available in four configurations, ranging from the ultra-compact TwinLiteNet+_Nano (34K parameters) to the high-performance TwinLiteNet+_Large (1.94M parameters). On the BDD100K dataset, TwinLiteNet+_Large achieves 92.9% mIoU for drivable area segmentation and 34.2% IoU for lane segmentation, surpassing existing state-of-the-art models while requiring 11x fewer floating-point operations (FLOPs). Extensive evaluations on embedded devices demonstrate superior inference speed, quantization robustness (INT8/FP16), and energy efficiency, validating TwinLiteNet+ as a compelling solution for real-world autonomous driving systems. Code is available at https://github.com/chequanghuy/TwinLiteNetPlus.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TwinLiteNet+ adds a hybrid encoder and three new lightweight blocks to an earlier model and posts concrete efficiency numbers on BDD100K, but the gains are not cleanly tied to those blocks.


The main takeaway is that this is a practical incremental model for real-time drivable-area and lane segmentation on edge hardware. The authors extend TwinLiteNet with a hybrid stride/dilated encoder, Upper Convolution Block, Upper Simple Block, and Partial Class Activation Attention, then release four size variants and code. On BDD100K the largest version reaches 92.9% mIoU and 34.2% IoU while claiming an 11× FLOPs cut versus prior work, plus they show embedded-device speed, quantization, and energy results. Those are the concrete deliverables a reader can use directly.

The work is straightforward applied CV and avoids any circular claims by sticking to public-dataset measurements. The soft spot is exactly the one flagged in the stress test: the abstract gives no ablation tables and no statement that the cited SOTA baselines were retrained under the same schedule, augmentation, or loss settings. Without those controls it is impossible to know how much of the headline numbers comes from the new modules versus training differences. That gap is real but not fatal for a paper of this type.

This paper is for people who need small, fast segmentation models for autonomous-driving perception stacks on limited hardware. A reader looking for ready-to-test variants and hardware numbers will get usable information from the size options and the open code. It deserves a serious referee because the empirical claims are specific, the task is relevant, and the code release lets others check the numbers. Send it for review and ask the authors to clarify the baseline protocol.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces TwinLiteNet+, a multi-task segmentation architecture for simultaneous drivable-area and lane-marking segmentation aimed at real-time autonomous driving. It proposes a hybrid encoder that mixes stride-based dilated convolutions with depthwise-separable dilated convolutions, two lightweight upsampling blocks (UCB and USB), and a Partial Class Activation Attention (PCAA) module. Four model scales are defined (Nano at 34K parameters to Large at 1.94M parameters). On BDD100K the Large variant is reported to reach 92.9% mIoU on drivable area and 34.2% IoU on lanes while using 11× fewer FLOPs than prior SOTA models; additional embedded-device measurements for latency, INT8/FP16 quantization, and energy are provided. Public code is released.

Significance. If the performance numbers are shown to result from the architectural contributions rather than training-protocol differences, the work would supply a practical, low-FLOP family of models suitable for embedded real-time perception in autonomous driving. The public code release is a clear positive for reproducibility.

major comments (2)

[Experiments section / results tables] The central claim that the hybrid encoder, UCB, USB, and PCAA produce the reported 92.9% mIoU / 34.2% IoU and 11× FLOPs reduction is load-bearing, yet the manuscript supplies no statement that the cited SOTA baselines were re-trained under an identical data-augmentation schedule, loss weighting, optimizer, or number of epochs. Without this control, the attribution of gains to the proposed modules cannot be verified.
[Experiments section] No ablation tables isolate the incremental contribution of the hybrid encoder versus the UCB/USB blocks versus PCAA. Because the headline numbers rest on the joint effect of these modules, the absence of controlled ablations leaves the source of the improvement unclear.
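The controlled ablation requested here amounts to a factorial grid over the proposed components, with every cell trained under one fixed protocol. The sketch below only enumerates that grid; the flag names are hypothetical labels, and treating UCB and USB as independently switchable (with "off" meaning some fixed baseline block) is an assumption about how such an ablation might be set up.

```python
from itertools import product

# Hypothetical flag names for the four components named in the manuscript.
MODULES = ("hybrid_encoder", "ucb", "usb", "pcaa")

def ablation_grid(modules=MODULES):
    """All on/off combinations, from the baseline (all off) to the full model (all on)."""
    return [dict(zip(modules, flags)) for flags in product((False, True), repeat=len(modules))]

grid = ablation_grid()
isolated = [cfg for cfg in grid if sum(cfg.values()) == 1]  # one module added at a time
print(len(grid), len(isolated))  # prints 16 4
```

A full factorial design costs 16 training runs; reporting only the baseline, each single-module addition, and the full model (6 runs) is a common cheaper compromise, though it misses interactions between modules.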

minor comments (1)

The abstract states four model configurations but does not list the exact parameter counts or FLOPs for each; adding a compact table row in the abstract or §3 would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The two major comments correctly identify gaps in experimental controls and analysis. We address each point below and will revise the manuscript accordingly.


Referee: [Experiments section / results tables] The central claim that the hybrid encoder, UCB, USB, and PCAA produce the reported 92.9% mIoU / 34.2% IoU and 11× FLOPs reduction is load-bearing, yet the manuscript supplies no statement that the cited SOTA baselines were re-trained under an identical data-augmentation schedule, loss weighting, optimizer, or number of epochs. Without this control, the attribution of gains to the proposed modules cannot be verified.

Authors: We agree that the manuscript does not state whether baselines were re-trained under identical conditions; the numbers are taken from the original publications. This is a limitation for direct attribution. In revision we will add an explicit statement in the Experiments section noting that comparisons use reported figures and discussing possible protocol differences. We will also re-train one key baseline under our exact schedule to provide a controlled reference point. revision: yes
Referee: [Experiments section] No ablation tables isolate the incremental contribution of the hybrid encoder versus the UCB/USB blocks versus PCAA. Because the headline numbers rest on the joint effect of these modules, the absence of controlled ablations leaves the source of the improvement unclear.

Authors: We acknowledge the lack of component-wise ablations. The revised manuscript will include new ablation tables that evaluate the hybrid encoder, UCB, USB, and PCAA in isolation and in combination, using the same training protocol. These results will be placed in the Experiments section to clarify the contribution of each module. revision: yes

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on public BDD100K dataset


The paper proposes an architecture (hybrid encoder, UCB, USB, PCAA) and reports mIoU/IoU/FLOPs numbers obtained by training and evaluating on the public BDD100K benchmark. No equations, fitted parameters, or self-citations are used to derive the headline metrics; they are measured outputs. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the provided text. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claims rest on empirical training and evaluation of the proposed architecture on the BDD100K dataset; the new modules constitute the primary addition beyond prior literature.

free parameters (1)

model size configurations
The four variants (Nano to Large) involve manual choices of channel widths and depths to trade off parameters against accuracy.

axioms (1)

Domain assumption: BDD100K provides reliable ground-truth annotations for drivable-area and lane classes.
All quantitative claims depend on this standard benchmark being correctly labeled and representative.



Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 4 internal anchors

[1]

F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, T. Darrell, Bdd100k: A diverse driving dataset for heterogeneous multitask learning, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2633–2642. doi:10.1109/CVPR42600.2020. 00271

work page doi:10.1109/cvpr42600.2020 2020
[2]

A. A. Mehta, A. A. Padaria, D. J. Bavisi, V. Ukani, P. Thakkar, R. Geddam, K. Kotecha, A. Abra- ham, Securing the future: A comprehensive review of security challenges and solutions in advanced driver assistance systems, IEEE Access 12 (2024) 643–678. doi:10.1109/ACCESS.2023.3347200

work page doi:10.1109/access.2023.3347200 2024
[3]

H. B. Gade, A. R. Uppala, P. N. Karri, R. Devi Sinduvala Mallesh, V. K. Odugu, J. R. B, Hardware architecture of efficient image dehazing technique for advanced driving assistance system, Computers and Electrical Engineering 126 (2025) 110493. doi:https://doi.org/10.1016/j.compeleceng. 2025.110493

work page doi:10.1016/j.compeleceng 2025
[4]

Guti´ errez-Zaballa, K

J. Guti´ errez-Zaballa, K. Basterretxea, J. Echanobe, M. V. Mart´ ınez, I. del Campo, Exploring fully convolutional networks for ¬†the¬†segmentation of ¬†hyperspectral imaging applied to ¬†advanced driver assistance systems, in: K. Desnos, S. Pertuz (Eds.), Design and Architecture for Signal and Image Processing, Springer International Publishing, Cham, 2...

work page 2022
[5]

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, Enet: A deep neural network architecture for real-time semantic segmentation, ArXiv abs/1606.02147 (2016). 27

work page internal anchor Pith review Pith/arXiv arXiv 2016
[6]

Y. Hou, Z. Ma, C. Liu, C. C. Loy, Learning lightweight lane detection cnns by self attention distil- lation, 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019) 1013–1021

work page 2019
[7]

H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

work page 2017
[8]

Teichmann, M

M. Teichmann, M. Weber, M. Z¨ ollner, R. Cipolla, R. Urtasun, Multinet: Real-time joint semantic reasoning for autonomous driving, in: 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 1013–1020. doi:10.1109/IVS.2018.8500504

work page doi:10.1109/ivs.2018.8500504 2018
[9]

Parashar, M

A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. S. Emer, S. W. Keckler, W. J. Dally, Scnn: An accelerator for compressed-sparse convolutional neural networks, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (2017) 27–40

work page 2017
[10]

W. Tian, X. Yu, H. Hu, Interactive attention learning on detection of lane and lane marking on the road by monocular camera image, Sensors 23 (14) (2023)

work page 2023
[11]

Z. Hu, Y. Shen, Lane detection based on boundary feature enhancement and information interaction, Academic Journal of Computing & Information Science 8 (1) (2025) 57–63. doi:10.25236/AJCIS. 2025.080108. URL https://doi.org/10.25236/AJCIS.2025.080108

work page doi:10.25236/ajcis 2025
[12]

Che, D.-P

Q.-H. Che, D.-P. Nguyen, M.-Q. Pham, D.-K. Lam, Twinlitenet: An efficient and lightweight model for driveable area and lane segmentation in self-driving cars, in: 2023 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), 2023, pp. 1–6. doi:10.1109/MAPR59823. 2023.10288646

work page doi:10.1109/mapr59823 2023
[13]

J. Sun, Y. Li, Multi-feature fusion network for road scene semantic segmentation, Computers & Electrical Engineering 92 (2021) 107155. doi:https://doi.org/10.1016/j.compeleceng.2021. 107155

work page doi:10.1016/j.compeleceng.2021 2021
[14]

Zhang, S

M. Zhang, S. Li, D. Wang, Z. Cui, M. Xin, Omnidirectional semantic segmentation fusion network with cross-stage and cross-dimensional remodeling, Computers and Electrical Engineering 122 (2025) 110014. doi:https://doi.org/10.1016/j.compeleceng.2024.110014

work page doi:10.1016/j.compeleceng.2024.110014 2025
[15]

L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham, 2018, pp. 833–851

work page 2018
[16]

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, P. Luo, Segformer: Simple and efficient design for semantic segmentation with transformers, in: Neural Information Processing Systems (NeurIPS), 2021

work page 2021
[17]

Mehta, M

S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, H. Hajishirzi, Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham, 2018, pp. 561–580

work page 2018
[18]

Mehta, M

S. Mehta, M. Rastegari, L. Shapiro, H. Hajishirzi, Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9182–9192. doi:10.1109/CVPR.2019.00941. 28

work page doi:10.1109/cvpr.2019.00941 2019
[19]

2016.280

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, B. Schiele, The cityscapes dataset for semantic urban scene understanding, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213–3223. doi:10.1109/CVPR. 2016.350

work page doi:10.1109/cvpr 2016
[20]

D. Vu, B. Ngo, H. Phan, Hybridnets: End-to-end perception network (2022). arXiv:2203.09035

work page arXiv 2022
[21]

J. Zhan, Y. Luo, C. Guo, Y. Wu, J. Meng, J. Liu, Yolopx: Anchor-free multi-task learning network for panoptic driving perception, Pattern Recognition 148 (2024) 110152

work page 2024
[22]

J. Zhan, J. Liu, Y. Wu, C. Guo, Multi-task visual perception for object detection and semantic segmentation in intelligent driving, Remote Sensing 16 (10) (2024). doi:10.3390/rs16101774. URL https://www.mdpi.com/2072-4292/16/10/1774

work page doi:10.3390/rs16101774 2024
[23]

Cheng, I

B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, R. Girdhar, Masked-attention mask transformer for universal image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 1290–1299

work page 2022
[24]

J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, H. Lu, Dual attention network for scene segmentation, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3141–3149. doi:10.1109/CVPR.2019.00326

work page doi:10.1109/cvpr.2019.00326 2019
[25]

J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: 2018 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, 2018, pp. 7132–7141. doi:10.1109/CVPR.2018.00745

work page doi:10.1109/cvpr.2018.00745 2018
[26]

S.-A. Liu, H. Xie, H. Xu, Y. Zhang, Q. Tian, Partial class activation attention for semantic seg- mentation, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 16815–16824. doi:10.1109/CVPR52688.2022.01633

work page doi:10.1109/cvpr52688.2022.01633 2022
[27]

S. Woo, J. Park, J.-Y. Lee, I. S. Kweon, Cbam: Convolutional block attention module, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham, 2018, pp. 3–19

work page 2018
[28]

Pizzati, F

F. Pizzati, F. Garcia, Enhanced free space detection in multiple lanes based on single cnn with scene identification, in: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2019. doi:10.1109/ivs. 2019.8814181

work page doi:10.1109/ivs 2019
[29]

Z. Wang, Z. Cheng, H. Huang, J. Zhao, Shuda-rfbnet for real-time multi-task traffic scene perception, in: 2019 Chinese Automation Congress (CAC), 2019, pp. 305–310. doi:10.1109/CAC48633.2019. 8997236

work page doi:10.1109/cac48633.2019 2019
[30]

D. Qiao, F. Zulkernine, Drivable area detection using deep learning models for autonomous driving, in: 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 5233–5238. doi:10. 1109/BigData52589.2021.9671392

work page arXiv 2021
[31]

Lee, Fast drivable areas estimation with multi-task learning for real-time autonomous driving assistant, Applied Sciences 11 (22) (2021)

D.-G. Lee, Fast drivable areas estimation with multi-task learning for real-time autonomous driving assistant, Applied Sciences 11 (22) (2021). doi:10.3390/app112210713

work page doi:10.3390/app112210713 2021
[32]

L. Sun, F. Yan, T. Deng, C. Jiang, J. Li, A lightweight network with lane feature enhancement for multilane drivable area detection, in: 2022 14th International Conference on Wireless Communica- tions and Signal Processing (WCSP), 2022, pp. 66–71

work page 2022
[33]

T. Luo, Y. Chen, T. Luan, B. Cai, L. Chen, H. Wang, Ids-model: An efficient multi-task model of road scene instance and drivable area segmentation for autonomous driving, IEEE Transactions on Transportation Electrification (2023) 1–1 doi:10.1109/TTE.2023.3293495. 29

work page doi:10.1109/tte.2023.3293495 2023
[34]

Y. Ko, Y. Lee, S. Azam, F. Munir, M. Jeon, W. Pedrycz, Key points estimation and point instance segmentation approach for lane detection, IEEE Transactions on Intelligent Transportation Systems 23 (7) (2022) 8949–8958. doi:10.1109/TITS.2021.3088488

work page doi:10.1109/tits.2021.3088488 2022
[35]

Z. Qin, H. Wang, X. Li, Ultra fast structure-aware deep lane detection, in: The European Conference on Computer Vision (ECCV), 2020

work page 2020
[36]

Z. Qin, P. Zhang, X. Li, Ultra fast deep lane detection with hybrid anchor driven ordinal classification, IEEE Transactions on Pattern Analysis and Machine Intelligence (2022) 1–14 doi:10.1109/TPAMI. 2022.3182097

work page doi:10.1109/tpami 2022
[37]

D. K. Lam, C. V. Du, H. L. Pham, Quantlanenet: A 640-fps and 34-gops/w fpga-based cnn accelerator for lane detection, Sensors 23 (15) (2023). doi:10.3390/s23156661. URL https://www.mdpi.com/1424-8220/23/15/6661

work page doi:10.3390/s23156661 2023
[38]

Honda, Y

H. Honda, Y. Uchida, Clrernet: improving confidence of lane detection with laneiou, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 1176–1185

work page 2024
[39]

Silberman, D

N. Silberman, D. Hoiem, P. Kohli, R. Fergus, Indoor segmentation and support inference from rgbd images, in: A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, C. Schmid (Eds.), Computer Vision – ECCV 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 2012, pp. 746–760

work page 2012
[40]

Pizzati, F

F. Pizzati, F. Garc´ ıa, Enhanced free space detection in multiple lanes based on single cnn with scene identification, in: 2019 IEEE Intelligent Vehicles Symposium (IV), 2019, pp. 2536–2541. doi: 10.1109/IVS.2019.8814181

work page doi:10.1109/ivs.2019.8814181 2019
[41]

G. M. Jacob, V. Agarwal, B. Stenger, Online knowledge distillation for multi-task learning, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2359–2368

work page 2023
[42]

Taghavi, R

P. Taghavi, R. Langari, G. Pandey, Swinmtl: A shared architecture for simultaneous depth estima- tion and semantic segmentation from monocular camera images, in: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 4957–4964

work page 2024
[43]

Wu, M.-W

D. Wu, M.-W. Liao, W.-T. Zhang, X. Wang, X. Bai, W. Cheng, W.-Y. Liu, Yolop: You only look once for panoptic driving perception, Machine Intelligence Research 19 (2021) 550 – 562

work page 2021
[44]

H. Wang, M. Qiu, Y. Cai, L. Chen, Y. Li, Sparse u-pdp: A unified multi-task framework for panoptic driving perception, IEEE Transactions on Intelligent Transportation Systems 24 (10) (2023) 11308– 11320. doi:10.1109/TITS.2023.3273286

work page doi:10.1109/tits.2023.3273286 2023
[45]

Quantizing deep convolutional networks for efficient inference: A whitepaper

R. Krishnamoorthi, Quantizing deep convolutional networks for efficient inference: A whitepaper, CoRR abs/1806.08342 (2018). arXiv:1806.08342. URL http://arxiv.org/abs/1806.08342

work page internal anchor Pith review Pith/arXiv arXiv 2018
[46]

S. Han, J. Pool, J. Tran, W. J. Dally, Learning both weights and connections for efficient neural networks, in: Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, MIT Press, Cambridge, MA, USA, 2015, p. 1135–1143

work page 2015
[47]

Distilling the Knowledge in a Neural Network

G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network (2015). arXiv:1503. 02531. URL https://arxiv.org/abs/1503.02531 30

work page internal anchor Pith review Pith/arXiv arXiv 2015
[48]

Howard, M

A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications (04 2017)

work page 2017
[49]

Zhang, X

X. Zhang, X. Zhou, M. Lin, J. Sun, Shufflenet: An extremely efficient convolutional neural network for mobile devices, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2017) 6848–6856

work page 2018
[50]

K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, C. Xu, Ghostnet: More features from cheap operations, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1577–1586. doi:10.1109/CVPR42600.2020.00165

work page doi:10.1109/cvpr42600.2020.00165 2020
[51]

Jiang, H

X. Jiang, H. Wang, Y. Chen, Z. Wu, L. Wang, B. Zou, Y. Yang, Z. Cui, Y. Cai, T. Yu, C. Lv, Z. Wu, Mnn: A universal and efficient inference engine, in: MLSys, 2020

work page 2020
[52]

T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, Focal loss for dense object detection, in: Proceed- ings of the IEEE International Conference on Computer Vision (ICCV), 2017

work page 2017
[53]

S. S. M. Salehi, D. Erdogmus, A. Gholipour, Tversky loss function for image segmentation using 3d fully convolutional deep networks, in: Q. Wang, Y. Shi, H.-I. Suk, K. Suzuki (Eds.), Machine Learning in Medical Imaging, Springer International Publishing, Cham, 2017, pp. 379–387

work page 2017
[54]

Ioffe, C

S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, JMLR.org, 2015, p. 448–456

work page 2015
[55]

K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1026–1034. doi:10.1109/ICCV.2015.123

work page doi:10.1109/iccv.2015.123 2015
[56]

C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, M. Jorge Cardoso, Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer International Publishing, Cham, 2017, pp. 240–248

work page 2017
[57]

J. Wang, Q. M. J. Wu, N. Zhang, You only look at once for real-time and generic multi-task (2023). arXiv:2310.01641

work page arXiv 2023
[58]

K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi:10. 1109/CVPR.2016.90

work page 2016
[59]

Loshchilov, F

I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on Learning Representations, 2017

work page 2017
[60]

Tarvainen, H

A. Tarvainen, H. Valpola, Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, Inc., 2017

[61]

G. Jocher, A. Chaurasia, J. Qiu, Ultralytics YOLO (Jan. 2023). URL https://github.com/ultralytics/ultralytics

[62]

H. Wang, J. Wang, B. Xiao, Y. Jiao, J. Guo, Drivable area and lane line detection model based on semantic segmentation, in: 2024 IEEE 3rd International Conference on Computing and Machine Intelligence (ICMI), 2024, pp. 1–7. doi:10.1109/ICMI60790.2024.10585878

[63]

J. Zhao, D. Wu, Z. Yu, Z. Gao, DRMNet: A multi-task detection model based on image processing for autonomous driving scenarios, IEEE Transactions on Vehicular Technology (2023) 1–16. doi:10.1109/TVT.2023.3296735

[64]

C. Han, Q. Zhao, S. Zhang, Y. Chen, Z. Zhang, J. Yuan, YOLOPv2: Better, faster, stronger for panoptic driving perception (2022). arXiv:2208.11434

[65]

G. Chen, T. Wu, J. Duan, Q. Hu, D. Huang, H. Li, CenterPNets: A multi-task shared network for traffic perception, Sensors 23 (5) (2023). doi:10.3390/s23052467

[66]

Y. Zhang, Y. Zheng, Z. Tu, C. Wu, T. Zhang, CFFM: Multi-task lane object detection method based on cross-layer feature fusion, Expert Systems with Applications 257 (2024) 125051. doi:10.1016/j.eswa.2024.125051

[67]

Q.-H. Che, D.-K. Lam, TriLiteNet: Lightweight model for multi-task visual perception, IEEE Access 13 (2025) 50152–50166. doi:10.1109/ACCESS.2025.3552088

[68]

Z. Wang, W. Ren, Q. Qiu, LaneNet: Real-time lane detection networks for autonomous driving, CoRR abs/1807.01726 (2018). arXiv:1807.01726

