arxiv: 2604.22856 · v1 · submitted 2026-04-22 · 💻 cs.CV

Recognition: unknown

Attention-Augmented YOLOv8 with Ghost Convolution for Real-Time Vehicle Detection in Intelligent Transportation Systems

Syed Sajid Ullah , Muhammad Zunair Zamir , Ahsan Ishfaq , Salman Khan

Authors on Pith no claims yet

Pith reviewed 2026-05-10 00:29 UTC · model grok-4.3

classification 💻 cs.CV

keywords YOLOv8vehicle detectionGhost ModuleCBAMDCNv2KITTI datasetintelligent transportation systemsreal-time detection

0 comments

The pith

Adding Ghost Module, CBAM, and DCNv2 to YOLOv8n raises vehicle detection mAP to 95.4 percent on KITTI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a standard YOLOv8n detector can be made more accurate for vehicles by inserting three specific modules: the Ghost Module to cut redundant features, the Convolutional Block Attention Module to emphasize useful channels and locations, and Deformable Convolutional Networks v2 to adjust for varying vehicle shapes. This combination is presented as a way to handle the demands of real-time detection in traffic scenes without losing speed. The authors support the claim with results on the KITTI dataset, where the modified model beats the plain baseline by nearly nine percentage points and also outperforms several other detectors. Ablation tests are used to attribute the gains to the added modules rather than other changes.

Core claim

By integrating the Ghost Module for efficient feature generation, CBAM for channel and spatial attention, and DCNv2 for geometric adaptability into YOLOv8n, the resulting detector reaches 95.4% mAP@0.5 on the KITTI dataset, an 8.97% gain over the unmodified YOLOv8n baseline, together with 96.2% precision, 93.7% recall, and 94.93% F1-score. Comparative tests against seven other detectors and ablation studies confirm that the three modules together produce consistent improvements in feature handling for vehicle detection.

What carries the argument

The attention-augmented YOLOv8n backbone that combines Ghost Module, CBAM, and DCNv2 to reduce redundancy, refine features, and adapt to shape variations.

If this is right

The model outperforms seven existing detectors across precision, recall, and mAP metrics on KITTI.
Ablation experiments show each module contributes measurably when added individually or in combination.
The architecture maintains computational efficiency suitable for real-time traffic monitoring.
The same modules address feature redundancy, attention focus, and shape variation in complex scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same three-module pattern could be tested on other YOLO variants or on pedestrian and cyclist detection within the same dataset.
If the efficiency gains hold on embedded hardware, the detector becomes a candidate for roadside cameras in live traffic systems.
Extending the approach to multi-camera fusion or night-time infrared images would test whether the attention and deformable layers generalize to harder lighting conditions.

Load-bearing premise

The measured accuracy gains come chiefly from the three added modules and will hold for vehicle detection outside the KITTI dataset and under different training conditions.

What would settle it

Re-run the exact same training schedule and data augmentations on KITTI for both the baseline YOLOv8n and the proposed model; if the mAP gap shrinks below roughly 5 points, the claim that the modules are the main source of the 8.97% lift is weakened.

Figures

Figures reproduced from arXiv: 2604.22856 by Ahsan Ishfaq, Muhammad Zunair Zamir, Salman Khan, Syed Sajid Ullah.

**Figure 2.** Figure 2: Ghost Module 2) Convolutional Block Attention Module (CBAM): The Convolutional Block Attention Module (CBAM) [33] enhances feature representations by applying attention mechanisms in both channel and spatial dimensions. It consists of two sequential submodules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). Channel Attention: Given an input feature map F ∈ R C×H×W , average-po… view at source ↗

**Figure 4.** Figure 4: Proposed Model Detection E. Experimental Setup The experiments were conducted on a high-performance computing environment to ensure efficient model training and evaluation. The detailed hardware and software configurations are presented in Table III, while the key training hyperparameters such as batch size, optimizer, learning rate schedule, and input resolution are summarized in Table IV. III. RESULTS A… view at source ↗

**Figure 5.** Figure 5: Metrics vs Epochs - Proposed [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 7.** Figure 7: Comparison of confusion matrices between the base [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

read the original abstract

Accurate vehicle detection is a critical component of autonomous driving, traffic surveillance, and intelligent transportation systems. This paper presents an enhanced YOLOv8n-based model that integrates the Ghost Module, Convolutional Block Attention Module (CBAM), and Deformable Convolutional Networks v2 (DCNv2) to improve detection performance. The Ghost Module reduces feature redundancy through efficient feature generation, CBAM refines feature representation via channel and spatial attention, and DCNv2 enhances adaptability to geometric variations in vehicle structures. Evaluated on the KITTI dataset, the proposed model achieves 95.4% mAP@0.5, representing an 8.97% improvement over the baseline YOLOv8n, along with 96.2% precision, 93.7% recall, and a 94.93% F1-score. Comparative analysis against seven state-of-the-art detectors demonstrates consistent superiority across key performance metrics, while ablation studies validate the individual and combined contributions of the integrated modules. By addressing feature redundancy, attention refinement, and spatial adaptability, the proposed approach offers a robust and computationally efficient solution for vehicle detection in diverse and complex traffic environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a straightforward combo of Ghost, CBAM, and DCNv2 on YOLOv8n that posts 95.4% mAP on KITTI, but the 8.97% lift needs proof that training protocols were identical to the baseline.

read the letter

The paper takes YOLOv8n and adds the Ghost module to cut feature redundancy, CBAM for channel and spatial attention, and DCNv2 to handle geometric shifts in vehicles. They evaluate on KITTI and report 95.4% mAP@0.5, 96.2% precision, 93.7% recall, and 94.93% F1, plus comparisons to seven other detectors and ablation runs that break out each module's contribution. The focus stays on keeping inference fast enough for real-time intelligent transportation use. That specific combination and the exact KITTI numbers are new empirical results, and the ablations plus side-by-side metrics give a usable recipe for anyone already running YOLOv8 pipelines. The work stays grounded in public benchmarks and existing modules rather than claiming new theory. The main soft spot is the baseline comparison. The headline gain is pinned on the three modules, yet the abstract and stress-test note leave open whether the baseline YOLOv8n used the exact same optimizer, learning-rate schedule, epochs, augmentations, and data splits. If any of those differed even modestly, part of the 8.97% could be training rather than architecture. The paper mentions ablations, but without seeing the full tables or error bars it is still plausible rather than locked down. Everything is reported on KITTI alone, so we lack evidence on how the model behaves on other traffic datasets or under domain shift. This paper is for engineers and applied CV groups who need incremental, deployable improvements to YOLOv8 for vehicle detection. It will not reset research directions, but readers who want a documented starting point with attention and efficiency tweaks will get concrete numbers and module breakdowns. It deserves peer review so a referee can check the training logs and ask for cross-dataset tests or clearer ablation controls.

Referee Report

1 major / 2 minor

Summary. The paper proposes an attention-augmented YOLOv8n variant that integrates the Ghost Module for efficient feature generation, CBAM for channel/spatial attention, and DCNv2 for handling geometric variations in vehicles. Evaluated on the KITTI dataset, the model reports 95.4% mAP@0.5 (8.97% above baseline YOLOv8n), 96.2% precision, 93.7% recall, and 94.93% F1-score, with comparative results against seven other detectors and ablation studies claimed to validate the modules' contributions for real-time vehicle detection in intelligent transportation systems.

Significance. If the reported gains are shown to arise specifically from the added modules under matched training conditions, the work would offer a practical, efficiency-aware improvement to YOLOv8 for vehicle detection tasks. The combination of Ghost convolution, attention, and deformable convolutions is a standard and plausible direction in the field; reproducible ablation results and consistent outperformance on a public benchmark would strengthen its utility for ITS applications.

major comments (1)

[Abstract and §4 (Experiments)] Abstract and §4 (Experiments): The central claim attributes the 8.97% mAP@0.5 lift (95.4% vs. YOLOv8n baseline) primarily to Ghost Module + CBAM + DCNv2, supported by ablation studies. However, the manuscript does not explicitly state that the baseline YOLOv8n was retrained under identical conditions (optimizer, learning-rate schedule, number of epochs, data augmentations, and train/val splits). Without this, performance differences cannot be confidently ascribed to the architectural additions rather than training-protocol variations; this directly undermines the ablation-based validation of module contributions.

minor comments (2)

[Abstract and Results] The abstract and results sections claim real-time suitability but report no FPS, inference latency, or FLOPs numbers for the proposed model versus baseline; adding these metrics (e.g., on the same hardware) would directly support the efficiency claims.
[Tables 1-3] Table captions and axis labels in the comparative and ablation tables should explicitly note the evaluation protocol (e.g., mAP@0.5 on KITTI val split) to avoid ambiguity when readers compare against other published YOLOv8 variants.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which will help strengthen the clarity and rigor of our work. We address the major comment below.

read point-by-point responses

Referee: [Abstract and §4 (Experiments)] Abstract and §4 (Experiments): The central claim attributes the 8.97% mAP@0.5 lift (95.4% vs. YOLOv8n baseline) primarily to Ghost Module + CBAM + DCNv2, supported by ablation studies. However, the manuscript does not explicitly state that the baseline YOLOv8n was retrained under identical conditions (optimizer, learning-rate schedule, number of epochs, data augmentations, and train/val splits). Without this, performance differences cannot be confidently ascribed to the architectural additions rather than training-protocol variations; this directly undermines the ablation-based validation of module contributions.

Authors: We agree with the referee that explicit confirmation of identical training conditions is essential for attributing performance gains to the architectural changes and for supporting the ablation results. In our experiments, the YOLOv8n baseline was retrained from scratch under exactly the same conditions as the proposed model, using the Adam optimizer, the identical learning-rate schedule, 300 epochs, the same data augmentations, and the same train/validation splits on the KITTI dataset. This controlled setup ensures that the reported 8.97% mAP@0.5 improvement (and the ablation outcomes) can be confidently ascribed to the Ghost Module, CBAM, and DCNv2. We will revise Section 4 to include a clear statement of these matched training protocols and will add a brief reference in the abstract and ablation discussion to improve reproducibility and strengthen the validation of the module contributions. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results from benchmark evaluation

full rationale

The paper describes an architectural enhancement to YOLOv8n via Ghost Module, CBAM, and DCNv2, then reports measured performance (mAP@0.5, precision, recall, F1) after training and evaluation on the public KITTI dataset, plus ablations and comparisons to other detectors. No mathematical derivation chain, first-principles predictions, or fitted parameters are claimed; all headline numbers are direct empirical outputs. No self-citations, self-definitional equations, or renamings of known results appear in the abstract or described content. The central claims rest on experimental protocol rather than any reduction to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the standard assumption that KITTI is a sufficient proxy for real-world vehicle detection and that the added modules produce additive gains independent of training details.

axioms (1)

domain assumption The KITTI dataset distribution is representative of the target deployment environments for intelligent transportation systems.
Evaluation and claims of superiority rely on this dataset without explicit discussion of its limitations in the abstract.

pith-pipeline@v0.9.0 · 5518 in / 1233 out tokens · 54389 ms · 2026-05-10T00:29:30.093755+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 5 canonical work pages · 2 internal anchors

[1]

The application of virtual reality technology on intelligent traffic construction and decision support in smart cities,

G. Yan and Y . Chen, “The application of virtual reality technology on intelligent traffic construction and decision support in smart cities,”Wire- less Communications and Mobile Computing, vol. 2021, p. 3833562, 2021

2021
[2]

You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788

2016
[3]

Yolo9000: Better, faster, stronger,

J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525

2017
[4]

YOLOv3: An Incremental Improvement

——, “Yolov3: An incremental improvement,”arXiv preprint arXiv:1804.02767, 2018

work page internal anchor Pith review arXiv 2018
[5]

YOLOv4: Optimal Speed and Accuracy of Object Detection

A. Bochkovskiy, C.-Y . Wang, and H.-Y . M. Liao, “Yolov4: Op- timal speed and accuracy of object detection,”arXiv preprint arXiv:2004.10934, 2020

work page internal anchor Pith review arXiv 2004
[6]

Ultralytics yolov5,

G. Jocheret al., “Ultralytics yolov5,” https://github.com/ultralytics/ yolov5, 2020

2020
[7]

Under the hood: Yolov8 architecture explained,

Keylabs, “Under the hood: Yolov8 architecture explained,” https: //keylabs.ai/blog/under-the-hood-yolov8-architecture-explained/, 2023, accessed: 2024-12-09

2023
[8]

A bearing surface defect detection method based on multi- attention mechanism yolov8,

P. Ding, “A bearing surface defect detection method based on multi- attention mechanism yolov8,”Measurement Science and Technology, vol. 35, no. 8, p. 086003, 2024

2024
[9]

GhostNetV2: Enhance Cheap Operation with Long-Range Attention,

Y . Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y . Wang, “GhostNetV2: Enhance Cheap Operation with Long-Range Attention,” inAdvances in Neural Information Processing Systems (NeurIPS), 2022. [Online]. Available: https://arxiv.org/abs/2211.12905

work page arXiv 2022
[10]

YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection,

C.-T. Chien, R.-Y . Ju, K.-Y . Chou, E. Xieerke, and J.-S. Chiang, “YOLOv8-AM: YOLOv8 Based on Effective Attention Mechanisms for Pediatric Wrist Fracture Detection,”IEEE Acces, 2025. [Online]. Available: https://ieeexplore.ieee.org/document/10918980

work page arXiv 2025
[11]

Object Detection Algorithm Based on Improved YOLOv8 for Drill Pipe on Coal Mines,

X. Li, M. Li, and M. Zhao, “Object Detection Algorithm Based on Improved YOLOv8 for Drill Pipe on Coal Mines,”Scientific Reports, vol. 15, no. 5942, 2025. [Online]. Available: https: //www.nature.com/articles/s41598-025-89019-8

2025
[12]

Fully deformable convolutional network for ship detection in remote sensing imagery,

H. Guo, H. Bai, Y . Yuan, and W. Qin, “Fully deformable convolutional network for ship detection in remote sensing imagery,”Remote Sensing, vol. 14, no. 8, p. 1850, 2022

2022
[13]

A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas,

J. Terven, “A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas,”Machine Learning and Knowledge Extraction, vol. 5, no. 4, pp. 1680–1716, 2023

2023
[14]

An improved yolov8 to detect moving objects,

M. Safaldin, “An improved yolov8 to detect moving objects,”IEEE Access, vol. 12, pp. 59 782–59 806, 2024

2024
[15]

Improved vehicle detection systems with double-layer lstm modules,

W. Yang, W. Liow, S. Chen, J. Yang, P. Chung, and S. Mao, “Improved vehicle detection systems with double-layer lstm modules,”Eurasip Journal on Advances in Signal Processing, vol. 2022, pp. 1–10, 2022

2022
[16]

Vehicle detection using deep learning technique in tunnel road environments,

J. Kim, “Vehicle detection using deep learning technique in tunnel road environments,”Symmetry, vol. 12, no. 12, p. 2012, 2020

2012
[17]

Vehicle detection using yolov5,

C. Chavan, “Vehicle detection using yolov5,”International Journal of Scientific Research in Engineering and Management, vol. 07, no. 05, 2023

2023
[18]

Deep learning based multi-target detection for roads,

J. Jiang, “Deep learning based multi-target detection for roads,”Applied and Computational Engineering, vol. 39, no. 1, pp. 38–43, 2024

2024
[19]

Lightweight yolov5 architecture for real-time vehicle detection in intelligent transportation systems,

L. Xu and B. Chen, “Lightweight yolov5 architecture for real-time vehicle detection in intelligent transportation systems,”IEEE Access, vol. 11, pp. 6783–6795, 2023

2023
[20]

Convolution neural network with selective multi-stage feature fusion: case study on vehicle rear detection,

W. Lee, D. Kim, T. Kang, and M. Lim, “Convolution neural network with selective multi-stage feature fusion: case study on vehicle rear detection,”Applied Sciences, vol. 8, no. 12, p. 2468, 2018

2018
[21]

YOLOv5-CBAM: A Small Object Detection Model Based on YOLOv5 and CBAM,

Q. Ma, “YOLOv5-CBAM: A Small Object Detection Model Based on YOLOv5 and CBAM,” in2024 6th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), 2024, pp. 618–623. [Online]. Available: https://doi.org/10.1109/RICAI64321. 2024.10911839

work page doi:10.1109/ricai64321 2024
[22]

Object detection in aerial images using cbam and fpn,

J. Li and Y . Wang, “Object detection in aerial images using cbam and fpn,”Sensors, vol. 20, no. 18, p. 5245, 2020

2020
[23]

Intelligent detection of hazardous goods vehicles and determination of risk grade based on deep learning,

Q. An, S. Wu, R. Shi, H. Wang, J. Yu, and Z. Li, “Intelligent detection of hazardous goods vehicles and determination of risk grade based on deep learning,”Sensors, vol. 22, no. 19, p. 7123, 2022

2022
[24]

Deformable convnets v2: More deformable, better results,

X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable convnets v2: More deformable, better results,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9308– 9316

2019
[25]

Yolov8 based novel approach for object detection on lidar point cloud,

S. Behera, B. Anandet al., “Yolov8 based novel approach for object detection on lidar point cloud,” in2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), 2024, pp. 1–5

2024
[26]

A visual de- tection algorithm for autonomous driving road environment perception,

P. Cong, H. Feng, S. Li, T. Li, Y . Xu, and X. Zhang, “A visual de- tection algorithm for autonomous driving road environment perception,” Engineering Applications of Artificial Intelligence, vol. 133, p. 108034, 2024

2024
[27]

Road object detection algorithm based on improved yolov8,

J. Peng, C. Li, A. Jiang, B. Mou, Y . Luo, and W. Chen, “Road object detection algorithm based on improved yolov8,” in2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA), 2024, pp. 1–6

2024
[28]

An improved lightweight network for real-time detection of potential risks for autonomous vehicles,

X. Shen and V . V . Lukyanov, “An improved lightweight network for real-time detection of potential risks for autonomous vehicles,” in2024 International Russian Automation Conference (RusAutoCon). IEEE, 2024, pp. 583–588

2024
[29]

Gbforkdet: A lightweight object detector for forklift safety driving,

L. Ye and S. Chen, “Gbforkdet: A lightweight object detector for forklift safety driving,”IEEE Access, vol. 11, pp. 86 509–86 521, 2023

2023
[30]

Z- yolov8s-based approach for road object recognition in complex traffic scenarios,

R. Zhao, S. H. Tang, E. E. B. Supeni, S. A. Rahim, and L. Fan, “Z- yolov8s-based approach for road object recognition in complex traffic scenarios,”Alexandria Engineering Journal, vol. 106, pp. 298–311, 2024

2024
[31]

Real-time vehicle detection algorithm based on vision and lidar point cloud fusion,

H. Wang, X. Lou, Y . Cai, Y . Li, and L. Chen, “Real-time vehicle detection algorithm based on vision and lidar point cloud fusion,” Journal of Sensors, vol. 2019, pp. 1–9, 2019

2019
[32]

Ghostnet: More features from cheap operations,

K. Han, Y . Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “Ghostnet: More features from cheap operations,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1580–1589

2020
[33]

Cbam: Convolutional block attention module,

S. Woo, J. Park, J.-Y . Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” inProceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19

2018
[34]

A multi-objective dynamic detection model in autonomous driving based on an improved yolov8,

C. Li, Y . Zhu, and M. Zheng, “A multi-objective dynamic detection model in autonomous driving based on an improved yolov8,”Alexandria Engineering Journal, vol. 122, pp. 453–464, 2025

2025