pith. machine review for the scientific record.

arxiv: 2604.26857 · v1 · submitted 2026-04-29 · 💻 cs.CV · cs.LG · cs.RO · eess.IV

Recognition: unknown

Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 10:25 UTC · model grok-4.3

classification 💻 cs.CV · cs.LG · cs.RO · eess.IV
keywords knowledge distillation · object detection · vulnerable road users · INT8 quantization · edge AI · automotive safety · YOLO models · BDD100K

The pith

Knowledge distillation transfers precision calibration so a compact detector survives INT8 quantization while a large teacher collapses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that knowledge distillation from a large YOLOv8-L teacher to a small YOLOv8-S student lets the compact model keep most of its accuracy after the 8-bit integer quantization required for edge hardware. On the full BDD100K dataset the teacher loses 23 percent mAP under INT8 while the distilled student loses only 5.6 percent; at the same recall the student reaches 0.748 precision versus 0.653 for a directly trained small model. This matters for automotive safety because edge devices in vehicles must detect pedestrians and cyclists in real time without excessive false alarms or missed detections, and neither raw large models nor naive small models meet the constraints.

Core claim

Training the 11.2-million-parameter YOLOv8-S student to mimic the 43.7-million-parameter YOLOv8-L teacher via knowledge distillation produces a model 3.9 times smaller that achieves 0.748 precision at INT8, exceeding both the directly trained small model's 0.653 precision and the teacher's own full-precision 0.718 precision while cutting false alarms 44 percent at matched recall. The teacher itself drops catastrophically under INT8 quantization, confirming that distillation transfers the precision calibration needed for quantization robustness rather than raw detection capacity.
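The headline ratios in this claim follow directly from the reported raw figures; a quick arithmetic check (inputs are the paper's numbers, the computation is ours):

```python
# Reported figures from the abstract.
teacher_params_m = 43.7   # YOLOv8-L, millions of parameters
student_params_m = 11.2   # YOLOv8-S, millions of parameters
p_kd = 0.748              # KD student precision at INT8
p_direct = 0.653          # directly trained student precision at INT8

compression = teacher_params_m / student_params_m  # ≈ 3.9x smaller
relative_gain = (p_kd - p_direct) / p_direct       # ≈ 14.5% precision gain

print(f"compression: {compression:.1f}x, precision gain: {relative_gain:.1%}")
```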

What carries the argument

Knowledge distillation that transfers precision calibration from teacher to student, enabling the compact model to resist accuracy loss during post-training INT8 quantization.
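The mechanism named here — a student trained against a teacher's output distributions — is classically implemented as a temperature-softened KL term added to the task loss. The paper's exact loss formulation is not published, so the temperature value and KL direction below are illustrative assumptions, not the authors' method:

```python
import math

def softened(logits, t):
    """Temperature-softened softmax, in the style of Hinton et al. (2015)."""
    m = max(z / t for z in logits)
    exps = [math.exp(z / t - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, t=4.0):
    """KL(teacher || student) on softened class scores, scaled by t^2.
    A generic sketch; t=4.0 is an assumed, not reported, value."""
    p = softened(teacher_logits, t)
    q = softened(student_logits, t)
    return t * t * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student matching the teacher's logits pays no distillation penalty;
# a mismatched one does.
matched = distill_loss([3.1, 0.2, -0.9], [3.1, 0.2, -0.9])
mismatched = distill_loss([3.1, 0.2, -0.9], [0.2, 3.1, -0.9])
```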

If this is right

  • The teacher suffers a 23 percent mAP collapse under INT8 while the KD student drops only 5.6 percent.
  • At equivalent recall the KD student delivers 14.5 percent higher precision than direct training of the same small architecture.
  • False alarms fall by 44 percent relative to the collapsed teacher at INT8.
  • The 3.9-times-smaller KD student exceeds the teacher's FP32 precision (0.748 vs 0.718).
  • Knowledge distillation becomes a required step for accurate safety-critical VRU detection on edge hardware.
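The false-alarm figure above is a consequence of the precision numbers: at a fixed recall the true-positive count TP is fixed, so false positives satisfy FP = TP · (1/precision − 1). Given the 44% reduction and the student's 0.748 precision, we can back out the collapsed-teacher precision this implies — an inference of ours, not a number the paper states:

```python
# FP = TP * (1/p - 1) at fixed recall, so the FP ratio between two models
# depends only on their precisions.
p_student = 0.748
reduction = 0.44

fp_ratio = 1 - reduction                       # FP_student / FP_teacher
odds_teacher = (1 / p_student - 1) / fp_ratio  # equals 1/p_teacher - 1
p_teacher_implied = 1 / (1 + odds_teacher)     # ≈ 0.62

print(f"implied collapsed-teacher precision ≈ {p_teacher_implied:.3f}")
```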

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation step could let other quantized safety systems trade model size for reliability without retraining from scratch.
  • Testing the approach on streaming real-world video rather than static BDD100K frames would reveal whether the precision transfer survives motion blur and varying lighting.
  • If the calibration effect generalizes, manufacturers could deploy smaller models on existing vehicle chips and still meet regulatory false-alarm limits.

Load-bearing premise

The precision calibration passed by knowledge distillation is the main driver of the observed robustness and will continue to work on datasets, architectures, or driving conditions outside the BDD100K tests.

What would settle it

If a new dataset or different detection architecture shows the distilled student losing its precision advantage over direct training after INT8 quantization, the claim that distillation specifically supplies the needed calibration would be refuted.

Figures

Figures reproduced from arXiv: 2604.26857 by Akshay Karjol, Darrin M. Hanna.

Figure 1. BDD100K class distribution showing severe imbalance.
Figure 2. Quantization impact: FP32 (solid) vs INT8 (hatched).
Figure 3. Deployment comparison: KD INT8 achieves highest precision.
original abstract

Deploying accurate object detection for Vulnerable Road User (VRU) safety on edge hardware requires balancing model capacity against computational constraints. Large models achieve high accuracy but fail under INT8 quantization required for edge deployment, while small models sacrifice detection performance. This paper presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student (11.2M parameters) to mimic a YOLOv8-L teacher (43.7M parameters), achieving 3.9x compression while preserving quantization robustness. We evaluate on full-scale BDD100K (70K training images) with Post-Training Quantization to INT8. The teacher suffers catastrophic degradation under INT8 (-23% mAP), while the KD student retains accuracy (-5.6% mAP). Analysis reveals that KD transfers precision calibration rather than raw detection capacity: the KD student achieves 0.748 precision versus 0.653 for direct training at INT8, a 14.5% gain at equivalent recall, reducing false alarms by 44% versus the collapsed teacher. At INT8, the KD student exceeds the teacher's FP32 precision (0.748 vs. 0.718) in a model 3.9x smaller. These findings establish knowledge distillation as a requirement for deploying accurate, safety-critical VRU detection on edge hardware.
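The post-training quantization the abstract invokes can be sketched minimally as symmetric per-tensor INT8 mapping — one scale per tensor, absolute maximum mapped to 127. The paper's exact calibration procedure is not specified, so this is a generic illustration, not its pipeline:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: a single scale maps the
    tensor's absolute maximum onto 127."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.81, -0.33, 0.02, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]
# Per-weight round-trip error is bounded by half a quantization step
# (scale / 2); accuracy loss at the model level depends on how sensitive
# each layer is to that perturbation.
```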

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student model (11.2M parameters) to mimic a larger YOLOv8-L teacher (43.7M parameters) for vulnerable road user (VRU) object detection. Evaluated on the full BDD100K dataset (70K training images) using post-training INT8 quantization, the paper reports that the KD student retains accuracy better than the teacher (-5.6% vs -23% mAP drop) and outperforms direct training of the student (0.748 vs 0.653 precision at matched recall), claiming that KD transfers precision calibration to enable 3.9x compression while exceeding the teacher's FP32 precision in a safety-critical edge deployment setting.

Significance. If the central empirical findings hold after addressing the noted gaps, the work would be significant for edge AI in automotive safety applications. It supplies concrete, large-scale BDD100K results with explicit teacher/student and FP32/INT8 comparisons that demonstrate a practical path to quantization-robust detection in a 3.9x smaller model, potentially informing deployable VRU safety systems. The full-scale evaluation and reported false-alarm reduction metrics provide actionable performance data for the field.

major comments (3)
  1. [Results and Analysis] The attribution of INT8 robustness specifically to transferred 'precision calibration' (rather than other training factors) is not isolated. In the results and analysis sections, the direct-training YOLOv8-S baseline may differ from the KD run in optimization dynamics, data augmentation schedule, or convergence behavior; without an ablation that holds all other training elements fixed while removing only the distillation loss, the 14.5% precision gain (0.748 vs 0.653) and 44% false-alarm reduction remain correlational. This is load-bearing for the claim that KD is 'a requirement' for edge VRU detection.
  2. [Evaluation] All quantitative claims, including the mAP degradations, precision values, and generalization to safety-critical deployment, rest exclusively on the BDD100K dataset. No cross-dataset evaluation, tests under distribution shift (different sensors, geographies, or lighting), or out-of-distribution robustness checks are reported, which limits support for the conclusion that the approach will generalize beyond this single evaluation.
  3. [Experimental Results] The reported metrics (e.g., 0.748 precision, 14.5% gain, 44% false-alarm reduction) lack error bars, results from multiple random seeds, or statistical significance tests. This makes it difficult to assess the reliability of the observed differences between the KD student, direct training, and teacher under INT8.
minor comments (2)
  1. [Abstract and Results] The abstract and results text refer to 'the collapsed teacher' without defining the term or specifying the exact recall operating point used for the 44% false-alarm reduction calculation.
  2. [Method] Full training details—including the distillation loss formulation, temperature parameter, loss weighting, optimizer settings, and augmentation schedules for both teacher and student—are not provided, which hinders reproducibility of the reported INT8 outcomes.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for their thorough review and valuable suggestions. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.

point-by-point responses
  1. Referee: [Results and Analysis] The attribution of INT8 robustness specifically to transferred 'precision calibration' (rather than other training factors) is not isolated. In the results and analysis sections, the direct-training YOLOv8-S baseline may differ from the KD run in optimization dynamics, data augmentation schedule, or convergence behavior; without an ablation that holds all other training elements fixed while removing only the distillation loss, the 14.5% precision gain (0.748 vs 0.653) and 44% false-alarm reduction remain correlational. This is load-bearing for the claim that KD is 'a requirement' for edge VRU detection.

    Authors: We appreciate this observation on the need to isolate the distillation effect. In our work, the direct training baseline for YOLOv8-S follows the official YOLOv8 training protocol with standard hyperparameters, while the KD training uses the identical protocol augmented only by the distillation loss term. This ensures that differences in optimization, augmentation, and convergence are minimized. We will revise the results and analysis sections to explicitly document the matched training configurations and clarify that the performance gains are attributable to the inclusion of the distillation objective. This strengthens the evidence that KD is key for the observed INT8 robustness. revision: partial

  2. Referee: [Evaluation] All quantitative claims, including the mAP degradations, precision values, and generalization to safety-critical deployment, rest exclusively on the BDD100K dataset. No cross-dataset evaluation, tests under distribution shift (different sensors, geographies, or lighting), or out-of-distribution robustness checks are reported, which limits support for the conclusion that the approach will generalize beyond this single evaluation.

    Authors: We agree that broader evaluation would better support generalization claims. BDD100K provides a comprehensive testbed with over 70K images covering diverse real-world driving conditions relevant to VRU safety. We will update the evaluation and discussion sections to include a dedicated limitations paragraph acknowledging the single-dataset scope and proposing future extensions to other datasets (e.g., under varying sensor or geographic shifts) for out-of-distribution testing. revision: partial

  3. Referee: [Experimental Results] The reported metrics (e.g., 0.748 precision, 14.5% gain, 44% false-alarm reduction) lack error bars, results from multiple random seeds, or statistical significance tests. This makes it difficult to assess the reliability of the observed differences between the KD student, direct training, and teacher under INT8.

    Authors: We acknowledge the value of statistical validation for the reported differences. Our experiments were performed with fixed seeds on a single run per model configuration owing to the high computational cost of full-scale training on BDD100K. We will revise the experimental results section to note this and emphasize that the trends (e.g., consistent mAP retention and precision gains) are robust across the compared models. We plan to incorporate multi-seed averages in future extensions of this work. revision: partial

standing simulated objections not resolved
  • Cross-dataset and distribution shift evaluations
  • Error bars and multi-seed statistical analysis for all metrics

Circularity Check

0 steps flagged

No circularity; purely empirical evaluation on public dataset

rationale

The paper reports direct experimental measurements of mAP, precision, recall, and model size after training a YOLOv8 student via knowledge distillation versus direct training, followed by INT8 post-training quantization, all evaluated on the public BDD100K dataset. No equations, fitted parameters, or predictions are defined in terms of themselves. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims rest on observable performance deltas rather than any derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is empirical and relies on standard assumptions of deep learning training and the representativeness of the BDD100K dataset for automotive scenarios.

axioms (1)
  • domain assumption Knowledge distillation can transfer calibration properties that improve quantization robustness
    Core premise of the KD framework used to explain why the student outperforms direct training under INT8

pith-pipeline@v0.9.0 · 5552 in / 1261 out tokens · 59768 ms · 2026-05-07T10:25:27.509116+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 8 canonical work pages · 3 internal anchors

  1. [1]

    Global status report on road safety 2023,

    World Health Organization, “Global status report on road safety 2023,” World Health Organization, Geneva, Switzerland, Tech. Rep., 2023, accessed: Apr. 2026. [Online]. Available: https://www.who.int/publications/i/item/9789240086517

  2. [2]

    Pedestrian traffic safety facts 2023,

    National Highway Traffic Safety Administration, “Pedestrian traffic safety facts 2023,” U.S. Department of Transportation, Washington, DC, Report No. DOT HS 813 581, 2024, accessed: Apr. 2026. [Online]. Available: https://crashstats.nhtsa.dot.gov/

  3. [3]

    Bicyclists and other cyclists traffic safety facts 2023,

    ——, “Bicyclists and other cyclists traffic safety facts 2023,” U.S. Department of Transportation, Washington, DC, Report No. DOT HS 813 739, 2024, accessed: Apr. 2026. [Online]. Available: https://crashstats.nhtsa.dot.gov/

  4. [4]

    Deep neural network based vehicle and pedestrian detection for autonomous driving: A survey,

    L. Chen, S. Lin, X. Lu, D. Cao, H. Wu, C. Guo, C. Liu, and F.-Y. Wang, “Deep neural network based vehicle and pedestrian detection for autonomous driving: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3234–3246, 2021

  5. [5]

    The practical effectiveness of advanced driver assistance systems at different roadway facilities,

    L. Yue, M. Abdel-Aty, Y. Wu, and A. Farid, “The practical effectiveness of advanced driver assistance systems at different roadway facilities,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 9, pp. 3859–3871, 2020

  6. [6]

    You only look once: Unified, real-time object detection,

    J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788

  7. [7]

    Ultralytics YOLOv8,

    G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics YOLOv8,” GitHub, 2023, accessed: Apr. 2026. [Online]. Available: https://github.com/ultralytics/ultralytics

  8. [8]

    Edge computing for autonomous driving: Opportunities and challenges,

    S. Liu, L. Liu, J. Tang, B. Yu, Y. Wang, and W. Shi, “Edge computing for autonomous driving: Opportunities and challenges,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1697–1716, 2019

  9. [9]

    Survey and benchmarking of machine learning accelerators,

    A. Reuther, J. Kepner, C. Byber, V. Gadepally, M. Hubbell, M. McCary, J. Mullen, A. Prout, A. Rosa, C. Yee et al., “Survey and benchmarking of machine learning accelerators,” in Proceedings of the IEEE High Performance Extreme Computing Conference (HPEC), 2019, pp. 1–9

  10. [10]

    Quantization and training of neural networks for efficient integer-arithmetic-only inference,

    B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2704–2713

  11. [11]

    Integer quantization for deep learning inference: Principles and empirical evaluation

    H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, “Integer quantization for deep learning inference: Principles and empirical evaluation,” arXiv preprint arXiv:2004.09602, 2020. [Online]. Available: https://arxiv.org/abs/2004.09602

  12. [12]

    Quantization Robustness to Input Degradations for Object Detection

    Y. Chen et al., “INT8 quantization sensitivity analysis for object detection models,” arXiv preprint arXiv:2508.19600, 2025. [Online]. Available: https://arxiv.org/abs/2508.19600

  13. [13]

    A white paper on neural network quantization

    M. Nagel, M. Fournarakis, R. A. Amjad, Y. Bondarenko, M. van Baalen, and T. Blankevoort, “A white paper on neural network quantization,” arXiv preprint arXiv:2106.08295, 2021. [Online]. Available: https://arxiv.org/abs/2106.08295

  14. [14]

    Class imbalance in object detection: An experimental diagnosis and study of mitigation strategies,

    K. Oksuz, B. C. Cam, S. Kalkan, and E. Akbas, “Class imbalance in object detection: An experimental diagnosis and study of mitigation strategies,” arXiv preprint arXiv:2403.07113, 2024. [Online]. Available: https://arxiv.org/abs/2403.07113

  15. [15]

    Distilling the Knowledge in a Neural Network

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” in Proceedings of the NeurIPS Deep Learning Workshop, 2015. [Online]. Available: https://arxiv.org/abs/1503.02531

  16. [16]

    FitNets: Hints for Thin Deep Nets

    A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, “FitNets: Hints for thin deep nets,” in Proceedings of the International Conference on Learning Representations (ICLR), 2015. [Online]. Available: https://arxiv.org/abs/1412.6550

  17. [17]

    A gift from knowledge distillation: Fast optimization, network minimization and transfer learning,

    J. Yim, D. Joo, J. Bae, and J. Kim, “A gift from knowledge distillation: Fast optimization, network minimization and transfer learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4133–4141

  18. [18]

    Knowledge distillation: A survey,

    J. Gou, B. Yu, S. J. Maybank, and D. Tao, “Knowledge distillation: A survey,” International Journal of Computer Vision, vol. 129, no. 6, pp. 1789–1819, 2021

  19. [19]

    Focal and global knowledge distillation for detectors,

    Z. Yang, Z. Li, X. Shao, R. Shi, Z. Wan, and L. Hong, “Focal and global knowledge distillation for detectors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 4643–4652

  20. [20]

    PKD: General distillation framework for object detectors via pearson correlation coefficient,

    W. Cao, Y. Zhang, J. Gao, A. Cheng, K. Cheng, and J. Cheng, “PKD: General distillation framework for object detectors via pearson correlation coefficient,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, 2022, pp. 15394–15406

  21. [21]

    VanillaKD: Revisit the power of vanilla knowledge distillation from small scale to large scale,

    C. Yang et al., “VanillaKD: Revisit the power of vanilla knowledge distillation from small scale to large scale,” arXiv preprint arXiv:2305.15781, 2023. [Online]. Available: https://arxiv.org/abs/2305.15781

  22. [22]

    BDD100K: A diverse driving dataset for heterogeneous multitask learning,

    F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2636–2645

  23. [23]

    NVIDIA TensorRT developer guide,

    NVIDIA Corporation, “NVIDIA TensorRT developer guide,” NVIDIA Documentation, 2024, accessed: Apr. 2026. [Online]. Available: https://docs.nvidia.com/deeplearning/tensorrt/

  24. [24]

    Trust in automation: Designing for appropriate reliance,

    J. D. Lee and K. A. See, “Trust in automation: Designing for appropriate reliance,”Human Factors, vol. 46, no. 1, pp. 50–80, 2004

  25. [25]

    Histograms of oriented gradients for human detection,

    N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 886–893

  26. [26]

    YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,

    C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 7464–7475

  27. [27]

    LKD-YOLOv8: A lightweight knowledge distillation-based method for infrared object detection,

    Y. Liu et al., “LKD-YOLOv8: A lightweight knowledge distillation-based method for infrared object detection,” Electronics, vol. 13, no. 17, p. 3521, 2024

  28. [28]

    CrossKD: Cross-head knowledge distillation for object detection,

    J. Wang, Y. Luo, L. Gu, and L. Wang, “CrossKD: Cross-head knowledge distillation for object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16520–16530

  29. [29]

    Learned step size quantization,

    S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, and D. S. Modha, “Learned step size quantization,” in Proceedings of the International Conference on Learning Representations (ICLR), 2020

  30. [30]

    On calibration of modern neural networks,

    C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, “On calibration of modern neural networks,” in Proceedings of the International Conference on Machine Learning (ICML), 2017, pp. 1321–1330

  31. [31]

    Introducing WATCHMAN: A novel framework to validate automated vehicle safety,

    S. Khastgir, H. Sivencrona, G. Dhadyalla, P. Billing, S. Birrell, and P. Jennings, “Introducing WATCHMAN: A novel framework to validate automated vehicle safety,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 6, pp. 2265–2276, 2020

  32. [32]

    Comprehensive performance evaluation of YOLO architectures: YOLOv5 to YOLO11 for object detection,

    A. Jha et al., “Comprehensive performance evaluation of YOLO architectures: YOLOv5 to YOLO11 for object detection,” Scientific Reports, vol. 15, 2025

  33. [33]

    Real-time object detection for autonomous driving: A comparative study of YOLOv8, YOLOv9, and YOLOv10,

    M. Zhang et al., “Real-time object detection for autonomous driving: A comparative study of YOLOv8, YOLOv9, and YOLOv10,” Scientific Reports, vol. 15, 2025