A Case Study on Energy-Efficient Edge AI Crack Segmentation
Pith reviewed 2026-05-10 12:36 UTC · model grok-4.3
The pith
Knowledge distillation raises crack segmentation accuracy to 71.92% mean IoU on CrackVision12K, and a custom FPGA implementation retains 69.42% mean IoU at 398 FPS on the edge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Knowledge distillation improves every tested U-Net variant for crack segmentation. The strongest model reaches 71.92% mean IoU on the CrackVision12K dataset, an increase of 8.82 percentage points over the prior reported result. A selected FPGA implementation runs at 398 FPS while delivering 204.99 frames per joule and a mean IoU of 69.42%.
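The figures in the core claim imply a few quantities that are not stated outright. A minimal sketch of that arithmetic follows, assuming the FPS and frames-per-joule values were measured on the same workload; the variable names are illustrative and do not come from the paper.

```python
# Numbers quoted in the core claim above.
best_miou = 71.92          # % mean IoU of the strongest model
gain_pp = 8.82             # percentage points over the prior reported result
fpga_miou = 69.42          # % mean IoU of the selected FPGA implementation
fps = 398.0                # frames per second on the FPGA
frames_per_joule = 204.99  # energy efficiency on the FPGA

# Derived quantities (implied by the numbers, not reported in the text).
prior_miou = best_miou - gain_pp                 # implied prior result, in %
relative_gain = gain_pp / prior_miou             # gain relative to that prior result
fpga_gap_pp = best_miou - fpga_miou              # accuracy given up on the FPGA
power_w = fps / frames_per_joule                 # (frames/s) / (frames/J) = J/s = W
energy_per_frame_mj = 1000.0 / frames_per_joule  # millijoules per frame

print(f"implied prior result : {prior_miou:.2f}% mIoU")
print(f"relative gain        : {relative_gain:.1%}")
print(f"FPGA accuracy gap    : {fpga_gap_pp:.2f} pp")
print(f"implied average power: {power_w:.2f} W")
print(f"energy per frame     : {energy_per_frame_mj:.2f} mJ")
```

Under that assumption the prior result works out to 63.10% mean IoU, and the FPGA design draws roughly 1.94 W on average, or about 4.88 mJ per frame. Which power rails the measurement covers is not stated here, so the wattage should be read as an implied figure only.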
What carries the argument
Knowledge distillation applied to U-Net variants, followed by post-training quantization and a custom FPGA hardware architecture that accelerates inference under tight power limits.
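To make the distillation step concrete, here is a minimal sketch of a logit-based distillation loss in the style of Hinton et al., applied per pixel. It is an illustration under stated assumptions, not the paper's training objective: the temperature, the weighting factor `alpha`, and all function names are hypothetical, and the paper may use a different loss.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]

def kd_pixel_loss(student_logits, teacher_logits, label,
                  temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Ordinary cross-entropy against the ground-truth label (temperature 1).
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce

def kd_loss(student_map, teacher_map, labels, **kwargs):
    """Average the per-pixel loss over flat lists of pixels."""
    losses = [kd_pixel_loss(s, t, y, **kwargs)
              for s, t, y in zip(student_map, teacher_map, labels)]
    return sum(losses) / len(losses)

# Toy example: two pixels, logits ordered as [background, crack].
teacher = [[2.0, -1.0], [-0.5, 3.0]]
student = [[0.5, 0.2], [0.1, 0.4]]
labels = [0, 1]

loss_mismatch = kd_loss(student, teacher, labels)
loss_match = kd_loss(teacher, teacher, labels)  # student copies the teacher
print(loss_mismatch, loss_match)
```

The loss is lower when the student reproduces the teacher's outputs, which is the signal the student is trained on in addition to the ground-truth labels.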
If this is right
- Infrastructure inspection can run continuously on battery-powered devices without constant cloud uploads.
- UAVs can perform autonomous crack surveys with lower latency and reduced exposure of human operators to hazardous sites.
- Local edge processing lowers both data-storage costs and transmission-related security exposure.
Where Pith is reading between the lines
- The same distillation-plus-quantization recipe may transfer to other defect-detection tasks on constrained hardware beyond road cracks.
- Hybrid acceleration that pairs the FPGA design with a small GPU or CPU fallback could provide graceful degradation when power budgets tighten further.
- Systematic stress-testing across multiple public crack datasets would clarify how much dataset-specific retraining the approach actually needs.
Load-bearing premise
The accuracy and efficiency gains from distillation, quantization, and the custom FPGA design will continue when the same pipeline is moved to new crack datasets, different lighting, or other edge platforms without major retuning.
What would settle it
Apply the distilled and quantized models to an independent crack dataset recorded under changed lighting or camera conditions and check whether mean IoU falls below the previously published baseline.
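The check described above can be sketched as follows. This is a minimal illustration that assumes mean IoU is the unweighted average of per-class IoU over background and crack; the paper's exact definition is not given here. The threshold is the prior result implied by the quoted numbers (71.92 minus 8.82), and the masks are toy data, not results.

```python
def mean_iou(pred, target, num_classes=2):
    """Unweighted mean of per-class IoU for two equally sized 2-D label masks."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == c), (t == c)
                inter += int(in_pred and in_target)
                union += int(in_pred or in_target)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Prior result implied by the text: 71.92% minus the 8.82 pp gain.
PRIOR_MIOU_PERCENT = 71.92 - 8.82

def falls_below_prior(miou_fraction):
    """True if a mean IoU (as a fraction in [0, 1]) is under the prior result."""
    return 100.0 * miou_fraction < PRIOR_MIOU_PERCENT

# Toy 2x2 masks (0 = background, 1 = crack), for illustration only.
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]

miou = mean_iou(pred, target)
print(f"mean IoU = {100.0 * miou:.2f}%, below prior: {falls_below_prior(miou)}")
```

In a real test, `pred` and `target` would come from the independent dataset, and the per-image or per-dataset aggregation would need to match the protocol used for the published numbers.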
Original abstract
Crack segmentation on edge devices can support continuous infrastructure monitoring and maintenance and thereby help to preserve public safety. Furthermore, autonomous infrastructure monitoring by using Unmanned Aerial Vehicles (UAVs) can reduce inspection risks, as human operators no longer need to enter hazardous areas. Edge processing reduces the cost of inspection by eliminating the need for high resolution image storage for offline processing and mitigates the security risks and bandwidth requirements of streaming to cloud servers. Edge inference is difficult due to the limited memory and computational capabilities of edge devices, which can affect both accuracy and latency. Furthermore, battery-powered devices are subject to strict power and energy constraints. Together, these limitations impose restrictions on the model size and computational complexity that can be deployed close to the sensor. In recent years, Transformers have achieved state-of-the-art accuracy in a variety of applications, including semantic segmentation. However, Transformer-based models are typically large and computationally intensive, making efficient edge deployment difficult. To address this, we first apply knowledge distillation to enhance the performance of the base models. We then use PTQ to compress the models further. Additionally, we consider the deployment of these models across multiple edge platforms. To maximize energy efficiency, we design and implement a custom hardware architecture for the models on an FPGA. Our results show that Knowledge Distillation (KD) improves all tested U-Net variants. Among the evaluated platforms, the selected FPGA implementation achieves 398 FPS at 204.99 Frames/J while maintaining a mean IoU of 69.42%. In addition, our best model reaches 71.92% mean IoU, which is 8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset.
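The abstract does not say which post-training quantization scheme or bit width is used. As a hedged illustration of the general idea only, the sketch below applies symmetric per-tensor 8-bit quantization to a handful of made-up weights; the function names and the int8 choice are assumptions, not details from the paper.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q,
    with integer q clamped to [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [scale * v for v in q]

# Made-up weights for illustration; no retraining is involved, which is
# what distinguishes post-training quantization from quantization-aware training.
weights = [0.42, -1.27, 0.003, 0.9, -0.31]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print("scale    :", scale)
print("max error:", max_error)
```

The rounding error per weight is bounded by half the scale, which is why a few outlier weights can cost accuracy: they widen the scale for every other value in the tensor.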
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a case study on deploying U-Net variants for crack segmentation on edge devices. It applies knowledge distillation (KD) to improve accuracy, followed by post-training quantization (PTQ) for compression, and evaluates the models across edge platforms including a custom FPGA architecture. Key claims include that KD improves all tested U-Net variants, the best model achieves 71.92% mean IoU (8.82 pps above the prior reported result on CrackVision12K), and the FPGA implementation reaches 398 FPS at 204.99 Frames/J while maintaining 69.42% mIoU.
Significance. If the empirical gains from KD and the hardware performance metrics hold under controlled conditions, the work offers a practical demonstration of energy-efficient edge AI for infrastructure monitoring via UAVs or similar, quantifying trade-offs in accuracy, latency, and energy across CPU, GPU, and FPGA platforms. The hardware implementation and measured FPS/Frames-per-Joule values provide concrete engineering insights.
major comments (2)
- [Abstract and results] Abstract and results section: The central claim that the best model reaches 71.92% mean IoU, '8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset,' is load-bearing for attributing gains to KD. However, no evidence is provided that the prior baseline was reproduced under identical data splits, preprocessing, augmentation, or optimization protocols; without this controlled comparison, the delta cannot be confidently ascribed to the distillation step rather than setup differences.
- [Experiments / Results] Experimental evaluation: Full details on data splits, error bars, statistical significance of the mIoU improvements across U-Net variants, and exact baseline re-implementations are absent. These omissions undermine reproducibility of the reported FPS, Frames/J, and mIoU numbers and the claim that 'KD improves all tested U-Net variants.'
minor comments (2)
- [Method] Clarify the exact U-Net variants tested and the teacher model used for KD; notation for model sizes or layer counts could be added for precision.
- [Figures] Figure captions for hardware results should explicitly state the input resolution and batch size used for FPS and energy measurements to aid direct comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our experimental approach and outlining planned revisions to improve clarity and reproducibility.
- Referee: [Abstract and results] Abstract and results section: The central claim that the best model reaches 71.92% mean IoU, '8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset,' is load-bearing for attributing gains to KD. However, no evidence is provided that the prior baseline was reproduced under identical data splits, preprocessing, augmentation, or optimization protocols; without this controlled comparison, the delta cannot be confidently ascribed to the distillation step rather than setup differences.
  Authors: We agree that the 8.82 pps improvement is reported relative to the previously published result on CrackVision12K rather than a re-implementation of the baseline under identical conditions. Our primary evidence for the benefits of KD comes from controlled internal comparisons across the U-Net variants, where each model was trained with and without distillation using the same data, preprocessing, augmentation, and optimization settings. We will revise the abstract and results section to explicitly note that the cross-paper delta is provided for context only and to emphasize that KD gains are demonstrated through our matched-pair experiments on the variants.
  Revision: yes
- Referee: [Experiments / Results] Experimental evaluation: Full details on data splits, error bars, statistical significance of the mIoU improvements across U-Net variants, and exact baseline re-implementations are absent. These omissions undermine reproducibility of the reported FPS, Frames/J, and mIoU numbers and the claim that 'KD improves all tested U-Net variants.'
  Authors: We acknowledge that additional experimental details would strengthen reproducibility. We will expand the Experiments section to specify the exact train/validation/test split ratios used on CrackVision12K, the preprocessing and augmentation pipeline, and the precise training hyperparameters. For the KD claim, we will include a table reporting mIoU for each U-Net variant with and without distillation under identical conditions. The prior baseline was not re-implemented, as it originates from the dataset's original publication; we will add an explicit statement to this effect. Regarding error bars and statistical significance, all reported results are from single training runs due to the high computational cost of training multiple variants and performing hardware deployments; we will note this limitation and avoid any claims of statistical significance.
  Revision: partial
Circularity Check
No circularity: all claims are direct empirical measurements
The paper is an empirical case study reporting measured mIoU, FPS, and energy efficiency from KD, PTQ, and FPGA runs on U-Net variants. The 71.92% mIoU and 8.82 pps delta are comparisons to external prior results on CrackVision12K; no equations, derivations, or fitted parameters are defined inside the paper that reduce the reported outcomes to self-inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citation chains appear in the provided text.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Knowledge distillation improves the performance of smaller student models when they are trained from a larger teacher.