arxiv: 2604.13933 · v2 · submitted 2026-04-15 · 📡 eess.SP

A Case Study on Energy-Efficient Edge AI Crack Segmentation

Pith reviewed 2026-05-10 12:36 UTC · model grok-4.3

classification 📡 eess.SP
keywords crack segmentation · edge AI · knowledge distillation · U-Net · FPGA · energy efficiency · semantic segmentation · infrastructure monitoring

The pith

Knowledge distillation raises crack segmentation accuracy to 71.92% mean IoU, and a custom FPGA design runs at 398 FPS with 69.42% mean IoU.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how knowledge distillation can lift the performance of U-Net models for identifying cracks in images captured by edge devices such as those on UAVs. This matters because edge processing cuts storage costs, bandwidth use, and security risks that come with sending raw high-resolution data to the cloud. The authors further compress the models with post-training quantization and map them to a custom FPGA design that delivers both speed and energy efficiency while preserving usable accuracy.

Core claim

Knowledge distillation improves every tested U-Net variant for crack segmentation. The strongest model reaches 71.92% mean IoU on the CrackVision12K dataset, an increase of 8.82 percentage points over the prior reported result. A selected FPGA implementation runs at 398 FPS while delivering 204.99 frames per joule and a mean IoU of 69.42%.
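
The headline figures imply a few quantities that the summary does not state outright. The sketch below works them out, assuming that Frames/J is simply throughput divided by average power draw. The measurement convention is not spelled out here, so the implied power is an inference rather than a reported value, and all variable names are illustrative.

```python
# Derived quantities from the figures quoted in the core claim.
# Assumption: Frames/J = FPS / average power in watts.

best_miou = 71.92               # % mean IoU, best distilled model
gain_pp = 8.82                  # percentage points over the prior reported result
fpga_fps = 398.0                # frames per second, selected FPGA design
fpga_frames_per_joule = 204.99  # energy efficiency, selected FPGA design
fpga_miou = 69.42               # % mean IoU, selected FPGA design

prior_miou = best_miou - gain_pp                       # implied prior result
energy_per_frame_mj = 1000.0 / fpga_frames_per_joule   # millijoules per frame
implied_power_w = fpga_fps / fpga_frames_per_joule     # watts, under the assumption
fpga_gap_pp = best_miou - fpga_miou                    # accuracy given up on the FPGA

print(f"implied prior mIoU:   {prior_miou:.2f} %")
print(f"energy per frame:     {energy_per_frame_mj:.2f} mJ")
print(f"implied average power: {implied_power_w:.2f} W")
print(f"FPGA gap to best model: {fpga_gap_pp:.2f} pp")
```

Under that assumption the prior result sits near 63.10% mean IoU, each frame costs just under 5 mJ, the FPGA design draws roughly 1.94 W, and it gives up 2.50 percentage points relative to the best model.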

What carries the argument

Knowledge distillation applied to U-Net variants, followed by post-training quantization and a custom FPGA hardware architecture that accelerates inference under tight power limits.
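
To make the distillation step concrete, here is a minimal sketch of the classic soft-target loss of Hinton, Vinyals, and Dean (2015), applied per pixel. The summary does not say which distillation loss, temperature, or weighting the authors used, so the function names and the defaults `temperature=2.0` and `alpha=0.5` are illustrative assumptions, not the paper's settings.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_kd_loss(student_logits, teacher_logits, label,
                  temperature=2.0, alpha=0.5):
    """Soft-target distillation loss for a single pixel (assumed form)."""
    # Hard-label term: cross-entropy of the student against the ground truth.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


def kd_loss(student_map, teacher_map, label_map, temperature=2.0, alpha=0.5):
    """Average the per-pixel loss over a flattened image."""
    losses = [
        pixel_kd_loss(s, t, y, temperature, alpha)
        for s, t, y in zip(student_map, teacher_map, label_map)
    ]
    return sum(losses) / len(losses)


# Two pixels, two classes (0 = background, 1 = crack); the logits are made up.
student = [[2.0, 0.0], [0.5, 1.0]]
teacher = [[3.0, -1.0], [-1.0, 2.5]]
labels = [0, 1]
print(f"distillation loss: {kd_loss(student, teacher, labels):.4f}")
```

When student and teacher agree, the KL term vanishes and only the weighted cross-entropy remains; the more the student's softened output departs from the teacher's, the larger the penalty.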

If this is right

  • Infrastructure inspection can run continuously on battery-powered devices without constant cloud uploads.
  • UAVs can perform autonomous crack surveys with lower latency and reduced exposure of human operators to hazardous sites.
  • Local edge processing lowers both data-storage costs and transmission-related security exposure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation-plus-quantization recipe may transfer to other defect-detection tasks on constrained hardware beyond crack segmentation.
  • Hybrid acceleration that pairs the FPGA design with a small GPU or CPU fallback could provide graceful degradation when power budgets tighten further.
  • Systematic stress-testing across multiple public crack datasets would clarify how much dataset-specific retraining the approach actually needs.

Load-bearing premise

The accuracy and efficiency gains from distillation, quantization, and the custom FPGA design will continue when the same pipeline is moved to new crack datasets, different lighting, or other edge platforms without major retuning.

What would settle it

Apply the distilled and quantized models to an independent crack dataset recorded under changed lighting or camera conditions and check whether mean IoU falls below the previously published baseline.
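
The check described above can be sketched in a few lines. This is a minimal illustration, assuming mean IoU is averaged over the two classes (background and crack) on flattened label maps. The paper's exact averaging convention is not given here, and the baseline constant is derived from the quoted figures (71.92 minus 8.82) rather than read from the prior publication.

```python
# Baseline implied by the quoted numbers, in percent (derived, not quoted).
PRIOR_BASELINE_MIOU = 71.92 - 8.82


def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for flattened label maps (0 = background, 1 = crack)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def falls_below_baseline(miou_percent, baseline=PRIOR_BASELINE_MIOU):
    """True if a measured mean IoU (in percent) is under the baseline."""
    return miou_percent < baseline


# Toy four-pixel example; real use would iterate over the new dataset.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = 100.0 * mean_iou(pred, target)
print(f"mean IoU: {score:.2f} %, below baseline: {falls_below_baseline(score)}")
```

A result under the baseline on the independent dataset would indicate that the reported gains depend on dataset-specific tuning.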

Figures

Figures reproduced from arXiv: 2604.13933 by Bo Zhou, Matthias Tschope, Mohamed Moursi, Norbert Wehn, Paul Lukowicz, Vladimir Rybalkin.

Figure 1: The sample hardware architecture where each layer is mapped to a separate hardware instance. The modules are connected via on-chip data streams. Power was measured using the onboard Texas Instruments INA219 current/power monitor. Idle power was measured for a continuous period of 10 seconds after configuring the PL, while the runtime power was averaged over 32 repetitions, each consisting of 32 images proce…
Figure 2: Pareto frontier showing the trade-off between accuracy …
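
For readers unfamiliar with the construction, a Pareto frontier keeps only the designs that no other design beats on every metric at once. The sketch below assumes the second axis is energy efficiency in Frames/J, since the caption is cut off before naming it. Only the `fpga_reported` point comes from the figures quoted on this page; the `hypo_*` entries are invented purely for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other (both metrics maximized)."""
    front = []
    for name, acc, eff in points:
        dominated = any(
            (a >= acc and e >= eff) and (a > acc or e > eff)
            for _, a, e in points
        )
        if not dominated:
            front.append((name, acc, eff))
    # Sort by efficiency so the frontier reads left to right.
    return sorted(front, key=lambda p: p[2])


# (name, mean IoU in %, Frames/J). Only the first entry is a reported value.
candidates = [
    ("fpga_reported", 69.42, 204.99),
    ("hypo_a", 71.0, 50.0),    # hypothetical: accurate but inefficient
    ("hypo_b", 68.0, 150.0),   # hypothetical: dominated by fpga_reported
    ("hypo_c", 65.0, 260.0),   # hypothetical: efficient but less accurate
    ("hypo_d", 64.0, 240.0),   # hypothetical: dominated by hypo_c
]

for name, acc, eff in pareto_front(candidates):
    print(f"{name}: {acc:.2f} % mIoU at {eff:.2f} Frames/J")
```

With this made-up data the frontier consists of `hypo_a`, `fpga_reported`, and `hypo_c`; the other two designs are each beaten on both metrics by a frontier point.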
Original abstract

Crack segmentation on edge devices can support continuous infrastructure monitoring and maintenance and thereby help to preserve public safety. Furthermore, autonomous infrastructure monitoring by using Unmanned Aerial Vehicles (UAVs) can reduce inspection risks, as human operators no longer need to enter hazardous areas. Edge processing reduces the cost of inspection by eliminating the need for high resolution image storage for offline processing and mitigates the security risks and bandwidth requirements of streaming to cloud servers. Edge inference is difficult due to the limited memory and computational capabilities of edge devices, which can affect both accuracy and latency. Furthermore, battery-powered devices are subject to strict power and energy constraints. Together, these limitations impose restrictions on the model size and computational complexity that can be deployed close to the sensor. In recent years, Transformers have achieved state-of-the-art accuracy in a variety of applications, including semantic segmentation. However, Transformer-based models are typically large and computationally intensive, making efficient edge deployment difficult. To address this, we first apply knowledge distillation to enhance the performance of the base models. We then use PTQ to compress the models further. Additionally, we consider the deployment of these models across multiple edge platforms. To maximize energy efficiency, we design and implement a custom hardware architecture for the models on an FPGA. Our results show that Knowledge Distillation (KD) improves all tested U-Net variants. Among the evaluated platforms, the selected FPGA implementation achieves 398 FPS at 204.99 Frames/J while maintaining a mean IoU of 69.42%. In addition, our best model reaches 71.92% mean IoU, which is 8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a case study on deploying U-Net variants for crack segmentation on edge devices. It applies knowledge distillation (KD) to improve accuracy, followed by post-training quantization (PTQ) for compression, and evaluates the models across edge platforms including a custom FPGA architecture. Key claims include that KD improves all tested U-Net variants, the best model achieves 71.92% mean IoU (8.82 pps above the prior reported result on CrackVision12K), and the FPGA implementation reaches 398 FPS at 204.99 Frames/J while maintaining 69.42% mIoU.

Significance. If the empirical gains from KD and the hardware performance metrics hold under controlled conditions, the work offers a practical demonstration of energy-efficient edge AI for infrastructure monitoring on UAVs or similar platforms, quantifying trade-offs in accuracy, latency, and energy across CPU, GPU, and FPGA platforms. The hardware implementation and the measured FPS and Frames/J values provide concrete engineering insights.

major comments (2)
  1. [Abstract and results] Abstract and results section: The central claim that the best model reaches 71.92% mean IoU, '8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset,' is load-bearing for attributing gains to KD. However, no evidence is provided that the prior baseline was reproduced under identical data splits, preprocessing, augmentation, or optimization protocols; without this controlled comparison, the delta cannot be confidently ascribed to the distillation step rather than setup differences.
  2. [Experiments / Results] Experimental evaluation: Full details on data splits, error bars, statistical significance of the mIoU improvements across U-Net variants, and exact baseline re-implementations are absent. These omissions undermine reproducibility of the reported FPS, Frames/J, and mIoU numbers and the claim that 'KD improves all tested U-Net variants.'
minor comments (2)
  1. [Method] Clarify the exact U-Net variants tested and the teacher model used for KD; notation for model sizes or layer counts could be added for precision.
  2. [Figures] Figure captions for hardware results should explicitly state the input resolution and batch size used for FPS and energy measurements to aid direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our experimental approach and outlining planned revisions to improve clarity and reproducibility.

  1. Referee: [Abstract and results] Abstract and results section: The central claim that the best model reaches 71.92% mean IoU, '8.82 percentage points (pps) higher than the previously reported result on the CrackVision12K dataset,' is load-bearing for attributing gains to KD. However, no evidence is provided that the prior baseline was reproduced under identical data splits, preprocessing, augmentation, or optimization protocols; without this controlled comparison, the delta cannot be confidently ascribed to the distillation step rather than setup differences.

    Authors: We agree that the 8.82 pps improvement is reported relative to the previously published result on CrackVision12K rather than a re-implementation of the baseline under identical conditions. Our primary evidence for the benefits of KD comes from controlled internal comparisons across the U-Net variants, where each model was trained with and without distillation using the same data, preprocessing, augmentation, and optimization settings. We will revise the abstract and results section to explicitly note that the cross-paper delta is provided for context only and to emphasize that KD gains are demonstrated through our matched-pair experiments on the variants. revision: yes

  2. Referee: [Experiments / Results] Experimental evaluation: Full details on data splits, error bars, statistical significance of the mIoU improvements across U-Net variants, and exact baseline re-implementations are absent. These omissions undermine reproducibility of the reported FPS, Frames/J, and mIoU numbers and the claim that 'KD improves all tested U-Net variants.'

    Authors: We acknowledge that additional experimental details would strengthen reproducibility. We will expand the Experiments section to specify the exact train/validation/test split ratios used on CrackVision12K, the preprocessing and augmentation pipeline, and the precise training hyperparameters. For the KD claim, we will include a table reporting mIoU for each U-Net variant with and without distillation under identical conditions. The prior baseline was not re-implemented, as it originates from the dataset's original publication; we will add an explicit statement to this effect. Regarding error bars and statistical significance, all reported results are from single training runs due to the high computational cost of training multiple variants and performing hardware deployments; we will note this limitation and avoid any claims of statistical significance. revision: partial

Circularity Check

0 steps flagged

No circularity: all claims are direct empirical measurements

The paper is an empirical case study reporting measured mIoU, FPS, and energy efficiency from KD, PTQ, and FPGA runs on U-Net variants. The 71.92% mIoU and 8.82 pps delta are comparisons to external prior results on CrackVision12K; no equations, derivations, or fitted parameters are defined inside the paper that reduce the reported outcomes to self-inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citation chains appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the domain assumption that knowledge distillation and post-training quantization reliably improve or preserve accuracy while reducing model size and compute; no new free parameters, axioms, or invented entities are introduced beyond standard ML compression practices.
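
The quantization half of that assumption can be illustrated with a minimal sketch of symmetric, per-tensor 8-bit post-training quantization. The bit width, the symmetric scheme, and the function names are all assumptions made for illustration; the page does not state which PTQ configuration the authors used.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Made-up weights standing in for one layer of a trained model.
weights = [-1.0, -0.37, 0.0, 0.25, 0.81, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"scale: {scale:.6f}")
print(f"codes: {codes}")
print(f"max round-trip error: {max_error:.6f}")
```

The round-trip error is bounded by half the scale, which is why accuracy is often, though not always, preserved; whether it is preserved for a given model is exactly the empirical question the assumption glosses over.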

axioms (1)
  • domain assumption: Knowledge distillation improves the performance of smaller student models when they are trained from a larger teacher.
    Invoked to justify applying KD to all tested U-Net variants before quantization and hardware mapping.

pith-pipeline@v0.9.0 · 5627 in / 1453 out tokens · 39958 ms · 2026-05-10T12:36:52.675497+00:00 · methodology

