
arxiv: 1907.02248 · v1 · pith:OFOA3T4Pnew · submitted 2019-07-04 · 💻 cs.CV

FPCNet: Fast Pavement Crack Detection Network Based on Encoder-Decoder Architecture

Pith reviewed 2026-05-25 09:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords: pavement crack detection, deep learning, encoder-decoder, dilated convolution, squeeze-and-excitation, pixel-level segmentation, road maintenance, computer vision

The pith

FPCNet uses an encoder-decoder structure with multi-dilation and SE-upsampling modules to learn multi-context crack features for fast pixel-level pavement crack detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve the difficulty of designing hand-crafted features for pavement cracks that have complicated topologies, varying damage levels, and oil stains, while also incorporating surrounding context. It introduces FPCNet, which automatically learns crack characteristics through an encoder-decoder network rather than relying on manual feature design and separate classifiers. The Multi-Dilation module captures features at multiple context sizes using dilated convolutions with different rates to handle cracks of varying widths and shapes. The SE-Upsampling module then refines these features with squeeze-and-excitation operations before final detection. A reader would care because this end-to-end approach promises quicker and more reliable road maintenance decisions across diverse real-world imaging conditions.
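The multi-rate idea behind the Multi-Dilation module can be illustrated with a minimal pure-Python sketch in one dimension. This is not the authors' implementation: the function names, the shared kernel, and the rates (1, 2, 4) are assumptions chosen for illustration, and a real module would learn separate 2-D weights per rate. The sketch only shows how a larger dilation rate widens the context a fixed-size kernel covers.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by one dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        # Kernel taps are spaced `dilation` samples apart.
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def multi_dilation(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several rates (rates are illustrative only)."""
    return {rate: dilated_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    # Toy intensity profile across a crack (hypothetical data).
    profile = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    edge_kernel = [1, 0, -1]
    for rate, response in multi_dilation(profile, edge_kernel).items():
        print(rate, effective_kernel_size(len(edge_kernel), rate), response)
```

A three-tap kernel spans 3, 5, and 9 samples at rates 1, 2, and 4, which is why responses at different rates can react to cracks of different widths. How FPCNet combines the per-rate responses is not specified on this page, so the sketch simply returns them side by side.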

Core claim

The paper claims that integrating two modules produces FPCNet, a network that refines crack features step by step to achieve fast pixel-level crack detection. The Multi-Dilation module synthesizes crack features of multiple context sizes via dilated convolution with multiple rates, so that cracks of different widths and topologies can be described. The SE-Upsampling module then optimizes those features using Squeeze-and-Excitation learning. The resulting network is reported to outperform prior methods on the CFD and G45 datasets under varied crack types and shooting conditions.

What carries the argument

The Multi-Dilation module that synthesizes multi-context crack features via dilated convolutions at multiple rates, paired with the SE-Upsampling module that refines them through squeeze-and-excitation, inside an encoder-decoder network.
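The squeeze-and-excitation step that the SE-Upsampling module relies on can be sketched in a few lines of plain Python. This follows the generic SE recipe (global average pooling, a small two-layer gate, channel-wise rescaling) and is only an illustration: the helper names and the weight arguments `w1`, `b1`, `w2`, `b2` are hypothetical, and the transposed convolution and feature addition that the paper's module performs before this step are left out.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(vector, weights, bias):
    """Fully connected layer: weights is a list of rows, one per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def excite(pooled, w1, b1, w2, b2):
    """Bottleneck gate: dense -> ReLU -> dense -> sigmoid, one gate per channel."""
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    return [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Rescale every channel by its learned gate in (0, 1)."""
    gates = excite(squeeze(feature_maps), w1, b1, w2, b2)
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
```

Here `feature_maps` is a list of channels, each a 2-D list of floats. In a trained network the gate weights would be learned, so channels that carry crack evidence can be emphasized and the rest suppressed.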

If this is right

  • Crack features are learned automatically with multiple contexts instead of being manually designed.
  • The network handles complicated topological structures and contextual information around cracks.
  • End-to-end pixel-level detection runs faster than previous state-of-the-art methods.
  • Performance gains appear across multiple public datasets with varied crack types and conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same encoder-decoder pattern with dilation and channel attention could apply to detecting linear defects in other materials such as concrete structures or manufactured surfaces.
  • Because the method emphasizes speed, it may support real-time processing on vehicle-mounted cameras during routine road surveys.
  • Combining the network output with GPS data could produce automated maintenance priority maps without additional manual annotation.

Load-bearing premise

The Multi-Dilation and SE-Upsampling modules will continue to synthesize and optimize useful crack features across unseen crack topologies, widths, and imaging conditions without requiring dataset-specific retraining.

What would settle it

Running FPCNet on a fresh dataset of pavement images with crack patterns or lighting conditions absent from the CFD and G45 sets, and finding that it no longer exceeds the accuracy or speed of conventional feature-based detectors, would falsify the claim.
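Such a test needs an agreed accuracy measure and a timing procedure. The sketch below is one hedged way to set both up in plain Python: it scores binary crack masks with pixel-level precision, recall, and F1 under exact matching, and times a detector per image. The choice of metric, the exact-match rule (crack benchmarks often allow a small pixel tolerance), and the `detector` callable are assumptions, not details taken from the paper.

```python
import time


def precision_recall_f1(pred, truth):
    """Score two same-shape 2-D lists of 0/1 pixel labels (1 = crack)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_seconds_per_image(detector, images, warmup=1):
    """Average wall-clock time of detector(image) after a short warm-up."""
    if not images:
        raise ValueError("images must not be empty")
    for image in images[:warmup]:
        detector(image)  # warm-up calls are not timed
    start = time.perf_counter()
    for image in images:
        detector(image)
    return (time.perf_counter() - start) / len(images)
```

Running both functions for FPCNet and for a conventional feature-based detector on the same unseen images, at the same input resolution, would give the like-for-like accuracy and speed comparison this section asks for.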

Figures

Figures reproduced from arXiv: 1907.02248 by Qi Chen, Wenjun Liu, Ying Li, Yuchun Huang.

Figure 1: Convolution kernels with different dilation rates.
Figure 3: SE-Upsampling module. The MC features are first added to the MD features after transposed convolution. Next, global pooling is performed to …
Figure 4: Network architecture of FPCNet. The method uses 4 Convs (two …
Figure 5: Results of comparison of proposed approach with Method [18] on …
Figure 6: Percentage of time consumption of each module in the process of …
Figure 7: Typical results of four types of cracks on G45 dataset using FPCNet (from left to right: transverse cracks, longitudinal cracks, block cracks, alligator …
Figure 8: Typical examples of crack images with low contrast, zebra crossings, …

Original abstract

Timely, accurate and automatic detection of pavement cracks is necessary for making cost-effective decisions concerning road maintenance. Conventional crack detection algorithms focus on the design of single or multiple crack features and classifiers. However, complicated topological structures, varying degrees of damage and oil stains make the design of crack features difficult. In addition, the contextual information around a crack is not investigated extensively in the design process. Accordingly, these design features have limited discriminative adaptability and cannot fuse effectively with the classifiers. To solve these problems, this paper proposes a deep learning network for pavement crack detection. Using the Encoder-Decoder structure, crack characteristics with multiple contexts are automatically learned, and end-to-end crack detection is achieved. Specifically, we first propose the Multi-Dilation (MD) module, which can synthesize the crack features of multiple context sizes via dilated convolution with multiple rates. The crack MD features obtained in this module can describe cracks of different widths and topologies. Next, we propose the SE-Upsampling (SEU) module, which uses the Squeeze-and-Excitation learning operation to optimize the MD features. Finally, the above two modules are integrated to develop the fast crack detection network, namely, FPCNet. This network continuously optimizes the MD features step-by-step to realize fast pixel-level crack detection. Experiments are conducted on challenging public CFD datasets and G45 crack datasets involving various crack types under different shooting conditions. The distinct performance and speed improvements over all the datasets demonstrate that the proposed method outperforms other state-of-the-art crack detection methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FPCNet, an encoder-decoder network for pixel-level pavement crack detection. It introduces a Multi-Dilation (MD) module using dilated convolutions at multiple rates to capture crack features across varying widths and topologies, and an SE-Upsampling (SEU) module that applies Squeeze-and-Excitation to re-weight and optimize those features. The integrated network is evaluated on the CFD and G45 datasets, with the central claim being that FPCNet outperforms prior state-of-the-art methods in both detection performance and inference speed.

Significance. If the empirical results are reproducible with proper controls, the work could contribute a practical, relatively lightweight architecture for automated road inspection. The combination of multi-rate dilation and channel attention is a standard but well-motivated way to address multi-scale context in segmentation; explicit credit is due for targeting both accuracy and speed, which is relevant for deployment on resource-constrained hardware.

major comments (2)
  1. [§4] §4 (Experiments): The abstract and reported results claim clear outperformance on CFD and G45, yet no information is supplied on training protocols, random seeds, data splits, number of runs, or statistical significance testing. Without these, it is impossible to determine whether the reported gains are robust or could be explained by post-hoc hyper-parameter choices or favorable splits.
  2. [§4] §4, Table 2/3 (presumed results tables): No ablation study isolating the contribution of the MD module versus the SEU module is presented. Because the central claim attributes gains to these two modules, the absence of controlled ablations leaves the load-bearing architectural novelty unverified.
minor comments (2)
  1. [§3] Notation for dilation rates and channel dimensions in the MD and SEU modules should be defined explicitly in §3 before the equations are used.
  2. [Figure 5] Figure captions should state the exact input resolution and output stride used for the reported FPS numbers.
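The robustness concern in major comment 1 can be made concrete with a small standard-library sketch. It summarizes a metric over repeated runs and applies a two-sided sign-flip permutation test to paired scores from two methods. The function names, the number of resamples, and the choice of this particular test are assumptions for illustration; the paper does not say which protocol, if any, was used.

```python
import random
import statistics


def summarize(scores):
    """Mean and sample standard deviation over repeated runs (needs >= 2 runs)."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_permutation_pvalue(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired scores.

    scores_a[i] and scores_b[i] must come from the same split or seed.
    """
    rng = random.Random(seed)  # fixed seed keeps the test itself reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)
```

Reporting the mean and standard deviation over several seeds per method, together with a paired p-value on shared splits, would let a reader judge whether a reported gain exceeds run-to-run noise.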

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve reproducibility and verification of the proposed modules.

  1. Referee: [§4] §4 (Experiments): The abstract and reported results claim clear outperformance on CFD and G45, yet no information is supplied on training protocols, random seeds, data splits, number of runs, or statistical significance testing. Without these, it is impossible to determine whether the reported gains are robust or could be explained by post-hoc hyper-parameter choices or favorable splits.

    Authors: We agree that the manuscript does not provide these experimental details, which limits assessment of result robustness. In the revised version, we will add a dedicated subsection in §4 describing the training protocols, data splits, random seeds, number of runs, and any statistical significance tests.
    revision: yes

  2. Referee: [§4] §4, Table 2/3 (presumed results tables): No ablation study isolating the contribution of the MD module versus the SEU module is presented. Because the central claim attributes gains to these two modules, the absence of controlled ablations leaves the load-bearing architectural novelty unverified.

    Authors: We recognize that the absence of an ablation study leaves the individual contributions of the MD and SEU modules unverified. We will add an ablation study to the revised manuscript that isolates the effect of each module.
    revision: yes
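The ablation promised in the second response amounts to a 2x2 design over the two modules, repeated across seeds. A minimal sketch of enumerating those runs follows. The configuration keys, the run names, and the default seeds are hypothetical, and what replaces a disabled module (for example a plain convolution or plain upsampling) is a design decision the authors would need to state.

```python
from itertools import product


def ablation_grid(seeds=(0, 1, 2)):
    """List every (MD on/off, SEU on/off, seed) run needed for a 2x2 ablation."""
    runs = []
    for use_md, use_seu in product((False, True), repeat=2):
        parts = [label for label, on in (("MD", use_md), ("SEU", use_seu)) if on]
        name = "+".join(parts) if parts else "baseline"
        for seed in seeds:
            runs.append(
                {"name": name, "use_md": use_md, "use_seu": use_seu, "seed": seed}
            )
    return runs


if __name__ == "__main__":
    for run in ablation_grid():
        print(run)
```

Comparing "baseline", "MD", "SEU", and "MD+SEU" under identical splits and seeds separates the contribution of each module from their combined effect.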

Circularity Check

0 steps flagged

No significant circularity detected


The paper presents an empirical architecture for crack detection using an encoder-decoder network with proposed MD and SEU modules. The central claim is that FPCNet outperforms prior methods on CFD and G45 datasets in accuracy and speed, supported by experimental results on held-out test sets. No derivation chain, equations, or self-citations reduce any claimed result to fitted parameters or prior self-work by construction. The modules are described as learned feature synthesizers, and performance is evaluated externally rather than forced by internal definitions or renamings. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of the newly introduced MD and SEU modules; no explicit free parameters, axioms, or invented physical entities are stated in the abstract. The network weights themselves are learned from data and therefore not counted as free parameters in the ledger sense.



Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 2 internal anchors

  [1] J. Tang and Y. Gu, “Automatic crack detection and segmentation using a hybrid algorithm for road distress analysis,” in Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on. IEEE, 2013, pp. 3026–3030
  [2] Q. Li and X. Liu, “Novel approach to pavement image segmentation based on neighboring difference histogram method,” in Image and Signal Processing, 2008. CISP’08. Congress on, vol. 2. IEEE, 2008, pp. 792–796
  [3] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding,” in Signal Processing Conference, 2009 17th European. IEEE, 2009, pp. 622–626
  [4] H.-D. Cheng and M. Miyojim, “Automatic pavement distress detection system,” Inf. Sci., vol. 108, no. 1-4, pp. 219–240, 1998
  [5] H. Zhao, G. Qin, and X. Wang, “Improvement of canny algorithm based on pavement edge detection,” in Image and Signal Processing (CISP), 2010 3rd International Congress on, vol. 2. IEEE, 2010, pp. 964–967
  [6] R. S. Lim, H. M. La, Z. Shan, and W. Sheng, “Developing a crack inspection robot for bridge maintenance,” in Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 6288–6293
  [7] R. S. Lim, H. M. La, and W. Sheng, “A robotic crack inspection and mapping system for bridge deck maintenance,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 2, pp. 367–378, 2014
  [8] S. Chanda, G. Bu, H. Guan, J. Jo, U. Pal, Y.-C. Loo, and M. Blumenstein, “Automatic bridge crack detection–a texture analysis-based approach,” in IAPR Workshop on Artificial Neural Networks in Pattern Recognition. Springer, 2014, pp. 193–203
  [9] R. Medina, J. Llamas, E. Zalama, and J. Gómez-García-Bermejo, “Enhanced automatic detection of road surface cracks by combining 2d/3d image processing techniques,” in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 778–782
  [10] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, “Automation of pavement surface crack detection using the continuous wavelet transform,” in Image Processing, 2006 IEEE International Conference on. IEEE, 2006, pp. 3037–3040
  [11] J. Zhou, P. S. Huang, and F.-P. Chiang, “Wavelet-based pavement distress detection and evaluation,” Opt. Eng., vol. 45, no. 2, p. 027007, 2006
  [12] R. Kapela, P. Śniatała, A. Turkot, A. Rybarczyk, A. Pożarycki, P. Rydzewski, M. Wyczałek, and A. Błoch, “Asphalt surfaced pavement cracks detection based on histograms of oriented gradients,” in Mixed Design of Integrated Circuits & Systems (MIXDES), 2015 22nd International Conference. IEEE, 2015, pp. 579–584
  [13] F.-C. Chen and M. R. Jahanshahi, “Nb-cnn: deep learning-based crack detection using convolutional neural network and naive bayes data fusion,” IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4392–4400, 2018
  [14] J.-H. Lee, S.-S. Yoon, I.-H. Kim, and H.-J. Jung, “Diagnosis of crack damage on structures based on image processing techniques and r-cnn using unmanned aerial vehicle (uav),” in Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2018, vol. 10598. International Society for Optics and Photonics, 2018, p. 1059811
  [15] X. Wang and Z. Hu, “Grid-based pavement crack analysis using deep learning,” in Transportation Information and Safety (ICTIS), 2017 4th International Conference on. IEEE, 2017, pp. 917–924
  [16] Y.-J. Cha, W. Choi, and O. Büyüköztürk, “Deep learning-based crack damage detection using convolutional neural networks,” Comput.-Aided Civ. Infrastruct. Eng., vol. 32, no. 5, pp. 361–378, 2017
  [17] Y.-K. An, K.-Y. Jang, B. Kim, and S. Cho, “Deep learning-based concrete crack detection using hybrid images,” in Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2018, vol. 10598. International Society for Optics and Photonics, 2018, p. 1059812
  [18] Z. Fan, Y. Wu, J. Lu, and W. Li, “Automatic pavement crack detection based on structured prediction with the convolutional neural network,” arXiv preprint arXiv:1802.02208, 2018
  [19] X. Yang, H. Li, Y. Yu, X. Luo, T. Huang, and X. Yang, “Automatic pixel-level crack detection and measurement using fully convolutional network,” Comput.-Aided Civ. Infrastruct. Eng
  [20] H.-w. Huang, Q.-t. Li, and D.-m. Zhang, “Deep learning based image recognition for crack and leakage defects of metro shield tunnel,” Tunnelling Underground Space Technol., vol. 77, pp. 166–176, 2018
  [21] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440
  [22] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241
  [23] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Wavelets. Springer, 1990, pp. 286–297
  [24] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” arXiv preprint arXiv:1709.01507, vol. 7, 2017
  [25] Y. Hu, C. Zhao, and H. Wang, “Automatic pavement crack detection using texture and shape descriptors,” IETE TECH REV, vol. 27, no. 5, p. 398, 2010
  [26] A. Cord and S. Chambon, “Automatic road defect detection by textural pattern recognition based on adaboost,” Comput.-Aided Civ. Infrastruct. Eng., vol. 27, no. 4, pp. 244–259, 2012
  [27] P. Prasanna, K. J. Dana, N. Gucunski, B. B. Basily, H. M. La, R. S. Lim, and H. Parvardeh, “Automated crack detection on concrete bridges,” IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, pp. 591–599, 2016
  [28] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, “Automatic road crack detection using random structured forests,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 12, pp. 3434–3445, 2016
  [29] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998
  [30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105
  [31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” Proc. IEEE CVPR, pp. 1–9, 2015
  [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE CVPR, pp. 770–778, 2016
  [33] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” International Conference on Learning Representations, 2016
  [34] L. Chen, G. Papandreou, I. Kokkinos, K. P. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, 2018
  IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, UNDER REVIEW.
  [35] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” in NIPS-W, 2017
  [36] H. Li, D. Song, Y. Liu, and B. Li, “Automatic pavement crack detection by multi-scale image fusion,” IEEE Trans. Intell. Transp. Syst., no. 99, pp. 1–12, 2018
  [37] D. Ai, G. Jiang, L. S. Kei, and C. Li, “Automatic pixel-level pavement crack detection using information of multi-scale neighborhoods,” IEEE Access, vol. 6, pp. 24452–24463, 2018
  [38] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” Proc IEEE Int Conf Comput Vis, pp. 1026–1034, 2015
  [39] B. T. Polyak, “Some methods of speeding up the convergence of iteration methods,” Ussr Comput. Math. Math. Phys., vol. 4, no. 5, pp. 1–17, 1964