pith. machine review for the scientific record.

arxiv: 2604.14711 · v1 · submitted 2026-04-16 · 💻 cs.CV

Recognition: unknown

MS-SSE-Net: A Multi-Scale Spatial Squeeze-and-Excitation Network for Structural Damage Detection in Civil and Geotechnical Engineering

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 11:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords structural damage detection · deep learning · multi-scale features · attention mechanisms · DenseNet · civil infrastructure · image classification

The pith

MS-SSE-Net reaches 99.26 percent accuracy on structural damage classification by adding multi-scale feature extraction and dual attention to a DenseNet201 backbone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MS-SSE-Net to classify images of structural damage more accurately than standard models. It starts from DenseNet201 and adds parallel depthwise convolutions that capture both fine local detail and wider context, then applies channel attention to boost informative feature maps and spatial attention to highlight damage locations while suppressing background noise. These refined features feed into global average pooling and a classifier. On the StructDamage dataset the network records 99.31 percent precision, 99.25 percent recall, 99.27 percent F1-score, and 99.26 percent accuracy, beating the unmodified DenseNet201 by roughly 0.7 percentage points across all metrics. The gains matter because civil infrastructure inspection relies on reliable image-based detection to catch cracks, spalling, and other defects before they become safety issues.

Core claim

The central claim is that a DenseNet201 backbone augmented with parallel depthwise convolutions for multi-scale features, followed by squeeze-and-excitation channel attention and spatial attention, produces measurably better classification of multiple structural damage categories than the baseline network or other compared models when tested on the StructDamage dataset.

What carries the argument

MS-SSE-Net, a network that runs parallel depthwise convolutions at different scales to capture local and contextual information, then uses channel-wise squeeze-and-excitation attention plus spatial attention to emphasize informative regions before final classification.
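The paper does not publish layer-level details, but the attention arithmetic it names is standard. A minimal NumPy sketch of the squeeze-and-excitation channel gate followed by a spatial gate, under stated assumptions: the depthwise convolutions are omitted, the bottleneck ratio `r`, the weight shapes, and the use of per-pixel channel mean/max statistics (CBAM-style) are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_channel_attention(x, w1, w2):
    """Squeeze-and-excitation: global average pool -> bottleneck MLP -> sigmoid gate.
    x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r). Shapes are assumed."""
    squeeze = x.mean(axis=(1, 2))                         # (C,) global average pool
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # (C,) channel gate in (0, 1)
    return x * excite[:, None, None]

def spatial_attention(x):
    """Spatial gate from per-pixel channel mean and max (assumed CBAM-like form)."""
    stats = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W) pooled statistics
    gate = sigmoid(stats.mean(axis=0))                 # stand-in for a learned conv
    return x * gate[None, :, :]

C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = spatial_attention(se_channel_attention(x, w1, w2))
print(y.shape)  # (8, 4, 4)
```

Both gates are sigmoids, so they only rescale features into a smaller range rather than adding new activations; the feature map's shape is unchanged, which is what lets the block slot between existing DenseNet stages.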

If this is right

  • The same multi-scale and attention blocks can be swapped into other DenseNet-based pipelines that classify defects in bridges, buildings or tunnels.
  • Higher precision and recall reduce false positives that would otherwise trigger unnecessary on-site inspections.
  • Because global average pooling precedes the classifier, the network is largely input-size agnostic and could in principle process images of varying resolution without retraining.
  • Because the attention maps highlight damage locations, the model supplies visual explanations that inspectors can check directly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same attention pattern could be tested on video frames of moving structures to detect progressive damage over time.
  • If the multi-scale convolutions are kept lightweight, the model might run on edge devices mounted on drones for automated bridge surveys.
  • Combining this network with segmentation heads would turn the classification scores into pixel-level damage maps.

Load-bearing premise

The images and train-test split in the StructDamage dataset capture enough real-world variation in damage appearance and surroundings that the measured accuracy improvement will hold on new photographs taken under different conditions.

What would settle it

Evaluating the trained model on a fresh collection of structural images captured with different cameras, lighting, or weather, and finding that accuracy drops below 97 percent, would indicate that the performance gain does not generalize.

read the original abstract

Structural damage detection is essential for maintaining the safety and reliability of civil infrastructure. However, accurately identifying different types of structural damage from images remains challenging due to variations in damage patterns and environmental conditions. To address these challenges, this paper proposes MS-SSE-Net, a novel deep learning (DL) framework for structural damage classification. The proposed model is built upon the DenseNet201 backbone and integrates novel multi-scale feature extraction with channel and spatial attention mechanisms (MS-SSE-Net). Specifically, parallel depthwise convolutions capture both local and contextual features, while squeeze-and-excitation style channel attention and spatial attention emphasize informative regions and suppress irrelevant noise. The refined features are then processed through global average pooling and a fully connected classification layer to generate the final predictions. Experiments are conducted on the StructDamage dataset containing multiple structural damage categories. The proposed MS-SSE-Net demonstrates superior performance compared with the baseline DenseNet201 and other comparative approaches. Specifically, the proposed method achieves 99.31% precision, 99.25% recall, 99.27% F1-score, and 99.26% accuracy, outperforming the baseline model which achieved 98.62% precision, 98.53% recall, 98.58% F1-score, and 98.53% accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes MS-SSE-Net, a deep learning framework extending the DenseNet201 backbone with parallel depthwise convolutions for multi-scale feature extraction and squeeze-and-excitation style channel and spatial attention mechanisms. The model is evaluated on the StructDamage dataset for classifying structural damage types, reporting 99.31% precision, 99.25% recall, 99.27% F1-score, and 99.26% accuracy, which exceeds the DenseNet201 baseline (98.62% precision, 98.53% recall, 98.58% F1-score, 98.53% accuracy) and other comparative approaches.

Significance. If the performance gains prove robust, the work offers a practical incremental improvement for automated image-based structural damage detection in civil and geotechnical engineering. The combination of multi-scale processing with attention on a proven backbone is a reasonable engineering choice that could aid infrastructure inspection. However, the modest absolute gains (0.73 percentage points in accuracy) and absence of supporting validation details limit the immediate impact; stronger evidence of reproducibility would be needed to establish it as a reliable advance over existing DenseNet-based classifiers.

major comments (2)
  1. Abstract: The abstract reports specific performance metrics as point estimates but provides no details on training protocol, data splits, cross-validation, number of runs, or statistical testing. Without these, the 0.73 percentage point accuracy improvement over DenseNet201 cannot be distinguished from training stochasticity or split-specific effects, which commonly produce 0.5-1% variance in deep network evaluations on image classification tasks.
  2. Experimental results: No ablation studies, error bars, or multi-run statistics are mentioned to isolate the contribution of the multi-scale depthwise convolutions and attention modules. This leaves the central claim of superiority dependent on a single evaluation whose robustness cannot be assessed from the provided information.
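The referee's variance concern can be made concrete with a back-of-the-envelope check: treating each test image as a Bernoulli trial, the sampling noise of an accuracy estimate shrinks with test-set size, so whether a 0.73-point gain clears the noise floor depends on how many images the split holds. The test-set sizes below are hypothetical, since the paper does not report the split; a paired test such as McNemar's on per-image predictions would be stronger than this unpaired approximation.

```python
import math

def accuracy_se(p, n):
    """Binomial standard error of an accuracy estimate from n test images."""
    return math.sqrt(p * (1 - p) / n)

def gain_vs_noise(p_base, p_new, n):
    """Crude z-score for an accuracy gain against test-split sampling noise."""
    se = math.sqrt(accuracy_se(p_base, n) ** 2 + accuracy_se(p_new, n) ** 2)
    return (p_new - p_base) / se

# Reported accuracies: baseline 98.53%, MS-SSE-Net 99.26%.
# Test-set sizes are assumed for illustration only.
for n in (500, 2000, 10000):
    z = gain_vs_noise(0.9853, 0.9926, n)
    print(f"n={n:5d}  z={z:.2f}")
```

At a few hundred test images the gain sits within roughly one standard error of sampling noise; only at several thousand does it clear a conventional z ≈ 1.96 threshold, which is exactly why the report asks for split sizes and multi-run statistics.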

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor and reproducibility that we will address in the revision. We respond point-by-point below.

read point-by-point responses
  1. Referee: Abstract: The abstract reports specific performance metrics as point estimates but provides no details on training protocol, data splits, cross-validation, number of runs, or statistical testing. Without these, the 0.73 percentage point accuracy improvement over DenseNet201 cannot be distinguished from training stochasticity or split-specific effects, which commonly produce 0.5-1% variance in deep network evaluations on image classification tasks.

    Authors: We agree that the abstract lacks these details due to space constraints. In the revised manuscript we will expand the Experimental Setup section to fully document the training protocol (optimizer, learning rate, epochs, batch size), data split (train/validation/test ratios), and any cross-validation procedure used. We will also add results averaged over multiple independent runs with standard deviations and a brief statement on evaluation methodology to the abstract. revision: yes

  2. Referee: Experimental results: No ablation studies, error bars, or multi-run statistics are mentioned to isolate the contribution of the multi-scale depthwise convolutions and attention modules. This leaves the central claim of superiority dependent on a single evaluation whose robustness cannot be assessed from the provided information.

    Authors: We acknowledge that the current version does not include ablation studies or multi-run statistics. In the revision we will add ablation experiments that isolate the effects of the parallel depthwise convolutions and the channel/spatial attention modules by comparing variants with and without each component. We will also report mean performance and standard deviations across five independent training runs with error bars to quantify variability and strengthen the evidence for the observed gains. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical architecture proposal

full rationale

The paper describes an empirical CNN architecture (MS-SSE-Net) extending DenseNet201 with standard multi-scale depthwise convolutions and attention blocks, then reports point-estimate metrics on the StructDamage dataset. No equations, uniqueness theorems, fitted-parameter predictions, or self-citation chains are present that would reduce any claimed result to its own inputs by construction. The performance numbers are direct experimental outputs and remain externally falsifiable by re-training on the same split or new data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard machine-learning assumptions about labeled image data and the representativeness of one particular dataset; no new physical entities or first-principles derivations are introduced.

free parameters (1)
  • Model hyperparameters (learning rate, batch size, optimizer settings, attention scaling factors)
    Deep-learning training always involves multiple hyperparameters whose values are chosen or tuned on the data; none are listed in the abstract.
axioms (1)
  • domain assumption: The StructDamage dataset labels are accurate and the train/test distribution matches real deployment conditions.
    All reported accuracy figures presuppose that the evaluation set is a fair proxy for future images.

pith-pipeline@v0.9.0 · 5569 in / 1400 out tokens · 40134 ms · 2026-05-10T11:43:36.538013+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 8 canonical work pages · 3 internal anchors
