
arxiv: 2501.09203 · v2 · pith:3PX5CQPDnew · submitted 2025-01-15 · 💻 cs.CV · cs.RO

3D Modeling and Automated Measurement of Concrete Cracks via Segment Anything Refinement and Visual Inertial LiDAR Fusion

Pith reviewed 2026-05-23 04:48 UTC · model grok-4.3

keywords concrete crack detection · 3D reconstruction · crack measurement · Segment Anything Model · LiDAR fusion · visual inertial SLAM · point cloud · structural inspection

The pith

A fusion of SAM-refined image segmentation and visual-inertial LiDAR SLAM produces direct 3D measurements of concrete cracks on curved surfaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that starts with a DeepLabv3+ model refined by the Segment Anything Model to generate precise 2D crack masks that generalize to new scenarios. These masks are then combined with image and LiDAR data through a multi-frame multi-modal SLAM process to create dense, colorized 3D point clouds that retain crack information at real-world scale. Crack geometric attributes are measured automatically inside this 3D space instead of relying on 2D images. The approach targets the limitation that conventional methods cannot handle cracks on curved or complex three-dimensional structural elements.
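As a rough illustration of how a 2D crack mask can be transferred onto LiDAR points, the sketch below projects points through a pinhole camera model and marks those that land on crack pixels. It is not the paper's implementation. The function name `label_points_with_mask`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy mask are all assumptions, and the points are assumed to be already expressed in the camera frame; extrinsic calibration, lens distortion, and occlusion handling are left out.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def label_points_with_mask(
    points_cam: Sequence[Point],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[bool]:
    """Return True for each 3D point whose pinhole projection hits a crack pixel.

    Hypothetical helper: points are assumed to be in the camera frame (z forward),
    and mask[v][u] == 1 is assumed to mean "crack".
    """
    height, width = len(mask), len(mask[0])
    labels = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # Points behind the camera cannot be seen in the image.
            labels.append(False)
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        inside = 0 <= u < width and 0 <= v < height
        labels.append(inside and mask[v][u] == 1)
    return labels


# Toy 4x4 mask with a vertical "crack" in column 2 (made-up data).
mask = [[0, 0, 1, 0] for _ in range(4)]
points = [(0.0, 0.0, 1.0), (-1.0, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
labels = label_points_with_mask(points, mask, fx=1.0, fy=1.0, cx=2.0, cy=2.0)
print(labels)
```

Only the first point projects onto the crack column. The second lands on a background pixel, the third is behind the camera, and the fourth falls outside the image. In a multi-frame setting, labels like these would be accumulated across frames using the SLAM poses.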

Core claim

The central claim is that integrating SAM-refined segmentation masks with LiDAR point clouds via image- and LiDAR-SLAM produces dense point clouds in which crack geometric attributes can be measured automatically and directly at real-world scale, making the method suitable for structural components with curved and complex 3D geometries.

What carries the argument

The argument rests on the multi-frame, multi-modal fusion framework, which combines SAM-refined segmentation masks with visual-inertial LiDAR SLAM to generate dense, colorized point clouds that preserve crack semantics.

If this is right

  • Crack geometric attributes become measurable automatically and directly inside dense 3D point cloud space.
  • The method removes the projection errors inherent in conventional 2D image-based crack measurements.
  • Measurements remain feasible on structural components that have curved and complex 3D geometries.
  • The same pipeline yields improved robustness and generalization across diverse concrete inspection scenarios.
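The projection-error point can be made concrete with a simplified geometric sketch. It compares the length of a crack traced around a cylindrical surface with the length of the same crack after the depth coordinate is dropped, as a flat image would do. All of it is assumed for illustration: an orthographic view along the z axis, a hypothetical column radius of 0.5 m, and a crack spanning 120 degrees of the circumference. None of these values come from the paper.

```python
import math


def polyline_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


radius = 0.5                    # metres, hypothetical column radius
half_angle = math.radians(60)   # crack spans -60 deg .. +60 deg around the axis
n_segments = 200

# Sample the crack as a 3D polyline on the cylinder surface (y is the column axis).
angles = [-half_angle + 2 * half_angle * i / n_segments for i in range(n_segments + 1)]
crack_3d = [(radius * math.sin(t), 0.0, radius * math.cos(t)) for t in angles]

# Orthographic "image": keep x and y, discard depth z.
crack_2d = [(x, y) for x, y, _ in crack_3d]

length_3d = polyline_length(crack_3d)
length_2d = polyline_length(crack_2d)
underestimate = 1.0 - length_2d / length_3d

print(f"3D length: {length_3d:.4f} m")
print(f"2D length: {length_2d:.4f} m")
print(f"relative underestimate: {underestimate:.1%}")
```

Under these assumptions the 3D polyline recovers the arc length (radius times angle), while the flattened version recovers only the chord, so the 2D measurement is shorter. A real camera adds perspective effects, which this sketch does not model.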

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Repeated scans of the same structure could support 3D tracking of crack growth over time.
  • The dense point clouds could serve as input for finite-element models that simulate structural response to crack patterns.
  • Robotic platforms equipped with similar sensors might use the method for autonomous inspection routes in tunnels or bridges.

Load-bearing premise

The multi-frame fusion of image segmentation masks with LiDAR point clouds produces sufficiently accurate and dense 3D reconstructions that preserve crack semantics at real-world scale without significant drift or misalignment.
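One common way to put a number on misalignment between two overlapping scans is a nearest-neighbor root-mean-square error. The sketch below is a minimal brute-force version and makes no claim about which residual the paper's SLAM pipeline computes. The function name `nearest_neighbor_rmse` and the toy point sets are assumptions, and a real point cloud would need a spatial index (such as a k-d tree) rather than an all-pairs search.

```python
import math


def nearest_neighbor_rmse(source, target):
    """RMSE of distances from each source point to its closest target point.

    Hypothetical helper; brute force, O(len(source) * len(target)).
    """
    if not source or not target:
        raise ValueError("both point sets must be non-empty")
    squared = [min(math.dist(p, q) for q in target) ** 2 for p in source]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: the source scan is the target scan shifted by 0.1 along z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
source = [(x, y, z + 0.1) for x, y, z in target]

rmse = nearest_neighbor_rmse(source, target)
print(f"nearest-neighbor RMSE: {rmse:.3f}")
```

A residual like this only measures local agreement between scans. It would not reveal slow drift over a long trajectory, which is why independent ground truth is still needed.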

What would settle it

A side-by-side comparison of crack width and length values extracted from the generated 3D point cloud against physical measurements on a curved concrete specimen would falsify the claim if the 3D values deviate beyond the reported accuracy tolerance.
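Such a comparison could be scored along the following lines. This is a sketch only: the function name `measurement_errors`, the tolerance value, and the width values are placeholders invented for illustration and are not results from the paper.

```python
def measurement_errors(estimated_mm, reference_mm):
    """Per-crack absolute error, mean absolute error, and worst-case error."""
    if len(estimated_mm) != len(reference_mm) or not estimated_mm:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_err = [abs(e - r) for e, r in zip(estimated_mm, reference_mm)]
    mae = sum(abs_err) / len(abs_err)
    return abs_err, mae, max(abs_err)


# Placeholder crack widths in millimetres (NOT measured data).
estimated = [1.0, 2.5, 4.0]   # hypothetical values read from the 3D point cloud
reference = [1.5, 2.5, 3.0]   # hypothetical physical measurements

abs_err, mae, worst = measurement_errors(estimated, reference)

tolerance_mm = 0.75           # assumed acceptance threshold
within_tolerance = worst <= tolerance_mm

print(f"MAE: {mae} mm, worst case: {worst} mm, within tolerance: {within_tolerance}")
```

Reporting the worst-case error alongside the mean matters here, because a single badly reconstructed crack on a curved region could be hidden by an average.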

Original abstract

Visual-Spatial Systems has become increasingly essential in concrete crack inspection. However, existing methods often lacks adaptability to diverse scenarios, exhibits limited robustness in image-based approaches, and struggles with curved or complex geometries. To address these limitations, an innovative framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and 3D automatic crack measurement was proposed by integrating computer vision technologies and multi-modal Simultaneous localization and mapping (SLAM) in this study. Firstly, building on a base DeepLabv3+ segmentation model, and incorporating specific refinements utilizing foundation model Segment Anything Model (SAM), we developed a crack segmentation method with strong generalization across unfamiliar scenarios, enabling the generation of precise 2D crack masks. To enhance the accuracy and robustness of 3D reconstruction, Light Detection and Ranging (LiDAR) point clouds were utilized together with image data and segmentation masks. By leveraging both image- and LiDAR-SLAM, we developed a multi-frame and multi-modal fusion framework that produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale. Furthermore, the crack geometric attributions were measured automatically and directly within 3D dense point cloud space, surpassing the limitations of conventional 2D image-based measurements. This advancement makes the method suitable for structural components with curved and complex 3D geometries. Experimental results across various concrete structures highlight the significant improvements and unique advantages of the proposed method, demonstrating its effectiveness, accuracy, and robustness in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a framework for concrete crack inspection integrating a DeepLabv3+ model refined by the Segment Anything Model (SAM) for 2D crack segmentation, multi-modal LiDAR-image SLAM for dense 3D point cloud reconstruction that preserves crack semantics, and direct automatic measurement of crack geometric attributes within the 3D point cloud space. It claims superior generalization across scenarios, accuracy, robustness, and suitability for curved/complex 3D structural geometries over conventional 2D image-based methods, supported by experiments on various concrete structures.

Significance. If the fusion pipeline and measurements are quantitatively validated, the approach could meaningfully advance automated structural health monitoring by enabling scale-accurate 3D crack analysis on non-planar surfaces. The use of foundation-model refinement and multi-modal SLAM is a reasonable direction, but the current absence of supporting metrics prevents assessment of whether the claimed advantages are realized.

major comments (2)
  1. [Abstract] Abstract: The text asserts 'significant improvements' in generalization, accuracy, and robustness as well as 'effectiveness, accuracy, and robustness in real-world applications,' yet supplies no quantitative metrics, error bars, baseline comparisons, dataset details, or statistical tests to substantiate these claims.
  2. [Fusion framework and experimental results] Fusion and measurement sections: The central claim that multi-frame multi-modal fusion 'produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale' without drift or misalignment is load-bearing for the 3D measurement advantage, but the manuscript reports neither registration residuals, ICP error statistics, nor ground-truth comparisons of 3D crack width/length against independent metrology.
minor comments (2)
  1. [Abstract] Abstract: 'existing methods often lacks adaptability' contains a subject-verb agreement error.
  2. [Abstract] Abstract: 'crack geometric attributions' appears to be a typographical error for 'attributes'.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We appreciate the referee's comments highlighting the need for stronger quantitative validation to support the claims in our manuscript. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The text asserts 'significant improvements' in generalization, accuracy, and robustness as well as 'effectiveness, accuracy, and robustness in real-world applications,' yet supplies no quantitative metrics, error bars, baseline comparisons, dataset details, or statistical tests to substantiate these claims.

    Authors: We agree that the abstract would be strengthened by including quantitative evidence. In the revised manuscript, we will update the abstract to reference specific metrics from the experiments (e.g., segmentation accuracy, 3D measurement errors, and baseline comparisons) along with dataset details to substantiate the stated improvements. revision: yes

  2. Referee: [Fusion framework and experimental results] Fusion and measurement sections: The central claim that multi-frame multi-modal fusion 'produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale' without drift or misalignment is load-bearing for the 3D measurement advantage, but the manuscript reports neither registration residuals, ICP error statistics, nor ground-truth comparisons of 3D crack width/length against independent metrology.

    Authors: The current experiments demonstrate the multi-modal fusion through visual results on real structures. We will incorporate available registration residuals and ICP error statistics from the SLAM pipeline into the revised sections. Ground-truth comparisons against independent metrology were not collected in the original study. revision: partial

standing simulated objections not resolved
  • Ground-truth comparisons of 3D crack width/length against independent metrology

Circularity Check

0 steps flagged

No significant circularity; derivation is integration of external models

Full rationale

The paper presents a pipeline integrating standard components (DeepLabv3+, SAM refinement, image/LiDAR SLAM fusion) for 2D segmentation and 3D crack measurement. No equations, fitted parameters, or self-citations are described in the provided text that reduce any claim to its own inputs by construction. Claims rest on the proposed multi-modal fusion producing usable 3D outputs, which is an engineering integration rather than a self-referential derivation. This matches the default expectation of no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, axioms, or invented entities; the method relies on pre-existing models (DeepLabv3+, SAM, SLAM) whose internal assumptions are not detailed here.

pith-pipeline@v0.9.0 · 5835 in / 1113 out tokens · 19828 ms · 2026-05-23T04:48:19.094143+00:00 · methodology

