3D Modeling and Automated Measurement of Concrete Cracks via Segment Anything Refinement and Visual Inertial LiDAR Fusion

Chun Li; Jiapeng Yao; Pengru Deng; Su Wang; Varun Ojha; Xinrun Li; Xuhui He

arxiv: 2501.09203 · v2 · pith:3PX5CQPDnew · submitted 2025-01-15 · 💻 cs.CV · cs.RO


Pengru Deng, Jiapeng Yao, Chun Li, Su Wang, Xinrun Li, Varun Ojha, Xuhui He

Pith reviewed 2026-05-23 04:48 UTC · model grok-4.3


keywords: concrete crack detection · 3D reconstruction · crack measurement · Segment Anything Model · LiDAR fusion · visual inertial SLAM · point cloud · structural inspection


The pith

A fusion of SAM-refined image segmentation and visual-inertial LiDAR SLAM produces direct 3D measurements of concrete cracks on curved surfaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that starts with a DeepLabv3+ model refined by the Segment Anything Model to generate precise 2D crack masks that generalize to new scenarios. These masks are then combined with image and LiDAR data through a multi-frame multi-modal SLAM process to create dense, colorized 3D point clouds that retain crack information at real-world scale. Crack geometric attributes are measured automatically inside this 3D space instead of relying on 2D images. The approach targets the limitation that conventional methods cannot handle cracks on curved or complex three-dimensional structural elements.
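The abstract does not say how SAM refines the DeepLabv3+ output. One plausible scheme, sketched here purely as an assumption, is to turn the coarse DeepLabv3+ mask into a bounding-box prompt for SAM and then keep whichever SAM candidate mask agrees best (by intersection-over-union) with the coarse mask. The helper names below are hypothetical and the SAM call itself is omitted; masks are plain nested lists of 0/1 values so the sketch runs without any deep learning library.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask: rows of 0/1 pixel values


def mask_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the foreground pixels, or None if empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    pairs = [(va, vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    inter = sum(1 for va, vb in pairs if va and vb)
    union = sum(1 for va, vb in pairs if va or vb)
    return inter / union if union else 0.0


def select_refined(coarse: Mask, candidates: List[Mask]) -> Mask:
    """Hypothetical refinement rule: keep the SAM candidate closest to the coarse mask."""
    return max(candidates, key=lambda m: iou(coarse, m))


# Usage with toy masks standing in for DeepLabv3+ and SAM outputs.
coarse = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
cand_a = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0]]
cand_b = [[1, 0, 0, 0],
          [1, 0, 0, 0],
          [1, 0, 0, 0]]
box_prompt = mask_bbox(coarse)  # box prompt that would be passed to SAM
refined = select_refined(coarse, [cand_a, cand_b])
```

Selecting by agreement with the coarse mask keeps the trained crack detector in charge of what counts as a crack, while SAM contributes sharper boundaries.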

Core claim

The central claim is that integrating SAM-refined segmentation masks with LiDAR point clouds via image- and LiDAR-SLAM produces dense point clouds in which crack geometric attributes can be measured automatically and directly at real-world scale, making the method suitable for structural components with curved and complex 3D geometries.
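The abstract does not define how crack width is computed inside the point cloud. One simple choice is a shortest-distance rule: for each point on one crack edge, take the 3D distance to the nearest point on the opposite edge. The sketch assumes both edges have already been extracted from the crack-labelled points as (x, y, z) coordinates in metres; the function name is hypothetical and this is not presented as the paper's method. Because the distances are taken in 3D, the result does not depend on the viewing angle of any single image.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def edge_widths(edge_a: Sequence[Point3], edge_b: Sequence[Point3]) -> List[float]:
    """For each point on edge A, the 3D distance to the nearest point on edge B."""
    if not edge_a or not edge_b:
        raise ValueError("both crack edges need at least one point")
    return [min(math.dist(p, q) for q in edge_b) for p in edge_a]


# Toy edges in metres: a crack 1-3 mm wide running along the x axis.
edge_a = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.0)]
edge_b = [(0.0, 0.002, 0.0), (0.01, 0.003, 0.0), (0.02, 0.001, 0.0)]
widths = edge_widths(edge_a, edge_b)
mean_width = sum(widths) / len(widths)
max_width = max(widths)
```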

What carries the argument

The multi-frame and multi-modal fusion framework that combines SAM-refined segmentation masks with visual-inertial LiDAR SLAM to generate dense colorized point clouds preserving crack semantics.
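The abstract describes fusing segmentation masks with LiDAR points but not the exact transfer step. A standard way to carry 2D labels into 3D, sketched here under stated assumptions, is to transform each LiDAR point into the camera frame with an extrinsic rotation R and translation t, project it through a pinhole model with intrinsics fx, fy, cx, cy, and read the crack mask at the resulting pixel. Lens distortion, occlusion handling and time synchronization are ignored, and the function names are hypothetical. The same pixel lookup could fetch the RGB value at (u, v) to colorize the point.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def to_camera(p: Vec3, R: Mat3, t: Vec3) -> List[float]:
    """Rigid transform of a LiDAR point into the camera frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def label_points(points: Sequence[Vec3], R: Mat3, t: Vec3,
                 fx: float, fy: float, cx: float, cy: float,
                 mask: Sequence[Sequence[int]]) -> List[bool]:
    """Mark each LiDAR point as crack (True) if it projects onto a crack pixel."""
    h, w = len(mask), len(mask[0])
    labels = []
    for p in points:
        x, y, z = to_camera(p, R, t)
        if z <= 0:  # point is behind the camera, so it cannot be seen
            labels.append(False)
            continue
        u = round(fx * x / z + cx)  # column index
        v = round(fy * y / z + cy)  # row index
        labels.append(0 <= u < w and 0 <= v < h and bool(mask[v][u]))
    return labels


# Usage: identity extrinsics and a 5x5 mask with one crack pixel at the centre.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
pts = [(0.0, 0.0, 1.0), (0.01, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 1.0)]
labels = label_points(pts, I3, (0.0, 0.0, 0.0), 100.0, 100.0, 2.0, 2.0, mask)
```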

If this is right

Crack geometric attributes become measurable automatically and directly inside dense 3D point cloud space.
The method removes the projection errors inherent in conventional 2D image-based crack measurements.
Measurements remain feasible on structural components that have curved and complex 3D geometries.
The same pipeline yields improved robustness and generalization across diverse concrete inspection scenarios.
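The projection-error point can be made concrete with a toy geometry (an illustrative assumption, not data from the paper): a crack that wraps a quarter of the way around a cylindrical column of radius 0.3 m. Measured as a polyline in 3D, its length approaches the arc length 0.3 × π/2 ≈ 0.471 m; measured in a single orthographic view looking along the y axis, the same crack appears only 0.3 m long, an underestimate of roughly 36%.

```python
import math
from typing import List, Sequence, Tuple


def polyline_length(points: Sequence[Sequence[float]]) -> float:
    """Sum of Euclidean segment lengths along an ordered crack skeleton."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def quarter_arc(radius: float, n: int = 100) -> List[Tuple[float, float, float]]:
    """Points on a horizontal quarter circle around a vertical (z) column axis."""
    return [(radius * math.cos(k * math.pi / (2 * n)),
             radius * math.sin(k * math.pi / (2 * n)),
             0.0) for k in range(n + 1)]


crack_3d = quarter_arc(0.3)
# Orthographic view along the y axis: drop the depth coordinate y.
crack_2d = [(x, z) for x, _, z in crack_3d]

length_3d = polyline_length(crack_3d)  # close to 0.3 * pi / 2
length_2d = polyline_length(crack_2d)  # close to 0.3
```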

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Repeated scans of the same structure could support 3D tracking of crack growth over time.
The dense point clouds could serve as input for finite-element models that simulate structural response to crack patterns.
Robotic platforms equipped with similar sensors might use the method for autonomous inspection routes in tunnels or bridges.

Load-bearing premise

The multi-frame fusion of image segmentation masks with LiDAR point clouds produces sufficiently accurate and dense 3D reconstructions that preserve crack semantics at real-world scale without significant drift or misalignment.
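This premise concerns how per-frame labels are merged once SLAM supplies a pose for every frame. One simple fusion rule, assumed here for illustration rather than taken from the paper, is to map each frame's labelled points into the world frame with its pose (R, t), bucket them into voxels, and decide each voxel's crack label by majority vote. Pose drift would show up as the same physical crack landing in different voxels across frames, which is the misalignment the premise rules out. The voxel size and function name are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (R, t): sensor -> world
Frame = Tuple[Pose, Sequence[Sequence[float]], Sequence[bool]]  # pose, points, labels
Voxel = Tuple[int, int, int]


def fuse_frames(frames: Iterable[Frame], voxel: float = 0.01) -> Dict[Voxel, bool]:
    """Majority-vote crack label per voxel over all frames (ties count as non-crack)."""
    votes: Dict[Voxel, List[int]] = defaultdict(lambda: [0, 0])  # [non-crack, crack]
    for (R, t), points, labels in frames:
        for p, is_crack in zip(points, labels):
            world = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            key = tuple(math.floor(c / voxel) for c in world)
            votes[key][1 if is_crack else 0] += 1
    return {key: crack > non_crack for key, (non_crack, crack) in votes.items()}


# Usage: three frames observing the same small surface patch near the origin.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = [
    ((I3, (0.0, 0.0, 0.0)), [(0.001, 0.001, 0.001), (0.5, 0.5, 0.5)], [True, False]),
    ((I3, (0.1, 0.0, 0.0)), [(-0.095, 0.002, 0.002)], [True]),
    ((I3, (0.0, 0.0, 0.0)), [(0.003, 0.003, 0.003)], [False]),
]
fused = fuse_frames(frames)
```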

What would settle it

A side-by-side comparison of crack width and length values extracted from the generated 3D point cloud against physical measurements on a curved concrete specimen would falsify the claim if the 3D values deviate beyond the reported accuracy tolerance.
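The test described above reduces to a paired comparison. A minimal sketch, assuming crack widths or lengths have been measured both in the point cloud and physically (for example with a crack-width gauge) at the same locations; the tolerance is a placeholder because the abstract reports none, and the numbers in the usage lines are toy values, not results from the paper.

```python
from typing import List, Sequence, Tuple


def check_measurements(measured: Sequence[float], reference: Sequence[float],
                       rel_tol: float) -> Tuple[bool, List[float]]:
    """Relative error of each 3D measurement against its physical reference value."""
    if len(measured) != len(reference):
        raise ValueError("measurements must be paired")
    if any(r <= 0 for r in reference):
        raise ValueError("reference values must be positive")
    errors = [abs(m - r) / r for m, r in zip(measured, reference)]
    return all(e <= rel_tol for e in errors), errors


# Toy crack widths in millimetres (not results from the paper).
ok, errs = check_measurements([1.02, 0.49], [1.00, 0.50], rel_tol=0.05)
strict_ok, _ = check_measurements([1.02, 0.49], [1.00, 0.50], rel_tol=0.01)
```

The claim would be falsified on a given specimen when the check fails at the tolerance the authors themselves report.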

Original abstract

Visual-Spatial Systems has become increasingly essential in concrete crack inspection. However, existing methods often lacks adaptability to diverse scenarios, exhibits limited robustness in image-based approaches, and struggles with curved or complex geometries. To address these limitations, an innovative framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and 3D automatic crack measurement was proposed by integrating computer vision technologies and multi-modal Simultaneous localization and mapping (SLAM) in this study. Firstly, building on a base DeepLabv3+ segmentation model, and incorporating specific refinements utilizing foundation model Segment Anything Model (SAM), we developed a crack segmentation method with strong generalization across unfamiliar scenarios, enabling the generation of precise 2D crack masks. To enhance the accuracy and robustness of 3D reconstruction, Light Detection and Ranging (LiDAR) point clouds were utilized together with image data and segmentation masks. By leveraging both image- and LiDAR-SLAM, we developed a multi-frame and multi-modal fusion framework that produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale. Furthermore, the crack geometric attributions were measured automatically and directly within 3D dense point cloud space, surpassing the limitations of conventional 2D image-based measurements. This advancement makes the method suitable for structural components with curved and complex 3D geometries. Experimental results across various concrete structures highlight the significant improvements and unique advantages of the proposed method, demonstrating its effectiveness, accuracy, and robustness in real-world applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a practical pipeline fusing SAM-refined segmentation with multi-modal LiDAR SLAM for 3D crack measurement on complex surfaces, but the abstract supplies no numbers to back its accuracy claims.


The core idea is a workflow that starts with DeepLabv3+ plus SAM for 2D crack masks, then projects those masks into dense point clouds via image- and LiDAR-SLAM fusion, and finally measures crack geometry directly in 3D space. This targets a real gap: standard 2D methods break down on curved or irregular concrete elements, and the authors show how the multi-frame fusion can keep crack semantics in the point cloud at real-world scale. That end-to-end framing for infrastructure inspection is the part that feels fresh as an applied combination rather than a new algorithm.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a framework for concrete crack inspection integrating a DeepLabv3+ model refined by the Segment Anything Model (SAM) for 2D crack segmentation, multi-modal LiDAR-image SLAM for dense 3D point cloud reconstruction that preserves crack semantics, and direct automatic measurement of crack geometric attributes within the 3D point cloud space. It claims superior generalization across scenarios, accuracy, robustness, and suitability for curved/complex 3D structural geometries over conventional 2D image-based methods, supported by experiments on various concrete structures.

Significance. If the fusion pipeline and measurements are quantitatively validated, the approach could meaningfully advance automated structural health monitoring by enabling scale-accurate 3D crack analysis on non-planar surfaces. The use of foundation-model refinement and multi-modal SLAM is a reasonable direction, but the current absence of supporting metrics prevents assessment of whether the claimed advantages are realized.

major comments (2)

[Abstract] Abstract: The text asserts 'significant improvements' in generalization, accuracy, and robustness as well as 'effectiveness, accuracy, and robustness in real-world applications,' yet supplies no quantitative metrics, error bars, baseline comparisons, dataset details, or statistical tests to substantiate these claims.
[Fusion framework and experimental results] Fusion and measurement sections: The central claim that multi-frame multi-modal fusion 'produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale' without drift or misalignment is load-bearing for the 3D measurement advantage, but the manuscript reports neither registration residuals, ICP error statistics, nor ground-truth comparisons of 3D crack width/length against independent metrology.
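The registration statistics requested above are straightforward to compute once two overlapping scans have been aligned. The sketch reports the root-mean-square nearest-neighbour distance over inlier points together with the inlier fraction, in the spirit of the fitness and inlier-RMSE figures that ICP implementations such as Open3D report. It uses a brute-force neighbour search for clarity (a k-d tree would be used at scale), and the function name and distance threshold are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def registration_stats(source: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]],
                       max_dist: Optional[float] = None) -> Tuple[float, float]:
    """Return (inlier RMSE, inlier fraction) of source-to-target nearest-neighbour residuals."""
    if not source or not target:
        raise ValueError("both clouds must be non-empty")
    sq = []
    for p in source:
        d = min(math.dist(p, q) for q in target)  # brute-force nearest neighbour
        if max_dist is None or d <= max_dist:
            sq.append(d * d)
    rmse = math.sqrt(sum(sq) / len(sq)) if sq else 0.0
    return rmse, len(sq) / len(source)


# Two aligned toy scans in metres; the third source point has no counterpart.
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
tgt = [(0.0, 0.0, 0.003), (1.0, 0.0, 0.004)]
rmse, fitness = registration_stats(src, tgt, max_dist=0.01)
```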

minor comments (2)

[Abstract] Abstract: 'existing methods often lacks adaptability' contains a subject-verb agreement error.
[Abstract] Abstract: 'crack geometric attributions' appears to be a typographical error for 'attributes'.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We appreciate the referee's comments highlighting the need for stronger quantitative validation to support the claims in our manuscript. We address each major comment below and indicate planned revisions.


Referee: [Abstract] Abstract: The text asserts 'significant improvements' in generalization, accuracy, and robustness as well as 'effectiveness, accuracy, and robustness in real-world applications,' yet supplies no quantitative metrics, error bars, baseline comparisons, dataset details, or statistical tests to substantiate these claims.

Authors: We agree that the abstract would be strengthened by including quantitative evidence. In the revised manuscript, we will update the abstract to reference specific metrics from the experiments (e.g., segmentation accuracy, 3D measurement errors, and baseline comparisons) along with dataset details to substantiate the stated improvements. revision: yes
Referee: [Fusion framework and experimental results] Fusion and measurement sections: The central claim that multi-frame multi-modal fusion 'produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale' without drift or misalignment is load-bearing for the 3D measurement advantage, but the manuscript reports neither registration residuals, ICP error statistics, nor ground-truth comparisons of 3D crack width/length against independent metrology.

Authors: The current experiments demonstrate the multi-modal fusion through visual results on real structures. We will incorporate available registration residuals and ICP error statistics from the SLAM pipeline into the revised sections. Ground-truth comparisons against independent metrology were not collected in the original study. revision: partial

Standing simulated objections (not resolved)

Ground-truth comparisons of 3D crack width/length against independent metrology

Circularity Check

0 steps flagged

No significant circularity; derivation is integration of external models


The paper presents a pipeline integrating standard components (DeepLabv3+, SAM refinement, image/LiDAR SLAM fusion) for 2D segmentation and 3D crack measurement. No equations, fitted parameters, or self-citations are described in the provided text that reduce any claim to its own inputs by construction. Claims rest on the proposed multi-modal fusion producing usable 3D outputs, which is an engineering integration rather than a self-referential derivation. This matches the default expectation of no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no explicit free parameters, axioms, or invented entities; the method relies on pre-existing models (DeepLabv3+, SAM, SLAM) whose internal assumptions are not detailed here.

pith-pipeline@v0.9.0 · 5835 in / 1113 out tokens · 19828 ms · 2026-05-23T04:48:19.094143+00:00 · methodology

