pith. machine review for the scientific record.

arxiv: 2512.06171 · v2 · submitted 2025-12-05 · 💻 cs.CV

Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection

Pith reviewed 2026-05-17 00:15 UTC · model grok-4.3

classification 💻 cs.CV
keywords shearography · automated annotation · weakly supervised learning · defect detection · Grounded DINO · SAM · YOLO · non-destructive testing

The pith

An automated pipeline combines Grounded DINO and SAM to generate annotations from shearographic measurements suitable for weakly supervised defect detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a practical way to label shearographic images without relying on slow, subjective manual work. It applies Grounded DINO to produce initial bounding boxes around likely defects, then uses SAM to refine those into precise masks before exporting ready-to-use YOLO labels. This matters for industrial use because shearography can reveal subsurface flaws in critical parts, yet its adoption stalls while annotated datasets remain small. According to the paper, quantitative checks show the generated boxes work for training detectors under weak supervision, while the masks support visual inspection. The overall result is a scalable route to larger, standardized datasets for more reliable defect detectors.

Core claim

The central claim is that an automated pipeline can generate candidate defect bounding boxes with Grounded DINO, refine them into high-resolution masks with SAM, and export YOLO-format labels from shearographic interferometry data. Quantitative evaluation shows these boxes are suitable for weakly supervised learning while the masks enable qualitative visualization, thereby reducing manual annotation effort and supporting scalable dataset creation for industrial defect detection.
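The export step named in the claim can be illustrated with a minimal sketch. It assumes the common YOLO detection label layout (one line per object: class index, then box center and size normalized by image width and height) and a single defect class with index 0. The function names and the clipping behavior are illustrative choices, not taken from the paper.

```python
def box_to_yolo_line(box_xyxy, image_width, image_height, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one YOLO label line."""
    x_min, y_min, x_max, y_max = box_xyxy
    # Clip to the image so the normalized values stay within [0, 1].
    x_min = min(max(x_min, 0.0), image_width)
    x_max = min(max(x_max, 0.0), image_width)
    y_min = min(max(y_min, 0.0), image_height)
    y_max = min(max(y_max, 0.0), image_height)
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("box has no area inside the image")
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def boxes_to_yolo_label(boxes_xyxy, image_width, image_height, class_id=0):
    """Join one label line per box; the result is the content of one .txt label file."""
    lines = [
        box_to_yolo_line(box, image_width, image_height, class_id)
        for box in boxes_xyxy
    ]
    return "\n".join(lines)
```

For a 400 x 200 pixel image, `box_to_yolo_line((100, 50, 300, 150), 400, 200)` returns `"0 0.500000 0.500000 0.500000 0.500000"`.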

What carries the argument

The automated labeling pipeline that runs Grounded DINO for bounding-box proposals followed by SAM mask refinement on shearographic displacement-gradient images.
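The propose-then-refine pattern can be sketched without either model. In the sketch below, `segment` is a hypothetical stand-in for a SAM call that takes a box prompt and returns a binary mask, and `min_score` is an arbitrary illustrative threshold. Deriving a tightened box from the mask is an assumption made here for illustration; the source does not say whether the exported boxes come from the detector or from the masks.

```python
def mask_to_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around truthy pixels, or None.

    `mask` is a list of rows; x_max and y_max are exclusive pixel edges.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)


def refine_proposals(proposals, segment, min_score=0.3):
    """Filter (box, score) proposals by score, then attach a mask and a tight box.

    `segment` is a hypothetical callable: box -> binary mask (list of rows).
    """
    refined = []
    for box, score in proposals:
        if score < min_score:
            continue  # drop low-confidence proposals before segmentation
        mask = segment(box)
        tight_box = mask_to_box(mask)
        if tight_box is not None:  # skip proposals whose mask is empty
            refined.append((tight_box, score, mask))
    return refined
```

Keeping the segmenter behind a plain callable makes the control flow testable with a fake mask generator before any model weights are involved.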

If this is right

  • Manual labeling effort for shearographic datasets drops sharply, allowing faster creation of training data.
  • YOLO-format outputs enable direct training of existing detectors without extra conversion steps.
  • High-resolution masks from SAM supply detailed visual references alongside the boxes used for supervision.
  • Standardized automated labels reduce subjectivity and improve repeatability across different inspection sites.
  • Larger datasets become feasible, which in turn supports more robust models for safety-critical components.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-stage proposal-and-refine pattern could be tested on other non-destructive testing modalities that produce gradient or phase images.
  • Adding a lightweight fine-tuning step on a small set of shearographic examples might raise label quality without reintroducing heavy manual work.
  • If the generated labels prove stable, the pipeline could be embedded in portable shearography hardware for on-site dataset growth during routine inspections.

Load-bearing premise

The outputs of Grounded DINO and SAM transfer to shearographic images without domain-specific adaptation or errors large enough to degrade later detector training.

What would settle it

Train a standard object detector on the pipeline's generated labels, then measure whether its defect-detection performance on held-out shearographic test images falls substantially below the performance obtained from an identical detector trained on human-annotated labels.
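One way to quantify "falls substantially below" is to compute average precision (AP) for both detectors on the same held-out set and report the relative drop. The sketch below assumes each detection has already been marked as a true or false positive against the human test labels; it uses all-point interpolated AP, which is one common convention and not necessarily the metric the paper reports. What counts as a substantial drop is left to the reader.

```python
def average_precision(detections, num_ground_truth):
    """All-point interpolated AP from (score, is_true_positive) pairs."""
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth defect")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Interpolate: precision at each rank becomes the best precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over recall increments.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def relative_drop(score_manual, score_auto):
    """Fraction of the manual-label score lost when training on automated labels."""
    return (score_manual - score_auto) / score_manual
```

A negative relative drop would mean the detector trained on automated labels scored higher than the one trained on human labels.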

Figures

Figures reproduced from arXiv: 2512.06171 by Georg von Freymann, Jessica Plassmann, Michael Schuth, Nicolas Schuler.

Figure 1: Overview of the zero-shot annotation pipeline. Bounding boxes are generated …
Figure 2: ROC and PR curves for all trained YOLOv8 models using bounding boxes.
Figure 3: Bounding boxes and confidence scores predicted by Grounded DINO using the …
Figure 4: Validation results of the YOLO segmentation model trained on masks generated …
Original abstract

Shearography is an interferometric technique sensitive to surface displacement gradients, providing high sensitivity for detecting subsurface defects in safety-critical components. A key limitation to industrial adoption is the lack of high-quality annotated datasets, since manual labeling remains labor-intensive, subjective, and difficult to standardize. We present an automated labeling pipeline that generates candidate defect bounding boxes with Grounded DINO, refines them using SAM masks, and exports YOLO-format labels for downstream detector training. Quantitative evaluation shows the generated boxes are suitable for weakly supervised learning, while high-resolution masks provide qualitative visualization. This approach reduces manual effort and supports scalable dataset creation for robust industrial defect detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces an automated annotation pipeline for shearographic images to address the scarcity of labeled datasets for defect detection. It employs Grounded DINO to propose candidate bounding boxes around defects, refines these using SAM for high-resolution masks, and exports the results in YOLO format to enable weakly supervised training of downstream detectors. The central claim is that quantitative evaluation demonstrates the generated boxes are suitable for this purpose, while the masks offer qualitative visualization benefits.

Significance. If the quantitative suitability claim holds with appropriate metrics, the pipeline could meaningfully lower the barrier to creating large-scale shearography datasets for industrial non-destructive testing, supporting more robust automated defect detection in safety-critical components. The approach leverages existing zero-shot models without custom training, which is a practical strength for rapid deployment, though the lack of domain-specific validation currently limits assessed impact.

major comments (1)
  1. [Abstract] Abstract and evaluation section: The claim that 'quantitative evaluation shows the generated boxes are suitable for weakly supervised learning' is unsupported because no metrics (e.g., IoU, precision-recall, or mAP against manual shearography annotations), baselines, or details on how suitability for downstream YOLO-style detector performance was measured are reported. This is load-bearing for the central contribution, as shearographic images contain speckle and gradient features outside the natural-image pretraining distributions of Grounded DINO and SAM.
minor comments (1)
  1. [Methods] The manuscript would benefit from explicit citations to the original Grounded DINO and SAM papers in the methods description to clarify the off-the-shelf usage.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights the need to strengthen the quantitative support for our central claim. We agree that the current presentation of the evaluation is insufficient to fully substantiate the suitability of the generated annotations for weakly supervised learning, particularly given the domain characteristics of shearographic imagery. We will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract and evaluation section: The claim that 'quantitative evaluation shows the generated boxes are suitable for weakly supervised learning' is unsupported because no metrics (e.g., IoU, precision-recall, or mAP against manual shearography annotations), baselines, or details on how suitability for downstream YOLO-style detector performance was measured are reported. This is load-bearing for the central contribution, as shearographic images contain speckle and gradient features outside the natural-image pretraining distributions of Grounded DINO and SAM.

    Authors: We acknowledge the validity of this observation. The manuscript currently references quantitative evaluation in the abstract and evaluation section but does not report explicit metrics such as IoU or mAP computed against manual annotations on shearographic data, nor does it detail baselines or the precise protocol for assessing downstream YOLO detector performance. The existing quantitative results focus on the consistency of the automated pipeline outputs rather than direct validation against expert labels. In the revised manuscript we will add a dedicated evaluation subsection that includes: (1) IoU statistics between the generated bounding boxes and a set of manually annotated shearographic images, (2) precision-recall curves for the auto-generated labels treated as pseudo-ground truth, and (3) mAP comparison of a YOLO detector trained on the automated annotations versus the same detector trained on the manual annotations. We will also add a short discussion addressing the domain-shift concern, supported by qualitative failure-case analysis and any available proxy metrics from the zero-shot models.

    Revision: yes
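The IoU statistics the response describes could be computed along the following lines. This is a sketch under stated assumptions: boxes are pixel-space `(x_min, y_min, x_max, y_max)` tuples, matching is greedy and one-to-one, and the 0.5 threshold is a conventional default, not a value taken from the manuscript.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def match_boxes(generated, manual, iou_threshold=0.5):
    """Greedy one-to-one matching of generated boxes to manual boxes.

    Returns (matched_ious, precision, recall) for a single image.
    """
    # Consider the highest-overlap pairs first.
    candidates = sorted(
        (
            (iou(g, m), gi, mi)
            for gi, g in enumerate(generated)
            for mi, m in enumerate(manual)
        ),
        reverse=True,
    )
    used_generated, used_manual, matched_ious = set(), set(), []
    for value, gi, mi in candidates:
        if value < iou_threshold:
            break  # remaining pairs overlap even less
        if gi in used_generated or mi in used_manual:
            continue
        used_generated.add(gi)
        used_manual.add(mi)
        matched_ious.append(value)
    precision = len(matched_ious) / len(generated) if generated else 0.0
    recall = len(matched_ious) / len(manual) if manual else 0.0
    return matched_ious, precision, recall
```

Aggregating `matched_ious` across images gives the mean or median IoU, while the per-image precision and recall indicate how many generated boxes are spurious and how many manual defects are missed.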

Circularity Check

0 steps flagged

No circularity: pipeline applies external pre-trained models without self-referential definitions or fitted predictions

full rationale

The paper presents an applied pipeline that invokes off-the-shelf Grounded DINO for bounding-box proposals and SAM for mask refinement, then exports labels for a downstream YOLO detector. No equations, parameters, or quantities are defined in terms of later outputs; the central claim that the generated boxes are suitable for weakly supervised learning rests on external model transfer rather than any reduction to the paper's own fitted values or self-citation chain. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the transfer performance of two pre-trained foundation models to shearographic data and on the assumption that automatically generated labels are sufficiently accurate for weakly supervised training; no new free parameters, mathematical axioms, or invented physical entities are introduced.

axioms (1)
  • domain assumption: Grounded DINO and SAM outputs remain reliable when applied directly to shearographic fringe patterns without domain adaptation
    The pipeline description assumes successful zero-shot or few-shot transfer from general natural-image training to the specialized interferometric domain.

pith-pipeline@v0.9.0 · 5408 in / 1289 out tokens · 66712 ms · 2026-05-17T00:15:47.381600+00:00 · methodology

