arxiv: 2512.06171 · v2 · submitted 2025-12-05 · 💻 cs.CV

Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection

Jessica Plassmann, Nicolas Schuler, Michael Schuth, Georg von Freymann

Pith reviewed 2026-05-17 00:15 UTC · model grok-4.3

classification 💻 cs.CV

keywords shearography · automated annotation · weakly supervised learning · defect detection · Grounded DINO · SAM · YOLO · non-destructive testing


The pith

An automated pipeline combines Grounded DINO and SAM to generate annotations from shearographic measurements suitable for weakly supervised defect detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a practical way to label shearographic images without relying on slow, subjective manual work. It applies Grounded DINO to produce initial bounding boxes around likely defects, then uses SAM to refine those into precise masks before exporting ready-to-use YOLO labels. This matters for industrial use because shearography can spot subsurface flaws in critical parts, yet its adoption stalls when datasets remain small. Quantitative checks reported in the paper indicate that the generated boxes work for training detectors under weak supervision, while the masks support visual inspection. The overall result is a scalable route to larger, standardized datasets for more reliable defect detectors.

Core claim

The central claim is that an automated pipeline can generate candidate defect bounding boxes with Grounded DINO, refine them into high-resolution masks with SAM, and export YOLO-format labels from shearographic interferometry data. Quantitative evaluation shows these boxes are suitable for weakly supervised learning while the masks enable qualitative visualization, thereby reducing manual annotation effort and supporting scalable dataset creation for industrial defect detection.
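The export step named in the claim can be illustrated with a minimal sketch. It assumes boxes arrive in pixel coordinates as `(x_min, y_min, x_max, y_max)` and that a single class id `0` stands for "defect"; the function names `to_yolo_line` and `export_labels` are hypothetical and are not taken from the paper. The output follows the common YOLO detection label convention of one line per object with a class id and a normalized center, width, and height.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    # Clamp to the image so that all normalized values stay within [0, 1].
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def export_labels(boxes, img_w, img_h, class_id=0):
    """Return the text of a YOLO .txt label file for one image (assumed single class)."""
    return "\n".join(to_yolo_line(class_id, b, img_w, img_h) for b in boxes)
```

Writing the returned string to a `.txt` file that shares the image's base name is the usual way such labels are consumed by YOLO-style trainers; whether the paper's pipeline does exactly this is an assumption.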

What carries the argument

The automated labeling pipeline that runs Grounded DINO for bounding-box proposals followed by SAM mask refinement on shearographic displacement-gradient images.
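One plausible reading of "mask refinement" is that the mask tightens the initial box proposal. The sketch below shows that idea only: it derives the tightest axis-aligned box from a binary mask. The mask is assumed to be a list of rows of 0/1 values, and `mask_to_box` is a hypothetical helper; the summary above does not state whether the exported boxes come from the Grounded DINO proposals or from the SAM masks.

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all truthy pixels, or None if empty.

    `mask` is a list of rows; x_max and y_max are exclusive pixel indices.
    """
    # Rows that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column index of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols) + 1, rows[-1] + 1)
```

Returning `None` for an empty mask lets a caller fall back to the original proposal or drop the candidate, which is a design choice of this sketch rather than something the paper specifies.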

If this is right

Manual labeling effort for shearographic datasets drops sharply, allowing faster creation of training data.
YOLO-format outputs enable direct training of existing detectors without extra conversion steps.
High-resolution masks from SAM supply detailed visual references alongside the boxes used for supervision.
Standardized automated labels reduce subjectivity and improve repeatability across different inspection sites.
Larger datasets become feasible, which in turn supports more robust models for safety-critical components.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-stage proposal-and-refine pattern could be tested on other non-destructive testing modalities that produce gradient or phase images.
Adding a lightweight fine-tuning step on a small set of shearographic examples might raise label quality without reintroducing heavy manual work.
If the generated labels prove stable, the pipeline could be embedded in portable shearography hardware for on-site dataset growth during routine inspections.

Load-bearing premise

The outputs of Grounded DINO and SAM transfer to shearographic images without domain-specific adaptation or errors large enough to degrade later detector training.

What would settle it

Train a standard object detector on the pipeline's generated labels, then measure whether its defect-detection performance on held-out shearographic test images falls substantially below the performance obtained from an identical detector trained on human-annotated labels.
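A comparison of this kind needs a way to score a detector's output against held-out annotations. The following is a minimal sketch, assuming boxes in `(x_min, y_min, x_max, y_max)` form, a greedy one-to-one matching in order of confidence, and an IoU threshold of 0.5; the threshold and the helper names are illustrative assumptions, not the paper's protocol.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions to ground-truth boxes, highest confidence first.

    predictions: list of (box, confidence); ground_truth: list of boxes.
    Each ground-truth box can be matched at most once.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for box, _ in sorted(predictions, key=lambda p: p[1], reverse=True):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Running the same function on the detector trained with generated labels and on the one trained with human labels, over the same test images, gives the pair of numbers the settling experiment would compare.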

Figures

Figures reproduced from arXiv: 2512.06171 by Georg von Freymann, Jessica Plassmann, Michael Schuth, Nicolas Schuler.

**Figure 2.** ROC and PR curves for all trained YOLOv8 models using bounding boxes.

**Figure 3.** Bounding boxes and confidence scores predicted by Grounded DINO using the ...

**Figure 4.** Validation results of the YOLO segmentation model trained on masks generated ...

Original abstract

Shearography is an interferometric technique sensitive to surface displacement gradients, providing high sensitivity for detecting subsurface defects in safety-critical components. A key limitation to industrial adoption is the lack of high-quality annotated datasets, since manual labeling remains labor-intensive, subjective, and difficult to standardize. We present an automated labeling pipeline that generates candidate defect bounding boxes with Grounded DINO, refines them using SAM masks, and exports YOLO-format labels for downstream detector training. Quantitative evaluation shows the generated boxes are suitable for weakly supervised learning, while high-resolution masks provide qualitative visualization. This approach reduces manual effort and supports scalable dataset creation for robust industrial defect detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies Grounded DINO and SAM to automate shearography annotation for defect detection, which is a useful practical step but rests on an under-supported claim about quantitative suitability.


The main takeaway is that this work chains off-the-shelf models (Grounded DINO for boxes, SAM for masks, then YOLO export) to label shearographic images and support weakly supervised training. It targets the real annotation bottleneck in industrial non-destructive testing of safety-critical parts. The pipeline itself is straightforward and modular, which is the part that could see use in practice for scaling datasets without starting from scratch on every new component. It does a reasonable job laying out why manual labeling is slow and subjective in this modality and how the automated route reduces that effort while still producing usable high-resolution masks for visualization. The approach stays grounded in existing tools rather than claiming new theory, which keeps expectations realistic.

The soft spot is the evaluation. The abstract asserts that quantitative checks confirm the boxes work for weakly supervised learning, yet no metrics, baselines, IoU figures, or comparisons to manual annotations appear in the summary. Shearography images have speckle and gradient features far from the natural-image data these models saw in pretraining, so without reported error rates or failure cases the transfer claim stays unverified. If the full manuscript includes those numbers and shows the downstream detector still performs adequately, the limitation shrinks; otherwise it is the load-bearing gap. Citations to the foundation models look standard and appropriate.

This is for people working on industrial inspection, manufacturing quality control, or weakly supervised vision in specialized imaging modalities. A reader who needs concrete ideas for bootstrapping labels in niche sensors would get value from the pipeline description. It deserves peer review so referees can check the full evaluation section and suggest any needed domain checks.

Referee Report

1 major / 1 minor

Summary. The paper introduces an automated annotation pipeline for shearographic images to address the scarcity of labeled datasets for defect detection. It employs Grounded DINO to propose candidate bounding boxes around defects, refines these using SAM for high-resolution masks, and exports the results in YOLO format to enable weakly supervised training of downstream detectors. The central claim is that quantitative evaluation demonstrates the generated boxes are suitable for this purpose, while the masks offer qualitative visualization benefits.

Significance. If the quantitative suitability claim holds with appropriate metrics, the pipeline could meaningfully lower the barrier to creating large-scale shearography datasets for industrial non-destructive testing, supporting more robust automated defect detection in safety-critical components. The approach leverages existing zero-shot models without custom training, which is a practical strength for rapid deployment, though the lack of domain-specific validation currently limits assessed impact.

major comments (1)

[Abstract] Abstract and evaluation section: The claim that 'quantitative evaluation shows the generated boxes are suitable for weakly supervised learning' is unsupported because no metrics (e.g., IoU, precision-recall, or mAP against manual shearography annotations), baselines, or details on how suitability for downstream YOLO-style detector performance was measured are reported. This is load-bearing for the central contribution, as shearographic images contain speckle and gradient features outside the natural-image pretraining distributions of Grounded DINO and SAM.

minor comments (1)

[Methods] The manuscript would benefit from explicit citations to the original Grounded DINO and SAM papers in the methods description to clarify the off-the-shelf usage.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights the need to strengthen the quantitative support for our central claim. We agree that the current presentation of the evaluation is insufficient to fully substantiate the suitability of the generated annotations for weakly supervised learning, particularly given the domain characteristics of shearographic imagery. We will revise the manuscript accordingly.

Point-by-point responses

Referee: [Abstract] Abstract and evaluation section: The claim that 'quantitative evaluation shows the generated boxes are suitable for weakly supervised learning' is unsupported because no metrics (e.g., IoU, precision-recall, or mAP against manual shearography annotations), baselines, or details on how suitability for downstream YOLO-style detector performance was measured are reported. This is load-bearing for the central contribution, as shearographic images contain speckle and gradient features outside the natural-image pretraining distributions of Grounded DINO and SAM.

Authors: We acknowledge the validity of this observation. The manuscript currently references quantitative evaluation in the abstract and evaluation section but does not report explicit metrics such as IoU or mAP computed against manual annotations on shearographic data, nor does it detail baselines or the precise protocol for assessing downstream YOLO detector performance. The existing quantitative results focus on the consistency of the automated pipeline outputs rather than direct validation against expert labels. In the revised manuscript we will add a dedicated evaluation subsection that includes: (1) IoU statistics between the generated bounding boxes and a set of manually annotated shearographic images, (2) precision-recall curves for the auto-generated labels treated as pseudo-ground truth, and (3) mAP comparison of a YOLO detector trained on the automated annotations versus the same detector trained on the manual annotations. We will also add a short discussion addressing the domain-shift concern, supported by qualitative failure-case analysis and any available proxy metrics from the zero-shot models.

revision: yes
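The precision-recall curves and mAP figures promised in the rebuttal reduce, per class and per IoU threshold, to an average-precision calculation. The sketch below shows one common convention (all-point interpolation of the precision-recall curve); it assumes each detection has already been marked as a true or false positive by some matching step, and the name `average_precision` is a hypothetical helper rather than the authors' evaluation code.

```python
def average_precision(detections, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated objects of that class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += 1 if is_tp else 0
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Replace each precision with the best precision reached at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap
```

Averaging this value over classes (and, in some benchmarks, over several IoU thresholds) yields mAP; with a single "defect" class, mAP at one threshold equals this single value.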

Circularity Check

0 steps flagged

No circularity: pipeline applies external pre-trained models without self-referential definitions or fitted predictions

full rationale

The paper presents an applied pipeline that invokes off-the-shelf Grounded DINO for bounding-box proposals and SAM for mask refinement, then exports labels for a downstream YOLO detector. No equations, parameters, or quantities are defined in terms of later outputs; the central claim that the generated boxes are suitable for weakly supervised learning rests on external model transfer rather than any reduction to the paper's own fitted values or self-citation chain. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the transfer performance of two pre-trained foundation models to shearographic data and on the assumption that automatically generated labels are sufficiently accurate for weakly supervised training; no new free parameters, mathematical axioms, or invented physical entities are introduced.

axioms (1)

domain assumption: Grounded DINO and SAM outputs remain reliable when applied directly to shearographic fringe patterns without domain adaptation.
The pipeline description assumes successful zero-shot or few-shot transfer from general natural-image training to the specialized interferometric domain.


