pith.

arxiv: 2605.26533 · v1 · pith:USCW3EXYnew · submitted 2026-05-26 · 💻 cs.CV · cs.AI · cs.CL · cs.LG

A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection

Pith reviewed 2026-06-29 18:47 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.CL · cs.LG
keywords industrial inspection · defect detection · report generation · vision-language model · QLoRA · wind turbine · hybrid architecture · structured JSON output

The pith

A three-component pipeline built around an adapted 1.5B model generates higher-quality defect reports than a 671B-parameter generalist model given the same detection evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a decoupled system for wind turbine blade inspection that separates defect localization from report generation. A YOLO detector finds oriented bounding boxes at the dataset's native resolution. A parameter-free Bridge turns those boxes into grid-referenced tokens inside a structured prompt. A QLoRA-adapted 1.5B model then produces a JSON maintenance report, with retrieval-augmented fine-tuning (RAFT) grounding its recommendations in indexed maintenance procedures. The full pipeline scores BLEU-4 0.41, a hallucination rate of 4 percent, and an expert score of 8.6 out of 10, versus 0.07, 65 percent, and 3.3/10 for a zero-shot VLM baseline. The same small model also outperforms a 671B-parameter generalist model when both receive identical detection evidence, while running at 47 tokens per second on a single T4 GPU.
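BLEU-4 measures clipped n-gram overlap between a generated report and a reference report. The page does not say which implementation, tokenizer, or smoothing the paper used. The following is therefore only the textbook unsmoothed sentence-level form: modified 1- to 4-gram precisions, their geometric mean, and a brevity penalty. It assumes whitespace tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, reference: str) -> float:
    """Unsmoothed sentence-level BLEU-4 against a single reference.

    Assumes whitespace tokenization; real evaluations usually rely on a
    library with its own tokenizer and smoothing options.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)
```

Corpus-level BLEU pools counts over all reports before taking the geometric mean, so it can differ noticeably from an average of sentence-level scores. The reported 0.41 should not be compared with this function's output without knowing which variant the paper used.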

Core claim

The decoupled Eyes-Bridge-Brain pipeline, with a 4-bit quantized Qwen-2.5-1.5B model adapted via QLoRA on 947 synthetic reports and further trained with RAFT for procedure grounding, produces structured JSON reports that achieve BLEU-4 of 0.41, a hallucination rate of 4 percent, and an expert score of 8.6/10, exceeding both zero-shot VLM baselines and a 671B generalist model given the same detection input.
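This page reports a hallucination rate but does not define it. One plausible operationalization, assumed here purely for illustration, flags a report whose JSON names any (defect type, location) pair that the detector never produced. The schema is invented for this sketch: a defects list whose entries carry type and location fields, with grid-cell labels such as the hypothetical R1C3. Counting malformed JSON as a failure is a further assumption.

```python
import json


def is_hallucinated(report_json: str, evidence: set[tuple[str, str]]) -> bool:
    """True if the report cites a (type, location) pair absent from the
    detector evidence. The JSON schema here is hypothetical."""
    try:
        report = json.loads(report_json)
        claimed = {(d["type"], d["location"]) for d in report["defects"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return True  # assumption: unparseable or off-schema output counts as a failure
    # Grounded only if every claimed defect appears in the evidence set.
    return not claimed <= evidence


def hallucination_rate(reports, evidences) -> float:
    """Fraction of reports flagged by is_hallucinated."""
    flags = [is_hallucinated(r, e) for r, e in zip(reports, evidences)]
    return sum(flags) / len(flags) if flags else 0.0
```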

What carries the argument

The three-part pipeline: YOLO26-x-obb detector for oriented bounding boxes, deterministic Bridge module that encodes boxes into grid-referenced spatial tokens, and QLoRA-adapted 1.5B LLM that converts the prompt into a structured JSON report.
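The Bridge is characterized only as deterministic and parameter-free. The sketch below shows what such an encoder could look like under assumptions not taken from the paper:

  • a 3x3 grid over the image;
  • hypothetical cell tokens of the form R<row>C<col>;
  • oriented boxes given as center, width, height, and rotation angle, a common OBB parameterization.

Every name below (OrientedBox, grid_token, build_prompt) is illustrative.

```python
from dataclasses import dataclass


@dataclass
class OrientedBox:
    """Oriented detection: center (cx, cy) and size (w, h) in pixels, angle in radians."""
    cls: str
    conf: float
    cx: float
    cy: float
    w: float
    h: float
    angle: float


def grid_token(box: OrientedBox, img_w: int, img_h: int, rows: int = 3, cols: int = 3) -> str:
    """Map the box center to a hypothetical 'R<row>C<col>' cell token (1-indexed)."""
    col = min(int(box.cx / img_w * cols), cols - 1)  # clamp centers on the right edge
    row = min(int(box.cy / img_h * rows), rows - 1)  # clamp centers on the bottom edge
    return f"R{row + 1}C{col + 1}"


def build_prompt(boxes, img_w: int, img_h: int) -> str:
    """Deterministically serialize detections into a structured prompt (no learned weights)."""
    lines = ["Detected defects:"]
    # Fixed sort order so identical detections always produce identical prompt text.
    for b in sorted(boxes, key=lambda d: (-d.conf, d.cls)):
        # w * h is the rectangle's area regardless of its rotation angle.
        area_pct = 100 * b.w * b.h / (img_w * img_h)
        lines.append(
            f"- {b.cls} | cell {grid_token(b, img_w, img_h)} | conf {b.conf:.2f} | area {area_pct:.1f}%"
        )
    lines.append("Return a JSON maintenance report.")
    return "\n".join(lines)
```

Because the encoding is a pure function of the detections, the same image always yields the same prompt. That property makes a parameter-free bridge easy to audit.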

If this is right

  • The complete pipeline runs at 47 tokens per second on a single T4-class GPU, enabling edge deployment.
  • Ablation results show that removing any one component increases hallucination rate and lowers expert scores.
  • The 1.5B QLoRA model produces higher-quality reports than the 671B generalist model when both receive identical detection evidence.
  • Retrieval-augmented fine-tuning grounds recommendations in indexed maintenance procedures.
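The page gives no recipe for the RAFT step. In the commonly described RAFT setup, each fine-tuning example pairs the input with the relevant ("oracle") document plus distractor documents. The oracle is deliberately left out of a fraction of examples, so the model learns to use it when present and not to trust every retrieved passage.

The sketch below assembles such examples from a hypothetical procedure index. Several details are assumptions: the keep probability p_oracle, the number of distractors, the procedure IDs, and that the target report cites the oracle procedure.

```python
import random


def make_raft_example(prompt, target_report, oracle_id, index,
                      k_distractors=2, p_oracle=0.8, rng=None):
    """Build one RAFT-style training example.

    index maps procedure IDs to procedure text (hypothetical). With
    probability p_oracle the oracle procedure joins the distractors in
    the context; otherwise only distractors are shown.
    """
    rng = rng or random.Random(0)
    others = [pid for pid in index if pid != oracle_id]
    context_ids = rng.sample(others, k=min(k_distractors, len(others)))
    if rng.random() < p_oracle:
        context_ids.append(oracle_id)
    rng.shuffle(context_ids)  # the oracle's position should carry no signal
    context = "\n".join(f"[{pid}] {index[pid]}" for pid in context_ids)
    return {
        "input": f"{context}\n\n{prompt}",
        "output": target_report,
        "oracle_included": oracle_id in context_ids,
    }
```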

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The deterministic Bridge encoding of spatial tokens could be added to other vision-language models to reduce spatial hallucinations without retraining the full model.
  • If synthetic report generation can be scaled to new industrial domains, the same small-model adaptation approach may apply beyond wind turbines.
  • The performance gap between the adapted 1.5B model and the 671B baseline suggests that task-specific structure and domain data matter more than raw parameter count for structured output tasks.

Load-bearing premise

The 947 synthetically generated maintenance reports represent real-world scenarios and the LLM-as-a-Judge scores align with actual expert judgment.

What would settle it

Run the pipeline on a held-out set of real expert-written maintenance reports from actual wind turbine inspections and measure whether BLEU-4, hallucination rate, and expert scores remain higher than the large VLM baseline.

Original abstract

Automated industrial inspection requires both precise defect localization and structured maintenance report generation; in current practice these tasks are handled separately, with linguistic interpretation left to human experts. This paper describes a decoupled, edge-deployable pipeline for wind turbine blade inspection built from three components that each handle a distinct sub-task. The Eyes, a YOLO26-x-obb oriented bounding-box detector, localizes defects at dataset-native resolution. The Bridge, a deterministic, parameter-free encoding module, maps each detected bounding box to grid-referenced spatial tokens embedded in a structured prompt. The Brain, a 4-bit quantized Qwen-2.5-1.5B model adapted with Quantized Low-Rank Adaptation (QLoRA) on 947 synthetically generated maintenance reports, generates a structured JSON report from that prompt. Retrieval-Augmented Fine-Tuning (RAFT) further grounds each recommendation in indexed maintenance procedures. Five ablation experiments, scored by BLEU-4, ROUGE-L, Hallucination Rate (HR), and an LLM-as-a-Judge rubric, compare the pipeline against a monolithic vision-language model (VLM) baseline and against partial configurations in which one component is removed. The complete system achieves BLEU-4 = 0.41, HR = 4%, and Expert Score = 8.6/10, compared with 0.07, 65%, and 3.3/10 for the zero-shot VLM baseline. The QLoRA-adapted 1.5B model generates higher-quality reports than a 671B-parameter generalist API model given identical detection evidence, at 47 tokens per second on a single T4-class GPU. The results show that a purpose-built decoupled architecture with a small domain-specific training corpus outperforms a generalist end-to-end model on this structured generation task.
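QLoRA, named in the abstract, trains a low-rank update on top of a frozen, 4-bit-quantized base model. The standard LoRA formulation is shown here for orientation only; this page does not give the paper's rank r, scaling alpha, or target layers. A frozen weight matrix W_0 of shape d x k is augmented as follows, and only A and B receive gradients:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Each adapted matrix therefore adds r(d + k) trainable parameters instead of dk. In QLoRA, W_0 is stored in the 4-bit NormalFloat (NF4) format and dequantized to a higher-precision compute type during the forward pass.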

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes a decoupled three-component pipeline for wind-turbine blade defect inspection and structured report generation. It has three parts:

  • a YOLO26-x-obb detector (Eyes) for oriented bounding-box localization;
  • a deterministic, parameter-free Bridge module that encodes detections into grid-referenced spatial tokens;
  • a 4-bit QLoRA-adapted Qwen-2.5-1.5B model (Brain) that produces JSON reports. It is fine-tuned on 947 synthetically generated maintenance reports and further grounded with RAFT.

Five ablations scored with BLEU-4, ROUGE-L, Hallucination Rate, and LLM-as-Judge show the full pipeline reaching BLEU-4 0.41 / HR=4% / Expert Score 8.6/10, versus 0.07 / 65% / 3.3/10 for a zero-shot VLM baseline. Given identical detection evidence, the adapted 1.5B model also outperforms a 671B generalist model while running at 47 tokens/s on a T4 GPU.

Significance. If the central claims hold after validation, the work would demonstrate that a purpose-built, edge-deployable decoupled architecture with modest domain-specific adaptation can exceed both zero-shot VLMs and much larger generalist models on structured industrial report generation. The quantitative ablation suite and explicit comparison to a 671B baseline provide concrete evidence for the value of task decomposition and small-model specialization in this domain.

major comments (2)
  1. [Abstract] Abstract (final two paragraphs): The reported superiority (BLEU-4 0.41, HR=4%, Expert Score 8.6/10) and the claim that the QLoRA-adapted 1.5B model outperforms the 671B generalist model rest entirely on adaptation and evaluation using 947 synthetically generated reports together with an unvalidated LLM-as-Judge rubric. No generation procedure, lexical/structural diversity statistics, or human-expert alignment study for these reports is supplied, which directly undermines the generalization argument that the pipeline performs better on the actual maintenance task.
  2. [Abstract] Abstract (ablation description): The five ablation experiments compare the complete pipeline against a monolithic VLM baseline and partial configurations, yet the evaluation remains confined to held-out synthetic reports. Without evidence that the synthetic corpus reproduces the lexical, recommendation, and variability distributions of real wind-turbine maintenance reports, the ablation results cannot establish that the decoupled design improves real-world defect reasoning.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and the focus on the synthetic data foundation. We address each major comment below. Where the manuscript is incomplete we propose targeted additions; where new empirical validation would be required we note the limitation explicitly.

Point-by-point responses
  1. Referee: [Abstract] Abstract (final two paragraphs): The reported superiority (BLEU-4 0.41, HR=4%, Expert Score 8.6/10) and the claim that the QLoRA-adapted 1.5B model outperforms the 671B generalist model rest entirely on adaptation and evaluation using 947 synthetically generated reports together with an unvalidated LLM-as-Judge rubric. No generation procedure, lexical/structural diversity statistics, or human-expert alignment study for these reports is supplied, which directly undermines the generalization argument that the pipeline performs better on the actual maintenance task.

    Authors: We agree that the generation procedure and diversity statistics must be supplied. The 947 reports were produced by a deterministic template engine seeded with defect taxonomies and maintenance-action lists drawn from our industrial partner’s historical logs; each template was then varied by sampling from a small set of lexical paraphrases and recommendation phrasings. We will add a new subsection (3.3) that documents the template grammar, the sampling procedure, and quantitative diversity measures (type-token ratio, n-gram entropy, and structural variance across JSON fields). We also acknowledge that no separate human-expert alignment study was performed; this is a genuine limitation of the current study and will be stated as such in the revised Limitations paragraph. revision: partial

  2. Referee: [Abstract] Abstract (ablation description): The five ablation experiments compare the complete pipeline against a monolithic VLM baseline and partial configurations, yet the evaluation remains confined to held-out synthetic reports. Without evidence that the synthetic corpus reproduces the lexical, recommendation, and variability distributions of real wind-turbine maintenance reports, the ablation results cannot establish that the decoupled design improves real-world defect reasoning.

    Authors: The ablations are performed on held-out synthetic reports by design, because the controlled corpus lets us isolate the contribution of each pipeline stage without confounding factors from real-world annotation noise. We will expand the Data section to include a side-by-side comparison of key statistics (average report length, frequency of each defect class, distribution of recommendation verbs) between the synthetic corpus and a small set of redacted real maintenance reports that our partner permitted us to inspect. We cannot, however, release or evaluate on a large public real-report corpus; therefore the claim that the architecture improves real-world performance rests on the assumption that the synthetic distribution is sufficiently representative—an assumption we will now qualify in the text. revision: partial
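The statistics promised in these responses are straightforward to compute. Because the rebuttal is simulated, the template engine and the partner's redacted reports are themselves hypothetical. The sketch below assumes reports are plain strings tokenized on whitespace, each with a defect-class label. It covers type-token ratio, n-gram entropy, average length, and defect-class frequency. Structural variance across JSON fields and recommendation-verb distributions are left out for brevity.

```python
import math
from collections import Counter


def type_token_ratio(texts):
    """Distinct tokens divided by total tokens across a corpus."""
    tokens = [t for text in texts for t in text.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ngram_entropy(texts, n=2):
    """Shannon entropy (bits) of the corpus-wide n-gram distribution."""
    counts = Counter()
    for text in texts:
        toks = text.lower().split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    total = sum(counts.values())
    if not total:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def corpus_summary(texts, labels):
    """Average length, TTR, bigram entropy, and defect-class frequencies."""
    lengths = [len(t.split()) for t in texts]
    freq = Counter(labels)
    return {
        "avg_len": sum(lengths) / len(lengths) if lengths else 0.0,
        "ttr": type_token_ratio(texts),
        "bigram_entropy": ngram_entropy(texts, 2),
        "class_freq": {k: v / len(labels) for k, v in freq.items()},
    }


def compare(synthetic, real):
    """Side-by-side summaries; each argument is a (texts, labels) pair."""
    return {"synthetic": corpus_summary(*synthetic), "real": corpus_summary(*real)}
```

A template-generated corpus would typically show a lower type-token ratio and lower n-gram entropy than free-text expert reports of similar size. A comparison like this is one way to make the representativeness assumption inspectable. It does not replace the alignment study noted as unresolved.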

standing simulated objections not resolved
  • A full human-expert alignment study comparing synthetic versus real reports cannot be supplied without additional data access and annotation resources that are outside the scope of the present work.

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation on held-out data

Full rationale

The paper presents a decoupled pipeline evaluated via standard metrics (BLEU-4, ROUGE-L, HR, LLM-as-Judge) on held-out synthetic reports after QLoRA adaptation, with ablations against baselines. No load-bearing step reduces by construction to its inputs, no self-definitional mappings, no fitted parameters renamed as predictions, and no self-citation chains invoked as uniqueness theorems. The central claims rest on independent held-out comparisons rather than tautological reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the effectiveness of synthetic data and the deterministic bridge module, which are not independently verified in the provided abstract. No explicit free parameters or invented entities are described.

axioms (1)
  • domain assumption: Synthetic maintenance reports can train a model to produce accurate real reports.
    The training relies on 947 synthetic reports without mention of validation against real data.

pith-pipeline@v0.9.1-grok · 5864 in / 1277 out tokens · 37317 ms · 2026-06-29T18:47:39.066891+00:00 · methodology

