General Hazard Detection

CP Lim; David Nguyen; Hailing Zhou; Hendrik Zurlinden; Lei Wei; Saeid Nahavandi; Stephanie Ng; SueJen Looi

arXiv: 2605.23304 · v1 · pith:N35FCT2O · submitted 2026-05-22 · 💻 cs.CV

Pith reviewed 2026-05-25 05:09 UTC · model grok-4.3

classification 💻 cs.CV

keywords: hazard detection · vision-language models · rule-based compliance · active learning · CompliVision dataset · safety rules · ISO standards · generalization

The pith

Expressing safety requirements as language rules from regulations decouples hazard detection from image examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that hazards are abstract concepts best captured by logical rules rather than fixed image categories, so existing detection systems fail on noisy data, shifting definitions, and novel cases. It introduces the CompliVision dataset of 3006 images across traffic, construction, and warehouse scenes, each checked against specific language rules drawn from domain regulations and ISO standards, plus natural-language explanations of the visual evidence. A baseline active-learning framework then uses LLaVA-style vision-language models with human-in-the-loop feedback to assess rule compliance. If the approach holds, hazard systems could handle evolving standards and unseen scenarios without retraining on new labeled examples for each context.

Core claim

Hazard assessment reduces to checking compliance with language-based safety rules grounded in authoritative regulations and ISO standards rather than learning from predefined image categories; the CompliVision dataset supplies 3006 images annotated for rule compliance and supporting visual evidence, while an active-learning pipeline combining LLaVA visual reasoning with human feedback enables generalization beyond the training distribution.
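A minimal Python sketch of what one rule-grounded annotation could look like makes the claim concrete. The paper's actual schema and prompts are not shown on this page, so every class, field, file name, and the `build_prompt` helper below are hypothetical; only the ladder rule text and its clause number (OSHA 29 CFR 1926.1053(b)(6)) are real.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Verdict(Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non-compliant"


@dataclass(frozen=True)
class SafetyRule:
    rule_id: str  # hypothetical identifier
    domain: str   # "traffic", "construction" or "warehouse"
    text: str     # the requirement, stated in natural language
    source: str   # clause of the regulation or standard it comes from


@dataclass(frozen=True)
class ComplianceRecord:
    image_path: str
    rule: SafetyRule
    verdict: Verdict
    explanation: str  # natural-language description of the visual evidence


def build_prompt(rule: SafetyRule) -> str:
    """Phrase one rule as a compliance question for a vision-language model."""
    return (
        f"Safety rule [{rule.source}]: {rule.text}\n"
        "Does the scene comply with this rule? Reply 'Verdict: compliant' or "
        "'Verdict: non-compliant', then 'Explanation:' followed by the visual evidence."
    )


ladder = SafetyRule(
    rule_id="construction-ladder-01",
    domain="construction",
    text="Ladders shall be used only on stable and level surfaces "
         "unless secured to prevent accidental displacement.",
    source="29 CFR 1926.1053(b)(6)",
)
record = ComplianceRecord(
    image_path="site_0001.jpg",  # hypothetical file name
    rule=ladder,
    verdict=Verdict.NON_COMPLIANT,
    explanation="The ladder stands on loose gravel and is not tied off.",  # invented example
)

# Hypothetical revision of the rule wording: the prompt changes, no image is relabeled.
revised = replace(ladder, text=ladder.text + " Keep the area around the base clear.")
prompt_before, prompt_after = build_prompt(ladder), build_prompt(revised)
```

Because the hazard definition sits in `SafetyRule.text` rather than in the images, revising a rule only changes the question the model is asked; the images stay as they are.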

What carries the argument

Language-based safety rules (grounded in regulations and ISO standards) that replace image-category labels, evaluated by an active-learning loop of LLaVA-based visual reasoning plus human-in-the-loop refinement.
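The loop described above can be sketched with least-confidence uncertainty sampling, one common active-learning strategy; whether the paper uses this selection criterion is not stated on this page. The model is a stub callable standing in for a LLaVA-style VLM, `ask_human` stands in for the annotator, all names are hypothetical, and the fine-tuning step between rounds is omitted.

```python
from typing import Callable, Dict, List, Tuple

# (verdict, confidence in [0, 1]) as returned by the stand-in model
Prediction = Tuple[str, float]


def select_for_review(preds: Dict[str, Prediction], budget: int) -> List[str]:
    """Least-confidence sampling: the `budget` samples the model is least sure of."""
    return sorted(preds, key=lambda sid: preds[sid][1])[:budget]


def active_learning_round(
    model: Callable[[str], Prediction],
    unlabeled: List[str],
    ask_human: Callable[[str], str],
    budget: int,
) -> Tuple[Dict[str, str], List[str]]:
    """Predict on the pool, route weak samples to a human, return new labels and the rest."""
    preds = {sid: model(sid) for sid in unlabeled}
    flagged = select_for_review(preds, budget)
    human_labels = {sid: ask_human(sid) for sid in flagged}
    remaining = [sid for sid in unlabeled if sid not in human_labels]
    # A full pipeline would fine-tune the model on the labels gathered so far
    # before the next round; that step is omitted here.
    return human_labels, remaining


# Usage with a stubbed model: fixed scores instead of real VLM outputs.
stub_scores = {
    "img1": ("compliant", 0.95),
    "img2": ("non-compliant", 0.55),
    "img3": ("compliant", 0.62),
}
labels, rest = active_learning_round(
    model=stub_scores.__getitem__,
    unlabeled=["img1", "img2", "img3"],
    ask_human=lambda sid: "non-compliant",  # pretend annotator
    budget=2,
)
```

Ranking by the model's own confidence is the simplest selection rule; margin- or entropy-based scores are drop-in alternatives inside `select_for_review`.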

If this is right

Hazard definitions can be updated by editing the language rules without recollecting image examples.
The same rule set applies across traffic, construction, and warehouse domains without domain-specific retraining.
Active learning reduces the volume of human annotations needed compared with fully supervised category-based detectors.
Natural-language explanations of visual evidence become a built-in output of the assessment process.
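The last point presumes that a verdict and its explanation can be read off the model's answer. A minimal sketch, assuming the model was prompted to reply in a fixed `Verdict: ... / Explanation: ...` format (the page does not say what format the paper uses); answers that cannot be parsed return `None` as the verdict so they can be sent to a human instead of being guessed.

```python
import re
from typing import Optional, Tuple

# Assumed reply format (hypothetical):
#   Verdict: non-compliant
#   Explanation: <free text describing the visual evidence>
VERDICT_RE = re.compile(r"^\s*verdict\s*:\s*(non-compliant|compliant)\b",
                        re.IGNORECASE | re.MULTILINE)
EXPLANATION_RE = re.compile(r"^\s*explanation\s*:\s*(.+)",
                            re.IGNORECASE | re.MULTILINE | re.DOTALL)


def parse_response(text: str) -> Tuple[Optional[str], str]:
    """Return (verdict or None, explanation or empty string)."""
    verdict_match = VERDICT_RE.search(text)
    verdict = verdict_match.group(1).lower() if verdict_match else None
    explanation_match = EXPLANATION_RE.search(text)
    explanation = explanation_match.group(1).strip() if explanation_match else ""
    return verdict, explanation


verdict, explanation = parse_response(
    "Verdict: Non-compliant\n"
    "Explanation: The worker climbs the ladder while facing away from it."
)
```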

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The rule-decoupling pattern could be tested on other abstract safety or compliance concepts such as accessibility or environmental impact.
Direct linkage of the language rules to live regulatory databases would allow automatic propagation of definition changes into the detector.
The framework might be extended to video or 3D sensor streams by applying the same rule interpreter to temporal or spatial evidence.
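A minimal sketch of how such propagation could work, under assumptions of this note rather than anything in the paper: store a fingerprint of the rule wording with each verdict, and re-queue every verdict whose rule has since been reworded or withdrawn. The record layout and function names are hypothetical, and the revised rule text in the usage lines is an invented edit.

```python
import hashlib
from typing import Dict, List, Tuple


def rule_fingerprint(rule_text: str) -> str:
    """SHA-256 of the rule wording, ignoring differences in whitespace."""
    normalized = " ".join(rule_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stale_verdicts(
    current_rules: Dict[str, str],
    records: List[Tuple[str, str, str]],
) -> List[str]:
    """Image ids whose verdict was made against a rule text that has since changed.

    current_rules maps rule_id -> current text; each record is
    (image_id, rule_id, fingerprint of the rule text used when judging).
    A withdrawn rule also makes its verdicts stale.
    """
    stale = []
    for image_id, rule_id, used_fingerprint in records:
        text = current_rules.get(rule_id)
        if text is None or rule_fingerprint(text) != used_fingerprint:
            stale.append(image_id)
    return stale


rules = {"ladder-01": "Ladders shall be used only on stable and level surfaces."}
records = [("img1", "ladder-01", rule_fingerprint(rules["ladder-01"]))]
before = stale_verdicts(rules, records)  # nothing to redo yet
rules["ladder-01"] += " The area around the base shall be kept clear."  # hypothetical revision
after = stale_verdicts(rules, records)   # img1 must be re-assessed
```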

Load-bearing premise

Vision-language models plus active learning and human feedback can correctly interpret fine-grained, context-dependent safety rules for hazards never seen during training.

What would settle it

A controlled test on a new domain or novel hazard scenario where the framework produces compliance judgments that systematically contradict expert rule application.
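Such a test needs a chance-corrected agreement score between the framework's verdicts and an expert's. Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), is one standard choice, suggested here rather than taken from the paper; a dependency-free sketch with invented example labels:

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e) for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters independently pick the same label.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


model = ["compliant", "non-compliant", "non-compliant", "compliant"]
expert = ["compliant", "non-compliant", "compliant", "compliant"]
kappa = cohens_kappa(model, expert)  # 0.5
```

A kappa at or below zero on a new domain, that is, agreement no better than chance, would be one concrete signature of the failure described above.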

Figures

Figures reproduced from arXiv: 2605.23304 by CP Lim, David Nguyen, Hailing Zhou, Hendrik Zurlinden, Lei Wei, Saeid Nahavandi, Stephanie Ng, SueJen Looi.

**Figure 1.** Overview of the proposed approach for general hazard detection. Top panel: Key limitations of perception-level object detection models in hazard …

**Figure 2.** Illustration of the proposed framework in detail.

**Figure 3.** t-SNE Visualization of Embeddings for Training, Validation, and Test …

**Figure 4.** Word Clouds of Generated Explanations for Training, Validation, and …

**Figure 5.** t-SNE Embeddings with Rule-Compliance Regions in AL Round 0 vs Round 3.

**Figure 6.** Examples of hazard detection results across three application domains and classification types.

**Figure 7.** Demonstration of the feedback process: (1) the model generates a prediction and flags weak samples for human feedback; (2a) an example of poor …

Original abstract

Hazard, as an abstract concept, is typically defined through cognitive-level logical reasoning rather than concrete examples. In contrast, existing hazard detection systems rely on predefined hazard categories and require intensive collection of labelled examples within detection or classification architectures. This approach faces three fundamental challenges when addressing abstract safety concepts: (1) noisy and sparse training data, (2) dynamically evolving definitions that change across contexts and time, and (3) limited generalisation to unseen or novel scenarios. To address these limitations, we present the CompliVision dataset, the first general-purpose hazard dataset designed for rule-based compliance assessment, along with a baseline framework for hazard evaluation. Our key innovation is decoupling the hazard concept from image-based examples by expressing safety requirements through language-based rules. We ground our approach in authoritative domain regulations and ISO standards to define diverse hazard concepts across multiple domains. The CompliVision dataset comprises 3,006 images spanning traffic, construction, and warehouse environments, with each image annotated for compliance against specific safety rules, accompanied by natural language explanations highlighting the supporting visual evidence. To achieve robust generalisation, we develop an active learning framework to more effectively guide and refine vision-language models in assessing hazard compliance. While state-of-the-art VLMs demonstrate strong capabilities, they struggle with the fine-grained, context-dependent interpretation required for accurate safety assessment. We proposed a general hazard detection framework to address this limitation which combines LLaVA-based visual reasoning with with human-in-the-loop feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

New dataset for rule-based hazard detection across three application domains, but the active-learning claims rest on no reported results or ablations.

The main contribution is a new dataset, CompliVision, with 3,006 images across traffic, construction, and warehouse scenes, each tied to compliance checks against specific safety rules grounded in regulations and ISO standards, plus natural-language explanations. The core idea is to stop collecting example images for every hazard and instead express the rules in language, so the system can handle evolving or novel cases. That decoupling is a reasonable response to the usual problems of sparse labels and shifting definitions in safety work. The authors also sketch an LLaVA-based active-learning loop with human-in-the-loop (HITL) feedback to handle the fine-grained interpretation that off-the-shelf VLMs apparently struggle with. These pieces are new enough on their own terms and could be useful to people building compliance tools for real sites.

The soft spot is obvious and load-bearing: the abstract and the stress-test note both say the framework is meant to deliver better generalization, yet nothing is shown (no accuracy numbers, no baselines, no held-out novel-rule tests, no ablation of the active-learning step). Without those, the claim that the HITL loop actually closes the gap remains untested. The dataset itself is modest in scale and scope, which is fine for a starting point but is no substitute for evidence on the method. The work might interest applied groups in industrial vision or safety engineering who need rule-grounded data, but it reads more like a dataset release than a completed method paper. I would not send it to referees in its current form; the authors would need to add concrete evaluations before it deserves serious review time.

Referee Report

2 major / 1 minor

Summary. The paper introduces the CompliVision dataset of 3,006 images from traffic, construction, and warehouse domains, each annotated for compliance with language-based safety rules derived from ISO standards and regulations, along with natural language explanations. It proposes a baseline framework that decouples hazard detection from image examples by using LLaVA-based vision-language models, active learning, and human-in-the-loop feedback to assess rule compliance, claiming this addresses noisy data, evolving definitions, and limited generalization to novel scenarios where standard VLMs struggle with fine-grained, context-dependent rules.

Significance. If the active learning + HITL framework were shown to deliver reliable extrapolation on unseen rules and contexts, the work would be significant for shifting hazard detection toward authoritative, language-grounded standards rather than example-driven categories, with potential impact on safety-critical CV applications.

major comments (2)

[Abstract] The claim that the proposed LLaVA-based framework with active learning and human-in-the-loop feedback achieves 'robust generalisation' to novel hazard scenarios is unsupported; no accuracy metrics, baselines, ablations, or held-out evaluations on evolving or unseen rules are reported to substantiate improvement over standard VLMs.
[Abstract] The dataset is presented as enabling rule-based compliance assessment, yet no details on the annotation protocol, the rule-to-image mapping procedure, or validation of the natural-language explanations are supplied, leaving the data foundation of the generalization claim unevaluated.

minor comments (1)

[Abstract] Duplicate 'with with' in the final sentence.
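The evaluations requested in the first major comment are cheap to specify. A minimal sketch, assuming binary compliant/non-compliant verdicts: accuracy and macro-averaged F1, plus a split that keeps whole rules out of training so the test rules are genuinely unseen. The function names, rule ids, and the (rule_id, image_id) record layout are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so the rarer verdict counts as much as the common one."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def split_by_rule(
    records: List[Tuple[str, str]], held_out_rules: Set[str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (rule_id, image_id) pairs so that held-out rules never appear in training."""
    train = [r for r in records if r[0] not in held_out_rules]
    test = [r for r in records if r[0] in held_out_rules]
    return train, test


y_true = ["compliant", "compliant", "non-compliant", "non-compliant"]
y_pred = ["compliant", "non-compliant", "non-compliant", "non-compliant"]
acc = accuracy(y_true, y_pred)  # 0.75
f1 = macro_f1(y_true, y_pred)   # (2/3 + 0.8) / 2, about 0.733
train, test = split_by_rule(
    [("ladder-01", "img1"), ("forklift-02", "img2"), ("ladder-01", "img3")],
    held_out_rules={"forklift-02"},
)
```

Splitting by rule rather than by image is what separates generalization to new rules from generalization to new pictures of already-seen rules.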

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript accordingly.

Referee: [Abstract] The claim that the proposed LLaVA-based framework with active learning and human-in-the-loop feedback achieves 'robust generalisation' to novel hazard scenarios is unsupported; no accuracy metrics, baselines, ablations, or held-out evaluations on evolving or unseen rules are reported to substantiate improvement over standard VLMs.

Authors: We agree that the abstract overstates the generalization capability. The current manuscript presents the active learning and HITL framework as a proposed baseline without reporting accuracy metrics, baselines, ablations, or held-out tests on unseen rules. We will revise the abstract to qualify or remove the 'robust generalisation' claim and add quantitative evaluations in the revised manuscript. Revision: yes.
Referee: [Abstract] The dataset is presented as enabling rule-based compliance assessment, yet no details on the annotation protocol, the rule-to-image mapping procedure, or validation of the natural-language explanations are supplied, leaving the data foundation of the generalization claim unevaluated.

Authors: We acknowledge that the abstract does not include these details and that the manuscript would benefit from an expanded description of the data creation process. We will add explicit sections covering the annotation protocol, the rule-to-image mapping procedure, and validation steps for the natural-language explanations in the revised version. Revision: yes.

Circularity Check

0 steps flagged

No circularity; method grounded in external ISO standards and regulations

The paper's derivation chain relies on expressing safety requirements via language-based rules drawn from authoritative external domain regulations and ISO standards, rather than any self-referential definitions, fitted parameters presented as predictions, or load-bearing self-citations. The CompliVision dataset and LLaVA-based active learning framework with human-in-the-loop feedback are introduced as responses to stated limitations of existing VLM approaches, with no equations or steps that reduce by construction to the paper's own inputs. This is a self-contained proposal against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on the domain assumption that safety rules from standards can be directly applied to visual scenes by VLMs; no free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: safety requirements can be accurately expressed through language-based rules from ISO standards and regulations and applied to images.
Invoked to enable decoupling of the hazard concept from image examples.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 3 internal anchors

[1]

Framework of automated construction- safety monitoring using cloud-enabled bim and ble mobile tracking sensors,

J. Park, K. Kim, and Y . K. Cho, “Framework of automated construction- safety monitoring using cloud-enabled bim and ble mobile tracking sensors,”Journal of Construction Engineering and Management, vol. 143, no. 2, p. 05016019, 2017

work page 2017
[2]

Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,

K. Yang and C. R. Ahn, “Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,”Advanced Engineering Informatics, vol. 41, p. 100924, 2019

work page 2019
[3]

Real-time vision-based worker localization & hazard detection for construction,

I. Jeelani, K. Asadi, H. Ramshankar, K. Han, and A. Albert, “Real-time vision-based worker localization & hazard detection for construction,” Automation in Construction, vol. 121, p. 103448, 2021

work page 2021
[4]

S. W. Australia, Oct 2025. [Online]. Available: https: //data.safeworkaustralia.gov.au/insights/key-whs-statistics-australia/ latest-release

work page 2025
[5]

A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,

A. M. Vukicevic, M. Petrovic, P. Milosevic, A. Peulic, K. Jovanovic, and A. Novakovic, “A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,”Artificial Intelligence Review, vol. 57, no. 12, p. 319, 2024

work page 2024
[6]

Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,

Z. Chen, H. Chen, M. Imani, R. Chen, and F. Imani, “Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,”Expert Systems with Applications, vol. 265, p. 125769, 2025

work page 2025
[7]

Detection of personal pro- tective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,

V . S. K. Delhi, R. Sankarlal, and A. Thomas, “Detection of personal pro- tective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,”Frontiers in Built Environment, vol. 6, p. 136, 2020

work page 2020
[8]

Computer vision-based hazard identification of construction site using visual relationship detection and ontology,

Y . Li, H. Wei, Z. Han, N. Jiang, W. Wang, and J. Huang, “Computer vision-based hazard identification of construction site using visual relationship detection and ontology,”Buildings, vol. 12, no. 6, p. 857, 2022

work page 2022
[9]

Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,

A. Chharia, T. Ren, T. Furuhata, and K. Shimada, “Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5811–5820

work page 2025
[10]

Vision- language hazard reasoning for driver distraction and workload estima- tion,

S. Ng, H. Zhou, A. Arogbonlo, C. P. Lim, and S. Nahavandi, “Vision- language hazard reasoning for driver distraction and workload estima- tion,”Electronics Letters, vol. 61, no. 1, p. e70466, 2025

work page 2025
[11]

Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,

D. Gil and G. Lee, “Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,”Automation in Construction, vol. 164, p. 105470, 2024

work page 2024
[12]

Safety compliance checking of con- struction behaviors using visual question answering,

Y . Ding, M. Liu, and X. Luo, “Safety compliance checking of con- struction behaviors using visual question answering,”Automation in Construction, vol. 144, p. 104580, 2022

work page 2022
[13]

Detection of wearing safety helmet for workers based on yolov4,

L. Yunyun and W. JIANG, “Detection of wearing safety helmet for workers based on yolov4,” in2021 International Conference on Com- puter Engineering and Artificial Intelligence (ICCEAI). IEEE, 2021, pp. 83–87

work page 2021
[14]

Detection of worker’s safety helmet and mask and identification of worker using deeplearning

N. Kwak and D. Kim, “Detection of worker’s safety helmet and mask and identification of worker using deeplearning.”Computers, Materials & Continua, vol. 75, no. 1, pp. 1671–1686, 2023

work page 2023
[15]

Real-time road hazard information system,

C. Pena-Caballero, D. Kim, A. Gonzalez, O. Castellanos, A. Cantu, and J. Ho, “Real-time road hazard information system,”Infrastructures, vol. 5, no. 9, p. 75, 2020

work page 2020
[16]

Fire detection method in smart city environments using a deep-learning- based approach,

K. Avazov, M. Mukhiddinov, F. Makhmudov, and Y . I. Cho, “Fire detection method in smart city environments using a deep-learning- based approach,”Electronics, vol. 11, no. 1, p. 73, 2021. [Online]. Available: https://www.mdpi.com/2079-9292/11/1/7

work page 2021
[17]

Meta module network for compositional visual reasoning,

W. Chen, Z. Gan, L. Li, Y . Cheng, W. Wang, and J. Liu, “Meta module network for compositional visual reasoning,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 655–664

work page 2021
[18]

Neural- symbolic vqa: Disentangling reasoning from vision and language under- standing,

K. Yi, J. Wu, C. Gan, A. Torralba, P. Kohli, and J. Tenenbaum, “Neural- symbolic vqa: Disentangling reasoning from vision and language under- standing,”Advances in neural information processing systems, vol. 31, pp. 1039–1050, 2018

work page 2018
[19]

Gqa: A new dataset for real-world visual reasoning and compositional question answering,

D. A. Hudson and C. D. Manning, “Gqa: A new dataset for real-world visual reasoning and compositional question answering,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6700–6709

work page 2019
[20]

Inferring and executing programs for visual reasoning,

J. Johnson, B. Hariharan, L. Van Der Maaten, J. Hoffman, L. Fei- Fei, C. Lawrence Zitnick, and R. Girshick, “Inferring and executing programs for visual reasoning,” in2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3008–3017

work page 2017
[21]

Review of graph-based hazardous event detection methods for autonomous driving systems,

D. Xiao, M. Dianati, W. G. Geiger, and R. Woodman, “Review of graph-based hazardous event detection methods for autonomous driving systems,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 5, pp. 4697–4715, 2023

work page 2023
[22]

Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,

W. Fang, L. Ma, P. E. Love, H. Luo, L. Ding, and A. Zhou, “Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,”Automation in Construction, vol. 119, p. 103310, 2020

work page 2020
[23]

Construction safety knowledge graph integrating text and image information,

W. Wu, Q. Yuan, Q. Chen, and Y . Cao, “Construction safety knowledge graph integrating text and image information,” inProceedings of the 2023 6th International Conference on Information Management and Management Science, 2023, pp. 26–32

work page 2023
[24]

Hazard analysis: A deep learning and text mining framework for accident prevention,

B. Zhong, X. Pan, P. E. Love, J. Sun, and C. Tao, “Hazard analysis: A deep learning and text mining framework for accident prevention,” Advanced Engineering Informatics, vol. 46, p. 101152, 2020

work page 2020
[25]

Deep learning safety concerns in automated driving perception,

S. Abrecht, A. Hirsch, S. Raafatnia, and M. Woehrle, “Deep learning safety concerns in automated driving perception,”IEEE Transactions on Intelligent Vehicles, 2024

work page 2024
[26]

Construction- ppe: Personal protective equipment detection dataset,

M. Dalvi, N. Singh, S. Bhingarde, and K. Chalke, “Construction- ppe: Personal protective equipment detection dataset,” January

work page
[27]

Available: https://docs.ultralytics.com/datasets/detect/ construction-ppe/ 12 TABLE IV ZERO-SHOT, FINE-TUNING,ANDACTIVELEARNINGPERFORMANCE ACROSS3 DOMAINS

[Online]. Available: https://docs.ultralytics.com/datasets/detect/ construction-ppe/ 12 TABLE IV ZERO-SHOT, FINE-TUNING,ANDACTIVELEARNINGPERFORMANCE ACROSS3 DOMAINS. Domain Method AL Num. Manual Accum. Manual No. Pseudo Training Annotation Avg Model Performance Rounds Labels Labels Labels Samples Saved (%) Macro F1 Accuracy Traffic Zero-shot - 0 0 0 0 0 0...

work page arXiv 2000
[28]

Soda: A large-scale open site object detection dataset for deep learning in construction,

R. Duan, H. Deng, M. Tian, Y . Deng, and J. Lin, “Soda: A large-scale open site object detection dataset for deep learning in construction,” Automation in Construction, vol. 142, p. 104499, 2022

work page 2022
[29]

Dataset and benchmark for detecting moving objects in construction sites,

A. Xuehui, Z. Li, L. Zuguang, W. Chengzhi, L. Pengfei, and L. Zhiwei, “Dataset and benchmark for detecting moving objects in construction sites,”Automation in Construction, vol. 122, p. 103482, 2021

work page 2021
[30]

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

X. Chen and Z. Zou, “Are large pre-trained vision language models effective construction safety inspectors?”arXiv preprint arXiv:2508.11011, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[31]

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

K. Yi, C. Gan, Y . Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum, “Clevrer: Collision events for video representation and reasoning,”arXiv preprint arXiv:1910.01442, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910
[32]

Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,

T. Wang, S. Kim, J. Wenxuan, E. Xie, C. Ge, J. Chen, Z. Li, and P. Luo, “Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5599–5606

work page 2024
[33]

Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,

Q. Kong, Y . Kawana, R. Saini, A. Kumar, J. Pan, T. Gu, Y . Ozao, B. Opra, Y . Sato, and N. Kobori, “Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 1–18

work page 2024
[34]

Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,

H. M. Ahmad and A. Rahimi, “Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,” 14 arXiv preprint arXiv:2407.04590, 2024

work page arXiv 2024
[35]

Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,

A. Elhanashi, S. Essahraui, P. Dini, and S. Saponara, “Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,”Applied Sciences, vol. 15, no. 18, p. 10255, 2025

work page 2025
[36]

Visual instruction tuning,

H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Visual instruction tuning,” Advances in neural information processing systems, vol. 36, pp. 34 892– 34 916, 2023

work page 2023
[37]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

work page 2022
[38]

Llava- cot: Let vision language models reason step-by-step,

G. Xu, P. Jin, Z. Wu, H. Li, Y . Song, L. Sun, and L. Yuan, “Llava- cot: Let vision language models reason step-by-step,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 2087–2098

work page 2025
[39]

Interactive semantic interventions for vlms: A human- in-the-loop investigation of vlm failure,

L. Klein, K. Amara, C. T. L ¨uth, H. Strobelt, M. El-Assady, and P. F. Jaeger, “Interactive semantic interventions for vlms: A human- in-the-loop investigation of vlm failure,” inNeurips Safe Generative AI Workshop 2024, 2024

work page 2024
[40]

Active prompting of vision language models for human-in-the-loop classifi- cation and explanation of microscopy images,

A. Kandiyana, P. R. Mouton, L. O. Hall, and D. Goldgof, “Active prompting of vision language models for human-in-the-loop classifi- cation and explanation of microscopy images,” in2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2024, pp. 205–212

work page 2024
[41]

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

P. Sahoo, A. K. Singh, S. Saha, V . Jain, S. Mondal, and A. Chadha, “A systematic survey of prompt engineering in large language models,” arXiv preprint arXiv:2402.07927, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[42]

Large vision-language models: A survey,

H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Large vision-language models: A survey,”arXiv preprint arXiv:2402.14082, 2024

work page arXiv 2024
[43]

A survey of deep active learning,

P. Ren, Y . Xiao, X. Chang, P.-Y . Huang, Z. Li, B. B. Gupta, and X. Wang, “A survey of deep active learning,”ACM Computing Surveys, vol. 54, no. 9, pp. 1–40, 2021

work page 2021
[44]

Lora: Low-rank adaptation of large language models

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chenet al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

work page 2022
[45]

Llavanext: Improved reasoning, ocr, and world knowledge,

H. Liu, C. Li, Y . Li, B. Li, Y . Zhang, S. Shen, and Y . J. Lee, “Llavanext: Improved reasoning, ocr, and world knowledge,” 2024

work page 2024
[46]

Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,

Meta AI, “Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,” Sep. 2024, ac- cessed: 2025-11-29. [Online]. Available: https://ai.meta.com/blog/ llama-3-2-connect-2024-vision-edge-mobile-devices/ Stephanie Ngreceived the M.DataSc. degree from the University of Melbourne in 2021 and completed the Ph.D. degree requirements in Engin...

work page 2024
[47]

[National Highway Traffic Safety Administration] •Driver to have proper control of a vehicle etc.: A person must not drive a vehicle if a person or an animal is in the driver’s lap

Driving Distraction: •Distracted driving: Distracted driving is any activity that diverts attention from driving, including talking or texting on your phone, eating and drinking, talking to people in your vehicle, fiddling with the stereo, entertainment or navigation system — anything that takes your attention away from the task of safe driving. [National...

work page 2017
[48]

Traffic Rules: •Giving way at a pedestrian crossing: A driver must give way to any pedestrian on or entering a pedestrian crossing. [ROAD SAFETY ROAD RULES 2017 - REG 81 (2)] •Overtaking or passing a vehicle at a children’s crossing or pedestrian crossing: A driver approaching a children’s crossing, or pedestrian crossing, must not overtake or pass a vehi...

work page 2017
[49]

Pedestrian Crossing: •Crossing a road—general: A pedestrian crossing a road— (a) must cross by the shortest safe route; and (b) must not stay on the road longer than necessary to cross the road safely. [ROAD SAFETY ROAD RULES 2017 - REG 230 (1)] •Crossing a road at pedestrian lights: If the pedestrian lights show a red pedestrian light and the pedestrian ...

work page 2017
[50]

Road Condition: •Obligations of road users: A person who drives a motor vehicle on a highway must drive in a safe manner having regard to all the relevant factors. [ROAD SAFETY ACT 1986 - SECT 17A (1)] •The relevant factors include the following— (a) the physical characteristics of the road; (b) the prevailing weather conditions; (c) the level of visibili...

work page 1986
[51]

Vehicle Load: •Carrying goods in addition to a large indivisible item: A load-carrying vehicle must not carry more than 1 large indivisible item. [HEA VY VEHICLE (MASS, DIMEN- SION AND LOADING) NATIONAL REGULATION - SCHEDULE 8 Division 2 - Load-carrying vehicles 13 (1)] •Load restraint requirement: The following requirements apply to a vehicle that is car...

work page 2018
[52]

[1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of the hazard areas

Crane Use: •The operator must not engage in any practice or activity that diverts his/her attention while actually engaged in operating the equipment, such as the use of cellular phones (other than when used for signal communica- tions). [1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of...

work page arXiv 1926
[53]

No Smoking or Open Flame

Fire Risk: •Smoking shall be prohibited at or in the vicinity of operations which constitute a fire hazard, and shall be conspicuously posted: “No Smoking or Open Flame.” [1926.151(a)(3)] •If the object to be welded, cut, or heated cannot be moved and if all the fire hazards cannot be removed, positive means shall be taken to confine the heat, sparks, and...

work page 1926
[54]

[1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear

Ladder Use: •Ladders shall be used only on stable and level sur- faces unless secured to prevent accidental displacement. [1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear. [1926.1053(b)(9)] •When ascending or descending a ladder, the user shall face the ladder. [1926.1053(b)(20)] •Each employee shall use at least one ha...

[55]

Protective Equipment: •Employees working in areas where there is a possible danger of head injury from impact, or from falling or flying objects, or from electrical shock and burns, shall be protected by protective helmets. [1926.100(a)] •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particl...

[56]

Scaffold Risk: •Each platform on all working levels of scaffolds shall be fully planked or decked between the front uprights and the guardrail supports. [1926.451(b)(1)] •Guardrail systems shall be installed along all open sides and ends of platforms. [1926.451(g)(4)] •The top edge height of toprails or equivalent member on supported scaffolds shall be ins...

[57]

Ergonomic Lifting: •Safe lifting involves: Holding the load close to your body at waist height. Never lift a heavy item above shoulder level. Never carry a load that obstructs your vision. [General Duty Clause, Section 5(a)(1)] •The following points should be considered: The start and finish height of the load should be a suitable level above the floor, t...

[58]

Forklift Use: •No person shall be allowed to stand or pass under the elevated portion of any truck, whether loaded or empty. [29 CFR 1910.178(m)(2)] •All traffic regulations shall be observed, including authorized plant speed limits. A safe distance shall be maintained approximately three truck lengths from the truck ahead, and the truck shall be kept u...

[59]

[29 CFR 1910.23(b)(13)]

Ladder Use: •Ladders are used only on stable and level surfaces; [29 CFR 1910.23(c)(4)] •Each employee faces the ladder when climbing up or down it; [29 CFR 1910.23(b)(11)] •Each employee uses at least one hand to grasp the ladder when climbing up and down it; and [29 CFR 1910.23(b)(12)] •No employee carries any object or load that could cause the emplo...

[60]

Protective Equipment: •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particles, molten metal, liquid chemicals, acids or caustic liquids, chemical gases or vapors, or potentially injurious light radiation [29 CFR 1910.133(a)(1)] •Each affected employee wears a protective helmet when working ...

[61]

Surface Condition: •All places of employment, passageways, storerooms, service rooms, and walking-working surfaces are kept in a clean, orderly, and sanitary condition. [29 CFR 1910.22(a)(1)] •The floor of each workroom is maintained in a clean and, to the extent feasible, in a dry condition. When wet processes are used, drainage must be maintained and, t...

[62]

Driving Distraction: •No assumptions made

[63]

Traffic Rules: •Evaluated if the vehicle is traveling in its lane, moving in the same direction as traffic, or parked neatly in the correct orientation. •Not evaluated (Not Applicable) if lane markings are not visible or the road is gravel. •Vehicles traveling/parked on the emergency lane or on gravel next to the road are considered hazards

[64]

Pedestrian Crossing: •Evaluated only if both pedestrian legs and the road are visible; otherwise, Not Applicable

[65]

Road Condition: •Evaluated as long as part of the road is visible, even if blurred. •Gravel roads or roads without visible lane markings are considered hazards. •Vehicles not on a road (e.g., on grass) are Not Applicable

[66]

Vehicle Load: •All trucks are always evaluated. •Vans and buses are evaluated only if obvious cargo is present on top or strapped to the vehicle. •Vehicles with cargo are always evaluated; vehicles without cargo are Not Applicable.

B. Construction Domain

[67]

Crane Use: •Assumed compliant if a crane (or part of it) is visible, unless there is a clear violation

[68]

Fire Risk: •Assumed violated if protective equipment rules are not met, even if fire is handled safely

[69]

Ladder Use: •No assumptions made

[70]

Protective Equipment: •Considered compliant if the worker/operator wears at least a helmet. Wearing only a high-visibility vest is a violation. •Exceptions: –Firefighters, who may have different uniforms and may not require a helmet. –If a smoke hazard is present (excluding cigarette smoke) and the worker lacks a breathing mask, it is considered a violati...

[71]

Scaffold Risk: •Label based on the presence of scaffolding, not necessity. •Wooden frames are not considered scaffolding. •If scaffolding is required but not visible, label as Not Applicable.

C. Warehouse Domain

[72]

Ergonomic Lifting: •All lifted items are assumed heavy; items carried above shoulder level are violations, including when passed between two people. •Picking up items that are not close to waist level is a violation. •Signs of back pain (holding back, grimacing) indicate a violation, even if ergonomics are correct.

[73]

Forklift Use: •All accidents involving a forklift are considered hazards. •Operator distraction (e.g., phone use, talking) is a violation. •Evaluated if forklift and operators are present; during accidents, even vacant forklifts are considered a violation.

[74]

Ladder Use: •Not always a violation if both hands are not on the ladder; assume user is stationary if carrying items. •Reaching or carrying items above shoulder level on a ladder is a violation. •Users must face ladder steps when climbing; facing any direction on a platform is allowed. •Using non-ladders as ladders is a violation. •Step ladders are consid...

[75]

Protective Equipment: •Workers must wear at least a safety helmet; absence is a violation even if wearing a high-visibility vest

[76]

Surface Condition: •Single boxes on the floor are violations. •White backgrounds/floors are Not Applicable. •Standing on improper surfaces (boxes, ladders, or other items) is a violation.

APPENDIX C PROMPT TEMPLATES

A. Task-focused Variants

[77]

T1 (Inline Classification Instruction): Classify the image into exactly one of "Complied", "Violated", or "Not Applicable" for compliance with the rule set.

[78]

T2 (Constrained Output Instruction): Classify the image according to the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

[79]

T3 (T2 – Alt Wording Instruction): Classify the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

[80]

T4 (T3 – Analysis-focused Instruction): Analyze the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

B. Classification-focused Variants
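The task-focused templates T1–T4 above all restrict the model's reply to one of three labels. The free-text output of the vision-language model therefore still has to be mapped back to "Complied", "Violated", or "Not Applicable" before it can be scored. The Python sketch below illustrates one way to do this. It is not the authors' pipeline. The layout used by `build_prompt` (rule text placed before the T2 wording) is a hypothetical choice, and so is the normalization in `parse_label`. The parser treats a reply that names zero labels, or more than one, as unparseable instead of guessing.

```python
from typing import Optional

# The three labels requested by templates T1-T4.
LABELS = ("Complied", "Violated", "Not Applicable")

# Wording of template T2; the surrounding prompt layout is a hypothetical choice.
T2_INSTRUCTION = (
    "Classify the image according to the rule set. "
    'Respond with exactly one of: "Complied", "Violated", or "Not Applicable".'
)


def build_prompt(rule_set: str, instruction: str = T2_INSTRUCTION) -> str:
    """Prepend the rule text to the instruction (hypothetical layout)."""
    return f"Rule set:\n{rule_set.strip()}\n\n{instruction}"


def parse_label(reply: str) -> Optional[str]:
    """Map a model reply to one label, or None if it is ambiguous or empty."""
    # Drop surrounding whitespace, quotes, and periods, then compare case-insensitively.
    text = reply.strip().strip("\"'.").strip().lower()
    for label in LABELS:
        if text == label.lower():
            return label
    # Fallback: accept a longer reply only if it mentions exactly one label.
    mentioned = [label for label in LABELS if label.lower() in text]
    return mentioned[0] if len(mentioned) == 1 else None


if __name__ == "__main__":
    rules = "Protective Equipment: Workers must wear at least a safety helmet."
    print(build_prompt(rules))
    for reply in ["Violated", ' "Not Applicable". ', "Complied or Violated?"]:
        print(repr(reply), "->", parse_label(reply))
```

The substring fallback is deliberately crude. A reply such as "not complied" would still map to "Complied", so a stricter setup might accept only exact matches and send everything else back to the human-in-the-loop annotator.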

[1] [1]

Framework of automated construction- safety monitoring using cloud-enabled bim and ble mobile tracking sensors,

J. Park, K. Kim, and Y . K. Cho, “Framework of automated construction- safety monitoring using cloud-enabled bim and ble mobile tracking sensors,”Journal of Construction Engineering and Management, vol. 143, no. 2, p. 05016019, 2017

work page 2017

[2] [2]

Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,

K. Yang and C. R. Ahn, “Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,”Advanced Engineering Informatics, vol. 41, p. 100924, 2019

work page 2019

[3] [3]

Real-time vision-based worker localization & hazard detection for construction,

I. Jeelani, K. Asadi, H. Ramshankar, K. Han, and A. Albert, “Real-time vision-based worker localization & hazard detection for construction,” Automation in Construction, vol. 121, p. 103448, 2021

work page 2021

[4] [4]

S. W. Australia, Oct 2025. [Online]. Available: https: //data.safeworkaustralia.gov.au/insights/key-whs-statistics-australia/ latest-release

work page 2025

[5] [5]

A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,

A. M. Vukicevic, M. Petrovic, P. Milosevic, A. Peulic, K. Jovanovic, and A. Novakovic, “A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,”Artificial Intelligence Review, vol. 57, no. 12, p. 319, 2024

work page 2024

[6] [6]

Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,

Z. Chen, H. Chen, M. Imani, R. Chen, and F. Imani, “Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,”Expert Systems with Applications, vol. 265, p. 125769, 2025

work page 2025

[7] [7]

Detection of personal pro- tective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,

V . S. K. Delhi, R. Sankarlal, and A. Thomas, “Detection of personal pro- tective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,”Frontiers in Built Environment, vol. 6, p. 136, 2020

work page 2020

[8] [8]

Computer vision-based hazard identification of construction site using visual relationship detection and ontology,

Y . Li, H. Wei, Z. Han, N. Jiang, W. Wang, and J. Huang, “Computer vision-based hazard identification of construction site using visual relationship detection and ontology,”Buildings, vol. 12, no. 6, p. 857, 2022

work page 2022

[9] [9]

Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,

A. Chharia, T. Ren, T. Furuhata, and K. Shimada, “Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5811–5820

work page 2025

[10] [10]

Vision- language hazard reasoning for driver distraction and workload estima- tion,

S. Ng, H. Zhou, A. Arogbonlo, C. P. Lim, and S. Nahavandi, “Vision- language hazard reasoning for driver distraction and workload estima- tion,”Electronics Letters, vol. 61, no. 1, p. e70466, 2025

work page 2025

[11] [11]

Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,

D. Gil and G. Lee, “Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,”Automation in Construction, vol. 164, p. 105470, 2024

work page 2024

[12] [12]

Safety compliance checking of con- struction behaviors using visual question answering,

Y . Ding, M. Liu, and X. Luo, “Safety compliance checking of con- struction behaviors using visual question answering,”Automation in Construction, vol. 144, p. 104580, 2022

work page 2022

[13] [13]

Detection of wearing safety helmet for workers based on yolov4,

L. Yunyun and W. JIANG, “Detection of wearing safety helmet for workers based on yolov4,” in2021 International Conference on Com- puter Engineering and Artificial Intelligence (ICCEAI). IEEE, 2021, pp. 83–87

work page 2021

[14] [14]

Detection of worker’s safety helmet and mask and identification of worker using deeplearning

N. Kwak and D. Kim, “Detection of worker’s safety helmet and mask and identification of worker using deeplearning.”Computers, Materials & Continua, vol. 75, no. 1, pp. 1671–1686, 2023

work page 2023

[15] [15]

Real-time road hazard information system,

C. Pena-Caballero, D. Kim, A. Gonzalez, O. Castellanos, A. Cantu, and J. Ho, “Real-time road hazard information system,”Infrastructures, vol. 5, no. 9, p. 75, 2020

work page 2020

[16] [16]

Fire detection method in smart city environments using a deep-learning- based approach,

K. Avazov, M. Mukhiddinov, F. Makhmudov, and Y . I. Cho, “Fire detection method in smart city environments using a deep-learning- based approach,”Electronics, vol. 11, no. 1, p. 73, 2021. [Online]. Available: https://www.mdpi.com/2079-9292/11/1/7

work page 2021

[17] [17]

Meta module network for compositional visual reasoning,

W. Chen, Z. Gan, L. Li, Y . Cheng, W. Wang, and J. Liu, “Meta module network for compositional visual reasoning,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 655–664

work page 2021

[18] [18]

Neural- symbolic vqa: Disentangling reasoning from vision and language under- standing,

K. Yi, J. Wu, C. Gan, A. Torralba, P. Kohli, and J. Tenenbaum, “Neural- symbolic vqa: Disentangling reasoning from vision and language under- standing,”Advances in neural information processing systems, vol. 31, pp. 1039–1050, 2018

work page 2018

[19] [19]

Gqa: A new dataset for real-world visual reasoning and compositional question answering,

D. A. Hudson and C. D. Manning, “Gqa: A new dataset for real-world visual reasoning and compositional question answering,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6700–6709

work page 2019

[20] [20]

Inferring and executing programs for visual reasoning,

J. Johnson, B. Hariharan, L. Van Der Maaten, J. Hoffman, L. Fei- Fei, C. Lawrence Zitnick, and R. Girshick, “Inferring and executing programs for visual reasoning,” in2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3008–3017

work page 2017

[21] [21]

Review of graph-based hazardous event detection methods for autonomous driving systems,

D. Xiao, M. Dianati, W. G. Geiger, and R. Woodman, “Review of graph-based hazardous event detection methods for autonomous driving systems,”IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 5, pp. 4697–4715, 2023

work page 2023

[22] [22]

Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,

W. Fang, L. Ma, P. E. Love, H. Luo, L. Ding, and A. Zhou, “Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,”Automation in Construction, vol. 119, p. 103310, 2020

work page 2020

[23] [23]

Construction safety knowledge graph integrating text and image information,

W. Wu, Q. Yuan, Q. Chen, and Y . Cao, “Construction safety knowledge graph integrating text and image information,” inProceedings of the 2023 6th International Conference on Information Management and Management Science, 2023, pp. 26–32

work page 2023

[24] [24]

Hazard analysis: A deep learning and text mining framework for accident prevention,

B. Zhong, X. Pan, P. E. Love, J. Sun, and C. Tao, “Hazard analysis: A deep learning and text mining framework for accident prevention,” Advanced Engineering Informatics, vol. 46, p. 101152, 2020

work page 2020

[25] [25]

Deep learning safety concerns in automated driving perception,

S. Abrecht, A. Hirsch, S. Raafatnia, and M. Woehrle, “Deep learning safety concerns in automated driving perception,”IEEE Transactions on Intelligent Vehicles, 2024

work page 2024

[26] [26]

Construction- ppe: Personal protective equipment detection dataset,

M. Dalvi, N. Singh, S. Bhingarde, and K. Chalke, “Construction- ppe: Personal protective equipment detection dataset,” January

work page

[27] [27]

Available: https://docs.ultralytics.com/datasets/detect/ construction-ppe/ 12 TABLE IV ZERO-SHOT, FINE-TUNING,ANDACTIVELEARNINGPERFORMANCE ACROSS3 DOMAINS

[Online]. Available: https://docs.ultralytics.com/datasets/detect/ construction-ppe/ 12 TABLE IV ZERO-SHOT, FINE-TUNING,ANDACTIVELEARNINGPERFORMANCE ACROSS3 DOMAINS. Domain Method AL Num. Manual Accum. Manual No. Pseudo Training Annotation Avg Model Performance Rounds Labels Labels Labels Samples Saved (%) Macro F1 Accuracy Traffic Zero-shot - 0 0 0 0 0 0...

work page arXiv 2000

[28] [28]

Soda: A large-scale open site object detection dataset for deep learning in construction,

R. Duan, H. Deng, M. Tian, Y . Deng, and J. Lin, “Soda: A large-scale open site object detection dataset for deep learning in construction,” Automation in Construction, vol. 142, p. 104499, 2022

work page 2022

[29] [29]

Dataset and benchmark for detecting moving objects in construction sites,

A. Xuehui, Z. Li, L. Zuguang, W. Chengzhi, L. Pengfei, and L. Zhiwei, “Dataset and benchmark for detecting moving objects in construction sites,”Automation in Construction, vol. 122, p. 103482, 2021

work page 2021

[30] [30]

Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

X. Chen and Z. Zou, “Are large pre-trained vision language models effective construction safety inspectors?”arXiv preprint arXiv:2508.11011, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[31] [31]

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

K. Yi, C. Gan, Y . Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum, “Clevrer: Collision events for video representation and reasoning,”arXiv preprint arXiv:1910.01442, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1910

[32] [32]

Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,

T. Wang, S. Kim, J. Wenxuan, E. Xie, C. Ge, J. Chen, Z. Li, and P. Luo, “Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5599–5606

work page 2024

[33] [33]

Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,

Q. Kong, Y . Kawana, R. Saini, A. Kumar, J. Pan, T. Gu, Y . Ozao, B. Opra, Y . Sato, and N. Kobori, “Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 1–18

work page 2024

[34] [34]

Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,

H. M. Ahmad and A. Rahimi, “Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,” 14 arXiv preprint arXiv:2407.04590, 2024

work page arXiv 2024

[35] [35]

Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,

A. Elhanashi, S. Essahraui, P. Dini, and S. Saponara, “Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,”Applied Sciences, vol. 15, no. 18, p. 10255, 2025

work page 2025

[36] [36]

Visual instruction tuning,

H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Visual instruction tuning,” Advances in neural information processing systems, vol. 36, pp. 34 892– 34 916, 2023

work page 2023

[37] [37]

Chain-of-thought prompting elicits reasoning in large language models,

J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V . Le, D. Zhouet al., “Chain-of-thought prompting elicits reasoning in large language models,”Advances in neural information processing systems, vol. 35, pp. 24 824–24 837, 2022

work page 2022

[38] [38]

Llava- cot: Let vision language models reason step-by-step,

G. Xu, P. Jin, Z. Wu, H. Li, Y . Song, L. Sun, and L. Yuan, “Llava- cot: Let vision language models reason step-by-step,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 2087–2098

work page 2025

[39] [39]

Interactive semantic interventions for vlms: A human- in-the-loop investigation of vlm failure,

L. Klein, K. Amara, C. T. L ¨uth, H. Strobelt, M. El-Assady, and P. F. Jaeger, “Interactive semantic interventions for vlms: A human- in-the-loop investigation of vlm failure,” inNeurips Safe Generative AI Workshop 2024, 2024

work page 2024

[40] [40]

Active prompting of vision language models for human-in-the-loop classifi- cation and explanation of microscopy images,

A. Kandiyana, P. R. Mouton, L. O. Hall, and D. Goldgof, “Active prompting of vision language models for human-in-the-loop classifi- cation and explanation of microscopy images,” in2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2024, pp. 205–212

work page 2024

[41] [41]

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

P. Sahoo, A. K. Singh, S. Saha, V . Jain, S. Mondal, and A. Chadha, “A systematic survey of prompt engineering in large language models,” arXiv preprint arXiv:2402.07927, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[42] [42]

Large vision-language models: A survey,

H. Liu, C. Li, Q. Wu, and Y . J. Lee, “Large vision-language models: A survey,”arXiv preprint arXiv:2402.14082, 2024

work page arXiv 2024

[43] [43]

A survey of deep active learning,

P. Ren, Y . Xiao, X. Chang, P.-Y . Huang, Z. Li, B. B. Gupta, and X. Wang, “A survey of deep active learning,”ACM Computing Surveys, vol. 54, no. 9, pp. 1–40, 2021

work page 2021

[44] [44]

Lora: Low-rank adaptation of large language models

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chenet al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

work page 2022

[45] [45]

Llavanext: Improved reasoning, ocr, and world knowledge,

H. Liu, C. Li, Y . Li, B. Li, Y . Zhang, S. Shen, and Y . J. Lee, “Llavanext: Improved reasoning, ocr, and world knowledge,” 2024

work page 2024

[46] [46]

Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,

Meta AI, “Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,” Sep. 2024, ac- cessed: 2025-11-29. [Online]. Available: https://ai.meta.com/blog/ llama-3-2-connect-2024-vision-edge-mobile-devices/ Stephanie Ngreceived the M.DataSc. degree from the University of Melbourne in 2021 and completed the Ph.D. degree requirements in Engin...

work page 2024

[47] [47]

[National Highway Traffic Safety Administration] •Driver to have proper control of a vehicle etc.: A person must not drive a vehicle if a person or an animal is in the driver’s lap

Driving Distraction: •Distracted driving: Distracted driving is any activity that diverts attention from driving, including talking or texting on your phone, eating and drinking, talking to people in your vehicle, fiddling with the stereo, entertainment or navigation system — anything that takes your attention away from the task of safe driving. [National...

work page 2017

[48] [48]

Traffic Rules: •Giving way at a pedestrian crossing: A driver must give way to any pedestrian on or entering a pedestrian crossing. [ROAD SAFETY ROAD RULES 2017 - REG 81 (2)] •Overtaking or passing a vehicle at a children’s crossing or pedestrian crossing: A driver approaching a children’s crossing, or pedestrian crossing, must not overtake or pass a vehi...

work page 2017

[49] [49]

Pedestrian Crossing: •Crossing a road—general: A pedestrian crossing a road— (a) must cross by the shortest safe route; and (b) must not stay on the road longer than necessary to cross the road safely. [ROAD SAFETY ROAD RULES 2017 - REG 230 (1)] •Crossing a road at pedestrian lights: If the pedestrian lights show a red pedestrian light and the pedestrian ...

work page 2017

[50] [50]

Road Condition: •Obligations of road users: A person who drives a motor vehicle on a highway must drive in a safe manner having regard to all the relevant factors. [ROAD SAFETY ACT 1986 - SECT 17A (1)] •The relevant factors include the following— (a) the physical characteristics of the road; (b) the prevailing weather conditions; (c) the level of visibili...

work page 1986

[51] [51]

Vehicle Load: •Carrying goods in addition to a large indivisible item: A load-carrying vehicle must not carry more than 1 large indivisible item. [HEA VY VEHICLE (MASS, DIMEN- SION AND LOADING) NATIONAL REGULATION - SCHEDULE 8 Division 2 - Load-carrying vehicles 13 (1)] •Load restraint requirement: The following requirements apply to a vehicle that is car...

work page 2018

[52] [52]

[1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of the hazard areas

Crane Use: •The operator must not engage in any practice or activity that diverts his/her attention while actually engaged in operating the equipment, such as the use of cellular phones (other than when used for signal communica- tions). [1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of...

work page arXiv 1926

[53] [53]

No Smoking or Open Flame

Fire Risk: •Smoking shall be prohibited at or in the vicinity of operations which constitute a fire hazard, and shall be conspicuously posted: “No Smoking or Open Flame.” [1926.151(a)(3)] •If the object to be welded, cut, or heated cannot be moved and if all the fire hazards cannot be removed, positive means shall be taken to confine the heat, sparks, and...

work page 1926

[54] [54]

[1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear

Ladder Use: •Ladders shall be used only on stable and level sur- faces unless secured to prevent accidental displacement. [1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear. [1926.1053(b)(9)] •When ascending or descending a ladder, the user shall face the ladder. [1926.1053(b)(20)] •Each employee shall use at least one ha...

work page arXiv 1926

[55] [55]

Protective Equipment: •Employees working in areas where there is a possible danger of head injury from impact, or from falling or flying objects, or from electrical shock and burns, shall be protected by protective helmets. [1926.100(a)] •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particl...

work page 1926

[56] [56]

Scaffold Risk: •Each platform on all working levels of scaffolds shall be fully planked or decked between the front uprights and the guardrail supports [1926.451(b)(1)] •Guardrail systems shall be installed along all open sides and ends of platforms. [1926.451(g)(4)] •The top edge height of toprails or equivalent member on supported scaffolds shall be ins...

work page 1926

[57] [57]

Never lift a heavy item above shoulder level

Ergonomic Lifting: •Safe lifting involves: Holding the load close to your body at waist height. Never lift a heavy item above shoulder level. Never carry a load that obstructs your vision. [General Duty Clause, Section 5(a)(1)] •The following points should be considered: The start and finish height of the load should be a suitable level above the floor, t...

work page 2005

[58] [58]

[29 CFR 1910.178(m)(2)] •All traffic regulations shall be observed, including au- thorized plant speed limits

Forklift Use: •No person shall be allowed to stand or pass under the elevated portion of any truck, whether loaded or empty. [29 CFR 1910.178(m)(2)] •All traffic regulations shall be observed, including au- thorized plant speed limits. A safe distance shall be maintained approximately three truck lengths from the truck ahead, and the truck shall be kept u...

work page 1910

[59] [59]

[29 CFR 1910.23(b)(13)]

Ladder Use: •Ladders are used only on stable and level surfaces; [29 CFR 1910.23(c)(4)] •Each employee faces the ladder when climbing up or down it; [29 CFR 1910.23(b)(11)] 3 •Each employee uses at least one hand to grasp the ladder when climbing up and down it; and [29 CFR 1910.23(b)(12)] •No employee carries any object or load that could cause the emplo...

work page 1910

[60] [60]

Protective Equipment: •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particles, molten metal, liquid chemicals, acids or caustic liquids, chemical gases or vapors, or potentially injurious light radiation [29 CFR 1910.133(a)(1)] •Each affected employee wears a protective helmet when working ...

work page 1910

[61] [61]

[29 CFR 1910.22(a)(1)] •The floor of each workroom is maintained in a clean and, to the extent feasible, in a dry condition

Surface Condition: •All places of employment, passageways, storerooms, service rooms, and walking-working surfaces are kept in a clean, orderly, and sanitary condition. [29 CFR 1910.22(a)(1)] •The floor of each workroom is maintained in a clean and, to the extent feasible, in a dry condition. When wet processes are used, drainage must be maintained and, t...

work page 1910

[62] [62]

Driving Distraction: •No assumptions made

work page

[63] [63]

•Not evaluated (Not Applicable) if lane markings are not visible or the road is gravel

Traffic Rules: •Evaluated if the vehicle is traveling in its lane, moving in the same direction as traffic, or parked neatly in the correct orientation. •Not evaluated (Not Applicable) if lane markings are not visible or the road is gravel. •Vehicles traveling/parked on the emergency lane or on gravel next to the road are considered hazards

work page

[64] [64]

Pedestrian Crossing: •Evaluated only if both pedestrian legs and the road are visible; otherwise, Not Applicable

work page

[65] [65]

•Gravel roads or roads without visible lane markings are considered hazards

Road Condition: •Evaluated as long as part of the road is visible, even if blurred. •Gravel roads or roads without visible lane markings are considered hazards. •Vehicles not on a road (e.g., on grass) are Not Applicable

work page

[66] [66]

•Vans and buses are evaluated only if obvious cargo is present on top or strapped to the vehicle

Vehicle Load: •All trucks are always evaluated. •Vans and buses are evaluated only if obvious cargo is present on top or strapped to the vehicle. •Vehicles with cargo are always evaluated; vehicles with- out cargo are Not Applicable. B. Construction Domain

work page

[67] [67]

Crane Use: •Assumed compliant if a crane (or part of it) is visible, unless there is a clear violation

work page

[68] [68]

Fire Risk: •Assumed violated if protective equipment rules are not met, even if fire is handled safely

work page

[69] [69]

Ladder Use: •No assumptions made

work page

[70] [70]

Wearing only a high-visibility vest is a violation

Protective Equipment: •Considered compliant if the worker/operator wears at least a helmet. Wearing only a high-visibility vest is a violation. •Exceptions: –Firefighters, who may have different uniforms and may not require a helmet. –If a smoke hazard is present (excluding cigarette smoke) and the worker lacks a breathing mask, it is considered a violati...

work page

[71] [71]

•Wooden frames are not considered scaffolding

Scaffold Risk: •Label based on the presence of scaffolding, not necessity. •Wooden frames are not considered scaffolding. •If scaffolding is required but not visible, label as Not Applicable. C. Warehouse Domain

work page

[72] [72]

•Picking up items do not close to waist level is a violation

Ergonomic Lifting: •All lifted items are assumed heavy; items carried above shoulder level are violations, including when passed between two people. •Picking up items do not close to waist level is a violation. •Signs of back pain (holding back, grimacing) indicate a violation, even if ergonomics are correct. 4

Forklift Use:
• All accidents involving a forklift are considered hazards.
• Operator distraction (e.g., phone use, talking) is a violation.
• Evaluated if forklift and operators are present; during accidents, even vacant forklifts are considered a violation.
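
The Forklift Use rule contains one ordering subtlety. An accident makes the rule apply even to a vacant forklift, whereas in normal operation both a forklift and an operator must be present. A minimal Python sketch of that ordering follows. The `ForkliftScene` fields are hypothetical, and treating a scene with no forklift as Not Applicable is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    COMPLIED = "Complied"
    VIOLATED = "Violated"
    NOT_APPLICABLE = "Not Applicable"


@dataclass
class ForkliftScene:
    # Hypothetical scene facts, not the CompliVision schema.
    forklift_present: bool
    operator_present: bool
    accident: bool              # any accident involving the forklift
    operator_distracted: bool   # e.g. using a phone or talking


def forklift_use(scene: ForkliftScene) -> Label:
    """Apply the Forklift Use labelling rule to one scene."""
    # Assumption: without a forklift the rule has nothing to evaluate.
    if not scene.forklift_present:
        return Label.NOT_APPLICABLE
    # Accidents are hazards; during an accident even a vacant forklift counts.
    if scene.accident:
        return Label.VIOLATED
    # Outside accidents, the rule needs both a forklift and an operator.
    if not scene.operator_present:
        return Label.NOT_APPLICABLE
    if scene.operator_distracted:
        return Label.VIOLATED
    return Label.COMPLIED
```

Checking `accident` before `operator_present` is what lets a vacant forklift be labelled Violated during an accident but Not Applicable otherwise.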

Ladder Use:
• Not having both hands on the ladder is not always a violation; the user is assumed to be stationary if carrying items.
• Reaching or carrying items above shoulder level on a ladder is a violation.
• Users must face the ladder steps when climbing; facing any direction on a platform is allowed.
• Using non-ladders as ladders is a violation.
• Step ladders are consid...
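
The warehouse Ladder Use rule distinguishes climbing from standing on a platform, a distinction that is easy to lose when the conditions are written as prose. The Python sketch below encodes only the unambiguous clauses, under stated assumptions. The `LadderObservation` fields are hypothetical. The hands-on-ladder clause ("not always a violation") is not modelled, because the text does not say when it does become one. The truncated step-ladder clause is omitted, and the sketch assumes the image already shows someone on a ladder or ladder substitute.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    COMPLIED = "Complied"
    VIOLATED = "Violated"
    NOT_APPLICABLE = "Not Applicable"


@dataclass
class LadderObservation:
    # Hypothetical cues, not the CompliVision schema.
    object_is_ladder: bool   # False if e.g. a box or shelf is used as a ladder
    climbing: bool           # False while standing on the ladder's platform
    facing_steps: bool
    above_shoulder: bool     # reaching or carrying items above shoulder level


def ladder_use(obs: LadderObservation) -> Label:
    """Apply the unambiguous clauses of the warehouse Ladder Use rule."""
    # Using something other than a ladder as a ladder is a violation.
    if not obs.object_is_ladder:
        return Label.VIOLATED
    # Reaching or carrying above shoulder level on a ladder is a violation.
    if obs.above_shoulder:
        return Label.VIOLATED
    # Facing the steps is required only while climbing; on a platform any
    # direction is allowed.
    if obs.climbing and not obs.facing_steps:
        return Label.VIOLATED
    # Assumption: hands-on-ladder and the step-ladder clause are not checked.
    return Label.COMPLIED
```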

Protective Equipment:
• Workers must wear at least a safety helmet; its absence is a violation even if the worker wears a high-visibility vest.

Surface Condition:
• Single boxes on the floor are violations.
• White backgrounds/floors are Not Applicable.
• Standing on improper surfaces (boxes, ladders, or other items) is a violation.

APPENDIX C PROMPT TEMPLATES

A. Task-focused Variants

T1 (Inline Classification Instruction): Classify the image into exactly one of "Complied", "Violated", or "Not Applicable" for compliance with the rule set.

T2 (Constrained Output Instruction): Classify the image according to the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

T3 (T2 – Alt Wording Instruction): Classify the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

T4 (T3 – Analysis-focused Instruction): Analyze the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

B. Classification-focused Variants
