
arxiv: 2605.23304 · v1 · pith:N35FCT2Onew · submitted 2026-05-22 · 💻 cs.CV

General Hazard Detection

Pith reviewed 2026-05-25 05:09 UTC · model grok-4.3

classification 💻 cs.CV
keywords hazard detection · vision-language models · rule-based compliance · active learning · CompliVision dataset · safety rules · ISO standards · generalization

The pith

Expressing safety requirements as language rules from regulations decouples hazard detection from image examples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that hazards are abstract concepts best captured by logical rules rather than fixed image categories, so existing detection systems fail on noisy data, shifting definitions, and novel cases. It introduces the CompliVision dataset of 3006 images across traffic, construction, and warehouse scenes, each checked against specific language rules drawn from domain regulations and ISO standards, plus natural-language explanations of the visual evidence. A baseline active-learning framework then uses LLaVA-style vision-language models with human-in-the-loop feedback to assess rule compliance. If the approach holds, hazard systems could handle evolving standards and unseen scenarios without retraining on new labeled examples for each context.

Core claim

Hazard assessment reduces to checking compliance with language-based safety rules grounded in authoritative regulations and ISO standards rather than learning from predefined image categories; the CompliVision dataset supplies 3006 images annotated for rule compliance and supporting visual evidence, while an active-learning pipeline combining LLaVA visual reasoning with human feedback enables generalization beyond the training distribution.

What carries the argument

Language-based safety rules (grounded in regulations and ISO standards) that replace image-category labels, evaluated by an active-learning loop of LLaVA-based visual reasoning plus human-in-the-loop refinement.
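
One round of such a loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Prediction` record, the confidence score, the 0.7 threshold, and the `ask_human` callback are hypothetical names introduced here, and the paper's actual criterion for flagging weak samples is not given on this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Prediction:
    image_id: str
    label: str         # hypothetical label set, e.g. "compliant" / "non_compliant"
    confidence: float  # assumed score in [0, 1]; not defined by the paper


def active_learning_round(
    preds: List[Prediction],
    ask_human: Callable[[Prediction], str],
    threshold: float = 0.7,  # assumed cut-off for a "weak" sample
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split one round of predictions into pseudo labels and manual labels."""
    pseudo: Dict[str, str] = {}
    manual: Dict[str, str] = {}
    for p in preds:
        if p.confidence >= threshold:
            # Confident predictions are kept as pseudo labels.
            pseudo[p.image_id] = p.label
        else:
            # Weak predictions are routed to a human annotator.
            manual[p.image_id] = ask_human(p)
    return pseudo, manual
```

Both dictionaries would then feed the next fine-tuning round. Lowering `threshold` reduces annotation effort at the cost of noisier pseudo labels.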

If this is right

  • Hazard definitions can be updated by editing the language rules without recollecting image examples.
  • The same rule set applies across traffic, construction, and warehouse domains without domain-specific retraining.
  • Active learning reduces the volume of human annotations needed compared with fully supervised category-based detectors.
  • Natural-language explanations of visual evidence become a built-in output of the assessment process.
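
The first consequence above can be made concrete with a small sketch in which a rule is plain data and the model prompt is derived from it. The `SafetyRule` class, the `build_prompt` function, the answer labels, and the revised rule wording are assumptions made for illustration; only the original ladder rule text and its clause number come from the rule list extracted further down this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SafetyRule:
    rule_id: str  # hypothetical identifier
    source: str   # clause of the regulation the rule is drawn from
    text: str     # the rule in natural language


def build_prompt(rule: SafetyRule) -> str:
    """Render a rule as a compliance question for a vision-language model."""
    return (
        f"Rule [{rule.source}]: {rule.text}\n"
        "Does the image comply with this rule? "
        "Answer Compliant, Non-compliant, or Not Applicable, "
        "then describe the visual evidence."
    )


ladder_rule = SafetyRule(
    rule_id="ladder-facing",
    source="1926.1053(b)(20)",
    text="When ascending or descending a ladder, the user shall face the ladder.",
)

# Updating the hazard definition is a text edit; no images are recollected.
# The revised wording below is hypothetical.
revised_rule = replace(
    ladder_rule,
    text="When ascending or descending a ladder, the user shall face the "
         "ladder and keep both feet on the rungs.",
)

print(build_prompt(revised_rule))
```

Keeping the rule immutable and producing revisions with `replace` leaves an explicit record of each definition change.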

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The rule-decoupling pattern could be tested on other abstract safety or compliance concepts such as accessibility or environmental impact.
  • Direct linkage of the language rules to live regulatory databases would allow automatic propagation of definition changes into the detector.
  • The framework might be extended to video or 3D sensor streams by applying the same rule interpreter to temporal or spatial evidence.

Load-bearing premise

Vision-language models plus active learning and human feedback can correctly interpret fine-grained, context-dependent safety rules for hazards never seen during training.

What would settle it

A controlled test on a new domain or novel hazard scenario where the framework produces compliance judgments that systematically contradict expert rule application.
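
Such a test amounts to scoring the framework's judgments against expert labels on the held-out scenarios. A minimal sketch follows, assuming one expert label and one model label per image; the function names and label strings are hypothetical. Macro F1 and accuracy are used because they are the metrics named in the paper's Table IV header.

```python
from typing import List, Sequence


def macro_f1(expert: Sequence[str], model: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    if len(expert) != len(model) or not expert:
        raise ValueError("label lists must be non-empty and of equal length")
    labels = sorted(set(expert) | set(model))
    f1_scores: List[float] = []
    for label in labels:
        tp = sum(1 for e, m in zip(expert, model) if e == label and m == label)
        fp = sum(1 for e, m in zip(expert, model) if e != label and m == label)
        fn = sum(1 for e, m in zip(expert, model) if e == label and m != label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(expert: Sequence[str], model: Sequence[str]) -> float:
    """Fraction of images where the model agrees with the expert."""
    if len(expert) != len(model) or not expert:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(1 for e, m in zip(expert, model) if e == m) / len(expert)


# Toy example with made-up labels for four images.
expert_labels = ["compliant", "hazard", "compliant", "hazard"]
model_labels = ["compliant", "hazard", "hazard", "hazard"]
print(macro_f1(expert_labels, model_labels), accuracy(expert_labels, model_labels))
```

A systematic contradiction would show up as low scores concentrated in the new domain or the novel rules, rather than spread evenly across all scenarios.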

Figures

Figures reproduced from arXiv: 2605.23304 by CP Lim, David Nguyen, Hailing Zhou, Hendrik Zurlinden, Lei Wei, Saeid Nahavandi, Stephanie Ng, SueJen Looi.

Figure 1: Overview of the proposed approach for general hazard detection. Top panel: Key limitations of perception-level object detection models in hazard...
Figure 2: Illustration of the proposed framework in detail.
Figure 3: t-SNE Visualization of Embeddings for Training, Validation, and Test...
Figure 4: Word Clouds of Generated Explanations for Training, Validation, and...
Figure 5: t-SNE Embeddings with Rule-Compliance Regions in AL Round 0 vs Round 3.
Figure 6: Examples of hazard detection results across three application domains and classification types.
Figure 7: Demonstration of the feedback process: (1) the model generates a prediction and flags weak samples for human feedback. (2a) an example of poor...
Original abstract

Hazard, as an abstract concept, is typically defined through cognitive-level logical reasoning rather than concrete examples. In contrast, existing hazard detection systems rely on predefined hazard categories and require intensive collection of labelled examples within detection or classification architectures. This approach faces three fundamental challenges when addressing abstract safety concepts: (1) noisy and sparse training data, (2) dynamically evolving definitions that change across contexts and time, and (3) limited generalisation to unseen or novel scenarios. To address these limitations, we present the CompliVision dataset, the first general-purpose hazard dataset designed for rule-based compliance assessment, along with a baseline framework for hazard evaluation. Our key innovation is decoupling the hazard concept from image-based examples by expressing safety requirements through language-based rules. We ground our approach in authoritative domain regulations and ISO standards to define diverse hazard concepts across multiple domains. The CompliVision dataset comprises 3,006 images spanning traffic, construction, and warehouse environments, with each image annotated for compliance against specific safety rules, accompanied by natural language explanations highlighting the supporting visual evidence. To achieve robust generalisation, we develop an active learning framework to more effectively guide and refine vision-language models in assessing hazard compliance. While state-of-the-art VLMs demonstrate strong capabilities, they struggle with the fine-grained, context-dependent interpretation required for accurate safety assessment. We proposed a general hazard detection framework to address this limitation which combines LLaVA-based visual reasoning with with human-in-the-loop feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the CompliVision dataset of 3,006 images from traffic, construction, and warehouse domains, each annotated for compliance with language-based safety rules derived from ISO standards and regulations, along with natural language explanations. It proposes a baseline framework that decouples hazard detection from image examples by using LLaVA-based vision-language models, active learning, and human-in-the-loop feedback to assess rule compliance, claiming this addresses noisy data, evolving definitions, and limited generalization to novel scenarios where standard VLMs struggle with fine-grained, context-dependent rules.

Significance. If the active learning + HITL framework were shown to deliver reliable extrapolation on unseen rules and contexts, the work would be significant for shifting hazard detection toward authoritative, language-grounded standards rather than example-driven categories, with potential impact on safety-critical CV applications.

major comments (2)
  1. [Abstract] Abstract: the claim that the proposed LLaVA-based framework with active learning and human-in-the-loop feedback achieves 'robust generalisation' to novel hazard scenarios is unsupported; no accuracy metrics, baselines, ablations, or held-out evaluations on evolving or unseen rules are reported to substantiate improvement over standard VLMs.
  2. [Abstract] Abstract: the dataset is presented as enabling rule-based compliance assessment, yet no details on annotation protocol, rule-to-image mapping procedure, or validation of the natural language explanations are supplied, leaving the core data foundation for the generalization claim unevaluated.
minor comments (1)
  1. [Abstract] Abstract: duplicate 'with with' in the final sentence.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and will revise the manuscript accordingly.

  1. Referee: [Abstract] Abstract: the claim that the proposed LLaVA-based framework with active learning and human-in-the-loop feedback achieves 'robust generalisation' to novel hazard scenarios is unsupported; no accuracy metrics, baselines, ablations, or held-out evaluations on evolving or unseen rules are reported to substantiate improvement over standard VLMs.

    Authors: We agree that the abstract overstates the generalization capability. The current manuscript presents the active learning and HITL framework as a proposed baseline without reporting accuracy metrics, baselines, ablations, or held-out tests on unseen rules. We will revise the abstract to qualify or remove the 'robust generalisation' claim and add quantitative evaluations in the revised manuscript. revision: yes

  2. Referee: [Abstract] Abstract: the dataset is presented as enabling rule-based compliance assessment, yet no details on annotation protocol, rule-to-image mapping procedure, or validation of the natural language explanations are supplied, leaving the core data foundation for the generalization claim unevaluated.

    Authors: We acknowledge that the abstract does not include these details and that the manuscript would benefit from expanded description of the data creation process. We will add explicit sections covering the annotation protocol, rule-to-image mapping procedure, and validation steps for the natural language explanations in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity; method grounded in external ISO standards and regulations


The paper's derivation chain relies on expressing safety requirements via language-based rules drawn from authoritative external domain regulations and ISO standards, rather than any self-referential definitions, fitted parameters presented as predictions, or load-bearing self-citations. The CompliVision dataset and LLaVA-based active learning framework with human-in-the-loop feedback are introduced as responses to stated limitations of existing VLM approaches, with no equations or steps that reduce by construction to the paper's own inputs. This is a self-contained proposal against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Central claim rests on the domain assumption that safety rules from standards can be directly applied to visual scenes by VLMs; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption: Safety requirements can be accurately expressed through language-based rules from ISO standards and regulations and applied to images.
    Invoked to enable decoupling of hazard concept from image examples.



Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 3 internal anchors

  1. [1]

    Framework of automated construction-safety monitoring using cloud-enabled bim and ble mobile tracking sensors,

    J. Park, K. Kim, and Y. K. Cho, “Framework of automated construction-safety monitoring using cloud-enabled bim and ble mobile tracking sensors,” Journal of Construction Engineering and Management, vol. 143, no. 2, p. 05016019, 2017

  2. [2]

    Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,

    K. Yang and C. R. Ahn, “Inferring workplace safety hazards from the spatial patterns of workers’ wearable data,” Advanced Engineering Informatics, vol. 41, p. 100924, 2019

  3. [3]

    Real-time vision-based worker localization & hazard detection for construction,

    I. Jeelani, K. Asadi, H. Ramshankar, K. Han, and A. Albert, “Real-time vision-based worker localization & hazard detection for construction,” Automation in Construction, vol. 121, p. 103448, 2021

  4. [4]

    Safe Work Australia, Oct 2025. [Online]. Available: https://data.safeworkaustralia.gov.au/insights/key-whs-statistics-australia/latest-release

  5. [5]

    A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,

    A. M. Vukicevic, M. Petrovic, P. Milosevic, A. Peulic, K. Jovanovic, and A. Novakovic, “A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions,” Artificial Intelligence Review, vol. 57, no. 12, p. 319, 2024

  6. [6]

    Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,

    Z. Chen, H. Chen, M. Imani, R. Chen, and F. Imani, “Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces,” Expert Systems with Applications, vol. 265, p. 125769, 2025

  7. [7]

    Detection of personal protective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,

    V. S. K. Delhi, R. Sankarlal, and A. Thomas, “Detection of personal protective equipment (ppe) compliance on construction site using computer vision based deep learning techniques,” Frontiers in Built Environment, vol. 6, p. 136, 2020

  8. [8]

    Computer vision-based hazard identification of construction site using visual relationship detection and ontology,

    Y. Li, H. Wei, Z. Han, N. Jiang, W. Wang, and J. Huang, “Computer vision-based hazard identification of construction site using visual relationship detection and ontology,” Buildings, vol. 12, no. 6, p. 857, 2022

  9. [9]

    Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,

    A. Chharia, T. Ren, T. Furuhata, and K. Shimada, “Safe-construct: Redefining construction safety violation recognition as 3d multi-view engagement task,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5811–5820

  10. [10]

    Vision-language hazard reasoning for driver distraction and workload estimation,

    S. Ng, H. Zhou, A. Arogbonlo, C. P. Lim, and S. Nahavandi, “Vision-language hazard reasoning for driver distraction and workload estimation,” Electronics Letters, vol. 61, no. 1, p. e70466, 2025

  11. [11]

    Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,

    D. Gil and G. Lee, “Zero-shot monitoring of construction workers’ personal protective equipment based on image captioning,” Automation in Construction, vol. 164, p. 105470, 2024

  12. [12]

    Safety compliance checking of construction behaviors using visual question answering,

    Y. Ding, M. Liu, and X. Luo, “Safety compliance checking of construction behaviors using visual question answering,” Automation in Construction, vol. 144, p. 104580, 2022

  13. [13]

    Detection of wearing safety helmet for workers based on yolov4,

    L. Yunyun and W. Jiang, “Detection of wearing safety helmet for workers based on yolov4,” in 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI). IEEE, 2021, pp. 83–87

  14. [14]

    Detection of worker’s safety helmet and mask and identification of worker using deeplearning

    N. Kwak and D. Kim, “Detection of worker’s safety helmet and mask and identification of worker using deeplearning,” Computers, Materials & Continua, vol. 75, no. 1, pp. 1671–1686, 2023

  15. [15]

    Real-time road hazard information system,

    C. Pena-Caballero, D. Kim, A. Gonzalez, O. Castellanos, A. Cantu, and J. Ho, “Real-time road hazard information system,” Infrastructures, vol. 5, no. 9, p. 75, 2020

  16. [16]

    Fire detection method in smart city environments using a deep-learning-based approach,

    K. Avazov, M. Mukhiddinov, F. Makhmudov, and Y. I. Cho, “Fire detection method in smart city environments using a deep-learning-based approach,” Electronics, vol. 11, no. 1, p. 73, 2021. [Online]. Available: https://www.mdpi.com/2079-9292/11/1/7

  17. [17]

    Meta module network for compositional visual reasoning,

    W. Chen, Z. Gan, L. Li, Y. Cheng, W. Wang, and J. Liu, “Meta module network for compositional visual reasoning,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 655–664

  18. [18]

    Neural-symbolic vqa: Disentangling reasoning from vision and language understanding,

    K. Yi, J. Wu, C. Gan, A. Torralba, P. Kohli, and J. Tenenbaum, “Neural-symbolic vqa: Disentangling reasoning from vision and language understanding,” Advances in neural information processing systems, vol. 31, pp. 1039–1050, 2018

  19. [19]

    Gqa: A new dataset for real-world visual reasoning and compositional question answering,

    D. A. Hudson and C. D. Manning, “Gqa: A new dataset for real-world visual reasoning and compositional question answering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 6700–6709

  20. [20]

    Inferring and executing programs for visual reasoning,

    J. Johnson, B. Hariharan, L. Van Der Maaten, J. Hoffman, L. Fei-Fei, C. Lawrence Zitnick, and R. Girshick, “Inferring and executing programs for visual reasoning,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 3008–3017

  21. [21]

    Review of graph-based hazardous event detection methods for autonomous driving systems,

    D. Xiao, M. Dianati, W. G. Geiger, and R. Woodman, “Review of graph-based hazardous event detection methods for autonomous driving systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 5, pp. 4697–4715, 2023

  22. [22]

    Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,

    W. Fang, L. Ma, P. E. Love, H. Luo, L. Ding, and A. Zhou, “Knowledge graph for identifying hazards on construction sites: Integrating computer vision with ontology,” Automation in Construction, vol. 119, p. 103310, 2020

  23. [23]

    Construction safety knowledge graph integrating text and image information,

    W. Wu, Q. Yuan, Q. Chen, and Y. Cao, “Construction safety knowledge graph integrating text and image information,” in Proceedings of the 2023 6th International Conference on Information Management and Management Science, 2023, pp. 26–32

  24. [24]

    Hazard analysis: A deep learning and text mining framework for accident prevention,

    B. Zhong, X. Pan, P. E. Love, J. Sun, and C. Tao, “Hazard analysis: A deep learning and text mining framework for accident prevention,” Advanced Engineering Informatics, vol. 46, p. 101152, 2020

  25. [25]

    Deep learning safety concerns in automated driving perception,

    S. Abrecht, A. Hirsch, S. Raafatnia, and M. Woehrle, “Deep learning safety concerns in automated driving perception,” IEEE Transactions on Intelligent Vehicles, 2024

  26. [26]

    Construction-ppe: Personal protective equipment detection dataset,

    M. Dalvi, N. Singh, S. Bhingarde, and K. Chalke, “Construction-ppe: Personal protective equipment detection dataset,” January. [Online]. Available: https://docs.ultralytics.com/datasets/detect/construction-ppe/

  27. [27]

    Table IV: Zero-shot, fine-tuning, and active learning performance across 3 domains

    Table IV: Zero-shot, fine-tuning, and active learning performance across 3 domains. Columns: Domain; Method; AL Rounds; Num. Manual Labels; Accum. Manual Labels; No. Pseudo Labels; Training Samples; Annotation Saved (%); Avg Model Performance (Macro F1, Accuracy). First row: Traffic, Zero-shot, - 0 0 0 0 0 0...

  28. [28]

    Soda: A large-scale open site object detection dataset for deep learning in construction,

    R. Duan, H. Deng, M. Tian, Y. Deng, and J. Lin, “Soda: A large-scale open site object detection dataset for deep learning in construction,” Automation in Construction, vol. 142, p. 104499, 2022

  29. [29]

    Dataset and benchmark for detecting moving objects in construction sites,

    A. Xuehui, Z. Li, L. Zuguang, W. Chengzhi, L. Pengfei, and L. Zhiwei, “Dataset and benchmark for detecting moving objects in construction sites,” Automation in Construction, vol. 122, p. 103482, 2021

  30. [30]

    Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?

    X. Chen and Z. Zou, “Are large pre-trained vision language models effective construction safety inspectors?” arXiv preprint arXiv:2508.11011, 2025

  31. [31]

    CLEVRER: CoLlision Events for Video REpresentation and Reasoning

    K. Yi, C. Gan, Y. Li, P. Kohli, J. Wu, A. Torralba, and J. B. Tenenbaum, “Clevrer: Collision events for video representation and reasoning,” arXiv preprint arXiv:1910.01442, 2019

  32. [32]

    Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,

    T. Wang, S. Kim, J. Wenxuan, E. Xie, C. Ge, J. Chen, Z. Li, and P. Luo, “Deepaccident: A motion and accident prediction benchmark for v2x autonomous driving,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5599–5606

  33. [33]

    Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,

    Q. Kong, Y. Kawana, R. Saini, A. Kumar, J. Pan, T. Gu, Y. Ozao, B. Opra, Y. Sato, and N. Kobori, “Wts: A pedestrian-centric traffic video dataset for fine-grained spatial-temporal understanding,” in European Conference on Computer Vision. Springer, 2024, pp. 1–18

  34. [34]

    Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,

    H. M. Ahmad and A. Rahimi, “Sh17: A dataset for human safety and personal protective equipment detection in manufacturing industry,” arXiv preprint arXiv:2407.04590, 2024

  35. [35]

    Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,

    A. Elhanashi, S. Essahraui, P. Dini, and S. Saponara, “Early fire and smoke detection using deep learning: A comprehensive review of models, datasets, and challenges,” Applied Sciences, vol. 15, no. 18, p. 10255, 2025

  36. [36]

    Visual instruction tuning,

    H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” Advances in neural information processing systems, vol. 36, pp. 34892–34916, 2023

  37. [37]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in neural information processing systems, vol. 35, pp. 24824–24837, 2022

  38. [38]

    Llava-cot: Let vision language models reason step-by-step,

    G. Xu, P. Jin, Z. Wu, H. Li, Y. Song, L. Sun, and L. Yuan, “Llava-cot: Let vision language models reason step-by-step,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 2087–2098

  39. [39]

    Interactive semantic interventions for vlms: A human-in-the-loop investigation of vlm failure,

    L. Klein, K. Amara, C. T. Lüth, H. Strobelt, M. El-Assady, and P. F. Jaeger, “Interactive semantic interventions for vlms: A human-in-the-loop investigation of vlm failure,” in Neurips Safe Generative AI Workshop 2024, 2024

  40. [40]

    Active prompting of vision language models for human-in-the-loop classification and explanation of microscopy images,

    A. Kandiyana, P. R. Mouton, L. O. Hall, and D. Goldgof, “Active prompting of vision language models for human-in-the-loop classification and explanation of microscopy images,” in 2024 IEEE 37th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 2024, pp. 205–212

  41. [41]

    A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

    P. Sahoo, A. K. Singh, S. Saha, V. Jain, S. Mondal, and A. Chadha, “A systematic survey of prompt engineering in large language models,” arXiv preprint arXiv:2402.07927, 2024

  42. [42]

    Large vision-language models: A survey,

    H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Large vision-language models: A survey,” arXiv preprint arXiv:2402.14082, 2024

  43. [43]

    A survey of deep active learning,

    P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, B. B. Gupta, and X. Wang, “A survey of deep active learning,” ACM Computing Surveys, vol. 54, no. 9, pp. 1–40, 2021

  44. [44]

    Lora: Low-rank adaptation of large language models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “Lora: Low-rank adaptation of large language models.” ICLR, vol. 1, no. 2, p. 3, 2022

  45. [45]

    Llavanext: Improved reasoning, ocr, and world knowledge,

    H. Liu, C. Li, Y. Li, B. Li, Y. Zhang, S. Shen, and Y. J. Lee, “Llavanext: Improved reasoning, ocr, and world knowledge,” 2024

  46. [46]

    Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,

    Meta AI, “Llama 3.2: Revolutionizing edge ai and vision with open, customizable models,” Sep. 2024, accessed: 2025-11-29. [Online]. Available: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/

    Stephanie Ng received the M.DataSc. degree from the University of Melbourne in 2021 and completed the Ph.D. degree requirements in Engin...

  47. [47]

    [National Highway Traffic Safety Administration] •Driver to have proper control of a vehicle etc.: A person must not drive a vehicle if a person or an animal is in the driver’s lap

    Driving Distraction: •Distracted driving: Distracted driving is any activity that diverts attention from driving, including talking or texting on your phone, eating and drinking, talking to people in your vehicle, fiddling with the stereo, entertainment or navigation system — anything that takes your attention away from the task of safe driving. [National...

  48. [48]

    Traffic Rules: •Giving way at a pedestrian crossing: A driver must give way to any pedestrian on or entering a pedestrian crossing. [ROAD SAFETY ROAD RULES 2017 - REG 81 (2)] •Overtaking or passing a vehicle at a children’s crossing or pedestrian crossing: A driver approaching a children’s crossing, or pedestrian crossing, must not overtake or pass a vehi...

  49. [49]

    Pedestrian Crossing: •Crossing a road—general: A pedestrian crossing a road— (a) must cross by the shortest safe route; and (b) must not stay on the road longer than necessary to cross the road safely. [ROAD SAFETY ROAD RULES 2017 - REG 230 (1)] •Crossing a road at pedestrian lights: If the pedestrian lights show a red pedestrian light and the pedestrian ...

  50. [50]

    Road Condition: •Obligations of road users: A person who drives a motor vehicle on a highway must drive in a safe manner having regard to all the relevant factors. [ROAD SAFETY ACT 1986 - SECT 17A (1)] •The relevant factors include the following— (a) the physical characteristics of the road; (b) the prevailing weather conditions; (c) the level of visibili...

  51. [51]

    Vehicle Load: •Carrying goods in addition to a large indivisible item: A load-carrying vehicle must not carry more than 1 large indivisible item. [HEAVY VEHICLE (MASS, DIMENSION AND LOADING) NATIONAL REGULATION - SCHEDULE 8 Division 2 - Load-carrying vehicles 13 (1)] •Load restraint requirement: The following requirements apply to a vehicle that is car...

  52. [52]

    [1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of the hazard areas

    Crane Use: •The operator must not engage in any practice or activity that diverts his/her attention while actually engaged in operating the equipment, such as the use of cellular phones (other than when used for signal communications). [1926.1417(d)] •Erect and maintain control lines, warning lines, railings or similar barriers to mark the boundaries of...

  53. [53]

    No Smoking or Open Flame

    Fire Risk: •Smoking shall be prohibited at or in the vicinity of operations which constitute a fire hazard, and shall be conspicuously posted: “No Smoking or Open Flame.” [1926.151(a)(3)] •If the object to be welded, cut, or heated cannot be moved and if all the fire hazards cannot be removed, positive means shall be taken to confine the heat, sparks, and...

  54. [54]

    [1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear

    Ladder Use: •Ladders shall be used only on stable and level surfaces unless secured to prevent accidental displacement. [1926.1053(b)(6)] •The area around the top and bottom of ladders shall be kept clear. [1926.1053(b)(9)] •When ascending or descending a ladder, the user shall face the ladder. [1926.1053(b)(20)] •Each employee shall use at least one ha...

  55. [55]

    Protective Equipment: •Employees working in areas where there is a possible danger of head injury from impact, or from falling or flying objects, or from electrical shock and burns, shall be protected by protective helmets. [1926.100(a)] •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particl...

  56. [56]

    Scaffold Risk: •Each platform on all working levels of scaffolds shall be fully planked or decked between the front uprights and the guardrail supports [1926.451(b)(1)] •Guardrail systems shall be installed along all open sides and ends of platforms. [1926.451(g)(4)] •The top edge height of toprails or equivalent member on supported scaffolds shall be ins...

  57. [57]

    Never lift a heavy item above shoulder level

    Ergonomic Lifting: •Safe lifting involves: Holding the load close to your body at waist height. Never lift a heavy item above shoulder level. Never carry a load that obstructs your vision. [General Duty Clause, Section 5(a)(1)] •The following points should be considered: The start and finish height of the load should be a suitable level above the floor, t...

  58. [58]

    [29 CFR 1910.178(m)(2)] •All traffic regulations shall be observed, including authorized plant speed limits

    Forklift Use: •No person shall be allowed to stand or pass under the elevated portion of any truck, whether loaded or empty. [29 CFR 1910.178(m)(2)] •All traffic regulations shall be observed, including authorized plant speed limits. A safe distance shall be maintained approximately three truck lengths from the truck ahead, and the truck shall be kept u...

  59. [59]

    [29 CFR 1910.23(b)(13)]

    Ladder Use: •Ladders are used only on stable and level surfaces; [29 CFR 1910.23(c)(4)] •Each employee faces the ladder when climbing up or down it; [29 CFR 1910.23(b)(11)] •Each employee uses at least one hand to grasp the ladder when climbing up and down it; and [29 CFR 1910.23(b)(12)] •No employee carries any object or load that could cause the emplo...

  60. [60]

    Protective Equipment: •Each affected employee uses appropriate eye or face protection when exposed to eye or face hazards from flying particles, molten metal, liquid chemicals, acids or caustic liquids, chemical gases or vapors, or potentially injurious light radiation [29 CFR 1910.133(a)(1)] •Each affected employee wears a protective helmet when working ...

  61. [61]

    Surface Condition: •All places of employment, passageways, storerooms, service rooms, and walking-working surfaces are kept in a clean, orderly, and sanitary condition. [29 CFR 1910.22(a)(1)] •The floor of each workroom is maintained in a clean and, to the extent feasible, in a dry condition. When wet processes are used, drainage must be maintained and, t...

  62. [62]

    Driving Distraction: •No assumptions made

  63. [63]

    Traffic Rules: •Evaluated if the vehicle is traveling in its lane, moving in the same direction as traffic, or parked neatly in the correct orientation. •Not evaluated (Not Applicable) if lane markings are not visible or the road is gravel. •Vehicles traveling/parked on the emergency lane or on gravel next to the road are considered hazards

  64. [64]

    Pedestrian Crossing: •Evaluated only if both pedestrian legs and the road are visible; otherwise, Not Applicable

  65. [65]

    Road Condition: •Evaluated as long as part of the road is visible, even if blurred. •Gravel roads or roads without visible lane markings are considered hazards. •Vehicles not on a road (e.g., on grass) are Not Applicable

  66. [66]

    Vehicle Load: •All trucks are always evaluated. •Vans and buses are evaluated only if obvious cargo is present on top or strapped to the vehicle. •Vehicles with cargo are always evaluated; vehicles without cargo are Not Applicable.

    B. Construction Domain

  67. [67]

    Crane Use: •Assumed compliant if a crane (or part of it) is visible, unless there is a clear violation

  68. [68]

    Fire Risk: •Assumed violated if protective equipment rules are not met, even if fire is handled safely

  69. [69]

    Ladder Use: •No assumptions made

  70. [70]

    Protective Equipment: •Considered compliant if the worker/operator wears at least a helmet. Wearing only a high-visibility vest is a violation. •Exceptions: –Firefighters, who may have different uniforms and may not require a helmet. –If a smoke hazard is present (excluding cigarette smoke) and the worker lacks a breathing mask, it is considered a violati...

  71. [71]

    Scaffold Risk: •Label based on the presence of scaffolding, not necessity. •Wooden frames are not considered scaffolding. •If scaffolding is required but not visible, label as Not Applicable.

    C. Warehouse Domain

  72. [72]

    Ergonomic Lifting: •All lifted items are assumed heavy; items carried above shoulder level are violations, including when passed between two people. •Picking up items that are not close to waist level is a violation. •Signs of back pain (holding back, grimacing) indicate a violation, even if ergonomics are correct.

  73. [73]

    Forklift Use: •All accidents involving a forklift are considered hazards. •Operator distraction (e.g., phone use, talking) is a violation. •Evaluated if forklift and operators are present; during accidents, even vacant forklifts are considered a violation

  74. [74]

    Ladder Use: •Not always a violation if both hands are not on the ladder; assume user is stationary if carrying items. •Reaching or carrying items above shoulder level on a ladder is a violation. •Users must face ladder steps when climbing; facing any direction on a platform is allowed. •Using non-ladders as ladders is a violation. •Step ladders are consid...

  75. [75]

    Protective Equipment: •Workers must wear at least a safety helmet; absence is a violation even if wearing a high-visibility vest

  76. [76]

    Surface Condition: •Single boxes on the floor are violations. •White backgrounds/floors are Not Applicable. •Standing on improper surfaces (boxes, ladders, or other items) is a violation.

    APPENDIX C PROMPT TEMPLATES

    A. Task-focused Variants

  77. [77]

    T1 (Inline Classification Instruction): Classify the image into exactly one of "Complied", "Violated", or "Not Applicable" for compliance with the rule set

  78. [78]

    T2 (Constrained Output Instruction): Classify the image according to the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable"

  79. [79]

    T3 (T2 – Alt Wording Instruction): Classify the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable"

  80. [80]

    T4 (T3 – Analysis-focused Instruction): Analyze the image against the rule set. Respond with exactly one of: "Complied", "Violated", or "Not Applicable".

    B. Classification-focused Variants
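
    As an illustration of how the task-focused templates T1 to T4 above might be used, the following minimal Python sketch pairs a rule set with one instruction string and maps a free-text model response back onto the three labels. The instruction strings are copied from the templates; everything else (the names `TEMPLATES`, `build_prompt`, and `parse_label`, the "Rule set:" layout, and the rule that an ambiguous response yields `None`) is an assumption made for illustration and is not taken from the paper. No model is called here.

```python
import re

# The three output labels named in templates T1-T4.
LABELS = ("Complied", "Violated", "Not Applicable")

_ONE_OF = '"Complied", "Violated", or "Not Applicable"'

# Instruction strings copied from the task-focused templates.
TEMPLATES = {
    "T1": f"Classify the image into exactly one of {_ONE_OF} "
          "for compliance with the rule set",
    "T2": "Classify the image according to the rule set. "
          f"Respond with exactly one of: {_ONE_OF}",
    "T3": "Classify the image against the rule set. "
          f"Respond with exactly one of: {_ONE_OF}",
    "T4": "Analyze the image against the rule set. "
          f"Respond with exactly one of: {_ONE_OF}",
}


def build_prompt(template_id, rules):
    """Combine rule bullets with an instruction (the layout is an assumption)."""
    rule_block = "\n".join(f"- {rule}" for rule in rules)
    return f"Rule set:\n{rule_block}\n\n{TEMPLATES[template_id]}"


def parse_label(response):
    """Return the label if exactly one appears in the response, else None."""
    found = [
        label
        for label in LABELS
        if re.search(rf"\b{re.escape(label)}\b", response, re.IGNORECASE)
    ]
    return found[0] if len(found) == 1 else None
```

    For example, `build_prompt("T2", ["Workers must wear at least a safety helmet"])` yields a text prompt to send alongside the image, and `parse_label` rejects responses that name zero labels or more than one, so that malformed outputs can be handled separately.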

Showing first 80 references.