pith. sign in

arxiv: 2604.19559 · v1 · submitted 2026-04-21 · 💻 cs.AI · cs.CL· cs.LG

Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

Pith reviewed 2026-05-10 02:40 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.LG
keywords construction safetyheat stresswearable sensorsLSTM networksattention mechanismsmachine learningphysiological datapredictive modeling
0
0 comments X

The pith

An attention-based LSTM model predicts heat stress in construction workers from smartwatch data with 95.4 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Construction workers in hot environments risk heat stress, but few tools turn real-time body data into safety warnings. Researchers collected heart rate, heart rate variability, and oxygen levels from 19 workers in Saudi Arabia using Garmin smartwatches. They trained a baseline LSTM and an attention-based LSTM to forecast heat stress episodes. The attention version reached 95.40 percent test accuracy with precision, recall, and F1 scores of 0.982, cutting errors versus the baseline. This setup supports embedding predictions into connected safety platforms and building information models for faster responses on site.

Core claim

By monitoring physiological metrics with wearable devices, the authors show that an attention-based long short-term memory network can classify heat stress among construction workers at 95.40% accuracy, delivering results interpretable enough for practical safety applications.

What carries the argument

Attention-based LSTM model that weights important segments of time-series physiological signals to predict heat stress.

Load-bearing premise

Signals recorded from nineteen workers in a single region provide enough variety to train a predictor that works for other people and hotter or different job sites.

What would settle it

Apply the trained model to physiological data from construction workers in a different country or climate and check whether accuracy stays near 95 percent or falls sharply.

Figures

Figures reproduced from arXiv: 2604.19559 by Amir Khan, Syed Sajid Ullah.

Figure 1
Figure 1. Figure 1: Overall research flow of the proposed study. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Heat stress prediction framework from wearable data collection to model evaluation. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Experimental procedure, monitored tasks, and two-session wearable data collection protocol. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Confusion matrix of the baseline LSTM model. The model shows [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Confusion matrix of the proposed LSTM-AM model. Compared with [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: ROC curves for the baseline LSTM and LSTM-AM models, with [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
read the original abstract

Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics such as heart rate, HRV, and oxygen saturation, the attention-based model outperformed the baseline, achieving 95.40% testing accuracy and significantly reducing false positives and negatives. With precision, recall, and F1 scores of 0.982, this approach not only improves predictive performance but also offers interpretable results suitable for integration into IoT-enabled safety systems and BIM dashboards, advancing proactive, informatics-driven safety management in the construction industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper develops and evaluates LSTM and attention-based LSTM models to predict heat stress from wearable physiological signals (heart rate, HRV, oxygen saturation) collected via Garmin Vivosmart 5 devices from 19 construction workers in Saudi Arabia. It reports that the attention-based model achieves 95.40% test accuracy, precision/recall/F1 of 0.982, outperforms the baseline LSTM, reduces false positives/negatives, and provides interpretable outputs suitable for IoT/BIM safety integration.

Significance. If the performance claims hold under rigorous validation, the work could support real-time, data-driven heat-stress alerts in construction, advancing informatics-based safety management. The attention mechanism's potential for interpretability is a positive feature for practical deployment. However, the small single-site sample and missing methodological details currently limit claims of broad applicability or superiority.

major comments (3)
  1. [Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.
  2. [Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.
  3. [Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it stated the total number of samples or recording duration alongside the participant count.
  2. [Abstract] Model architecture details (e.g., number of layers, attention implementation, input sequence length) are not summarized even at a high level, which hinders reproducibility assessment.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating where revisions will be made to address the concerns.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.

    Authors: We agree that the absence of label-generation details in the abstract is a significant omission, as it affects the interpretability of the reported metrics. The full manuscript (Section 3.2) specifies that labels were assigned via a hybrid rule combining worker self-reports of symptoms with WBGT index thresholds calibrated for construction work in hot environments. We will revise the abstract to include a concise statement of this labeling procedure. revision: yes

  2. Referee: [Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.

    Authors: We acknowledge that these methodological details were not summarized in the abstract, which limits assessment of robustness. The manuscript employs a stratified 70/30 train-test split, 5-fold cross-validation within the training portion for hyperparameter selection via grid search, SMOTE for class balancing, and McNemar's test to evaluate improvement over the baseline LSTM. We will add a brief overview of these procedures to the abstract and expand the corresponding description in the methods section. revision: yes

  3. Referee: [Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.

    Authors: We recognize that the single-site, 19-worker dataset constitutes a genuine limitation for broad generalization claims. To strengthen the evaluation, we will add leave-one-worker-out cross-validation results in the revised manuscript. We will also moderate the language regarding immediate readiness for IoT/BIM deployment, framing the work as a pilot study. However, external validation on additional sites and climates cannot be performed with the existing data. revision: partial

standing simulated objections not resolved
  • External validation on new sites, workers, and climates, which would require collection of additional data not available in the current study.

Circularity Check

0 steps flagged

No circularity: standard empirical ML evaluation on held-out test data

full rationale

The paper describes collecting physiological signals from 19 workers, training LSTM and attention-LSTM models, and reporting accuracy/precision/recall/F1 on a testing set. No derivation chain, equations, or self-referential definitions are present. Performance figures are presented as direct empirical outcomes rather than reductions of fitted parameters or self-cited premises. Generalization limits to new workers/sites are a separate external-validity issue, not a circularity in the reported chain.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The claim rests on the unstated assumptions that the 19-worker dataset is representative, that the chosen physiological features are sufficient proxies for heat stress, and that standard deep-learning training produces a generalizable predictor. No new entities are postulated.

free parameters (2)
  • LSTM and attention hyperparameters
    Number of layers, hidden units, learning rate, attention heads, and regularization terms are chosen or tuned on the training data.
  • Train-test split ratio and random seed
    Determines which 19-worker sequences end up in the reported test set.
axioms (1)
  • domain assumption Garmin Vivosmart 5 readings of heart rate, HRV, and SpO2 are reliable and sufficient indicators of impending heat stress
    Invoked by the choice of input features and the decision to treat the resulting time series as labeled training data.

pith-pipeline@v0.9.0 · 5448 in / 1549 out tokens · 44970 ms · 2026-05-10T02:40:56.863400+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Worker safety in the construction industry under extreme heat,

    J. Fulcheret al., “Worker safety in the construction industry under extreme heat,”Journal of Occupational Health and Safety, 2024

  2. [2]

    Heat stress and its impact on construction workers’ health: A review,

    T. Ikeda, H. Tanaka, and K. Nakamura, “Heat stress and its impact on construction workers’ health: A review,”Safety Science, vol. 137, p. 105169, 2021

  3. [3]

    Application of wearable biosensors to construction sites: Assessing workers’ stress,

    H. Jebelli, S. Lee, and B. Choi, “Application of wearable biosensors to construction sites: Assessing workers’ stress,”Journal of Construction Engineering and Management, vol. 145, no. 4, p. 04019079, 2019

  4. [4]

    Wearable sensors and physiological data analytics for occupational health monitoring in construction,

    W. Umer, H. Li, S. Anweret al., “Wearable sensors and physiological data analytics for occupational health monitoring in construction,” Automation in Construction, vol. 123, p. 103456, 2022

  5. [5]

    Deep learning-based networks for automated recognition of awkward postures using wearable sensors,

    M. F. Antwi-Afariet al., “Deep learning-based networks for automated recognition of awkward postures using wearable sensors,”Automation in Construction, vol. 136, p. 104181, 2022

  6. [6]

    Deep learning-based mental fatigue classification using eeg in construction equipment operators,

    I. Mehmood, H. Li, W. Umeret al., “Deep learning-based mental fatigue classification using eeg in construction equipment operators,”Advanced Engineering Informatics, vol. 56, p. 101978, 2023

  7. [7]

    Deep learning for electroen- cephalogram (eeg) classification tasks: a review,

    A. Craik, Y . He, and J. Contreras-Vidal, “Deep learning for electroen- cephalogram (eeg) classification tasks: a review,”Journal of Neural Engineering, vol. 16, no. 3, p. 031001, 2019

  8. [8]

    A large-scale open site object detection dataset for deep learning in construction,

    D. Zhaoet al., “A large-scale open site object detection dataset for deep learning in construction,”Automation in Construction, vol. 142, p. 104499, 2022

  9. [9]

    Dataset of manually classified images obtained from a construction site,

    A. Saviozzi, A. Luna, D. C ´ardenas-Salas, M. Vergara, and G. Urday, “Dataset of manually classified images obtained from a construction site,”Data in Brief, vol. 42, p. 108042, 2022

  10. [10]

    Deep learning methods for eeg neural classifica- tion,

    S. Nakagome, A. Craik, S. Ravindran, Y . He, J. Cruz-Garza, and J. Contreras-Vidal, “Deep learning methods for eeg neural classifica- tion,”Handbook of Neuroengineering, 2022

  11. [11]

    A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,

    X. Xing, H. Li, J. Li, B. Zhong, H. Luo, and M. Skitmore, “A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,”Automation in Construction, vol. 105, p. 102836, 2019

  12. [12]

    Eeg-based workers’ stress recognition at construction sites,

    J. Liet al., “Eeg-based workers’ stress recognition at construction sites,” Automation in Construction, vol. 93, p. 315–324, 2019

  13. [13]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” inNeural Computation, vol. 9, no. 8. MIT Press, 1997, pp. 1735–1780

  14. [14]

    Automated ergonomic risk assessment using vision- based posture classification,

    J. Seo and S. Lee, “Automated ergonomic risk assessment using vision- based posture classification,”Automation in Construction, vol. 128, p. 103725, 2021

  15. [15]

    Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,

    S. Das, J. Maiti, and O. Krishna, “Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,” International Journal of Industrial Ergonomics, vol. 80, p. 103017, 2020

  16. [16]

    Bearing fault detection by one-dimensional convolutional neural networks,

    L. Eren, “Bearing fault detection by one-dimensional convolutional neural networks,”Mathematical Problems in Engineering, 2017

  17. [17]

    How to fine-tune bert for text classification?arXiv preprint arXiv:1905.05583, 2019

    N. Kaji, T. Sato, and N. Inoue, “Attention-based lstm for clinical time series classification,”arXiv preprint arXiv:1905.05583, 2019

  18. [18]

    Interpretable attention mecha- nisms for deep learning in icu monitoring,

    A. Gandin, G. Banfi, M. Grassiet al., “Interpretable attention mecha- nisms for deep learning in icu monitoring,”Nature Scientific Reports, vol. 11, p. 12345, 2021

  19. [19]

    Automatic driver stress level classification using multimodal deep learning,

    M. Rastgoo, B. Nakisa, F. Maire, A. Rakotonirainy, and V . Chandran, “Automatic driver stress level classification using multimodal deep learning,”Expert Systems with Applications, vol. 138, p. 112793, 2019

  20. [20]

    Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,

    J. Zhao and E. Obonyo, “Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,”Advanced Engineering Informatics, vol. 46, p. 101177, 2020

  21. [21]

    Stress classification using brain signals based on lstm network,

    N. Phutela, D. Relan, G. Gabrani, P. Kumaraguru, and M. Samuel, “Stress classification using brain signals based on lstm network,”Com- putational Intelligence and Neuroscience, vol. 2022, p. 7607592, 2022

  22. [22]

    Time series classification using multi-channels deep convolutional neural networks,

    Y . Zheng, Q. Liu, E. Chen, Y . Ge, and J. Zhao, “Time series classification using multi-channels deep convolutional neural networks,”International Conference on Web-Age Information Management, p. 298–310, 2014

  23. [23]

    Scoping review of eeg studies in construction safety,

    Y . Zhang, M. Zhang, and Q. Fang, “Scoping review of eeg studies in construction safety,”International Journal of Environmental Research and Public Health, vol. 16, p. 4146, 2019