Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics
Pith reviewed 2026-05-10 02:40 UTC · model grok-4.3
The pith
An attention-based LSTM model predicts heat stress in construction workers from smartwatch data with 95.4 percent accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By monitoring physiological metrics with wearable devices, the authors show that an attention-based long short-term memory network can classify heat stress among construction workers at 95.40% accuracy, delivering results interpretable enough for practical safety applications.
What carries the argument
Attention-based LSTM model that weights important segments of time-series physiological signals to predict heat stress.
Load-bearing premise
Signals recorded from nineteen workers in a single region provide enough variety to train a predictor that works for other people and hotter or different job sites.
What would settle it
Apply the trained model to physiological data from construction workers in a different country or climate and check whether accuracy stays near 95 percent or falls sharply.
Figures
read the original abstract
Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics such as heart rate, HRV, and oxygen saturation, the attention-based model outperformed the baseline, achieving 95.40% testing accuracy and significantly reducing false positives and negatives. With precision, recall, and F1 scores of 0.982, this approach not only improves predictive performance but also offers interpretable results suitable for integration into IoT-enabled safety systems and BIM dashboards, advancing proactive, informatics-driven safety management in the construction industry.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops and evaluates LSTM and attention-based LSTM models to predict heat stress from wearable physiological signals (heart rate, HRV, oxygen saturation) collected via Garmin Vivosmart 5 devices from 19 construction workers in Saudi Arabia. It reports that the attention-based model achieves 95.40% test accuracy, precision/recall/F1 of 0.982, outperforms the baseline LSTM, reduces false positives/negatives, and provides interpretable outputs suitable for IoT/BIM safety integration.
Significance. If the performance claims hold under rigorous validation, the work could support real-time, data-driven heat-stress alerts in construction, advancing informatics-based safety management. The attention mechanism's potential for interpretability is a positive feature for practical deployment. However, the small single-site sample and missing methodological details currently limit claims of broad applicability or superiority.
major comments (3)
- [Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.
- [Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.
- [Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.
minor comments (2)
- [Abstract] The abstract would be clearer if it stated the total number of samples or recording duration alongside the participant count.
- [Abstract] Model architecture details (e.g., number of layers, attention implementation, input sequence length) are not summarized even at a high level, which hinders reproducibility assessment.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating where revisions will be made to address the concerns.
read point-by-point responses
-
Referee: [Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.
Authors: We agree that the absence of label-generation details in the abstract is a significant omission, as it affects the interpretability of the reported metrics. The full manuscript (Section 3.2) specifies that labels were assigned via a hybrid rule combining worker self-reports of symptoms with WBGT index thresholds calibrated for construction work in hot environments. We will revise the abstract to include a concise statement of this labeling procedure. revision: yes
-
Referee: [Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.
Authors: We acknowledge that these methodological details were not summarized in the abstract, which limits assessment of robustness. The manuscript employs a stratified 70/30 train-test split, 5-fold cross-validation within the training portion for hyperparameter selection via grid search, SMOTE for class balancing, and McNemar's test to evaluate improvement over the baseline LSTM. We will add a brief overview of these procedures to the abstract and expand the corresponding description in the methods section. revision: yes
-
Referee: [Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.
Authors: We recognize that the single-site, 19-worker dataset constitutes a genuine limitation for broad generalization claims. To strengthen the evaluation, we will add leave-one-worker-out cross-validation results in the revised manuscript. We will also moderate the language regarding immediate readiness for IoT/BIM deployment, framing the work as a pilot study. However, external validation on additional sites and climates cannot be performed with the existing data. revision: partial
- External validation on new sites, workers, and climates, which would require collection of additional data not available in the current study.
Circularity Check
No circularity: standard empirical ML evaluation on held-out test data
full rationale
The paper describes collecting physiological signals from 19 workers, training LSTM and attention-LSTM models, and reporting accuracy/precision/recall/F1 on a testing set. No derivation chain, equations, or self-referential definitions are present. Performance figures are presented as direct empirical outcomes rather than reductions of fitted parameters or self-cited premises. Generalization limits to new workers/sites are a separate external-validity issue, not a circularity in the reported chain.
Axiom & Free-Parameter Ledger
free parameters (2)
- LSTM and attention hyperparameters
- Train-test split ratio and random seed
axioms (1)
- domain assumption Garmin Vivosmart 5 readings of heart rate, HRV, and SpO2 are reliable and sufficient indicators of impending heat stress
Reference graph
Works this paper leans on
-
[1]
Worker safety in the construction industry under extreme heat,
J. Fulcheret al., “Worker safety in the construction industry under extreme heat,”Journal of Occupational Health and Safety, 2024
work page 2024
-
[2]
Heat stress and its impact on construction workers’ health: A review,
T. Ikeda, H. Tanaka, and K. Nakamura, “Heat stress and its impact on construction workers’ health: A review,”Safety Science, vol. 137, p. 105169, 2021
work page 2021
-
[3]
Application of wearable biosensors to construction sites: Assessing workers’ stress,
H. Jebelli, S. Lee, and B. Choi, “Application of wearable biosensors to construction sites: Assessing workers’ stress,”Journal of Construction Engineering and Management, vol. 145, no. 4, p. 04019079, 2019
work page 2019
-
[4]
W. Umer, H. Li, S. Anweret al., “Wearable sensors and physiological data analytics for occupational health monitoring in construction,” Automation in Construction, vol. 123, p. 103456, 2022
work page 2022
-
[5]
Deep learning-based networks for automated recognition of awkward postures using wearable sensors,
M. F. Antwi-Afariet al., “Deep learning-based networks for automated recognition of awkward postures using wearable sensors,”Automation in Construction, vol. 136, p. 104181, 2022
work page 2022
-
[6]
Deep learning-based mental fatigue classification using eeg in construction equipment operators,
I. Mehmood, H. Li, W. Umeret al., “Deep learning-based mental fatigue classification using eeg in construction equipment operators,”Advanced Engineering Informatics, vol. 56, p. 101978, 2023
work page 2023
-
[7]
Deep learning for electroen- cephalogram (eeg) classification tasks: a review,
A. Craik, Y . He, and J. Contreras-Vidal, “Deep learning for electroen- cephalogram (eeg) classification tasks: a review,”Journal of Neural Engineering, vol. 16, no. 3, p. 031001, 2019
work page 2019
-
[8]
A large-scale open site object detection dataset for deep learning in construction,
D. Zhaoet al., “A large-scale open site object detection dataset for deep learning in construction,”Automation in Construction, vol. 142, p. 104499, 2022
work page 2022
-
[9]
Dataset of manually classified images obtained from a construction site,
A. Saviozzi, A. Luna, D. C ´ardenas-Salas, M. Vergara, and G. Urday, “Dataset of manually classified images obtained from a construction site,”Data in Brief, vol. 42, p. 108042, 2022
work page 2022
-
[10]
Deep learning methods for eeg neural classifica- tion,
S. Nakagome, A. Craik, S. Ravindran, Y . He, J. Cruz-Garza, and J. Contreras-Vidal, “Deep learning methods for eeg neural classifica- tion,”Handbook of Neuroengineering, 2022
work page 2022
-
[11]
X. Xing, H. Li, J. Li, B. Zhong, H. Luo, and M. Skitmore, “A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,”Automation in Construction, vol. 105, p. 102836, 2019
work page 2019
-
[12]
Eeg-based workers’ stress recognition at construction sites,
J. Liet al., “Eeg-based workers’ stress recognition at construction sites,” Automation in Construction, vol. 93, p. 315–324, 2019
work page 2019
-
[13]
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” inNeural Computation, vol. 9, no. 8. MIT Press, 1997, pp. 1735–1780
work page 1997
-
[14]
Automated ergonomic risk assessment using vision- based posture classification,
J. Seo and S. Lee, “Automated ergonomic risk assessment using vision- based posture classification,”Automation in Construction, vol. 128, p. 103725, 2021
work page 2021
-
[15]
Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,
S. Das, J. Maiti, and O. Krishna, “Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,” International Journal of Industrial Ergonomics, vol. 80, p. 103017, 2020
work page 2020
-
[16]
Bearing fault detection by one-dimensional convolutional neural networks,
L. Eren, “Bearing fault detection by one-dimensional convolutional neural networks,”Mathematical Problems in Engineering, 2017
work page 2017
-
[17]
How to fine-tune bert for text classification?arXiv preprint arXiv:1905.05583, 2019
N. Kaji, T. Sato, and N. Inoue, “Attention-based lstm for clinical time series classification,”arXiv preprint arXiv:1905.05583, 2019
-
[18]
Interpretable attention mecha- nisms for deep learning in icu monitoring,
A. Gandin, G. Banfi, M. Grassiet al., “Interpretable attention mecha- nisms for deep learning in icu monitoring,”Nature Scientific Reports, vol. 11, p. 12345, 2021
work page 2021
-
[19]
Automatic driver stress level classification using multimodal deep learning,
M. Rastgoo, B. Nakisa, F. Maire, A. Rakotonirainy, and V . Chandran, “Automatic driver stress level classification using multimodal deep learning,”Expert Systems with Applications, vol. 138, p. 112793, 2019
work page 2019
-
[20]
J. Zhao and E. Obonyo, “Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,”Advanced Engineering Informatics, vol. 46, p. 101177, 2020
work page 2020
-
[21]
Stress classification using brain signals based on lstm network,
N. Phutela, D. Relan, G. Gabrani, P. Kumaraguru, and M. Samuel, “Stress classification using brain signals based on lstm network,”Com- putational Intelligence and Neuroscience, vol. 2022, p. 7607592, 2022
work page 2022
-
[22]
Time series classification using multi-channels deep convolutional neural networks,
Y . Zheng, Q. Liu, E. Chen, Y . Ge, and J. Zhao, “Time series classification using multi-channels deep convolutional neural networks,”International Conference on Web-Age Information Management, p. 298–310, 2014
work page 2014
-
[23]
Scoping review of eeg studies in construction safety,
Y . Zhang, M. Zhang, and Q. Fang, “Scoping review of eeg studies in construction safety,”International Journal of Environmental Research and Public Health, vol. 16, p. 4146, 2019
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.