Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

Amir Khan; Syed Sajid Ullah

arxiv: 2604.19559 · v1 · submitted 2026-04-21 · 💻 cs.AI · cs.CL· cs.LG

Enhancing Construction Worker Safety in Extreme Heat: A Machine Learning Approach Utilizing Wearable Technology for Predictive Health Analytics

Syed Sajid Ullah , Amir Khan This is my paper

Pith reviewed 2026-05-10 02:40 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.LG

keywords construction safetyheat stresswearable sensorsLSTM networksattention mechanismsmachine learningphysiological datapredictive modeling

0 comments

The pith

An attention-based LSTM model predicts heat stress in construction workers from smartwatch data with 95.4 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Construction workers in hot environments risk heat stress, but few tools turn real-time body data into safety warnings. Researchers collected heart rate, heart rate variability, and oxygen levels from 19 workers in Saudi Arabia using Garmin smartwatches. They trained a baseline LSTM and an attention-based LSTM to forecast heat stress episodes. The attention version reached 95.40 percent test accuracy with precision, recall, and F1 scores of 0.982, cutting errors versus the baseline. This setup supports embedding predictions into connected safety platforms and building information models for faster responses on site.

Core claim

By monitoring physiological metrics with wearable devices, the authors show that an attention-based long short-term memory network can classify heat stress among construction workers at 95.40% accuracy, delivering results interpretable enough for practical safety applications.

What carries the argument

Attention-based LSTM model that weights important segments of time-series physiological signals to predict heat stress.

Load-bearing premise

Signals recorded from nineteen workers in a single region provide enough variety to train a predictor that works for other people and hotter or different job sites.

What would settle it

Apply the trained model to physiological data from construction workers in a different country or climate and check whether accuracy stays near 95 percent or falls sharply.

Figures

Figures reproduced from arXiv: 2604.19559 by Amir Khan, Syed Sajid Ullah.

**Figure 2.** Figure 2: Heat stress prediction framework from wearable data collection to model evaluation. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Experimental procedure, monitored tasks, and two-session wearable data collection protocol. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Confusion matrix of the baseline LSTM model. The model shows [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Confusion matrix of the proposed LSTM-AM model. Compared with [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: ROC curves for the baseline LSTM and LSTM-AM models, with [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

read the original abstract

Construction workers are highly vulnerable to heat stress, yet tools that translate real-time physiological data into actionable safety intelligence remain scarce. This study addresses this gap by developing and evaluating deep learning models, specifically a baseline Long Short-Term Memory (LSTM) network and an attention-based LSTM, to predict heat stress among 19 workers in Saudi Arabia. Using Garmin Vivosmart 5 smartwatches to monitor metrics such as heart rate, HRV, and oxygen saturation, the attention-based model outperformed the baseline, achieving 95.40% testing accuracy and significantly reducing false positives and negatives. With precision, recall, and F1 scores of 0.982, this approach not only improves predictive performance but also offers interpretable results suitable for integration into IoT-enabled safety systems and BIM dashboards, advancing proactive, informatics-driven safety management in the construction industry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gets 95% test accuracy from an attention LSTM on Garmin data from 19 Saudi construction workers for heat-stress prediction, but missing details on labeling and validation make the generalization claim hard to trust.

read the letter

The main point is that this applies a standard attention LSTM to wearable sensor data for a narrow but practical safety task in hot-climate construction work. They monitor heart rate, HRV, and SpO2 with Garmin Vivosmart 5 watches on 19 workers, compare the attention model to a plain LSTM baseline, and report 95.4% test accuracy plus 0.982 precision/recall/F1 while noting lower false positives and negatives. The write-up also sketches how the outputs could feed into IoT safety systems or BIM dashboards, which is a reasonable applied angle. That part is straightforward and shows they thought about deployment context rather than just chasing metrics. The numbers on the held-out test set look strong on paper and the baseline comparison is a minimal but useful check. The soft spots sit in the evaluation setup. With only 19 workers from a single region and climate, any model risks capturing site- or person-specific patterns instead of transferable heat-stress signals. The abstract gives no information on how heat-stress labels were created, whether the train-test split respected time or worker boundaries, or if any form of cross-validation or external hold-out was used. Those gaps matter because the claimed safety value depends on the model working on new workers and different heat conditions. Without that evidence the high accuracy stays local. This kind of paper is mainly for researchers or practitioners already working on occupational health monitoring or applied time-series models in safety settings. A reader who wants a concrete example of wearable-plus-LSTM for heat stress can pull ideas from it, but anyone needing a validated, ready-to-deploy system will have to do extra work. I would send it for peer review. The application is concrete and the modeling choices are transparent enough that referees can ask the right questions about validation and labeling; addressing those would turn it into something more usable.

Referee Report

3 major / 2 minor

Summary. The paper develops and evaluates LSTM and attention-based LSTM models to predict heat stress from wearable physiological signals (heart rate, HRV, oxygen saturation) collected via Garmin Vivosmart 5 devices from 19 construction workers in Saudi Arabia. It reports that the attention-based model achieves 95.40% test accuracy, precision/recall/F1 of 0.982, outperforms the baseline LSTM, reduces false positives/negatives, and provides interpretable outputs suitable for IoT/BIM safety integration.

Significance. If the performance claims hold under rigorous validation, the work could support real-time, data-driven heat-stress alerts in construction, advancing informatics-based safety management. The attention mechanism's potential for interpretability is a positive feature for practical deployment. However, the small single-site sample and missing methodological details currently limit claims of broad applicability or superiority.

major comments (3)

[Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.
[Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.
[Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.

minor comments (2)

[Abstract] The abstract would be clearer if it stated the total number of samples or recording duration alongside the participant count.
[Abstract] Model architecture details (e.g., number of layers, attention implementation, input sequence length) are not summarized even at a high level, which hinders reproducibility assessment.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating where revisions will be made to address the concerns.

read point-by-point responses

Referee: [Abstract] Abstract: The headline metrics (95.40% accuracy, F1=0.982) are presented without any description of how heat-stress labels were generated (self-report, WBGT threshold, expert annotation, or other rule). This is load-bearing because the entire performance evaluation depends on label quality and consistency.

Authors: We agree that the absence of label-generation details in the abstract is a significant omission, as it affects the interpretability of the reported metrics. The full manuscript (Section 3.2) specifies that labels were assigned via a hybrid rule combining worker self-reports of symptoms with WBGT index thresholds calibrated for construction work in hot environments. We will revise the abstract to include a concise statement of this labeling procedure. revision: yes
Referee: [Abstract] Abstract: No details are supplied on the train-test split procedure, use of validation sets, cross-validation, hyperparameter tuning, class balance, or statistical tests for the claimed improvement over the baseline LSTM. Without these, it is impossible to rule out overfitting or data leakage in the reported test-set results.

Authors: We acknowledge that these methodological details were not summarized in the abstract, which limits assessment of robustness. The manuscript employs a stratified 70/30 train-test split, 5-fold cross-validation within the training portion for hyperparameter selection via grid search, SMOTE for class balancing, and McNemar's test to evaluate improvement over the baseline LSTM. We will add a brief overview of these procedures to the abstract and expand the corresponding description in the methods section. revision: yes
Referee: [Abstract] Abstract: The evaluation uses data from only 19 workers at a single Saudi site and climate. No leave-one-worker-out, site-stratified, or external validation is described, undermining the claim that the model is ready for integration into IoT-enabled safety systems that must generalize to new workers, locations, and heat conditions.

Authors: We recognize that the single-site, 19-worker dataset constitutes a genuine limitation for broad generalization claims. To strengthen the evaluation, we will add leave-one-worker-out cross-validation results in the revised manuscript. We will also moderate the language regarding immediate readiness for IoT/BIM deployment, framing the work as a pilot study. However, external validation on additional sites and climates cannot be performed with the existing data. revision: partial

standing simulated objections not resolved

External validation on new sites, workers, and climates, which would require collection of additional data not available in the current study.

Circularity Check

0 steps flagged

No circularity: standard empirical ML evaluation on held-out test data

full rationale

The paper describes collecting physiological signals from 19 workers, training LSTM and attention-LSTM models, and reporting accuracy/precision/recall/F1 on a testing set. No derivation chain, equations, or self-referential definitions are present. Performance figures are presented as direct empirical outcomes rather than reductions of fitted parameters or self-cited premises. Generalization limits to new workers/sites are a separate external-validity issue, not a circularity in the reported chain.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The claim rests on the unstated assumptions that the 19-worker dataset is representative, that the chosen physiological features are sufficient proxies for heat stress, and that standard deep-learning training produces a generalizable predictor. No new entities are postulated.

free parameters (2)

LSTM and attention hyperparameters
Number of layers, hidden units, learning rate, attention heads, and regularization terms are chosen or tuned on the training data.
Train-test split ratio and random seed
Determines which 19-worker sequences end up in the reported test set.

axioms (1)

domain assumption Garmin Vivosmart 5 readings of heart rate, HRV, and SpO2 are reliable and sufficient indicators of impending heat stress
Invoked by the choice of input features and the decision to treat the resulting time series as labeled training data.

pith-pipeline@v0.9.0 · 5448 in / 1549 out tokens · 44970 ms · 2026-05-10T02:40:56.863400+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

[1]

Worker safety in the construction industry under extreme heat,

J. Fulcheret al., “Worker safety in the construction industry under extreme heat,”Journal of Occupational Health and Safety, 2024

work page 2024
[2]

Heat stress and its impact on construction workers’ health: A review,

T. Ikeda, H. Tanaka, and K. Nakamura, “Heat stress and its impact on construction workers’ health: A review,”Safety Science, vol. 137, p. 105169, 2021

work page 2021
[3]

Application of wearable biosensors to construction sites: Assessing workers’ stress,

H. Jebelli, S. Lee, and B. Choi, “Application of wearable biosensors to construction sites: Assessing workers’ stress,”Journal of Construction Engineering and Management, vol. 145, no. 4, p. 04019079, 2019

work page 2019
[4]

Wearable sensors and physiological data analytics for occupational health monitoring in construction,

W. Umer, H. Li, S. Anweret al., “Wearable sensors and physiological data analytics for occupational health monitoring in construction,” Automation in Construction, vol. 123, p. 103456, 2022

work page 2022
[5]

Deep learning-based networks for automated recognition of awkward postures using wearable sensors,

M. F. Antwi-Afariet al., “Deep learning-based networks for automated recognition of awkward postures using wearable sensors,”Automation in Construction, vol. 136, p. 104181, 2022

work page 2022
[6]

Deep learning-based mental fatigue classification using eeg in construction equipment operators,

I. Mehmood, H. Li, W. Umeret al., “Deep learning-based mental fatigue classification using eeg in construction equipment operators,”Advanced Engineering Informatics, vol. 56, p. 101978, 2023

work page 2023
[7]

Deep learning for electroen- cephalogram (eeg) classification tasks: a review,

A. Craik, Y . He, and J. Contreras-Vidal, “Deep learning for electroen- cephalogram (eeg) classification tasks: a review,”Journal of Neural Engineering, vol. 16, no. 3, p. 031001, 2019

work page 2019
[8]

A large-scale open site object detection dataset for deep learning in construction,

D. Zhaoet al., “A large-scale open site object detection dataset for deep learning in construction,”Automation in Construction, vol. 142, p. 104499, 2022

work page 2022
[9]

Dataset of manually classified images obtained from a construction site,

A. Saviozzi, A. Luna, D. C ´ardenas-Salas, M. Vergara, and G. Urday, “Dataset of manually classified images obtained from a construction site,”Data in Brief, vol. 42, p. 108042, 2022

work page 2022
[10]

Deep learning methods for eeg neural classifica- tion,

S. Nakagome, A. Craik, S. Ravindran, Y . He, J. Cruz-Garza, and J. Contreras-Vidal, “Deep learning methods for eeg neural classifica- tion,”Handbook of Neuroengineering, 2022

work page 2022
[11]

A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,

X. Xing, H. Li, J. Li, B. Zhong, H. Luo, and M. Skitmore, “A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,”Automation in Construction, vol. 105, p. 102836, 2019

work page 2019
[12]

Eeg-based workers’ stress recognition at construction sites,

J. Liet al., “Eeg-based workers’ stress recognition at construction sites,” Automation in Construction, vol. 93, p. 315–324, 2019

work page 2019
[13]

Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” inNeural Computation, vol. 9, no. 8. MIT Press, 1997, pp. 1735–1780

work page 1997
[14]

Automated ergonomic risk assessment using vision- based posture classification,

J. Seo and S. Lee, “Automated ergonomic risk assessment using vision- based posture classification,”Automation in Construction, vol. 128, p. 103725, 2021

work page 2021
[15]

Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,

S. Das, J. Maiti, and O. Krishna, “Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,” International Journal of Industrial Ergonomics, vol. 80, p. 103017, 2020

work page 2020
[16]

Bearing fault detection by one-dimensional convolutional neural networks,

L. Eren, “Bearing fault detection by one-dimensional convolutional neural networks,”Mathematical Problems in Engineering, 2017

work page 2017
[17]

How to ﬁne-tune bert for text classiﬁcation?arXiv preprint arXiv:1905.05583, 2019

N. Kaji, T. Sato, and N. Inoue, “Attention-based lstm for clinical time series classification,”arXiv preprint arXiv:1905.05583, 2019

work page arXiv 1905
[18]

Interpretable attention mecha- nisms for deep learning in icu monitoring,

A. Gandin, G. Banfi, M. Grassiet al., “Interpretable attention mecha- nisms for deep learning in icu monitoring,”Nature Scientific Reports, vol. 11, p. 12345, 2021

work page 2021
[19]

Automatic driver stress level classification using multimodal deep learning,

M. Rastgoo, B. Nakisa, F. Maire, A. Rakotonirainy, and V . Chandran, “Automatic driver stress level classification using multimodal deep learning,”Expert Systems with Applications, vol. 138, p. 112793, 2019

work page 2019
[20]

Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,

J. Zhao and E. Obonyo, “Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,”Advanced Engineering Informatics, vol. 46, p. 101177, 2020

work page 2020
[21]

Stress classification using brain signals based on lstm network,

N. Phutela, D. Relan, G. Gabrani, P. Kumaraguru, and M. Samuel, “Stress classification using brain signals based on lstm network,”Com- putational Intelligence and Neuroscience, vol. 2022, p. 7607592, 2022

work page 2022
[22]

Time series classification using multi-channels deep convolutional neural networks,

Y . Zheng, Q. Liu, E. Chen, Y . Ge, and J. Zhao, “Time series classification using multi-channels deep convolutional neural networks,”International Conference on Web-Age Information Management, p. 298–310, 2014

work page 2014
[23]

Scoping review of eeg studies in construction safety,

Y . Zhang, M. Zhang, and Q. Fang, “Scoping review of eeg studies in construction safety,”International Journal of Environmental Research and Public Health, vol. 16, p. 4146, 2019

work page 2019

[1] [1]

Worker safety in the construction industry under extreme heat,

J. Fulcheret al., “Worker safety in the construction industry under extreme heat,”Journal of Occupational Health and Safety, 2024

work page 2024

[2] [2]

Heat stress and its impact on construction workers’ health: A review,

T. Ikeda, H. Tanaka, and K. Nakamura, “Heat stress and its impact on construction workers’ health: A review,”Safety Science, vol. 137, p. 105169, 2021

work page 2021

[3] [3]

Application of wearable biosensors to construction sites: Assessing workers’ stress,

H. Jebelli, S. Lee, and B. Choi, “Application of wearable biosensors to construction sites: Assessing workers’ stress,”Journal of Construction Engineering and Management, vol. 145, no. 4, p. 04019079, 2019

work page 2019

[4] [4]

Wearable sensors and physiological data analytics for occupational health monitoring in construction,

W. Umer, H. Li, S. Anweret al., “Wearable sensors and physiological data analytics for occupational health monitoring in construction,” Automation in Construction, vol. 123, p. 103456, 2022

work page 2022

[5] [5]

Deep learning-based networks for automated recognition of awkward postures using wearable sensors,

M. F. Antwi-Afariet al., “Deep learning-based networks for automated recognition of awkward postures using wearable sensors,”Automation in Construction, vol. 136, p. 104181, 2022

work page 2022

[6] [6]

Deep learning-based mental fatigue classification using eeg in construction equipment operators,

I. Mehmood, H. Li, W. Umeret al., “Deep learning-based mental fatigue classification using eeg in construction equipment operators,”Advanced Engineering Informatics, vol. 56, p. 101978, 2023

work page 2023

[7] [7]

Deep learning for electroen- cephalogram (eeg) classification tasks: a review,

A. Craik, Y . He, and J. Contreras-Vidal, “Deep learning for electroen- cephalogram (eeg) classification tasks: a review,”Journal of Neural Engineering, vol. 16, no. 3, p. 031001, 2019

work page 2019

[8] [8]

A large-scale open site object detection dataset for deep learning in construction,

D. Zhaoet al., “A large-scale open site object detection dataset for deep learning in construction,”Automation in Construction, vol. 142, p. 104499, 2022

work page 2022

[9] [9]

Dataset of manually classified images obtained from a construction site,

A. Saviozzi, A. Luna, D. C ´ardenas-Salas, M. Vergara, and G. Urday, “Dataset of manually classified images obtained from a construction site,”Data in Brief, vol. 42, p. 108042, 2022

work page 2022

[10] [10]

Deep learning methods for eeg neural classifica- tion,

S. Nakagome, A. Craik, S. Ravindran, Y . He, J. Cruz-Garza, and J. Contreras-Vidal, “Deep learning methods for eeg neural classifica- tion,”Handbook of Neuroengineering, 2022

work page 2022

[11] [11]

A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,

X. Xing, H. Li, J. Li, B. Zhong, H. Luo, and M. Skitmore, “A multicomponent and neurophysiological intervention for the emotional and mental states of high-altitude construction workers,”Automation in Construction, vol. 105, p. 102836, 2019

work page 2019

[12] [12]

Eeg-based workers’ stress recognition at construction sites,

J. Liet al., “Eeg-based workers’ stress recognition at construction sites,” Automation in Construction, vol. 93, p. 315–324, 2019

work page 2019

[13] [13]

Long short-term memory,

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” inNeural Computation, vol. 9, no. 8. MIT Press, 1997, pp. 1735–1780

work page 1997

[14] [14]

Automated ergonomic risk assessment using vision- based posture classification,

J. Seo and S. Lee, “Automated ergonomic risk assessment using vision- based posture classification,”Automation in Construction, vol. 128, p. 103725, 2021

work page 2021

[15] [15]

Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,

S. Das, J. Maiti, and O. Krishna, “Assessing mental workload in virtual reality based eot crane operations: A multi-measure approach,” International Journal of Industrial Ergonomics, vol. 80, p. 103017, 2020

work page 2020

[16] [16]

Bearing fault detection by one-dimensional convolutional neural networks,

L. Eren, “Bearing fault detection by one-dimensional convolutional neural networks,”Mathematical Problems in Engineering, 2017

work page 2017

[17] [17]

How to ﬁne-tune bert for text classiﬁcation?arXiv preprint arXiv:1905.05583, 2019

N. Kaji, T. Sato, and N. Inoue, “Attention-based lstm for clinical time series classification,”arXiv preprint arXiv:1905.05583, 2019

work page arXiv 1905

[18] [18]

Interpretable attention mecha- nisms for deep learning in icu monitoring,

A. Gandin, G. Banfi, M. Grassiet al., “Interpretable attention mecha- nisms for deep learning in icu monitoring,”Nature Scientific Reports, vol. 11, p. 12345, 2021

work page 2021

[19] [19]

Automatic driver stress level classification using multimodal deep learning,

M. Rastgoo, B. Nakisa, F. Maire, A. Rakotonirainy, and V . Chandran, “Automatic driver stress level classification using multimodal deep learning,”Expert Systems with Applications, vol. 138, p. 112793, 2019

work page 2019

[20] [20]

Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,

J. Zhao and E. Obonyo, “Convolutional long short-term memory model for recognizing construction workers’ postures from wearable inertial measurement units,”Advanced Engineering Informatics, vol. 46, p. 101177, 2020

work page 2020

[21] [21]

Stress classification using brain signals based on lstm network,

N. Phutela, D. Relan, G. Gabrani, P. Kumaraguru, and M. Samuel, “Stress classification using brain signals based on lstm network,”Com- putational Intelligence and Neuroscience, vol. 2022, p. 7607592, 2022

work page 2022

[22] [22]

Time series classification using multi-channels deep convolutional neural networks,

Y . Zheng, Q. Liu, E. Chen, Y . Ge, and J. Zhao, “Time series classification using multi-channels deep convolutional neural networks,”International Conference on Web-Age Information Management, p. 298–310, 2014

work page 2014

[23] [23]

Scoping review of eeg studies in construction safety,

Y . Zhang, M. Zhang, and Q. Fang, “Scoping review of eeg studies in construction safety,”International Journal of Environmental Research and Public Health, vol. 16, p. 4146, 2019

work page 2019