Leveraging Imperfect Medical Data: A Manifold-Consistent Spatio-Temporal Network for Sensor-based Human Activity Recognition
Pith reviewed 2026-05-09 20:32 UTC · model grok-4.3
The pith
A network maintains stable activity recognition by enforcing consistency across dual-level corrupted sensor views.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Manifold-Consistent Spatio-Temporal Network equipped with dual-level corruption modeling and a dual-stream architecture can learn corruption-invariant semantic representations for human activity recognition by enforcing consistency of learned features across multiple physically and diffusion-corrupted views of the same sensor sequence.
What carries the argument
The argument rests on the Manifold-Consistent Spatio-Temporal Network (MCSTN), which combines dual-level corruption simulation with a dual-stream architecture that separates long-term temporal dynamics from inter-sensor spatial correlations, and which enforces representation consistency across the corrupted views.
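The mechanism can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so `physical_corrupt` models sensor failure as zeroed channels, `diffusion_corrupt` borrows the standard Gaussian forward-noising step with a hypothetical `alpha_bar` parameter, and `encode` is a hand-written stand-in for a learned encoder.

```python
import math
import random

def physical_corrupt(window, drop_prob, rng):
    """Zero out whole channels to mimic a failed sensor (assumed corruption form)."""
    return [[0.0] * len(ch) if rng.random() < drop_prob else list(ch) for ch in window]

def diffusion_corrupt(window, alpha_bar, rng):
    """Gaussian forward-noising: sqrt(a) * x + sqrt(1 - a) * eps, with a = alpha_bar."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[keep * v + noise * rng.gauss(0.0, 1.0) for v in ch] for ch in window]

def encode(window):
    """Hypothetical stand-in encoder: per-channel mean. A real model would be learned."""
    return [sum(ch) / len(ch) for ch in window]

def consistency_loss(views):
    """Mean squared distance of each view's embedding from the average embedding."""
    embs = [encode(v) for v in views]
    dim = len(embs[0])
    center = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
    return sum((e[i] - center[i]) ** 2 for e in embs for i in range(dim)) / (len(embs) * dim)

# Toy window: 3 channels, 50 time steps (window[channel][time]).
rng = random.Random(0)
window = [[math.sin(0.1 * t + c) for t in range(50)] for c in range(3)]
views = [physical_corrupt(window, 0.3, rng), diffusion_corrupt(window, 0.9, rng)]
loss = consistency_loss(views)
```

With `alpha_bar` close to 1 the diffusion view stays near the clean signal; lowering it pushes the view toward pure noise. In training, a penalty of this kind would presumably be added to the classification loss, but the weighting is not specified in the source.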
If this is right
- The model should deliver competitive accuracy on PAMAP2, Opportunity, and WISDM even when large fractions of the input measurements are missing or noisy.
- Separating temporal and spatial streams lets the network capture both long-range activity patterns and sensor-to-sensor relationships more cleanly than single-stream alternatives.
- Enforcing cross-view consistency removes the need for explicit denoising or imputation steps before classification.
- The same consistency principle could support deployment in continuous IoMT health monitoring where perfect sensor operation cannot be guaranteed.
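The temporal/spatial decoupling in the second point can be illustrated with hand-crafted features. The sketch below is an assumption-laden toy, not the learned streams of MCSTN: `temporal_stream` looks only along the time axis of each channel, `spatial_stream` looks only at relationships between channels, and all function names are hypothetical.

```python
import math

def temporal_stream(window):
    """One feature per channel: mean absolute step between consecutive samples.
    Assumes every channel has at least two samples."""
    return [sum(abs(b - a) for a, b in zip(ch, ch[1:])) / (len(ch) - 1) for ch in window]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spatial_stream(window):
    """One feature per channel pair: correlation between the two sensors."""
    n = len(window)
    return [pearson(window[i], window[j]) for i in range(n) for j in range(i + 1, n)]

def dual_stream_features(window):
    """Concatenate the two streams; a classifier would consume this vector."""
    return temporal_stream(window) + spatial_stream(window)
```

Keeping the two views separate until the final concatenation is the design choice being illustrated: each stream can be inspected, ablated, or corrupted on its own.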
Where Pith is reading between the lines
- The same dual-level corruption plus consistency approach might transfer directly to other multivariate time-series tasks such as gait analysis or vital-sign tracking.
- Replacing the simulated corruptions with actual logged failure patterns from clinical devices would offer a stronger test of whether the learned invariance generalizes beyond benchmarks.
- The dual-stream decoupling could be combined with attention mechanisms to further highlight which sensors or time windows remain reliable under heavy corruption.
Load-bearing premise
The simulated physical-level and diffusion-driven corruptions are representative enough of actual wearable sensor failures that consistency across them produces invariance to real imperfections.
What would settle it
If the MCSTN is applied to a new dataset containing genuine sensor dropouts and noise patterns outside the physical-plus-diffusion simulation and it loses its advantage over standard networks, the robustness claim would be refuted.
Original abstract
Sensor-based Human Activity Recognition (HAR) has attracted increasing attention in medical and healthcare monitoring, particularly with the growth of Internet of Medical Things (IoMT). However, in real-world wearable sensing scenarios, IoMT signals are often corrupted by missing measurements, sensor failures, and environmental noise, which significantly degrade the performance of conventional deep learning models that assume clean and complete inputs. To address this challenge, we propose a Manifold-Consistent Spatio-Temporal Network (MCSTN) for robust HAR under imperfect sensing conditions. The proposed framework introduces a dual-level corruption modeling mechanism that simulates realistic sensor imperfections through both physical-level corruption and diffusion-driven continuous corruption. By enforcing representation consistency across multiple corrupted views, the model learns stable and corruption-invariant semantic representations. Furthermore, we design a dual-stream spatio-temporal architecture that explicitly decouples temporal dynamics modeling and spatial correlation learning. The temporal stream captures long-term activity dynamics, while the spatial stream models inter-sensor relationships, enabling more effective spatio-temporal representation learning. Extensive experiments on three widely used HAR benchmark datasets, PAMAP2, Opportunity, and WISDM, demonstrate that the proposed MCSTN achieves competitive performance compared with existing state-of-the-art methods, particularly under imperfect sensing conditions. These results validate the effectiveness and robustness of the proposed framework for real-world wearable IoMT sensing applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Manifold-Consistent Spatio-Temporal Network (MCSTN) for sensor-based human activity recognition (HAR) under imperfect IoMT sensing conditions. It introduces dual-level corruption modeling (physical-level plus diffusion-driven) to simulate missing measurements, sensor failures, and noise; enforces representation consistency across multiple corrupted views to obtain invariant semantic features; and uses a dual-stream architecture that separates long-term temporal dynamics modeling from inter-sensor spatial correlation learning. Experiments on the PAMAP2, Opportunity, and WISDM benchmarks are claimed to show competitive performance against existing state-of-the-art methods, especially when inputs are corrupted.
Significance. If the central claims hold, the work would be significant for practical medical monitoring applications, where wearable sensor data routinely suffers from dropouts and noise. The explicit separation of temporal and spatial streams plus the cross-view consistency objective offer a concrete, implementable strategy for learning corruption-robust representations from spatio-temporal sensor streams. The empirical focus on three standard HAR benchmarks under controlled corruption provides a clear testbed for the approach.
major comments (2)
- [Method] Corruption modeling subsection: The dual-level (physical + diffusion-driven) corruption process is load-bearing for the robustness claim. The manuscript must provide evidence—such as quantitative comparison of missing-channel statistics, temporal burst patterns, or cross-sensor correlation distributions—that the synthetic corruptions match the failure modes observed in real IoMT deployments; without this, invariance learned on the simulated views supplies no guarantee of stability under genuine sensor imperfections.
- [Experiments] Experimental evaluation: The abstract asserts competitive performance on PAMAP2, Opportunity, and WISDM under imperfect conditions, yet the manuscript supplies no protocol details on the exact corruption rates and types applied at test time, the precise baseline implementations and hyper-parameters, or any statistical significance tests and error bars. These omissions render the performance claims unverifiable and weaken the central empirical argument.
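The statistics named in the first major comment are cheap to compute once a missingness mask is available. The following is a minimal sketch, assuming a boolean mask laid out as `mask[channel][time]` with `True` marking a missing sample; the layout and function names are illustrative and do not come from the manuscript.

```python
def missing_rate(mask):
    """Fraction of samples flagged missing (True) across all channels."""
    total = sum(len(ch) for ch in mask)
    return sum(sum(ch) for ch in mask) / total

def burst_lengths(channel_mask):
    """Lengths of consecutive runs of missing samples in one channel."""
    runs, current = [], 0
    for missing in channel_mask:
        if missing:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # close a run that reaches the end of the window
        runs.append(current)
    return runs

# Toy mask: channel 0 has a burst of two and a single dropout; channel 1 is clean.
mask = [[False, True, True, False, True], [False] * 5]
rate = missing_rate(mask)
bursts = burst_lengths(mask[0])
```

Comparing the distribution of `burst_lengths` between simulated masks and logged device failures would be one direct way to test whether the synthetic corruptions resemble real ones.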
minor comments (1)
- [Abstract] Abstract: Replace the generic phrase 'competitive performance' with concrete metrics (e.g., 'improves macro-F1 by 4.2 points at 40% missing rate on PAMAP2') to allow immediate assessment of the claimed gains.
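For readers unfamiliar with the metric named in the minor comment, macro-F1 is the unweighted mean of per-class F1 scores, so rare activities count as much as common ones. A small self-contained sketch follows; the labels in the example are placeholders, and scikit-learn's `f1_score` with `average="macro"` is the usual library route.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Placeholder labels for illustration only.
y_true = ["walk", "walk", "sit", "run"]
y_pred = ["walk", "sit", "sit", "run"]
score = macro_f1(y_true, y_pred)
```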
Simulated Author's Rebuttal
We are grateful to the referee for the insightful comments that will help improve the clarity and rigor of our manuscript. We address the major concerns point-by-point below and commit to incorporating the suggested revisions.
Referee: [Method] Corruption modeling subsection: The dual-level (physical + diffusion-driven) corruption process is load-bearing for the robustness claim. The manuscript must provide evidence—such as quantitative comparison of missing-channel statistics, temporal burst patterns, or cross-sensor correlation distributions—that the synthetic corruptions match the failure modes observed in real IoMT deployments; without this, invariance learned on the simulated views supplies no guarantee of stability under genuine sensor imperfections.
Authors: We thank the referee for highlighting this important aspect. While our dual-level corruption is designed to simulate common IoMT issues like missing measurements and noise based on domain knowledge, we agree that direct quantitative matching to real-world statistics would provide stronger validation. In the revised manuscript, we will add an analysis in the corruption modeling section or a supplementary appendix comparing our simulated corruption statistics (e.g., missing rates, burst durations) to those reported in literature on wearable sensor failures in medical IoT settings. This will include references to studies documenting real dropout patterns. We believe this addition will address the concern without requiring new data collection.
revision: yes
Referee: [Experiments] Experimental evaluation: The abstract asserts competitive performance on PAMAP2, Opportunity, and WISDM under imperfect conditions, yet the manuscript supplies no protocol details on the exact corruption rates and types applied at test time, the precise baseline implementations and hyper-parameters, or any statistical significance tests and error bars. These omissions render the performance claims unverifiable and weaken the central empirical argument.
Authors: We apologize for the lack of detailed experimental protocols in the current version, which indeed hinders full reproducibility and verifiability. In the revised manuscript, we will expand the experimental section to include: (1) precise specifications of corruption parameters used during training and testing (e.g., percentages for physical-level missing sensors, diffusion noise levels); (2) full details on baseline methods, including their hyperparameters and any modifications for handling corrupted inputs; and (3) results with error bars from multiple random seeds along with statistical significance tests (e.g., paired t-tests) to support the performance claims. These changes will make the empirical evaluation transparent and robust.
revision: yes
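The promised error bars and paired t-test could look roughly like the sketch below. It assumes one score per random seed for each method, with seeds aligned between the two lists; the numbers are toy placeholders, not results from the paper. Only the t statistic and degrees of freedom are computed here; a p-value needs the t distribution, for which `scipy.stats.ttest_rel` is one standard option.

```python
import math
import statistics

def mean_and_std(scores):
    """Mean and sample standard deviation, suitable for reporting error bars."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom for paired per-seed scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd_diff / math.sqrt(n)), n - 1

# Toy per-seed scores (placeholders): proposed model vs. one baseline.
proposed = [2.0, 4.0, 6.0]
baseline = [1.0, 2.0, 3.0]
t_stat, dof = paired_t_statistic(proposed, baseline)
```

Pairing by seed matters because it removes seed-to-seed variance shared by both methods, which an unpaired comparison would count as noise.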
Circularity Check
Empirical neural network design with no self-referential derivation
The paper presents an empirical architecture (MCSTN) whose core elements—dual-level corruption simulation, cross-view consistency loss, and dual-stream spatio-temporal streams—are introduced as modeling choices rather than derived quantities. Performance is measured on independent public benchmarks (PAMAP2, Opportunity, WISDM) under added synthetic noise; no equation or claim reduces the reported accuracy or invariance metric to a quantity defined in terms of the model's own fitted parameters or a self-citation chain. The derivation chain is therefore self-contained against external data.
Axiom & Free-Parameter Ledger
free parameters (2)
- corruption simulation parameters
- network hyperparameters
axioms (1)
- domain assumption: Enforcing representation consistency across multiple corrupted views produces corruption-invariant semantic representations
invented entities (1)
- Manifold-Consistent Spatio-Temporal Network (MCSTN): no independent evidence