pith. machine review for the scientific record.

arxiv: 2604.25092 · v1 · submitted 2026-04-28 · 💻 cs.HC

Recognition: unknown

Feature Anchors for Time-Series Sensor-Based Human Activity Recognition

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 15:56 UTC · model grok-4.3

classification 💻 cs.HC
keywords human activity recognition · time-series features · feature anchors · wearable sensors · IMU data · deep learning · temporal conditioning · sensor-based HAR

The pith

Handcrafted time-series features improve wearable human activity recognition when kept explicit and modulated inside the model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that handcrafted time-series features from IMU sensors work best in human activity recognition when retained as visible, adjustable anchors rather than discarded after preprocessing or hidden inside opaque deep representations. It introduces a network that extracts these anchors and uses neural context from raw signals to predict modulation parameters, allowing the features to adapt without losing their semantic meaning. This hybrid design addresses the longstanding trade-off between the interpretability of statistical features and the flexibility of learned representations. Empirical results across five standard benchmarks show accuracy gains, with ablations indicating that the modulation step, not mere fusion, drives most of the improvement.

Core claim

The paper claims that treating handcrafted time-series features as feature anchors—explicit intermediate representations that are adjusted in feature space by context-conditioned scale, bias, and gating parameters—produces representations that are both semantically transparent and task-adaptive, yielding higher macro-F1 scores on USC-HAD, Daphnet, MHealth, and PAMAP2 than baselines that either fix the features or rely solely on latent learning.

What carries the argument

The Temporal Conditioning Network (TCNet), which extracts handcrafted TSF anchors and modulates them via predicted scale, bias, and gating values derived from separate time-domain and frequency-domain context encoders applied to raw IMU windows.
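Read literally, this is a FiLM-style conditioning layer (cf. Perez et al. [32]) applied to explicit features rather than latent maps. A minimal sketch of the modulation step as described, assuming per-anchor rather than per-group parameters for brevity; the module and variable names are hypothetical, not taken from the authors' released code:

```python
# Sketch of context-conditioned anchor modulation (FiLM-style), assuming the
# description above; names are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class AnchorModulation(nn.Module):
    """Predict scale, bias, and gate from a context vector and apply them
    to explicit handcrafted-feature anchors, keeping anchor alignment."""
    def __init__(self, context_dim: int, num_anchors: int):
        super().__init__()
        self.scale = nn.Linear(context_dim, num_anchors)
        self.bias = nn.Linear(context_dim, num_anchors)
        self.gate = nn.Linear(context_dim, num_anchors)

    def forward(self, anchors: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # anchors: (batch, num_anchors) handcrafted TSF values
        # context: (batch, context_dim) fused time/frequency encoding
        s = self.scale(context)                # multiplicative adjustment
        b = self.bias(context)                 # additive adjustment
        g = torch.sigmoid(self.gate(context))  # soft gate in [0, 1]
        return g * (s * anchors + b)           # modulated, still anchor-aligned

# Usage: a fused time/frequency context vector modulates 64 TSF anchors.
mod = AnchorModulation(context_dim=128, num_anchors=64)
out = mod(torch.randn(8, 64), torch.randn(8, 128))  # -> shape (8, 64)
```

Because the output stays aligned with the input anchors, each modulated value remains traceable to a named motion statistic, which is what the paper means by keeping semantics visible.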

If this is right

  • Handcrafted TSFs retain discriminative value when kept explicit and modulated rather than treated as fixed preprocessing outputs.
  • Gains on the five benchmarks are attributable to anchor guidance and not merely to the addition of a parallel branch.
  • Several families of discriminative time-series statistics remain inaccessible to standard latent representations learned directly from raw signals.
  • Keeping anchors visible allows the model to adapt them to the classification objective without post-hoc feature selection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same anchoring principle could be tested on other sensor-based time-series tasks such as gesture recognition or equipment monitoring where statistical features are known to be informative.
  • Models built this way may offer improved post-hoc interpretability because the modulated anchors remain traceable to known motion statistics.
  • Replacing the handcrafted anchors with learned but still explicitly grouped features might preserve some of the gains while reducing reliance on domain-specific feature engineering.

Load-bearing premise

The observed accuracy gains arise primarily from the explicit anchor modulation mechanism rather than from architectural side effects such as branch fusion or from dataset-specific properties of the chosen handcrafted features.

What would settle it

A controlled experiment on the same benchmarks that disables anchor modulation (replacing the predicted scale/bias/gating with identity or fixed values) while keeping all other network components unchanged would settle it: no statistically significant drop in mF1 would falsify the claim that modulation of explicit anchors is the key driver.
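That test is easy to express in code, as in the hedged sketch below; the function name and flag are illustrative, not the repository's API:

```python
# Sketch of the falsification test: bypass the predicted scale/bias/gating
# with identity values, leaving everything else unchanged. Any mF1 drop
# relative to the full model then isolates the modulation mechanism itself.
import torch

def modulate(anchors, scale, bias, gate_logits, ablate: bool = False):
    """Apply (or bypass) context-conditioned anchor modulation."""
    if ablate:
        # Identity modulation: pass the anchors through unchanged
        # (equivalent to scale = 1, bias = 0, gate = 1).
        return anchors
    gate = torch.sigmoid(gate_logits)
    return gate * (scale * anchors + bias)
```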

Figures

Figures reproduced from arXiv: 2604.25092 by Chenhang Li, Danyang Zhuo, Ruijie Yao, Tingjun Chen, Xiaoyue Ni.

Figure 1. From handcrafted feature anchors to adapted representations. Handcrafted time-series features (TSFs) are treated as …
Figure 2. Overall performance comparison across five HAR benchmarks. Each bubble represents one method; horizontal position indicates …
Figure 3. Handcrafted TSFs are highly sensitive to small input perturbations. Even mild noise …
Figure 4. TCNet architecture. From a raw IMU window, TCNet computes three representations in parallel: (i) a lightweight time-context …
Figure 5. Parameter efficiency and per-cell performance margins.
Figure 6. Ablation evidence that TCNet's gain comes primarily from anchor adaptation rather than branch fusion alone.
Figure 7. Raw versus anchor-guided feature distributions for three TSF families on Daphnet (top) and MHealth (bottom). Violin plots …
Figure 8. Linear probe R² from frozen TimesNet embeddings to individual TSF families across five HAR datasets. Left: task-trained encoder. Center: random-initialization control with the same architecture and no training. Right: ΔR² = R²_task − R²_random; red cells indicate families whose linear accessibility decreases after supervised training. Three families, Autocorr (R² = −0.09), Statistics (R² = 0.22), and T…
Figure 9. Structural mismatch between TSF importance and latent encodability.
Figure 10. Compact TCNet pretraining pipeline. A lightweight TCNet encoder (0.14 M parameters; single time branch, single frequency …
Figure 11. Compact TCNet pretrained on ≤23 subjects matches or surpasses UKB-SSL pretrained on ∼100k subjects. Panels A (UCI-HAR) and B (PAMAP2) each present two comparisons. Left block: Compact TCNet + RF improves over Raw RF-TSF (UCI-HAR: 93.79 vs. 92.64 mF1; PAMAP2: 90.23 vs. 89.39), confirming that anchor-guided pretraining adds value beyond static features. Right block: the same representation surpasses both UK…
Figure 12. Importance–encodability mismatch for the UKB-SSL encoder.
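Figure 8's analysis is, in outline, an ordinary linear probe. A sketch under assumptions (ridge regression, a single 80/20 split); the authors' exact probing protocol, regularization, and splits are not specified on this page:

```python
# Sketch of a linear probe: how well can frozen encoder embeddings linearly
# predict one handcrafted TSF family? Protocol details here are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def probe_r2(embeddings: np.ndarray, tsf_family: np.ndarray) -> float:
    """embeddings: (n_windows, d) frozen encoder outputs.
    tsf_family: (n_windows, k) handcrafted features of one family."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, tsf_family, test_size=0.2, random_state=0)
    probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
    return r2_score(y_te, probe.predict(X_te))

# Delta R² as in Figure 8: run the probe on a task-trained encoder and on a
# random-init control, then subtract; negative values flag families that
# become less linearly accessible after supervised training.
```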
Original abstract

Wearable Human Activity Recognition (HAR) still lacks a representation that is both explicit and adaptable. Handcrafted time-series features (TSFs) capture meaningful motion statistics and remain competitive on standard benchmarks, but they are usually used as fixed preprocessing outputs. Deep models learn adaptable representations directly from raw signals, but those representations are typically latent and difficult to inspect. We address this gap by treating handcrafted TSFs as feature anchors: explicit intermediate representations that remain inside the model and are adjusted by neural context instead of being discarded. We propose the Temporal Conditioning Network for Feature Anchors (TCNet), which extracts handcrafted anchors, encodes complementary time-domain and frequency-domain context from raw IMU windows, and predicts context-conditioned scale, bias, and gating parameters to modulate anchor groups directly in feature space. This design keeps anchor semantics visible while allowing the representation to adapt to the classification objective. Across five HAR benchmarks, TCNet achieves 70.2% mF1 on USC-HAD, 85.1% mF1 on Daphnet, 93.9% mF1 on MHealth, and 94.5% mF1 on PAMAP2. Relative to rTsfNet, it improves by 4.5 points on USC-HAD, 14.6 points on Daphnet, and 6.5 points on MHealth. Ablations show that the gains come primarily from anchor guidance rather than simple branch fusion, and feature-space analyses indicate that several discriminative TSF families are not reliably accessible in standard latent representations. These results suggest that, for HAR, handcrafted TSFs are most useful when they remain explicit and adaptable within the model. The code is available at: https://github.com/ni-x-lab/TCNet-har
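Editorial aside: for concreteness, here is one plausible reading of "handcrafted TSFs" on a raw IMU window, sketched with a few classic statistics (mean, standard deviation, lag-1 autocorrelation, dominant frequency). The anchor families TCNet actually uses are defined in the paper; this feature set is an assumption for illustration only:

```python
# Illustrative TSF extraction from one raw IMU window; the feature list is
# a hypothetical stand-in for the paper's anchor families.
import numpy as np

def extract_anchors(window: np.ndarray, fs: float = 50.0) -> np.ndarray:
    """window: (n_samples, n_channels) raw IMU segment.
    Returns per-channel statistics flattened into one anchor vector."""
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    centered = window - mean
    # Lag-1 autocorrelation per channel.
    autocorr = (centered[1:] * centered[:-1]).sum(axis=0) / (
        (centered ** 2).sum(axis=0) + 1e-8)
    # Dominant frequency per channel from the magnitude spectrum.
    spectrum = np.abs(np.fft.rfft(centered, axis=0))
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    dom_freq = freqs[spectrum.argmax(axis=0)]
    return np.concatenate([mean, std, autocorr, dom_freq])

anchors = extract_anchors(np.random.randn(128, 6))  # 6-axis IMU, 2.56 s @ 50 Hz
print(anchors.shape)  # (24,)
```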

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes the Temporal Conditioning Network (TCNet) for wearable sensor-based Human Activity Recognition (HAR). It treats handcrafted time-series features (TSFs) as explicit 'feature anchors' that are kept inside the model and modulated by context-dependent scale, bias, and gating parameters predicted from complementary time-domain and frequency-domain encoders applied to raw IMU windows. Across five benchmarks, TCNet reports mF1 scores of 70.2% (USC-HAD), 85.1% (Daphnet), 93.9% (MHealth), and 94.5% (PAMAP2), with gains over rTsfNet (e.g., +4.5 on USC-HAD) attributed primarily to anchor guidance rather than branch fusion; feature-space analyses suggest certain discriminative TSF families are inaccessible in standard latent representations. The authors conclude that handcrafted TSFs are most useful when kept explicit and adaptable, and release code at https://github.com/ni-x-lab/TCNet-har.

Significance. If the central claim and ablations hold, the work offers a practical hybrid representation for HAR that preserves the interpretability and domain knowledge of handcrafted TSFs while adding neural adaptability, potentially influencing designs that currently favor fully latent deep features. The multi-benchmark evaluation and public code repository are clear strengths that enable reproducibility and extension. The suggestion that explicit anchors can access feature families missed by standard latent spaces, if substantiated, would be a useful empirical observation for the field.

major comments (1)
  1. [Ablation experiments] Ablations (as summarized in the abstract): The central claim that improvements derive primarily from anchor guidance rather than simple branch fusion rests on the ablation results. However, if the 'simple branch fusion' baseline does not include equivalent time/frequency context encoders or the same parameter budget for predicting scale/bias/gating, the comparison does not cleanly isolate the benefit of keeping anchors explicit. This is load-bearing for the paper's main conclusion and requires a more tightly controlled ablation.
minor comments (2)
  1. [Experimental evaluation] The manuscript does not report error bars, standard deviations across runs, or statistical significance tests for the mF1 improvements, nor does it detail data splits, preprocessing, or hyperparameter selection procedures.
  2. [Analysis section] The feature-space analyses that indicate certain TSF families are inaccessible in latent representations would benefit from additional methodological detail (e.g., exact distance metrics, selection criteria for TSF families, and quantitative thresholds).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address the concern regarding the ablation experiments below and have revised the manuscript to provide a more tightly controlled comparison.

Point-by-point responses
  1. Referee: [Ablation experiments] Ablations (as summarized in the abstract): The central claim that improvements derive primarily from anchor guidance rather than simple branch fusion rests on the ablation results. However, if the 'simple branch fusion' baseline does not include equivalent time/frequency context encoders or the same parameter budget for predicting scale/bias/gating, the comparison does not cleanly isolate the benefit of keeping anchors explicit. This is load-bearing for the paper's main conclusion and requires a more tightly controlled ablation.

    Authors: We appreciate the referee's observation that the ablation must cleanly isolate the contribution of explicit anchor guidance. In the submitted manuscript, the 'simple branch fusion' baseline uses the same time- and frequency-domain encoders as TCNet but fuses their outputs via concatenation with the anchors, without the context-dependent modulation. To address the concern about parameter budget, we have performed an additional controlled ablation in which a comparable number of parameters are used to predict scale, bias, and gating terms that are instead applied to a standard latent representation (i.e., without explicit anchors). The results of this experiment, which we will report in the revised manuscript, continue to show superior performance for the anchor-based modulation, thereby supporting our central claim. We have also expanded the description of all baselines in Section 4.3 to clarify the architectural equivalence. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture with benchmark results

Full rationale

The paper proposes TCNet as a neural architecture that keeps handcrafted time-series features (TSFs) explicit as anchors and modulates them via predicted scale/bias/gating from time/frequency context encoders. All central claims rest on empirical mF1 scores across five standard HAR benchmarks (USC-HAD, Daphnet, MHealth, PAMAP2, etc.) plus ablations that compare against rTsfNet and branch-fusion baselines. No equations, first-principles derivation, or uniqueness theorem is presented that reduces by construction to fitted parameters, self-citations, or renamed inputs. The reasoning is therefore not circular: reported gains are tested against external datasets and architectural controls rather than being forced by internal definitions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that handcrafted TSFs capture meaningful motion statistics worth preserving as explicit anchors, plus standard neural network training assumptions. No major invented entities beyond the conceptual framing of anchors; hyperparameters such as context encoder dimensions are free parameters typical of deep models.

free parameters (1)
  • context encoder architecture and training hyperparameters
    Neural network sizes, learning rates, and modulation parameter predictors are chosen or fitted to achieve the reported benchmark performance.
axioms (1)
  • domain assumption: handcrafted time-series features capture meaningful and discriminative motion statistics for HAR
    Invoked as the basis for treating TSFs as anchors rather than discarding them.
invented entities (1)
  • Feature anchors (no independent evidence)
    purpose: Explicit intermediate representations that remain inside the model and are modulated by neural context
    Conceptual framing introduced to keep semantics visible while allowing adaptation; no independent falsifiable evidence provided beyond the empirical results.

pith-pipeline@v0.9.0 · 5636 in / 1364 out tokens · 42965 ms · 2026-05-07T15:56:38.541723+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

55 extracted references · 11 canonical work pages · 1 internal anchor

  1. [1] Alireza Abedin, Mahsa Ehsanpour, Qinfeng Shi, Hamid Rezatofighi, and Damith C. Ranasinghe. 2021. Attend and Discriminate: Beyond the State-of-the-Art for Human Activity Recognition Using Wearable Sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 1 (2021), 1:1–1:22. doi:10.1145/3448083
  2. [2] Rida Amin, Eoin Keogh, et al. 2024. Exploring the Applications of Explainability in Wearable Data Analytics: Systematic Literature Review. Journal of Medical Internet Research 12, 1 (2024).
  3. [3] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz, et al. 2013. A public domain dataset for human activity recognition using smartphones. In ESANN, Vol. 3. 3–4.
  4. [4] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer Normalization. arXiv preprint arXiv:1607.06450 (2016).
  5. [5] Marc Bachlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M. Hausdorff, Nir Giladi, and Gerhard Troster. 2009. Wearable assistant for Parkinson's disease patients with the freezing of gait symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2009), 436–446.
  6. [6] Marc Bächlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M. Hausdorff, Nir Giladi, and Gerhard Tröster. 2010. Wearable Assistant for Parkinson's Disease Patients With the Freezing of Gait Symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2010), 436–446.
  7. [7] Oresti Banos, Rafael Garcia, Juan A. Holgado-Terriza, Miguel Damas, Hector Pomares, Ignacio Rojas, Alejandro Saez, and Claudia Villalonga. 2014. mHealthDroid: a novel framework for agile development of mobile health applications. In International Workshop on Ambient Assisted Living. Springer, 91–98.
  8. [8] Marius Bock, Michael Moeller, and Kristof Van Laerhoven. 2024. Temporal action localization for inertial-based human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (2024), 1–19.
  9. [9] Andreas Bulling, Ulf Blanke, and Bernt Schiele. 2014. A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors. Comput. Surveys 46, 3 (2014), 1–33.
  10. [10] Maximilian Christ, Nils Braun, Julius Neuffer, and Andreas W. Kempa-Liehr. 2018. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package). Neurocomputing 307 (2018), 72–77.
  11. [11] Nidhi Dua, Shiva Nand Singh, Vijay Bhaskar Semwal, and Sravan Kumar Challa. 2023. Inception inspired CNN-GRU hybrid network for human activity recognition. Multimedia Tools and Applications 82, 4 (2023), 5369–5403.
  12. [12] Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. 2017. A Learned Representation For Artistic Style. In Proceedings of the International Conference on Learning Representations.
  13. [13] Sannara Ek, François Portet, and Philippe Lalanda. 2023. Transformer-based models to deal with heterogeneous environments in human activity recognition. Personal and Ubiquitous Computing 27, 6 (2023), 2267–2280.
  14. [14] Ziqi Gao, Yuntao Wang, Jianguo Chen, Junliang Xing, Shwetak Patel, Xin Liu, and Yuanchun Shi. 2023. MMTSA: Multi-modal temporal segment attention network for efficient human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3 (2023), 1–26.
  15. [15] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 580–587.
  16. [16] Yu Guan and Thomas Plötz. 2017. Ensembles of Deep LSTM Learners for Activity Recognition Using Wearables. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 2 (2017), 11:1–11:28. doi:10.1145/3090076
  17. [17] Nils Y. Hammerla, Shane Halloran, and Thomas Plötz. 2016. Deep, Convolutional, and Recurrent Models for Human Activity Recognition Using Wearables. In Proceedings of the International Joint Conference on Artificial Intelligence. 1533–1540.
  18. [18] Harish Haresamudram, Chi Ian Tang, Sungho Suh, Paul Lukowicz, and Thomas Plötz. 2025. Past, Present, and Future of Sensor-based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 9, 2 (2025), 34:1–34:44. doi:10.1145/3729467
  19. [19] Raul Igual, Carlos Medrano, and Inmaculada Plaza. 2013. Challenges, Issues and Trends in Fall Detection Systems. BioMedical Engineering OnLine 12, 1 (2013), 66.
  20. [20] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning. 448–456.
  21. [21] Nikita Kitaev, Łukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The Efficient Transformer. In Proceedings of the International Conference on Learning Representations.
  22. [22] Oscar D. Lara and Miguel A. Labrador. 2013. A Survey on Human Activity Recognition Using Wearable Sensors. IEEE Communications Surveys & Tutorials 15, 3 (2013), 1192–1209.
  23. [23] Shizhan Liu, Hang Yu, Cong Liao, Jianguo Li, Weiyao Lin, Alex X. Liu, and Schahram Dustdar. 2022. Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting. In Proceedings of the International Conference on Learning Representations.
  24. [24] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. 2024. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the International Conference on Learning Representations.
  25. [25] Limeng Lu, Chuanlin Zhang, Kai Cao, Tao Deng, and Qianqian Yang. 2022. A multichannel CNN-GRU model for human activity recognition. IEEE Access 10 (2022), 66797–66810.
  26. [26] Wenjun Ma, Haoran Jing, Zhiwen Yu, and Bin Guo. 2019. AttnSense: Multi-level Attention Mechanism For Multimodal Human Activity Recognition. In Proceedings of the International Joint Conference on Artificial Intelligence. 3109–3115.
  27. [27] Shenghuan Miao, Ling Chen, and Rong Hu. 2023. Spatial-Temporal Masked Autoencoder for Multi-Device Wearable Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (2023), 172:1–172:25. doi:10.1145/3631415
  28. [28] Shenghuan Miao, Ling Chen, Rong Hu, and Yingsong Luo. 2022. Towards a Dynamic Inter-Sensor Correlations Learning Framework for Multi-Sensor-Based Wearable Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 130:1–130:25. doi:10.1145/3550331
  29. [29] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the International Conference on Learning Representations.
  30. [30] Francisco Javier Ordóñez and Daniel Roggen. 2016. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 16, 1 (2016), 115.
  31. [31] Lingfeng Peng, Luyu Chen, Zhiwen Ye, and Yi Zhang. 2018. AROMA: A Deep Multi-Task Learning Based Simple and Complex Human Activity Recognition Method Using Wearable Sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 2 (2018), 74:1–74:16. doi:10.1145/3214277
  32. [32] Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. 2018. FiLM: Visual Reasoning with a General Conditioning Layer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
  33. [33] Thomas Plötz, Nils Y. Hammerla, and Patrick Olivier. 2011. Feature Learning for Activity Recognition in Ubiquitous Computing. In Proceedings of the International Joint Conference on Artificial Intelligence. 1729–1734.
  34. [34] Stephen J. Preece, John Yannis Goulermas, Laurence P. J. Kenney, and David Howard. 2009. A Comparison of Feature Extraction Methods for the Classification of Dynamic Activities From Accelerometer Data. IEEE Transactions on Biomedical Engineering 56, 3 (2009), 871–879.
  35. [35] Attila Reiss and Didier Stricker. 2012. Introducing a new benchmarked dataset for activity monitoring. In 2012 16th International Symposium on Wearable Computers. IEEE, 108–109.
  36. [36] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems, Vol. 28.
  37. [37] Rui Shao, Hao Wang, and Shuochao Yao. 2023. ConvBoost: Boosting ConvNets for Sensor-based Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 1 (2023), 33:1–33:26. doi:10.1145/3580897
  38. [38] Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul J. M. Havinga. 2015. A Survey of Online Activity Recognition Using Mobile Phones. Sensors 15, 1 (2015), 2059–2085.
  39. [39] Jie Su, Fengtong Ge, Zhenyu Wen, Taotao Li, Yang Bai, Yejian Zhou, and Xiaoqin Zhang. 2025. IMUZero: Zero-Shot Human Activity Recognition by Language-Based Cross Modality Fusion. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 9, 4 (2025), 211:1–211:28.
  40. [40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All You Need. In Advances in Neural Information Processing Systems, Vol. 30.
  41. [41] Huiqiang Wang, Jian Peng, Feihu Huang, Jince Wang, Junhui Chen, and Yifei Xiao. 2023. MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting. In Proceedings of the International Conference on Learning Representations.
  42. [42] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. 2019. Deep Learning for Sensor-based Activity Recognition: A Survey. Pattern Recognition Letters 119 (2019), 3–11.
  43. [43] Sheng Wen and Eno Lab. 2024. rTsfNet: A DNN Model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4 (2024).
  44. [44] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the International Conference on Learning Representations.
  45. [45] Zechen Yang et al. 2025. SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  46. [46] Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. 2017. DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing. In Proceedings of the 26th International Conference on World Wide Web. 351–360.
  47. [47] Hang Yuan, Shing Chan, Andrew P. Creagh, Catherine Tong, David A. Clifton, and Aiden Doherty. 2024. Self-supervised Learning for Human Activity Recognition Using 700,000 Person-days of Wearable Data. npj Digital Medicine 7 (2024), 91.
  48. [48] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. 2023. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37.
  49. [49] Mi Zhang and Alexander A. Sawchuk. 2012. USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing. 1036–1043.
  50. [50] Tianping Zhang, Yizhuo Zhang, Wei Cao, Jiang Bian, Xiaohan Yi, Shun Zheng, and Jian Li. 2022. Less Is More: Fast Multivariate Time Series Forecasting with Light Sampling-oriented MLP Structures. arXiv preprint arXiv:2207.01186 (2022).
  51. [51] Ye Zhang, Longguang Wang, Huiling Chen, Aosheng Tian, Shilin Zhou, and Yulan Guo. 2022. IF-ConvTransformer: A Framework for Human Activity Recognition Using IMU Fusion and ConvTransformer. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 2 (2022), 88:1–88:26. doi:10.1145/3534584
  52. [52] Yunhao Zhang and Junchi Yan. 2023. Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting. In Proceedings of the International Conference on Learning Representations.
  53. [53] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35.
  54. [54] Tian Zhou, Ziqing Ma, Qingsong Wen, Liang Sun, Terrance Yardley, Xue Wang, and Rong Jin. 2022. FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting. In Advances in Neural Information Processing Systems, Vol. 35.
  55. [55] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting. In Proceedings of the International Conference on Machine Learning.