Activity2Vec: Learning ADL Embeddings from Sensor Data with a Sequence-to-Sequence Model

Alireza Ghods; Diane J. Cook

arxiv: 1907.05597 · v1 · pith:ZZGBJLBX · submitted 2019-07-12 · 💻 cs.LG

Pith reviewed 2026-05-24 22:46 UTC · model grok-4.3

classification 💻 cs.LG

keywords: activity recognition · sequence-to-sequence · embeddings · sensor data · activities of daily living · semi-supervised learning · feature extraction · fall detection

The pith

A sequence-to-sequence model learns universal embeddings from sensor data for activities of daily living.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training a sequence-to-sequence model on raw sensor streams to automatically produce embeddings that replace manual feature design for activity recognition. These embeddings are presented as meaningful across related tasks and usable even when only part of the data carries labels. The approach is tested on both wearable-device recordings and ambient home sensors for standard activity recognition and for fall detection. A sympathetic reader would care because large volumes of unlabeled sensor data already exist in homes and on phones; an automatic embedding method could turn that data into usable input without repeated expert tuning. If the embeddings prove transferable, the same learned representation could feed multiple downstream health-monitoring models.

Core claim

The central claim is that a sequence-to-sequence architecture can be trained on unlabeled or partially labeled sensor sequences to produce embeddings that serve as drop-in features for activity-of-daily-living recognition and for fall detection, thereby eliminating hand-crafted feature engineering while also enabling semi-supervised learning on the same data sources.

What carries the argument

The sequence-to-sequence model that encodes variable-length sensor time series into fixed-length embeddings for downstream classification.
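A minimal forward-pass sketch of that mechanism, assuming an LSTM encoder and decoder used as an autoencoder with a mean-squared reconstruction loss. This is not the authors' code: the reference list cites LSTM [7], Keras [3] and Adam [9], which suggests but does not confirm such a setup, and the class names, `embed_dim`, the all-zero start input and the feedback decoding are our own illustrative choices. Weights are random and training is omitted; the sketch only shows that windows of any length map to an embedding of one fixed size.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """One LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # A single matrix holds the input, forget, candidate and output gate weights.
        self.W = rng.uniform(-scale, scale, size=(input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)  # update the cell state
        h = _sigmoid(o) * np.tanh(c)                    # expose the new hidden state
        return h, c


class Seq2SeqAutoencoder:
    """Hypothetical forward-only seq2seq autoencoder for (T, C) sensor windows."""

    def __init__(self, n_channels, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.embed_dim = embed_dim
        self.encoder = LSTMCell(n_channels, embed_dim, rng)
        self.decoder = LSTMCell(n_channels, embed_dim, rng)
        self.W_out = rng.normal(0.0, 0.1, size=(embed_dim, n_channels))

    def _encode(self, seq):
        h = np.zeros(self.embed_dim)
        c = np.zeros(self.embed_dim)
        for x in seq:  # one step per time sample, so any length T works
            h, c = self.encoder.step(x, h, c)
        return h, c

    def embed(self, seq):
        """The encoder's final hidden state serves as the fixed-length embedding."""
        return self._encode(seq)[0]

    def reconstruct(self, seq):
        """Decode T steps from the encoder state, feeding back each prediction."""
        h, c = self._encode(seq)
        prev = np.zeros(seq.shape[1])  # all-zero start-of-sequence input
        outputs = []
        for _ in range(len(seq)):
            h, c = self.decoder.step(prev, h, c)
            prev = h @ self.W_out
            outputs.append(prev)
        return np.stack(outputs)

    def loss(self, seq):
        """Mean squared reconstruction error: the label-free training signal."""
        return float(np.mean((self.reconstruct(seq) - seq) ** 2))


rng = np.random.default_rng(1)
model = Seq2SeqAutoencoder(n_channels=3, embed_dim=16)
short_window = rng.normal(size=(20, 3))  # e.g. 20 samples of 3-axis accelerometer data
long_window = rng.normal(size=(75, 3))
print(model.embed(short_window).shape, model.embed(long_window).shape)  # (16,) (16,)
```

Minimizing `loss` over unlabeled windows (for example by backpropagation through time) is what would make the embedding informative; with random weights it is only structurally correct.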

If this is right

Activity recognition systems no longer require separate feature-engineering pipelines for each new sensor type or environment.
Partially labeled sensor collections can be exploited for training by treating the learned embeddings as input to a downstream supervised head.
The same embedding space can be reused for fall detection without redesigning input representations.
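The second consequence can be sketched as follows, assuming embeddings have already been computed for every window, labeled or not, by an encoder trained without labels. A nearest-centroid classifier stands in for the downstream supervised head; the page does not say which head the paper uses, and `fit_centroids`, `predict` and the `-1` marker for unlabeled rows are hypothetical conventions.

```python
import numpy as np

UNLABELLED = -1  # hypothetical marker for windows without an activity label


def fit_centroids(embeddings, labels):
    """Average the embeddings of each labeled class; unlabeled rows are skipped."""
    classes = np.unique(labels[labels != UNLABELLED])
    centroids = np.stack([embeddings[labels == k].mean(axis=0) for k in classes])
    return classes, centroids


def predict(embeddings, classes, centroids):
    """Assign every embedding to the class whose centroid is nearest (Euclidean)."""
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


# Synthetic stand-in for encoder outputs: two activities, 50 windows each.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.3, (50, 8)), rng.normal(3.0, 0.3, (50, 8))])
truth = np.array([0] * 50 + [1] * 50)
labels = np.full(100, UNLABELLED)
labels[[0, 1, 50, 51]] = truth[[0, 1, 50, 51]]  # only 4 of 100 windows are labeled

classes, centroids = fit_centroids(emb, labels)
accuracy = np.mean(predict(emb, classes, centroids) == truth)
```

Only the labeled rows shape the classifier; the unlabeled rows contribute through the encoder that produced the embeddings, which is where any semi-supervised benefit would come from.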

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The embeddings could serve as a starting representation for other time-series health tasks such as sleep-stage detection or gait analysis if the universality assumption holds.
Deployment in real homes would require checking whether the learned features remain stable when sensor placement or sampling rates change slightly from the training distribution.
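One way to run the sampling-rate check mentioned above is to resample a recording and compare its features before and after. The sketch assumes linear interpolation as the resampling method and cosine similarity as the stability score; both choices and the function names are illustrative, not taken from the paper. `extract` can be any function that maps a `(T, C)` window to a 1-D vector, such as a trained seq2seq encoder.

```python
import numpy as np


def resample(stream, factor):
    """Linearly interpolate a (T, C) stream onto round(T * factor) evenly spaced samples."""
    n_new = max(2, int(round(len(stream) * factor)))
    t_old = np.linspace(0.0, 1.0, len(stream))
    t_new = np.linspace(0.0, 1.0, n_new)
    return np.stack([np.interp(t_new, t_old, stream[:, ch]) for ch in range(stream.shape[1])], axis=1)


def feature_stability(extract, stream, factor):
    """Cosine similarity between features of the original and the resampled stream."""
    a = extract(stream)
    b = extract(resample(stream, factor))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical usage: the same 2-channel recording as if captured at 80% of the sampling rate.
t = np.linspace(0.0, 6.0, 100)
stream = np.column_stack([np.sin(t) + 2.0, np.cos(t) + 2.0])
score = feature_stability(lambda w: w.mean(axis=0), stream, factor=0.8)
```

A score close to 1 suggests the features barely move under the change; a real check would use the trained encoder and recordings from the target deployment.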

Load-bearing premise

The embeddings produced by the sequence-to-sequence model remain meaningful and useful when transferred to new but similar sensor-based recognition tasks without retraining or task-specific adjustments.

What would settle it

Train the model on one sensor dataset, freeze the embeddings, then measure whether a simple classifier using those embeddings matches or exceeds the accuracy of models built with hand-engineered features on a second, independent activity-recognition or fall-detection dataset collected from different users or sensor placements.
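The protocol in the preceding paragraph can be sketched as a small evaluation harness. It assumes a 1-nearest-neighbour classifier as the "simple classifier" and uses per-channel mean, standard deviation, minimum and maximum as a stand-in for hand-engineered features; the paper's actual baseline features are not described on this page, and all names are hypothetical.

```python
import numpy as np


def handcrafted_features(window):
    """Stand-in baseline: per-channel mean, std, min and max of a (T, C) window."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           window.min(axis=0), window.max(axis=0)])


def frozen_probe_accuracy(extract, train_windows, train_labels, test_windows, test_labels):
    """Accuracy of a 1-nearest-neighbour classifier on features from a frozen `extract`.

    `extract` maps one (T, C) window to a 1-D vector and is never updated here."""
    train = np.stack([extract(w) for w in train_windows])
    test = np.stack([extract(w) for w in test_windows])
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    predicted = np.asarray(train_labels)[np.argmin(dists, axis=1)]
    return float(np.mean(predicted == np.asarray(test_labels)))


# Synthetic second dataset: "still" (low variance) vs "moving" (high variance) windows.
rng = np.random.default_rng(0)


def make(n, scale):
    return [rng.normal(0.0, scale, size=(50, 3)) for _ in range(n)]


train_w, test_w = make(20, 0.1) + make(20, 1.0), make(20, 0.1) + make(20, 1.0)
train_y = test_y = [0] * 20 + [1] * 20
baseline = frozen_probe_accuracy(handcrafted_features, train_w, train_y, test_w, test_y)
```

In the proposed test, `extract` would be the encoder trained on the first dataset and left frozen, while every window and label passed to the probe comes from the second dataset; the two accuracies are then compared directly.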

Figures

Figures reproduced from arXiv: 1907.05597 by Alireza Ghods, Diane J. Cook.

**Figure 1.** The structure of the seq2seq model. Yet another shortcoming of the multilayer perceptron autoencoder is its inability to handle time series input data of different-sized time windows. If a fixed-time input is required, then the data must first be segmented before it can be modeled and learned. Common segmentation methods typically produce data sequences that partition the data into either fixed-sized windows…
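The Figure 1 caption notes that a multilayer perceptron autoencoder needs fixed-size input, so the stream must be segmented first. Below is a minimal sketch of fixed-size sliding-window segmentation, assuming NumPy arrays of shape `(T, C)`; `size` and `step` are hypothetical parameters, and the caption is cut off before saying which segmentation methods the paper actually compares.

```python
import numpy as np


def sliding_windows(stream, size, step):
    """Cut a (T, C) stream into windows of `size` samples, starting one every `step` samples.

    A trailing remainder shorter than `size` is dropped."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    n = (len(stream) - size) // step + 1 if len(stream) >= size else 0
    if n == 0:
        return np.empty((0, size) + stream.shape[1:])
    return np.stack([stream[i * step:i * step + size] for i in range(n)])


# Hypothetical example: 10 two-channel samples, windows of 4 with 50% overlap.
stream = np.arange(20).reshape(10, 2)
windows = sliding_windows(stream, size=4, step=2)
print(windows.shape)  # (4, 4, 2)
```

A recurrent seq2seq encoder relaxes this constraint in principle, because it consumes one time step at a time and can embed windows of different lengths.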

**Figure 2.** t-SNE projection of handcrafted features and en…

Original abstract

Recognizing activities of daily living (ADLs) plays an essential role in analyzing human health and behavior. The widespread availability of sensors embedded in homes, smartphones, and smart watches has led to the collection of large datasets that reflect human behavior. To build machine learning models from these data, researchers have developed multiple feature extraction methods. In this study, we investigate a method for automatically extracting universal and meaningful features that are applicable across similar time-series-based learning tasks such as activity recognition and fall detection. We propose a sequence-to-sequence (seq2seq) model to perform this feature learning. Besides avoiding feature engineering, the meaningful features learned by the seq2seq model can also be utilized for semi-supervised learning. We evaluate both of these benefits on datasets collected from wearable and ambient sensors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Seq2seq applied to ADL sensor embeddings is a reasonable domain extension but incremental and light on shown results.

The core of this paper is using a seq2seq model to learn embeddings directly from raw ADL sensor time series instead of hand-engineering features, with the added claim that those embeddings transfer to activity recognition, fall detection, and semi-supervised settings. That framing is practical for ambient assisted living work, where labeling is expensive. The choice to test on both wearable and ambient sensor datasets is a small strength, since it avoids tying the method to one hardware type. The semi-supervised angle also lines up with real deployment constraints.

By 2019 the seq2seq architecture itself was already standard, so the contribution lies in the application rather than in any new architecture or theoretical step. The abstract states that the features are evaluated but gives no numbers, baselines, or error bars, which leaves the actual gains unclear. The universality claim for the embeddings across tasks also rests on whatever transfer results appear in the full paper; without those details it is hard to judge how robust the finding is.

This is the kind of paper that fits a specialized reading group on sensor-based health monitoring or time-series representation learning. A reader already working on activity recognition might pick up the experimental setup or the semi-supervised protocol if the numbers check out. It is solid enough on its own terms to go to referees rather than be desk-rejected: the problem is well motivated and the method is reproducible in principle, so reviewers can verify whether the claimed benefits hold on the datasets.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes Activity2Vec, a sequence-to-sequence model for automatically learning embeddings from wearable and ambient sensor time series data representing activities of daily living. The central claims are that the learned features avoid manual engineering, are universal across related tasks such as activity recognition and fall detection, and can support semi-supervised learning; these benefits are stated to have been evaluated on the relevant datasets.

Significance. If the empirical results demonstrate that the seq2seq embeddings match or exceed hand-engineered baselines on multiple tasks while enabling effective semi-supervised performance, the work would offer a practical data-driven alternative to feature engineering in sensor-based ADL analysis.

minor comments (2)

The abstract states that evaluation was performed on wearable and ambient sensor datasets but supplies no quantitative metrics, dataset sizes, or baseline comparisons; the full manuscript should include these in the results section to allow assessment of the universality claim.
Notation for the seq2seq architecture (encoder/decoder dimensions, loss function) is not described in the provided abstract; the methods section should define these explicitly with reference to standard seq2seq components.

Simulated Author's Rebuttal

0 responses · 1 unresolved

We thank the referee for reviewing our manuscript. The summary accurately reflects the proposed Activity2Vec approach and its claimed benefits: learned ADL embeddings, universality across tasks, and semi-supervised utility. The report lists no major comments, so there are no specific concerns behind the uncertain recommendation that we can address point by point.

standing simulated objections not resolved

The uncertain recommendation cannot be directly addressed without the specific major comments that motivated it.

Circularity Check

0 steps flagged

No significant circularity

The paper proposes a seq2seq model for automatic feature extraction from sensor time series data for ADL recognition and related tasks, with empirical evaluation on wearable and ambient sensor datasets. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided abstract or description of the argument. The central claim is a methodological proposal whose benefits are assessed empirically rather than derived by construction from inputs. The universality claim is framed as an outcome of evaluation, not an untested premise or definitional reduction. This is self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities; the proposal rests on the unstated assumption that seq2seq training will produce transferable representations without domain-specific tuning.

pith-pipeline@v0.9.0 · 5666 in / 939 out tokens · 15837 ms · 2026-05-24T22:46:22.217808+00:00 · methodology

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1] Samaneh Aminikhanghahi and Diane J Cook. 2019. Enhancing activity recognition using CPD-based activity segmentation. Pervasive and Mobile Computing 53 (2019), 75–89.

[2] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. 2013. A public domain dataset for human activity recognition using smartphones. In ESANN.

[3] François Chollet et al. 2015. Keras. https://github.com/fchollet/keras

[4] Cain CT Clark, Claire M Barnes, Gareth Stratton, Melitta A McNarry, Kelly A Mackintosh, and Huw D Summers. 2017. A review of emerging analytical techniques for objective physical activity measurement in humans. Sports Medicine 47, 3 (2017), 439–447.

[5, 6] Diane J Cook, Aaron S Crandall, Brian L Thomas, and Narayanan C Krishnan. 2013. CASAS: A smart home in a box. Computer 46, 7 (2013), 62–69.

[7] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.

[8] Shehroz S Khan and Babak Taati. 2017. Detecting unseen falls from wearable devices using channel-wise ensemble of autoencoders. Expert Systems with Applications 87 (2017), 280–290.

[9] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

[10] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579–2605.

[11] Mario Munoz-Organero and Ramona Ruiz-Blazquez. 2017. Time-elastic generative model for acceleration time series in human activity recognition. Sensors 17, 2 (2017), 319.

[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

[13] Rohit Prabhavalkar, Kanishka Rao, Tara N Sainath, Bo Li, Leif Johnson, and Navdeep Jaitly. 2017. A comparison of sequence-to-sequence models for speech recognition. In Interspeech. 939–943.

[14] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 3104–3112.

[15] Moshe Unger, Ariel Bar, Bracha Shapira, and Lior Rokach. 2016. Towards latent context-aware recommendation systems. Knowledge-Based Systems 104 (2016), 165–178.

[16] Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2015. Sequence to sequence: video to text. In Proceedings of the IEEE International Conference on Computer Vision. 4534–4542.

[17] Aiguo Wang, Guilin Chen, Cuijuan Shang, Miaofei Zhang, and Li Liu. 2016. Human activity recognition in a smart home environment with stacked denoising autoencoders. In International Conference on Web-Age Information Management. Springer, 29–40.
