
arxiv: 1907.11879 · v1 · pith:RZ6CNNR2new · submitted 2019-07-27 · 💻 cs.LG · stat.ML

Multi-task Self-Supervised Learning for Human Activity Detection

Pith reviewed 2026-05-24 14:54 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords self-supervised learning · human activity recognition · multi-task learning · temporal convolutional network · sensor data · feature extraction · semi-supervised learning · transfer learning

The pith

Self-supervised multi-task learning on signal transformations extracts features that match or exceed fully supervised performance for human activity recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a self-supervised pretraining method that trains a temporal convolutional network to solve multiple binary classification tasks, each detecting whether a particular transformation has been applied to raw sensor signals. These auxiliary tasks supply a supervisory signal without any activity labels, and the learned representations are transferred to the downstream human activity recognition problem. Experiments across multiple smartphone datasets show the approach reaches performance levels comparable to or better than fully supervised networks in unsupervised, semi-supervised, and transfer settings, while outperforming autoencoders. The method is presented as general beyond human activity recognition.
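The label-free supervision described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three transformations (`negate`, `reverse_time`, `scale`), the single-channel window format, and all function names are assumptions chosen for brevity, since the text above does not list the transformations the paper uses.

```python
import random

# Illustrative transformations only; the paper's actual set is not listed above.
def negate(window):
    """Flip the sign of every sample."""
    return [-x for x in window]

def reverse_time(window):
    """Reverse the window along the time axis."""
    return window[::-1]

def scale(window, factor=1.5):
    """Multiply every sample by a constant factor (hypothetical default)."""
    return [factor * x for x in window]

TRANSFORMS = {"negate": negate, "reverse_time": reverse_time, "scale": scale}

def make_task_examples(windows, transform, rng):
    """Build one binary task: label 1 if `transform` was applied, else 0."""
    examples = []
    for window in windows:
        examples.append((list(window), 0))       # untouched signal
        examples.append((transform(window), 1))  # transformed copy
    rng.shuffle(examples)
    return examples

def make_multitask_dataset(windows, seed=0):
    """Return {task_name: [(signal, label), ...]}; no activity labels needed."""
    rng = random.Random(seed)
    return {name: make_task_examples(windows, fn, rng)
            for name, fn in TRANSFORMS.items()}

# Two toy single-channel windows standing in for raw sensor segments.
windows = [[0.1, 0.4, -0.2, 0.3], [1.0, 0.5, 0.0, -0.5]]
dataset = make_multitask_dataset(windows)
```

Every label here is produced by the data pipeline itself, which is what lets the pretraining stage run on unlabeled sensor recordings.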

Core claim

By learning a multi-task temporal convolutional network to recognize transformations applied on an input signal, the method demonstrates that simple auxiliary tasks of binary classification result in a strong supervisory signal for extracting useful features for the downstream human activity recognition task, achieving performance levels superior to or comparable with fully-supervised networks.

What carries the argument

Multi-task temporal convolutional network trained to perform binary classification of transformations applied to input sensor signals.
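One way to picture the multi-task objective is as a sum of per-task binary cross-entropy terms computed on the outputs of task-specific heads that share one feature extractor. The sketch below assumes equal task weights by default and uses hypothetical task names; the weighting scheme in the paper may differ.

```python
import math

def binary_cross_entropy(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def multitask_loss(task_probs, task_labels, task_weights=None):
    """Weighted sum of per-task losses for a single input window.

    task_probs:   {task_name: predicted probability that the transform was applied}
    task_labels:  {task_name: 0 or 1}
    task_weights: optional {task_name: weight}; equal weights are assumed if omitted.
    """
    total = 0.0
    for name, prob in task_probs.items():
        weight = 1.0 if task_weights is None else task_weights[name]
        total += weight * binary_cross_entropy(prob, task_labels[name])
    return total

# Hypothetical head outputs for one window.
probs = {"negate": 0.9, "reverse_time": 0.2}
labels = {"negate": 1, "reverse_time": 0}
loss = multitask_loss(probs, labels)
```

Because every head's loss flows back through the same shared layers, the shared representation has to be useful for all transformation-detection tasks at once.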

If this is right

  • Semi-supervised training with only 10 labeled examples per class reaches kappa scores of 0.7-0.8.
  • Features transferred from a different data source still yield strong detection performance.
  • The method significantly outperforms autoencoders on the same tasks.
  • Performance reaches or exceeds that of fully supervised networks trained on the full labeled set.
  • The technique is presented as applicable to other sensor-based problems beyond human activity recognition.
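The kappa scores quoted above are presumably Cohen's kappa, which corrects raw accuracy for the agreement expected by chance. A small self-contained sketch of that standard definition follows; the function name and toy labels are illustrative, and the paper's exact evaluation code is not shown in the text above.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the product of the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both label sets hold one identical class
    return (observed - expected) / (1.0 - expected)

y_true = ["walk", "walk", "sit", "sit"]
y_pred = ["walk", "sit", "sit", "sit"]
kappa = cohen_kappa(y_true, y_pred)  # accuracy 0.75, chance agreement 0.5
```

In this toy case the accuracy is 0.75 but kappa is 0.5, which shows why kappa is the stricter number to report on imbalanced activity data.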

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same transformation-based pretraining could be tested on other time-series domains such as ECG monitoring or industrial sensor streams where labels are scarce.
  • If the transformations are chosen to preserve semantic content while altering low-level statistics, similar auxiliary tasks might work for non-temporal data like images or graphs.
  • Reducing reliance on labeled data through this route could lower privacy and annotation barriers in deployed health-monitoring applications.
  • The multi-task formulation might be extended by adding more diverse transformations to further strengthen the learned representations.

Load-bearing premise

The transformations applied to the input signal create auxiliary binary classification tasks whose learned representations transfer effectively to the human activity recognition downstream task.

What would settle it

A controlled experiment in which self-supervised features are compared against random initialization or autoencoder features on the same HAR datasets using exactly 10 labeled examples per class; if the self-supervised version shows no gain in kappa score, the transfer claim fails.
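For such a comparison to be fair, each initialization (self-supervised, random, autoencoder) would need to see the same labeled subset. A hedged sketch of drawing a fixed, seeded subset of `k` examples per class is shown below; `sample_k_per_class` and the toy label list are hypothetical and not part of the paper.

```python
import random
from collections import Counter, defaultdict

def sample_k_per_class(labels, k=10, seed=0):
    """Return sorted indices of k randomly chosen examples for each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so every model sees the same subset
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < k:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy label list standing in for a HAR training split.
labels = ["walk"] * 30 + ["sit"] * 25 + ["run"] * 12
subset = sample_k_per_class(labels, k=10)
per_class = Counter(labels[i] for i in subset)
```

Repeating the draw over several seeds and reporting the spread of kappa scores would make the comparison less sensitive to one lucky or unlucky subset.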

Figures

Figures reproduced from arXiv: 1907.11879 by Aaqib Saeed, Johan Lukkien, Tanir Ozcelebi.

Figure 1: Illustration of the proposed multi-task self-supervised approach for feature learning. We train a temporal convolutional …
Figure 2: Evolution of feature learning approaches from hand-crafted methods towards task discovery for self-supervision.
Figure 3: Detailed architectural specification of transformation prediction and activity recognition networks. We propose a …
Figure 4: Evaluation of activity classification performance using the features learned based on self-supervision (per layer). We …
Figure 5: Comparison of individual self-supervised tasks with the multi-task setting. The TPN is pre-trained for solving a …
Figure 6: Generalization of the self-supervised learned features under semi-supervised setting. The TPN is pre-trained on an …
Figure 7: Assessment of the transferred self-supervised learned features from a different but related dataset (MobiAct) under …
Figure 8: CCA similarity between fully-supervised and self-supervised networks. We employ the SVCCA technique [ …
Figure 9: Saliency maps [63] of randomly selected instances from MotionSense dataset. The input signal is illustrated in the top pane with the magnitude computed from the sample shown in the bottom panes for better interpretability. The strong colored intensities exhibit the regions that substantially affect the model predictions. The saliency mapping of both networks focus on similar input areas which shows that the s…
Figure 10: t-SNE visualization of the learned representations. We visualize the features from …
Figure 11: Convergence analysis of transformation recognition tasks. We plot the kappa score of self-supervised tasks (i.e. …
Figure 12: Weighted F-score: Generalization of the self-supervised learned features under semi-supervised setting. The reported …
Figure 13: Weighted F-score: Assessment of the transferred self-supervised learned features from a different but related dataset …

Original abstract

Deep learning methods are successfully used in applications pertaining to ubiquitous computing, health, and well-being. Specifically, the area of human activity recognition (HAR) is primarily transformed by the convolutional and recurrent neural networks, thanks to their ability to learn semantic representations from raw input. However, to extract generalizable features, massive amounts of well-curated data are required, which is a notoriously challenging task; hindered by privacy issues, and annotation costs. Therefore, unsupervised representation learning is of prime importance to leverage the vast amount of unlabeled data produced by smart devices. In this work, we propose a novel self-supervised technique for feature learning from sensory data that does not require access to any form of semantic labels. We learn a multi-task temporal convolutional network to recognize transformations applied on an input signal. By exploiting these transformations, we demonstrate that simple auxiliary tasks of the binary classification result in a strong supervisory signal for extracting useful features for the downstream task. We extensively evaluate the proposed approach on several publicly available datasets for smartphone-based HAR in unsupervised, semi-supervised, and transfer learning settings. Our method achieves performance levels superior to or comparable with fully-supervised networks, and it performs significantly better than autoencoders. Notably, for the semi-supervised case, the self-supervised features substantially boost the detection rate by attaining a kappa score between 0.7-0.8 with only 10 labeled examples per class. We get similar impressive performance even if the features are transferred from a different data source. While this paper focuses on HAR as the application domain, the proposed technique is general and could be applied to a wide variety of problems in other areas.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a self-supervised multi-task learning method for human activity recognition (HAR) from sensory data. A temporal convolutional network is trained to solve multiple binary classification tasks that detect whether specific transformations have been applied to the raw input signal; the resulting shared representation is then used for the downstream HAR task. The approach is evaluated in unsupervised, semi-supervised, and transfer-learning regimes on public smartphone-based HAR datasets and is claimed to reach performance superior or comparable to fully supervised networks while outperforming autoencoders; notably strong results are reported with only 10 labeled examples per class.

Significance. If the empirical claims hold, the work would be a useful contribution to self-supervised representation learning for time-series sensor data. It directly tackles the annotation bottleneck in HAR and demonstrates practical gains in low-label and cross-dataset regimes. The multi-task transformation-recognition framing is simple and generalizable, which could extend beyond HAR.

major comments (2)
  1. [Abstract / method description paragraph] Abstract and method description paragraph: the central claim that the auxiliary binary classification tasks supply a strong supervisory signal for activity-discriminative features rests on the unstated assumption that the chosen transformations cannot be solved from low-level statistics alone. No list of transformations, no ablation on their difficulty, and no analysis of what features the network must learn to solve them are supplied, leaving open the possibility that the multi-task objective is solved by shallow detectors that do not transfer to HAR.
  2. [Abstract] Abstract: the assertion of performance “superior to or comparable with fully-supervised networks” and “significantly better than autoencoders” is presented without any quantitative metrics, dataset names, or baseline numbers. Because the abstract is the only place where the headline result is stated, readers cannot assess whether the data actually support the claim.
minor comments (1)
  1. [Abstract] The abstract states that the method is evaluated “extensively” on “several publicly available datasets” yet supplies neither the dataset names nor the evaluation protocol (e.g., cross-subject vs. cross-session splits).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the clarity of the abstract and method description. We address each point below and will incorporate revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract / method description paragraph] Abstract and method description paragraph: the central claim that the auxiliary binary classification tasks supply a strong supervisory signal for activity-discriminative features rests on the unstated assumption that the chosen transformations cannot be solved from low-level statistics alone. No list of transformations, no ablation on their difficulty, and no analysis of what features the network must learn to solve them are supplied, leaving open the possibility that the multi-task objective is solved by shallow detectors that do not transfer to HAR.

    Authors: We agree that the abstract and method description paragraph should explicitly list the transformations and briefly justify why they require more than low-level statistics. The full manuscript describes the transformations in Section 3, but these details are not summarized in the abstract. We will revise the abstract to include the list of transformations and a short statement on their design. An ablation on transformation difficulty and a basic analysis of the features (e.g., comparison against hand-crafted statistical features) are not present in the current version; we will add a concise ablation in the experiments section of the revised manuscript to directly address this concern. revision: yes

  2. Referee: [Abstract] Abstract: the assertion of performance “superior to or comparable with fully-supervised networks” and “significantly better than autoencoders” is presented without any quantitative metrics, dataset names, or baseline numbers. Because the abstract is the only place where the headline result is stated, readers cannot assess whether the data actually support the claim.

    Authors: We acknowledge that the abstract would be more informative with explicit dataset names and baseline numbers. While the current abstract already reports a quantitative result (kappa 0.7-0.8 with 10 labels per class), it does not name the datasets or provide specific baseline comparisons. We will revise the abstract to include the dataset names (among them UCI HAR, WISDM, MobiAct, and MotionSense) and concise statements of the performance deltas versus the fully supervised and autoencoder baselines, making the claims directly verifiable from the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method with dataset-driven evaluation

full rationale

The paper proposes a multi-task self-supervised approach that trains a temporal convolutional network to recognize applied signal transformations and then transfers the learned features to human activity recognition. All claims rest on empirical results across public datasets in unsupervised, semi-supervised, and transfer settings, with direct comparisons to fully-supervised networks and autoencoders. No derivation, uniqueness theorem, or first-principles prediction is asserted that reduces by construction to fitted parameters, self-citations, or author-defined quantities; the transformations and auxiliary tasks are chosen inputs whose utility is measured externally rather than presupposed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is abstract-only; no free parameters, invented entities, or non-standard axioms are mentioned. The work rests on the domain assumption that temporal convolutional networks are suitable for sensor time series and that transformation recognition yields transferable features.

axioms (1)
  • domain assumption: Temporal convolutional networks can learn semantic representations from raw sensory time-series data.
    The method description presupposes the effectiveness of TCNs for HAR feature extraction.



Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 15 internal anchors

  1. [1]

    Learning to see by moving

    Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In Proceedings of the IEEE International Conference on Computer Vision, pages 37–45, 2015

  2. [2]

    A public domain dataset for human activity recognition using smartphones

    Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In ESANN, 2013

  3. [3]

    Objects that Sound

    Relja Arandjelović and Andrew Zisserman. Objects that sound. arXiv preprint arXiv:1712.06651, 2017

  4. [4]

    Soundnet: Learning sound representations from unlabeled video

    Yusuf Aytar, Carl Vondrick, and Antonio Torralba. Soundnet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems, pages 892–900, 2016

  5. [5]

    An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

    Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018

  6. [6]

    Autoencoders, unsupervised learning, and deep architectures

    Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML workshop on unsupervised and transfer learning, pages 37–49, 2012

  7. [7]

    A complexity-invariant distance measure for time series

    Gustavo EAPA Batista, Xiaoyue Wang, and Eamonn J Keogh. A complexity-invariant distance measure for time series. In Proceedings of the 2011 SIAM international conference on data mining, pages 699–710. SIAM, 2011

  8. [8]

    Representation learning: A review and new perspectives

    Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013

  9. [9]

    Using unlabeled data in a sparse-coding framework for human activity recognition

    Sourav Bhattacharya, Petteri Nurmi, Nils Hammerla, and Thomas Plötz. Using unlabeled data in a sparse-coding framework for human activity recognition. Pervasive and Mobile Computing, 15:242–262, 2014

  10. [10]

    Multitask learning

    Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997

  11. [11]

    Human daily activity and fall recognition using a smartphone's acceleration sensor

    Charikleia Chatzaki, Matthew Pediaditis, George Vavoulas, and Manolis Tsiknakis. Human daily activity and fall recognition using a smartphone's acceleration sensor. In International Conference on Information and Communication Technologies for Ageing Well and e-Health, pages 100–118. Springer, 2016

  12. [12]

    Multi-Scale Convolutional Neural Networks for Time Series Classification

    Zhicheng Cui, Wenlin Chen, and Yixin Chen. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016

  13. [13]

    Multi-task self-supervised visual learning

    Carl Doersch and Andrew Zisserman. Multi-task self-supervised visual learning. In The IEEE International Conference on Computer Vision (ICCV), 2017

  14. [14]

    Unsupervised visual representation learning by context prediction

    Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430, 2015

  15. [15]

    Self-supervised video representation learning with odd-one-out networks

    Basura Fernando, Hakan Bilen, Efstratios Gavves, and Stephen Gould. Self-supervised video representation learning with odd-one-out networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 5729–5738. IEEE, 2017

  16. [16]

    Preprocessing techniques for context recognition from accelerometer data

    Davide Figo, Pedro C Diniz, Diogo R Ferreira, and João M Cardoso. Preprocessing techniques for context recognition from accelerometer data. Personal and Ubiquitous Computing, 14(7):645–662, 2010

  17. [17]

    Low-resource multi-task audio sensing for mobile and embedded devices via shared deep neural network representations

    Petko Georgiev, Sourav Bhattacharya, Nicholas D Lane, and Cecilia Mascolo. Low-resource multi-task audio sensing for mobile and embedded devices via shared deep neural network representations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):50, 2017

  18. [18]

    Unsupervised Representation Learning by Predicting Image Rotations

    Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018

  19. [19]

    Self-supervised learning of visual features through embedding images into text topic spaces

    Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, and CV Jawahar. Self-supervised learning of visual features through embedding images into text topic spaces. arXiv preprint arXiv:1705.08631, 2017

  20. [20]

    Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables

    Nils Y Hammerla, Shane Halloran, and Thomas Ploetz. Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv preprint arXiv:1604.08880, 2016

  21. [21]

    Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

    Awni Y. Hannun, Pranav Rajpurkar, Masoumeh Haghpanahi, Geoffrey H. Tison, Codie Bourn, Mintu P. Turakhia, and Andrew Y. Ng. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 25(1):65–69, 2019. ISSN 1546-170X. doi: 10.1038/s41591-018-0268-3. URL https://doi.org/10.10...

  22. [22]

    A joint many-task model: Growing a neural network for multiple nlp tasks

    Kazuma Hashimoto, Yoshimasa Tsuruoka, Richard Socher, et al. A joint many-task model: Growing a neural network for multiple nlp tasks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1923–1933, 2017

  23. [23]

    Universal language model fine-tuning for text classification

    Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 328–339, 2018

  24. [24]

    Self-Supervised Feature Learning by Learning to Spot Artifacts

    Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. arXiv preprint arXiv:1806.05024, 2018

  25. [25]

    Multi-task learning using uncertainty to weigh losses for scene geometry and semantics

    Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7482–7491, 2018

  26. [26]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  27. [27]

    Cooperative learning of audio and video models from self-supervised synchronization

    Bruno Korbar, Du Tran, and Lorenzo Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In Advances in Neural Information Processing Systems, pages 7774–7785, 2018

  28. [28]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012

  29. [29]

    Activity recognition using cell phone accelerometers

    Jennifer R Kwapisz, Gary M Weiss, and Samuel A Moore. Activity recognition using cell phone accelerometers. ACM SigKDD Explorations Newsletter, 12(2):74–82, 2011

  30. [30]

    Colorization as a proxy task for visual understanding

    Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, volume 2, page 7, 2017

  31. [31]

    Optimal brain damage

    Yann LeCun, John S Denker, and Sara A Solla. Optimal brain damage. In Advances in neural information processing systems, pages 598–605, 1990

  32. [32]

    Deep learning

    Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436, May 2015. URL https://doi.org/10.1038/nature14539

  33. [33]

    Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

    Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning, pages 609–616. ACM, 2009

  34. [34]

    Unsupervised representation learning by sorting sequences

    Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Unsupervised representation learning by sorting sequences. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 667–676. IEEE, 2017

  35. [35]

    Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing

    Chi Li, M Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D Hager, and Manmohan Chandraker. Deep supervision with shape concepts for occlusion-aware 3d object parsing. arXiv preprint arXiv:1612.02699, 2016

  36. [36]

    Measuring the Intrinsic Dimension of Objective Landscapes

    Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. arXiv preprint arXiv:1804.08838, 2018

  37. [37]

    Unsupervised feature learning for human activity recognition using smartphone sensors

    Yongmou Li, Dianxi Shi, Bo Ding, and Dongbo Liu. Unsupervised feature learning for human activity recognition using smartphone sensors. In Mining Intelligence and Knowledge Exploration, pages 99–107. Springer, 2014

  38. [38]

    Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment

    Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, and Yunsheng Ma. Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment. In International Conference on Smart Homes and Health Telematics, pages 37–48. Springer, 2016

  39. [39]

    Visualizing data using t-sne

    Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008

  40. [40]

    Protecting sensory data against sensitive inferences

    Mohammad Malekzadeh, Richard G Clegg, Andrea Cavallaro, and Hamed Haddadi. Protecting sensory data against sensitive inferences. In Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems, page 2. ACM, 2018

  41. [41]

    Unimib shar: A dataset for human activity recognition using acceleration data from smartphones

    Daniela Micucci, Marco Mobilio, and Paolo Napoletano. Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Applied Sciences, 7(10):1101, 2017

  42. [42]

    Shuffle and learn: unsupervised learning using temporal order verification

    Ishan Misra, C Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. In European Conference on Computer Vision, pages 527–544. Springer, 2016

  43. [43]

    Acoustic modeling using deep belief networks

    Abdel-rahman Mohamed, George E Dahl, Geoffrey Hinton, et al. Acoustic modeling using deep belief networks. IEEE Trans. Audio, Speech & Language Processing, 20(1):14–22, 2012

  44. [44]

    Deep convolutional feature transfer across mobile activity recognition domains, sensor modalities and locations

    Francisco Javier Ordóñez Morales and Daniel Roggen. Deep convolutional feature transfer across mobile activity recognition domains, sensor modalities and locations. In Proceedings of the 2016 ACM International Symposium on Wearable Computers, pages 92–99. ACM, 2016

  45. [45]

    On the importance of single directions for generalization

    Ari S Morcos, David GT Barrett, Neil C Rabinowitz, and Matthew Botvinick. On the importance of single directions for generalization. arXiv preprint arXiv:1803.06959, 2018

  46. [46]

    Rectified linear units improve restricted boltzmann machines

    Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 807–814, 2010

  47. [47]

    Unsupervised learning of visual representations by solving jigsaw puzzles

    Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pages 69–84. Springer, 2016

  48. [48]

    Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks

    Jeeheh Oh, Jiaxuan Wang, and Jenna Wiens. Learning to exploit invariances in clinical time-series data using sequence transformer networks. arXiv preprint arXiv:1808.06725, 2018

  49. [49]

    The building blocks of interpretability

    Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, and Alexander Mordvintsev. The building blocks of interpretability. Distill, 2018. https://distill.pub/2018/building-blocks

  50. [50]

    Realistic evaluation of deep semi-supervised learning algorithms

    Avital Oliver, Augustus Odena, Colin A Raffel, Ekin Dogus Cubuk, and Ian Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pages 3235–3246, 2018

  51. [51]

    Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

    Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. arXiv preprint arXiv:1804.03641, 2018

  52. [52]

    Ambient sound provides supervision for visual learning

    Andrew Owens, Jiajun Wu, Josh H McDermott, William T Freeman, and Antonio Torralba. Ambient sound provides supervision for visual learning. In European Conference on Computer Vision, pages 801–816. Springer, 2016

  53. [53]

    A survey on transfer learning

    Sinno Jialin Pan, Qiang Yang, et al. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010

  54. [54]

    Curiosity-driven exploration by self-supervised prediction

    Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning (ICML), volume 2017, 2017

  55. [55]

    Feature learning for activity recognition in ubiquitous computing

    Thomas Plötz, Nils Y Hammerla, and Patrick Olivier. Feature learning for activity recognition in ubiquitous computing. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 1729, 2011

  56. [56]

    Multimodal deep learning for activity and context recognition

    Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D Lane, Cecilia Mascolo, Mahesh K Marina, and Fahim Kawsar. Multimodal deep learning for activity and context recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(4):157, 2018

  57. [57]

    Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability

    Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Advances in Neural Information Processing Systems, pages 6076–6085, 2017

  58. [58]

    Self-taught learning: transfer learning from unlabeled data

    Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pages 759–766. ACM, 2007

  59. [59]

    Multi-task prediction of disease onsets from longitudinal laboratory tests

    Narges Razavian, Jake Marcus, and David Sontag. Multi-task prediction of disease onsets from longitudinal laboratory tests. In Machine Learning for Healthcare Conference, pages 73–100, 2016

  60. [60]

    Personalized Driver Stress Detection with Multi-task Neural Networks using Physiological Signals

    Aaqib Saeed and Stojan Trajanovski. Personalized driver stress detection with multi-task neural networks using physiological signals. arXiv preprint arXiv:1711.06116, 2017

  61. [61]

    Synthesizing and reconstructing missing sensory modalities in behavioral context recognition

    Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien. Synthesizing and reconstructing missing sensory modalities in behavioral context recognition. Sensors, 18(9):2967, 2018

  62. [62]

    CNN features off-the-shelf: an astounding baseline for recognition

    Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 806–813, 2014

  63. [63]

    Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

    Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013

  64. [64]

    Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition

    Allan Stisen, Henrik Blunck, Sourav Bhattacharya, Thor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, pages 127–140. ACM, 2015

  65. [65]

    Sequence to sequence learning with neural networks

    Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104–3112, 2014

  66. [66]

    Deepface: Closing the gap to human-level performance in face verification

    Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1708, 2014

  67. [67]

    Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks

    Terry T Um, Franz MJ Pfister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, and Dana Kulić. Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 216–220. ACM, 2017

  68. [68]

    Deep learning for sensor-based activity recognition: A survey

    Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. Deep learning for sensor-based activity recognition: A survey. Pattern Recognition Letters, 2018

  69. [69]

    Deep transfer learning for cross-domain activity recognition

    Jindong Wang, Vincent W Zheng, Yiqiang Chen, and Meiyu Huang. Deep transfer learning for cross-domain activity recognition. In Proceedings of the 3rd International Conference on Crowd Science and Engineering, page 16. ACM, 2018

  70. [70]

    Clustering approach to the problem of human activity recognition using motion data

    S. Wawrzyniak and W. Niemiro. Clustering approach to the problem of human activity recognition using motion data. In 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), pages 411–416, Sep. 2015. doi: 10.15439/2015F424

  71. [71]

    Learning and using the arrow of time

    Donglai Wei, Joseph Lim, Andrew Zisserman, and William T Freeman. Learning and using the arrow of time. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8052–8060, 2018

  72. [72]

    Deep convolutional neural networks on multichannel time series for human activity recognition

    Jianbo Yang, Minh Nhut Nguyen, Phyo Phyo San, Xiaoli Li, and Shonali Krishnaswamy. Deep convolutional neural networks on multichannel time series for human activity recognition. In IJCAI, volume 15, pages 3995–4001, 2015

  73. [73]

    Sensegan: Enabling deep learning for internet of things with a semi-supervised framework

    Shuochao Yao, Yiran Zhao, Huajie Shao, Chao Zhang, Aston Zhang, Shaohan Hu, Dongxin Liu, Shengzhong Liu, Lu Su, and Tarek Abdelzaher. Sensegan: Enabling deep learning for internet of things with a semi-supervised framework. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):144, 2018

  74. [74]

    Split-brain autoencoders: Unsupervised learning by cross-channel prediction

    Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, volume 1, page 5, 2017

APPENDIX

Table 4. Evaluating self-supervised representat...