Multi-task Self-Supervised Learning for Human Activity Detection
Pith reviewed 2026-05-24 14:54 UTC · model grok-4.3
The pith
Self-supervised multi-task learning on signal transformations extracts features that match or exceed fully supervised performance for human activity recognition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By learning a multi-task temporal convolutional network to recognize transformations applied on an input signal, the method demonstrates that simple auxiliary tasks of binary classification result in a strong supervisory signal for extracting useful features for the downstream human activity recognition task, achieving performance levels superior to or comparable with fully-supervised networks.
What carries the argument
Multi-task temporal convolutional network trained to perform binary classification of transformations applied to input sensor signals.
If this is right
- Semi-supervised training with only 10 labeled examples per class reaches kappa scores of 0.7-0.8.
- Features transferred from a different data source still yield strong detection performance.
- The method significantly outperforms autoencoders on the same tasks.
- Performance reaches or exceeds that of fully supervised networks trained on the full labeled set.
- The technique is presented as applicable to other sensor-based problems beyond human activity recognition.
Where Pith is reading between the lines
- The same transformation-based pretraining could be tested on other time-series domains such as ECG monitoring or industrial sensor streams where labels are scarce.
- If the transformations are chosen to preserve semantic content while altering low-level statistics, similar auxiliary tasks might work for non-temporal data like images or graphs.
- Reducing reliance on labeled data through this route could lower privacy and annotation barriers in deployed health-monitoring applications.
- The multi-task formulation might be extended by adding more diverse transformations to further strengthen the learned representations.
Load-bearing premise
The transformations applied to the input signal create auxiliary binary classification tasks whose learned representations transfer effectively to the human activity recognition downstream task.
What would settle it
A controlled experiment in which self-supervised features are compared against random initialization or autoencoder features on the same HAR datasets using exactly 10 labeled examples per class; if the self-supervised version shows no gain in kappa score, the transfer claim fails.
Figures
read the original abstract
Deep learning methods are successfully used in applications pertaining to ubiquitous computing, health, and well-being. Specifically, the area of human activity recognition (HAR) is primarily transformed by the convolutional and recurrent neural networks, thanks to their ability to learn semantic representations from raw input. However, to extract generalizable features, massive amounts of well-curated data are required, which is a notoriously challenging task; hindered by privacy issues, and annotation costs. Therefore, unsupervised representation learning is of prime importance to leverage the vast amount of unlabeled data produced by smart devices. In this work, we propose a novel self-supervised technique for feature learning from sensory data that does not require access to any form of semantic labels. We learn a multi-task temporal convolutional network to recognize transformations applied on an input signal. By exploiting these transformations, we demonstrate that simple auxiliary tasks of the binary classification result in a strong supervisory signal for extracting useful features for the downstream task. We extensively evaluate the proposed approach on several publicly available datasets for smartphone-based HAR in unsupervised, semi-supervised, and transfer learning settings. Our method achieves performance levels superior to or comparable with fully-supervised networks, and it performs significantly better than autoencoders. Notably, for the semi-supervised case, the self-supervised features substantially boost the detection rate by attaining a kappa score between 0.7-0.8 with only 10 labeled examples per class. We get similar impressive performance even if the features are transferred from a different data source. While this paper focuses on HAR as the application domain, the proposed technique is general and could be applied to a wide variety of problems in other areas.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a self-supervised multi-task learning method for human activity recognition (HAR) from sensory data. A temporal convolutional network is trained to solve multiple binary classification tasks that detect whether specific transformations have been applied to the raw input signal; the resulting shared representation is then used for the downstream HAR task. The approach is evaluated in unsupervised, semi-supervised, and transfer-learning regimes on public smartphone-based HAR datasets and is claimed to reach performance superior or comparable to fully supervised networks while outperforming autoencoders; notably strong results are reported with only 10 labeled examples per class.
Significance. If the empirical claims hold, the work would be a useful contribution to self-supervised representation learning for time-series sensor data. It directly tackles the annotation bottleneck in HAR and demonstrates practical gains in low-label and cross-dataset regimes. The multi-task transformation-recognition framing is simple and generalizable, which could extend beyond HAR.
major comments (2)
- [Abstract / method description paragraph] Abstract and method description paragraph: the central claim that the auxiliary binary classification tasks supply a strong supervisory signal for activity-discriminative features rests on the unstated assumption that the chosen transformations cannot be solved from low-level statistics alone. No list of transformations, no ablation on their difficulty, and no analysis of what features the network must learn to solve them are supplied, leaving open the possibility that the multi-task objective is solved by shallow detectors that do not transfer to HAR.
- [Abstract] Abstract: the assertion of performance “superior to or comparable with fully-supervised networks” and “significantly better than autoencoders” is presented without any quantitative metrics, dataset names, or baseline numbers. Because the abstract is the only place where the headline result is stated, readers cannot assess whether the data actually support the claim.
minor comments (1)
- [Abstract] The abstract states that the method is evaluated “extensively” on “several publicly available datasets” yet supplies neither the dataset names nor the evaluation protocol (e.g., cross-subject vs. cross-session splits).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the clarity of the abstract and method description. We address each point below and will incorporate revisions where appropriate.
read point-by-point responses
-
Referee: [Abstract / method description paragraph] Abstract and method description paragraph: the central claim that the auxiliary binary classification tasks supply a strong supervisory signal for activity-discriminative features rests on the unstated assumption that the chosen transformations cannot be solved from low-level statistics alone. No list of transformations, no ablation on their difficulty, and no analysis of what features the network must learn to solve them are supplied, leaving open the possibility that the multi-task objective is solved by shallow detectors that do not transfer to HAR.
Authors: We agree that the abstract and method description paragraph should explicitly list the transformations and briefly justify why they require more than low-level statistics. The full manuscript describes the transformations in Section 3, but these details are not summarized in the abstract. We will revise the abstract to include the list of transformations and a short statement on their design. An ablation on transformation difficulty and a basic analysis of the features (e.g., comparison against hand-crafted statistical features) are not present in the current version; we will add a concise ablation in the experiments section of the revised manuscript to directly address this concern. revision: yes
-
Referee: [Abstract] Abstract: the assertion of performance “superior to or comparable with fully-supervised networks” and “significantly better than autoencoders” is presented without any quantitative metrics, dataset names, or baseline numbers. Because the abstract is the only place where the headline result is stated, readers cannot assess whether the data actually support the claim.
Authors: We acknowledge that the abstract would be more informative with explicit dataset names and baseline numbers. While the current abstract already reports a quantitative result (kappa 0.7-0.8 with 10 labels per class), it does not name the datasets or provide specific baseline comparisons. We will revise the abstract to include the dataset names (UCI-HAR, WISDM, PAMAP2) and concise statements of the performance deltas versus the fully supervised and autoencoder baselines, making the claims directly verifiable from the abstract. revision: yes
Circularity Check
No circularity: empirical method with dataset-driven evaluation
full rationale
The paper proposes a multi-task self-supervised approach that trains a temporal convolutional network to recognize applied signal transformations and then transfers the learned features to human activity recognition. All claims rest on empirical results across public datasets in unsupervised, semi-supervised, and transfer settings, with direct comparisons to fully-supervised networks and autoencoders. No derivation, uniqueness theorem, or first-principles prediction is asserted that reduces by construction to fitted parameters, self-citations, or author-defined quantities; the transformations and auxiliary tasks are chosen inputs whose utility is measured externally rather than presupposed.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Temporal convolutional networks can learn semantic representations from raw sensory time-series data.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Breath1024.lean (and headline theorem reality_from_one_distinction)8-tick period forced by distinction echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
we utilize eight transformations to train a multi-task network for simultaneously recognizing each of them... The transformations utilized in this work are summarized below: Noised, Scaled, Rotated, Negated, Horizontally Flipped, Permuted, Time-Warped, Channel-Shuffled
-
IndisputableMonolith/Cost/FunctionalEquation.leanJ(x) = ½(x + x⁻¹) − 1 uniqueness echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
simple auxiliary tasks of the binary classification result in a strong supervisory signal for extracting useful features
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In Proceedings of the IEEE International Conference on Computer Vision, pages 37–45, 2015
work page 2015
-
[2]
A public domain dataset for human activity recognition using smartphones
Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In ESANN, 2013
work page 2013
-
[3]
Relja Arandjelovi ´c and Andrew Zisserman. Objects that sound. arXiv preprint arXiv:1712.06651, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[4]
Soundnet: Learning sound representations from unlabeled video
Yusuf Aytar, Carl Vondrick, and Antonio Torralba. Soundnet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems , pages 892–900, 2016
work page 2016
-
[5]
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[6]
Autoencoders, unsupervised learning, and deep architectures
Pierre Baldi. Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML workshop on unsupervised and transfer learning, pages 37–49, 2012
work page 2012
-
[7]
A complexity-invariant distance measure for time series
Gustavo EAPA Batista, Xiaoyue Wang, and Eamonn J Keogh. A complexity-invariant distance measure for time series. In Proceedings of the 2011 SIAM international conference on data mining , pages 699–710. SIAM, 2011
work page 2011
-
[8]
Representation learning: A review and new perspectives
Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pa/t_tern analysis and machine intelligence, 35(8):1798–1828, 2013
work page 2013
-
[9]
Using unlabeled data in a sparse-coding framework for human activity recognition
Sourav Bha/t_tacharya, Pe/t_teri Nurmi, Nils Hammerla, and /T_homas Pl¨otz. Using unlabeled data in a sparse-coding framework for human activity recognition. Pervasive and Mobile Computing , 15:242–262, 2014
work page 2014
-
[10]
Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997
work page 1997
-
[11]
Human daily activity and fall recognition using a smartphone/f_is acceleration sensor
Charikleia Chatzaki, Ma/t_thew Pediaditis, George Vavoulas, and Manolis Tsiknakis. Human daily activity and fall recognition using a smartphone/f_is acceleration sensor. InInternational Conference on Information and Communication Technologies for Ageing Well and e-Health, pages 100–118. Springer, 2016
work page 2016
-
[12]
Multi-Scale Convolutional Neural Networks for Time Series Classification
Zhicheng Cui, Wenlin Chen, and Yixin Chen. Multi-scale convolutional neural networks for time series classi/f_ication.arXiv preprint arXiv:1603.06995, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[13]
Multi-task self-supervised visual learning
Carl Doersch and Andrew Zisserman. Multi-task self-supervised visual learning. In /T_he IEEE International Conference on Computer Vision (ICCV), 2017
work page 2017
-
[14]
Unsupervised visual representation learning by context prediction
Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision , pages 1422–1430, 2015
work page 2015
-
[15]
Self-supervised video representation learning with odd-one-out networks
Basura Fernando, Hakan Bilen, Efstratios Gavves, and Stephen Gould. Self-supervised video representation learning with odd-one-out networks. In Computer Vision and Pa/t_tern Recognition (CVPR), 2017 IEEE Conference on, pages 5729–5738. IEEE, 2017
work page 2017
-
[16]
Preprocessing techniques for context recognition from accelerometer data
Davide Figo, Pedro C Diniz, Diogo R Ferreira, and Jo˜ao M Cardoso. Preprocessing techniques for context recognition from accelerometer data. Personal and Ubiquitous Computing , 14(7):645–662, 2010
work page 2010
-
[17]
Petko Georgiev, Sourav Bha/t_tacharya, Nicholas D Lane, and Cecilia Mascolo. Low-resource multi-task audio sensing for mobile and embedded devices via shared deep neural network representations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):50, 2017
work page 2017
-
[18]
Unsupervised Representation Learning by Predicting Image Rotations
Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[19]
Self-supervised learning of visual features through embedding images into text topic spaces
Lluis Gomez, Yash Patel, Marc ¸al Rusi˜nol, Dimosthenis Karatzas, and CV Jawahar. Self-supervised learning of visual features through embedding images into text topic spaces. arXiv preprint arXiv:1705.08631, 2017. PACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 0, No. 0, Article 000. Publication date: 0. Multi-task Self-Supervised ...
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[20]
Deep, Convolutional, and Recurrent Models for Human Activity Recognition using Wearables
Nils Y Hammerla, Shane Halloran, and /T_homas Ploetz. Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv preprint arXiv:1604.08880, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[21]
Hannun, Pranav Rajpurkar, Masoumeh Haghpanahi, Geoffrey H
Awni Y. Hannun, Pranav Rajpurkar, Masoumeh Haghpanahi, Geoffrey H. Tison, Codie Bourn, Mintu P. Turakhia, and Andrew Y. Ng. Cardiologist-level arrhythmia detection and classi/f_ication in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 25(1):65–69, 2019. ISSN 1546-170X. doi: 10.1038/s41591-018-0268-3. URL h/t_tps://doi.org/10.10...
-
[22]
A joint many-task model: Growing a neural network for multiple nlp tasks
Kazuma Hashimoto, Yoshimasa Tsuruoka, Richard Socher, et al. A joint many-task model: Growing a neural network for multiple nlp tasks. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing , pages 1923–1933, 2017
work page 2017
-
[23]
Universal language model /f_ine-tuning for text classi/f_ication
Jeremy Howard and Sebastian Ruder. Universal language model /f_ine-tuning for text classi/f_ication. InProceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , volume 1, pages 328–339, 2018
work page 2018
-
[24]
Self-Supervised Feature Learning by Learning to Spot Artifacts
Simon Jenni and Paolo Favaro. Self-supervised feature learning by learning to spot artifacts. arXiv preprint arXiv:1806.05024, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[25]
Multi-task learning using uncertainty to weigh losses for scene geometry and semantics
Alex Kendall, Yarin Gal, and Roberto Cipolla. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In Proceedings of the IEEE Conference on Computer Vision and Pa/t_tern Recognition, pages 7482–7491, 2018
work page 2018
-
[26]
Adam: A Method for Stochastic Optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[27]
Cooperative learning of audio and video models from self-supervised synchronization
Bruno Korbar, Du Tran, and Lorenzo Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In Advances in Neural Information Processing Systems , pages 7774–7785, 2018
work page 2018
-
[28]
Imagenet classi/f_ication with deep convolutional neural networks
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classi/f_ication with deep convolutional neural networks. InAdvances in neural information processing systems , pages 1097–1105, 2012
work page 2012
-
[29]
Jennifer R Kwapisz, Gary M Weiss, and Samuel A Moore. Activity recognition using cell phone accelerometers.ACM SigKDD Explorations Newsle/t_ter, 12(2):74–82, 2011
work page 2011
-
[30]
Colorization as a proxy task for visual understanding
Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Colorization as a proxy task for visual understanding. In CVPR, volume 2, page 7, 2017
work page 2017
-
[31]
Yann LeCun, John S Denker, and Sara A Solla. Optimal brain damage. In Advances in neural information processing systems , pages 598–605, 1990
work page 1990
-
[32]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521:436 EP –, May 2015. URL h/t_tps://doi.org/10.1038/ nature14539
work page 2015
-
[33]
Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning , pages 609–616. ACM, 2009
work page 2009
-
[34]
Unsupervised representation learning by sorting sequences
Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Unsupervised representation learning by sorting sequences. In Computer Vision (ICCV), 2017 IEEE International Conference on , pages 667–676. IEEE, 2017
work page 2017
-
[35]
Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Chi Li, M Zeeshan Zia, /Q_uoc-Huy Tran, Xiang Yu, Gregory D Hager, and Manmohan Chandraker. Deep supervision with shape concepts for occlusion-aware 3d object parsing. arXiv preprint arXiv:1612.02699, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[36]
Measuring the Intrinsic Dimension of Objective Landscapes
Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the intrinsic dimension of objective landscapes. arXiv preprint arXiv:1804.08838, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[37]
Unsupervised feature learning for human activity recognition using smartphone sensors
Yongmou Li, Dianxi Shi, Bo Ding, and Dongbo Liu. Unsupervised feature learning for human activity recognition using smartphone sensors. In Mining Intelligence and Knowledge Exploration , pages 99–107. Springer, 2014
work page 2014
-
[38]
Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment
Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, and Yunsheng Ma. Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment. In International Conference on Smart Homes and Health Telematics , pages 37–48. Springer, 2016
work page 2016
-
[39]
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research , 9(Nov):2579–2605, 2008
work page 2008
-
[40]
Protecting sensory data against sensitive inferences
Mohammad Malekzadeh, Richard G Clegg, Andrea Cavallaro, and Hamed Haddadi. Protecting sensory data against sensitive inferences. In Proceedings of the 1st Workshop on Privacy by Design in Distributed Systems , page 2. ACM, 2018
work page 2018
-
[41]
Unimib shar: A dataset for human activity recognition using acceleration data from smartphones
Daniela Micucci, Marco Mobilio, and Paolo Napoletano. Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Applied Sciences, 7(10):1101, 2017
work page 2017
-
[42]
Shuffle and learn: unsupervised learning using temporal order veri/f_ication
Ishan Misra, C Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order veri/f_ication. In European Conference on Computer Vision , pages 527–544. Springer, 2016
work page 2016
-
[43]
Acoustic modeling using deep belief networks
Abdel-rahman Mohamed, George E Dahl, Geoffrey Hinton, et al. Acoustic modeling using deep belief networks. IEEE Trans. Audio, Speech & Language Processing , 20(1):14–22, 2012
work page 2012
-
[44]
Francisco Javier Ord´o ˜nez Morales and Daniel Roggen. Deep convolutional feature transfer across mobile activity recognition domains, sensor modalities and locations. In Proceedings of the 2016 ACM International Symposium on Wearable Computers , pages 92–99. ACM, 2016
work page 2016
-
[45]
On the importance of single directions for generalization
Ari S Morcos, David GT Barre/t_t, Neil C Rabinowitz, and Ma/t_thew Botvinick. On the importance of single directions for generalization. arXiv preprint arXiv:1803.06959, 2018. PACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 0, No. 0, Article 000. Publication date: 0. 000:26 • A. Saeed et al
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[46]
Recti/f_ied linear units improve restricted boltzmann machines
Vinod Nair and Geoffrey E Hinton. Recti/f_ied linear units improve restricted boltzmann machines. InProceedings of the 27th international conference on machine learning (ICML-10) , pages 807–814, 2010
work page 2010
-
[47]
Unsupervised learning of visual representations by solving jigsaw puzzles
Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pages 69–84. Springer, 2016
work page 2016
-
[48]
Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks
Jeeheh Oh, Jiaxuan Wang, and Jenna Wiens. Learning to exploit invariances in clinical time-series data using sequence transformer networks. arXiv preprint arXiv:1808.06725, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[49]
/T_he building blocks of interpretability
Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, Ludwig Schubert, Katherine Ye, and Alexander Mordvintsev. /T_he building blocks of interpretability. Distill, 2018. doi: unde/f_ined. h/t_tps://distill.pub/2018/building-blocks
work page 2018
-
[50]
Realistic evaluation of deep semi-supervised learning algorithms
Avital Oliver, Augustus Odena, Colin A Raffel, Ekin Dogus Cubuk, and Ian Goodfellow. Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems , pages 3235–3246, 2018
work page 2018
-
[51]
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. arXiv preprint arXiv:1804.03641, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[52]
Ambient sound provides supervision for visual learning
Andrew Owens, Jiajun Wu, Josh H McDermo/t_t, William T Freeman, and Antonio Torralba. Ambient sound provides supervision for visual learning. In European Conference on Computer Vision , pages 801–816. Springer, 2016
work page 2016
-
[53]
Sinno Jialin Pan, Qiang Yang, et al. A survey on transfer learning.IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010
work page 2010
-
[54]
Curiosity-driven exploration by self-supervised prediction
Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning (ICML) , volume 2017, 2017
work page 2017
-
[55]
Feature learning for activity recognition in ubiquitous computing
/T_homas Pl¨otz, Nils Y Hammerla, and Patrick Olivier. Feature learning for activity recognition in ubiquitous computing. In IJCAI Proceedings-International Joint Conference on Arti/f_icial Intelligence, volume 22, page 1729, 2011
work page 2011
-
[56]
Multimodal deep learning for activity and context recognition
Valentin Radu, Catherine Tong, Sourav Bha/t_tacharya, Nicholas D Lane, Cecilia Mascolo, Mahesh K Marina, and Fahim Kawsar. Multimodal deep learning for activity and context recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(4):157, 2018
work page 2018
-
[57]
Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. In Advances in Neural Information Processing Systems , pages 6076–6085, 2017
work page 2017
-
[58]
Self-taught learning: transfer learning from unlabeled data
Rajat Raina, Alexis Ba/t_tle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th international conference on Machine learning , pages 759–766. ACM, 2007
work page 2007
-
[59]
Multi-task prediction of disease onsets from longitudinal laboratory tests
Narges Razavian, Jake Marcus, and David Sontag. Multi-task prediction of disease onsets from longitudinal laboratory tests. In Machine Learning for Healthcare Conference , pages 73–100, 2016
work page 2016
-
[60]
Personalized Driver Stress Detection with Multi-task Neural Networks using Physiological Signals
Aaqib Saeed and Stojan Trajanovski. Personalized driver stress detection with multi-task neural networks using physiological signals. arXiv preprint arXiv:1711.06116, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[61]
Synthesizing and reconstructing missing sensory modalities in behavioral context recognition
Aaqib Saeed, Tanir Ozcelebi, and Johan Lukkien. Synthesizing and reconstructing missing sensory modalities in behavioral context recognition. Sensors, 18(9):2967, 2018
work page 2018
-
[62]
Cnn features off-the-shelf: an astounding baseline for recognition
Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. Cnn features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pa/t_tern recognition workshops, pages 806–813, 2014
work page 2014
-
[63]
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classi/f_ication models and saliency maps. arXiv preprint arXiv:1312.6034, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[64]
Allan Stisen, Henrik Blunck, Sourav Bha/t_tacharya, /T_hor Siiger Prentow, Mikkel Baun Kjærgaard, Anind Dey, Tobias Sonne, and Mads Møller Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems , pages 127–140. ACM, 2015
work page 2015
-
[65]
Sequence to sequence learning with neural networks
Ilya Sutskever, Oriol Vinyals, and /Q_uoc V Le. Sequence to sequence learning with neural networks. InAdvances in neural information processing systems, pages 3104–3112, 2014
work page 2014
-
[66]
Deepface: Closing the gap to human-level performance in face veri/f_ication
Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, and Lior Wolf. Deepface: Closing the gap to human-level performance in face veri/f_ication. InProceedings of the IEEE conference on computer vision and pa/t_tern recognition, pages 1701–1708, 2014
work page 2014
-
[67]
Terry T Um, Franz MJ P/f_ister, Daniel Pichler, Satoshi Endo, Muriel Lang, Sandra Hirche, Urban Fietzek, and Dana Kuli ´c. Data augmentation of wearable sensor data for parkinson/f_is disease monitoring using convolutional neural networks. InProceedings of the 19th ACM International Conference on Multimodal Interaction , pages 216–220. ACM, 2017
work page 2017
-
[68]
Deep learning for sensor-based activity recognition: A survey
Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. Deep learning for sensor-based activity recognition: A survey. Pa/t_tern Recognition Le/t_ters, 2018
work page 2018
-
[69]
Deep transfer learning for cross-domain activity recognition
Jindong Wang, Vincent W Zheng, Yiqiang Chen, and Meiyu Huang. Deep transfer learning for cross-domain activity recognition. In Proceedings of the 3rd International Conference on Crowd Science and Engineering , page 16. ACM, 2018
work page 2018
-
[70]
S. Wawrzyniak and W. Niemiro. Clustering approach to the problem of human activity recognition using motion data. In 2015 Federated Conference on Computer Science and Information Systems (FedCSIS) , pages 411–416, Sep. 2015. doi: 10.15439/2015F424
-
[71]
Learning and using the arrow of time
Donglai Wei, Joseph Lim, Andrew Zisserman, and William T Freeman. Learning and using the arrow of time. In Proceedings of the IEEE Conference on Computer Vision and Pa/t_tern Recognition, pages 8052–8060, 2018. PACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 0, No. 0, Article 000. Publication date: 0. Multi-task Self-Supervised Lea...
work page 2018
-
[72]
Deep convolutional neural networks on multichannel time series for human activity recognition
Jianbo Yang, Minh Nhut Nguyen, Phyo Phyo San, Xiaoli Li, and Shonali Krishnaswamy. Deep convolutional neural networks on multichannel time series for human activity recognition. In Ijcai, volume 15, pages 3995–4001, 2015
work page 2015
-
[73]
Sensegan: Enabling deep learning for internet of things with a semi-supervised framework
Shuochao Yao, Yiran Zhao, Huajie Shao, Chao Zhang, Aston Zhang, Shaohan Hu, Dongxin Liu, Shengzhong Liu, Lu Su, and Tarek Abdelzaher. Sensegan: Enabling deep learning for internet of things with a semi-supervised framework. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):144, 2018
work page 2018
-
[74]
Split-brain autoencoders: Unsupervised learning by cross-channel prediction
Richard Zhang, Phillip Isola, and Alexei A Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, volume 1, page 5, 2017. PACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Vol. 0, No. 0, Article 000. Publication date: 0. 000:28 • A. Saeed et al. APPENDIX Table 4. Evaluating self-supervised representat...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.