Easy Ensemble: Simple Deep Ensemble Learning for Sensor-Based Human Activity Recognition

Kazuma Kondo; Tatsuhito Hasegawa

arxiv: 2203.04153 · v2 · submitted 2022-03-08 · 💻 cs.CV

Easy Ensemble: Simple Deep Ensemble Learning for Sensor-Based Human Activity Recognition

Tatsuhito Hasegawa , Kazuma Kondo This is my paper

Pith reviewed 2026-05-24 11:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords Easy Ensembledeep ensemble learninghuman activity recognitionsensor-based HARsingle model ensembleinput variationerstepwise ensemblechannel shuffle

0 comments

The pith

Easy Ensemble implements deep ensemble learning inside one model for sensor-based human activity recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Easy Ensemble as a method that delivers the generalization gains of deep ensemble learning without the usual requirement to train and maintain multiple separate models. It achieves this by embedding three specific techniques—an input variationer, stepwise ensemble, and channel shuffle—directly into a single network architecture. This matters for sensor-based human activity recognition because traditional ensembles improve accuracy on raw sensor data but add substantial training time and deployment cost in IoT settings. Experiments on benchmark datasets compare the single-model approach against conventional ensembles and demonstrate comparable performance. A sympathetic reader would care because the method reduces the procedural overhead while retaining the robustness that makes representation learning effective for activity data.

Core claim

Easy Ensemble enables the easy implementation of deep ensemble learning in a single model for sensor-based human activity recognition. The approach incorporates an input variationer to create diverse inputs, a stepwise ensemble to build the ensemble progressively, and channel shuffle to increase feature diversity, allowing the single model to replicate the generalization benefits that normally require training multiple independent models.

What carries the argument

Easy Ensemble, a single deep network that integrates input variationer, stepwise ensemble, and channel shuffle to produce ensemble-like generalization without separate model training.

If this is right

Deep ensemble benefits become available without separate data partitioning and multiple training runs.
Training time and computational expense decrease while maintaining performance on sensor-based activity recognition tasks.
The single-model design simplifies deployment in resource-limited IoT environments.
The three techniques can be combined with existing representation learning pipelines for HAR.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same single-model substitution might reduce ensemble costs in related sensor tasks such as gesture recognition or fall detection.
If the techniques mainly increase internal diversity, they could be tested as a lightweight addition to other regularization strategies.
Extending the method to longer time-series windows or multi-modal sensor inputs would test whether the observed benefits scale.

Load-bearing premise

The proposed techniques of input variationer, stepwise ensemble, and channel shuffle can replicate the generalization benefits of training multiple separate models within a single model architecture.

What would settle it

An experiment showing that Easy Ensemble produces substantially lower accuracy than a conventional ensemble of multiple independently trained models on the same benchmark HAR dataset would falsify the central claim.

Figures

Figures reproduced from arXiv: 2203.04153 by Kazuma Kondo, Tatsuhito Hasegawa.

**Figure 2.** Figure 2: Model architecture of our proposed EE method and common ensemble of deep learning. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Ensemble method (VGG architecture) versus HASC Accuracy. The number of filters in the first convolutional [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Model size versus HASC accuracy for each model. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: HAR accuracies for each public dataset [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study (VGG architecture and HASC accuracy). The first letter denotes convolution type (G: group [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Model Size versus HASC Accuracy. Ensemble models are scaled up by increasing the number of ensembles [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: Input masking to make input groups. R16 R4M4 R4A4 A4M4 Input type 0.80 0.82 0.84 0.86 test_acc [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: HASC accuracies for each input variation. [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗

**Figure 10.** Figure 10: HAR accuracies by modality ensembles (VGG architecture). [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗

**Figure 11.** Figure 11: The effect of stepwise ensemble model (HASC acc. and VGG architecture). [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗

read the original abstract

Sensor-based human activity recognition (HAR) is a paramount technology in the Internet of Things services. HAR using representation learning, which automatically learns a feature representation from raw data, is the mainstream method because it is difficult to interpret relevant information from raw sensor data to design meaningful features. Ensemble learning is a robust approach to improve generalization performance; however, deep ensemble learning requires various procedures, such as data partitioning and training multiple models, which are time-consuming and computationally expensive. In this study, we propose Easy Ensemble (EE) for HAR, which enables the easy implementation of deep ensemble learning in a single model. In addition, we propose various techniques (input variationer, stepwise ensemble, and channel shuffle) for the EE. Experiments on a benchmark dataset for HAR demonstrated the effectiveness of EE and various techniques and their characteristics compared with conventional ensemble learning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Easy Ensemble packages three tricks into one model to approximate deep ensembles for HAR, which is a narrow but practical move.

read the letter

The paper's core move is Easy Ensemble: a single-model setup that tries to get the generalization lift of deep ensembles without training multiple separate networks. It adds an input variationer, stepwise ensemble, and channel shuffle to create diversity inside one architecture. That directly targets the compute and time cost of standard ensembles in sensor-based human activity recognition, which matters for IoT settings where you cannot afford multiple forward passes or separate trainings at inference time. The experiments on a benchmark dataset are the main evidence offered, and the paper positions the method against conventional ensemble baselines. That framing is straightforward and the techniques are described clearly enough that someone could reimplement them. The work stays focused on the practical constraint rather than claiming broader theoretical advances. The soft spot is the absence of any numbers in the abstract—no accuracy deltas, no compute comparisons, no ablation on the three components, and no detail on the dataset or training protocol. Without those, it is difficult to judge whether the single-model version actually recovers most of the ensemble benefit or just adds mild regularization. The central assumption that input variation plus channel shuffle plus stepwise training can substitute for model diversity is plausible but untested in the summary, so the strength of the result hinges on what the full tables show. Minor concern only if the experiments turn out to be thin; otherwise the claim is scoped narrowly enough that it does not collapse. This paper is for people already working on efficient HAR pipelines who want a lighter way to boost performance. A reader who needs reproducible tricks for edge deployment would get usable ideas from the method section. It is solid enough on its own terms to go to peer review; the idea is concrete, the motivation is real, and referees can check the numbers and ablations directly.

Referee Report

2 major / 2 minor

Summary. The paper proposes Easy Ensemble (EE), a method to realize deep ensemble learning for sensor-based human activity recognition (HAR) inside a single model architecture. It introduces three supporting techniques—an input variationer, stepwise ensemble, and channel shuffle—and reports benchmark experiments demonstrating that EE matches or exceeds the generalization performance of conventional multi-model ensembles while avoiding their training overhead.

Significance. If the central claim holds, EE would provide a practical, lower-cost route to ensemble-level robustness in HAR pipelines for IoT applications. The work supplies direct experimental comparisons against standard ensemble baselines on a public benchmark, which is a positive attribute for reproducibility and falsifiability.

major comments (2)

[§4.1, Eq. (3)] §4.1 and Eq. (3): the input variationer is presented as the key mechanism for injecting ensemble-like diversity, yet the manuscript does not quantify the effective diversity (e.g., via prediction disagreement or feature-space variance) between the implicit sub-models; without this measurement the claim that EE replicates multi-model generalization rests on accuracy numbers alone.
[Table 3] Table 3, final row: the reported F1-score improvement of EE over the single-model baseline is 1.8 percentage points, but no standard deviation across runs or statistical significance test is supplied; this weakens the assertion that the observed gain is reliably attributable to the ensemble mechanism rather than training stochasticity.

minor comments (2)

[§3.3] The notation for the channel-shuffle operation in §3.3 is introduced without an accompanying diagram or pseudocode, making the exact tensor reshaping difficult to reconstruct from the prose description alone.
[Figure 4] Figure 4 caption does not state the number of independent training runs used to generate the plotted curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation. We address the two major comments below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses

Referee: [§4.1, Eq. (3)] §4.1 and Eq. (3): the input variationer is presented as the key mechanism for injecting ensemble-like diversity, yet the manuscript does not quantify the effective diversity (e.g., via prediction disagreement or feature-space variance) between the implicit sub-models; without this measurement the claim that EE replicates multi-model generalization rests on accuracy numbers alone.

Authors: We agree that an explicit quantification of diversity would provide stronger support for the claim. In the revised version we will add an analysis section reporting prediction disagreement rates and feature-space variance (e.g., cosine distance between activations) across the implicit sub-models created by the input variationer, directly comparing these metrics to those obtained from a conventional multi-model ensemble. revision: yes
Referee: [Table 3] Table 3, final row: the reported F1-score improvement of EE over the single-model baseline is 1.8 percentage points, but no standard deviation across runs or statistical significance test is supplied; this weakens the assertion that the observed gain is reliably attributable to the ensemble mechanism rather than training stochasticity.

Authors: We acknowledge that reporting variability and significance strengthens the results. We will rerun all experiments with at least five random seeds, add standard deviations to Table 3, and include a statistical significance test (paired t-test) between EE and the single-model baseline. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes Easy Ensemble as a single-model approximation to deep ensembles via three new components (input variationer, stepwise ensemble, channel shuffle) and validates them on standard HAR benchmarks. No derivation reduces to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain; the central claim is an empirical engineering contribution whose correctness is tested externally rather than assumed by construction. The provided abstract and description contain no equations or premises that collapse into their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on any free parameters, axioms, or invented entities used in the work.

pith-pipeline@v0.9.0 · 5670 in / 1062 out tokens · 50635 ms · 2026-05-24T11:18:48.729144+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

C. Xu, D. Chai, J. He, X. Zhang, and S. Duan. Innohar: A deep neural network for complex human activity recognition. IEEE Access, 7:9893–9902, Jan. 2019. 13 Running Title for Header

work page 2019
[2]

Asymmetric residual neural network for accurate human activity recognition

Jun Long, Wuqing Sun, Zhan Yang, and Osolo Ian Raymond. Asymmetric residual neural network for accurate human activity recognition. Information, 10(6), 2019

work page 2019
[3]

K. Wang, J. He, and L. Zhang. Attention-based convolutional neural network for weakly labeled human activities´recognition with wearable sensors. IEEE Sensors Journal, 19(17):7598–7604, Sep. 2019

work page 2019
[4]

Reiss and D

A. Reiss and D. Stricker. Creating and benchmarking a new dataset for physical activity monitoring. In In Proc. of the 5th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), pages 40:1–40:8, 2012

work page 2012
[5]

Deepsense: A uniﬁed deep learning framework for time-series mobile sensing data processing

Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. Deepsense: A uniﬁed deep learning framework for time-series mobile sensing data processing. In Proc. of the 26th International Conference on World Wide Web, page 351–360, 2017

work page 2017
[6]

Lane, Cecilia Mascolo, Mahesh K

Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D. Lane, Cecilia Mascolo, Mahesh K. Marina, and Fahim Kawsar. Multimodal deep learning for activity and context recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 1(4), jan 2018

work page 2018
[7]

Embracenet for activity: A deep multimodal fusion architecture for activity recognition

Jun-Ho Choi and Jong-Seok Lee. Embracenet for activity: A deep multimodal fusion architecture for activity recognition. In Adjunct Proc. of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, page 693–698, 2019

work page 2019
[8]

Deep learning based multimodal complex human activity recognition using wearable devices

Ling Chen, Xiaoze Liu, Liangying Peng, and Menghan Wu. Deep learning based multimodal complex human activity recognition using wearable devices. Applied Intelligence, 51:4029–4042, jun 2021

work page 2021
[9]

Mothernets: Rapid deep ensemble learning

Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, and Stratos Idreos. Mothernets: Rapid deep ensemble learning. In I. Dhillon, D. Papailiopoulos, and V . Sze, editors,Proc. of Machine Learning and Systems, volume 2, pages 199–215, 2020

work page 2020
[10]

Dammas, Rahaf D

Abdulhamit Subasi, Dalia H. Dammas, Rahaf D. Alghamdi, Raghad A. Makawi, Eman A. Albiety, Tayeb Brahimi, and Akila Sarirete. Sensor based human activity recognition using adaboost ensemble classiﬁer. Procedia Computer Science, 140:104–111, 2018

work page 2018
[11]

Naomi Irvine, Chris Nugent, Shuai Zhang, Hui Wang, and Wing W. Y . NG. Neural network ensembles for sensor-based human activity recognition within smart environments. Sensors, 20(1):1–26, 2020

work page 2020
[12]

Physique-based human activity recognition using ensemble learning and smartphone sensors

Nurul Amin Choudhury, Soumen Moulik, and Diptendu Sinha Roy. Physique-based human activity recognition using ensemble learning and smartphone sensors. IEEE Sensors Journal, 21(15):16852–16860, 2021

work page 2021
[13]

A cascade ensemble learning model for human activity recognition with smartphones

Shoujiang Xu, Qingfeng Tang, Linpeng Jin, and Zhigeng Pan. A cascade ensemble learning model for human activity recognition with smartphones. Sensors, 19(10):1–17, 2019

work page 2019
[14]

Efﬁcient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

Ran Zhu, Zhuoling Xiao, Ying Li, Mingkun Yang, Yawen Tan, Liang Zhou, Shuisheng Lin, and Hongkai Wen. Efﬁcient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

work page 2019
[15]

An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition

Anjali Gupta Vijay Bhaskar Semwal and Praveen Lalwani. An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition. The Journal of Supercomputing, pages 1–25, 2021

work page 2021
[16]

Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, and Dhruv Batra. Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

work page 2015
[17]

Batchensemble: an alternative approach to efﬁcient ensemble and lifelong learning

Yeming Wen, Dustin Tran, and Jimmy Ba. Batchensemble: an alternative approach to efﬁcient ensemble and lifelong learning. In Proc. of the International Conference on Learning Representations, 2020

work page 2020
[18]

Training independent subnetworks for robust prediction

Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, and Dustin Tran. Training independent subnetworks for robust prediction. In Proc. of the International Conference on Learning Representations, 2021

work page 2021
[19]

More or less: When and how to build convolutional neural network ensembles

Abdul Wasay and Stratos Idreos. More or less: When and how to build convolutional neural network ensembles. In Proc. of the International Conference on Learning Representations, 2021

work page 2021
[20]

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Zeyuan Allen-Zhu and Yuanzhi Li. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. arXiv, 2012.09816v2:1–70, Jul. 2021

work page arXiv 2012
[21]

Distilling the knowledge in a neural network

Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In Proc. of the NIPS Deep Learning and Representation Learning Workshop, 2015

work page 2015
[22]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5987–5995, 2017. 14 Running Title for Header

work page 2017
[23]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

work page 2016
[24]

Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

Hao Chen and Abhinav Shrivastava. Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

work page 2020
[25]

Simonyan and A

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of the International Conference on Learning Representations, pages 1–14, May 2015

work page 2015
[26]

Xception: Deep learning with depthwise separable convolutions

François Chollet. Xception: Deep learning with depthwise separable convolutions. InProc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1800–1807, 2017

work page 2017
[27]

Group normalization

Yuxin Wu and Kaiming He. Group normalization. In Proc. of the European Conference on Computer Vision (ECCV), September 2018

work page 2018
[28]

Kawaguchi, N

N. Kawaguchi, N. Ogawa, Y . Iwasaki, K. Kaji, T. Terada, K. Murao, S. Inoue, Y . Kawahara, Y . Sumi, and N. Nishio. Hasc challenge: Gathering large scale human activity corpus for the real-world activity understandings. In In Proc. of the 2nd Augmented Human International Conference, Mar. 2011

work page 2011
[29]

Rethinking the inception architecture for computer vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, 2016

work page 2016
[30]

Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

work page 2018
[31]

Anguita, A

D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In In Proc. of the 21st European Symposium on Artiﬁcial Neural Networks (ESANN), pages 437–442, Apr. 2013

work page 2013
[32]

J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity recognition using cell phone accelerometers. SIGKDD Explor. Newsl., 12(2):74–82, 2011

work page 2011
[33]

Mobilio D

M. Mobilio D. Micucci and P. Napoletano. Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Apld. Sci.,, 7(10), 2017

work page 2017
[34]

Octave mix: Data augmentation using frequency decomposition for activity recognition

Tatsuhito Hasegawa. Octave mix: Data augmentation using frequency decomposition for activity recognition. IEEE Access, 9:53679–53686, 2021

work page 2021
[35]

Zhang, M

H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In Proc. of the International Conference on Learning Representations, pages 1–13, Apr. 2018

work page 2018
[36]

Takahashi, T

R. Takahashi, T. Matsubara, and K. Uehara. Ricap: Random image cropping and patchingdata augmentation for deep cnns. In Proc. of Mach. Lrn. Res., volume 95, pages 786–798, Apr. 2018

work page 2018
[37]

Randaugment: Practical automated data augmentation with a reduced search space

Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, and Quoc Le. Randaugment: Practical automated data augmentation with a reduced search space. In Advances in Neural Information Processing Systems, volume 33, pages 18613– 18624, 2020

work page 2020
[38]

Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble

Jessica Sena, Jesimon Barreto, Carlos Caetano, Guilherme Cramer, and William Robson Schwartz. Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble. Neurocomputing, 444:226–243, 2021

work page 2021
[39]

En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications

Debadyuti Mukherjee, Riktim Mondal, Pawan Kumar Singh, Ram Sarkar, and Debotosh Bhattacharjee. En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimedia Tools and Applications, 79:31663–31690, 11 2020

work page 2020
[40]

Multi-input cnn-gru based human activity recognition using wearable sensors

Nidhi Dua, Shiva Singh, and Vijay Semwal. Multi-input cnn-gru based human activity recognition using wearable sensors. Computing, 103:1–18, 07 2021. 15

work page 2021

[1] [1]

C. Xu, D. Chai, J. He, X. Zhang, and S. Duan. Innohar: A deep neural network for complex human activity recognition. IEEE Access, 7:9893–9902, Jan. 2019. 13 Running Title for Header

work page 2019

[2] [2]

Asymmetric residual neural network for accurate human activity recognition

Jun Long, Wuqing Sun, Zhan Yang, and Osolo Ian Raymond. Asymmetric residual neural network for accurate human activity recognition. Information, 10(6), 2019

work page 2019

[3] [3]

K. Wang, J. He, and L. Zhang. Attention-based convolutional neural network for weakly labeled human activities´recognition with wearable sensors. IEEE Sensors Journal, 19(17):7598–7604, Sep. 2019

work page 2019

[4] [4]

Reiss and D

A. Reiss and D. Stricker. Creating and benchmarking a new dataset for physical activity monitoring. In In Proc. of the 5th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), pages 40:1–40:8, 2012

work page 2012

[5] [5]

Deepsense: A uniﬁed deep learning framework for time-series mobile sensing data processing

Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. Deepsense: A uniﬁed deep learning framework for time-series mobile sensing data processing. In Proc. of the 26th International Conference on World Wide Web, page 351–360, 2017

work page 2017

[6] [6]

Lane, Cecilia Mascolo, Mahesh K

Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D. Lane, Cecilia Mascolo, Mahesh K. Marina, and Fahim Kawsar. Multimodal deep learning for activity and context recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 1(4), jan 2018

work page 2018

[7] [7]

Embracenet for activity: A deep multimodal fusion architecture for activity recognition

Jun-Ho Choi and Jong-Seok Lee. Embracenet for activity: A deep multimodal fusion architecture for activity recognition. In Adjunct Proc. of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, page 693–698, 2019

work page 2019

[8] [8]

Deep learning based multimodal complex human activity recognition using wearable devices

Ling Chen, Xiaoze Liu, Liangying Peng, and Menghan Wu. Deep learning based multimodal complex human activity recognition using wearable devices. Applied Intelligence, 51:4029–4042, jun 2021

work page 2021

[9] [9]

Mothernets: Rapid deep ensemble learning

Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, and Stratos Idreos. Mothernets: Rapid deep ensemble learning. In I. Dhillon, D. Papailiopoulos, and V . Sze, editors,Proc. of Machine Learning and Systems, volume 2, pages 199–215, 2020

work page 2020

[10] [10]

Dammas, Rahaf D

Abdulhamit Subasi, Dalia H. Dammas, Rahaf D. Alghamdi, Raghad A. Makawi, Eman A. Albiety, Tayeb Brahimi, and Akila Sarirete. Sensor based human activity recognition using adaboost ensemble classiﬁer. Procedia Computer Science, 140:104–111, 2018

work page 2018

[11] [11]

Naomi Irvine, Chris Nugent, Shuai Zhang, Hui Wang, and Wing W. Y . NG. Neural network ensembles for sensor-based human activity recognition within smart environments. Sensors, 20(1):1–26, 2020

work page 2020

[12] [12]

Physique-based human activity recognition using ensemble learning and smartphone sensors

Nurul Amin Choudhury, Soumen Moulik, and Diptendu Sinha Roy. Physique-based human activity recognition using ensemble learning and smartphone sensors. IEEE Sensors Journal, 21(15):16852–16860, 2021

work page 2021

[13] [13]

A cascade ensemble learning model for human activity recognition with smartphones

Shoujiang Xu, Qingfeng Tang, Linpeng Jin, and Zhigeng Pan. A cascade ensemble learning model for human activity recognition with smartphones. Sensors, 19(10):1–17, 2019

work page 2019

[14] [14]

Efﬁcient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

Ran Zhu, Zhuoling Xiao, Ying Li, Mingkun Yang, Yawen Tan, Liang Zhou, Shuisheng Lin, and Hongkai Wen. Efﬁcient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

work page 2019

[15] [15]

An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition

Anjali Gupta Vijay Bhaskar Semwal and Praveen Lalwani. An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition. The Journal of Supercomputing, pages 1–25, 2021

work page 2021

[16] [16]

Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, and Dhruv Batra. Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

work page 2015

[17] [17]

Batchensemble: an alternative approach to efﬁcient ensemble and lifelong learning

Yeming Wen, Dustin Tran, and Jimmy Ba. Batchensemble: an alternative approach to efﬁcient ensemble and lifelong learning. In Proc. of the International Conference on Learning Representations, 2020

work page 2020

[18] [18]

Training independent subnetworks for robust prediction

Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, and Dustin Tran. Training independent subnetworks for robust prediction. In Proc. of the International Conference on Learning Representations, 2021

work page 2021

[19] [19]

More or less: When and how to build convolutional neural network ensembles

Abdul Wasay and Stratos Idreos. More or less: When and how to build convolutional neural network ensembles. In Proc. of the International Conference on Learning Representations, 2021

work page 2021

[20] [20]

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Zeyuan Allen-Zhu and Yuanzhi Li. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. arXiv, 2012.09816v2:1–70, Jul. 2021

work page arXiv 2012

[21] [21]

Distilling the knowledge in a neural network

Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In Proc. of the NIPS Deep Learning and Representation Learning Workshop, 2015

work page 2015

[22] [22]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5987–5995, 2017. 14 Running Title for Header

work page 2017

[23] [23]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

work page 2016

[24] [24]

Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

Hao Chen and Abhinav Shrivastava. Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

work page 2020

[25] [25]

Simonyan and A

K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of the International Conference on Learning Representations, pages 1–14, May 2015

work page 2015

[26] [26]

Xception: Deep learning with depthwise separable convolutions

François Chollet. Xception: Deep learning with depthwise separable convolutions. InProc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1800–1807, 2017

work page 2017

[27] [27]

Group normalization

Yuxin Wu and Kaiming He. Group normalization. In Proc. of the European Conference on Computer Vision (ECCV), September 2018

work page 2018

[28] [28]

Kawaguchi, N

N. Kawaguchi, N. Ogawa, Y . Iwasaki, K. Kaji, T. Terada, K. Murao, S. Inoue, Y . Kawahara, Y . Sumi, and N. Nishio. Hasc challenge: Gathering large scale human activity corpus for the real-world activity understandings. In In Proc. of the 2nd Augmented Human International Conference, Mar. 2011

work page 2011

[29] [29]

Rethinking the inception architecture for computer vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, 2016

work page 2016

[30] [30]

Squeeze-and-excitation networks

Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

work page 2018

[31] [31]

Anguita, A

D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In In Proc. of the 21st European Symposium on Artiﬁcial Neural Networks (ESANN), pages 437–442, Apr. 2013

work page 2013

[32] [32]

J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity recognition using cell phone accelerometers. SIGKDD Explor. Newsl., 12(2):74–82, 2011

work page 2011

[33] [33]

Mobilio D

M. Mobilio D. Micucci and P. Napoletano. Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Apld. Sci.,, 7(10), 2017

work page 2017

[34] [34]

Octave mix: Data augmentation using frequency decomposition for activity recognition

Tatsuhito Hasegawa. Octave mix: Data augmentation using frequency decomposition for activity recognition. IEEE Access, 9:53679–53686, 2021

work page 2021

[35] [35]

Zhang, M

H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In Proc. of the International Conference on Learning Representations, pages 1–13, Apr. 2018

work page 2018

[36] [36]

Takahashi, T

R. Takahashi, T. Matsubara, and K. Uehara. Ricap: Random image cropping and patchingdata augmentation for deep cnns. In Proc. of Mach. Lrn. Res., volume 95, pages 786–798, Apr. 2018

work page 2018

[37] [37]

Randaugment: Practical automated data augmentation with a reduced search space

Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, and Quoc Le. Randaugment: Practical automated data augmentation with a reduced search space. In Advances in Neural Information Processing Systems, volume 33, pages 18613– 18624, 2020

work page 2020

[38] [38]

Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble

Jessica Sena, Jesimon Barreto, Carlos Caetano, Guilherme Cramer, and William Robson Schwartz. Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble. Neurocomputing, 444:226–243, 2021

work page 2021

[39] [39]

En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications

Debadyuti Mukherjee, Riktim Mondal, Pawan Kumar Singh, Ram Sarkar, and Debotosh Bhattacharjee. En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimedia Tools and Applications, 79:31663–31690, 11 2020

work page 2020

[40] [40]

Multi-input cnn-gru based human activity recognition using wearable sensors

Nidhi Dua, Shiva Singh, and Vijay Semwal. Multi-input cnn-gru based human activity recognition using wearable sensors. Computing, 103:1–18, 07 2021. 15

work page 2021