pith. sign in

arxiv: 2203.04153 · v2 · submitted 2022-03-08 · 💻 cs.CV

Easy Ensemble: Simple Deep Ensemble Learning for Sensor-Based Human Activity Recognition

Pith reviewed 2026-05-24 11:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords Easy Ensembledeep ensemble learninghuman activity recognitionsensor-based HARsingle model ensembleinput variationerstepwise ensemblechannel shuffle
0
0 comments X

The pith

Easy Ensemble implements deep ensemble learning inside one model for sensor-based human activity recognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Easy Ensemble as a method that delivers the generalization gains of deep ensemble learning without the usual requirement to train and maintain multiple separate models. It achieves this by embedding three specific techniques—an input variationer, stepwise ensemble, and channel shuffle—directly into a single network architecture. This matters for sensor-based human activity recognition because traditional ensembles improve accuracy on raw sensor data but add substantial training time and deployment cost in IoT settings. Experiments on benchmark datasets compare the single-model approach against conventional ensembles and demonstrate comparable performance. A sympathetic reader would care because the method reduces the procedural overhead while retaining the robustness that makes representation learning effective for activity data.

Core claim

Easy Ensemble enables the easy implementation of deep ensemble learning in a single model for sensor-based human activity recognition. The approach incorporates an input variationer to create diverse inputs, a stepwise ensemble to build the ensemble progressively, and channel shuffle to increase feature diversity, allowing the single model to replicate the generalization benefits that normally require training multiple independent models.

What carries the argument

Easy Ensemble, a single deep network that integrates input variationer, stepwise ensemble, and channel shuffle to produce ensemble-like generalization without separate model training.

If this is right

  • Deep ensemble benefits become available without separate data partitioning and multiple training runs.
  • Training time and computational expense decrease while maintaining performance on sensor-based activity recognition tasks.
  • The single-model design simplifies deployment in resource-limited IoT environments.
  • The three techniques can be combined with existing representation learning pipelines for HAR.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same single-model substitution might reduce ensemble costs in related sensor tasks such as gesture recognition or fall detection.
  • If the techniques mainly increase internal diversity, they could be tested as a lightweight addition to other regularization strategies.
  • Extending the method to longer time-series windows or multi-modal sensor inputs would test whether the observed benefits scale.

Load-bearing premise

The proposed techniques of input variationer, stepwise ensemble, and channel shuffle can replicate the generalization benefits of training multiple separate models within a single model architecture.

What would settle it

An experiment showing that Easy Ensemble produces substantially lower accuracy than a conventional ensemble of multiple independently trained models on the same benchmark HAR dataset would falsify the central claim.

Figures

Figures reproduced from arXiv: 2203.04153 by Kazuma Kondo, Tatsuhito Hasegawa.

Figure 1
Figure 1. Figure 1: EE and input variationer make it possible to easily use a deep ensemble learning model for HAR. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Model architecture of our proposed EE method and common ensemble of deep learning. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Ensemble method (VGG architecture) versus HASC Accuracy. The number of filters in the first convolutional [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Model size versus HASC accuracy for each model. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: HAR accuracies for each public dataset [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Ablation study (VGG architecture and HASC accuracy). The first letter denotes convolution type (G: group [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Model Size versus HASC Accuracy. Ensemble models are scaled up by increasing the number of ensembles [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Input masking to make input groups. R16 R4M4 R4A4 A4M4 Input type 0.80 0.82 0.84 0.86 test_acc [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: HASC accuracies for each input variation. [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: HAR accuracies by modality ensembles (VGG architecture). [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: The effect of stepwise ensemble model (HASC acc. and VGG architecture). [PITH_FULL_IMAGE:figures/full_fig_p013_11.png] view at source ↗
read the original abstract

Sensor-based human activity recognition (HAR) is a paramount technology in the Internet of Things services. HAR using representation learning, which automatically learns a feature representation from raw data, is the mainstream method because it is difficult to interpret relevant information from raw sensor data to design meaningful features. Ensemble learning is a robust approach to improve generalization performance; however, deep ensemble learning requires various procedures, such as data partitioning and training multiple models, which are time-consuming and computationally expensive. In this study, we propose Easy Ensemble (EE) for HAR, which enables the easy implementation of deep ensemble learning in a single model. In addition, we propose various techniques (input variationer, stepwise ensemble, and channel shuffle) for the EE. Experiments on a benchmark dataset for HAR demonstrated the effectiveness of EE and various techniques and their characteristics compared with conventional ensemble learning methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Easy Ensemble (EE), a method to realize deep ensemble learning for sensor-based human activity recognition (HAR) inside a single model architecture. It introduces three supporting techniques—an input variationer, stepwise ensemble, and channel shuffle—and reports benchmark experiments demonstrating that EE matches or exceeds the generalization performance of conventional multi-model ensembles while avoiding their training overhead.

Significance. If the central claim holds, EE would provide a practical, lower-cost route to ensemble-level robustness in HAR pipelines for IoT applications. The work supplies direct experimental comparisons against standard ensemble baselines on a public benchmark, which is a positive attribute for reproducibility and falsifiability.

major comments (2)
  1. [§4.1, Eq. (3)] §4.1 and Eq. (3): the input variationer is presented as the key mechanism for injecting ensemble-like diversity, yet the manuscript does not quantify the effective diversity (e.g., via prediction disagreement or feature-space variance) between the implicit sub-models; without this measurement the claim that EE replicates multi-model generalization rests on accuracy numbers alone.
  2. [Table 3] Table 3, final row: the reported F1-score improvement of EE over the single-model baseline is 1.8 percentage points, but no standard deviation across runs or statistical significance test is supplied; this weakens the assertion that the observed gain is reliably attributable to the ensemble mechanism rather than training stochasticity.
minor comments (2)
  1. [§3.3] The notation for the channel-shuffle operation in §3.3 is introduced without an accompanying diagram or pseudocode, making the exact tensor reshaping difficult to reconstruct from the prose description alone.
  2. [Figure 4] Figure 4 caption does not state the number of independent training runs used to generate the plotted curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation. We address the two major comments below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses
  1. Referee: [§4.1, Eq. (3)] §4.1 and Eq. (3): the input variationer is presented as the key mechanism for injecting ensemble-like diversity, yet the manuscript does not quantify the effective diversity (e.g., via prediction disagreement or feature-space variance) between the implicit sub-models; without this measurement the claim that EE replicates multi-model generalization rests on accuracy numbers alone.

    Authors: We agree that an explicit quantification of diversity would provide stronger support for the claim. In the revised version we will add an analysis section reporting prediction disagreement rates and feature-space variance (e.g., cosine distance between activations) across the implicit sub-models created by the input variationer, directly comparing these metrics to those obtained from a conventional multi-model ensemble. revision: yes

  2. Referee: [Table 3] Table 3, final row: the reported F1-score improvement of EE over the single-model baseline is 1.8 percentage points, but no standard deviation across runs or statistical significance test is supplied; this weakens the assertion that the observed gain is reliably attributable to the ensemble mechanism rather than training stochasticity.

    Authors: We acknowledge that reporting variability and significance strengthens the results. We will rerun all experiments with at least five random seeds, add standard deviations to Table 3, and include a statistical significance test (paired t-test) between EE and the single-model baseline. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes Easy Ensemble as a single-model approximation to deep ensembles via three new components (input variationer, stepwise ensemble, channel shuffle) and validates them on standard HAR benchmarks. No derivation reduces to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain; the central claim is an empirical engineering contribution whose correctness is tested externally rather than assumed by construction. The provided abstract and description contain no equations or premises that collapse into their own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on any free parameters, axioms, or invented entities used in the work.

pith-pipeline@v0.9.0 · 5670 in / 1062 out tokens · 50635 ms · 2026-05-24T11:18:48.729144+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1]

    C. Xu, D. Chai, J. He, X. Zhang, and S. Duan. Innohar: A deep neural network for complex human activity recognition. IEEE Access, 7:9893–9902, Jan. 2019. 13 Running Title for Header

  2. [2]

    Asymmetric residual neural network for accurate human activity recognition

    Jun Long, Wuqing Sun, Zhan Yang, and Osolo Ian Raymond. Asymmetric residual neural network for accurate human activity recognition. Information, 10(6), 2019

  3. [3]

    K. Wang, J. He, and L. Zhang. Attention-based convolutional neural network for weakly labeled human activities´recognition with wearable sensors. IEEE Sensors Journal, 19(17):7598–7604, Sep. 2019

  4. [4]

    Reiss and D

    A. Reiss and D. Stricker. Creating and benchmarking a new dataset for physical activity monitoring. In In Proc. of the 5th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA), pages 40:1–40:8, 2012

  5. [5]

    Deepsense: A unified deep learning framework for time-series mobile sensing data processing

    Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In Proc. of the 26th International Conference on World Wide Web, page 351–360, 2017

  6. [6]

    Lane, Cecilia Mascolo, Mahesh K

    Valentin Radu, Catherine Tong, Sourav Bhattacharya, Nicholas D. Lane, Cecilia Mascolo, Mahesh K. Marina, and Fahim Kawsar. Multimodal deep learning for activity and context recognition. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 1(4), jan 2018

  7. [7]

    Embracenet for activity: A deep multimodal fusion architecture for activity recognition

    Jun-Ho Choi and Jong-Seok Lee. Embracenet for activity: A deep multimodal fusion architecture for activity recognition. In Adjunct Proc. of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, page 693–698, 2019

  8. [8]

    Deep learning based multimodal complex human activity recognition using wearable devices

    Ling Chen, Xiaoze Liu, Liangying Peng, and Menghan Wu. Deep learning based multimodal complex human activity recognition using wearable devices. Applied Intelligence, 51:4029–4042, jun 2021

  9. [9]

    Mothernets: Rapid deep ensemble learning

    Abdul Wasay, Brian Hentschel, Yuze Liao, Sanyuan Chen, and Stratos Idreos. Mothernets: Rapid deep ensemble learning. In I. Dhillon, D. Papailiopoulos, and V . Sze, editors,Proc. of Machine Learning and Systems, volume 2, pages 199–215, 2020

  10. [10]

    Dammas, Rahaf D

    Abdulhamit Subasi, Dalia H. Dammas, Rahaf D. Alghamdi, Raghad A. Makawi, Eman A. Albiety, Tayeb Brahimi, and Akila Sarirete. Sensor based human activity recognition using adaboost ensemble classifier. Procedia Computer Science, 140:104–111, 2018

  11. [11]

    Naomi Irvine, Chris Nugent, Shuai Zhang, Hui Wang, and Wing W. Y . NG. Neural network ensembles for sensor-based human activity recognition within smart environments. Sensors, 20(1):1–26, 2020

  12. [12]

    Physique-based human activity recognition using ensemble learning and smartphone sensors

    Nurul Amin Choudhury, Soumen Moulik, and Diptendu Sinha Roy. Physique-based human activity recognition using ensemble learning and smartphone sensors. IEEE Sensors Journal, 21(15):16852–16860, 2021

  13. [13]

    A cascade ensemble learning model for human activity recognition with smartphones

    Shoujiang Xu, Qingfeng Tang, Linpeng Jin, and Zhigeng Pan. A cascade ensemble learning model for human activity recognition with smartphones. Sensors, 19(10):1–17, 2019

  14. [14]

    Efficient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

    Ran Zhu, Zhuoling Xiao, Ying Li, Mingkun Yang, Yawen Tan, Liang Zhou, Shuisheng Lin, and Hongkai Wen. Efficient human activity recognition solving the confusing activities via deep ensemble learning.IEEE Access, 7:75490–75499, 2019

  15. [15]

    An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition

    Anjali Gupta Vijay Bhaskar Semwal and Praveen Lalwani. An optimized hybrid deep learning model using ensemble learning approach for human walking activities recognition. The Journal of Supercomputing, pages 1–25, 2021

  16. [16]

    Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

    Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, and Dhruv Batra. Why m heads are better than one: Training a diverse ensemble of deep networks, 2015

  17. [17]

    Batchensemble: an alternative approach to efficient ensemble and lifelong learning

    Yeming Wen, Dustin Tran, and Jimmy Ba. Batchensemble: an alternative approach to efficient ensemble and lifelong learning. In Proc. of the International Conference on Learning Representations, 2020

  18. [18]

    Training independent subnetworks for robust prediction

    Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, and Dustin Tran. Training independent subnetworks for robust prediction. In Proc. of the International Conference on Learning Representations, 2021

  19. [19]

    More or less: When and how to build convolutional neural network ensembles

    Abdul Wasay and Stratos Idreos. More or less: When and how to build convolutional neural network ensembles. In Proc. of the International Conference on Learning Representations, 2021

  20. [20]

    Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

    Zeyuan Allen-Zhu and Yuanzhi Li. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. arXiv, 2012.09816v2:1–70, Jul. 2021

  21. [21]

    Distilling the knowledge in a neural network

    Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In Proc. of the NIPS Deep Learning and Representation Learning Workshop, 2015

  22. [22]

    Aggregated residual transformations for deep neural networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5987–5995, 2017. 14 Running Title for Header

  23. [23]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

  24. [24]

    Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

    Hao Chen and Abhinav Shrivastava. Group ensemble: Learning an ensemble of convnets in a single convnet, 2020

  25. [25]

    Simonyan and A

    K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. of the International Conference on Learning Representations, pages 1–14, May 2015

  26. [26]

    Xception: Deep learning with depthwise separable convolutions

    François Chollet. Xception: Deep learning with depthwise separable convolutions. InProc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1800–1807, 2017

  27. [27]

    Group normalization

    Yuxin Wu and Kaiming He. Group normalization. In Proc. of the European Conference on Computer Vision (ECCV), September 2018

  28. [28]

    Kawaguchi, N

    N. Kawaguchi, N. Ogawa, Y . Iwasaki, K. Kaji, T. Terada, K. Murao, S. Inoue, Y . Kawahara, Y . Sumi, and N. Nishio. Hasc challenge: Gathering large scale human activity corpus for the real-world activity understandings. In In Proc. of the 2nd Augmented Human International Conference, Mar. 2011

  29. [29]

    Rethinking the inception architecture for computer vision

    Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, 2016

  30. [30]

    Squeeze-and-excitation networks

    Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018

  31. [31]

    Anguita, A

    D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In In Proc. of the 21st European Symposium on Artificial Neural Networks (ESANN), pages 437–442, Apr. 2013

  32. [32]

    J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity recognition using cell phone accelerometers. SIGKDD Explor. Newsl., 12(2):74–82, 2011

  33. [33]

    Mobilio D

    M. Mobilio D. Micucci and P. Napoletano. Unimib shar: A dataset for human activity recognition using acceleration data from smartphones. Apld. Sci.,, 7(10), 2017

  34. [34]

    Octave mix: Data augmentation using frequency decomposition for activity recognition

    Tatsuhito Hasegawa. Octave mix: Data augmentation using frequency decomposition for activity recognition. IEEE Access, 9:53679–53686, 2021

  35. [35]

    Zhang, M

    H. Zhang, M. Cisse, Y . N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. In Proc. of the International Conference on Learning Representations, pages 1–13, Apr. 2018

  36. [36]

    Takahashi, T

    R. Takahashi, T. Matsubara, and K. Uehara. Ricap: Random image cropping and patchingdata augmentation for deep cnns. In Proc. of Mach. Lrn. Res., volume 95, pages 786–798, Apr. 2018

  37. [37]

    Randaugment: Practical automated data augmentation with a reduced search space

    Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, and Quoc Le. Randaugment: Practical automated data augmentation with a reduced search space. In Advances in Neural Information Processing Systems, volume 33, pages 18613– 18624, 2020

  38. [38]

    Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble

    Jessica Sena, Jesimon Barreto, Carlos Caetano, Guilherme Cramer, and William Robson Schwartz. Human activity recognition based on smartphone and wearable sensors using multiscale dcnn ensemble. Neurocomputing, 444:226–243, 2021

  39. [39]

    En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications

    Debadyuti Mukherjee, Riktim Mondal, Pawan Kumar Singh, Ram Sarkar, and Debotosh Bhattacharjee. En- semconvnet: a deep learning approach for human activity recognition using smartphone sensors for healthcare applications. Multimedia Tools and Applications, 79:31663–31690, 11 2020

  40. [40]

    Multi-input cnn-gru based human activity recognition using wearable sensors

    Nidhi Dua, Shiva Singh, and Vijay Semwal. Multi-input cnn-gru based human activity recognition using wearable sensors. Computing, 103:1–18, 07 2021. 15