
arxiv: 2604.24096 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.AI

Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification

Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3

keywords: respiratory sound classification · meta-ensemble learning · data splits · ICBHI benchmark · generalization · ensemble diversity

The pith

Training base models under four distinct combinations of data split and granularity lets a meta-learner reach a Score of 66.49 percent on the ICBHI respiratory sound classification benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that ordinary ensembles lose effectiveness when every base model is trained on identical data, because the models overfit and produce highly correlated predictions. The authors therefore cross two split strategies (a fixed 80-20 percent split and five-fold cross-validation) with two granularity levels (patient-level and sample-level) to produce four base-model variants, which are intended to yield less correlated predictions on the small ICBHI collection. A meta-model then learns to combine the four outputs. The resulting system is reported to set a new state-of-the-art Score of 66.49 percent on the benchmark and to generalize better on two out-of-distribution datasets. The method matters because respiratory sound datasets are small and limited in subject diversity, which makes standard training brittle.

Core claim

By training base models on the ICBHI dataset using a fixed 80-20 percent split and five-fold cross-validation, each at both patient-level and sample-level granularity, then combining their outputs through a trained meta-model, the approach reaches a Score of 66.49 percent on the benchmark and improves generalization on two out-of-distribution datasets.
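The four regimes in the core claim can be sketched as index splits. This is a minimal illustration, not the authors' code: the function names, the seed handling, and the assumption that each sample carries a patient identifier are all hypothetical, and the abstract does not say how the fixed 80-20 split or the folds were drawn.

```python
import random
from collections import defaultdict


def fixed_split(units, test_frac=0.2, seed=0):
    """Shuffle the units and hold out test_frac of them once (one split)."""
    rng = random.Random(seed)
    units = list(units)
    rng.shuffle(units)
    n_test = max(1, round(len(units) * test_frac))
    return [(units[n_test:], units[:n_test])]


def kfold_split(units, k=5, seed=0):
    """Shuffle the units and return k (train, validation) pairs."""
    rng = random.Random(seed)
    units = list(units)
    rng.shuffle(units)
    folds = [units[i::k] for i in range(k)]
    return [
        ([u for j, fold in enumerate(folds) if j != i for u in fold], folds[i])
        for i in range(k)
    ]


def make_regimes(patient_ids, seed=0):
    """Build the four split regimes as lists of (train_idx, val_idx).

    patient_ids[i] is the (hypothetical) patient identifier of sample i.
    Sample-level regimes split sample indices directly, so one patient may
    appear on both sides. Patient-level regimes split patients first and
    then expand each patient to its samples, so no patient is shared.
    """
    sample_idx = list(range(len(patient_ids)))
    by_patient = defaultdict(list)
    for i, p in enumerate(patient_ids):
        by_patient[p].append(i)
    patients = sorted(by_patient)

    def expand(splits):
        return [
            (
                sorted(i for p in train for i in by_patient[p]),
                sorted(i for p in val for i in by_patient[p]),
            )
            for train, val in splits
        ]

    return {
        "fixed/sample": fixed_split(sample_idx, seed=seed),
        "fixed/patient": expand(fixed_split(patients, seed=seed)),
        "5fold/sample": kfold_split(sample_idx, seed=seed),
        "5fold/patient": expand(kfold_split(patients, seed=seed)),
    }
```

Calling `make_regimes(patient_ids)` returns a dictionary keyed by regime; one base model (or one per fold) would then be trained on each entry.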

What carries the argument

The meta-model that learns to fuse predictions from the four base models trained under different split and granularity regimes.
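As a rough illustration of what such a fusion step could look like, the sketch below stacks the class-probability vectors of the base models and fits a softmax regression on top. The abstract does not describe the meta-model, so the choice of softmax regression, the feature construction, and every name here are assumptions. In practice the meta-model would need to be fitted on predictions for samples the base models did not train on.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def stack_features(base_probs):
    """Concatenate the per-model probability vectors of each sample.

    base_probs[m][i] is the probability vector of model m for sample i.
    """
    return [sum((list(p) for p in per_sample), []) for per_sample in zip(*base_probs)]


def fit_meta(X, y, n_classes, lr=0.5, epochs=300):
    """Fit softmax regression with full-batch gradient descent (assumed meta-model)."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == t else 0.0)  # cross-entropy gradient
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict_meta(model, X):
    """Return the arg-max class of the fitted meta-model for each row of X."""
    W, b = model
    return [
        max(range(len(b)), key=lambda c: sum(w * v for w, v in zip(W[c], x)) + b[c])
        for x in X
    ]
```

With four base models and four classes, each stacked feature vector has 16 entries; the learned weights then indicate how much the meta-model relies on each base model per class.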

If this is right

  • Ensembling becomes more effective because the base predictions are less correlated, and this diversity comes from re-splitting existing data rather than from new data.
  • The model maintains higher accuracy when test patients differ from those seen in training.
  • The method requires no extra labeled recordings beyond the original ICBHI collection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same four-regime split tactic could be tested on other small-scale medical audio or signal tasks.
  • If the meta-model adds little beyond simple averaging, a non-learned fusion rule might suffice and reduce complexity.
  • Adding further split variants or different granularities could be checked to see whether gains continue or plateau.
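The non-learned fusion rule mentioned in the second point could be as simple as the unweighted average below. This is a hypothetical baseline for comparison, not something the abstract reports; the function name and input layout are assumptions.

```python
def average_fusion(base_probs):
    """Average class probabilities across models, then take the arg-max.

    base_probs[m][i] is the probability vector of model m for sample i.
    """
    fused = []
    for per_sample in zip(*base_probs):
        n_classes = len(per_sample[0])
        mean = [sum(p[c] for p in per_sample) / len(per_sample) for c in range(n_classes)]
        fused.append(max(range(n_classes), key=mean.__getitem__))
    return fused
```

If a trained meta-model does not clearly beat this rule on held-out patients, the simpler rule would be the more defensible choice.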

Load-bearing premise

The four chosen combinations of split strategy and granularity produce enough independent error patterns for the meta-model to discover a stable weighting rule rather than fitting noise from the training subjects.

What would settle it

A new respiratory-sound dataset on which all four base-model families produce highly correlated errors on held-out patients would show whether the induced diversity is sufficient.
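One way to run that check is to correlate the error indicators of each pair of base models on the held-out patients. The sketch below uses the Pearson correlation of 0/1 error vectors (the phi coefficient); the paper's own diversity measure, if any, is not stated in the abstract, so this metric and the function names are assumptions.

```python
import math
from itertools import combinations


def error_correlation(preds_a, preds_b, labels):
    """Pearson correlation between the 0/1 error indicators of two models."""
    ea = [int(p != t) for p, t in zip(preds_a, labels)]
    eb = [int(p != t) for p, t in zip(preds_b, labels)]
    n = len(labels)
    ma, mb = sum(ea) / n, sum(eb) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(ea, eb)) / n
    va, vb = ma * (1 - ma), mb * (1 - mb)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when a model is always right or always wrong
    return cov / math.sqrt(va * vb)


def pairwise_error_correlation(all_preds, labels):
    """Error correlation for every pair of models, keyed by (i, j)."""
    return {
        (i, j): error_correlation(all_preds[i], all_preds[j], labels)
        for i, j in combinations(range(len(all_preds)), 2)
    }
```

Values near 1 for all six pairs of the four base-model families would indicate that the split regimes did not produce the intended diversity.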

Figures

Figures reproduced from arXiv: 2604.24096 by Doyoung Kwon, Heejoon Koo, June-Woo Kim, Kyunghoon Kim, Miika Toikkanen, Yoon Tae Kim.

Figure 1: Overview of the proposed meta-ensemble framework, where diverse …

Original abstract

Training reliable respiratory sound classification models remains challenging due to the limited size and subject diversity of datasets. Ensemble methods can improve robustness, but when base models are trained on identical data, models tend to overfit and produce highly correlated predictions, thereby reducing the effectiveness of ensembling. In this work, we investigate a meta-ensemble learning methodology that enhances prediction diversity by training base models on diverse data splits and combining their outputs through a trained meta-model. Specifically, we train base models on the ICBHI dataset using two data split settings: fixed 80-20% split and five-fold cross-validation split, under two data granularity settings: patient- and sample-level. The resulting diversity in base model predictions enables the meta-model to better generalize. Our approach achieves new state-of-the-art performance on the ICBHI benchmark, reaching a Score of 66.49% and showing improved generalization on two out-of-distribution datasets, indicating its potential applicability to real-world clinical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a meta-ensemble approach for respiratory sound classification on the ICBHI dataset. Base models are trained under four combinations of data splits (fixed 80-20% vs. 5-fold cross-validation) and granularities (patient-level vs. sample-level) to induce prediction diversity; their outputs are then combined by a trained meta-model. The central claims are a new state-of-the-art Score of 66.49% on ICBHI together with improved generalization on two out-of-distribution datasets.

Significance. If the performance gains are shown to arise from the split-induced diversity rather than dataset-specific correlations, the method would provide a low-overhead route to more robust ensembles in small medical audio datasets. The explicit OOD evaluation is a positive feature for clinical relevance. The work is purely empirical and does not introduce new theoretical machinery.

major comments (3)
  1. Abstract: The claim of a new state-of-the-art Score of 66.49% is presented without any numerical values for prior SOTA methods, for the four individual base models, or for a standard ensemble baseline. This omission makes it impossible to determine the size or source of the reported improvement and is load-bearing for the central performance claim.
  2. Abstract and results sections: No ablation is reported that isolates the contribution of each split/granularity combination to the meta-model's OOD performance, nor are statistical significance tests (e.g., paired t-tests or McNemar's test) supplied for the claimed gains. Without these, the assertion that the four variants produce sufficiently diverse predictions for a generalizable fusion rule cannot be evaluated.
  3. Abstract: The meta-model architecture, training procedure, and input feature construction are not described. Because the meta-learner is the component that must discover a transferable combination rule, the absence of these details prevents assessment of whether the reported OOD improvement is reproducible or merely an artifact of ICBHI-specific error correlations.
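For reference, the McNemar test suggested in the second major comment compares two classifiers on the same test samples using only the cases where they disagree. The sketch below is a minimal exact (binomial) version; the function name and return layout are illustrative, and nothing here reflects an analysis actually performed in the paper.

```python
from math import comb


def mcnemar_exact(preds_a, preds_b, labels):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples that only model A gets
    right and c counts samples that only model B gets right.
    """
    b = sum(1 for a, o, t in zip(preds_a, preds_b, labels) if a == t and o != t)
    c = sum(1 for a, o, t in zip(preds_a, preds_b, labels) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under the null hypothesis, each discordant pair favours A or B with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

A small p-value suggests that the difference in error rates between the two models is unlikely to be due to chance on that test set.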
minor comments (1)
  1. The term 'Score' is used without an explicit definition or reference to the ICBHI evaluation protocol; a one-sentence clarification would improve readability.
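For readers unfamiliar with the metric, the ICBHI Score is commonly defined as the mean of sensitivity and specificity, where specificity is the fraction of normal samples classified as normal and sensitivity is the fraction of abnormal samples assigned their exact abnormal class. The sketch below follows that common definition; the abstract itself does not restate it, and the class index used for "normal" is a hypothetical choice.

```python
def icbhi_score(labels, preds, normal_class=0):
    """Return (sensitivity, specificity, score) under the common ICBHI definition.

    Assumes labels contain at least one normal and one abnormal sample.
    normal_class=0 is an illustrative convention, not taken from the paper.
    """
    n_normal = sum(1 for t in labels if t == normal_class)
    n_abnormal = len(labels) - n_normal
    normal_hits = sum(1 for t, p in zip(labels, preds) if t == normal_class and p == t)
    abnormal_hits = sum(1 for t, p in zip(labels, preds) if t != normal_class and p == t)
    specificity = normal_hits / n_normal
    sensitivity = abnormal_hits / n_abnormal
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

Because the two terms are averaged, a model cannot reach a high Score by predicting the majority normal class alone.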

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our results and methodology. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

  1. Referee: Abstract: The claim of a new state-of-the-art Score of 66.49% is presented without any numerical values for prior SOTA methods, for the four individual base models, or for a standard ensemble baseline. This omission makes it impossible to determine the size or source of the reported improvement and is load-bearing for the central performance claim.

    Authors: We agree that explicit numerical comparisons are necessary to substantiate the central claim. In the revised manuscript, we will update the abstract to report the scores of prior state-of-the-art methods, the four individual base models, and a standard ensemble baseline. A detailed comparison table will be added to the results section to quantify the magnitude and source of the improvement. revision: yes

  2. Referee: Abstract and results sections: No ablation is reported that isolates the contribution of each split/granularity combination to the meta-model's OOD performance, nor are statistical significance tests (e.g., paired t-tests or McNemar's test) supplied for the claimed gains. Without these, the assertion that the four variants produce sufficiently diverse predictions for a generalizable fusion rule cannot be evaluated.

    Authors: We acknowledge that ablations and statistical tests are required to rigorously support the role of split-induced diversity. We will add ablation studies isolating the contribution of each data split and granularity combination to OOD performance. Statistical significance tests, including McNemar's test, will be reported for the performance gains to confirm that the four variants yield sufficiently diverse predictions for effective meta-fusion. revision: yes

  3. Referee: Abstract: The meta-model architecture, training procedure, and input feature construction are not described. Because the meta-learner is the component that must discover a transferable combination rule, the absence of these details prevents assessment of whether the reported OOD improvement is reproducible or merely an artifact of ICBHI-specific error correlations.

    Authors: We will expand the methods section with a complete description of the meta-model architecture, training procedure, and input feature construction. Key details will be summarized in the abstract to enable readers to evaluate reproducibility and assess whether the OOD gains arise from a generalizable fusion rule rather than dataset-specific correlations. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical meta-ensemble with no derivations or self-referential reductions


The paper describes an empirical pipeline: base classifiers trained on ICBHI under four split/granularity combinations, followed by a meta-model trained on their prediction vectors. No equations, uniqueness theorems, ansatzes, or derivations are presented that could reduce to fitted quantities by construction. Performance claims rest on held-out ICBHI scores and separate OOD evaluations rather than any self-definition or load-bearing self-citation chain. The reader's assessment of score 2.0 is consistent with minor normal self-citation in an ML methods paper; no load-bearing step collapses to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the claim rests on standard supervised learning assumptions and the public ICBHI dataset.


