Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification

Doyoung Kwon; Heejoon Koo; June-Woo Kim; Kyunghoon Kim; Miika Toikkanen; Yoon Tae Kim

arxiv: 2604.24096 · v1 · submitted 2026-04-27 · 💻 cs.LG · cs.AI

June-Woo Kim, Miika Toikkanen, Heejoon Koo, Yoon Tae Kim, Doyoung Kwon, Kyunghoon Kim

Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3

keywords: respiratory sound classification · meta-ensemble learning · data splits · ICBHI benchmark · generalization · ensemble diversity

The pith

Training base models on four distinct data-split and granularity regimes lets a meta-learner reach 66.49 percent on ICBHI respiratory classification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that ordinary ensembles overfit when every base model sees identical data, so the authors cross two split strategies with two granularity levels to produce four base-model variants. These variants generate less correlated predictions on the small ICBHI collection. A meta-model then learns to combine the four outputs. The resulting system sets a new benchmark score of 66.49 percent and performs better on two external test sets. The method matters because medical audio datasets are tiny and lack patient variety, making standard training brittle.

Core claim

By training base models on the ICBHI dataset using a fixed 80-20 percent split and five-fold cross-validation, each at both patient-level and sample-level granularity, then combining their outputs through a trained meta-model, the approach reaches a Score of 66.49 percent on the benchmark and improves generalization on two out-of-distribution datasets.
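The four training regimes named in the core claim can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names (`fixed_split`, `kfold_split`, `make_regimes`), the toy patient and sample identifiers, and the random seeds are all assumptions made for the example. The point it shows is that patient-level splitting keeps every recording of a patient on one side of the split, while sample-level splitting does not.

```python
import random
from collections import defaultdict

def fixed_split(units, test_frac=0.2, seed=0):
    """Shuffle the units once and hold out test_frac of them (one split)."""
    units = list(units)
    random.Random(seed).shuffle(units)
    n_test = max(1, round(len(units) * test_frac))
    return [(units[n_test:], units[:n_test])]

def kfold_split(units, k=5, seed=0):
    """Return k (train, held-out) pairs whose held-out parts partition the units."""
    units = list(units)
    random.Random(seed).shuffle(units)
    folds = [units[i::k] for i in range(k)]
    return [
        ([u for j, f in enumerate(folds) if j != i for u in f], folds[i])
        for i in range(k)
    ]

def make_regimes(samples, seed=0):
    """samples: list of (sample_id, patient_id) pairs.
    Returns a dict mapping regime name -> list of (train_ids, heldout_ids)."""
    by_patient = defaultdict(list)
    for sample_id, patient_id in samples:
        by_patient[patient_id].append(sample_id)
    sample_ids = [s for s, _ in samples]
    patients = sorted(by_patient)

    def expand(splits):
        # Map splits over patients back to the sample ids they own.
        return [
            ([s for p in tr for s in by_patient[p]],
             [s for p in te for s in by_patient[p]])
            for tr, te in splits
        ]

    return {
        "fixed/sample": fixed_split(sample_ids, seed=seed),
        "fixed/patient": expand(fixed_split(patients, seed=seed)),
        "5fold/sample": kfold_split(sample_ids, seed=seed),
        "5fold/patient": expand(kfold_split(patients, seed=seed)),
    }

# Toy data (hypothetical): 10 patients with 4 samples each.
samples = [(f"p{p}_s{s}", f"p{p}") for p in range(10) for s in range(4)]
regimes = make_regimes(samples)
for name, splits in regimes.items():
    print(name, [(len(tr), len(te)) for tr, te in splits])
```

Under sample-level regimes the same patient can appear in both the training and held-out parts, which is one plausible source of the differing error patterns the paper relies on.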

What carries the argument

The meta-model that learns to fuse predictions from the four base models trained under different split and granularity regimes.
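The abstract does not say what the meta-model is, so the following is only a sketch of one classic choice: stacked generalization with a multinomial logistic regression trained on the concatenated class probabilities of the base models. Everything here is an assumption for illustration, including the two-class toy data, the three informative and one uninformative simulated base models, and the learning rate and epoch count.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_meta(X, y, n_classes, lr=0.5, epochs=200):
    """Fit multinomial logistic regression by batch gradient descent.
    X: stacked base-model probabilities per sample; y: integer labels."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == t else 0.0)  # gradient of cross-entropy
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b

def predict_meta(W, b, x):
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]
    return max(range(len(b)), key=scores.__getitem__)

# Synthetic stand-in for base-model outputs (hypothetical, not ICBHI data).
rng = random.Random(0)
X, y = [], []
for _ in range(120):
    t = rng.randrange(2)
    feats = []
    for k in range(4):
        # Models 0-2 are informative; model 3 outputs noise.
        p_true = 0.6 + 0.3 * rng.random() if k < 3 else rng.random()
        feats += [p_true, 1 - p_true] if t == 0 else [1 - p_true, p_true]
    X.append(feats)
    y.append(t)

W, b = train_meta(X, y, n_classes=2)
accuracy = sum(predict_meta(W, b, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy of the toy meta-model: {accuracy:.2f}")
```

In a real stacking setup the meta-model would be fitted on predictions for samples the base models did not train on; the toy data above sidesteps that issue entirely.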

If this is right

Ensemble effectiveness increases because base predictions are less correlated without new data.
The model maintains higher accuracy when test patients differ from those seen in training.
The method requires no extra labeled recordings beyond the original ICBHI collection.
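One way to check the "less correlated" claim on any pair of base models is to correlate their per-sample error indicators. The sketch below is an assumption about how such a check could be done (the paper's own diversity measure is not given in the abstract); `error_correlation` and the toy predictions are hypothetical.

```python
import math

def error_correlation(pred_a, pred_b, labels):
    """Pearson correlation between the 0/1 error indicators of two models."""
    ea = [int(p != t) for p, t in zip(pred_a, labels)]
    eb = [int(p != t) for p, t in zip(pred_b, labels)]
    n = len(labels)
    ma, mb = sum(ea) / n, sum(eb) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(ea, eb)) / n
    va = sum((a - ma) ** 2 for a in ea) / n
    vb = sum((b - mb) ** 2 for b in eb) / n
    if va == 0 or vb == 0:
        return float("nan")  # undefined if a model is always right or always wrong
    return cov / math.sqrt(va * vb)

labels  = [0, 1, 0, 1, 0, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1, 0, 1]  # wrong on samples 0 and 1
model_b = [1, 0, 0, 1, 0, 1, 0, 1]  # same mistakes as model_a
model_c = [0, 1, 1, 0, 0, 1, 0, 1]  # wrong on samples 2 and 3

corr_ab = error_correlation(model_a, model_b, labels)
corr_ac = error_correlation(model_a, model_c, labels)
print(corr_ab, corr_ac)
```

Values near 1 mean two models fail on the same samples, so combining them adds little; values near or below 0 indicate complementary errors.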

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same four-regime split tactic could be tested on other small-scale medical audio or signal tasks.
If the meta-model adds little beyond simple averaging, a non-learned fusion rule might suffice and reduce complexity.
Adding further split variants or different granularities could be checked to see whether gains continue or plateau.
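The non-learned fusion rule mentioned above could be as simple as averaging class probabilities. The following sketch shows that baseline; the function name `average_fusion`, the class ordering, and the probability values are illustrative assumptions, not numbers from the paper.

```python
def average_fusion(prob_vectors):
    """Average per-class probabilities across base models.
    Returns (predicted_class_index, mean_probabilities)."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean

# Hypothetical outputs of four base models over (normal, crackle, wheeze, both).
base_outputs = [
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.3, 0.5, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
]
pred, mean_probs = average_fusion(base_outputs)
print(pred, mean_probs)
```

Comparing this baseline against a trained meta-model on the same held-out predictions would show how much the learned weighting actually contributes.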

Load-bearing premise

The four chosen combinations of split strategy and granularity produce enough independent error patterns for the meta-model to discover a stable weighting rule rather than fitting noise from the training subjects.

What would settle it

A new respiratory-sound dataset on which all four base-model families produce highly correlated errors on held-out patients would show whether the induced diversity is sufficient.

Figures

Figures reproduced from arXiv: 2604.24096 by Doyoung Kwon, Heejoon Koo, June-Woo Kim, Kyunghoon Kim, Miika Toikkanen, Yoon Tae Kim.

**Figure 1.** Overview of the proposed meta-ensemble framework, where diverse …

Original abstract

Training reliable respiratory sound classification models remains challenging due to the limited size and subject diversity of datasets. Ensemble methods can improve robustness, but when base models are trained on identical data, models tend to overfit and produce highly correlated predictions, thereby reducing the effectiveness of ensembling. In this work, we investigate a meta-ensemble learning methodology that enhances prediction diversity by training base models on diverse data splits and combining their outputs through a trained meta-model. Specifically, we train base models on the ICBHI dataset using two data split settings: fixed 80-20% split and five-fold cross-validation split, under two data granularity settings: patient- and sample-level. The resulting diversity in base model predictions enables the meta-model to better generalize. Our approach achieves new state-of-the-art performance on the ICBHI benchmark, reaching a Score of 66.49% and showing improved generalization on two out-of-distribution datasets, indicating its potential applicability to real-world clinical data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The meta-ensemble hits 66.49% on ICBHI by mixing four split variants, but the OOD gains rest on thin evidence that the fusion rule transfers rather than memorizing dataset quirks.

The paper trains base respiratory sound classifiers on the ICBHI set using fixed 80-20 splits and 5-fold CV, each done at both patient and sample level, then feeds those predictions into a meta-model. This produces the reported 66.49% Score and some improvement on two external datasets. The core move is using those split differences to reduce prediction correlation, which is a practical response to the small size and patient structure of medical audio data. That part is sensible and directly targets a known weakness of plain ensembles on the same corpus. The authors also keep the focus narrow, which keeps the work grounded. What is missing is any visible baseline table, ablation of the four settings, statistical tests, or even a sketch of the meta-model architecture. Without those, the numerical claim cannot be checked for whether the diversity actually drives the gain or whether other factors are at work. The stress-test point about correlated errors across all variants is worth taking seriously here, since every base model still comes from the identical ICBHI distribution; any shared recording or demographic bias could let the meta-learner exploit dataset-specific patterns instead of learning a robust combination rule. That directly weakens confidence in the OOD results. This is the kind of incremental empirical tweak that people working on small medical audio benchmarks might want to try, but only after seeing the full methods and results sections. It is not yet strong enough on its own to change practice. I would send it for peer review so referees can examine the missing details and the OOD evaluation design, but I would not cite it until those gaps are closed.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a meta-ensemble approach for respiratory sound classification on the ICBHI dataset. Base models are trained under four combinations of data splits (fixed 80-20% vs. 5-fold cross-validation) and granularities (patient-level vs. sample-level) to induce prediction diversity; their outputs are then combined by a trained meta-model. The central claims are a new state-of-the-art Score of 66.49% on ICBHI together with improved generalization on two out-of-distribution datasets.

Significance. If the performance gains are shown to arise from the split-induced diversity rather than dataset-specific correlations, the method would provide a low-overhead route to more robust ensembles in small medical audio datasets. The explicit OOD evaluation is a positive feature for clinical relevance. The work is purely empirical and does not introduce new theoretical machinery.

major comments (3)

Abstract: The claim of a new state-of-the-art Score of 66.49% is presented without any numerical values for prior SOTA methods, for the four individual base models, or for a standard ensemble baseline. This omission makes it impossible to determine the size or source of the reported improvement and is load-bearing for the central performance claim.
Abstract and results sections: No ablation is reported that isolates the contribution of each split/granularity combination to the meta-model's OOD performance, nor are statistical significance tests (e.g., paired t-tests or McNemar) supplied for the claimed gains. Without these, the assertion that the four variants produce sufficiently diverse predictions for a generalizable fusion rule cannot be evaluated.
Abstract: The meta-model architecture, training procedure, and input feature construction are not described. Because the meta-learner is the component that must discover a transferable combination rule, the absence of these details prevents assessment of whether the reported OOD improvement is reproducible or merely an artifact of ICBHI-specific error correlations.
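For reference, the McNemar test requested in the second comment compares two classifiers on the same test samples using only the discordant pairs. The sketch below implements the exact (binomial) two-sided variant in plain Python; the function name `mcnemar_exact` and the toy predictions are assumptions for illustration and say nothing about the paper's actual results.

```python
from math import comb

def mcnemar_exact(pred_a, pred_b, labels):
    """Two-sided exact McNemar test on the discordant pairs of two classifiers.
    Returns (b, c, p_value) with b = only A correct, c = only B correct."""
    b = sum(pa == t and pb != t for pa, pb, t in zip(pred_a, pred_b, labels))
    c = sum(pa != t and pb == t for pa, pb, t in zip(pred_a, pred_b, labels))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree in correctness
    k = min(b, c)
    # Under H0 each discordant pair favours A or B with probability 1/2.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p)

labels = [1] * 10
pred_a = [1] * 9 + [0]       # A is wrong only on the last sample
pred_b = [0] * 8 + [1] * 2   # B is wrong on the first eight samples
b, c, p_value = mcnemar_exact(pred_a, pred_b, labels)
print(b, c, p_value)  # 8 1 0.0390625
```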

minor comments (1)

The term 'Score' is used without an explicit definition or reference to the ICBHI evaluation protocol; a one-sentence clarification would improve readability.
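As background for this comment: in the ICBHI literature the Score is commonly taken to be the arithmetic mean of sensitivity and specificity, where specificity is the fraction of normal cycles classified as normal and sensitivity is the fraction of abnormal cycles assigned their exact abnormal class. The abstract does not restate this, so the sketch below rests on that common convention; the function name `icbhi_score`, the label encoding (0 = normal), and the toy labels are assumptions.

```python
def icbhi_score(labels, preds, normal=0):
    """Return (sensitivity, specificity, score) under the common ICBHI convention.
    Assumes at least one normal and one abnormal sample are present."""
    n_normal = sum(t == normal for t in labels)
    n_abnormal = len(labels) - n_normal
    specificity = sum(t == normal and p == t for t, p in zip(labels, preds)) / n_normal
    sensitivity = sum(t != normal and p == t for t, p in zip(labels, preds)) / n_abnormal
    return sensitivity, specificity, (sensitivity + specificity) / 2

# Toy example: 0 = normal, 1 = crackle, 2 = wheeze, 3 = both.
labels = [0, 0, 0, 0, 1, 1, 2, 3]
preds  = [0, 0, 0, 1, 1, 2, 2, 0]
se, sp, score = icbhi_score(labels, preds)
print(se, sp, score)  # 0.5 0.75 0.625
```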

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our results and methodology. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Referee: Abstract: The claim of a new state-of-the-art Score of 66.49% is presented without any numerical values for prior SOTA methods, for the four individual base models, or for a standard ensemble baseline. This omission makes it impossible to determine the size or source of the reported improvement and is load-bearing for the central performance claim.

Authors: We agree that explicit numerical comparisons are necessary to substantiate the central claim. In the revised manuscript, we will update the abstract to report the scores of prior state-of-the-art methods, the four individual base models, and a standard ensemble baseline. A detailed comparison table will be added to the results section to quantify the magnitude and source of the improvement. revision: yes
Referee: Abstract and results sections: No ablation is reported that isolates the contribution of each split/granularity combination to the meta-model's OOD performance, nor are statistical significance tests (e.g., paired t-tests or McNemar) supplied for the claimed gains. Without these, the assertion that the four variants produce sufficiently diverse predictions for a generalizable fusion rule cannot be evaluated.

Authors: We acknowledge that ablations and statistical tests are required to rigorously support the role of split-induced diversity. We will add ablation studies isolating the contribution of each data split and granularity combination to OOD performance. Statistical significance tests, including McNemar's test, will be reported for the performance gains to confirm that the four variants yield sufficiently diverse predictions for effective meta-fusion. revision: yes
Referee: Abstract: The meta-model architecture, training procedure, and input feature construction are not described. Because the meta-learner is the component that must discover a transferable combination rule, the absence of these details prevents assessment of whether the reported OOD improvement is reproducible or merely an artifact of ICBHI-specific error correlations.

Authors: We will expand the methods section with a complete description of the meta-model architecture, training procedure, and input feature construction. Key details will be summarized in the abstract to enable readers to evaluate reproducibility and assess whether the OOD gains arise from a generalizable fusion rule rather than dataset-specific correlations. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical meta-ensemble with no derivations or self-referential reductions

full rationale

The paper describes an empirical pipeline: base classifiers trained on ICBHI under four split/granularity combinations, followed by a meta-model trained on their prediction vectors. No equations, uniqueness theorems, ansatzes, or derivations are presented that could reduce to fitted quantities by construction. Performance claims rest on held-out ICBHI scores and separate OOD evaluations rather than any self-definition or load-bearing self-citation chain. The reader's assessment of score 2.0 is consistent with minor normal self-citation in an ML methods paper; no load-bearing step collapses to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the claim rests on standard supervised learning assumptions and the public ICBHI dataset.


[1] [1]

Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning,

Y . Kim, Y . Hyon, S. S. Jung, S. Lee, G. Yoo, C. Chung, and T. Ha, “Respiratory sound classification for crackles, wheezes, and rhonchi in the clinical field using deep learning,”Scientific reports, vol. 11, no. 1, pp. 1–11, 2021

work page 2021

[2] [2]

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification,

S. Bae, J.-W. Kim, W.-Y . Cho, H. Baek, S. Son, B. Lee, C. Ha, K. Tae, S. Kim, and S.-Y . Yun, “Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification,” inProc. INTERSPEECH 2023, 2023, pp. 5436–5440

work page 2023

[3] [3]

Exploring machine learning for audio- based respiratory condition screening: A concise review of databases, methods, and open issues,

T. Xia, J. Han, and C. Mascolo, “Exploring machine learning for audio- based respiratory condition screening: A concise review of databases, methods, and open issues,”Experimental Biology and Medicine, vol. 247, no. 22, pp. 2053–2061, 2022

work page 2053

[4] [4]

Stethoscope- guided supervised contrastive learning for cross-domain adaptation on respiratory sound classification,

J.-W. Kim, S. Bae, W.-Y . Cho, B. Lee, and H.-Y . Jung, “Stethoscope- guided supervised contrastive learning for cross-domain adaptation on respiratory sound classification,” inICASSP 2024-2024 IEEE In- ternational Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 1431–1435

work page 2024

[5] [5]

Adaptive metadata-guided supervised contrastive learning for domain adaptation on respiratory sound classi- fication,

J.-W. Kim, M. Toikkanen, A. Jalali, M. Kim, H.-J. Han, H. Kim, W. Shin, H.-Y . Jung, and K. Kim, “Adaptive metadata-guided supervised contrastive learning for domain adaptation on respiratory sound classi- fication,”IEEE Journal of Biomedical and Health Informatics, 2025

work page 2025

[6] [6]

Em- powering multimodal respiratory sound classification with counterfactual adversarial debiasing for out-of-distribution robustness,

H. Koo, M. Toikkanen, Y . T. Kim, S. Y . Kim, and J.-W. Kim, “Em- powering multimodal respiratory sound classification with counterfactual adversarial debiasing for out-of-distribution robustness,” inICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2026, pp. 14 967–14 971

work page 2026

[7] [7]

A respiratory sound database for the development of auto- mated classification,

B. Rocha, D. Filos, L. Mendes, I. V ogiatzis, E. Perantoni, E. Kaimakamis, P. Natsiavas, A. Oliveira, C. J ´acome, A. Marques et al., “A respiratory sound database for the development of auto- mated classification,” inPrecision Medicine Powered by pHealth and Connected Health: ICBHI 2017, Thessaloniki, Greece, 18-21 November

work page 2017

[8] [8]

Springer, 2018, pp. 33–37

work page 2018

[9] [9]

Lung sound classification using co- tuning and stochastic normalization,

T. Nguyen and F. Pernkopf, “Lung sound classification using co- tuning and stochastic normalization,”IEEE Transactions on Biomedical Engineering, vol. 69, no. 9, pp. 2872–2882, 2022

work page 2022

[10] [10]

Afen: Respira- tory disease classification using ensemble learning,

R. Nadkarni, E. Nikolakakis, and R. Marinescu, “Afen: Respira- tory disease classification using ensemble learning,”arXiv preprint arXiv:2405.05467, 2024

work page arXiv 2024

[11] [11]

Improving Respiratory Sound Classifi- cation with Architecture-Agnostic Knowledge Distillation from Ensem- bles,

M. Toikkanen and J.-W. Kim, “Improving Respiratory Sound Classifi- cation with Architecture-Agnostic Knowledge Distillation from Ensem- bles,” inInterspeech 2025, 2025, pp. 1023–1027

work page 2025

[12] [12]

Diversity and general- ization in neural network ensembles,

L. A. Ortega, R. Caba ˜nas, and A. Masegosa, “Diversity and general- ization in neural network ensembles,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 11 720–11 743

work page 2022

[13] [13]

A unified theory of diversity in ensemble learning,

D. Wood, T. Mu, A. M. Webb, H. W. Reeve, M. Lujan, and G. Brown, “A unified theory of diversity in ensemble learning,”Journal of machine learning research, vol. 24, no. 359, pp. 1–49, 2023

work page 2023

[14] [14]

The relative performance of ensemble methods with deep convolutional neural networks for image classification,

C. Ju, A. Bibaut, and M. van der Laan, “The relative performance of ensemble methods with deep convolutional neural networks for image classification,”Journal of applied statistics, vol. 45, no. 15, pp. 2800– 2818, 2018

work page 2018

[15] [15]

Diversity in search strategies for ensemble feature selection,

A. Tsymbal, M. Pechenizkiy, and P. Cunningham, “Diversity in search strategies for ensemble feature selection,”Information fusion, vol. 6, no. 1, pp. 83–98, 2005

work page 2005

[16] [16]

Bts: Bridging text and sound modalities for metadata-aided respiratory sound classification,

J.-W. Kim, M. Toikkanen, Y . Choi, S.-E. Moon, and H.-Y . Jung, “Bts: Bridging text and sound modalities for metadata-aided respiratory sound classification,” inInterspeech 2024, 2024, pp. 1690–1694

work page 2024

[17] [17]

Sprsound: Open-source sjtu paediatric respiratory sound database,

Q. Zhang, J. Zhang, J. Yuan, H. Huang, Y . Zhang, B. Zhang, G. Lv, S. Lin, N. Wang, X. Liuet al., “Sprsound: Open-source sjtu paediatric respiratory sound database,”IEEE Transactions on Biomedical Circuits and Systems, vol. 16, no. 5, pp. 867–881, 2022

work page 2022

[18] [18]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”Advances in neural information processing systems, vol. 30, 2017

work page 2017

[19] [19]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778

work page 2016

[20] [20]

Efficientnet: Rethinking model scaling for con- volutional neural networks,

M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for con- volutional neural networks,” inInternational conference on machine learning. PMLR, 2019, pp. 6105–6114

work page 2019

[21] [21]

Panns: Large-scale pretrained audio neural networks for audio pattern recognition,

Q. Kong, Y . Cao, T. Iqbal, Y . Wang, W. Wang, and M. D. Plumbley, “Panns: Large-scale pretrained audio neural networks for audio pattern recognition,”IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2880–2894, 2020

work page 2020

[22]

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255

[23]

J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, “Audio set: An ontology and human-labeled dataset for audio events,” in Proc. IEEE ICASSP 2017, New Orleans, LA, 2017

[24]

Y. Gong, Y.-A. Chung, and J. Glass, “AST: Audio Spectrogram Transformer,” in Proc. Interspeech 2021, 2021, pp. 571–575

[25]

D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “Specaugment: A simple data augmentation method for automatic speech recognition,” Interspeech 2019, Sep 2019

[26]

J.-W. Kim, M. Toikkanen, S. Bae, M. Kim, and H.-Y. Jung, “Repaugment: Input-agnostic representation-level augmentation for respiratory sound classification,” in 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2024, pp. 1–6

[27]

J.-W. Kim, C. Yoon, M. Toikkanen, S. Bae, and H.-Y. Jung, “Adversarial fine-tuning using generated respiratory sound to address class imbalance,” arXiv preprint arXiv:2311.06480, 2023

[28]

Y. Wu, K. Chen, T. Zhang, Y. Hui, T. Berg-Kirkpatrick, and S. Dubnov, “Large-scale contrastive language-audio pretraining with feature fusion and keyword-to-caption augmentation,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5

[29]

D. Niizumi, D. Takeuchi, Y. Ohishi, N. Harada, and K. Kashino, “Masked modeling duo: Towards a universal audio pre-training framework,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024

[30]

Y. Zhang, T. Xia, J. Han, Y. Wu, G. Rizos, Y. Liu, M. Mosuily, J. Chauhan, and C. Mascolo, “Towards open respiratory acoustic foundation models: Pretraining and benchmarking,” in The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. [Online]. Available: https://openreview.net/forum?id=vXnGXRbOfb

[31]

L. Breiman, “Bagging predictors,” Machine learning, vol. 24, pp. 123–140, 1996

[32]

Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of computer and system sciences, vol. 55, no. 1, pp. 119–139, 1997

[33]

D. H. Wolpert, “Stacked generalization,” Neural networks, vol. 5, no. 2, pp. 241–259, 1992

[34]

Z. Li, K. Ren, Y. Yang, X. Jiang, Y. Yang, and D. Li, “Towards inference efficient deep ensemble learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 7, 2023, pp. 8711–8719

[35]

T. G. Dietterich, “Ensemble methods in machine learning,” in International workshop on multiple classifier systems. Springer, 2000, pp. 1–15

[36]

J.-W. Kim, S. Lee, M. Toikkanen, D. Hwang, and K. Kim, “Tri-mtl: A triple multitask learning approach for respiratory disease diagnosis,” in 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2025, pp. 1–6

[37]

Y. Chang, Z. Ren, T. T. Nguyen, W. Nejdl, and B. W. Schuller, “Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis,” in Proc. Interspeech 2022, 2022, pp. 4003–4007

[38]

Z. Yang, S. Liu, M. Song, E. Parada-Cabaleiro, and B. W. Schuller, “Adventitious respiratory classification using attentive residual neural networks,” in Interspeech, 2020

[39]

Y. Ma, X. Xu, and Y. Li, “Lungrn+ nl: An improved adventitious lung sound classification using non-local block resnet neural network with mixup data augmentation,” in Interspeech, 2020, pp. 2902–2906