From Signals to Patterns: Non-Invasive Tuberculosis Detection from Cough Audio using Bandit Weighted Hyperbolic Prototypes
Pith reviewed 2026-06-27 02:00 UTC · model grok-4.3
The pith
A fusion framework called COBALT combines spectral and foundation audio features via hyperbolic prototypes and bandit weighting to improve tuberculosis detection from cough sounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COBALT fuses heterogeneous audio representations using codebook-aligned hyperbolic prototypes and bandit-style reliability weighting. On the CODA TB DREAM Challenge benchmark it consistently outperforms individual representations and a concatenation baseline, and fusing MFCC with PaSST gives the best overall performance, which the authors present as a new state of the art.
What carries the argument
COBALT framework based on codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations.
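A minimal sketch of the hyperbolic-prototype idea follows. Because the paper's exact formulation is not reproduced on this page, it assumes that an embedding is projected onto a Poincaré ball of curvature -c with the exponential map at the origin and classified by a softmax over negative geodesic distances to prototype vectors that stand in for codebook entries. All names (`expmap0`, `poincare_dist`, `prototype_probs`), the curvature `c`, and the clamping constant `EPS` are illustrative assumptions, not COBALT's implementation.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: hypothetical helpers, not COBALT's actual code.
EPS = 1e-5  # keeps points strictly inside the ball so distances stay finite


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def expmap0(v: Sequence[float], c: float = 1.0) -> List[float]:
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    sc = math.sqrt(c)
    # Radius tanh(sqrt(c)*|v|)/sqrt(c), clamped just below the boundary 1/sqrt(c).
    radius = min(math.tanh(sc * n), 1.0 - EPS) / sc
    return [radius * x / n for x in v]


def poincare_dist(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Geodesic distance between two points inside the Poincare ball."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * norm(x) ** 2) * (1.0 - c * norm(y) ** 2)
    arg = max(1.0, 1.0 + 2.0 * c * sq / denom)  # guard against rounding below 1
    return math.acosh(arg) / math.sqrt(c)


def prototype_probs(embedding: Sequence[float],
                    prototypes: Sequence[Sequence[float]],
                    c: float = 1.0) -> List[float]:
    """Softmax over negative hyperbolic distances to each (hypothetical) prototype."""
    z = expmap0(embedding, c)
    logits = [-poincare_dist(z, expmap0(p, c), c) for p in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: the embedding points toward the first prototype, so it gets more mass.
print(prototype_probs([0.8, 0.2], [[1.0, 0.0], [-1.0, 0.0]]))
```

Distances in the ball grow rapidly near the boundary, which is the property usually cited when hyperbolic space is said to suit hierarchical structure; this sketch does not by itself establish that this is why COBALT improves over Euclidean fusion.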
If this is right
- The fusion reveals complementary strengths between spectral features that preserve short-time acoustic detail and foundation embeddings that capture higher-level temporal patterns.
- COBALT outperforms both individual representations and a concatenation baseline across the benchmark.
- The strongest results occur when MFCC is fused with PaSST, establishing a new state-of-the-art on the CODA TB DREAM Challenge.
- The framework provides an effective way to integrate heterogeneous audio representations through hyperbolic geometry and adaptive weighting.
Where Pith is reading between the lines
- The same fusion approach could be tested on audio recordings for other respiratory conditions such as pneumonia or asthma.
- Hyperbolic space may better preserve hierarchical relationships among cough event patterns than standard Euclidean fusion methods.
- Bandit-style weighting might prove useful for handling variable recording quality in field deployments with different microphones.
- Applying the method to cough data from multiple geographic regions would test whether the benchmark gains translate to broader populations.
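The bandit-weighting point can be made concrete with a small simulation. This sketch assumes, purely for illustration, that each representation stream (for example MFCC or PaSST) is an arm of the EXP3 algorithm, that a reward in [0, 1] (here, whether the stream's prediction was correct) is observed only for the stream sampled at each step, and that the EXP3 sampling probabilities double as fusion weights. The class name, reward definition, `gamma`, and reliability rates are hypothetical; COBALT's actual reliability-weighting rule may differ.

```python
import math
import random
from typing import List, Sequence


class Exp3StreamWeights:
    """EXP3 over K representation streams; probabilities double as fusion weights.

    Hypothetical illustration, not COBALT's published update rule.
    """

    def __init__(self, n_streams: int, gamma: float = 0.1, seed: int = 0) -> None:
        if n_streams < 1 or not 0.0 < gamma <= 1.0:
            raise ValueError("need n_streams >= 1 and 0 < gamma <= 1")
        self.k = n_streams
        self.gamma = gamma
        self.log_w = [0.0] * n_streams  # log-weights avoid overflow on long runs
        self.rng = random.Random(seed)

    def probs(self) -> List[float]:
        # Normalised weights mixed with uniform exploration, as in EXP3.
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1.0 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self) -> int:
        # Sample one stream according to the current probabilities.
        return self.rng.choices(range(self.k), weights=self.probs())[0]

    def update(self, arm: int, reward: float) -> None:
        # Bandit feedback: only the sampled stream's reward (in [0, 1]) is seen,
        # so it is importance-weighted by 1 / p[arm] to keep the estimate unbiased.
        p = self.probs()[arm]
        self.log_w[arm] += self.gamma * (reward / p) / self.k

    def fuse(self, stream_scores: Sequence[float]) -> float:
        # Reliability-weighted average of per-stream scores (e.g. TB probabilities).
        return sum(pi * s for pi, s in zip(self.probs(), stream_scores))


# Usage: stream 1 is (hypothetically) right 90% of the time, stream 0 only 50%.
bandit = Exp3StreamWeights(n_streams=2, gamma=0.1, seed=0)
env = random.Random(1)
reliability = [0.5, 0.9]
for _ in range(3000):
    arm = bandit.select()
    bandit.update(arm, 1.0 if env.random() < reliability[arm] else 0.0)
print(bandit.probs())  # most of the weight should now sit on stream 1
```

The uniform exploration term keeps every stream's weight at or above gamma/K, so a stream that is temporarily unreliable (say, under a poor microphone) is never dropped entirely and continues to be re-evaluated.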
Load-bearing premise
The CODA TB DREAM Challenge benchmark is sufficiently representative of real-world cough variability, and the observed performance gain will generalize beyond the specific test split used.
What would settle it
The central claim would be falsified by an experiment on a new cough audio dataset, collected under different conditions, in which COBALT fusion shows no improvement over the concatenation baseline or the best single representation.
Original abstract
In this study, we focus on cough-based tuberculosis screening (CBTS) and hypothesize that fusing speech/audio foundation representations with spectral descriptors will yield stronger screening performance. We expect this fusion to reveal complementary strengths: spectral features preserve fine-grained short-time acoustic detail in cough signals, while foundation embeddings capture higher-level temporal and event-level patterns learned from large-scale pretraining. To this end, we propose COBALT, a novel fusion framework based on codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations effectively. Using the CODA TB DREAM Challenge benchmark, COBALT consistently outperforms individual representations and a concatenation baseline, achieving the best overall performance when fusing MFCC with PaSST, thereby establishing a new state of the art on the benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes COBALT, a fusion framework for cough-based tuberculosis screening that integrates MFCC spectral features with PaSST foundation embeddings via codebook-aligned hyperbolic prototypes and bandit-style reliability weighting. On the CODA TB DREAM Challenge benchmark, the MFCC+PaSST fusion is reported to outperform both individual representations and a simple concatenation baseline, thereby claiming a new state-of-the-art on the benchmark.
Significance. If the performance gains are reproducible and the method is shown to be robust, the work would offer a concrete advance in non-invasive TB detection by demonstrating how low-level acoustic detail and high-level pretrained patterns can be fused effectively in hyperbolic space. This could support scalable screening applications, provided the benchmark results translate beyond the specific test split.
major comments (2)
- [Abstract] Abstract: The claim of new state-of-the-art performance on the CODA TB DREAM Challenge benchmark rests entirely on results from the benchmark's designated test split. No cross-validation across alternative splits, external cohorts, or metrics addressing distribution shift (demographics, recording devices, ambient noise, disease stage) are referenced, which directly limits the strength of the broader claim that the approach advances real-world cough-based TB screening.
- [Abstract] Abstract / Results: The reported outperformance of the bandit-weighted hyperbolic fusion over concatenation and single representations is presented without accompanying numerical metrics, confidence intervals, statistical tests, or ablation tables that would allow verification that the gains are attributable to the proposed components rather than post-hoc selection or dataset-specific effects.
minor comments (1)
- [Abstract] The abstract would be strengthened by inclusion of the actual performance numbers (e.g., AUC or sensitivity/specificity) that support the SOTA claim.
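One way to supply the confidence intervals requested above is a paired bootstrap over test recordings. The sketch below assumes binary labels (1 = TB-positive), one score per recording from each of two systems (say, COBALT and the concatenation baseline) on the same test set, and AUC as the metric; the function names and defaults are hypothetical and are not taken from the paper or the challenge.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(labels: Sequence[int],
                              scores_a: Sequence[float],
                              scores_b: Sequence[float],
                              n_boot: int = 1000,
                              alpha: float = 0.05,
                              seed: int = 0) -> Tuple[float, float, float]:
    """Return AUC(a) - AUC(b) and a percentile bootstrap CI on the same samples."""
    # Computing the point estimate first also rejects single-class label sets.
    point = auc(labels, scores_a) - auc(labels, scores_b)
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        # Resample recordings with replacement; both systems share the indices.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # labels are 0/1, so skip resamples missing a class
            diffs.append(auc(y, [scores_a[i] for i in idx])
                         - auc(y, [scores_b[i] for i in idx]))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Resampling recordings jointly for both systems preserves their pairing, so the interval describes the difference between systems rather than two independent uncertainties. If participants contribute several coughs, resampling at the participant level would be the more conservative choice.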
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity and strengthen the presentation of results.
Point-by-point responses
-
Referee: [Abstract] Abstract: The claim of new state-of-the-art performance on the CODA TB DREAM Challenge benchmark rests entirely on results from the benchmark's designated test split. No cross-validation across alternative splits, external cohorts, or metrics addressing distribution shift (demographics, recording devices, ambient noise, disease stage) are referenced, which directly limits the strength of the broader claim that the approach advances real-world cough-based TB screening.
Authors: We acknowledge that the primary results are reported on the challenge's fixed test split, as is standard for DREAM Challenge benchmarks to ensure fair and reproducible comparisons across methods. The manuscript does not include cross-validation on alternative splits or external cohorts because the challenge data release is structured around this single held-out test set. We agree this constrains stronger claims about robustness to distribution shift. In the revised manuscript, we will add an explicit limitations paragraph discussing potential demographic, device, and environmental shifts, and we will report any available internal validation metrics from the training portion if they can be computed without violating challenge rules. revision: yes
-
Referee: [Abstract] Abstract / Results: The reported outperformance of the bandit-weighted hyperbolic fusion over concatenation and single representations is presented without accompanying numerical metrics, confidence intervals, statistical tests, or ablation tables that would allow verification that the gains are attributable to the proposed components rather than post-hoc selection or dataset-specific effects.
Authors: The full manuscript contains numerical results, ablation studies, and comparisons in the Results section, but the abstract summarizes them qualitatively. We agree that including concrete metrics in the abstract would improve verifiability. In the revision we will update the abstract to report key performance numbers (e.g., AUC or equivalent metric with confidence intervals where available) and will ensure the main text explicitly presents statistical comparisons and component ablations so readers can assess the contribution of the hyperbolic prototypes and bandit weighting. revision: yes
Circularity Check
No circularity: empirical SOTA claim on external benchmark with no visible derivation chain
The provided abstract and context contain no equations, parameter-fitting steps, or self-citations that reduce the reported performance to a tautology or to its inputs by construction. The central claim is a direct empirical comparison (COBALT versus individual representations and the concatenation baseline) on the held-out CODA TB DREAM Challenge test split. This is a standard, falsifiable ML benchmark result with no visible self-definitional, fitted-prediction, or imported-uniqueness structure, so the claim stands or falls on the external benchmark rather than on any internal derivation.
Reference graph
Works this paper leans on
-
[1]
Introduction
Tuberculosis (TB), a major infectious disease, often presents with persistent cough and other respiratory symptoms, signals that are not only clinically salient for screening but also reflect underlying pulmonary pathology and disease burden [1]. Accordingly, CBTS triage has attracted growing interest as a rapid, low-cost screening route; ho...
Pith/arXiv arXiv 2026
-
[2]
Representations
In this section, we outline the pretrained representation models and handcrafted spectral features used in our experiments. Pretrained representation models: PaSST² [12] is a spectrogram Transformer that adapts ViT-style patch tokenization to audio and employs patch masking for regularization; we use the PaSST-S variant (87M). Whisper 3 ...
-
[3]
Modeling
3.1. Proposed framework: COBALT
We propose COBALT (Codebook-Aligned Bandit-weighted prototypE fusion Learning) for fusing dual pretrained-model (PTM) representations for cough-based tuberculosis detection. The overall architecture of COBALT is shown in Figure 1. Dual-stream representation encoding: Let x denote an input cough sample and B the batch size...
-
[4]
Experiment
4.1. Benchmark Dataset
Our study is grounded in experiments on the CODA TB [16]⁸ benchmark dataset released through the CODA TB DREAM Challenge. It consists of solicited cough audio from adult participants (≥18 years) presenting with respiratory symptoms, collected at outpatient health centers spanning seven countries (India, Madagascar, the ...
-
[5]
At the same time, spectral features remain competitive and provide complementary information, motivating heterogeneous fusion.
Conclusion
In this study, we present a systematic benchmark for cough-based tuberculosis screening (CBTS) and show that large pretrained acoustic encoders are strong representation learners for this task, with PaSST emerging as the most effective single encoder in our setting. At the same time, spectral features remain competitive and provide complem...
-
[6]
Acknowledgements
The authors gratefully acknowledge the support of the United States–Ireland–Northern Ireland R&D Partnership Programme (USI-207), and access to the Tier 2 High-Performance Computing resources from the Northern Ireland High Performance Computing (NI-HPC) service funded by EPSRC (EP/T022175).
-
[7]
Generative AI Use Disclosure
AI assistants were used only to refine grammar, enhance clarity, and improve the manuscript’s overall readability and presentation. They did not contribute to the development of scientific ideas, the design or execution of analyses, the production of results, or the interpretation of findings. The authors assume full respo...
-
[8]
World Health Organization, “Global tuberculosis report 2025,” Geneva, Switzerland, 2025.
-
[9]
D. G. Storla, S. Yimer, and G. A. Bjune, “A systematic review of delay in the diagnosis and treatment of tuberculosis,” BMC Public Health, vol. 8, no. 1, p. 15, 2008.
-
[10]
R. R. Nathavitharana, C. Yoon, P. Macpherson, D. W. Dowdy, A. Cattamanchi, A. Somoskovi, T. Broger, T. H. M. Ottenhoff, N. Arinaminpathy, K. Lönnroth, K. Reither, F. Cobelens, C. Gilpin, C. M. Denkinger, and S. G. Schumacher, “Guidance for studies evaluating the accuracy of tuberculosis triage tests,” The Journal of Infectious Diseases, vol. 220, no. ..., 2019.
-
[11]
D. Niizumi, D. Takeuchi, M. Yasuda, B. T. Nguyen, Y. Ohishi, and N. Harada, “Towards Pre-training an Effective Respiratory Audio Foundation Model,” in Interspeech 2025, 2025, pp. 998–1002.
-
[12]
D. Jaganath, S. K. Sieberts, M. Raberahona, S. Huddart, L. Omberg, R. Rakotoarivelo, I. Lyimo, O. Lweno, D. J. Christopher, N. V. Nhung et al., “Accelerating cough-based algorithms for pulmonary tuberculosis screening: Results from the CODA TB DREAM Challenge,” in Open Forum Infectious Diseases, vol. 12, no. 10. Oxford University Press US, 2025, p. ofaf572.
-
[13]
D. Grant, I. McLane, V. Rennoll, and J. West, “Considerations and challenges for real-world deployment of an acoustic-based COVID-19 screening system,” Sensors, vol. 22, no. 23, p. 9530, 2022.
-
[14]
M. Pahar, M. Klopper, B. Reeve, R. Warren, G. Theron, and T. Niesler, “Automatic cough classification for tuberculosis screening in a real-world environment,” Physiological Measurement, vol. 42, no. 10, p. 105014, 2021.
-
[15]
G. T. Frost, G. Theron, and T. Niesler, “TB or not TB? Acoustic cough analysis for tuberculosis classification,” in Interspeech 2022, 2022, pp. 2448–2452.
-
[16]
S. J. S. Rajasekar, A. R. Balaraman, D. V. Balaraman, S. Mohamed Ali, K. Narasimhan, N. Krishnasamy, and V. Perumal, “Detection of tuberculosis using cough audio analysis: a deep learning approach with capsule networks,” Discover Artificial Intelligence, vol. 4, no. 1, p. 77, 2024.
-
[17]
Y. Akhter, R. Ranjan, B. Dutta, M. Vatsa, and R. Singh, “Non-Invasive TB Detection using Acoustic and Semantic Features from Cough Sounds,” in Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI) 2025, vol. LNCS 15960. Springer Nature Switzerland, September 2025, pp. 460–470.
-
[18]
M. Sharma, V. Nduba, L. N. Njagi, W. Murithi, Z. Mwongera, T. R. Hawn, S. N. Patel, and D. J. Horne, “TBscreen: A passive cough classifier for tuberculosis screening with a controlled dataset,” Science Advances, vol. 10, no. 1, p. eadi0282, 2024.
-
[19]
K. Koutini, J. Schlüter, H. Eghbal-zadeh, and G. Widmer, “Efficient Training of Audio Transformers with Patchout,” in Interspeech 2022, 2022, pp. 2753–2757.
-
[20]
A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” in International Conference on Machine Learning. PMLR, 2023, pp. 28492–28518.
-
[21]
D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-vectors: Robust DNN embeddings for speaker recognition,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333, 2018.
-
[22]
S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao et al., “WavLM: Large-scale self-supervised pre-training for full stack speech processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022.
-
[23]
S. Huddart, V. Yadav, S. K. Sieberts, L. Omberg, M. Raberahona, R. Rakotoarivelo, I. N. Lyimo, O. Lweno, D. J. Christopher, N. V. Nhung et al., “A dataset of solicited cough sound for tuberculosis triage testing,” Scientific Data, vol. 11, no. 1, p. 1149, 2024.
-
[24]
N. Ma, B. Mirheidari, G. J. Brown, N. Sanjase, M. M. Maimbolwa, S. Chifwamba, S. Muzazu, M. Muyoyeta, and M. Kagujje, “Deep learning for tuberculosis screening in a high-burden setting using cough analysis and speech foundation models,” arXiv preprint arXiv:2509.09746, 2025.