
arxiv: 2606.17337 · v2 · pith:FHEFJFJAnew · submitted 2026-06-15 · 📡 eess.AS

From Signals to Patterns: Non-Invasive Tuberculosis Detection from Cough Audio using Bandit Weighted Hyperbolic Prototypes

Pith reviewed 2026-06-27 02:00 UTC · model grok-4.3

classification 📡 eess.AS
keywords cough audio · tuberculosis detection · feature fusion · hyperbolic prototypes · bandit weighting · MFCC · PaSST · audio embeddings

The pith

A fusion framework called COBALT combines spectral and foundation audio features via hyperbolic prototypes and bandit weighting to improve tuberculosis detection from cough sounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes COBALT as a way to fuse different audio representations for cough-based tuberculosis screening. It claims that spectral features such as MFCC preserve fine short-time acoustic detail while foundation models such as PaSST capture broader temporal patterns, and that codebook-aligned hyperbolic prototypes plus bandit-style reliability weighting make the combination stronger than either representation alone. On the CODA TB DREAM Challenge benchmark the fused version beats single features and simple concatenation, with the strongest result from MFCC paired with PaSST. A sympathetic reader would care because the work targets a non-invasive screening approach that could reach more people if the gains hold. The method focuses on handling the mismatch between low-level acoustic descriptors and high-level pretrained embeddings.

Core claim

COBALT fuses heterogeneous audio representations using codebook-aligned hyperbolic prototypes and bandit-style reliability weighting, consistently outperforming individual representations and a concatenation baseline on the CODA TB DREAM Challenge benchmark and achieving the best overall performance when fusing MFCC with PaSST to establish a new state-of-the-art.

What carries the argument

COBALT framework based on codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations.
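
As a rough illustration of what a hyperbolic prototype computation can look like, the sketch below scores an embedding against a few prototypes in the Poincaré ball and turns the distances into a soft evidence vector. The material reviewed here does not state which hyperbolic model, curvature, or assignment rule COBALT uses, so the unit-ball distance, the softmax with a `temperature` parameter, and the names `poincare_distance` and `prototype_evidence` are assumptions made for illustration only.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Closed form: arcosh(1 + 2*|u-v|^2 / ((1-|u|^2) * (1-|v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def prototype_evidence(z, prototypes, temperature=1.0):
    """Softmax over negative distances: closer prototypes receive more mass.

    Assumption: this soft-assignment rule is illustrative, not the paper's.
    """
    logits = [-poincare_distance(z, p) / temperature for p in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-D prototypes and a hypothetical embedding inside the ball.
prototypes = [[0.0, 0.0], [0.6, 0.0], [0.0, -0.7]]
evidence = prototype_evidence([0.5, 0.1], prototypes)
```

Distances grow quickly near the boundary of the ball, which is the property usually cited when hyperbolic space is used to separate fine-grained from coarse-grained structure.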

If this is right

  • The fusion reveals complementary strengths between spectral features that preserve short-time acoustic detail and foundation embeddings that capture higher-level temporal patterns.
  • COBALT outperforms both individual representations and a concatenation baseline across the benchmark.
  • The strongest results occur when MFCC is fused with PaSST, establishing a new state-of-the-art on the CODA TB DREAM Challenge.
  • The framework provides an effective way to integrate heterogeneous audio representations through hyperbolic geometry and adaptive weighting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fusion approach could be tested on audio recordings for other respiratory conditions such as pneumonia or asthma.
  • Hyperbolic space may better preserve hierarchical relationships among cough event patterns than standard Euclidean fusion methods.
  • Bandit-style weighting might prove useful for handling variable recording quality in field deployments with different microphones.
  • Applying the method to cough data from multiple geographic regions would test whether the benchmark gains translate to broader populations.
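
To make the idea of bandit-style reliability weighting concrete, here is a minimal sketch that keeps one weight per representation stream and shifts mass toward the stream that earns higher reward. The update used is a plain exponentiated-weights step (the full-information relative of EXP3), chosen as a stand-in; the paper's actual rule is not reproduced in this review. The function name `update_reliability`, the step size `eta=0.5`, and the per-batch reward values are hypothetical.

```python
import math

def update_reliability(weights, rewards, eta=0.5):
    """One exponentiated-weights step over the representation streams.

    Assumption: rewards are per-stream scores in [0, 1], for example the
    batch accuracy obtained when relying on that stream alone.
    """
    scaled = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize to a distribution

# Two streams (say, a spectral one and a foundation-embedding one).
weights = [0.5, 0.5]
# Toy rewards for three batches; the second stream is slightly better.
for rewards in [(0.6, 0.8), (0.5, 0.9), (0.7, 0.7)]:
    weights = update_reliability(weights, rewards)
```

A scheme like this would down-weight a stream whose rewards drop, which is the behaviour one would want if a particular microphone degrades one representation more than the other.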

Load-bearing premise

The CODA TB DREAM Challenge benchmark is sufficiently representative of real-world cough variability and the observed performance gain will generalize beyond the specific test split used.

What would settle it

An experiment on a new cough audio dataset collected under different conditions where the COBALT fusion shows no improvement over the concatenation baseline or the best single representation would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.17337 by Girish, Mohd Mujtaba Akhtar, Muskaan Singh, Ning Ma, Sanjam Wadhwa.

Figure 1: Proposed COBALT framework for cough-based tuberculosis screening. … with step size $\eta > 0$ and batch usage $u_j$ computed from the current prototype assignments. PrototypE fusion and prediction: Finally, we form the fused representation by concatenating the two reweighted evidence vectors and their agreement, $f = [\tilde{p}^{(1)}, \tilde{p}^{(2)}, \tilde{p}^{(1)} \odot \tilde{p}^{(2)}]$, and feed $f$ to a lightweight MLP classifier to obtain the final…
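
The fused representation described in the Figure 1 text, two reweighted evidence vectors concatenated with their element-wise product, can be sketched in a few lines. The function name `fuse` and the example vectors are hypothetical, and the downstream MLP classifier is omitted because its architecture is not given here.

```python
def fuse(p1, p2):
    """Build f = [p1, p2, p1 * p2], where * is the element-wise product."""
    if len(p1) != len(p2):
        raise ValueError("evidence vectors must share the same length")
    agreement = [a * b for a, b in zip(p1, p2)]  # large where both streams agree
    return list(p1) + list(p2) + agreement

# Toy evidence vectors over three prototypes, one per stream.
p1 = [0.7, 0.2, 0.1]
p2 = [0.6, 0.3, 0.1]
f = fuse(p1, p2)
```

The product term is large only where both streams place mass on the same prototype, so the classifier sees agreement explicitly rather than having to learn it from the concatenation.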
Figure 2: t-SNE plots for (a) X-vector and (b) MFCC+PaSST (COBALT).
Figure 3: Confusion matrices for (a) Whisper+WavLM (Concat) and (b) MFCC+PaSST (COBALT). 5. Conclusion: In this study, we present a systematic benchmark for cough-based tuberculosis screening (CBTS) and show that large pretrained acoustic encoders are strong representation learners for this task, with PaSST emerging as the most effective single encoder in our setting. At the same time, spectral features remain competitiv…
Original abstract

In this study, we focus on cough-based tuberculosis screening (CBTS) and hypothesize that fusing speech/audio foundation representations with spectral descriptors will yield stronger screening performance. We expect this fusion to reveal complementary strengths: spectral features preserve fine-grained short-time acoustic detail in cough signals, while foundation embeddings capture higher-level temporal and event-level patterns learned from large-scale pretraining. To this end, we propose COBALT, a novel fusion framework based on codebook-aligned hyperbolic prototypes and bandit-style reliability weighting to integrate heterogeneous representations effectively. Using the CODA TB DREAM Challenge benchmark, COBALT consistently outperforms individual representations and a concatenation baseline, achieving the best overall performance when fusing MFCC with PaSST thereby establishing a new state-of-the-art on the benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes COBALT, a fusion framework for cough-based tuberculosis screening that integrates MFCC spectral features with PaSST foundation embeddings via codebook-aligned hyperbolic prototypes and bandit-style reliability weighting. On the CODA TB DREAM Challenge benchmark, the MFCC+PaSST fusion is reported to outperform both individual representations and a simple concatenation baseline, thereby claiming a new state-of-the-art on the benchmark.

Significance. If the performance gains are reproducible and the method is shown to be robust, the work would offer a concrete advance in non-invasive TB detection by demonstrating how low-level acoustic detail and high-level pretrained patterns can be fused effectively in hyperbolic space. This could support scalable screening applications, provided the benchmark results translate beyond the specific test split.

major comments (2)
  1. [Abstract] Abstract: The claim of new state-of-the-art performance on the CODA TB DREAM Challenge benchmark rests entirely on results from the benchmark's designated test split. No cross-validation across alternative splits, external cohorts, or metrics addressing distribution shift (demographics, recording devices, ambient noise, disease stage) are referenced, which directly limits the strength of the broader claim that the approach advances real-world cough-based TB screening.
  2. [Abstract] Abstract / Results: The reported outperformance of the bandit-weighted hyperbolic fusion over concatenation and single representations is presented without accompanying numerical metrics, confidence intervals, statistical tests, or ablation tables that would allow verification that the gains are attributable to the proposed components rather than post-hoc selection or dataset-specific effects.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by inclusion of the actual performance numbers (e.g., AUC or sensitivity/specificity) that support the SOTA claim.
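
One way to supply the confidence intervals requested in major comment 2 is a paired bootstrap over test items. The sketch below computes AUC from pairwise comparisons and a percentile interval for the AUC difference between two systems. It is an illustrative procedure, not one taken from the paper; the function names, `n_boot=1000`, the fixed seed, and all scores are assumptions, and the scores are toy values rather than reported results.

```python
import random

def auc(labels, scores):
    """AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for AUC(a) - AUC(b).

    Resampling is paired: each draw reuses the same items for both systems.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUC is undefined without both classes
            continue
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]

# Toy labels and scores (1 = TB positive); not data from the paper.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fused = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]
baseline = [0.9, 0.6, 0.4, 0.3, 0.5, 0.45, 0.2, 0.1]
low, high = bootstrap_auc_diff(labels, fused, baseline)
```

If the resulting interval excludes zero, the gain is unlikely to be a resampling artefact on that test set; it says nothing about distribution shift, which is the separate concern raised in major comment 1.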

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity and strengthen the presentation of results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The claim of new state-of-the-art performance on the CODA TB DREAM Challenge benchmark rests entirely on results from the benchmark's designated test split. No cross-validation across alternative splits, external cohorts, or metrics addressing distribution shift (demographics, recording devices, ambient noise, disease stage) are referenced, which directly limits the strength of the broader claim that the approach advances real-world cough-based TB screening.

    Authors: We acknowledge that the primary results are reported on the challenge's fixed test split, as is standard for DREAM Challenge benchmarks to ensure fair and reproducible comparisons across methods. The manuscript does not include cross-validation on alternative splits or external cohorts because the challenge data release is structured around this single held-out test set. We agree this constrains stronger claims about robustness to distribution shift. In the revised manuscript, we will add an explicit limitations paragraph discussing potential demographic, device, and environmental shifts, and we will report any available internal validation metrics from the training portion if they can be computed without violating challenge rules. revision: yes

  2. Referee: [Abstract] Abstract / Results: The reported outperformance of the bandit-weighted hyperbolic fusion over concatenation and single representations is presented without accompanying numerical metrics, confidence intervals, statistical tests, or ablation tables that would allow verification that the gains are attributable to the proposed components rather than post-hoc selection or dataset-specific effects.

    Authors: The full manuscript contains numerical results, ablation studies, and comparisons in the Results section, but the abstract summarizes them qualitatively. We agree that including concrete metrics in the abstract would improve verifiability. In the revision we will update the abstract to report key performance numbers (e.g., AUC or equivalent metric with confidence intervals where available) and will ensure the main text explicitly presents statistical comparisons and component ablations so readers can assess the contribution of the hyperbolic prototypes and bandit weighting. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical SOTA claim on external benchmark with no visible derivation chain

full rationale

The provided abstract and context contain no equations, parameter-fitting steps, or self-citations that reduce the reported performance to a tautology or to an input by construction. The central claim is a direct empirical comparison (COBALT vs. individual representations and a concatenation baseline) on the held-out CODA TB DREAM Challenge test split. This is a standard falsifiable ML benchmark result with no self-definitional, fitted-prediction, or uniqueness-imported structure visible. The claim is therefore checked against an external benchmark rather than against the paper's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations or methods section; free parameters, axioms, and invented entities cannot be enumerated.

pith-pipeline@v0.9.1-grok · 5674 in / 1066 out tokens · 24212 ms · 2026-06-27T02:00:15.239740+00:00 · methodology

Reference graph

Works this paper leans on

24 extracted references · 1 linked inside Pith

  1. [1]

    Introduction Tuberculosis (TB), a major infectious disease, often presents with persistent cough and other respiratory symptoms—signals that are not only clinically salient for screening but also reflect underlying pulmonary pathology and disease burden [1]. Accordingly, CBTS triage has attracted growing interest as a rapid, low-cost screening route; ho...

  2. [2]

    Representations In this section, we outline the pretrained representation models and handcrafted spectral features used in our experiments. Pretrained representation models: PaSST [12] is a spectrogram Transformer that adapts ViT-style patch tokenization to audio and employs patch masking for regularization; we use the PaSST-S variant (87M). Whisper 3 ...

  3. [3]

    Modeling 3.1. Proposed framework: COBALT. We propose COBALT (Codebook-Aligned Bandit-weighted prototypE fusion Learning) for fusing dual pretrained-model (PTM) representations for cough-based tuberculosis detection. The overall architecture of COBALT is shown in Figure 1. Dual-stream representation encoding: Let x denote an input cough sample and B the batch size...

  4. [4]

    Benchmark Dataset Our study is grounded in experiments on the CODA TB [16] benchmark dataset released through the CODA TB DREAM Challenge

    Experiment 4.1. Benchmark Dataset Our study is grounded in experiments on the CODA TB [16] benchmark dataset released through the CODA TB DREAM Challenge. It consists of solicited cough audio from adult participants (≥18 years) presenting with respiratory symptoms, collected at outpatient health centers spanning seven countries (India, Madagascar, the ...

  5. [5]

    At the same time, spectral features remain competitive and provide complementary information, motivating heterogeneous fusion

    Conclusion In this study, we present a systematic benchmark for cough-based tuberculosis screening (CBTS) and show that large pretrained acoustic encoders are strong representation learners for this task, with PaSST emerging as the most effective single encoder in our setting. At the same time, spectral features remain competitive and provide complem...

  6. [6]

    Acknowledgements The authors gratefully acknowledge the support of the United States–Ireland–Northern Ireland R&D Partnership Programme (USI-207), and access to the Tier 2 High-Performance Computing resources from the Northern Ireland High Performance Computing (NI-HPC) service funded by EPSRC (EP/T022175)

  7. [7]

    They did not contribute to the development of scientific ideas, the design or execution of analyses, the production of results, or the interpretation of findings

    Generative AI Use Disclosure AI assistants were used only to refine grammar, enhance clarity, and improve the manuscript’s overall readability and presentation. They did not contribute to the development of scientific ideas, the design or execution of analyses, the production of results, or the interpretation of findings. The authors assume full respo...

  8. [8]

    Global tuberculosis report 2025,

    World Health Organization, “Global tuberculosis report 2025,” Geneva, Switzerland, 2025

  9. [9]

    A systematic review of delay in the diagnosis and treatment of tuberculosis,

    D. G. Storla, S. Yimer, and G. A. Bjune, “A systematic review of delay in the diagnosis and treatment of tuberculosis,” BMC Public Health, vol. 8, no. 1, p. 15, 2008

  10. [10]

    Guidance for studies evaluating the accuracy of tuberculosis triage tests,

    R. R. Nathavitharana, C. Yoon, P. Macpherson, D. W. Dowdy, A. Cattamanchi, A. Somoskovi, T. Broger, T. H. M. Ottenhoff, N. Arinaminpathy, K. Lönnroth, K. Reither, F. Cobelens, C. Gilpin, C. M. Denkinger, and S. G. Schumacher, “Guidance for studies evaluating the accuracy of tuberculosis triage tests,” The Journal of Infectious Diseases, vol. 220, no. ...

  11. [11]

    Towards Pre-training an Effective Respiratory Audio Foundation Model,

    D. Niizumi, D. Takeuchi, M. Yasuda, B. T. Nguyen, Y. Ohishi, and N. Harada, “Towards Pre-training an Effective Respiratory Audio Foundation Model,” in Interspeech 2025, 2025, pp. 998–1002

  12. [12]

    Accelerating cough-based algorithms for pulmonary tuberculosis screening: Results from the coda tb dream challenge,

    D. Jaganath, S. K. Sieberts, M. Raberahona, S. Huddart, L. Omberg, R. Rakotoarivelo, I. Lyimo, O. Lweno, D. J. Christopher, N. V. Nhung et al., “Accelerating cough-based algorithms for pulmonary tuberculosis screening: Results from the coda tb dream challenge,” in Open Forum Infectious Diseases, vol. 12, no. 10. Oxford University Press US, 2025, p. ofaf572

  13. [13]

    Considerations and challenges for real-world deployment of an acoustic-based covid-19 screening system,

    D. Grant, I. McLane, V. Rennoll, and J. West, “Considerations and challenges for real-world deployment of an acoustic-based covid-19 screening system,” Sensors, vol. 22, no. 23, p. 9530, 2022

  14. [14]

    Automatic cough classification for tuberculosis screening in a real-world environment,

    M. Pahar, M. Klopper, B. Reeve, R. Warren, G. Theron, and T. Niesler, “Automatic cough classification for tuberculosis screening in a real-world environment,” Physiological Measurement, vol. 42, no. 10, p. 105014, 2021

  15. [15]

    TB or not TB? Acoustic cough analysis for tuberculosis classification,

    G. T. Frost, G. Theron, and T. Niesler, “TB or not TB? Acoustic cough analysis for tuberculosis classification,” in Interspeech 2022, 2022, pp. 2448–2452

  16. [16]

    Detection of tuberculosis using cough audio analysis: a deep learning approach with capsule networks,

    S. J. S. Rajasekar, A. R. Balaraman, D. V. Balaraman, S. Mohamed Ali, K. Narasimhan, N. Krishnasamy, and V. Perumal, “Detection of tuberculosis using cough audio analysis: a deep learning approach with capsule networks,” Discover Artificial Intelligence, vol. 4, no. 1, p. 77, 2024

  17. [17]

    Non-Invasive TB Detection using Acoustic and Semantic Features from Cough Sounds,

    Y. Akhter, R. Ranjan, B. Dutta, M. Vatsa, and R. Singh, “Non-Invasive TB Detection using Acoustic and Semantic Features from Cough Sounds,” in Proceedings of Medical Image Computing and Computer Assisted Intervention (MICCAI) 2025, vol. LNCS 15960. Springer Nature Switzerland, September 2025, pp. 460–470

  18. [18]

    Tbscreen: A passive cough classifier for tuberculosis screening with a controlled dataset,

    M. Sharma, V. Nduba, L. N. Njagi, W. Murithi, Z. Mwongera, T. R. Hawn, S. N. Patel, and D. J. Horne, “Tbscreen: A passive cough classifier for tuberculosis screening with a controlled dataset,” Science Advances, vol. 10, no. 1, p. eadi0282, 2024

  19. [19]

    Efficient Training of Audio Transformers with Patchout,

    K. Koutini, J. Schlüter, H. Eghbal-zadeh, and G. Widmer, “Efficient Training of Audio Transformers with Patchout,” in Interspeech 2022, 2022, pp. 2753–2757

  20. [20]

    Robust speech recognition via large-scale weak supervision,

    A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” in International Conference on Machine Learning. PMLR, 2023, pp. 28492–28518

  21. [21]

    X-vectors: Robust dnn embeddings for speaker recognition,

    D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, “X-vectors: Robust dnn embeddings for speaker recognition,” 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5329–5333, 2018

  22. [22]

    Wavlm: Large-scale self-supervised pre-training for full stack speech processing,

    S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao et al., “Wavlm: Large-scale self-supervised pre-training for full stack speech processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

  23. [23]

    A dataset of solicited cough sound for tuberculosis triage testing,

    S. Huddart, V. Yadav, S. K. Sieberts, L. Omberg, M. Raberahona, R. Rakotoarivelo, I. N. Lyimo, O. Lweno, D. J. Christopher, N. V. Nhung et al., “A dataset of solicited cough sound for tuberculosis triage testing,” Scientific Data, vol. 11, no. 1, p. 1149, 2024

  24. [24]

    Deep learning for tuberculosis screening in a high-burden setting using cough analysis and speech foundation models,

    N. Ma, B. Mirheidari, G. J. Brown, N. Sanjase, M. M. Maimbolwa, S. Chifwamba, S. Muzazu, M. Muyoyeta, and M. Kagujje, “Deep learning for tuberculosis screening in a high-burden setting using cough analysis and speech foundation models,” arXiv preprint arXiv:2509.09746, 2025