arXiv preprint arXiv:2305.14032
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SD (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- RespiraMFM: A Multimodal Foundation Model with Contrastive Audio-Language Alignment for Respiratory Disease Identification
  RespiraMFM aligns respiratory audio with clinical text and reports AUROC gains over baselines of 9.15% under supervised fine-tuning and 20.98% in zero-shot settings, across seven real-world datasets covering five diseases.
- C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification
  C2GA augments respiratory sound data by generating high-fidelity, label-consistent Mel-spectrograms, using a conditional VQ-VAE with decoupled local tokens and global class prototypes together with a Transformer prior.