arxiv: 2606.02212 · v1 · pith:OOSQYYMDnew · submitted 2026-06-01 · 💻 cs.SD

C2GA: A Class-Controllable Generative Augmentation Framework for Respiratory Sound Classification

keywords: augmentation, c2ga, class, generative, respiratory, sound, classification, framework

Background: Respiratory sound classification plays a critical role in the clinical identification of pulmonary pathologies. However, its performance is often hindered by the limited size, severe noise, and class imbalance of real-world auscultation datasets. Although conventional audio augmentation techniques are easy to implement, they may inadvertently distort subtle pathological characteristics. Meanwhile, existing Variational Autoencoder (VAE)- or Generative Adversarial Network (GAN)-based generative approaches often suffer from limited sample fidelity and insufficient controllability over class semantics, particularly under conditions of scarce supervision.

Methods: To overcome these limitations, we propose C2GA, a class-controllable generative augmentation framework. C2GA first constructs a semantically rich discrete latent space using a conditional Vector-Quantized Variational Autoencoder (VQ-VAE), in which local acoustic tokens are explicitly decoupled from global class prototypes. Subsequently, a Transformer-based autoregressive prior is trained to generate label-consistent token sequences. These generated tokens are then fused with the corresponding class prototypes and decoded into high-fidelity Mel-spectrograms for data augmentation.

Conclusion: These results indicate that C2GA provides an effective and semantically reliable augmentation strategy for respiratory sound analysis. By enabling controllable and high-quality data generation, the proposed framework offers a promising solution for improving the robustness and generalization of respiratory sound classification in realistic clinical scenarios.
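
The token-and-prototype idea described under Methods can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how tokens and class prototypes are fused, so additive fusion is an assumption here, and the codebook values, the class names `crackle` and `wheeze`, and the helper names `quantize` and `fuse` are all hypothetical. The sketch maps continuous latents to discrete tokens by nearest-codebook lookup, then combines the token embeddings with a class prototype before a decoder would consume them. At generation time the tokens would instead be sampled from the autoregressive prior.

```python
import math


def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    tokens = []
    for z in latents:
        dists = [math.dist(z, entry) for entry in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def fuse(tokens, codebook, prototype):
    """Look up token embeddings and add the class prototype.

    Additive fusion is an assumption; the source does not specify the operator.
    """
    return [
        [c + p for c, p in zip(codebook[t], prototype)]
        for t in tokens
    ]


# Toy 2-D codebook and class prototypes (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
class_prototypes = {"crackle": [0.5, -0.5], "wheeze": [-0.5, 0.5]}

# Stand-in for encoder outputs, one vector per local time-frequency patch.
latents = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]

tokens = quantize(latents, codebook)
fused = fuse(tokens, codebook, class_prototypes["crackle"])
print(tokens)
print(fused)
```

Keeping the prototype out of the codebook is what lets the same local tokens be reused across classes, with the class identity injected only at the fusion step.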
