arxiv: 2605.24863 · v1 · pith:XCPUXKDFnew · submitted 2026-05-24 · 📡 eess.AS · cs.SD

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems

classification 📡 eess.AS cs.SD
keywords speech · foundation · acoustic · audio · continual · learning · model · non-stationary

Speech and audio systems operate in inherently non-stationary environments, yet continual learning (CL) research in this domain, especially in the foundation model era, remains fragmented and fails to account for the coupled, geometry-sensitive nature of acoustic representations. Modern speech foundation models operate over highly entangled, continuous representations that jointly encode linguistic, speaker, and paralinguistic factors within a shared latent space. CL is therefore fundamentally about preserving and evolving shared representation structure rather than retaining isolated task knowledge. In this work, we revisit CL for speech from a representation-centered perspective and introduce a new taxonomy that organizes CL according to how the underlying representation geometry evolves under non-stationary acoustic conditions. We further identify key mismatches between current CL assumptions and speech foundation model behavior, and finally outline a set of open challenges and future research directions.
