HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Citing papers
- Scaling few-shot spoken word classification with generative meta-continual learning
  GeMCL scales few-shot spoken word classification to 1000 classes with 5 shots each, matching frozen-HuBERT baseline performance while adapting 2000 times faster on less than half the data.
- Does language matter for spoken word classification? A multilingual generative meta-learning approach
  Multilingual generative meta-learning for spoken word classification shows small gains over monolingual models, with unique data volume mattering more than the number of languages.
- PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding