HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Scaling few-shot spoken word classification with generative meta-continual learning

cs.CL · 2026-05-13 · unverdicted · novelty 5.0 · 2 refs

GeMCL scales few-shot spoken word classification to 1000 classes with 5 shots each, matching frozen-HuBERT baseline performance while adapting 2000 times faster on less than half the data.

Does language matter for spoken word classification? A multilingual generative meta-learning approach

cs.CL · 2026-05-13 · unverdicted · novelty 4.0 · 2 refs

Multilingual generative meta-learning for spoken word classification shows small gains over monolingual models, with unique data volume mattering more than the number of languages.

PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding

eess.AS · 2026-05-19

citing papers explorer

Showing 3 of 3 citing papers.

Scaling few-shot spoken word classification with generative meta-continual learning

cs.CL · 2026-05-13 · unverdicted · none · ref 4 · 2 links

GeMCL scales few-shot spoken word classification to 1000 classes with 5 shots each, matching frozen-HuBERT baseline performance while adapting 2000 times faster on less than half the data.

Does language matter for spoken word classification? A multilingual generative meta-learning approach

cs.CL · 2026-05-13 · unverdicted · none · ref 4 · 2 links

Multilingual generative meta-learning for spoken word classification shows small gains over monolingual models, with unique data volume mattering more than the number of languages.

PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding

eess.AS · 2026-05-19 · unreviewed · ref 21
