HuBERT: Self-supervised speech representation learning by masked prediction of hidden units

· 2021

8 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

APEX: Audio Prototype EXplanations for Classification Tasks

cs.SD · 2026-05-11 · unverdicted · novelty 6.0

APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.

SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

SDTalk proposes a generalizable one-shot 3DGS talking head method that uses structured facial priors for complete reconstruction and dual-branch motion fields for dynamics, outperforming prior identity-specific approaches.

Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models

eess.AS · 2026-04-10 · unverdicted · novelty 6.0

A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.

Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment

cs.SD · 2025-06-17 · unverdicted · novelty 6.0

Acoustic scattering signals fed into fine-tuned self-supervised deep learning models classify hair type and moisture at nearly 90% accuracy as a non-invasive alternative to visual methods.

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

cs.CL · 2026-04-29 · unverdicted · novelty 5.0

Fine-tuned speech representation models with hierarchical classification outperform multimodal LLMs on pediatric speech sound disorder tasks.

Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Mixed batching with only 10% target-domain speech achieves word error rates matching or exceeding conventional full-dataset ASR fine-tuning in LLM-based models.

Learning to Attend to Depression-Related Patterns: An Adaptive Cross-Modal Gating Network for Depression Detection

cs.SD · 2026-04-11 · unverdicted · novelty 4.0

An adaptive cross-modal gating network improves depression detection from speech by selectively weighting sparse relevant segments across acoustic and textual modalities.

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

eess.AS · 2026-04-06 · unverdicted · novelty 4.0 · 2 refs

Cross-lifespan evaluation shows that adult-trained speech foundation models degrade on child and older-adult data; joint multi-age training and targeted adaptation improve robustness, especially with the Whisper encoder.

citing papers explorer

APEX: Audio Prototype EXplanations for Classification Tasks

cs.SD · 2026-05-11 · unverdicted · none · ref 47

APEX generates four types of prototype-based explanations for pre-trained audio classifiers that preserve output invariance and target acoustic properties better than gradient methods applied to spectrograms.

SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis

cs.CV · 2026-05-11 · unverdicted · none · ref 25

SDTalk proposes a generalizable one-shot 3DGS talking head method that uses structured facial priors for complete reconstruction and dual-branch motion fields for dynamics, outperforming prior identity-specific approaches.

Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models

eess.AS · 2026-04-10 · unverdicted · none · ref 29

A Conformer-conditioned decoder-only language model generates discrete tokens via a neural audio codec to separate four music stems, reaching near state-of-the-art perceptual quality and top NISQA on vocals in MUSDB18-HQ tests.

Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment

cs.SD · 2025-06-17 · unverdicted · none · ref 31

Acoustic scattering signals fed into fine-tuned self-supervised deep learning models classify hair type and moisture at nearly 90% accuracy as a non-invasive alternative to visual methods.

Multimodal LLMs are not all you need for Pediatric Speech Language Pathology

cs.CL · 2026-04-29 · unverdicted · none · ref 27

Fine-tuned speech representation models with hierarchical classification outperform multimodal LLMs on pediatric speech sound disorder tasks.

Closing the Speech-Text Gap with Limited Audio for Effective Domain Adaptation in LLM-Based ASR

cs.CL · 2026-04-07 · unverdicted · none · ref 16

Mixed batching with only 10% target-domain speech achieves word error rates matching or exceeding conventional full-dataset ASR fine-tuning in LLM-based models.

Learning to Attend to Depression-Related Patterns: An Adaptive Cross-Modal Gating Network for Depression Detection

cs.SD · 2026-04-11 · unverdicted · none · ref 13

An adaptive cross-modal gating network improves depression detection from speech by selectively weighting sparse relevant segments across acoustic and textual modalities.

Exploring Speech Foundation Models for Speaker Diarization Across Lifespan

eess.AS · 2026-04-06 · unverdicted · none · ref 28 · 2 links

Cross-lifespan evaluation shows that adult-trained speech foundation models degrade on child and older-adult data; joint multi-age training and targeted adaptation improve robustness, especially with the Whisper encoder.
