Usad: Universal speech and audio representation via distillation.arXiv preprint arXiv:2506.18843

Heng-Jui Chang, Saurabhchand Bhati, James Glass, Alexander H Liu · arXiv 2506.18843

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Stage-adaptive audio diffusion modeling

cs.SD · 2026-05-06 · unverdicted · novelty 6.0

A semantic progress signal from SSL discrepancy slope enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.

Alethia: A Foundational Encoder for Voice Deepfakes

cs.SD · 2026-04-30 · unverdicted · novelty 6.0

Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.

citing papers explorer

Showing 2 of 2 citing papers.

Stage-adaptive audio diffusion modeling cs.SD · 2026-05-06 · unverdicted · none · ref 1
A semantic progress signal from SSL discrepancy slope enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.
Alethia: A Foundational Encoder for Voice Deepfakes cs.SD · 2026-04-30 · unverdicted · none · ref 7
Alethia is a pretrained audio encoder using continuous embedding prediction and generative flow-matching reconstruction that outperforms existing speech foundation models on voice deepfake tasks with better robustness and zero-shot generalization.

Usad: Universal speech and audio representation via distillation.arXiv preprint arXiv:2506.18843

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer