3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some of whom speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diversely entangled speech representations, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/
Forward citations
Cited by 3 Pith papers
- Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use
  Audio2Tool is a new benchmark dataset that shows speech models perform well on simple commands but degrade sharply on compositional tasks and realistic acoustic noise.
- SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
  SoulX-Transcriber is a unified LLM framework for end-to-end multi-speaker transcription using two-stage training (speaker-aware pre-training then supervised fine-tuning) that reports strong results on AliMeeting, AISH...
- Kiwano: A Cutting-Edge Open-Source Toolkit for Speaker Verification
  Kiwano is an open-source toolkit that supplies standardized PyTorch pipelines, pretrained models, and evaluation protocols for speaker verification.