
arxiv: 2306.15354 · v3 · pith:4DAMEIXAnew · submitted 2023-06-27 · 💻 cs.CL · cs.SD · eess.AS

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

classification 💻 cs.CL · cs.SD · eess.AS
keywords speech · d-speaker · representation · corpus · different · disentanglement · information · large-scale

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some of whom speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix of diversely entangled speech representations, thereby motivating new methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods for out-of-domain learning and self-supervised learning. https://3dspeaker.github.io/


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Audio2Tool: Speak, Call, Act -- A Dataset for Benchmarking Speech Tool Use

    cs.SD 2026-04 unverdicted novelty 6.0

    Audio2Tool is a new benchmark dataset that shows speech models perform well on simple commands but degrade sharply on compositional tasks and realistic acoustic noise.

  2. SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription

    eess.AS 2026-06 unverdicted novelty 4.0

    SoulX-Transcriber is a unified LLM framework for end-to-end multi-speaker transcription using two-stage training (speaker-aware pre-training then supervised fine-tuning) that reports strong results on AliMeeting, AISH...

  3. Kiwano: A Cutting-Edge Open-Source Toolkit for Speaker Verification

    cs.SD 2026-06 unverdicted novelty 3.0

    Kiwano is an open-source toolkit that supplies standardized PyTorch pipelines, pretrained models, and evaluation protocols for speaker verification.