Taal, Richard C
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2025 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Two-Dimensional Quantization for Geometry-Aware Audio Coding
  Q2D2 uses 2D geometric grid projections to quantize feature pairs in neural audio codecs, yielding implicit codebooks that improve efficiency and utilization over RVQ, VQ, and FSQ while maintaining reconstruction quality.
- Balalaika: Data-Centric, Prosody-Aware Annotation Pipeline for Russian Speech
  Balalaika is a data-centric annotation pipeline for Russian speech that combines semantic VAD, ASR ensembling, and prosody enrichment to build a 5.1k-hour corpus showing gains in denoising and TTS.