PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.
Vector-quantizedautoregressivepredictivecoding.arXiv preprint arXiv:2005.08392
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token count by 55% on TIMIT.