High-Quality, Low-Delay Music Coding in the Opus Codec

Jean-Marc Valin; Gregory Maxwell; Timothy B. Terriberry; Koen Vos

arxiv: 1602.04845 · v1 · pith:B4MNP2T5new · submitted 2016-02-15 · 💻 cs.MM · cs.SD

keywords: coder, opus, codec, real-time, transform, applications, attention, audio

The IETF recently standardized the Opus codec as RFC 6716. Opus targets a wide range of real-time Internet applications by combining a linear prediction coder with a transform coder. We describe the transform coder, with particular attention to the psychoacoustic knowledge built into the format. The result outperforms existing audio codecs that do not operate under real-time constraints.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Codec-Robust Attacks on Audio LLMs
cs.SD 2026-05 unverdicted novelty 7.0

CodecAttack perturbs audio in codec latent space with multi-bitrate EoT to achieve 85.5% average ASR on Opus-compressed Audio LLMs versus under 26% for waveform baselines, with transfer to MP3 and AAC.

AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling
cs.SD 2026-05 unverdicted novelty 7.0

AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.

Codec-Robust Attacks on Audio LLMs
cs.SD 2026-05 unverdicted novelty 6.0

CodecAttack optimizes perturbations in neural audio codec latent space to reach 85.5% average target-substring ASR on compressed Opus audio while waveform baselines stay below 26%.

Benchmarking Audio Deepfake Detection Robustness in Real-world Communication Scenarios
eess.AS 2025-04 unverdicted novelty 5.0

The paper creates the ADD-C benchmark dataset for audio deepfake detection under codec compression and packet loss, shows that baseline detectors degrade, and demonstrates a data augmentation method that boosts robustness.