pith. sign in

arxiv: 2606.24661 · v1 · pith:ICYJUKE4new · submitted 2026-06-23 · 📡 eess.AS

Perceptual Evaluation of Higher-Order Ambisonic Codecs on Both Synthetic Mixing and Native Recordings

Pith reviewed 2026-06-25 22:23 UTC · model grok-4.3

classification 📡 eess.AS
keywords higher-order ambisonicsIVAS codecperceptual evaluationaudio compressionspatial audiointer-channel correlationVR/AR applications
0
0 comments X

The pith

IVAS codec for higher-order ambisonics outperforms multi-mono encoding at the same bitrate by exploiting inter-channel correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares the standardized IVAS codec against a basic multi-mono approach for compressing higher-order ambisonic audio across synthetic mixes and native recordings. Listening tests show IVAS delivers better perceptual quality at matched bitrates because it reduces data by using correlations between channels. This advantage appears strongest for signals made from a limited number of plane waves. The evaluation covers various contents and spatialization methods relevant to virtual and augmented reality. The work aims to guide codec choice for efficient storage and transmission of spatial audio.

Core claim

The IVAS codec achieves superior perceptual quality to multi-mono HOA coding at the same bitrate by exploiting inter-channel correlation, with the performance gap largest on signals composed of few plane waves.

What carries the argument

IVAS codec's use of inter-channel correlation to reduce bitrate while preserving perceptual quality in higher-order ambisonics.

If this is right

  • IVAS supports lower bitrates for equivalent quality in correlated HOA signals.
  • Multi-mono encoding wastes bitrate on highly correlated content such as few-plane-wave scenes.
  • IVAS is especially suitable for communication use cases involving limited numbers of sound sources.
  • Perceptual tests across synthetic and native material confirm the correlation benefit holds for multiple spatialization methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-time VR and AR systems could adopt IVAS to lower transmission costs without quality loss on typical scene content.
  • Codec selection for spatial audio may need to account for expected inter-channel correlation rather than treating all HOA signals uniformly.
  • Extending the comparison to higher orders or dynamic scenes would test whether the correlation advantage scales.

Load-bearing premise

The chosen contents, spatialization methods, and listening test conditions represent real-world HOA use in VR and AR applications.

What would settle it

A new listening test on a different set of contents or higher ambisonic orders in which IVAS shows no quality advantage over multi-mono at the same bitrate.

Figures

Figures reproduced from arXiv: 2606.24661 by Adrien Llave, Gr\'egory Pallone, J\'er\^ome Daniel.

Figure 1
Figure 1. Figure 1: Listening room picture. signal from its input for the i th parametrized HOA component. This function allows resynthesizing from wˆ ′ the part of the signal which could not be predicted from the TCs. Finally, xˆ is converted back to time domain using an inverse filter bank. III. EXPERIMENT 1 The aim of this experiment is to test the performance of the IVAS and EVSx16 codecs in terms of global quality on a w… view at source ↗
Figure 2
Figure 2. Figure 2: MUSHRA scores averaged across listeners for each listening condition and item. The items spatialized with an ideal plane wave encoding are set in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: MUSHRA scores averaged across listeners for each listening condition and item. The vertical segments show the mean confidence interval at 95 %. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Difference of the listeners-averaged MUSHRA score between IVAS [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: For each codec, the listener-averaged MUSHRA scores differences [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
read the original abstract

Spatial audio is spreading in applications such as virtual and augmented reality and immersive games. The higher-order ambisonic (HOA) format is particularly useful in this context. Transmitting spatial information requires multiple channels, e.g., 16 channels for 3rd-order ambisonics, resulting in increased memory requirements for storage and higher bitrates for communication. Therefore, efficient compression algorithms are necessary for those contents. The recently standardized IVAS codec allows the coding of HOA content for communication use-cases. Here, we propose to evaluate it in comparison with a basic multi-mono approach across a variety of contents and spatialization methods. Results show that IVAS outperforms the multi-mono approach at the same bitrate. In particular, this codec exploits inter-channel correlation to reduce the bitrate. We point out that it is therefore especially robust for signals with a high interchannel correlation, such as those composed of a limited number of plane waves. Conversely, the multi-mono approach is unable to exploit this correlation and performs poorly on this type of signal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a perceptual listening test comparing the standardized IVAS codec against a basic multi-mono HOA coding approach at matched bitrates. Using both synthetic mixtures and native recordings, it reports that IVAS yields higher perceptual quality by exploiting inter-channel correlation, with the advantage being largest for content composed of a small number of plane waves.

Significance. If the listening-test outcomes prove robust, the work supplies concrete evidence that correlation-aware HOA codecs can deliver measurable perceptual gains over independent-channel coding at the same bitrate. This is directly relevant to bitrate-constrained spatial-audio delivery in VR/AR and immersive communication.

major comments (1)
  1. [Abstract] Abstract: the performance result is stated without details on listener count, statistical tests, content selection criteria, or error analysis, so the data-to-claim link cannot be verified.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment and for recognizing the relevance of our work to bitrate-constrained spatial audio delivery. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the performance result is stated without details on listener count, statistical tests, content selection criteria, or error analysis, so the data-to-claim link cannot be verified.

    Authors: We agree that the abstract would be strengthened by including key methodological details. The full manuscript reports a MUSHRA listening test with 12 expert listeners, statistical analysis via repeated-measures ANOVA with post-hoc pairwise comparisons (Bonferroni-corrected), content selected to span synthetic mixtures (controlled plane-wave counts from 1 to 8) and native HOA recordings, and results presented with 95% confidence intervals. We will revise the abstract to concisely incorporate listener count, mention of statistical testing, and a note on content diversity while preserving length constraints. revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical perceptual comparison

full rationale

The manuscript reports results from listening tests that directly compare perceptual quality of IVAS versus multi-mono HOA coding at matched bitrates across selected contents. No derivation, predictive model, fitted parameters, or mathematical claim is advanced whose output is asserted to follow from the inputs by construction. No equations, ansatzes, or uniqueness theorems appear; the central observation that IVAS exploits inter-channel correlation is presented as an empirical finding from the test data rather than a self-referential reduction. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical perceptual evaluation study; no mathematical model, free parameters, axioms, or invented entities are involved.

pith-pipeline@v0.9.1-grok · 5724 in / 918 out tokens · 28769 ms · 2026-06-25T22:23:49.822569+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

29 extracted references · 1 canonical work pages

  1. [1]

    Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,

    D. R. Begault, “Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,”J. Audio Eng. Soc., vol. 49, no. 10, 2001

  2. [2]

    Minimum BRIR grid resolution for dynamic binaural synthesis,

    A. Lindau, H.-J. Maempel, and S. Weinzierl, “Minimum BRIR grid resolution for dynamic binaural synthesis,”J. Acous. Soc. America, vol. 123, May 2008

  3. [3]

    Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia

    J. Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia. PhD thesis, Univ. Paris 6, July 2001

  4. [4]

    A 3D ambisonic based binaural sound reproduction system,

    M. Noisternig, A. Sontacchi, T. Musil, and R. H¨oldrich, “A 3D ambisonic based binaural sound reproduction system,” in24th Int. Conf.: Multichannel Audio, The New Reality, Audio Eng. Soc., June 2003

  5. [5]

    Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,

    M. Zaunschirm, M. Frank, and F. Zotter, “Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,”Applied Sciences, vol. 10, Feb. 2020

  6. [6]

    Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,

    C. Schorkhuber, M. Zaunschirm, and R. Holdrich, “Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,” inProceedings of the DAGA, vol. 44, 2018

  7. [7]

    Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,

    T. Huisman, A. Ahrens, and E. MacDonald, “Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,”Frontiers in Virtual Reality, vol. 2, Oct. 2021

  8. [8]

    Ambisonics in an Ogg Opus Container,

    J. Skoglund and M. Graczyk, “Ambisonics in an Ogg Opus Container,” Tech. Rep. RFC 8486, Internet Engineering Task Force (IETF), Oct. 2018. https://www.rfc-editor.org/ rfc/rfc8486.txt

  9. [9]

    Streaming VR for immersion: Quality aspects of compressed spatial audio,

    M. Narbutt, S. O’Leary, A. Allen, J. Skoglund, and A. Hines, “Streaming VR for immersion: Quality aspects of compressed spatial audio,” in23rd Int. Conf. Virt. Sys. & Multimedia (VSMM), (Dublin), IEEE, Oct. 2017

  10. [10]

    Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

    T. Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics. PhD thesis, Univ. of York, 2023

  11. [11]

    MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,

    J. Herre, J. Hilpert, A. Kuntz, and J. Plogsties, “MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,”IEEE J Sel. Top. in Sig. Proc., vol. 9, Aug. 2015

  12. [12]

    RTP payload format and SDP parameter definitions (Release 18),” Nov

    “3GPP TS 26.253 - Technical Specification Group Ser- vices and System Aspects; Codec for Immersive V oice and Audio Services; Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions (Release 18),” Nov. 2023. 3GPP TS 26.253

  13. [13]

    Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,

    D. Weckbecker, S. Brown, J. Torres, M. Multrus, A. Tama- rapu, and G. Fuchs, “Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (Hyderabad, India), Apr. 2025

  14. [14]

    Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,

    D. McGrath, S. Bruhn, H. Purnhagen, M. Eckert, J. Torres, S. Brown, and D. Darcy, “Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), May 2019

  15. [15]

    Reproducing applause-type signals with directional audio coding,

    M.-V . Laitinen, F. Kuech, S. Disch, and V . Pulkki, “Reproducing applause-type signals with directional audio coding,”J. Audio Eng. Soc., vol. 59, 2011

  16. [16]

    3GPP TR 26.997 - IV AS codec performance characteri- zation,

    “3GPP TR 26.997 - IV AS codec performance characteri- zation,” tech. rep., July 2024

  17. [17]

    Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,

    E. Hellerud, A. Solvang, and U. P. Svensson, “Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc., (Taipei, Taiwan), Apr. 2009

  18. [18]

    Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,

    C. Hold, L. McCormack, A. Politis, and V . Pulkki, “Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Jan. 2024

  19. [19]

    Clarity Challenge - Task 3,

    “Clarity Challenge - Task 3,” 2024. https:// claritychallenge.org/docs/cec3/task 3/cec3 task3 data

  20. [20]

    ICASSP 2022 Deep Noise Suppression Challenge,

    H. Dubey, V . Gopal, R. Cutler, A. Aazami, S. Matusevych, S. Braun, S. E. Eskimez, M. Thakker, T. Yoshioka, H. Gamper, and R. Aichner, “ICASSP 2022 Deep Noise Suppression Challenge,” Feb. 2022. arXiv:2202.13288 [eess]

  21. [21]

    ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,

    “ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,” 2015

  22. [22]

    EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,

    “EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,” 2023

  23. [23]

    Codec for immersive voice and audio services (IV AS); c code (floating-point),

    “Codec for immersive voice and audio services (IV AS); c code (floating-point),” 2024. https://www.3gpp.org/ftp/ Specs/archive/26 series/26.258/26258-i20.zip

  24. [24]

    Overview of the EVS codec architecture,

    M. Dietz, M. Multrus, V . Eksler, V . Malenovsky, E. Norvell, H. Pobloth, L. Miao, Z. Wang, L. Laaksonen, A. Vasilache, Y . Kamamoto, K. Kikuiri, S. Ragot, J. Faure, H. Ehara, V . Rajendran, V . Atti, H. Sung, E. Oh, H. Yuan, and C. Zhu, “Overview of the EVS codec architecture,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (South Brisbane, QLD, A...

  25. [25]

    Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,

    T. Rudzki, I. Gomez-Lanzaco, J. Stubbs, J. Skoglund, D. T. Murphy, and G. Kearney, “Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,”Applied Sciences, vol. 9, June 2019

  26. [26]

    Ambisonics Binaural Rendering via Masked Magnitude Least Squares,

    O. Berebi, F. Brinkmann, S. Weinzierl, and B. Rafaely, “Ambisonics Binaural Rendering via Masked Magnitude Least Squares,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Apr. 2025

  27. [27]

    IEM Plug-in Suite,

    IEM, “IEM Plug-in Suite,” Nov. 2021. https://plugins. iem.at v1.13

  28. [28]

    All-Round Ambisonic Panning and Decoding,

    F. Zotter and M. Frank, “All-Round Ambisonic Panning and Decoding,”J. Audio Eng. Soc., vol. 60, no. 10, 2012

  29. [29]

    ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,

    “ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,” 2015