Perceptual Evaluation of Higher-Order Ambisonic Codecs on Both Synthetic Mixing and Native Recordings

Adrien Llave; Gr\'egory Pallone; J\'er\^ome Daniel

arxiv: 2606.24661 · v1 · pith:ICYJUKE4new · submitted 2026-06-23 · 📡 eess.AS

Perceptual Evaluation of Higher-Order Ambisonic Codecs on Both Synthetic Mixing and Native Recordings

Adrien Llave , Gr\'egory Pallone , J\'er\^ome Daniel This is my paper

Pith reviewed 2026-06-25 22:23 UTC · model grok-4.3

classification 📡 eess.AS

keywords higher-order ambisonicsIVAS codecperceptual evaluationaudio compressionspatial audiointer-channel correlationVR/AR applications

0 comments

The pith

IVAS codec for higher-order ambisonics outperforms multi-mono encoding at the same bitrate by exploiting inter-channel correlations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares the standardized IVAS codec against a basic multi-mono approach for compressing higher-order ambisonic audio across synthetic mixes and native recordings. Listening tests show IVAS delivers better perceptual quality at matched bitrates because it reduces data by using correlations between channels. This advantage appears strongest for signals made from a limited number of plane waves. The evaluation covers various contents and spatialization methods relevant to virtual and augmented reality. The work aims to guide codec choice for efficient storage and transmission of spatial audio.

Core claim

The IVAS codec achieves superior perceptual quality to multi-mono HOA coding at the same bitrate by exploiting inter-channel correlation, with the performance gap largest on signals composed of few plane waves.

What carries the argument

IVAS codec's use of inter-channel correlation to reduce bitrate while preserving perceptual quality in higher-order ambisonics.

If this is right

IVAS supports lower bitrates for equivalent quality in correlated HOA signals.
Multi-mono encoding wastes bitrate on highly correlated content such as few-plane-wave scenes.
IVAS is especially suitable for communication use cases involving limited numbers of sound sources.
Perceptual tests across synthetic and native material confirm the correlation benefit holds for multiple spatialization methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Real-time VR and AR systems could adopt IVAS to lower transmission costs without quality loss on typical scene content.
Codec selection for spatial audio may need to account for expected inter-channel correlation rather than treating all HOA signals uniformly.
Extending the comparison to higher orders or dynamic scenes would test whether the correlation advantage scales.

Load-bearing premise

The chosen contents, spatialization methods, and listening test conditions represent real-world HOA use in VR and AR applications.

What would settle it

A new listening test on a different set of contents or higher ambisonic orders in which IVAS shows no quality advantage over multi-mono at the same bitrate.

Figures

Figures reproduced from arXiv: 2606.24661 by Adrien Llave, Gr\'egory Pallone, J\'er\^ome Daniel.

**Figure 1.** Figure 1: Listening room picture. signal from its input for the i th parametrized HOA component. This function allows resynthesizing from wˆ ′ the part of the signal which could not be predicted from the TCs. Finally, xˆ is converted back to time domain using an inverse filter bank. III. EXPERIMENT 1 The aim of this experiment is to test the performance of the IVAS and EVSx16 codecs in terms of global quality on a w… view at source ↗

**Figure 2.** Figure 2: MUSHRA scores averaged across listeners for each listening condition and item. The items spatialized with an ideal plane wave encoding are set in [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: MUSHRA scores averaged across listeners for each listening condition and item. The vertical segments show the mean confidence interval at 95 %. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Difference of the listeners-averaged MUSHRA score between IVAS [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: For each codec, the listener-averaged MUSHRA scores differences [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

Spatial audio is spreading in applications such as virtual and augmented reality and immersive games. The higher-order ambisonic (HOA) format is particularly useful in this context. Transmitting spatial information requires multiple channels, e.g., 16 channels for 3rd-order ambisonics, resulting in increased memory requirements for storage and higher bitrates for communication. Therefore, efficient compression algorithms are necessary for those contents. The recently standardized IVAS codec allows the coding of HOA content for communication use-cases. Here, we propose to evaluate it in comparison with a basic multi-mono approach across a variety of contents and spatialization methods. Results show that IVAS outperforms the multi-mono approach at the same bitrate. In particular, this codec exploits inter-channel correlation to reduce the bitrate. We point out that it is therefore especially robust for signals with a high interchannel correlation, such as those composed of a limited number of plane waves. Conversely, the multi-mono approach is unable to exploit this correlation and performs poorly on this type of signal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper runs a listening test showing IVAS beats multi-mono on HOA at matched bitrates, especially for correlated signals, but the abstract leaves the test setup too thin to judge reliability.

read the letter

The main point for you is that IVAS outperforms a straightforward multi-mono baseline in perceptual quality for third-order ambisonics at the same bitrate. The advantage shows up more clearly on content with high inter-channel correlation, such as signals built from a few plane waves. They tested both synthetic mixes and native recordings, which gives the comparison a bit more reach than earlier codec checks.

What stands out as new is the specific outcome on native HOA recordings alongside the synthetic ones. The note that IVAS exploits correlation while multi-mono cannot is a straightforward observation that follows from the codec designs. That part lines up with how these tools are supposed to work.

The paper does a clean job of framing why efficient HOA coding matters for VR and immersive applications. The results are presented as practical guidance rather than a new theory, which keeps the scope realistic.

The soft spot is the lack of concrete numbers in the abstract: no listener count, no mention of statistical tests, and no detail on content selection or error analysis. Without those, the link from data to the claim that IVAS is reliably better stays hard to verify. If the full paper supplies standard MUSHRA-style methods and proper stats, that gap closes; if not, the evidence stays weak. The stress-test note is right that nothing internally contradicts, but the empirical claim still needs the methods to hold up.

This is aimed at audio engineers choosing codecs for spatial transmission. Someone already working with IVAS or HOA pipelines would find the comparison useful as one data point. It is not foundational, but it is a legitimate extension of prior evaluations.

I would bring it to a reading group as a maybe, mainly to discuss the test design. I would not cite it in my own work unless the methods turn out to be solid. It deserves peer review because the question is practical and the approach is direct, even if revisions would likely focus on adding the missing experimental details.

Referee Report

1 major / 0 minor

Summary. The manuscript presents a perceptual listening test comparing the standardized IVAS codec against a basic multi-mono HOA coding approach at matched bitrates. Using both synthetic mixtures and native recordings, it reports that IVAS yields higher perceptual quality by exploiting inter-channel correlation, with the advantage being largest for content composed of a small number of plane waves.

Significance. If the listening-test outcomes prove robust, the work supplies concrete evidence that correlation-aware HOA codecs can deliver measurable perceptual gains over independent-channel coding at the same bitrate. This is directly relevant to bitrate-constrained spatial-audio delivery in VR/AR and immersive communication.

major comments (1)

[Abstract] Abstract: the performance result is stated without details on listener count, statistical tests, content selection criteria, or error analysis, so the data-to-claim link cannot be verified.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment and for recognizing the relevance of our work to bitrate-constrained spatial audio delivery. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the performance result is stated without details on listener count, statistical tests, content selection criteria, or error analysis, so the data-to-claim link cannot be verified.

Authors: We agree that the abstract would be strengthened by including key methodological details. The full manuscript reports a MUSHRA listening test with 12 expert listeners, statistical analysis via repeated-measures ANOVA with post-hoc pairwise comparisons (Bonferroni-corrected), content selected to span synthetic mixtures (controlled plane-wave counts from 1 to 8) and native HOA recordings, and results presented with 95% confidence intervals. We will revise the abstract to concisely incorporate listener count, mention of statistical testing, and a note on content diversity while preserving length constraints. revision: yes

Circularity Check

0 steps flagged

No significant circularity: purely empirical perceptual comparison

full rationale

The manuscript reports results from listening tests that directly compare perceptual quality of IVAS versus multi-mono HOA coding at matched bitrates across selected contents. No derivation, predictive model, fitted parameters, or mathematical claim is advanced whose output is asserted to follow from the inputs by construction. No equations, ansatzes, or uniqueness theorems appear; the central observation that IVAS exploits inter-channel correlation is presented as an empirical finding from the test data rather than a self-referential reduction. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical perceptual evaluation study; no mathematical model, free parameters, axioms, or invented entities are involved.

pith-pipeline@v0.9.1-grok · 5724 in / 918 out tokens · 28769 ms · 2026-06-25T22:23:49.822569+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 1 canonical work pages

[1]

Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,

D. R. Begault, “Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,”J. Audio Eng. Soc., vol. 49, no. 10, 2001

2001
[2]

Minimum BRIR grid resolution for dynamic binaural synthesis,

A. Lindau, H.-J. Maempel, and S. Weinzierl, “Minimum BRIR grid resolution for dynamic binaural synthesis,”J. Acous. Soc. America, vol. 123, May 2008

2008
[3]

Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia

J. Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia. PhD thesis, Univ. Paris 6, July 2001

2001
[4]

A 3D ambisonic based binaural sound reproduction system,

M. Noisternig, A. Sontacchi, T. Musil, and R. H¨oldrich, “A 3D ambisonic based binaural sound reproduction system,” in24th Int. Conf.: Multichannel Audio, The New Reality, Audio Eng. Soc., June 2003

2003
[5]

Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,

M. Zaunschirm, M. Frank, and F. Zotter, “Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,”Applied Sciences, vol. 10, Feb. 2020

2020
[6]

Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,

C. Schorkhuber, M. Zaunschirm, and R. Holdrich, “Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,” inProceedings of the DAGA, vol. 44, 2018

2018
[7]

Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,

T. Huisman, A. Ahrens, and E. MacDonald, “Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,”Frontiers in Virtual Reality, vol. 2, Oct. 2021

2021
[8]

Ambisonics in an Ogg Opus Container,

J. Skoglund and M. Graczyk, “Ambisonics in an Ogg Opus Container,” Tech. Rep. RFC 8486, Internet Engineering Task Force (IETF), Oct. 2018. https://www.rfc-editor.org/ rfc/rfc8486.txt

2018
[9]

Streaming VR for immersion: Quality aspects of compressed spatial audio,

M. Narbutt, S. O’Leary, A. Allen, J. Skoglund, and A. Hines, “Streaming VR for immersion: Quality aspects of compressed spatial audio,” in23rd Int. Conf. Virt. Sys. & Multimedia (VSMM), (Dublin), IEEE, Oct. 2017

2017
[10]

Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

T. Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics. PhD thesis, Univ. of York, 2023

2023
[11]

MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,

J. Herre, J. Hilpert, A. Kuntz, and J. Plogsties, “MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,”IEEE J Sel. Top. in Sig. Proc., vol. 9, Aug. 2015

2015
[12]

RTP payload format and SDP parameter definitions (Release 18),” Nov

“3GPP TS 26.253 - Technical Specification Group Ser- vices and System Aspects; Codec for Immersive V oice and Audio Services; Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions (Release 18),” Nov. 2023. 3GPP TS 26.253

2023
[13]

Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,

D. Weckbecker, S. Brown, J. Torres, M. Multrus, A. Tama- rapu, and G. Fuchs, “Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (Hyderabad, India), Apr. 2025

2025
[14]

Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,

D. McGrath, S. Bruhn, H. Purnhagen, M. Eckert, J. Torres, S. Brown, and D. Darcy, “Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), May 2019

2019
[15]

Reproducing applause-type signals with directional audio coding,

M.-V . Laitinen, F. Kuech, S. Disch, and V . Pulkki, “Reproducing applause-type signals with directional audio coding,”J. Audio Eng. Soc., vol. 59, 2011

2011
[16]

3GPP TR 26.997 - IV AS codec performance characteri- zation,

“3GPP TR 26.997 - IV AS codec performance characteri- zation,” tech. rep., July 2024

2024
[17]

Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,

E. Hellerud, A. Solvang, and U. P. Svensson, “Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc., (Taipei, Taiwan), Apr. 2009

2009
[18]

Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,

C. Hold, L. McCormack, A. Politis, and V . Pulkki, “Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Jan. 2024

2024
[19]

Clarity Challenge - Task 3,

“Clarity Challenge - Task 3,” 2024. https:// claritychallenge.org/docs/cec3/task 3/cec3 task3 data

2024
[20]

ICASSP 2022 Deep Noise Suppression Challenge,

H. Dubey, V . Gopal, R. Cutler, A. Aazami, S. Matusevych, S. Braun, S. E. Eskimez, M. Thakker, T. Yoshioka, H. Gamper, and R. Aichner, “ICASSP 2022 Deep Noise Suppression Challenge,” Feb. 2022. arXiv:2202.13288 [eess]

work page arXiv 2022
[21]

ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,

“ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,” 2015

2015
[22]

EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,

“EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,” 2023

2023
[23]

Codec for immersive voice and audio services (IV AS); c code (floating-point),

“Codec for immersive voice and audio services (IV AS); c code (floating-point),” 2024. https://www.3gpp.org/ftp/ Specs/archive/26 series/26.258/26258-i20.zip

2024
[24]

Overview of the EVS codec architecture,

M. Dietz, M. Multrus, V . Eksler, V . Malenovsky, E. Norvell, H. Pobloth, L. Miao, Z. Wang, L. Laaksonen, A. Vasilache, Y . Kamamoto, K. Kikuiri, S. Ragot, J. Faure, H. Ehara, V . Rajendran, V . Atti, H. Sung, E. Oh, H. Yuan, and C. Zhu, “Overview of the EVS codec architecture,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (South Brisbane, QLD, A...

2015
[25]

Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,

T. Rudzki, I. Gomez-Lanzaco, J. Stubbs, J. Skoglund, D. T. Murphy, and G. Kearney, “Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,”Applied Sciences, vol. 9, June 2019

2019
[26]

Ambisonics Binaural Rendering via Masked Magnitude Least Squares,

O. Berebi, F. Brinkmann, S. Weinzierl, and B. Rafaely, “Ambisonics Binaural Rendering via Masked Magnitude Least Squares,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Apr. 2025

2025
[27]

IEM Plug-in Suite,

IEM, “IEM Plug-in Suite,” Nov. 2021. https://plugins. iem.at v1.13

2021
[28]

All-Round Ambisonic Panning and Decoding,

F. Zotter and M. Frank, “All-Round Ambisonic Panning and Decoding,”J. Audio Eng. Soc., vol. 60, no. 10, 2012

2012
[29]

ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,

“ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,” 2015

2015

[1] [1]

Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,

D. R. Begault, “Direct Comparison of the Impact of Head Tracking, Reverberation, and Individualized Head-Related Transfer Functions on the Spatial Perception of a Virtual Speech Source,”J. Audio Eng. Soc., vol. 49, no. 10, 2001

2001

[2] [2]

Minimum BRIR grid resolution for dynamic binaural synthesis,

A. Lindau, H.-J. Maempel, and S. Weinzierl, “Minimum BRIR grid resolution for dynamic binaural synthesis,”J. Acous. Soc. America, vol. 123, May 2008

2008

[3] [3]

Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia

J. Daniel,Repr ´esentation de champs acoustiques, appli- cation `a la transmission et `a la reproduction de sc `enes sonores complexes dans un contexte multim ´edia. PhD thesis, Univ. Paris 6, July 2001

2001

[4] [4]

A 3D ambisonic based binaural sound reproduction system,

M. Noisternig, A. Sontacchi, T. Musil, and R. H¨oldrich, “A 3D ambisonic based binaural sound reproduction system,” in24th Int. Conf.: Multichannel Audio, The New Reality, Audio Eng. Soc., June 2003

2003

[5] [5]

Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,

M. Zaunschirm, M. Frank, and F. Zotter, “Binaural Rendering with Measured Room Responses: First-Order Ambisonic Microphone vs. Dummy Head,”Applied Sciences, vol. 10, Feb. 2020

2020

[6] [6]

Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,

C. Schorkhuber, M. Zaunschirm, and R. Holdrich, “Binau- ral Rendering of Ambisonic Signals via Magnitude Least Squares,” inProceedings of the DAGA, vol. 44, 2018

2018

[7] [7]

Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,

T. Huisman, A. Ahrens, and E. MacDonald, “Ambisonics Sound Source Localization With Varying Amount of Visual Information in Virtual Reality,”Frontiers in Virtual Reality, vol. 2, Oct. 2021

2021

[8] [8]

Ambisonics in an Ogg Opus Container,

J. Skoglund and M. Graczyk, “Ambisonics in an Ogg Opus Container,” Tech. Rep. RFC 8486, Internet Engineering Task Force (IETF), Oct. 2018. https://www.rfc-editor.org/ rfc/rfc8486.txt

2018

[9] [9]

Streaming VR for immersion: Quality aspects of compressed spatial audio,

M. Narbutt, S. O’Leary, A. Allen, J. Skoglund, and A. Hines, “Streaming VR for immersion: Quality aspects of compressed spatial audio,” in23rd Int. Conf. Virt. Sys. & Multimedia (VSMM), (Dublin), IEEE, Oct. 2017

2017

[10] [10]

Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

T. Rudzki,Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics. PhD thesis, Univ. of York, 2023

2023

[11] [11]

MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,

J. Herre, J. Hilpert, A. Kuntz, and J. Plogsties, “MPEG-H 3D Audio—The New Standard for Coding of Immersive Spatial Audio,”IEEE J Sel. Top. in Sig. Proc., vol. 9, Aug. 2015

2015

[12] [12]

RTP payload format and SDP parameter definitions (Release 18),” Nov

“3GPP TS 26.253 - Technical Specification Group Ser- vices and System Aspects; Codec for Immersive V oice and Audio Services; Detailed Algorithmic Description incl. RTP payload format and SDP parameter definitions (Release 18),” Nov. 2023. 3GPP TS 26.253

2023

[13] [13]

Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,

D. Weckbecker, S. Brown, J. Torres, M. Multrus, A. Tama- rapu, and G. Fuchs, “Ambisonics Coding in IV AS: A Hybrid SPAR and DirAC System,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (Hyderabad, India), Apr. 2025

2025

[14] [14]

Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,

D. McGrath, S. Bruhn, H. Purnhagen, M. Eckert, J. Torres, S. Brown, and D. Darcy, “Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), May 2019

2019

[15] [15]

Reproducing applause-type signals with directional audio coding,

M.-V . Laitinen, F. Kuech, S. Disch, and V . Pulkki, “Reproducing applause-type signals with directional audio coding,”J. Audio Eng. Soc., vol. 59, 2011

2011

[16] [16]

3GPP TR 26.997 - IV AS codec performance characteri- zation,

“3GPP TR 26.997 - IV AS codec performance characteri- zation,” tech. rep., July 2024

2024

[17] [17]

Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,

E. Hellerud, A. Solvang, and U. P. Svensson, “Spatial redundancy in Higher Order Ambisonics and its use for lowdelay lossless compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc., (Taipei, Taiwan), Apr. 2009

2009

[18] [18]

Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,

C. Hold, L. McCormack, A. Politis, and V . Pulkki, “Perceptually-motivated Spatial Audio Codec for Higher- Order Ambisonics Compression,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Jan. 2024

2024

[19] [19]

Clarity Challenge - Task 3,

“Clarity Challenge - Task 3,” 2024. https:// claritychallenge.org/docs/cec3/task 3/cec3 task3 data

2024

[20] [20]

ICASSP 2022 Deep Noise Suppression Challenge,

H. Dubey, V . Gopal, R. Cutler, A. Aazami, S. Matusevych, S. Braun, S. E. Eskimez, M. Thakker, T. Yoshioka, H. Gamper, and R. Aichner, “ICASSP 2022 Deep Noise Suppression Challenge,” Feb. 2022. arXiv:2202.13288 [eess]

work page arXiv 2022

[21] [21]

ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,

“ITU-R Rec. BS.1534-3: Method for the subjective assessment of intermediate quality level of audio systems,” 2015

2015

[22] [22]

EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,

“EBU R-128 Loudness Normalisation and Permitted Maximum Level of Audio Signals,” 2023

2023

[23] [23]

Codec for immersive voice and audio services (IV AS); c code (floating-point),

“Codec for immersive voice and audio services (IV AS); c code (floating-point),” 2024. https://www.3gpp.org/ftp/ Specs/archive/26 series/26.258/26258-i20.zip

2024

[24] [24]

Overview of the EVS codec architecture,

M. Dietz, M. Multrus, V . Eksler, V . Malenovsky, E. Norvell, H. Pobloth, L. Miao, Z. Wang, L. Laaksonen, A. Vasilache, Y . Kamamoto, K. Kikuiri, S. Ragot, J. Faure, H. Ehara, V . Rajendran, V . Atti, H. Sung, E. Oh, H. Yuan, and C. Zhu, “Overview of the EVS codec architecture,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), (South Brisbane, QLD, A...

2015

[25] [25]

Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,

T. Rudzki, I. Gomez-Lanzaco, J. Stubbs, J. Skoglund, D. T. Murphy, and G. Kearney, “Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes,”Applied Sciences, vol. 9, June 2019

2019

[26] [26]

Ambisonics Binaural Rendering via Masked Magnitude Least Squares,

O. Berebi, F. Brinkmann, S. Weinzierl, and B. Rafaely, “Ambisonics Binaural Rendering via Masked Magnitude Least Squares,” inIEEE Int. Conf. Acous., Speech Sig. Proc. (ICASSP), Apr. 2025

2025

[27] [27]

IEM Plug-in Suite,

IEM, “IEM Plug-in Suite,” Nov. 2021. https://plugins. iem.at v1.13

2021

[28] [28]

All-Round Ambisonic Panning and Decoding,

F. Zotter and M. Frank, “All-Round Ambisonic Panning and Decoding,”J. Audio Eng. Soc., vol. 60, no. 10, 2012

2012

[29] [29]

ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,

“ITU-R Rec. BS.1116-3 – Methods for the subjective assessment of small impairments in audio systems,” 2015

2015