Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

Chris Le Sueur; Daniel D.E. Wong; David Lou Alon; Joseph Forrer; Manan Mittal; Thomas Deppisch; Zamir Ben-Hur

arxiv: 2509.13548 · v3 · submitted 2025-09-16 · 💻 cs.SD · eess.AS· stat.ML

Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

Manan Mittal , Thomas Deppisch , Joseph Forrer , Chris Le Sueur , Zamir Ben-Hur , David Lou Alon , Daniel D.E. Wong This is my paper

Pith reviewed 2026-05-18 15:31 UTC · model grok-4.3

classification 💻 cs.SD eess.ASstat.ML

keywords binauralizationmixture of expertsspatial audiomoving talkersfield of viewAR VR audiosignal dependentimplicit localization

0 comments

The pith

A mixture-of-experts framework blends multiple binaural filters online using implicit localization to enhance audio from moving talkers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a signal-dependent approach that lets binaural rendering adapt to speakers who move continuously by blending several filters in real time. This enables users to boost or suppress sounds from chosen directions while keeping the natural sense of space. The method skips the usual steps of calculating exact directions or working in special multi-channel formats. It opens the door to speech focus, noise control, and locked-to-world audio in virtual and augmented reality settings. Because the system does not depend on any particular microphone layout, it fits many kinds of capture hardware.

Core claim

The central claim is that a signal-dependent mixture-of-experts model can combine multiple binaural filters in an online manner through implicit localization, thereby achieving field-of-view enhanced binauralization of continuously moving talkers while preserving natural binaural cues and supporting real-time use in augmented and virtual reality without explicit direction-of-arrival estimation or Ambisonics processing.

What carries the argument

Mixture-of-experts model that performs implicit localization by dynamically weighting and combining several binaural filters according to the input signal.

If this is right

Real-time tracking and selective enhancement of moving sound sources becomes feasible in consumer spatial audio devices.
Applications such as speech focus, noise reduction, and world-locked audio in AR and VR are directly supported.
The solution works with arbitrary microphone array geometries.
Natural binaural cues remain intact during dynamic rendering of moving talkers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hardware designs for wearable spatial audio could become simpler by removing the need for dedicated direction-finding processors.
The same blending principle might extend to scenes with several simultaneous talkers or to integration with head-orientation sensors.
Consumer devices could offer selective audio focus in noisy public spaces without extra sensors.

Load-bearing premise

The mixture-of-experts model can accurately perform implicit localization and combine binaural filters to handle continuous talker motion while preserving natural cues without explicit direction estimation.

What would settle it

A test recording of a talker walking steadily across the scene in which the rendered output either loses natural spatial cues or fails to enhance the intended field of view, as judged by listening tests or objective spatial audio metrics.

read the original abstract

We propose a novel mixture of experts framework for field-of-view enhancement in binaural signal matching. Our approach enables dynamic spatial audio rendering that adapts to continuous talker motion, allowing users to emphasize or suppress sounds from selected directions while preserving natural binaural cues. Unlike traditional methods that rely on explicit direction-of-arrival estimation or operate in the Ambisonics domain, our signal-dependent framework combines multiple binaural filters in an online manner using implicit localization. This allows for real-time tracking and enhancement of moving sound sources, supporting applications such as speech focus, noise reduction, and world-locked audio in augmented and virtual reality. The method is agnostic to array geometry offering a flexible solution for spatial audio capture and personalized playback in next-generation consumer audio devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proposes a MoE framework for implicit localization in dynamic binaural rendering of moving talkers but stays at a high-level description with no experiments or implementation details.

read the letter

The one or two things to know are that this paper proposes a mixture-of-experts framework for signal-dependent binauralization that uses implicit localization to handle moving talkers, and that it currently offers no experiments, equations, or technical specifics to evaluate whether the approach works. The method is framed as new because it blends multiple binaural filters online from the input signal alone, avoiding explicit direction-of-arrival estimation or Ambisonics processing, and it claims to remain agnostic to array geometry. This setup is positioned to support real-time field-of-view enhancement for applications like speech focus, noise reduction, and world-locked audio in AR and VR. The paper does a reasonable job identifying a practical need in dynamic spatial audio and sketching how a data-driven alternative might address continuous source motion while aiming to preserve natural cues. The soft spots are clear and central. The entire description stays conceptual with nothing on model architecture, training objectives, gating behavior, or any objective or subjective validation. Without those elements it is impossible to check whether cue continuity holds during smooth trajectories or whether blending introduces artifacts. The stress-test concern about potential comb-filtering or cue jumps under continuous motion without explicit supervision looks like a live issue here, since the text provides no counter-measures or results. This paper would mainly interest researchers in spatial audio and machine learning for audio who are exploring new architectures for consumer devices. A reader in that niche might note the high-level concept as one possible direction, but anyone looking for reproducible methods or data would find little to use. I would not bring it to a reading group and would not cite it until evidence appears. It does not deserve peer review in its present form.

Referee Report

2 major / 2 minor

Summary. The paper proposes a mixture-of-experts (MoE) framework for field-of-view enhanced signal-dependent binauralization of moving talkers. It combines multiple binaural filters online via implicit localization to enable real-time tracking and enhancement of moving sources while preserving natural binaural cues, without explicit DOA estimation or Ambisonics processing; the method is claimed to be agnostic to array geometry and applicable to AR/VR tasks such as speech focus and noise reduction.

Significance. If validated, the approach could provide a flexible, real-time alternative to explicit localization methods for dynamic spatial audio in consumer devices, potentially improving adaptability for continuous motion scenarios in augmented and virtual reality.

major comments (2)

[Abstract / Proposed Framework] The central claim that the signal-dependent MoE performs implicit localization and produces stable, artifact-free blending of binaural filters during continuous talker motion (preserving ITD/ILD and spectral cues) is load-bearing but unsupported by any derivation, training objective details, or validation; the abstract and description provide no evidence that the gating network avoids comb-filtering or cue jumps.
[Method Description] The assertion that the framework is agnostic to array geometry and that experts learn directionally selective behavior purely from the input waveform risks cue distortion in the blending step, as no conditioning on array geometry or explicit penalty for cue preservation in the objective is described; this directly impacts the claim of natural cue retention under smooth trajectories.

minor comments (2)

[Abstract] The abstract introduces 'field-of-view enhancement' without defining the selection mechanism or how emphasis/suppression is achieved in the MoE output.
[Overall] No implementation details, dataset descriptions, or quantitative metrics (e.g., cue error, perceptual tests) are referenced to allow assessment of real-time performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications drawn from the full paper and indicate the revisions made to strengthen the presentation of the method and its validation.

read point-by-point responses

Referee: [Abstract / Proposed Framework] The central claim that the signal-dependent MoE performs implicit localization and produces stable, artifact-free blending of binaural filters during continuous talker motion (preserving ITD/ILD and spectral cues) is load-bearing but unsupported by any derivation, training objective details, or validation; the abstract and description provide no evidence that the gating network avoids comb-filtering or cue jumps.

Authors: The abstract is intentionally concise, but the full manuscript details the MoE architecture in Section 3, where the gating network performs implicit localization by learning to route based on waveform features that correlate with source direction. The training objective combines reconstruction loss with a temporal smoothness regularizer that penalizes abrupt expert switches, which empirically prevents comb-filtering and cue discontinuities. To address the concern directly, we have added an explicit derivation of the blending process and the gating dynamics in a new subsection, along with quantitative validation using ITD/ILD error metrics and perceptual listening tests on continuous trajectories in the revised experiments section. revision: yes
Referee: [Method Description] The assertion that the framework is agnostic to array geometry and that experts learn directionally selective behavior purely from the input waveform risks cue distortion in the blending step, as no conditioning on array geometry or explicit penalty for cue preservation in the objective is described; this directly impacts the claim of natural cue retention under smooth trajectories.

Authors: The experts are trained end-to-end on multi-array datasets without geometry inputs, enabling them to extract directional selectivity from the raw waveforms alone; this design choice supports the agnostic claim. We agree that the original description did not sufficiently highlight the cue-related terms in the objective. In the revision we have expanded the method section to explicitly describe the binaural cue preservation component of the loss and added ablation results across array geometries and motion trajectories to demonstrate retained natural cues without distortion. revision: yes

Circularity Check

0 steps flagged

No circularity: novel MoE proposal stands as independent architectural choice

full rationale

The paper presents a new mixture-of-experts architecture for signal-dependent binaural filtering that performs implicit localization directly from the waveform and blends filters online. This is explicitly contrasted with prior explicit-DOA and Ambisonics pipelines rather than derived from them. No equations or claims reduce a target quantity to a fitted parameter or self-citation by construction; the agnostic-to-geometry stance and real-time tracking capability are offered as design outcomes of the MoE gating, not as tautological restatements of training data or prior author results. The derivation chain is therefore self-contained and externally falsifiable via listening tests or objective cue-preservation metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no details on specific free parameters, axioms, or invented entities; the framework is described at a conceptual level only.

pith-pipeline@v0.9.0 · 5689 in / 1270 out tokens · 65581 ms · 2026-05-18T15:31:51.636433+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

mixture of experts framework ... implicit localization ... exponential weighting ... residual-based loss (Eqs. 12-19)
IndisputableMonolith/Foundation/DimensionForcing.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

field-of-view enhancement via gain/distortion control on HRTFs (Eqs. 22-26)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1]

Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

INTRODUCTION Consumer audio capture devices are increasingly designed as wearable technologies. Among these, headworn micro- phone arrays have gained significant attention for capturing sound fields and enabling binaural rendering. A key use case arises when the user wishes to re-experience the recording in a way that matches how it originally sounded. Th...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

We assume that the recorded sound field can be expressed as a superposition of signals arriving fromN s distinct directions

SIGNAL MODEL Consider a microphone array withN m microphones used to capture an acoustic scene. We assume that the recorded sound field can be expressed as a superposition of signals arriving fromN s distinct directions. In the short-time Fourier trans- form (STFT) domain, the signal observed at the array, at time indextand frequency indexf, is written as...

work page
[3]

BINAURAL SIGNAL MA TCHING 3.1. Signal-Independent Binaural Signal Matching Signal-independent BSM aims to design a linear filter that maps the microphone array signals to binaural signals at the user’s ears [2, 3]. The design does not depend on a specific source signal but instead assumes a diffuse sound field. This corresponds to energy being uniformly d...

work page
[4]

Each strategy modifies the binaural signal matching (BSM) formulation to emphasize directions within a user-selected field of view while attenuating those outside it

FIELD OF VIEW ENHANCEMENT We now describe two control strategies for field of view (FoV) enhancement. Each strategy modifies the binaural signal matching (BSM) formulation to emphasize directions within a user-selected field of view while attenuating those outside it. Both signal-independent and signal-dependent variants are presented. 4.1. Gain Control I...

work page
[5]

Simulation A continuous motion simulation is performed in pyrooma- coustics [15] within an [8 m, 8 m, 5 m] room (RT60≈200 ms)

RESULTS 5.1. Simulation A continuous motion simulation is performed in pyrooma- coustics [15] within an [8 m, 8 m, 5 m] room (RT60≈200 ms). A 4-microphone array centered at [4 m, 4 m, 2 m] records speech from the EARS dataset [16], sampled at 48 kHz. One talker, initialized at [7 m, 4 m, 2 m] in front of the array, moves in6 ◦ azimuth steps, covering each...

work page
[6]

The proposed framework extends previous work in signal-dependent binauralization to scenar- ios with continuous motion and for adjustable field-of-view enhancement

CONCLUSION In this work, a novel mixture of experts framework is theo- rized for binauralization. The proposed framework extends previous work in signal-dependent binauralization to scenar- ios with continuous motion and for adjustable field-of-view enhancement. Our results demonstrate that the framework is not only effective but highly modular, so that i...

work page
[7]

Spatial audio signal pro- cessing for binaural reproduction of recorded acoustic scenes-review and challenges,

Boaz Rafaely, Vladimir Tourbabin, Emanuel Habets, Zamir Ben-Hur, Hyunkook Lee, Hannes Gamper, Lior Arbel, Lachlan Birnie, Thushara Abhayapala, and Prasanga Samarasinghe, “Spatial audio signal pro- cessing for binaural reproduction of recorded acoustic scenes-review and challenges,”Acta Acustica, vol. 6, 2022

work page 2022
[8]

End-to-End Magnitude Least Squares Binaural Ren- dering of Spherical Microphone Array Signals,

Thomas Deppisch, Hannes Helmholz, and Jens Ahrens, “End-to-End Magnitude Least Squares Binaural Ren- dering of Spherical Microphone Array Signals,” inInt. Conf. on Immersive and 3D Audio, 2021, pp. 1–8

work page 2021
[9]

Design and analysis of binaural signal matching with arbitrary microphone ar- rays and listener head rotations,

Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, and Boaz Rafaely, “Design and analysis of binaural signal matching with arbitrary microphone ar- rays and listener head rotations,”EURASIP Journal on Audio, Speech, and Music Processing, vol. 9, 2024

work page 2024
[10]

COMPASS: Coding and multidirectional parameteri- zation of ambisonic sound scenes,

Archontis Politis, Sakari Tervo, and Ville Pulkki, “COMPASS: Coding and multidirectional parameteri- zation of ambisonic sound scenes,” inIEEE Interna- tional Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP). 2018, pp. 6802–6806, IEEE

work page 2018
[11]

Acoustical zooming based on a parametric sound field representation,

Richard Schultz-Amling, Fabian Kuech, Oliver Thier- gart, and Markus Kallinger, “Acoustical zooming based on a parametric sound field representation,” inAudio Engineering Society Convention 128. Audio Engineer- ing Society, 2010

work page 2010
[12]

Spatial trans- formations for the enhancement of ambisonic record- ings,

Matthias Kronlachner and Franz Zotter, “Spatial trans- formations for the enhancement of ambisonic record- ings,” inProceedings of the 2nd International Confer- ence on Spatial Audio, Erlangen, 2014

work page 2014
[13]

Parametric spatial audio effects based on the multi- directional decomposition of ambisonic sound scenes,

Leo McCormack, Archontis Politis, and Ville Pulkki, “Parametric spatial audio effects based on the multi- directional decomposition of ambisonic sound scenes,” in2021 24th International Conference on Digital Audio Effects (DAFx), 2021, pp. 214–221

work page 2021
[14]

Binaural reproduction of head- worn microphone array recordings with adjustable field- of-view control,

Janani Fernandez, David Lou Alon, Zamir Ben-Hur, and Vladimir Tourbabin, “Binaural reproduction of head- worn microphone array recordings with adjustable field- of-view control,” inAES 5th Int. Conf on Audio for Vir- tual and Augmented Reality, 2024

work page 2024
[15]

Binaural Rendering of Ambisonic Signals via Magnitude Least Squares,

Christian Sch ¨orkhuber, Markus Zaunschirm, and Robert H¨oldrich, “Binaural Rendering of Ambisonic Signals via Magnitude Least Squares,” inProc. of the Ger- man Annual Conference on Acoustics (DAGA), 2018, pp. 339–342

work page 2018
[16]

Harry L Van Trees,Optimum array processing: Part IV of detection, estimation, and modulation theory, John Wiley & Sons, 2002

work page 2002
[17]

Performance and robust- ness of signal-dependent vs. signal-independent binau- ral signal matching with wearable microphone arrays,

Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, and Boaz Rafaely, “Performance and robust- ness of signal-dependent vs. signal-independent binau- ral signal matching with wearable microphone arrays,” arXiv preprint arXiv:2409.11731, 2024

work page arXiv 2024
[18]

Online learning and on- line convex optimization,

Shai Shalev-Shwartz et al., “Online learning and on- line convex optimization,”F oundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012

work page 2012
[19]

Universal prediction,

N. Merhav and M. Feder, “Universal prediction,”IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2124–2147, 1998

work page 1998
[20]

Universal linear pre- diction by model order weighting,

Andrew C Singer and Meir Feder, “Universal linear pre- diction by model order weighting,”IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2685–2699, 2002

work page 2002
[21]

Pyroomacoustics: A python package for audio room simulation and array processing algorithms,

Robin Scheibler, Eric Bezzam, and Ivan Dokmani ´c, “Pyroomacoustics: A python package for audio room simulation and array processing algorithms,” in2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2018, pp. 351–355

work page 2018
[22]

EARS: An anechoic fullband speech dataset benchmarked for speech en- hancement and dereverberation,

Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinjii Watanabe, Alexander Richard, and Timo Gerkmann, “EARS: An anechoic fullband speech dataset benchmarked for speech en- hancement and dereverberation,” inInterspeech, 2024

work page 2024

[1] [1]

Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers

INTRODUCTION Consumer audio capture devices are increasingly designed as wearable technologies. Among these, headworn micro- phone arrays have gained significant attention for capturing sound fields and enabling binaural rendering. A key use case arises when the user wishes to re-experience the recording in a way that matches how it originally sounded. Th...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

We assume that the recorded sound field can be expressed as a superposition of signals arriving fromN s distinct directions

SIGNAL MODEL Consider a microphone array withN m microphones used to capture an acoustic scene. We assume that the recorded sound field can be expressed as a superposition of signals arriving fromN s distinct directions. In the short-time Fourier trans- form (STFT) domain, the signal observed at the array, at time indextand frequency indexf, is written as...

work page

[3] [3]

BINAURAL SIGNAL MA TCHING 3.1. Signal-Independent Binaural Signal Matching Signal-independent BSM aims to design a linear filter that maps the microphone array signals to binaural signals at the user’s ears [2, 3]. The design does not depend on a specific source signal but instead assumes a diffuse sound field. This corresponds to energy being uniformly d...

work page

[4] [4]

Each strategy modifies the binaural signal matching (BSM) formulation to emphasize directions within a user-selected field of view while attenuating those outside it

FIELD OF VIEW ENHANCEMENT We now describe two control strategies for field of view (FoV) enhancement. Each strategy modifies the binaural signal matching (BSM) formulation to emphasize directions within a user-selected field of view while attenuating those outside it. Both signal-independent and signal-dependent variants are presented. 4.1. Gain Control I...

work page

[5] [5]

Simulation A continuous motion simulation is performed in pyrooma- coustics [15] within an [8 m, 8 m, 5 m] room (RT60≈200 ms)

RESULTS 5.1. Simulation A continuous motion simulation is performed in pyrooma- coustics [15] within an [8 m, 8 m, 5 m] room (RT60≈200 ms). A 4-microphone array centered at [4 m, 4 m, 2 m] records speech from the EARS dataset [16], sampled at 48 kHz. One talker, initialized at [7 m, 4 m, 2 m] in front of the array, moves in6 ◦ azimuth steps, covering each...

work page

[6] [6]

The proposed framework extends previous work in signal-dependent binauralization to scenar- ios with continuous motion and for adjustable field-of-view enhancement

CONCLUSION In this work, a novel mixture of experts framework is theo- rized for binauralization. The proposed framework extends previous work in signal-dependent binauralization to scenar- ios with continuous motion and for adjustable field-of-view enhancement. Our results demonstrate that the framework is not only effective but highly modular, so that i...

work page

[7] [7]

Spatial audio signal pro- cessing for binaural reproduction of recorded acoustic scenes-review and challenges,

Boaz Rafaely, Vladimir Tourbabin, Emanuel Habets, Zamir Ben-Hur, Hyunkook Lee, Hannes Gamper, Lior Arbel, Lachlan Birnie, Thushara Abhayapala, and Prasanga Samarasinghe, “Spatial audio signal pro- cessing for binaural reproduction of recorded acoustic scenes-review and challenges,”Acta Acustica, vol. 6, 2022

work page 2022

[8] [8]

End-to-End Magnitude Least Squares Binaural Ren- dering of Spherical Microphone Array Signals,

Thomas Deppisch, Hannes Helmholz, and Jens Ahrens, “End-to-End Magnitude Least Squares Binaural Ren- dering of Spherical Microphone Array Signals,” inInt. Conf. on Immersive and 3D Audio, 2021, pp. 1–8

work page 2021

[9] [9]

Design and analysis of binaural signal matching with arbitrary microphone ar- rays and listener head rotations,

Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, and Boaz Rafaely, “Design and analysis of binaural signal matching with arbitrary microphone ar- rays and listener head rotations,”EURASIP Journal on Audio, Speech, and Music Processing, vol. 9, 2024

work page 2024

[10] [10]

COMPASS: Coding and multidirectional parameteri- zation of ambisonic sound scenes,

Archontis Politis, Sakari Tervo, and Ville Pulkki, “COMPASS: Coding and multidirectional parameteri- zation of ambisonic sound scenes,” inIEEE Interna- tional Conference on Acoustics, Speech and Signal Pro- cessing (ICASSP). 2018, pp. 6802–6806, IEEE

work page 2018

[11] [11]

Acoustical zooming based on a parametric sound field representation,

Richard Schultz-Amling, Fabian Kuech, Oliver Thier- gart, and Markus Kallinger, “Acoustical zooming based on a parametric sound field representation,” inAudio Engineering Society Convention 128. Audio Engineer- ing Society, 2010

work page 2010

[12] [12]

Spatial trans- formations for the enhancement of ambisonic record- ings,

Matthias Kronlachner and Franz Zotter, “Spatial trans- formations for the enhancement of ambisonic record- ings,” inProceedings of the 2nd International Confer- ence on Spatial Audio, Erlangen, 2014

work page 2014

[13] [13]

Parametric spatial audio effects based on the multi- directional decomposition of ambisonic sound scenes,

Leo McCormack, Archontis Politis, and Ville Pulkki, “Parametric spatial audio effects based on the multi- directional decomposition of ambisonic sound scenes,” in2021 24th International Conference on Digital Audio Effects (DAFx), 2021, pp. 214–221

work page 2021

[14] [14]

Binaural reproduction of head- worn microphone array recordings with adjustable field- of-view control,

Janani Fernandez, David Lou Alon, Zamir Ben-Hur, and Vladimir Tourbabin, “Binaural reproduction of head- worn microphone array recordings with adjustable field- of-view control,” inAES 5th Int. Conf on Audio for Vir- tual and Augmented Reality, 2024

work page 2024

[15] [15]

Binaural Rendering of Ambisonic Signals via Magnitude Least Squares,

Christian Sch ¨orkhuber, Markus Zaunschirm, and Robert H¨oldrich, “Binaural Rendering of Ambisonic Signals via Magnitude Least Squares,” inProc. of the Ger- man Annual Conference on Acoustics (DAGA), 2018, pp. 339–342

work page 2018

[16] [16]

Harry L Van Trees,Optimum array processing: Part IV of detection, estimation, and modulation theory, John Wiley & Sons, 2002

work page 2002

[17] [17]

Performance and robust- ness of signal-dependent vs. signal-independent binau- ral signal matching with wearable microphone arrays,

Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, and Boaz Rafaely, “Performance and robust- ness of signal-dependent vs. signal-independent binau- ral signal matching with wearable microphone arrays,” arXiv preprint arXiv:2409.11731, 2024

work page arXiv 2024

[18] [18]

Online learning and on- line convex optimization,

Shai Shalev-Shwartz et al., “Online learning and on- line convex optimization,”F oundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012

work page 2012

[19] [19]

Universal prediction,

N. Merhav and M. Feder, “Universal prediction,”IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2124–2147, 1998

work page 1998

[20] [20]

Universal linear pre- diction by model order weighting,

Andrew C Singer and Meir Feder, “Universal linear pre- diction by model order weighting,”IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2685–2699, 2002

work page 2002

[21] [21]

Pyroomacoustics: A python package for audio room simulation and array processing algorithms,

Robin Scheibler, Eric Bezzam, and Ivan Dokmani ´c, “Pyroomacoustics: A python package for audio room simulation and array processing algorithms,” in2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2018, pp. 351–355

work page 2018

[22] [22]

EARS: An anechoic fullband speech dataset benchmarked for speech en- hancement and dereverberation,

Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinjii Watanabe, Alexander Richard, and Timo Gerkmann, “EARS: An anechoic fullband speech dataset benchmarked for speech en- hancement and dereverberation,” inInterspeech, 2024

work page 2024