Neighbor-Consistent Neural Filters for Robust Personal Sound Zones Under Localization Uncertainty

Edgar Choueiri; Hao Jiang

arxiv: 2605.21891 · v1 · pith:5SCNKJ76new · submitted 2026-05-21 · 📡 eess.AS


Hao Jiang, Edgar Choueiri

Pith reviewed 2026-05-22 03:17 UTC · model grok-4.3

classification 📡 eess.AS

keywords: personal sound zones · neural filters · localization uncertainty · neighbor consistency · audio rendering · head tracking · robustness


The pith

Neighbor consistency regularization stabilizes personal sound zone filters against localization uncertainty

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces neighbor-consistent neural filters for head-tracked personal sound zones to counter sensitivity to localization uncertainty caused by tracking jitter, optical distortion, or temporary occlusions. A penalty on filter differences at randomly perturbed neighboring coordinates, added during training, makes the coordinate-to-filter mapping more stable. This regularization reduces spatial variation rates while largely preserving the acoustic isolation between zones. Simulation and in-situ measurements confirm the robustness gains, and the method requires no changes to array geometry or acoustic transfer functions.

Core claim

Neighbor consistency regularization applied during training of coordinate-conditioned neural networks reduces the root-mean-square variation rate of generated filters by up to 55.9 percent in the woofer band and 30.3 percent in the tweeter band while largely preserving isolation quality and improving lower-tail robustness; physical measurements with a 24-driver array show up to 16.9 percent better worst-case neighborhood isolation and up to 61.8 percent lower spatial variation rates.

What carries the argument

Neighbor-consistency regularization term that penalizes differences between filters generated at an anchor coordinate and at randomly sampled neighboring coordinates during training of the neural network.
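
One way such a penalty could be written, offered as a sketch and not as the paper's exact formulation: let $g_\theta$ be the coordinate-to-filter network, $\mathbf{x}$ an anchor coordinate, $\boldsymbol{\delta}$ a random offset bounded by a radius $r$, and $\lambda$ the regularization weight. The symbols, the squared Euclidean distance, and the name $\mathcal{L}_{\mathrm{PSZ}}$ for the primary isolation objective are all assumptions made for illustration.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\mathrm{PSZ}}(\theta)
  + \lambda \, \mathbb{E}_{\mathbf{x},\,\boldsymbol{\delta}}
    \left[ \left\| g_\theta(\mathbf{x} + \boldsymbol{\delta}) - g_\theta(\mathbf{x}) \right\|_2^2 \right],
  \qquad \|\boldsymbol{\delta}\|_2 \le r
```

Under this reading, the second term vanishes only when the generated filters are locally constant around each anchor, so $\lambda$ trades local stability against the primary objective.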

Load-bearing premise

Penalizing filter differences only at randomly sampled neighboring coordinates during training will produce stable behavior under the distribution of real-world localization noise without changes to acoustic transfer functions or array geometry.

What would settle it

Apply localization perturbations drawn from a distribution different from the random sampling used in training, such as systematic optical distortion or occlusion-induced bias, and measure whether variation rates and isolation degrade.
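
A minimal sketch of what such a test could look like on the input side, assuming 2-D coordinate offsets in metres. It contrasts uniform sampling within a disc with a hypothetical structured error model (constant bias, Gaussian jitter, rare large excursions). Both samplers, their names, and every numeric value are illustrative assumptions; the source does not specify the paper's sampling scheme in this detail.

```python
import math
import random

def sample_uniform_disc(rng, radius):
    """Draw one 2-D offset uniformly from a disc of the given radius."""
    r = radius * math.sqrt(rng.random())  # sqrt makes the density uniform over area
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(phi), r * math.sin(phi))

def sample_structured(rng, bias, sigma, outlier_prob, outlier_sigma):
    """Hypothetical structured error: constant bias, Gaussian jitter, rare large jumps."""
    s = outlier_sigma if rng.random() < outlier_prob else sigma
    return (bias[0] + rng.gauss(0.0, s), bias[1] + rng.gauss(0.0, s))

rng = random.Random(0)

# Offsets resembling an isotropic training-time perturbation (assumed radius: 2 cm).
uniform_offsets = [sample_uniform_disc(rng, radius=0.02) for _ in range(1000)]

# Offsets with a 1 cm bias along x and occasional occlusion-like excursions.
structured_offsets = [
    sample_structured(rng, bias=(0.01, 0.0), sigma=0.005,
                      outlier_prob=0.05, outlier_sigma=0.05)
    for _ in range(1000)
]
```

Feeding both offset sets through the same evaluation would show whether the stability gains survive perturbations that the training distribution never produced.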

Figures

Figures reproduced from arXiv: 2605.21891 by Edgar Choueiri, Hao Jiang.

**Figure 1.** Coordinate-conditioned neural PSZ filter generation for a split-band (woofer–tweeter) system using two independently trained models. The woofer …

**Figure 2.** Simulation (woofer, Listener 2): per-anchor distributions of IZI/IPI quality summaries (median and CVaR10; higher is better) and stability summaries …

**Figure 3.** Simulation (woofer, Listener 2): one-anchor example of the metric landscape under coordinate perturbations. Each map plots the frequency-averaged …

**Figure 4.** Simulation (tweeter, Listener 2): per-anchor distributions of IZI/IPI quality summaries (median and CVaR10; higher is better) and stability summaries …

**Figure 5.** Simulation (tweeter, Listener 2): one-anchor example of the metric landscape under coordinate perturbations. Each map plots the frequency-averaged …

**Figure 6.** In-situ measurement setup. (Top) Photograph of the listening room …

**Figure 7.** Measurements: One-anchor example (Anchor 2) showing …

Original abstract

Coordinate-conditioned neural networks can generate head-tracked personal sound zone (PSZ) loudspeaker filters in real time, but they are sensitive to localization uncertainty. Small fluctuations in estimated listener coordinates, caused by optical distortion, temporary occlusions, or tracking jitter, may produce large filter changes even when listeners are physically stationary. This paper proposes neighbor-consistent neural filters that regularize the coordinate-to-filter mapping by penalizing filter differences at randomly perturbed neighboring coordinates during training. To evaluate robustness against tracking noise, we introduce a decoupled protocol that fixes the acoustic transfer functions at a physical anchor while perturbing only the coordinate inputs used for filter generation. Isolation quality and local stability are evaluated using neighborhood median and lower-tail statistics of inter-zone and inter-program isolation, together with spatial variation rates that quantify metric sensitivity within a coordinate neighborhood. In simulation with a split-band woofer-tweeter system and 25 randomly sampled anchor positions, neighbor consistency reduces the root-mean-square (RMS) variation rate by up to 55.9% in the woofer band and 30.3% in the tweeter band while largely preserving isolation quality and improving lower-tail robustness. In in-situ measurements using a 24-driver array and two stationary head-and-torso simulators, the proposed regularization improves worst-case neighborhood isolation by up to 16.9% and reduces spatial variation rates by up to 61.8%. These results demonstrate that neighbor-consistency regularization effectively stabilizes PSZ rendering under localization uncertainty.
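
The neighborhood statistics named in the abstract can be sketched as follows. This is an illustration under stated assumptions: CVaR10 is taken as the mean of the lowest 10% of values (the usual lower-tail definition when higher is better), and the variation rate is taken as the absolute metric change per unit coordinate distance, summarized by its RMS over the neighborhood. The paper's exact definitions are not given in the text above, and the isolation values are made-up toy numbers.

```python
import math
import statistics

def cvar_lower(values, alpha=0.10):
    """Mean of the lowest alpha fraction of values (lower-tail CVaR)."""
    k = max(1, math.ceil(alpha * len(values)))
    return statistics.fmean(sorted(values)[:k])

def rms_variation_rate(anchor_metric, neighbor_metrics, distances):
    """RMS of |metric change| per unit coordinate distance (assumed definition)."""
    rates = [abs(m - anchor_metric) / d
             for m, d in zip(neighbor_metrics, distances)]
    return math.sqrt(statistics.fmean([r * r for r in rates]))

# Toy data: isolation in dB at 10 perturbed coordinates around one anchor,
# each 1 cm away from the anchor, whose own isolation is 20 dB.
isolation_db = [20.0, 19.0, 21.0, 18.0, 20.5, 12.0, 19.5, 20.0, 21.5, 19.0]
distances_m = [0.01] * 10

summary = {
    "median": statistics.median(isolation_db),
    "cvar10": cvar_lower(isolation_db, 0.10),
    "rms_rate": rms_variation_rate(20.0, isolation_db, distances_m),
}
```

In this toy neighborhood a single poor point (12 dB) barely moves the median but dominates both the lower-tail summary and the RMS rate, which is why the two kinds of statistic are reported side by side.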

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds a neighbor-consistency penalty to coordinate-conditioned neural PSZ filters and reports solid drops in variation rates from both simulation and real-array measurements, but the random perturbations may not capture actual tracking error patterns.


The main takeaway is that penalizing filter differences at randomly sampled neighboring coordinates during training makes the network less sensitive to small input shifts, and the authors back this with both simulation and hardware results. They keep the acoustic transfer functions fixed at physical points while only changing the coordinate input to the network, which cleanly isolates the mapping's stability without mixing in sound-field changes. In the split-band simulation across 25 anchors this cuts RMS variation rates by up to 55.9% in the woofer band and 30.3% in the tweeter band. The in-situ tests on a 24-driver array with two stationary head-and-torso simulators show up to 61.8% lower spatial variation and 16.9% better worst-case neighborhood isolation.

That decoupled protocol and the dual sim-plus-measurement design are the parts that actually move the needle for practical head-tracked PSZ work. The regularization itself is a straightforward addition to the training loss and does not appear to trade off isolation quality much.

The main soft spot is the assumption that uniform random neighbor samples will produce the same robustness as the real distribution of localization errors. Optical tracking noise often includes directional biases, brief large excursions from occlusions, or jitter that may not match the perturbation radius and sampling used in training. The measurements use fixed simulators, so they do not yet show behavior under actual listener motion or varying room conditions.

This work is aimed at people already building or evaluating neural filters for personal sound zones who need better stability under imperfect tracking. A reader in spatial audio or array signal processing would find the evaluation protocol and the concrete numbers useful to replicate or extend. It deserves a serious referee because the empirical setup is thoughtful and the gains are quantified on real hardware, even if the generalization question needs tighter checks.

I would send it to review and ask the authors to compare their training perturbation statistics against measured tracking error data from their own system.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes neighbor-consistent neural filters for head-tracked personal sound zones (PSZ) by adding a regularization term that penalizes filter differences at randomly perturbed neighboring listener coordinates during training. This aims to reduce sensitivity of the coordinate-to-filter mapping to localization uncertainty from optical distortion, occlusions, or jitter. A decoupled evaluation protocol is introduced that holds acoustic transfer functions fixed at physical anchors while perturbing only the coordinate inputs. Quantitative results are reported from simulation (split-band woofer-tweeter system, 25 anchor positions) showing up to 55.9% and 30.3% reductions in RMS variation rate for woofer and tweeter bands, and from in-situ measurements (24-driver array, two head-and-torso simulators) showing up to 16.9% improvement in worst-case neighborhood isolation and 61.8% reduction in spatial variation rates, while largely preserving isolation quality.

Significance. If the central claim holds, the work offers a practical, low-overhead regularization for stabilizing real-time PSZ rendering under realistic tracking noise without altering array geometry or acoustic transfer functions. The decoupled evaluation protocol is a useful methodological contribution for isolating coordinate sensitivity. Credit is due for combining simulation across multiple anchors with in-situ measurements using stationary simulators and for reporting both median and lower-tail neighborhood statistics. The approach could support more reliable deployment of head-tracked PSZ systems in consumer or automotive settings where localization jitter is common.

major comments (2)

[§4] §4 (Training and regularization): The neighbor-consistency loss penalizes filter differences at randomly sampled coordinates within a perturbation radius, but the manuscript provides no quantitative comparison between the distribution of these random perturbations and the actual statistics (bias, variance, directional correlation) of localization errors measured from the optical tracking system. The central robustness claim therefore rests on an unverified assumption that random sampling reproduces real-world error characteristics.
[§5.2] §5.2 (Decoupled evaluation protocol): While the protocol correctly isolates coordinate-to-filter sensitivity by fixing ATFs at physical anchors, it does not include a sensitivity analysis or ablation on perturbation radius or sampling strategy. If real localization errors exhibit larger excursions or structured biases (e.g., from occlusions) than the training distribution, the reported reductions in RMS variation rate (55.9% woofer, 30.3% tweeter) and worst-case isolation (16.9%) may not generalize.

minor comments (2)

[Abstract / §5.1] The abstract and §5.1 refer to '25 randomly sampled anchor positions' and '24-driver array' without specifying the exact coordinate ranges or array geometry; adding a brief table or figure reference would improve reproducibility.
[§3 / §5] Notation for the regularization strength and perturbation radius is introduced but not consistently labeled across equations and experimental tables; a single symbol table would aid clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We provide point-by-point responses to the major comments below. We will make revisions to address the concerns where feasible, strengthening the presentation of our methods and results.


Referee: [§4] §4 (Training and regularization): The neighbor-consistency loss penalizes filter differences at randomly sampled coordinates within a perturbation radius, but the manuscript provides no quantitative comparison between the distribution of these random perturbations and the actual statistics (bias, variance, directional correlation) of localization errors measured from the optical tracking system. The central robustness claim therefore rests on an unverified assumption that random sampling reproduces real-world error characteristics.

Authors: We acknowledge the value of a direct comparison between the training perturbation distribution and empirical localization error statistics from the optical tracking system. In this study, the random perturbations were chosen to model small-scale uncertainties commonly encountered in head-tracking applications, such as jitter and minor distortions, without introducing specific biases. The decoupled evaluation uses the same perturbation model to assess robustness. While we did not perform a quantitative match to measured error distributions in the current work, we will revise §4 to provide a more explicit rationale for the uniform random sampling approach and discuss its relation to typical tracking errors, thereby clarifying the assumptions underlying the robustness claims.
revision: partial
Referee: [§5.2] §5.2 (Decoupled evaluation protocol): While the protocol correctly isolates coordinate-to-filter sensitivity by fixing ATFs at physical anchors, it does not include a sensitivity analysis or ablation on perturbation radius or sampling strategy. If real localization errors exhibit larger excursions or structured biases (e.g., from occlusions) than the training distribution, the reported reductions in RMS variation rate (55.9% woofer, 30.3% tweeter) and worst-case isolation (16.9%) may not generalize.

Authors: We agree that a sensitivity analysis regarding the perturbation radius and sampling strategy would be beneficial for assessing the generalizability of our results. The radius was selected to reflect realistic levels of localization uncertainty in our experimental setup, and uniform sampling was used to avoid directional assumptions. The improvements in RMS variation rates and isolation metrics were observed consistently across the tested conditions. In the revised manuscript, we will incorporate an ablation study or additional analysis on varying perturbation radii to demonstrate the sensitivity and support the reported performance gains.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; regularization and metrics are independent


The paper defines neighbor-consistency as an explicit regularization term added to the training loss that penalizes filter differences at randomly sampled neighboring coordinates. The claimed reductions in RMS variation rate (up to 55.9% woofer, 30.3% tweeter) and worst-case isolation (16.9%) are obtained from a separate decoupled evaluation protocol that holds acoustic transfer functions fixed at physical anchors while only perturbing coordinate inputs, then computes neighborhood median/lower-tail statistics and spatial variation rates on held-out positions. These evaluation quantities are not algebraically or statistically identical to the training penalty; the method could have produced no improvement or degradation. No self-citations, uniqueness theorems, or fitted parameters renamed as predictions appear in the derivation. The chain is therefore self-contained empirical regularization followed by independent measurement.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions in acoustic array processing and neural network training. No new physical entities are postulated. The regularization weight and perturbation distribution are implicit free parameters whose specific values are not detailed in the abstract.

free parameters (2)

regularization strength
The weight balancing the neighbor-consistency penalty against the primary isolation objective is a tunable hyperparameter whose value affects the reported trade-off between stability and isolation quality.
perturbation radius
The spatial scale of random coordinate perturbations used during training is chosen to match expected tracking noise and directly influences the learned robustness.
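
To make the role of these two free parameters concrete, here is a minimal sketch of how they could enter a training loss. The linear `toy_filter_model`, the squared-difference penalty, the disc sampling, and all numeric values are hypothetical stand-ins chosen so the example runs on its own; none of them is taken from the paper.

```python
import math
import random

def toy_filter_model(weights, coord):
    """Stand-in for the network: maps a 2-D coordinate to a short filter vector."""
    x, y = coord
    return [w0 + wx * x + wy * y for (w0, wx, wy) in weights]

def neighbor_consistency_penalty(model, weights, anchor, radius, n_neighbors, rng):
    """Mean squared filter difference between an anchor and random neighbors."""
    base = model(weights, anchor)
    total = 0.0
    for _ in range(n_neighbors):
        # Sample a neighbor uniformly within a disc of the perturbation radius.
        r = radius * math.sqrt(rng.random())
        phi = rng.uniform(0.0, 2.0 * math.pi)
        neighbor = (anchor[0] + r * math.cos(phi), anchor[1] + r * math.sin(phi))
        perturbed = model(weights, neighbor)
        total += sum((a - b) ** 2 for a, b in zip(perturbed, base))
    return total / n_neighbors

def total_loss(primary_loss, penalty, reg_strength):
    """Primary isolation objective plus the weighted consistency term."""
    return primary_loss + reg_strength * penalty

rng = random.Random(0)
weights = [(0.5, 2.0, -1.0), (0.1, 0.0, 3.0)]  # hypothetical model parameters
penalty = neighbor_consistency_penalty(
    toy_filter_model, weights, anchor=(0.3, 0.2),
    radius=0.02, n_neighbors=64, rng=rng,
)
loss = total_loss(primary_loss=1.0, penalty=penalty, reg_strength=0.1)
```

In this sketch the penalty grows with the square of the radius for a fixed model, so the radius and the regularization strength are coupled: changing one shifts the effective weight of the other.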

axioms (2)

domain assumption: Acoustic transfer functions remain fixed when only coordinate inputs are perturbed in the decoupled evaluation protocol.
This separation is invoked to isolate the effect of localization uncertainty from changes in the physical sound field.
domain assumption: Neighborhood median and lower-tail statistics of isolation metrics are representative of real-world tracking error distributions.
The paper uses these statistics to quantify robustness; their validity depends on the assumption that random perturbations adequately model actual sensor noise.
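
The decoupling in the first assumption can be illustrated with a small sketch: the acoustic transfer functions (ATFs) are set once at the anchor and never change, while only the coordinate passed to the filter generator is perturbed. The single-frequency setup, the four-loudspeaker `toy_filters` map, the made-up ATF values, and the simple bright-to-dark level ratio are all assumptions for illustration; the paper's actual metrics are inter-zone and inter-program isolation.

```python
import cmath
import math

def toy_filters(coord):
    """Hypothetical coordinate-to-filter map: one complex gain per loudspeaker."""
    x, y = coord
    return [cmath.exp(1j * (k + 1) * x) * (1.0 + 0.1 * k * y) for k in range(4)]

def isolation_db(atf_bright, atf_dark, filters):
    """Level difference between bright-zone and dark-zone pressure, in dB."""
    p_bright = sum(h * w for h, w in zip(atf_bright, filters))
    p_dark = sum(h * w for h, w in zip(atf_dark, filters))
    return 10.0 * math.log10(abs(p_bright) ** 2 / abs(p_dark) ** 2)

# ATFs at the physical anchor (made-up values), held fixed for every evaluation.
ATF_BRIGHT = [1.0 + 0.2j, 0.8 - 0.1j, 0.9 + 0.0j, 0.7 + 0.3j]
ATF_DARK = [0.3 - 0.4j, -0.2 + 0.1j, 0.1 + 0.2j, -0.3 - 0.1j]

anchor = (0.30, 0.20)
offsets = [(0.0, 0.0), (0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.0, -0.01)]

# Only the coordinate fed to the filter generator moves; the sound field does not.
results = [
    isolation_db(ATF_BRIGHT, ATF_DARK,
                 toy_filters((anchor[0] + dx, anchor[1] + dy)))
    for dx, dy in offsets
]
```

Any spread in `results` is attributable to the coordinate-to-filter mapping alone, which is the property the protocol relies on.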



[1] [1]

Personal sound,

W. F. Druyvesteyn and J. Garas, “Personal sound,”J. Audio Eng. Soc., vol. 45, no. 9, pp. 685–701, 1997

work page 1997

[2] [2]

Use of the Filtered-X least-mean-squares algorithm to adapt personal sound zones in a car cabin,

L. Vindrola, M. Melon, J.-C. Chamard, and B. Gazengel, “Use of the Filtered-X least-mean-squares algorithm to adapt personal sound zones in a car cabin,”J. Acoust. Soc. Am., vol. 150, no. 3, pp. 1779–1793, Sep. 2021

work page 2021

[3] [3]

Personal sound zones: Delivering interface-free audio to multiple listeners,

T. Betlehem, W. Zhang, M. A. Poletti, and T. D. Abhayapala, “Personal sound zones: Delivering interface-free audio to multiple listeners,”IEEE Signal Process. Mag., vol. 32, no. 2, pp. 81–91, 2015

work page 2015

[4] [4]

Design and evaluation of personal audio systems based on speech privacy constraints,

D. Wallace and J. Cheer, “Design and evaluation of personal audio systems based on speech privacy constraints,”J. Acoust. Soc. Am., vol. 147, no. 4, pp. 2271–2282, 2020

work page 2020

[5] [5]

Living with sound zones: A long-term field study of dynamic sound zones in a domestic context,

R. M. Jacobsen, K. F. Skov, S. S. Johansen, M. B. Skov, and J. Kjeldskov, “Living with sound zones: A long-term field study of dynamic sound zones in a domestic context,” inProc. 2023 CHI Conf. Human Factors in Computing Systems (CHI), New York, NY , USA, 2023, pp. 1–14

work page 2023

[6] [6]

Sound field reproduction using planar and linear arrays of loudspeakers,

J. Ahrens and S. Spors, “Sound field reproduction using planar and linear arrays of loudspeakers,”IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2038–2050, 2010

work page 2038

[7] [7]

General metatheory of auditory localization,

M. A. Gerzon, “General metatheory of auditory localization,” inProc. Audio Eng. Soc. 92nd Conv., Vienna, Austria, 1982

work page 1982

[8] [8]

Acoustic control by wave field synthesis,

A. J. Berkhout, D. de Vries, and P. V ogel, “Acoustic control by wave field synthesis,”J. Acoust. Soc. Am., vol. 93, no. 5, pp. 2764–2778, 1993

work page 1993

[9] [9]

Reproduction of a plane-wave sound field using an array of loudspeakers,

D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave sound field using an array of loudspeakers,”IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 697–707, 2001

work page 2001

[10] [10]

Three-dimensional surround sound systems based on spherical harmonics,

M. A. Poletti, “Three-dimensional surround sound systems based on spherical harmonics,”J. Audio Eng. Soc., vol. 53, no. 11, pp. 1004– 1025, 2005

work page 2005

[11] [11]

Generation of an acoustically bright zone with an illuminated region using multiple sources,

J.-W. Choi and Y .-H. Kim, “Generation of an acoustically bright zone with an illuminated region using multiple sources,”J. Acoust. Soc. Am., vol. 111, no. 4, pp. 1695–1700, 2002

work page 2002

[12] [12]

A realization of sound focused personal audio system using acoustic contrast control,

J.-H. Chang, C.-H. Lee, J.-Y . Park, and Y .-H. Kim, “A realization of sound focused personal audio system using acoustic contrast control,” J. Acoust. Soc. Am., vol. 125, no. 4, pp. 2091–2097, 2009

work page 2091

[13] [13]

Spatial multizone soundfield reproduc- tion: Theory and design,

Y . J. Wu and T. D. Abhayapala, “Spatial multizone soundfield reproduc- tion: Theory and design,”IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 6, pp. 1711–1720, 2011

work page 2011

[14] [14]

Weighted pressure matching with windowed targets for personal sound zones,

V . Mol ´es-Cases, S. J. Elliott, J. Cheer, G. Pi ˜nero, and A. Gonzalez, “Weighted pressure matching with windowed targets for personal sound zones,”J. Acoust. Soc. Am., vol. 151, no. 1, pp. 334–345, 2022

work page 2022

[15] [15]

Design and implemen- tation of a car cabin personal audio system,

J. Cheer, S. J. Elliott, and M. F. Sim ´on G´alvez, “Design and implemen- tation of a car cabin personal audio system,”J. Audio Eng. Soc., vol. 61, no. 6, pp. 412–424, 2013

work page 2013

[16] [16]

Controlled sound field with a dual layer loudspeaker array,

M. Shin, F. M. Fazi, P. A. Nelson, and F. C. Hirono, “Controlled sound field with a dual layer loudspeaker array,”J. Sound Vib., vol. 333, no. 16, pp. 3794–3817, 2014

work page 2014

[17] [17]

Robustness and regularization of personal audio systems,

S. J. Elliott, J. Cheer, J.-W. Choi, and Y . Kim, “Robustness and regularization of personal audio systems,”IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 2123–2133, 2012

work page 2012

[18] S. Doclo and M. Moonen, “Design of broadband beamformers robust against gain and phase errors in the microphone array characteristics,” IEEE Trans. Signal Process., vol. 51, no. 10, pp. 2511–2526, 2003

[19] M. R. Bai and C.-C. Chen, “Regularization using Monte Carlo simulation to make optimal beamformers robust to system perturbations,” J. Acoust. Soc. Am., vol. 135, no. 5, pp. 2808–2820, 2014

[20] Q. Zhu, P. Coleman, M. Wu, and J. Yang, “Robust acoustic contrast control with reduced in-situ measurement by acoustic modelling,” J. Audio Eng. Soc., vol. 65, no. 6, pp. 460–473, 2017

[21] J. Zhang, L. Shi, M. G. Christensen, W. Zhang, L. Zhang, and J. Chen, “CGMM-based sound zone generation using robust pressure matching with ATF perturbation constraints,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 31, pp. 3331–3345, 2023

[22] Q. Zhu, P. Coleman, M. Wu, and J. Yang, “Robust reproduction of sound zones with local sound orientation,” J. Acoust. Soc. Am., vol. 142, no. 1, pp. EL118–EL122, 2017

[23] V. Molés-Cases, G. Piñero, M. de Diego, and A. Gonzalez, “Personal sound zones by subband filtering and time domain optimization,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 2684–2696, 2020

[24] J. Tang, W. Zhu, and X. Li, “Personal sound zones in the short-time Fourier transform domain with relaxed reverberation,” J. Acoust. Soc. Am., vol. 157, no. 2, pp. 778–796, 2025

[25] G. Pepe, L. Gabrielli, S. Squartini, C. Tripodi, and N. Strozzi, “Digital filters design for personal sound zones: A neural approach,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Padua, Italy, 2022

[26] Y. Qiao and E. Y. Choueiri, “SANN-PSZ: Spatially adaptive neural network for head-tracked personal sound zones,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 33, pp. 2735–2748, 2025

[27] H. Jiang and E. Y. Choueiri, “Stereo audio rendering for personal sound zones using a binaural spatially adaptive neural network (BSANN),” arXiv preprint, Jan. 2026, arXiv:2601.06621. [Online]. Available: https://arxiv.org/abs/2601.06621

[28] Y. Qiao, L. Guadagnin, and E. Y. Choueiri, “Isolation performance metrics for personal sound zone reproduction systems,” JASA Express Lett., vol. 2, no. 10, p. 104801, 2022

[29] S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017

[30] A. Tarvainen and H. Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” in Advances in Neural Information Processing Systems, vol. 30, 2017

[31] T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii, “Virtual adversarial training: A regularization method for supervised and semi-supervised learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1979–1993, 2019

[32] K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C. Li, “FixMatch: Simplifying semi-supervised learning with consistency and confidence,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 596–608

[33] X. Hu, J. Li, S. Zhang, S. Goetz, L. Picinali, O. B. Akan, and A. O. T. Hogg, “HRTFformer: A spatially-aware transformer for personalized HRTF upsampling in immersive audio rendering,” 2025, arXiv:2510.01891. [Online]. Available: https://arxiv.org/abs/2510.01891