Neighbor-Consistent Neural Filters for Robust Personal Sound Zones Under Localization Uncertainty
Pith reviewed 2026-05-22 03:17 UTC · model grok-4.3
The pith
Neighbor consistency regularization stabilizes personal sound zone filters against localization uncertainty
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neighbor consistency regularization applied during training of coordinate-conditioned neural networks reduces the root-mean-square variation rate of generated filters by up to 55.9 percent in the woofer band and 30.3 percent in the tweeter band while largely preserving isolation quality and improving lower-tail robustness; physical measurements with a 24-driver array show up to 16.9 percent better worst-case neighborhood isolation and up to 61.8 percent lower spatial variation rates.
What carries the argument
Neighbor-consistency regularization term that penalizes differences between filters generated at an anchor coordinate and at randomly sampled neighboring coordinates during training of the neural network.
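A minimal NumPy sketch of such a penalty follows. It is not the authors' code. It assumes a squared-L2 distance between filters and uniform sampling inside a ball of radius `radius` around the anchor. `filter_net`, `sample_neighbors`, and `neighbor_consistency_loss` are hypothetical names. In an autodiff framework the same expression would be differentiated with respect to the network weights; here it only evaluates the penalty.

```python
import numpy as np

def sample_neighbors(rng, anchor, radius, n):
    """Draw n coordinates uniformly inside a ball of the given radius around anchor."""
    dim = anchor.shape[-1]
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # r = radius * U**(1/dim) spreads points uniformly in volume instead of clustering them at the center
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / dim)
    return anchor + directions * radii

def neighbor_consistency_loss(filter_net, anchor, radius, n_neighbors, rng):
    """Mean squared L2 distance between the anchor filter and filters at random neighbors."""
    w_anchor = filter_net(anchor[None, :])             # shape (1, n_taps)
    neighbors = sample_neighbors(rng, anchor, radius, n_neighbors)
    w_neighbors = filter_net(neighbors)                # shape (n_neighbors, n_taps)
    return float(np.mean(np.sum((w_neighbors - w_anchor) ** 2, axis=1)))
```

In training, this penalty would be added to the PSZ design loss with a weight playing the role of the regularization strength. For a filter generator that is linear in the coordinates, the penalty grows with the square of the radius.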
Load-bearing premise
Penalizing filter differences only at randomly sampled neighboring coordinates during training will produce stable behavior under the distribution of real-world localization noise without changes to acoustic transfer functions or array geometry.
What would settle it
Apply localization perturbations drawn from a distribution different from the random sampling used in training, such as systematic optical distortion or occlusion-induced bias, and measure whether variation rates and isolation degrade.
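One way to set up this check is sketched below, under stated assumptions. Evaluation offsets are drawn from a distribution the training penalty never saw: a fixed bias plus Gaussian jitter, used here as a hypothetical stand-in for systematic optical distortion. The resulting filter change is then compared against offsets drawn from a training-style uniform ball. `ball_offsets`, `biased_offsets`, and `rms_filter_change` are illustrative names. A real test would pass the shifted offsets through the full isolation evaluation rather than measuring filter change alone.

```python
import numpy as np

def ball_offsets(rng, n, radius, dim=3):
    """Training-style perturbations: uniform inside a ball (assumed training distribution)."""
    d = rng.normal(size=(n, dim))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    return d * radius * rng.uniform(size=(n, 1)) ** (1.0 / dim)

def biased_offsets(rng, n, bias, sigma):
    """Hypothetical systematic-distortion model: a fixed offset plus isotropic Gaussian jitter."""
    bias = np.asarray(bias, dtype=float)
    return bias + sigma * rng.normal(size=(n, bias.size))

def rms_filter_change(filter_net, anchor, offsets):
    """RMS norm of the filter change produced by the coordinate offsets."""
    w0 = filter_net(anchor[None, :])
    dw = filter_net(anchor + offsets) - w0
    return float(np.sqrt(np.mean(np.sum(dw ** 2, axis=1))))
```

If isolation and variation rates stay stable when `biased_offsets` replaces `ball_offsets` at evaluation time, that would support the load-bearing premise. A sharp degradation would count against it.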
Original abstract
Coordinate-conditioned neural networks can generate head-tracked personal sound zone (PSZ) loudspeaker filters in real time, but they are sensitive to localization uncertainty. Small fluctuations in estimated listener coordinates, caused by optical distortion, temporary occlusions, or tracking jitter, may produce large filter changes even when listeners are physically stationary. This paper proposes neighbor-consistent neural filters that regularize the coordinate-to-filter mapping by penalizing filter differences at randomly perturbed neighboring coordinates during training. To evaluate robustness against tracking noise, we introduce a decoupled protocol that fixes the acoustic transfer functions at a physical anchor while perturbing only the coordinate inputs used for filter generation. Isolation quality and local stability are evaluated using neighborhood median and lower-tail statistics of inter-zone and inter-program isolation, together with spatial variation rates that quantify metric sensitivity within a coordinate neighborhood. In simulation with a split-band woofer-tweeter system and 25 randomly sampled anchor positions, neighbor consistency reduces the root-mean-square (RMS) variation rate by up to 55.9% in the woofer band and 30.3% in the tweeter band while largely preserving isolation quality and improving lower-tail robustness. In in-situ measurements using a 24-driver array and two stationary head-and-torso simulators, the proposed regularization improves worst-case neighborhood isolation by up to 16.9% and reduces spatial variation rates by up to 61.8%. These results demonstrate that neighbor-consistency regularization effectively stabilizes PSZ rendering under localization uncertainty.
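The decoupled protocol can be sketched for a single frequency bin as follows. Several details are assumptions, not taken from the paper. The ATFs at the anchor are complex matrices `H_bright` and `H_dark` that map driver weights to control-point pressures. Isolation is the bright-to-dark energy ratio in dB. The lower tail is the 5th percentile. The variation rate is the absolute isolation change divided by the coordinate displacement. The paper distinguishes inter-zone from inter-program isolation; this sketch collapses both into one generic ratio. `isolation_db` and `decoupled_eval` are hypothetical names.

```python
import numpy as np

def isolation_db(H_bright, H_dark, q):
    """Bright-to-dark acoustic energy ratio in dB for driver weights q at one frequency."""
    e_bright = np.sum(np.abs(H_bright @ q) ** 2)
    e_dark = np.sum(np.abs(H_dark @ q) ** 2)
    return 10.0 * np.log10(e_bright / e_dark)

def decoupled_eval(filter_net, H_bright, H_dark, anchor, offsets, tail_pct=5.0):
    """Decoupled protocol: ATFs stay fixed at the anchor; only the filter input coordinate moves."""
    iso_ref = isolation_db(H_bright, H_dark, filter_net(anchor))
    iso = np.array([isolation_db(H_bright, H_dark, filter_net(anchor + d)) for d in offsets])
    # Hypothetical variation rate: isolation change (dB) per unit coordinate displacement.
    rates = np.abs(iso - iso_ref) / np.linalg.norm(offsets, axis=1)
    return {
        "median_db": float(np.median(iso)),
        "lower_tail_db": float(np.percentile(iso, tail_pct)),
        "rms_variation_rate": float(np.sqrt(np.mean(rates ** 2))),
    }
```

Offsets must be nonzero, because each rate divides by the displacement. Holding `H_bright` and `H_dark` fixed is what isolates coordinate-to-filter sensitivity from genuine acoustic changes at the perturbed positions.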
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes neighbor-consistent neural filters for head-tracked personal sound zones (PSZ) by adding a regularization term that penalizes filter differences at randomly perturbed neighboring listener coordinates during training. This aims to reduce sensitivity of the coordinate-to-filter mapping to localization uncertainty from optical distortion, occlusions, or jitter. A decoupled evaluation protocol is introduced that holds acoustic transfer functions fixed at physical anchors while perturbing only the coordinate inputs. Quantitative results are reported from simulation (split-band woofer-tweeter system, 25 anchor positions), showing up to 55.9% and 30.3% reductions in RMS variation rate for the woofer and tweeter bands while largely preserving isolation quality, and from in-situ measurements (24-driver array, two head-and-torso simulators), showing up to 16.9% improvement in worst-case neighborhood isolation and up to 61.8% reduction in spatial variation rates.
Significance. If the central claim holds, the work offers a practical, low-overhead regularization for stabilizing real-time PSZ rendering under realistic tracking noise without altering array geometry or acoustic transfer functions. The decoupled evaluation protocol is a useful methodological contribution for isolating coordinate sensitivity. Credit is due for combining simulation across multiple anchors with in-situ measurements using stationary simulators and for reporting both median and lower-tail neighborhood statistics. The approach could support more reliable deployment of head-tracked PSZ systems in consumer or automotive settings where localization jitter is common.
major comments (2)
- [§4] Training and regularization: The neighbor-consistency loss penalizes filter differences at randomly sampled coordinates within a perturbation radius, but the manuscript provides no quantitative comparison between the distribution of these random perturbations and the actual statistics (bias, variance, directional correlation) of localization errors measured from the optical tracking system. The central robustness claim therefore rests on an unverified assumption that random sampling reproduces real-world error characteristics.
- [§5.2] Decoupled evaluation protocol: While the protocol correctly isolates coordinate-to-filter sensitivity by fixing ATFs at physical anchors, it does not include a sensitivity analysis or ablation on perturbation radius or sampling strategy. If real localization errors exhibit larger excursions or structured biases (e.g., from occlusions) than the training distribution, the reported reductions in RMS variation rate (55.9% woofer, 30.3% tweeter) and the improvement in worst-case neighborhood isolation (16.9%) may not generalize.
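A radius ablation of the kind requested above could start from a deterministic probe set. The sketch below uses hypothetical helpers `axis_probes`, `filter_variation_rate`, and `radius_sweep`. It probes ±r along each coordinate axis and reports a displacement-normalized filter variation rate for several radii. A locally linear mapping gives the same rate at every radius, so a rate that grows once r exceeds the training radius could flag the kind of non-generalization the comment raises. Axis-aligned probes are a simplification; structured biases would need their own probe directions.

```python
import numpy as np

def axis_probes(radius, dim=3):
    """Deterministic probe offsets: +/- radius along each coordinate axis (2 * dim points)."""
    eye = np.eye(dim)
    return radius * np.vstack([eye, -eye])

def filter_variation_rate(filter_net, anchor, offsets):
    """RMS over probes of ||w(anchor + d) - w(anchor)|| / ||d||."""
    w0 = filter_net(anchor[None, :])
    dw = np.linalg.norm(filter_net(anchor + offsets) - w0, axis=1)
    return float(np.sqrt(np.mean((dw / np.linalg.norm(offsets, axis=1)) ** 2)))

def radius_sweep(filter_net, anchor, radii):
    """Variation rate per probe radius, e.g. from below to beyond the training radius."""
    return {r: filter_variation_rate(filter_net, anchor, axis_probes(r)) for r in radii}
```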
minor comments (2)
- [Abstract / §5.1] The abstract and §5.1 refer to '25 randomly sampled anchor positions' and '24-driver array' without specifying the exact coordinate ranges or array geometry; adding a brief table or figure reference would improve reproducibility.
- [§3 / §5] Notation for the regularization strength and perturbation radius is introduced but not consistently labeled across equations and experimental tables; a single symbol table would aid clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. We provide point-by-point responses to the major comments below. We will make revisions to address the concerns where feasible, strengthening the presentation of our methods and results.
Point-by-point responses
- Referee: [§4] Training and regularization: The neighbor-consistency loss penalizes filter differences at randomly sampled coordinates within a perturbation radius, but the manuscript provides no quantitative comparison between the distribution of these random perturbations and the actual statistics (bias, variance, directional correlation) of localization errors measured from the optical tracking system. The central robustness claim therefore rests on an unverified assumption that random sampling reproduces real-world error characteristics.
Authors: We acknowledge the value of a direct comparison between the training perturbation distribution and empirical localization error statistics from the optical tracking system. In this study, the random perturbations were chosen to model small-scale uncertainties commonly encountered in head-tracking applications, such as jitter and minor distortions, without introducing specific biases. The decoupled evaluation uses the same perturbation model to assess robustness. While we did not perform a quantitative match to measured error distributions in the current work, we will revise §4 to provide a more explicit rationale for the uniform random sampling approach and discuss its relation to typical tracking errors, thereby clarifying the assumptions underlying the robustness claims. revision: partial
- Referee: [§5.2] Decoupled evaluation protocol: While the protocol correctly isolates coordinate-to-filter sensitivity by fixing ATFs at physical anchors, it does not include a sensitivity analysis or ablation on perturbation radius or sampling strategy. If real localization errors exhibit larger excursions or structured biases (e.g., from occlusions) than the training distribution, the reported reductions in RMS variation rate (55.9% woofer, 30.3% tweeter) and the improvement in worst-case neighborhood isolation (16.9%) may not generalize.
Authors: We agree that a sensitivity analysis regarding the perturbation radius and sampling strategy would be beneficial for assessing the generalizability of our results. The radius was selected to reflect realistic levels of localization uncertainty in our experimental setup, and uniform sampling was used to avoid directional assumptions. The improvements in RMS variation rates and isolation metrics were observed consistently across the tested conditions. In the revised manuscript, we will incorporate an ablation study or additional analysis on varying perturbation radii to demonstrate the sensitivity and support the reported performance gains. revision: yes
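The comparison requested in the first comment could be computed roughly as follows. That comparison sets the bias, variance, and directional correlation of measured tracker errors against the training perturbations. The sketch assumes a log of tracker estimates recorded while a listener (or simulator) is held at a known position. `tracker_error_stats`, `ball_covariance`, and `matched_radius` are hypothetical names. For perturbations uniform inside a d-dimensional ball of radius R, the covariance is R^2/(d+2) times the identity. A strongly anisotropic measured covariance, or a bias that is large relative to R, would therefore indicate a mismatch with isotropic uniform-ball training perturbations.

```python
import numpy as np

def tracker_error_stats(estimates, ground_truth):
    """Bias, covariance, and anisotropy of tracker errors for a stationary listener."""
    err = np.asarray(estimates, dtype=float) - np.asarray(ground_truth, dtype=float)
    bias = err.mean(axis=0)
    cov = np.cov(err, rowvar=False)
    eig = np.linalg.eigvalsh(cov)              # eigenvalues in ascending order
    anisotropy = float(eig[-1] / eig[0])       # 1.0 means isotropic; undefined if eig[0] == 0
    return bias, cov, anisotropy

def ball_covariance(radius, dim=3):
    """Covariance of a uniform distribution inside a dim-ball: R^2 / (dim + 2) * I."""
    return radius ** 2 / (dim + 2) * np.eye(dim)

def matched_radius(cov, dim=3):
    """Ball radius whose covariance trace equals trace(cov) (one possible matching rule)."""
    return float(np.sqrt((dim + 2) * np.trace(cov) / dim))
```

Matching on the trace is only one choice. It equalizes total error power but deliberately ignores the directional structure, which the anisotropy ratio reports separately.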
Circularity Check
No significant circularity; regularization and metrics are independent
Full rationale
The paper defines neighbor consistency as an explicit regularization term added to the training loss that penalizes filter differences at randomly sampled neighboring coordinates. The claimed reductions in RMS variation rate (up to 55.9% woofer, 30.3% tweeter) and the improvement in worst-case neighborhood isolation (up to 16.9%) are obtained from a separate decoupled evaluation protocol. That protocol holds acoustic transfer functions fixed at physical anchors while perturbing only the coordinate inputs, then computes neighborhood median and lower-tail statistics and spatial variation rates on held-out positions. These evaluation quantities are not algebraically or statistically identical to the training penalty; the method could have produced no improvement or even degradation. No self-citations, uniqueness theorems, or fitted parameters renamed as predictions appear in the derivation. The chain is therefore a self-contained empirical regularization followed by independent measurement.
Axiom & Free-Parameter Ledger
free parameters (2)
- regularization strength
- perturbation radius
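The two free parameters can be placed in the training objective as sketched below. The rebuttal states that perturbations are sampled uniformly; the squared-L2 norm and the ball-shaped region B_r are assumptions. Here w_θ(p) is the filter generated for coordinate p, L_PSZ is the unregularized design loss, λ is the regularization strength, r is the perturbation radius, and d is the coordinate dimension (all hypothetical notation). A first-order expansion in δ with Jacobian J = ∂w_θ/∂p, combined with E[δδᵀ] = r²/(d+2)·I for the uniform ball, shows that for small radii the penalty acts like a Frobenius-norm Jacobian penalty with effective weight λr²/(d+2). To this order, the two parameters are therefore not independent.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{PSZ}}(\theta)
  + \lambda\,\mathbb{E}_{\mathbf{p}}\,\mathbb{E}_{\boldsymbol{\delta}\sim\mathcal{U}(\mathcal{B}_r)}
    \big\lVert \mathbf{w}_\theta(\mathbf{p}+\boldsymbol{\delta}) - \mathbf{w}_\theta(\mathbf{p}) \big\rVert_2^2

\mathbb{E}_{\boldsymbol{\delta}}\big\lVert \mathbf{w}_\theta(\mathbf{p}+\boldsymbol{\delta}) - \mathbf{w}_\theta(\mathbf{p}) \big\rVert_2^2
  \approx \mathbb{E}_{\boldsymbol{\delta}}\big[\boldsymbol{\delta}^{\top}\mathbf{J}^{\top}\mathbf{J}\,\boldsymbol{\delta}\big]
  = \operatorname{tr}\!\big(\mathbf{J}^{\top}\mathbf{J}\,\mathbb{E}[\boldsymbol{\delta}\boldsymbol{\delta}^{\top}]\big)
  = \frac{r^{2}}{d+2}\,\big\lVert \mathbf{J}(\mathbf{p}) \big\rVert_F^{2}
```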
axioms (2)
- domain assumption: Acoustic transfer functions remain fixed when only coordinate inputs are perturbed in the decoupled evaluation protocol.
- domain assumption: Neighborhood median and lower-tail isolation statistics, computed over the sampled coordinate neighborhood, are representative of performance under real-world tracking-error distributions.