arxiv: 2509.02571 · v2 · submitted 2025-08-20 · 📡 eess.AS · cs.AI · cs.LG · cs.SD · eess.SP

Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening

Pith reviewed 2026-05-18 22:28 UTC · model grok-4.3

classification: 📡 eess.AS · cs.AI · cs.LG · cs.SD · eess.SP
keywords: Gaussian process regression, steering vectors, physics-aware kernels, neural fields, speech enhancement, binaural rendering, augmented listening, array signal processing

The pith

Gaussian process regression with a physics-aware composite kernel produces continuous steering vector models from far fewer measurements than deterministic super-resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that steering vectors, which describe how a microphone array responds to sound from different directions, can be represented continuously over frequency and positions by embedding a neural field into a Gaussian process. The key step is a composite kernel that explicitly accounts for incoming directional waves and the scattering they undergo, turning the problem into a probabilistic regression that naturally handles uneven measurement uncertainty. A reader would care because this setup supports augmented listening applications such as speech enhancement and binaural rendering, reaching oracle-level performance in simulated SPEAR challenge data while requiring less than one-tenth the usual number of real measurements. The probabilistic framing replaces overfitting-prone deterministic upsampling with uncertainty-aware interpolation.

Core claim

We integrate an expressive representation based on the neural field into the principled probabilistic framework based on the Gaussian process. Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Comprehensive experiments show that the resulting method attains oracle performances in downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, with less than ten times fewer measurements.

What carries the argument

The physics-aware composite kernel inside the Gaussian process regression, which separately encodes directional wave propagation and subsequent scattering to produce continuous, uncertainty-aware steering-vector fields.
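
To make the mechanism concrete, here is a minimal NumPy sketch of GP regression with a composite kernel. It is not the authors' kernel. The directional term `directional_kernel` (an exponentiated cosine of the angle between directions), the feature term `feature_kernel` (an RBF on the output of a fixed random `tanh` layer standing in for a trained neural field), their additive combination, and every hyperparameter are assumptions chosen for illustration. The data are real-valued toy samples, not complex steering vectors.

```python
import numpy as np

def unit_dirs(azimuth_rad):
    """Map azimuth angles to unit direction vectors in the horizontal plane."""
    return np.stack([np.cos(azimuth_rad), np.sin(azimuth_rad)], axis=-1)

def directional_kernel(U, V, kappa=4.0):
    # Assumed directional term: correlation decays with the angle between directions.
    return np.exp(kappa * (U @ V.T - 1.0))

def feature_kernel(U, V, W, length=1.0):
    # Assumed "deep" term: RBF on features of a fixed random layer (not a trained network).
    Fu, Fv = np.tanh(U @ W), np.tanh(V @ W)
    d2 = ((Fu[:, None, :] - Fv[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def composite_kernel(U, V, W):
    # A sum of two positive semi-definite kernels is again a valid kernel.
    return directional_kernel(U, V) + feature_kernel(U, V, W)

def gp_posterior(U_train, y_train, U_query, W, noise_var=1e-4):
    """Standard GP posterior mean and variance via a Cholesky factorization."""
    K = composite_kernel(U_train, U_train, W) + noise_var * np.eye(len(U_train))
    Ks = composite_kernel(U_query, U_train, W)
    Kss_diag = np.diag(composite_kernel(U_query, U_query, W))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, Kss_diag - (v**2).sum(axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 16))            # fixed random weights, purely illustrative
az_train = np.linspace(0.0, np.pi, 8)   # sparse measurements on a half circle
y_train = np.cos(2.0 * az_train)        # toy real-valued response
# Query one measured direction and one direction far from all measurements.
az_query = np.array([az_train[3], 1.5 * np.pi])
mean, var = gp_posterior(unit_dirs(az_train), y_train, unit_dirs(az_query), W)
```

In this toy setting the posterior variance is close to the noise level at the measured direction and much larger on the unmeasured side of the circle, which is the kind of uncertainty-aware interpolation the summary describes.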

If this is right

  • Steering vectors become available as continuous functions of frequency, microphone position, and source direction rather than discrete tables.
  • Probabilistic regression replaces point-wise super-resolution, automatically down-weighting regions of high measurement uncertainty.
  • Downstream spatial filters and binaural renderers reach oracle quality with substantially sparser real-world calibration data.
  • User-parameterized control of reproduced sound fields becomes practical because the model supports arbitrary query locations without retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same kernel construction could be reused for other array-calibration tasks where scattering dominates, such as near-field source localization.
  • The built-in uncertainty estimates might be used to drive adaptive microphone placement or active sensing loops in future hardware.
  • Replacing the neural-field component with other expressive basis functions would allow direct comparison of representation power within the same probabilistic wrapper.

Load-bearing premise

Non-uniform uncertainty across the measurement space causes deterministic super-resolution to overfit, and the proposed composite kernel plus Gaussian process framework corrects this without adding new modeling errors.

What would settle it

On the SPEAR challenge simulated data, a head-to-head test in which the proposed method fails to match oracle downstream performance when restricted to fewer than one-tenth the usual measurement count would disprove the central claim.

Figures

Figures reproduced from arXiv: 2509.02571 by Bando Yoshiaki (AIST), Diego Di Carlo (RIKEN AIP), Fontaine Mathieu (S2A, IDS), Nugraha Aditya Arie (RIKEN AIP), Shoichi Koyama (UTokyo), Yoshii Kazuyoshi (RIKEN AIP).

Figure 1. A typical remixing workflow of augmented listening. Semantic and spatial information of the audio scene can be modified, but the spatial content must remain coherent to convey realism.
… hearing-impaired people to hear better what they attend to in real noisy echoic situations. As depicted in …
Figure 2. Comparison between measured and algebraic steering vectors on the azimuthal plane, available in the SPEAR Challenge data.
…tion of the incoming sound [6]. Their estimation has been a key technique in spatial audio processing such as speech enhancement [9], sound source localization [10], and sound scene synthesis [11]. In practice, SVs encompass both the room impulse response and the listener’s head-relat…
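
For readers unfamiliar with the "algebraic" steering vector that Figure 2 compares against measurements, the sketch below shows the common free-field, far-field plane-wave form. It is an assumption that the paper uses this exact form. The sign convention, the speed of sound, and the hypothetical 4-microphone linear array are illustrative choices, not values from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def algebraic_steering_vector(mic_positions, azimuth_rad, freq_hz, c=SPEED_OF_SOUND):
    """Free-field far-field steering vector, one complex entry per microphone."""
    # Unit vector pointing from the array origin towards the source (horizontal plane).
    u = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad), 0.0])
    # Microphones further along u receive the wavefront earlier (negative relative delay).
    delays = -(mic_positions @ u) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

# Hypothetical linear array along the x axis with 5 cm spacing.
mics = np.array([[0.05 * m, 0.0, 0.0] for m in range(4)])
a_broadside = algebraic_steering_vector(mics, np.pi / 2, 2500.0)
a_endfire = algebraic_steering_vector(mics, 0.0, 2500.0)
```

This model has unit magnitude at every microphone and ignores scattering by the head or device, which is the gap between algebraic and measured steering vectors that the paper addresses.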
Figure 3. Measurement grid and reference system used in this study and illustration of the steering vector interpolation problem.
… III. PROPOSED METHOD. In the frequency domain, the homogeneous Helmholtz equation describes the evolution of the complex acoustic pressure field $h \in \mathbb{C}$ as a function of position $q \in \mathbb{R}^3$ and the angular frequency $\omega \in \mathbb{R}$ as
$$\nabla^2 h(\omega, q) + \frac{\omega^2}{c^2} h(\omega, q) = 0, \quad (1)$$
where $\nabla^2$ is the 3-dimensional L…
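
As a worked check of the homogeneous Helmholtz equation (1) quoted in this excerpt, a single plane wave is a solution whenever its wave vector has norm $\omega / c$. The wave vector $k$ and the sign convention of the exponent are assumptions introduced for this illustration only.

```latex
\begin{align}
  h(\omega, q) &= e^{-j k^\top q}, \qquad \lVert k \rVert = \frac{\omega}{c}, \\
  \nabla^2 h(\omega, q) &= (-j)^2 \lVert k \rVert^2 e^{-j k^\top q}
                         = -\frac{\omega^2}{c^2}\, h(\omega, q), \\
  \nabla^2 h(\omega, q) + \frac{\omega^2}{c^2}\, h(\omega, q) &= 0 .
\end{align}
```

Because the equation is linear, any superposition of such plane waves from different directions also satisfies it, which is why directional incoming waves are a natural building block for a physics-aware model.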
Figure 4. Different upsampling models based on Neural Fields (NFs) considered in this work. The green area denotes the proposed model.
Figure 5. Figure 5a shows the Mollweide projection of observed coordinates for two upsampling factors against the observation coordinates in the EasyCom dataset. Stars and colors denote clusters and centroids for a quasi-uniform sampling. Figure 5b illustrates the sampling strategy to select the validation data.
… To quantify the phase reconstruction in the time domain and the spatial similarity of the filters, we …
Figure 6. Interpolation results: normalized mean squared error (left) and cosine similarity per number of observed directions.
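
The two interpolation metrics named in Figure 6 can be sketched as follows. These are the usual textbook definitions for complex vectors. The paper's exact normalization and averaging are not shown on this page, so treat the details as assumptions.

```python
import numpy as np

def nmse_db(estimate, reference):
    """Normalized mean squared error in dB (lower is better)."""
    err = np.sum(np.abs(estimate - reference) ** 2)
    ref = np.sum(np.abs(reference) ** 2)
    return 10.0 * np.log10(err / ref)

def cosine_similarity(estimate, reference):
    """Magnitude of the normalized complex inner product (1 means same direction)."""
    num = np.abs(np.vdot(reference, estimate))  # vdot conjugates its first argument
    den = np.linalg.norm(estimate) * np.linalg.norm(reference)
    return num / den

# Toy example: an estimate that is 10% too large in amplitude but has perfect phase.
reference = np.exp(1j * np.linspace(0.0, np.pi, 6))
estimate = 1.1 * reference
nmse_value = nmse_db(estimate, reference)
csim_value = cosine_similarity(estimate, reference)
```

The toy case shows why both metrics are reported: a pure gain error is penalized by the nMSE but leaves the cosine similarity at 1.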
Figure 7. Qualitative results of interpolation: real part of the reconstructed steering vector at channel 2 at 2.5 kHz for 3 sampling factors and different methods. On the left, the ground-truth data and the steering vector are computed with the algebraic model.
[Plot panels: 8 → 1020, 16 → 1020, 32 → 1020; axes: azimuth ϑ [°], elevation ϕ [°].]
Figure 8. Uncertainty quantification as standard deviation of the predicted steering vector at channel 2 at 2.5 kHz of the proposed model (GP-Steerer) for different upsampling factors. White dots denote measurement locations.
… tasks like frontal beamforming. However, their performance rapidly degrades with distance. In contrast, learning-based models with embedded priors (NF-GW, PINN, GP-Chmat) generalize better at…
Figure 9. Figure 9 (bottom) shows the nMSE averaged across configurations over positive frequency bins. As expected, performance degrades with frequency, reflecting the spatial resolution limits governed by sampling density [75]. The SH baseline performs well at low frequencies, interpolating below 2 kHz with nMSE better than −15 dB using 64 observations, but overfits at higher frequencies, introducing spurious artifacts (p…
Figure 10. Relationship between interpolation quality (nMSE, CSIM) and speech enhancement performance (SDR, SAR, ISR, fwSegSNR, PESQ, MBSTOI) across selected methods. Enhancement metrics are reported as improvements over the unprocessed baseline. Marker size reflects the upsampling factor. Horizontal line denotes oracle methods NN-Oracle.
[Plot panel title: Source at 0°.]
Figure 11. Beampatterns of a super-directive MVDR at 2 kHz. Each panel corresponds to a different source direction and upsampling factor.
… interpolation loss or its sensitivity to spectral balance in the covariance model. Despite this, the results confirm that combining GP regression with physically structured kernels can significantly enhance the generalization ability of neural field approaches. Moreover, orac…
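
Figure 11 refers to a super-directive MVDR beamformer. For orientation, the standard MVDR weights are w = R⁻¹a / (aᴴR⁻¹a), where a is the steering vector for the look direction and R is the noise covariance. The sketch below assumes a hypothetical 4-microphone free-field linear array, a spherically diffuse noise coherence with diagonal loading, and a 343 m/s speed of sound. None of these settings come from the paper, which uses measured or interpolated steering vectors of a head-worn array.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def plane_wave_steering(azimuth_rad, freq_hz, mic_xyz):
    """Free-field plane-wave steering vector (illustrative stand-in for measured data)."""
    u = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad), 0.0])
    return np.exp(2j * np.pi * freq_hz * (mic_xyz @ u) / C)

def diffuse_coherence(freq_hz, mic_xyz, loading=1e-2):
    """Spherically diffuse noise coherence matrix with diagonal loading."""
    dist = np.linalg.norm(mic_xyz[:, None, :] - mic_xyz[None, :, :], axis=-1)
    # np.sinc(x) = sin(pi x) / (pi x), so this is sin(2 pi f d / c) / (2 pi f d / c).
    return np.sinc(2.0 * freq_hz * dist / C) + loading * np.eye(len(mic_xyz))

def mvdr_weights(steering, noise_cov):
    """w = R^{-1} a / (a^H R^{-1} a)."""
    r_inv_a = np.linalg.solve(noise_cov, steering)
    return r_inv_a / np.vdot(steering, r_inv_a)

def beampattern(weights, steering_grid):
    """|w^H a(theta)| for every row a(theta) of steering_grid."""
    return np.abs(steering_grid @ np.conj(weights))

mics = np.array([[0.05 * m, 0.0, 0.0] for m in range(4)])
freq = 2000.0
look = plane_wave_steering(0.0, freq, mics)
w = mvdr_weights(look, diffuse_coherence(freq, mics))
azimuths = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
grid = np.stack([plane_wave_steering(a, freq, mics) for a in azimuths])
pattern = beampattern(w, grid)
```

The beamformer only keeps its unit gain towards the source if the steering vector `a` is accurate, which is why steering-vector interpolation quality shows up directly in beampatterns and enhancement scores.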
Original abstract

This paper investigates continuous representations of steering vectors over frequency and microphone/source positions for augmented listening (e.g., spatial filtering and binaural rendering), enabling user-parameterized control of the reproduced sound field. Steering vectors have typically been used for representing the spatial response of a microphone array as a function of the look-up direction. The basic algebraic representation of these quantities assuming an idealized environment cannot deal with the scattering effect of the sound field. One may thus collect a discrete set of real steering vectors measured in dedicated facilities and super-resolve (i.e., upsample) them. Recently, physics-aware deep learning methods have been effectively used for this purpose. Such deterministic super-resolution, however, suffers from the overfitting problem due to the non-uniform uncertainty over the measurement space. To solve this problem, we integrate an expressive representation based on the neural field (NF) into the principled probabilistic framework based on the Gaussian process (GP). Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Our comprehensive comparative experiment showed the effectiveness of the proposed method under data insufficiency conditions. In downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, the oracle performances were attained with less than ten times fewer measurements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes integrating a neural field representation into a Gaussian process (GP) regression framework for continuous modeling of steering vectors over frequency and microphone/source positions. A physics-aware composite kernel is introduced to capture directional incoming waves and subsequent scattering effects, addressing overfitting in deterministic super-resolution methods caused by non-uniform measurement uncertainty. Comparative experiments under data-insufficient conditions are reported, along with downstream results on the SPEAR challenge simulated data where oracle performance in speech enhancement and binaural rendering is attained with less than ten times fewer measurements.

Significance. If the central results hold after addressing the noted gaps, the work offers a principled probabilistic alternative to deterministic neural-field super-resolution for acoustic steering vectors, potentially reducing measurement burden in augmented listening applications. The explicit incorporation of physics into the kernel while retaining GP uncertainty quantification is a constructive step beyond purely data-driven approaches, and the SPEAR downstream evaluations provide a practical testbed for the method's utility.

major comments (2)
  1. [Kernel parameterization and training (methods section describing the composite kernel)] The central claim that the composite kernel mitigates non-uniform uncertainty without introducing new modeling bias rests on the learned kernel preserving physical structure. No explicit post-training verification is described for properties such as reciprocity, far-field decay, or consistency with the Helmholtz equation; without such checks or regularization, the deep component risks reintroducing flexible fitting to sparse samples, which directly affects the reliability of the reported oracle downstream performance.
  2. [Experimental evaluation and downstream tasks (SPEAR challenge results)] The abstract and results claim oracle-level performance in speech enhancement and binaural rendering with <10x fewer measurements. This requires supporting details on uncertainty calibration, statistical significance testing across runs, and error analysis to confirm the improvement is attributable to the physics-aware GP rather than simulation specifics or lack of baseline regularization.
minor comments (1)
  1. [Methods] Notation for the composite kernel components (e.g., how the directional-wave term and scattering term are combined) could be clarified with an explicit equation early in the methods to aid reproducibility.
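
The post-training check for Helmholtz consistency raised in major comment 1 could look roughly like the sketch below. It estimates the residual of the homogeneous Helmholtz equation with central finite differences. The function name `helmholtz_residual`, the step size, and the two test fields are hypothetical. A real check would evaluate the trained model's predicted field instead of the analytic plane waves used here.

```python
import numpy as np

def helmholtz_residual(field, q, omega, c=343.0, eps=1e-3):
    """Finite-difference estimate of laplacian(h) + (omega / c)^2 * h at position q."""
    laplacian = 0.0
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = eps
        # Second-order central difference along one coordinate axis.
        laplacian = laplacian + (field(q + step) - 2.0 * field(q) + field(q - step)) / eps**2
    return laplacian + (omega / c) ** 2 * field(q)

omega = 2.0 * np.pi * 1000.0
c = 343.0
direction = np.array([1.0, 2.0, 2.0]) / 3.0  # unit propagation direction
k_vec = (omega / c) * direction

def plane_wave(q):
    # Satisfies the Helmholtz equation because |k_vec| = omega / c.
    return np.exp(-1j * (k_vec @ q))

def wrong_wavenumber(q):
    # Violates the equation: the wavenumber is 1.5 times too large.
    return np.exp(-1j * 1.5 * (k_vec @ q))

q0 = np.array([0.1, -0.2, 0.05])
scale = (omega / c) ** 2
rel_ok = abs(helmholtz_residual(plane_wave, q0, omega, c)) / scale
rel_bad = abs(helmholtz_residual(wrong_wavenumber, q0, omega, c)) / scale
```

A relative residual near zero indicates consistency with the wave equation at that point, while a value of order one flags a field that fits samples without respecting the physics.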

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We provide point-by-point responses to the major comments below and outline the revisions we intend to make to address the raised concerns.

  1. Referee: [Kernel parameterization and training (methods section describing the composite kernel)] The central claim that the composite kernel mitigates non-uniform uncertainty without introducing new modeling bias rests on the learned kernel preserving physical structure. No explicit post-training verification is described for properties such as reciprocity, far-field decay, or consistency with the Helmholtz equation; without such checks or regularization, the deep component risks reintroducing flexible fitting to sparse samples, which directly affects the reliability of the reported oracle downstream performance.

    Authors: We agree that explicit post-training verification would strengthen the claim regarding preservation of physical structure. The composite kernel is constructed from physics-aware components intended to capture directional waves and scattering, and the GP framework provides uncertainty quantification that helps mitigate overfitting to non-uniform measurement noise. However, we did not report post-training checks such as reciprocity or Helmholtz consistency in the original manuscript. In the revision we will add these verifications on held-out data together with a regularization term that penalizes deviations from expected physical behavior, thereby reducing the risk that the deep component reintroduces flexible fitting. revision: yes

  2. Referee: [Experimental evaluation and downstream tasks (SPEAR challenge results)] The abstract and results claim oracle-level performance in speech enhancement and binaural rendering with <10x fewer measurements. This requires supporting details on uncertainty calibration, statistical significance testing across runs, and error analysis to confirm the improvement is attributable to the physics-aware GP rather than simulation specifics or lack of baseline regularization.

    Authors: We acknowledge that additional statistical and calibration details are necessary to support the downstream performance claims. The reported oracle-level results on the SPEAR simulated data were obtained under data-insufficient conditions with the proposed physics-aware GP. In the revised manuscript we will include uncertainty calibration metrics, results aggregated over multiple independent runs with statistical significance testing, and a breakdown of errors by frequency and spatial position. These additions will help demonstrate that the observed gains are attributable to the composite kernel and GP uncertainty modeling rather than simulation artifacts. revision: yes
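
One simple form of the uncertainty calibration promised in this response is the empirical coverage of Gaussian predictive intervals. The sketch below uses synthetic data and a hypothetical `empirical_coverage` helper. It does not reproduce any result from the paper, and the paper's eventual calibration metric may differ.

```python
import numpy as np
from statistics import NormalDist

def empirical_coverage(y_true, pred_mean, pred_std, level=0.95):
    """Fraction of targets falling inside the central `level` predictive interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    inside = np.abs(y_true - pred_mean) <= z * pred_std
    return float(np.mean(inside))

# Synthetic check: targets are drawn from the stated predictive distribution.
rng = np.random.default_rng(0)
pred_mean = rng.normal(size=20000)
pred_std = rng.uniform(0.5, 2.0, size=20000)
y_true = pred_mean + pred_std * rng.normal(size=20000)

coverage_ok = empirical_coverage(y_true, pred_mean, pred_std)
# Halving the reported standard deviation simulates an overconfident model.
coverage_overconfident = empirical_coverage(y_true, pred_mean, 0.5 * pred_std)
```

A well-calibrated model gives coverage close to the nominal level, while coverage well below it signals overconfidence.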

Circularity Check

0 steps flagged

No significant circularity; standard GP regression with proposed composite kernel yields independent empirical claims.

full rationale

The derivation introduces a physics-aware composite kernel inside a Gaussian process to model directional waves plus scattering and thereby mitigate non-uniform measurement uncertainty. No quoted equations reduce the reported downstream oracle performance (SPEAR challenge speech enhancement and binaural rendering) to fitted parameters or self-citations by construction. The GP framework is standard, the kernel is presented as an additive modeling choice rather than a tautology, and results are framed as comparative experiments under data insufficiency. No self-citation load-bearing step, uniqueness theorem, or ansatz smuggling is exhibited that collapses the central claim to its inputs. This is the normal self-contained case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters or invented entities; the core modeling assumptions rest on standard wave propagation physics and the validity of the composite kernel form.

axioms (1)
  • domain assumption: Steering vectors can be represented as a continuous function of frequency and spatial positions with separable directional and scattering components.
    Invoked to justify the composite kernel design for modeling real acoustic environments.

pith-pipeline@v0.9.0 · 5817 in / 1247 out tokens · 61645 ms · 2026-05-18T22:28:54.402034+00:00 · methodology

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages

  1. [1]

    Microphone array processing for augmented listening,

    R. M. Corey, “Microphone array processing for augmented listening,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2019

  2. [2]

    Personal sound zones: Delivering interface-free audio to multiple listeners,

    T. Betlehem, W. Zhang, M. A. Poletti, and T. D. Abhayapala, “Personal sound zones: Delivering interface-free audio to multiple listeners,” IEEE Signal Process. Mag., vol. 32, no. 2, pp. 81–91, 2015

  3. [3]

    An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction,

    M. Cobos, J. Ahrens, K. Kowalczyk, and A. Politis, “An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction,” EURASIP J. Audio, Speech, Music Process., vol. 2022, no. 1, p. 10, 2022

  4. [4]

    Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,

    S. Koyama, J. G. Ribeiro, T. Nakamura, N. Ueno, and M. Pezzoli, “Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,” IEEE Signal Process. Mag., vol. 41, no. 6, pp. 60–71, 2025

  5. [5]

    Differentiable artificial reverberation,

    S. Lee, H.-S. Choi, and K. Lee, “Differentiable artificial reverberation,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 30, pp. 2541–2556, 2022

  6. [6]

    H. L. Van Trees, Optimum array processing: Part IV of detection, estimation, and modulation theory. John Wiley & Sons, 2002

  7. [7]

    The SPEAR challenge – review of results,

    V. Tourbabin, P. Guiraud, S. Hafezi, P. A. Naylor, A. H. Moore, J. Donley, and T. Lunner, “The SPEAR challenge – review of results,” in Proc. Forum Acusticum, 2023, pp. 623–629

  8. [8]

    Subspace hybrid beamforming for head-worn microphone arrays,

    S. Hafezi, A. H. Moore, P. Guiraud, P. A. Naylor, J. Donley, V. Tourbabin, and T. Lunner, “Subspace hybrid beamforming for head-worn microphone arrays,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5

  9. [9]

    A consolidated perspective on multimicrophone speech enhancement and source separation,

    S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, “A consolidated perspective on multimicrophone speech enhancement and source separation,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 4, pp. 692–730, 2017

  10. [10]

    Theoretical analysis of beamforming steering vector formulations for acoustic source localization,

    G. Chardon, “Theoretical analysis of beamforming steering vector formulations for acoustic source localization,” J. Sound Vibration, vol. 517, p. 116544, 2022

  11. [11]

    Surround by sound: A review of spatial audio recording and reproduction,

    W. Zhang, P. N. Samarasinghe, H. Chen, and T. D. Abhayapala, “Surround by sound: A review of spatial audio recording and reproduction,” Appl. Sci., vol. 7, no. 5, p. 532, 2017

  12. [12]

    Neural fields in visual computing and beyond,

    Y. Xie, T. Takikawa, S. Saito, O. Litany, S. Yan, N. Khan, F. Tombari, J. Tompkin, V. Sitzmann, and S. Sridhar, “Neural fields in visual computing and beyond,” Comput. Graphics Forum, vol. 41, no. 2, pp. 641–676, 2022

  13. [13]

    Fifty years of artificial reverberation,

    V. Valimaki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, “Fifty years of artificial reverberation,” IEEE Audio, Speech, Language Process., vol. 20, no. 5, pp. 1421–1448, 2012

  14. [14]

    RIR-in-a-Box: Estimating room acoustics from 3D mesh data through shoebox approximation,

    L. Kelley, D. Di Carlo, A. A. Nugraha, M. Fontaine, Y. Bando, and K. Yoshii, “RIR-in-a-Box: Estimating room acoustics from 3D mesh data through shoebox approximation,” in Proc. INTERSPEECH, 2024, pp. 3255–3259

  15. [15]

    Speech dereverberation constrained on room impulse response characteristics,

    L. Bahrman, M. Fontaine, J. Le Roux, and G. Richard, “Speech dereverberation constrained on room impulse response characteristics,” in Proc. INTERSPEECH, 2024, pp. 622–626

  16. [16]

    Neural full-rank spatial covariance analysis for blind source separation,

    Y. Bando, K. Sekiguchi, Y. Masuyama, A. A. Nugraha, M. Fontaine, and K. Yoshii, “Neural full-rank spatial covariance analysis for blind source separation,” IEEE Signal Process. Lett., vol. 28, pp. 1670–1674, 2021

  17. [17]

    Analysis and design of spherical microphone arrays,

    B. Rafaely, “Analysis and design of spherical microphone arrays,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 135–143, 2004

  18. [18]

    Comparison of spherical harmonics and nearest-neighbor based interpolation of head-related transfer functions,

    C. Pörschmann, J. M. Arend, D. Bau, and T. Lübeck, “Comparison of spherical harmonics and nearest-neighbor based interpolation of head-related transfer functions,” in Proc. AES Int. Conf. Audio Virtual Augmented Reality, 2020

  19. [19]

    Gaussian processes for sound field reconstruction,

    D. Caviedes-Nozal, N. A. Riis, F. M. Heuchel, J. Brunskog, P. Gerstoft, and E. Fernandez-Grande, “Gaussian processes for sound field reconstruction,” J. Acoust. Soc. Amer., vol. 149, no. 2, pp. 1107–1119, 2021

  20. [20]

    Sound field estimation based on physics-constrained kernel interpolation adapted to environment,

    J. G. Ribeiro, S. Koyama, R. Horiuchi, and H. Saruwatari, “Sound field estimation based on physics-constrained kernel interpolation adapted to environment,” IEEE/ACM Trans. Audio, Speech, Language Process., 2024

  21. [21]

    Compressive sensing in acoustic imaging,

    N. Bertin, L. Daudet, V. Emiya, and R. Gribonval, “Compressive sensing in acoustic imaging,” in Compressed Sensing and its Applications: MATHEON Workshop 2013. Springer, 2015, pp. 169–192

  22. [22]

    Sparse representation of a spatial sound field in a reverberant environment,

    S. Koyama and L. Daudet, “Sparse representation of a spatial sound field in a reverberant environment,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 1, pp. 172–184, 2019

  23. [23]

    Sparsity-based sound field separation in the spherical harmonics domain,

    M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-based sound field separation in the spherical harmonics domain,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2022, pp. 1051–1055

  24. [24]

    Room impulse response interpolation from a sparse set of measurements using a modal architecture,

    O. Das, P. Calamia, and S. V. A. Gari, “Room impulse response interpolation from a sparse set of measurements using a modal architecture,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021, pp. 960–964

  25. [25]

    Room impulse response interpolation using a sparse spatio-temporal representation of the sound field,

    N. Antonello, E. De Sena, M. Moonen, P. A. Naylor, and T. Van Waterschoot, “Room impulse response interpolation using a sparse spatio-temporal representation of the sound field,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 10, pp. 1929–1941, 2017

  26. [26]

    Sound field reconstruction in rooms: Inpainting meets super-resolution,

    F. Lluis, P. Martinez-Nuevo, M. Bo Møller, and S. Ewan Shepstone, “Sound field reconstruction in rooms: Inpainting meets super-resolution,” J. Acoust. Soc. Amer., vol. 148, no. 2, pp. 649–659, 2020

  27. [27]

    Deep sound field reconstruction in real rooms: Introducing the ISOBEL sound field dataset,

    M. S. Kristoffersen, M. B. Møller, P. Martínez-Nuevo, and J. Østergaard, “Deep sound field reconstruction in real rooms: Introducing the ISOBEL sound field dataset,” arXiv preprint arXiv:2102.06455, 2021

  28. [28]

    Generative adversarial networks with physical sound field priors,

    X. Karakonstantis and E. Fernandez-Grande, “Generative adversarial networks with physical sound field priors,” J. Acoust. Soc. Amer., vol. 154, no. 2, pp. 1226–1238, 2023

  29. [29]

    Deep prior approach for room impulse response reconstruction,

    M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep prior approach for room impulse response reconstruction,” Sensors, vol. 22, no. 7, p. 2710, 2022

  30. [30]

    Reconstruction of sound field through diffusion models,

    F. Miotello, L. Comanducci, M. Pezzoli, A. Bernardini, F. Antonacci, and A. Sarti, “Reconstruction of sound field through diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 1476–1480

  31. [31]

    FRA-RIR: Fast random approximation of the image-source method,

    Y. Luo and J. Yu, “FRA-RIR: Fast random approximation of the image-source method,” in Proc. INTERSPEECH, 2023, pp. 3884–3888

  32. [32]

    Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,

    M. Pezzoli, F. Antonacci, A. Sarti et al., “Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,” in Proc. Forum Acusticum, 2023, pp. 2177–2184

  33. [33]

    Sound field reconstruction using a compact acoustics-informed neural network,

    F. Ma, S. Zhao, and I. S. Burnett, “Sound field reconstruction using a compact acoustics-informed neural network,” J. Acoust. Soc. Amer., vol. 156, no. 3, pp. 2009–2021, 2024

  34. [34]

    Room impulse response reconstruction with physics-informed deep learning,

    X. Karakonstantis, D. Caviedes-Nozal, A. Richard, and E. Fernandez-Grande, “Room impulse response reconstruction with physics-informed deep learning,” J. Acoust. Soc. Amer., vol. 155, no. 2, pp. 1048–1059, 2024

  35. [35]

    Spatio-temporal Bayesian regression for room impulse response reconstruction with spherical waves,

    D. Caviedes-Nozal and E. Fernandez-Grande, “Spatio-temporal Bayesian regression for room impulse response reconstruction with spherical waves,” IEEE/ACM Trans. Audio, Speech, Language Process., 2023

  36. [36]

    Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,

    Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,” in Proc. Int. Workshop Acoust. Signal Enhancement, 2022, pp. 1–5

  37. [37]

    Room impulse response reconstruction using pattern-coupled sparse Bayesian learning with spherical waves,

    X. Feng, J. Cheng, S. Chen, and Y. Shen, “Room impulse response reconstruction using pattern-coupled sparse Bayesian learning with spherical waves,” IEEE Signal Process. Lett., 2024

  38. [38]

    Point neuron learning: A new physics-informed neural network architecture,

    H. Bi and T. D. Abhayapala, “Point neuron learning: A new physics-informed neural network architecture,” EURASIP J. Audio, Speech, Music Process., vol. 2024, no. 1, p. 56, 2024

  39. [39]

    On the apparent Pareto front of physics-informed neural networks,

    F. M. Rohrhofer, S. Posch, C. Gößnitzer, and B. C. Geiger, “On the apparent Pareto front of physics-informed neural networks,” IEEE Access, 2023

  40. [40]

    C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3

  41. [41]

    A review on head-related transfer function generation for spatial audio,

    V. Bruschi, L. Grossi, N. A. Dourou, A. Quattrini, A. Vancheri, T. Leidi, and S. Cecchi, “A review on head-related transfer function generation for spatial audio,” Applied Sciences, vol. 14, no. 23, p. 11242, 2024

  42. [42]

    Recovery of individual head-related transfer functions from a small set of measurements,

    B.-S. Xie, “Recovery of individual head-related transfer functions from a small set of measurements,” J. Acoust. Soc. Amer., vol. 132, no. 1, pp. 282–294, 2012

  43. [43]

    Modeling individual head-related transfer functions from sparse measurements using a convolutional neural network,

    Z. Jiang, J. Sang, C. Zheng, A. Li, and X. Li, “Modeling individual head-related transfer functions from sparse measurements using a convolutional neural network,” J. Acoust. Soc. Amer., vol. 153, no. 1, pp. 248–259, 2023

  44. [44]

    Implicit HRTF modeling using temporal convolutional networks,

    I. D. Gebru, D. Marković, A. Richard, S. Krenn, G. A. Butler, F. De la Torre, and Y. Sheikh, “Implicit HRTF modeling using temporal convolutional networks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021, pp. 3385–3389

  45. [45]

    HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection,

    A. O. Hogg, M. Jenkins, H. Liu, I. Squires, S. J. Cooper, and L. Picinali, “HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection,” IEEE/ACM Trans. Audio, Speech, Language Process., 2024

  46. [46]

    Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features,

    T.-Y. Chen, T.-H. Kuo, and T.-S. Chi, “Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2019, pp. 271–275

  47. [47]

    HRTF field: Unifying measured HRTF magnitude representation with neural fields,

    Y. Zhang, Y. Wang, and Z. Duan, “HRTF field: Unifying measured HRTF magnitude representation with neural fields,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5

  48. [48]

    The manifolds of spatial hearing,

    R. Duraiswami and V. C. Raykar, “The manifolds of spatial hearing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, 2005, pp. iii–285

  49. [49]

    Acoustic space learning for sound-source separation and localization on binaural manifolds,

    A. Deleforge, F. Forbes, and R. Horaud, “Acoustic space learning for sound-source separation and localization on binaural manifolds,” Int. J. of Neural Systems, vol. 25, no. 01, p. 21, 2015

  50. [50]

    Interpolation of head-related transfer functions using manifold learning,

    F. Grijalva, L. C. Martini, D. Florencio, and S. Goldenstein, “Interpolation of head-related transfer functions using manifold learning,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 221–225, 2017

  51. [51]

    Virtual sound source positioning using vector base amplitude panning,

    V. Pulkki, “Virtual sound source positioning using vector base amplitude panning,” Journal of the audio engineering society, vol. 45, no. 6, pp. 456–466, 1997

  52. [52]

    Kernel regression for head-related transfer function interpolation and spectral extrema extraction,

    Y. Luo, D. N. Zotkin, H. Daume, and R. Duraiswami, “Kernel regression for head-related transfer function interpolation and spectral extrema extraction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 256–260

  53. [53]

    Head-related transfer function interpolation with a spherical CNN,

    X. Chen, F. Ma, Y. Zhang, A. Bastine, and P. N. Samarasinghe, “Head-related transfer function interpolation with a spherical CNN,” arXiv preprint arXiv:2309.08290, 2023

  54. [54]

    Regularized HRTF fitting using spherical harmonics,

    D. N. Zotkin, R. Duraiswami, and N. A. Gumerov, “Regularized HRTF fitting using spherical harmonics,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2009, pp. 257–260

  55. [55]

    Interpolation of head-related transfer functions based on the common-acoustical-pole and residue model,

    K. Watanabe, S. Takane, and Y. Suzuki, “Interpolation of head-related transfer functions based on the common-acoustical-pole and residue model,” Acoust. Sci. Technol., vol. 24, no. 5, pp. 335–337, 2003

  56. [56]

    Spatial interpolation of HRTFs approximated by parametric IIR filters,

    P. Nowak and U. Zölzer, “Spatial interpolation of HRTFs approximated by parametric IIR filters,” in Proc. DAGA, 2022

  57. [57]

    Neural Fourier shift for binaural speech rendering,

    J. W. Lee and K. Lee, “Neural Fourier shift for binaural speech rendering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5

  58. [58]

    NIIRF: Neural IIR filter field for HRTF upsampling and personalization,

    Y. Masuyama, G. Wichern, F. G. Germain, Z. Pan, S. Khurana, C. Hori, and J. Le Roux, “NIIRF: Neural IIR filter field for HRTF upsampling and personalization,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 1016–1020

  59. [59]

    Analyzing head-related transfer function measurements using surface spherical harmonics,

    M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” J. Acoust. Soc. Amer., vol. 104, no. 4, pp. 2400–2411, 1998

  60. [60]

    Interpolation and range extrapolation of HRTFs [head related transfer functions],

    R. Duraiswami, D. N. Zotkin, and N. A. Gumerov, “Interpolation and range extrapolation of HRTFs [head related transfer functions],” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 4, 2004, pp. iv–iv

  61. [61]

    HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data,

    J. Ahrens, M. R. Thomas, and I. Tashev, “HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data,” in Proc. Asia Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2012, pp. 1–5

  62. [62]

    Directional equalization of sparse head-related transfer function sets for spatial upsampling,

    C. Pörschmann, J. M. Arend, and F. Brinkmann, “Directional equalization of sparse head-related transfer function sets for spatial upsampling,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 6, pp. 1060–1071, 2019

  63. [63]

    Efficient representation and sparse sampling of head-related transfer functions using phase-correction based on ear alignment,

    Z. Ben-Hur, D. L. Alon, R. Mehra, and B. Rafaely, “Efficient representation and sparse sampling of head-related transfer functions using phase-correction based on ear alignment,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 12, pp. 2249–2262, 2019

  64. [64]

    Assessing spherical harmonics interpolation of time-aligned head-related transfer functions,

    J. M. Arend, F. Brinkmann, and C. Pörschmann, “Assessing spherical harmonics interpolation of time-aligned head-related transfer functions,” J. Audio Eng. Soc., vol. 69, no. 1/2, pp. 104–117, 2021

  65. [65]

    Sound-field reconstruction in reverberant rooms based on compressive sensing and image-source models of early reflections,

    S. Damiano, F. Borra, A. Bernardini, F. Antonacci, and A. Sarti, “Sound-field reconstruction in reverberant rooms based on compressive sensing and image-source models of early reflections,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2021, pp. 366–370

  66. [66]

    A Bayesian framework for the estimation of head-related transfer functions,

    G. D. Romigh, R. M. Stern, D. S. Brungart, and B. D. Simpson, “A Bayesian framework for the estimation of head-related transfer functions,” J. Acoust. Soc. Amer., vol. 137, no. 4 Supplement, pp. 2323–2323, 2015

  67. [67]

    J. M. Arend and C. Pörschmann, Spatial upsampling of sparse head-related transfer function sets by directional equalization-influence of the spherical sampling scheme. Universitätsbibliothek der RWTH Aachen, Aachen, Germany, 2019

  68. [68]

    Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition,

    F. Brinkmann and S. Weinzierl, “Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition,” in Proc. AES Int. Conf. Audio Virtual Augmented Reality, 2018

  69. [69]

    Differential geometry in array processing,

    A. Manikas, Differential geometry in array processing. Imperial College Press, 2004

  70. [70]

    An overview of self-calibration in sensor array processing,

    L. Qiong, G. Long, and Y. Zhongfu, “An overview of self-calibration in sensor array processing,” in Proc. Int. Symp. Antennas Propag. EM Theory, 2003, pp. 279–282

  71. [71]

    Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,

    S. Markovich, S. Gannot, and I. Cohen, “Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,” IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 6, pp. 1071–1086, 2009

  72. [72]

    Joint audio source localization and separation with distributed microphone arrays based on spatially-regularized multichannel NMF,

    Y. Sumura, D. Di Carlo, A. A. Nugraha, Y. Bando, and K. Yoshii, “Joint audio source localization and separation with distributed microphone arrays based on spatially-regularized multichannel NMF,” in Proc. Int. Workshop Acoust. Signal Enhancement, 2024, pp. 145–149

  73. [73]

    Data-driven multi-microphone speaker localization on manifolds,

    B. Laufer-Goldshtein, R. Talmon, S. Gannot et al., “Data-driven multi-microphone speaker localization on manifolds,” Foundations and Trends® in Signal Processing, vol. 14, no. 1–2, pp. 1–161, 2020

  74. [74]

    The inverse scattering problem for time-harmonic acoustic waves,

    D. Colton, “The inverse scattering problem for time-harmonic acoustic waves,” SIAM review, vol. 26, no. 3, pp. 323–350, 1984

  75. [75]

    E. G. Williams, Fourier acoustics: sound radiation and nearfield acoustical holography. Academic Press, 1999

  76. [76]

    Sound field estimation: Theories and applications,

    N. Ueno, S. Koyama et al., “Sound field estimation: Theories and applications,” Foundations and Trends® in Signal Processing, vol. 19, no. 1, pp. 1–98, 2025

  77. [77]

    Gaussian process data fusion for heterogeneous HRTF datasets,

    Y. Luo, D. N. Zotkin, and R. Duraiswami, “Gaussian process data fusion for heterogeneous HRTF datasets,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2013, pp. 1–4

  78. [78]

    Range dependence of the response of a spherical head model,

    R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” J. Acoust. Soc. Amer., vol. 104, no. 5, pp. 3048–3058, 1998

  79. [79]

    Learning in sinusoidal spaces with physics-informed neural networks,

    J. C. Wong, C. C. Ooi, A. Gupta, and Y.-S. Ong, “Learning in sinusoidal spaces with physics-informed neural networks,” IEEE Trans. Artif. Intell., vol. 5, no. 3, pp. 985–1000, 2022

  80. [80]

    Heat flow from the earth’s interior: analysis of the global data set,

    H. N. Pollack, S. J. Hurter, and J. R. Johnson, “Heat flow from the earth’s interior: analysis of the global data set,” Reviews of Geophysics, vol. 31, no. 3, pp. 267–280, 1993
