Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening

Yoshiaki Bando (AIST); Diego Di Carlo (RIKEN AIP); Mathieu Fontaine (S2A, IDS); Aditya Arie Nugraha (RIKEN AIP); Shoichi Koyama (UTokyo); Kazuyoshi Yoshii (RIKEN AIP)

arXiv: 2509.02571 · v2 · submitted 2025-08-20 · eess.AS · cs.AI · cs.LG · cs.SD · eess.SP


Pith reviewed 2026-05-18 22:28 UTC · model grok-4.3

classification: eess.AS · cs.AI · cs.LG · cs.SD · eess.SP

keywords: Gaussian process regression · steering vectors · physics-aware kernels · neural fields · speech enhancement · binaural rendering · augmented listening · array signal processing


The pith

Gaussian process regression with a physics-aware composite kernel produces continuous steering vector models from far fewer measurements than deterministic super-resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that steering vectors, which describe how a microphone array responds to sound from different directions, can be represented continuously over frequency and positions by embedding a neural field into a Gaussian process. The key step is a composite kernel that explicitly accounts for incoming directional waves and the scattering they undergo, turning the problem into a probabilistic regression that naturally handles uneven measurement uncertainty. A reader would care because this setup supports augmented listening applications such as speech enhancement and binaural rendering, reaching oracle-level performance in simulated SPEAR challenge data while requiring less than one-tenth the usual number of real measurements. The probabilistic framing replaces overfitting-prone deterministic upsampling with uncertainty-aware interpolation.

Core claim

We integrate an expressive representation based on the neural field into the principled probabilistic framework based on the Gaussian process. Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Comprehensive experiments show that the resulting method attains oracle-level performance in downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, with fewer than one-tenth of the measurements.

What carries the argument

The physics-aware composite kernel inside the Gaussian process regression, which separately encodes directional wave propagation and subsequent scattering to produce continuous, uncertainty-aware steering-vector fields.
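To make the mechanism concrete, the following is a minimal sketch of Gaussian process regression with a product (composite) kernel. It is not the paper's implementation: the three factors are generic squared-exponential kernels standing in for the frequency, directional-wave, and scattering terms, the target is a toy real-valued response rather than a complex steering vector, and all names (`rbf`, `composite_kernel`, `gp_posterior`, `make_inputs`) and length scales are assumptions chosen for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def composite_kernel(xa, xb):
    """Product of a frequency factor and two direction factors.

    Column 0 holds frequency in kHz; columns 1-3 hold a unit direction vector.
    The two direction factors are generic placeholders (assumption), not the
    paper's directional-wave and scattering kernels.
    """
    k_freq = rbf(xa[:, :1], xb[:, :1], lengthscale=0.5)
    k_dir = rbf(xa[:, 1:], xb[:, 1:], lengthscale=1.0)
    k_scat = rbf(xa[:, 1:], xb[:, 1:], lengthscale=0.3)
    return k_freq * k_dir * k_scat

def gp_posterior(x_train, y_train, x_test, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_tt = composite_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_st = composite_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_st.T)
    prior_var = np.diag(composite_kernel(x_test, x_test))
    var = np.clip(prior_var - (v**2).sum(axis=0), 0.0, None)
    return k_st @ alpha, np.sqrt(var)

def make_inputs(freq_khz, azimuth_rad):
    """Stack (frequency, unit direction on the horizontal plane) rows."""
    az = np.asarray(azimuth_rad, dtype=float)
    return np.column_stack(
        [np.full_like(az, freq_khz), np.cos(az), np.sin(az), np.zeros_like(az)]
    )

# Toy real-valued "response" sampled at 8 azimuths and one frequency.
az_train = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
x_train = make_inputs(1.0, az_train)
y_train = np.cos(2.0 * az_train)

# Query one measured direction and one direction halfway between measurements.
x_test = make_inputs(1.0, [az_train[0], az_train[0] + np.pi / 8.0])
mean, std = gp_posterior(x_train, y_train, x_test)
```

In this toy setup the predictive standard deviation is small at the measured direction and larger halfway between measurements, which is the uncertainty-aware behavior the summary attributes to the GP framing.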

If this is right

Steering vectors become available as continuous functions of frequency, microphone position, and source direction rather than discrete tables.
Probabilistic regression replaces point-wise super-resolution, automatically down-weighting regions of high measurement uncertainty.
Downstream spatial filters and binaural renderers reach oracle quality with substantially sparser real-world calibration data.
User-parameterized control of reproduced sound fields becomes practical because the model supports arbitrary query locations without retraining.
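The down-weighting mentioned in the list above can be illustrated with a general property of GP regression: an observation assigned a larger noise variance pulls the posterior mean less. The sketch below uses a 1-D toy problem with a squared-exponential kernel. Whether the paper assigns per-measurement noise in this way is not stated on this page, so the per-point `noise_var` vector and the function names are assumptions.

```python
import numpy as np

def rbf_1d(a, b, lengthscale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def posterior_mean(x_train, y_train, noise_var, x_test):
    """GP posterior mean with one noise variance per training point."""
    k_tt = rbf_1d(x_train, x_train) + np.diag(noise_var)
    return rbf_1d(x_test, x_train) @ np.linalg.solve(k_tt, y_train)

x_train = np.array([0.0, 1.0, 2.0])
y_train = np.array([0.0, 1.0, 0.0])
x_test = np.array([1.0])

# Same data, but the middle observation is either trusted or doubted.
trusted = posterior_mean(x_train, y_train, np.array([1e-3, 1e-3, 1e-3]), x_test)
doubted = posterior_mean(x_train, y_train, np.array([1e-3, 1.0, 1e-3]), x_test)
```

With a small noise variance the prediction at the middle point stays close to the observed value; with a large one it is drawn toward the neighboring observations instead.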

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same kernel construction could be reused for other array-calibration tasks where scattering dominates, such as near-field source localization.
The built-in uncertainty estimates might be used to drive adaptive microphone placement or active sensing loops in future hardware.
Replacing the neural-field component with other expressive basis functions would allow direct comparison of representation power within the same probabilistic wrapper.

Load-bearing premise

Non-uniform uncertainty across the measurement space causes deterministic super-resolution to overfit, and the proposed composite kernel plus Gaussian process framework corrects this without adding new modeling errors.

What would settle it

On the SPEAR challenge simulated data, a head-to-head test in which the proposed method fails to match oracle downstream performance when restricted to fewer than one-tenth the usual measurement count would disprove the central claim.

Figures

Figures reproduced from arXiv: 2509.02571 by Yoshiaki Bando (AIST), Diego Di Carlo (RIKEN AIP), Mathieu Fontaine (S2A, IDS), Aditya Arie Nugraha (RIKEN AIP), Shoichi Koyama (UTokyo), Kazuyoshi Yoshii (RIKEN AIP).

**Figure 1.** A typical remixing workflow of augmented listening. Semantic and spatial information of the audio scene can be modified, but the spatial content must remain coherent to convey realism.

**Figure 2.** Comparison between measured and algebraic steering vectors on the azimuthal plane, available in the SPEAR Challenge data.
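For orientation, the "algebraic" steering vector is the idealized free-field model that ignores scattering. A minimal sketch under stated assumptions follows: a far-field plane wave, a speed of sound of 343 m/s, elevation measured from the horizontal plane, and an `exp(-j 2π f τ)` phase convention. The function name and the angle and sign conventions are illustrative and may differ from those used in the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def algebraic_steering_vector(mic_positions, azimuth_rad, elevation_rad, freq_hz):
    """Free-field, far-field steering vector for one direction and frequency.

    mic_positions: (M, 3) array in meters, relative to the array origin.
    Elevation is measured from the horizontal plane (assumption).
    """
    # Unit vector pointing from the array origin toward the source.
    u = np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])
    # A microphone displaced toward the source receives the wavefront earlier,
    # so its delay relative to the origin is negative.
    delays = -(mic_positions @ u) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)

# Two microphones 10 cm apart along the x-axis.
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
broadside = algebraic_steering_vector(mics, np.pi / 2, 0.0, 1000.0)
endfire = algebraic_steering_vector(mics, 0.0, 0.0, 1000.0)
```

For the broadside direction both microphones see the same phase, while for the endfire direction the second microphone leads by `2π f d / c` radians. A measured steering vector departs from this model wherever the head or device scatters the wave.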

**Figure 3.** Measurement grid and reference system used in this study and illustration of the steering vector interpolation problem.

**Figure 4.** Different upsampling models based on Neural Fields (NFs) considered in this work. The green area denotes the proposed model.

**Figure 5.** (a) Mollweide projection of observed coordinates for two upsampling factors against the observation coordinates in the EasyCom dataset. Stars and colors denote clusters and centroids for a quasi-uniform sampling. (b) Sampling strategy used to select the validation data.

**Figure 6.** Interpolation results: normalized mean squared error (left) and cosine similarity per number of observed directions.
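The two interpolation metrics named in this caption can be sketched as follows. These are common definitions for complex-valued vectors, offered as an assumption: the paper's exact normalization, averaging, and domain (for example, time-domain filters) are not given on this page. The cosine similarity below uses the magnitude of the inner product, so it ignores a global phase rotation.

```python
import numpy as np

def nmse_db(estimate, reference):
    """Normalized mean squared error in dB (lower is better)."""
    err = np.sum(np.abs(estimate - reference) ** 2)
    ref = np.sum(np.abs(reference) ** 2)
    return 10.0 * np.log10(err / ref)

def cosine_similarity(estimate, reference):
    """Magnitude of the normalized complex inner product, in [0, 1]."""
    num = np.abs(np.vdot(reference, estimate))  # np.vdot conjugates its first argument
    return num / (np.linalg.norm(estimate) * np.linalg.norm(reference))

reference = np.array([1 + 1j, 2 - 1j, 0.5j])
scaled = 1.1 * reference                 # 10% amplitude error
rotated = reference * np.exp(0.5j)       # global phase error of 0.5 rad

nmse_scaled = nmse_db(scaled, reference)
nmse_rotated = nmse_db(rotated, reference)
csim_scaled = cosine_similarity(scaled, reference)
csim_rotated = cosine_similarity(rotated, reference)
```

The two metrics are complementary: a pure gain or phase error leaves this cosine similarity at 1 but is penalized by the nMSE.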

**Figure 7.** Qualitative results of interpolation: real part of the reconstructed steering vector at channel 2 at 2.5 kHz for 3 sampling factors and different methods. On the left, the ground-truth data and the steering vector are computed with the algebraic model. Axes: azimuth ϑ [°] and elevation ϕ [°]; panel titles: 8 → 1020, 16 → 1020, 32 → 1020.

**Figure 8.** Uncertainty quantification as standard deviation of the predicted steering vector at channel 2 at 2.5 kHz of the proposed model (GPSteerer) for different upsampling factors. White dots denote measurement locations.

**Figure 9.** The bottom panel shows the nMSE averaged across configurations over positive frequency bins.

**Figure 10.** Relationship between interpolation quality (nMSE, CSIM) and speech enhancement performance (SDR, SAR, ISR, fwSegSNR, PESQ, MBSTOI) across selected methods. Enhancement metrics are reported as improvements over the unprocessed baseline. Marker size reflects the upsampling factor. The horizontal line denotes the oracle method NN-Oracle.

**Figure 11.** Beampatterns of a super-directive MVDR at 2 kHz. Each panel corresponds to a different source direction and upsampling factor.
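As background for the beampatterns, the textbook MVDR filter is w = R⁻¹d / (dᴴR⁻¹d), where d is the steering vector of the target direction and R is the noise covariance. The sketch below uses a random covariance and idealized phase-ramp steering vectors. These toy inputs and the function names are assumptions, and the super-directive variant in the figure would typically use a diffuse-noise covariance instead.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """w = R^{-1} d / (d^H R^{-1} d), the distortionless minimum-variance filter."""
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / np.vdot(steering, rinv_d)  # np.vdot conjugates its first argument

def beam_response(weights, steering_matrix):
    """Magnitude |w^H d| for each candidate steering vector (one per column)."""
    return np.abs(weights.conj() @ steering_matrix)

# Toy 4-microphone example with a random Hermitian positive-definite covariance.
rng = np.random.default_rng(0)
n_mics = 4
mix = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
noise_cov = mix @ mix.conj().T + np.eye(n_mics)

# Idealized phase-ramp steering vectors for three directions (hypothetical values).
phase_steps = np.array([0.3, 0.8, 1.5]) * np.pi
steering_matrix = np.exp(-1j * np.outer(np.arange(n_mics), phase_steps))
target = steering_matrix[:, 0]

w = mvdr_weights(noise_cov, target)
pattern = beam_response(w, steering_matrix)

# Compare output noise power against a delay-and-sum beamformer.
noise_power_mvdr = np.vdot(w, noise_cov @ w).real
w_das = target / n_mics
noise_power_das = np.vdot(w_das, noise_cov @ w_das).real
```

The response toward the target is exactly one by construction, and the output noise power is no larger than that of delay-and-sum. An error in the interpolated steering vector `target` breaks the distortionless constraint, which is why interpolation quality matters for the downstream tasks.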

Original abstract

This paper investigates continuous representations of steering vectors over frequency and microphone/source positions for augmented listening (e.g., spatial filtering and binaural rendering), enabling user-parameterized control of the reproduced sound field. Steering vectors have typically been used for representing the spatial response of a microphone array as a function of the look-up direction. The basic algebraic representation of these quantities assuming an idealized environment cannot deal with the scattering effect of the sound field. One may thus collect a discrete set of real steering vectors measured in dedicated facilities and super-resolve (i.e., upsample) them. Recently, physics-aware deep learning methods have been effectively used for this purpose. Such deterministic super-resolution, however, suffers from the overfitting problem due to the non-uniform uncertainty over the measurement space. To solve this problem, we integrate an expressive representation based on the neural field (NF) into the principled probabilistic framework based on the Gaussian process (GP). Specifically, we propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect. Our comprehensive comparative experiment showed the effectiveness of the proposed method under data insufficiency conditions. In downstream tasks such as speech enhancement and binaural rendering using the simulated data of the SPEAR challenge, the oracle performances were attained with less than ten times fewer measurements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper gets oracle-level results on audio tasks with far fewer measurements by embedding neural fields in a Gaussian process via a physics-aware composite kernel.


The punchline is that the authors get oracle-level results in speech enhancement and binaural rendering on the SPEAR challenge data by using Gaussian process regression for steering vectors that incorporates a neural field through a physics-aware composite kernel, all while needing less than a tenth of the usual number of measurements. This stands out as new because it takes the neural field idea from recent deterministic super-resolution work and places it inside a probabilistic GP setup. The composite kernel is designed to capture the directional incoming waves and the scattering effects separately, which directly targets the non-uniform uncertainty problem that leads to overfitting in pure deep learning approaches.

The paper handles the experiments reasonably, running comparisons under data insufficiency and then evaluating on the downstream tasks with simulated data. This gives a clear picture of how the method performs in practical scenarios for augmented listening.

That said, there are some soft spots worth noting. The abstract gives little information on how the composite kernel is actually parameterized or trained, and there are no details on uncertainty calibration or on whether statistical significance tests were run on the improvements. The biggest open question is whether the physics-aware part really constrains the model enough. Without explicit post-training checks for properties such as reciprocity, far-field decay, or consistency with the Helmholtz equation, the neural field may still be fitting the non-uniform samples flexibly. This could make the simulated results look better than they would on real measurements, where the uncertainty patterns differ.

This work would interest people in the spatial audio and array signal processing community, particularly those dealing with calibration for consumer devices or accessibility applications. A reader focused on combining physics-informed models with probabilistic methods would find the kernel construction useful. I would recommend sending this to peer review. The technical novelty and the reported data efficiency make it worth a full evaluation by referees who can examine the methods in detail.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes integrating a neural field representation into a Gaussian process (GP) regression framework for continuous modeling of steering vectors over frequency and microphone/source positions. A physics-aware composite kernel is introduced to capture directional incoming waves and subsequent scattering effects, addressing overfitting in deterministic super-resolution methods caused by non-uniform measurement uncertainty. Comparative experiments under data-insufficient conditions are reported, along with downstream results on the SPEAR challenge simulated data, where oracle performance in speech enhancement and binaural rendering is attained with fewer than one-tenth of the measurements.

Significance. If the central results hold after addressing the noted gaps, the work offers a principled probabilistic alternative to deterministic neural-field super-resolution for acoustic steering vectors, potentially reducing measurement burden in augmented listening applications. The explicit incorporation of physics into the kernel while retaining GP uncertainty quantification is a constructive step beyond purely data-driven approaches, and the SPEAR downstream evaluations provide a practical testbed for the method's utility.

major comments (2)

[Kernel parameterization and training (methods section describing the composite kernel)] The central claim that the composite kernel mitigates non-uniform uncertainty without introducing new modeling bias rests on the learned kernel preserving physical structure. No explicit post-training verification is described for properties such as reciprocity, far-field decay, or consistency with the Helmholtz equation; without such checks or regularization, the deep component risks reintroducing flexible fitting to sparse samples, which directly affects the reliability of the reported oracle downstream performance.
[Experimental evaluation and downstream tasks (SPEAR challenge results)] The abstract and results claim oracle-level performance in speech enhancement and binaural rendering with <10x fewer measurements. This requires supporting details on uncertainty calibration, statistical significance testing across runs, and error analysis to confirm the improvement is attributable to the physics-aware GP rather than simulation specifics or lack of baseline regularization.
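One simple form the requested uncertainty-calibration check could take is empirical coverage: the fraction of held-out targets that fall inside the model's predictive interval. The sketch below assumes Gaussian predictive distributions on real-valued targets (for complex steering vectors it could be applied to real and imaginary parts separately). The synthetic data and the function name are illustrative and are not taken from the paper.

```python
import numpy as np

def empirical_coverage(y_true, pred_mean, pred_std, z=1.96):
    """Fraction of targets inside the central interval mean ± z * std."""
    inside = np.abs(y_true - pred_mean) <= z * pred_std
    return float(np.mean(inside))

# Synthetic held-out set whose noise really matches the predicted std.
rng = np.random.default_rng(0)
n = 20000
pred_mean = rng.standard_normal(n)
pred_std = rng.uniform(0.5, 2.0, size=n)
y_true = pred_mean + pred_std * rng.standard_normal(n)

calibrated = empirical_coverage(y_true, pred_mean, pred_std)
overconfident = empirical_coverage(y_true, pred_mean, 0.5 * pred_std)
```

A well-calibrated model yields coverage close to the nominal 95% for `z = 1.96`; a model that reports half the true standard deviation falls well short of it.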

minor comments (1)

[Methods] Notation for the composite kernel components (e.g., how the directional-wave term and scattering term are combined) could be clarified with an explicit equation early in the methods to aid reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We provide point-by-point responses to the major comments below and outline the revisions we intend to make to address the raised concerns.


Referee: [Kernel parameterization and training (methods section describing the composite kernel)] The central claim that the composite kernel mitigates non-uniform uncertainty without introducing new modeling bias rests on the learned kernel preserving physical structure. No explicit post-training verification is described for properties such as reciprocity, far-field decay, or consistency with the Helmholtz equation; without such checks or regularization, the deep component risks reintroducing flexible fitting to sparse samples, which directly affects the reliability of the reported oracle downstream performance.

Authors: We agree that explicit post-training verification would strengthen the claim regarding preservation of physical structure. The composite kernel is constructed from physics-aware components intended to capture directional waves and scattering, and the GP framework provides uncertainty quantification that helps mitigate overfitting to non-uniform measurement noise. However, we did not report post-training checks such as reciprocity or Helmholtz consistency in the original manuscript. In the revision we will add these verifications on held-out data together with a regularization term that penalizes deviations from expected physical behavior, thereby reducing the risk that the deep component reintroduces flexible fitting.
revision: yes
Referee: [Experimental evaluation and downstream tasks (SPEAR challenge results)] The abstract and results claim oracle-level performance in speech enhancement and binaural rendering with <10x fewer measurements. This requires supporting details on uncertainty calibration, statistical significance testing across runs, and error analysis to confirm the improvement is attributable to the physics-aware GP rather than simulation specifics or lack of baseline regularization.

Authors: We acknowledge that additional statistical and calibration details are necessary to support the downstream performance claims. The reported oracle-level results on the SPEAR simulated data were obtained under data-insufficient conditions with the proposed physics-aware GP. In the revised manuscript we will include uncertainty calibration metrics, results aggregated over multiple independent runs with statistical significance testing, and a breakdown of errors by frequency and spatial position. These additions will help demonstrate that the observed gains are attributable to the composite kernel and GP uncertainty modeling rather than simulation artifacts.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard GP regression with proposed composite kernel yields independent empirical claims.

full rationale

The derivation introduces a physics-aware composite kernel inside a Gaussian process to model directional waves plus scattering and thereby mitigate non-uniform measurement uncertainty. No quoted equations reduce the reported downstream oracle performance (SPEAR challenge speech enhancement and binaural rendering) to fitted parameters or self-citations by construction. The GP framework is standard, the kernel is presented as an added modeling choice rather than a tautology, and results are framed as comparative experiments under data insufficiency. No load-bearing self-citation, uniqueness theorem, or smuggled ansatz is exhibited that collapses the central claim to its inputs. This is the normal self-contained case.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract provides insufficient detail to enumerate specific free parameters or invented entities; the core modeling assumptions rest on standard wave propagation physics and the validity of the composite kernel form.

axioms (1)

Domain assumption: Steering vectors can be represented as a continuous function of frequency and spatial positions with separable directional and scattering components.
Invoked to justify the composite kernel design for modeling real acoustic environments.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

We propose a physics-aware composite kernel that models the directional incoming waves and the subsequent scattering effect... $k_\theta(z_{fij}, z'_{fij}) = k^{\omega}_\theta(\omega_f, \omega_{f'})\, k^{d}_\theta(z_{fij}, z'_{fij})\, k^{s}_\theta(z_{fij}, z'_{fij})$

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.

the homogeneous Helmholtz equation... $\nabla^2 h(\omega, q) + \frac{\omega^2}{c^2} h(\omega, q) = 0$
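As a quick numerical sanity check on the Helmholtz equation quoted above, the sketch below verifies by central finite differences that a plane wave with wavenumber ω/c satisfies it, and that the residual becomes large when the assumed speed of sound is wrong. The plane-wave field, the step size, and the function names are assumptions for illustration. The same kind of residual could serve as one of the post-training consistency checks discussed in the referee report.

```python
import numpy as np

def helmholtz_residual(field, q, omega, c=343.0, step=1e-3):
    """Central-difference estimate of  ∇²h + (ω/c)² h  at the point q."""
    laplacian = 0.0
    for axis in range(3):
        offset = np.zeros(3)
        offset[axis] = step
        laplacian += (field(q + offset) - 2.0 * field(q) + field(q - offset)) / step**2
    return laplacian + (omega / c) ** 2 * field(q)

omega = 2.0 * np.pi * 1000.0           # 1 kHz
c = 343.0                              # m/s, assumed
direction = np.array([1.0, 2.0, 2.0]) / 3.0   # unit propagation direction
wavevector = (omega / c) * direction

def plane_wave(q):
    """Unit-amplitude plane wave exp(-j k·q)."""
    return np.exp(-1j * (wavevector @ q))

q0 = np.array([0.1, -0.2, 0.05])
residual = helmholtz_residual(plane_wave, q0, omega, c=c)
mismatched = helmholtz_residual(plane_wave, q0, omega, c=300.0)
relative_residual = abs(residual) / (omega / c) ** 2
```

With the matching speed of sound the residual is limited by the finite-difference truncation error; with a mismatched one it is of the same order as the wavenumber-squared term itself.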

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages

[1] R. M. Corey, “Microphone array processing for augmented listening,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, 2019
[2] T. Betlehem, W. Zhang, M. A. Poletti, and T. D. Abhayapala, “Personal sound zones: Delivering interface-free audio to multiple listeners,” IEEE Signal Process. Mag., vol. 32, no. 2, pp. 81–91, 2015
[3] M. Cobos, J. Ahrens, K. Kowalczyk, and A. Politis, “An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction,” EURASIP J. Audio, Speech, Music Process., vol. 2022, no. 1, p. 10, 2022
[4] S. Koyama, J. G. Ribeiro, T. Nakamura, N. Ueno, and M. Pezzoli, “Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,” IEEE Signal Process. Mag., vol. 41, no. 6, pp. 60–71, 2025
[5] S. Lee, H.-S. Choi, and K. Lee, “Differentiable artificial reverberation,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 30, pp. 2541–2556, 2022
[6] H. L. Van Trees, Optimum array processing: Part IV of detection, estimation, and modulation theory. John Wiley & Sons, 2002
[7] V. Tourbabin, P. Guiraud, S. Hafezi, P. A. Naylor, A. H. Moore, J. Donley, and T. Lunner, “The SPEAR challenge – review of results,” in Proc. Forum Acusticum, 2023, pp. 623–629
[8] S. Hafezi, A. H. Moore, P. Guiraud, P. A. Naylor, J. Donley, V. Tourbabin, and T. Lunner, “Subspace hybrid beamforming for head-worn microphone arrays,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5
[9] S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, “A consolidated perspective on multimicrophone speech enhancement and source separation,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 4, pp. 692–730, 2017
[10] G. Chardon, “Theoretical analysis of beamforming steering vector formulations for acoustic source localization,” J. Sound Vibration, vol. 517, p. 116544, 2022
[11] W. Zhang, P. N. Samarasinghe, H. Chen, and T. D. Abhayapala, “Surround by sound: A review of spatial audio recording and reproduction,” Appl. Sci., vol. 7, no. 5, p. 532, 2017
[12] Y. Xie, T. Takikawa, S. Saito, O. Litany, S. Yan, N. Khan, F. Tombari, J. Tompkin, V. Sitzmann, and S. Sridhar, “Neural fields in visual computing and beyond,” Comput. Graphics Forum, vol. 41, no. 2, pp. 641–676, 2022
[13] V. Valimaki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, “Fifty years of artificial reverberation,” IEEE Audio, Speech, Language Process., vol. 20, no. 5, pp. 1421–1448, 2012
[14] L. Kelley, D. Di Carlo, A. A. Nugraha, M. Fontaine, Y. Bando, and K. Yoshii, “RIR-in-a-Box: Estimating room acoustics from 3D mesh data through shoebox approximation,” in Proc. INTERSPEECH, 2024, pp. 3255–3259
[15] L. Bahrman, M. Fontaine, J. Le Roux, and G. Richard, “Speech dereverberation constrained on room impulse response characteristics,” in Proc. INTERSPEECH, 2024, pp. 622–626
[16] Y. Bando, K. Sekiguchi, Y. Masuyama, A. A. Nugraha, M. Fontaine, and K. Yoshii, “Neural full-rank spatial covariance analysis for blind source separation,” IEEE Signal Process. Lett., vol. 28, pp. 1670–1674, 2021
[17] B. Rafaely, “Analysis and design of spherical microphone arrays,” IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 135–143, 2004
[18] C. Pörschmann, J. M. Arend, D. Bau, and T. Lübeck, “Comparison of spherical harmonics and nearest-neighbor based interpolation of head-related transfer functions,” in Proc. AES Int. Conf. Audio Virtual Augmented Reality, 2020
[19] D. Caviedes-Nozal, N. A. Riis, F. M. Heuchel, J. Brunskog, P. Gerstoft, and E. Fernandez-Grande, “Gaussian processes for sound field reconstruction,” J. Acoust. Soc. Amer., vol. 149, no. 2, pp. 1107–1119, 2021
[20] J. G. Ribeiro, S. Koyama, R. Horiuchi, and H. Saruwatari, “Sound field estimation based on physics-constrained kernel interpolation adapted to environment,” IEEE/ACM Trans. Audio, Speech, Language Process., 2024
[21] N. Bertin, L. Daudet, V. Emiya, and R. Gribonval, “Compressive sensing in acoustic imaging,” in Compressed Sensing and its Applications: MATHEON Workshop 2013. Springer, 2015, pp. 169–192
[22] S. Koyama and L. Daudet, “Sparse representation of a spatial sound field in a reverberant environment,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 1, pp. 172–184, 2019
[23] M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-based sound field separation in the spherical harmonics domain,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2022, pp. 1051–1055
[24] O. Das, P. Calamia, and S. V. A. Gari, “Room impulse response interpolation from a sparse set of measurements using a modal architecture,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021, pp. 960–964
[25] N. Antonello, E. De Sena, M. Moonen, P. A. Naylor, and T. Van Waterschoot, “Room impulse response interpolation using a sparse spatio-temporal representation of the sound field,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 10, pp. 1929–1941, 2017
[26] F. Lluis, P. Martinez-Nuevo, M. Bo Møller, and S. Ewan Shepstone, “Sound field reconstruction in rooms: Inpainting meets super-resolution,” J. Acoust. Soc. Amer., vol. 148, no. 2, pp. 649–659, 2020
[27] M. S. Kristoffersen, M. B. Møller, P. Martínez-Nuevo, and J. Østergaard, “Deep sound field reconstruction in real rooms: Introducing the ISOBEL sound field dataset,” arXiv preprint arXiv:2102.06455, 2021
[28] X. Karakonstantis and E. Fernandez-Grande, “Generative adversarial networks with physical sound field priors,” J. Acoust. Soc. Amer., vol. 154, no. 2, pp. 1226–1238, 2023
[29] M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep prior approach for room impulse response reconstruction,” Sensors, vol. 22, no. 7, p. 2710, 2022
[30] F. Miotello, L. Comanducci, M. Pezzoli, A. Bernardini, F. Antonacci, and A. Sarti, “Reconstruction of sound field through diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 1476–1480
[31]

FRA-RIR: Fast random approximation of the image- source method,

Y . Luo and J. Yu, “FRA-RIR: Fast random approximation of the image- source method,” in Proc. INTERSPEECH, 2023, pp. 3884–3888

work page 2023
[32]

Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,

M. Pezzoli, F. Antonacci, A. Sarti et al., “Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,” in Proc. Forum Acusticum, 2023, pp. 2177–2184

work page 2023
[33]

Sound field reconstruction using a compact acoustics-informed neural network,

F. Ma, S. Zhao, and I. S. Burnett, “Sound field reconstruction using a compact acoustics-informed neural network,” J. Acoust. Soc. Amer. , vol. 156, no. 3, pp. 2009–2021, 2024

work page 2009
[34] X. Karakonstantis, D. Caviedes-Nozal, A. Richard, and E. Fernandez-Grande, “Room impulse response reconstruction with physics-informed deep learning,” J. Acoust. Soc. Amer., vol. 155, no. 2, pp. 1048–1059, 2024
[35] D. Caviedes-Nozal and E. Fernandez-Grande, “Spatio-temporal Bayesian regression for room impulse response reconstruction with spherical waves,” IEEE/ACM Trans. Audio, Speech, Language Process., 2023
[36] Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,” in Proc. Int. Workshop Acoust. Signal Enhancement, 2022, pp. 1–5
[37] X. Feng, J. Cheng, S. Chen, and Y. Shen, “Room impulse response reconstruction using pattern-coupled sparse Bayesian learning with spherical waves,” IEEE Signal Process. Lett., 2024
[38] H. Bi and T. D. Abhayapala, “Point neuron learning: A new physics-informed neural network architecture,” EURASIP J. Audio, Speech, Music Process., vol. 2024, no. 1, p. 56, 2024
[39] F. M. Rohrhofer, S. Posch, C. Gößnitzer, and B. C. Geiger, “On the apparent Pareto front of physics-informed neural networks,” IEEE Access, 2023
[40] C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning. MIT Press, Cambridge, MA, 2006, vol. 2, no. 3
[41] V. Bruschi, L. Grossi, N. A. Dourou, A. Quattrini, A. Vancheri, T. Leidi, and S. Cecchi, “A review on head-related transfer function generation for spatial audio,” Applied Sciences, vol. 14, no. 23, p. 11242, 2024
[42] B.-S. Xie, “Recovery of individual head-related transfer functions from a small set of measurements,” J. Acoust. Soc. Amer., vol. 132, no. 1, pp. 282–294, 2012
[43] Z. Jiang, J. Sang, C. Zheng, A. Li, and X. Li, “Modeling individual head-related transfer functions from sparse measurements using a convolutional neural network,” J. Acoust. Soc. Amer., vol. 153, no. 1, pp. 248–259, 2023
[44] I. D. Gebru, D. Marković, A. Richard, S. Krenn, G. A. Butler, F. De la Torre, and Y. Sheikh, “Implicit HRTF modeling using temporal convolutional networks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021, pp. 3385–3389
[45] A. O. Hogg, M. Jenkins, H. Liu, I. Squires, S. J. Cooper, and L. Picinali, “HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection,” IEEE/ACM Trans. Audio, Speech, Language Process., 2024
[46] T.-Y. Chen, T.-H. Kuo, and T.-S. Chi, “Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2019, pp. 271–275
[47] Y. Zhang, Y. Wang, and Z. Duan, “HRTF field: Unifying measured HRTF magnitude representation with neural fields,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5
[48] R. Duraiswami and V. C. Raykar, “The manifolds of spatial hearing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, 2005, pp. iii–285
[49] A. Deleforge, F. Forbes, and R. Horaud, “Acoustic space learning for sound-source separation and localization on binaural manifolds,” Int. J. of Neural Systems, vol. 25, no. 01, p. 21, 2015
[50] F. Grijalva, L. C. Martini, D. Florencio, and S. Goldenstein, “Interpolation of head-related transfer functions using manifold learning,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 221–225, 2017
[51] V. Pulkki, “Virtual sound source positioning using vector base amplitude panning,” Journal of the audio engineering society, vol. 45, no. 6, pp. 456–466, 1997
[52] Y. Luo, D. N. Zotkin, H. Daume, and R. Duraiswami, “Kernel regression for head-related transfer function interpolation and spectral extrema extraction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 256–260
[53] X. Chen, F. Ma, Y. Zhang, A. Bastine, and P. N. Samarasinghe, “Head-related transfer function interpolation with a spherical CNN,” arXiv preprint arXiv:2309.08290, 2023
[54] D. N. Zotkin, R. Duraiswami, and N. A. Gumerov, “Regularized HRTF fitting using spherical harmonics,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2009, pp. 257–260
[55] K. Watanabe, S. Takane, and Y. Suzuki, “Interpolation of head-related transfer functions based on the common-acoustical-pole and residue model,” Acoust. Sci. Technol., vol. 24, no. 5, pp. 335–337, 2003
[56] P. Nowak and U. Zölzer, “Spatial interpolation of HRTFs approximated by parametric IIR filters,” in Proc. DAGA, 2022
[57] J. W. Lee and K. Lee, “Neural Fourier shift for binaural speech rendering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2023, pp. 1–5
[58] Y. Masuyama, G. Wichern, F. G. Germain, Z. Pan, S. Khurana, C. Hori, and J. Le Roux, “NIIRF: Neural IIR filter field for HRTF upsampling and personalization,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 1016–1020
[59] M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” J. Acoust. Soc. Amer., vol. 104, no. 4, pp. 2400–2411, 1998
[60] R. Duraiswami, D. N. Zotkin, and N. A. Gumerov, “Interpolation and range extrapolation of HRTFs [head related transfer functions],” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 4, 2004, pp. iv–iv
[61] J. Ahrens, M. R. Thomas, and I. Tashev, “HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data,” in Proc. Asia Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2012, pp. 1–5
[62] C. Pörschmann, J. M. Arend, and F. Brinkmann, “Directional equalization of sparse head-related transfer function sets for spatial upsampling,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 6, pp. 1060–1071, 2019
[63] Z. Ben-Hur, D. L. Alon, R. Mehra, and B. Rafaely, “Efficient representation and sparse sampling of head-related transfer functions using phase-correction based on ear alignment,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 12, pp. 2249–2262, 2019
[64] J. M. Arend, F. Brinkmann, and C. Pörschmann, “Assessing spherical harmonics interpolation of time-aligned head-related transfer functions,” J. Audio Eng. Soc., vol. 69, no. 1/2, pp. 104–117, 2021
[65] S. Damiano, F. Borra, A. Bernardini, F. Antonacci, and A. Sarti, “Sound-field reconstruction in reverberant rooms based on compressive sensing and image-source models of early reflections,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2021, pp. 366–370
[66] G. D. Romigh, R. M. Stern, D. S. Brungart, and B. D. Simpson, “A Bayesian framework for the estimation of head-related transfer functions,” J. Acoust. Soc. Amer., vol. 137, no. 4 Supplement, pp. 2323–2323, 2015
[67] J. M. Arend and C. Pörschmann, Spatial upsampling of sparse head-related transfer function sets by directional equalization-influence of the spherical sampling scheme. Universitätsbibliothek der RWTH Aachen Aachen, Germany, 2019
[68] F. Brinkmann and S. Weinzierl, “Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition,” in Proc. AES Int. Conf. Audio Virtual Augmented Reality, 2018
[69] A. Manikas, Differential geometry in array processing. Imperial College Press, 2004
[70] L. Qiong, G. Long, and Y. Zhongfu, “An overview of self-calibration in sensor array processing,” in Proc. Int. Symp. Antennas Propag. EM Theory, 2003, pp. 279–282
[71] S. Markovich, S. Gannot, and I. Cohen, “Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,” IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 6, pp. 1071–1086, 2009
[72] Y. Sumura, D. Di Carlo, A. A. Nugraha, Y. Bando, and K. Yoshii, “Joint audio source localization and separation with distributed microphone arrays based on spatially-regularized multichannel NMF,” in Proc. Int. Workshop Acoust. Signal Enhancement, 2024, pp. 145–149
[73] B. Laufer-Goldshtein, R. Talmon, S. Gannot et al., “Data-driven multi-microphone speaker localization on manifolds,” Foundations and Trends® in Signal Processing, vol. 14, no. 1–2, pp. 1–161, 2020
[74] D. Colton, “The inverse scattering problem for time-harmonic acoustic waves,” SIAM review, vol. 26, no. 3, pp. 323–350, 1984
[75] E. G. Williams, Fourier acoustics: sound radiation and nearfield acoustical holography. Academic press, 1999
[76] N. Ueno, S. Koyama et al., “Sound field estimation: Theories and applications,” Foundations and Trends® in Signal Processing, vol. 19, no. 1, pp. 1–98, 2025
[77] Y. Luo, D. N. Zotkin, and R. Duraiswami, “Gaussian process data fusion for heterogeneous HRTF datasets,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2013, pp. 1–4
[78] R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” J. Acoust. Soc. Amer., vol. 104, no. 5, pp. 3048–3058, 1998
[79] J. C. Wong, C. C. Ooi, A. Gupta, and Y.-S. Ong, “Learning in sinusoidal spaces with physics-informed neural networks,” IEEE Trans. Artif. Intell., vol. 5, no. 3, pp. 985–1000, 2022
[80] H. N. Pollack, S. J. Hurter, and J. R. Johnson, “Heat flow from the earth’s interior: analysis of the global data set,” Reviews of Geophysics, vol. 31, no. 3, pp. 267–280, 1993

[28] [28]

Generative adversarial networks with physical sound field priors,

X. Karakonstantis and E. Fernandez-Grande, “Generative adversarial networks with physical sound field priors,” J. Acoust. Soc. Amer. , vol. 154, no. 2, pp. 1226–1238, 2023

work page 2023

[29] [29]

Deep prior approach for room impulse response reconstruction,

M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep prior approach for room impulse response reconstruction,” Sensors, vol. 22, no. 7, p. 2710, 2022

work page 2022

[30] [30]

Reconstruction of sound field through diffusion models,

F. Miotello, L. Comanducci, M. Pezzoli, A. Bernardini, F. Antonacci, and A. Sarti, “Reconstruction of sound field through diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , 2024, pp. 1476–1480

work page 2024

[31] [31]

FRA-RIR: Fast random approximation of the image- source method,

Y . Luo and J. Yu, “FRA-RIR: Fast random approximation of the image- source method,” in Proc. INTERSPEECH, 2023, pp. 3884–3888

work page 2023

[32] [32]

Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,

M. Pezzoli, F. Antonacci, A. Sarti et al., “Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses,” in Proc. Forum Acusticum, 2023, pp. 2177–2184

work page 2023

[33] [33]

Sound field reconstruction using a compact acoustics-informed neural network,

F. Ma, S. Zhao, and I. S. Burnett, “Sound field reconstruction using a compact acoustics-informed neural network,” J. Acoust. Soc. Amer. , vol. 156, no. 3, pp. 2009–2021, 2024

work page 2009

[34] [34]

Room impulse response reconstruction with physics-informed deep learning,

X. Karakonstantis, D. Caviedes-Nozal, A. Richard, and E. Fernandez- Grande, “Room impulse response reconstruction with physics-informed deep learning,” J. Acoust. Soc. Amer. , vol. 155, no. 2, pp. 1048–1059, 2024

work page 2024

[35] [35]

Spatio-temporal Bayesian regression for room impulse response reconstruction with spherical waves,

D. Caviedes-Nozal and E. Fernandez-Grande, “Spatio-temporal Bayesian regression for room impulse response reconstruction with spherical waves,” IEEE/ACM Trans. Audio, Speech, Language Process. , 2023

work page 2023

[36] [36]

Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,

Y . Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning,” in Proc. Int. Workshop Acoust. Signal Enhancement , 2022, pp. 1–5

work page 2022

[37] [37]

Room impulse response re- construction using pattern-coupled sparse Bayesian learning with spheri- cal waves,

X. Feng, J. Cheng, S. Chen, and Y . Shen, “Room impulse response re- construction using pattern-coupled sparse Bayesian learning with spheri- cal waves,” IEEE Signal Process. Lett. , 2024

work page 2024

[38] [38]

Point neuron learning: A new physics- informed neural network architecture,

H. Bi and T. D. Abhayapala, “Point neuron learning: A new physics- informed neural network architecture,” EURASIP J. Audio, Speech, Music Process., vol. 2024, no. 1, p. 56, 2024

work page 2024

[39] [39]

On the apparent Pareto front of physics-informed neural networks,

F. M. Rohrhofer, S. Posch, C. G ¨oßnitzer, and B. C. Geiger, “On the apparent Pareto front of physics-informed neural networks,” IEEE Access, 2023

work page 2023

[40] [40]

C. E. Rasmussen and C. K. Williams, Gaussian processes for machine learning. MIT press Cambridge, MA, 2006, vol. 2, no. 3

work page 2006

[41] [41]

A review on head-related transfer function generation for spatial audio,

V . Bruschi, L. Grossi, N. A. Dourou, A. Quattrini, A. Vancheri, T. Leidi, and S. Cecchi, “A review on head-related transfer function generation for spatial audio,” Applied Sciences, vol. 14, no. 23, p. 11242, 2024

work page 2024

[42] [42]

Recovery of individual head-related transfer functions from a small set of measurements,

B.-S. Xie, “Recovery of individual head-related transfer functions from a small set of measurements,” J. Acoust. Soc. Amer. , vol. 132, no. 1, pp. 282–294, 2012

work page 2012

[43] [43]

Modeling individual head- related transfer functions from sparse measurements using a convolutional neural network,

Z. Jiang, J. Sang, C. Zheng, A. Li, and X. Li, “Modeling individual head- related transfer functions from sparse measurements using a convolutional neural network,” J. Acoust. Soc. Amer., vol. 153, no. 1, pp. 248–259, 2023

work page 2023

[44] [44]

Implicit HRTF modeling using temporal convolu- tional networks,

I. D. Gebru, D. Markovi ´c, A. Richard, S. Krenn, G. A. Butler, F. De la Torre, and Y . Sheikh, “Implicit HRTF modeling using temporal convolu- tional networks,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Pro- cess., 2021, pp. 3385–3389

work page 2021

[45] [45]

HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection,

A. O. Hogg, M. Jenkins, H. Liu, I. Squires, S. J. Cooper, and L. Pic- inali, “HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection,” IEEE/ACM Trans. Audio, Speech, Language Process., 2024

work page 2024

[46] [46]

Autoencoding hrtfs for DNN based HRTF personalization using anthropometric features,

T.-Y . Chen, T.-H. Kuo, and T.-S. Chi, “Autoencoding hrtfs for DNN based HRTF personalization using anthropometric features,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , 2019, pp. 271–275. JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. XX, JANUARY 2025 13

work page 2019

[47] [47]

HRTF field: Unifying measured HRTF magnitude representation with neural fields,

Y . Zhang, Y . Wang, and Z. Duan, “HRTF field: Unifying measured HRTF magnitude representation with neural fields,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , 2023, pp. 1–5

work page 2023

[48] [48]

The manifolds of spatial hearing,

R. Duraiswami and V . C. Raykar, “The manifolds of spatial hearing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , vol. 3, 2005, pp. iii–285

work page 2005

[49] [49]

Acoustic space learning for sound-source separation and localization on binaural manifolds,

A. Deleforge, F. Forbes, and R. Horaud, “Acoustic space learning for sound-source separation and localization on binaural manifolds,” Int. J. of Neural Systems , vol. 25, no. 01, p. 21, 2015

work page 2015

[50] [50]

Interpolation of head-related transfer functions using manifold learning,

F. Grijalva, L. C. Martini, D. Florencio, and S. Goldenstein, “Interpolation of head-related transfer functions using manifold learning,” IEEE Signal Process. Lett., vol. 24, no. 2, pp. 221–225, 2017

work page 2017

[51] [51]

Virtual sound source positioning using vector base amplitude panning,

V . Pulkki, “Virtual sound source positioning using vector base amplitude panning,” Journal of the audio engineering society , vol. 45, no. 6, pp. 456–466, 1997

work page 1997

[52] [52]

Kernel regression for head-related transfer function interpolation and spectral extrema extraction,

Y . Luo, D. N. Zotkin, H. Daume, and R. Duraiswami, “Kernel regression for head-related transfer function interpolation and spectral extrema extraction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , 2013, pp. 256–260

work page 2013

[53] [53]

Head- related transfer function interpolation with a spherical cnn,

X. Chen, F. Ma, Y . Zhang, A. Bastine, and P. N. Samarasinghe, “Head- related transfer function interpolation with a spherical cnn,”arXiv preprint arXiv:2309.08290, 2023

work page arXiv 2023

[54] [54]

Regularized HRTF fitting using spherical harmonics,

D. N. Zotkin, R. Duraiswami, and N. A. Gumerov, “Regularized HRTF fitting using spherical harmonics,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. , 2009, pp. 257–260

work page 2009

[55] [55]

Interpolation of head-related transfer functions based on the common-acoustical-pole and residue model,

K. Watanabe, S. Takane, and Y . Suzuki, “Interpolation of head-related transfer functions based on the common-acoustical-pole and residue model,” Acoust. Sci. Technol., vol. 24, no. 5, pp. 335–337, 2003

work page 2003

[56] [56]

Spatial interpolation of hrtfs approximated by parametric iir filters,

P. Nowak and U. Z ¨olzer, “Spatial interpolation of hrtfs approximated by parametric iir filters,” in Proc. DAGA, 2022

work page 2022

[57] [57]

Neural fourier shift for binaural speech rendering,

J. W. Lee and K. Lee, “Neural fourier shift for binaural speech rendering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , 2023, pp. 1–5

work page 2023

[58] [58]

Niirf: Neural iir filter field for HRTF upsampling and personalization,

Y . Masuyama, G. Wichern, F. G. Germain, Z. Pan, S. Khurana, C. Hori, and J. Le Roux, “Niirf: Neural iir filter field for HRTF upsampling and personalization,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2024, pp. 1016–1020

work page 2024

[59] [59]

Analyzing head-related transfer function measurements using surface spherical harmonics,

M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics,” J. Acoust. Soc. Amer., vol. 104, no. 4, pp. 2400–2411, 1998

work page 1998

[60] [60]

Interpolation and range extrapolation of HRTFs [head related transfer functions],

R. Duraiswami, D. N. Zotkin, and N. A. Gumerov, “Interpolation and range extrapolation of HRTFs [head related transfer functions],” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , vol. 4, 2004, pp. iv–iv

work page 2004

[61] J. Ahrens, M. R. Thomas, and I. Tashev, “HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data,” in Proc. Asia Pacific Signal Inf. Process. Assoc. Annu. Summit Conf., 2012, pp. 1–5

[62] C. Pörschmann, J. M. Arend, and F. Brinkmann, “Directional equalization of sparse head-related transfer function sets for spatial upsampling,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 6, pp. 1060–1071, 2019

[63] Z. Ben-Hur, D. L. Alon, R. Mehra, and B. Rafaely, “Efficient representation and sparse sampling of head-related transfer functions using phase-correction based on ear alignment,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 27, no. 12, pp. 2249–2262, 2019

[64] J. M. Arend, F. Brinkmann, and C. Pörschmann, “Assessing spherical harmonics interpolation of time-aligned head-related transfer functions,” J. Audio Eng. Soc., vol. 69, no. 1/2, pp. 104–117, 2021

[65] S. Damiano, F. Borra, A. Bernardini, F. Antonacci, and A. Sarti, “Soundfield reconstruction in reverberant rooms based on compressive sensing and image-source models of early reflections,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2021, pp. 366–370

[66] G. D. Romigh, R. M. Stern, D. S. Brungart, and B. D. Simpson, “A Bayesian framework for the estimation of head-related transfer functions,” J. Acoust. Soc. Amer., vol. 137, no. 4 Supplement, pp. 2323–2323, 2015

[67] J. M. Arend and C. Pörschmann, Spatial upsampling of sparse head-related transfer function sets by directional equalization-influence of the spherical sampling scheme. Universitätsbibliothek der RWTH Aachen, Aachen, Germany, 2019

[68] F. Brinkmann and S. Weinzierl, “Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition,” in Proc. AES Int. Conf. Audio Virtual Augmented Reality, 2018

[69] A. Manikas, Differential geometry in array processing. Imperial College Press, 2004

[70] L. Qiong, G. Long, and Y. Zhongfu, “An overview of self-calibration in sensor array processing,” in Proc. Int. Symp. Antennas Propag. EM Theory, 2003, pp. 279–282

[71] S. Markovich, S. Gannot, and I. Cohen, “Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,” IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 6, pp. 1071–1086, 2009

[72] Y. Sumura, D. Di Carlo, A. A. Nugraha, Y. Bando, and K. Yoshii, “Joint audio source localization and separation with distributed microphone arrays based on spatially-regularized multichannel NMF,” in Proc. Int. Workshop Acoust. Signal Enhancement, 2024, pp. 145–149

[73] B. Laufer-Goldshtein, R. Talmon, S. Gannot et al., “Data-driven multi-microphone speaker localization on manifolds,” Foundations and Trends® in Signal Processing, vol. 14, no. 1–2, pp. 1–161, 2020

[74] D. Colton, “The inverse scattering problem for time-harmonic acoustic waves,” SIAM Review, vol. 26, no. 3, pp. 323–350, 1984

[75] E. G. Williams, Fourier acoustics: sound radiation and nearfield acoustical holography. Academic Press, 1999

[76] N. Ueno, S. Koyama et al., “Sound field estimation: Theories and applications,” Foundations and Trends® in Signal Processing, vol. 19, no. 1, pp. 1–98, 2025

[77] Y. Luo, D. N. Zotkin, and R. Duraiswami, “Gaussian process data fusion for heterogeneous HRTF datasets,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2013, pp. 1–4

[78] R. O. Duda and W. L. Martens, “Range dependence of the response of a spherical head model,” J. Acoust. Soc. Amer., vol. 104, no. 5, pp. 3048–3058, 1998

[79] J. C. Wong, C. C. Ooi, A. Gupta, and Y.-S. Ong, “Learning in sinusoidal spaces with physics-informed neural networks,” IEEE Trans. Artif. Intell., vol. 5, no. 3, pp. 985–1000, 2022

[80] H. N. Pollack, S. J. Hurter, and J. R. Johnson, “Heat flow from the Earth’s interior: analysis of the global data set,” Reviews of Geophysics, vol. 31, no. 3, pp. 267–280, 1993