RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses

Chunyi Sun; Jihui Zhang; Prasanga N. Samarasinghe; Shaoheng Xu; Thushara D. Abhayapala

arxiv: 2602.01861 · v3 · submitted 2026-02-02 · 📡 eess.AS · cs.LG

Shaoheng Xu, Chunyi Sun, Jihui Zhang, Prasanga N. Samarasinghe, Thushara D. Abhayapala

Pith reviewed 2026-05-16 08:39 UTC · model grok-4.3

classification 📡 eess.AS cs.LG

keywords room impulse response · transformer · continuous reconstruction · microphone array · interpolation · acoustic signal processing · early reflections · late reverberation

The pith

A coordinate-guided transformer reconstructs room impulse responses continuously from sparse microphone arrays.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RIR-Former as a model that reconstructs room impulse responses at arbitrary points using only measurements from sparse microphones. It adds sinusoidal encoding of positions to a transformer so the network can work in continuous space rather than on a fixed grid. A segmented decoder processes the early reflections and late reverberation in separate branches to reduce error across the full response. Tests in varied simulated rooms show lower normalized mean square error and cosine distance than prior methods at different missing rates and array layouts. If the approach generalizes, fewer sensors could suffice for many spatial audio tasks.

Core claim

RIR-Former is a grid-free, one-step feed-forward transformer that incorporates microphone coordinates through sinusoidal encoding and uses a segmented multi-branch decoder to reconstruct both early and late parts of the room impulse response, delivering lower NMSE and cosine distance than baselines across simulated environments with varying missing rates and array configurations.

What carries the argument

Sinusoidal encoding module for microphone position information inside a transformer backbone, paired with a segmented multi-branch decoder that separates early reflections from late reverberation.
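The exact encoding used in the paper is not given on this page (the referee report asks for it), so the following is only a minimal sketch of one common Fourier-feature form of sinusoidal coordinate encoding. The function name `sinusoidal_encoding`, the `num_freqs` parameter, and the frequency schedule `2^k * pi` are assumptions made for illustration.

```python
import math

def sinusoidal_encoding(position, num_freqs=4):
    """Encode each coordinate x as sin(2^k * pi * x), cos(2^k * pi * x), k = 0..num_freqs-1.

    Assumed form for illustration; the paper's own formulation may differ.
    """
    features = []
    for coord in position:
        for k in range(num_freqs):
            angle = (2.0 ** k) * math.pi * coord
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features

# A microphone at an arbitrary off-grid 3-D position (metres, hypothetical values).
mic = (1.25, 0.4, 1.5)
encoded = sinusoidal_encoding(mic, num_freqs=4)
print(len(encoded))  # 24 = 3 coordinates * 4 frequencies * 2
```

Because the encoding is a function of the real-valued coordinate itself, any position can be fed to the network, which is what makes a grid-free query possible in principle.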

If this is right

Lower normalized mean square error and cosine distance than state-of-the-art methods under different missing rates.
Grid-free interpolation at any array location without retraining.
Separate treatment of early and late segments improves accuracy over the whole impulse response.
One-step feed-forward inference that supports practical acoustic processing pipelines.
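The two metrics named above are not defined on this page. As a rough sketch, assuming the usual definitions (NMSE as error energy over reference energy, here reported in dB, and cosine distance as one minus the cosine similarity of the two waveforms), they could be computed as follows. The function names and the dB convention are assumptions, and the paper may normalize differently.

```python
import math

def nmse_db(reference, estimate):
    """Normalized mean square error in dB: 10*log10(||ref - est||^2 / ||ref||^2)."""
    err = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ref = sum(r ** 2 for r in reference)
    if err == 0.0:
        return float("-inf")  # perfect reconstruction
    return 10.0 * math.log10(err / ref)

def cosine_distance(reference, estimate):
    """One minus cosine similarity; 0 when the waveforms differ only by a positive scale."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    return 1.0 - dot / (norm_r * norm_e)

# Toy RIRs (hypothetical values): the estimate is slightly too small in amplitude.
rir_true = [1.0, 0.5, 0.25, 0.125]
rir_est = [0.9, 0.45, 0.225, 0.1125]
print(nmse_db(rir_true, rir_est), cosine_distance(rir_true, rir_est))
```

The toy pair shows why both metrics are reported: a pure amplitude error raises NMSE but leaves the cosine distance near zero, since the latter only measures waveform shape.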

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The coordinate encoding could allow the same architecture to handle irregular three-dimensional arrays without architectural changes.
Extending the decoder branches to include time-varying sources would open the method to dynamic scenes.
If real-world results match the simulations, the approach could lower the sensor count required for room-acoustics modeling in virtual reality and teleconferencing.

Load-bearing premise

Performance gains measured on simulated rooms with random linear arrays will hold when the same model is applied to real recorded data and more complex microphone geometries.

What would settle it

Apply the trained model to a set of real measured room impulse responses captured with a non-linear microphone array in a physical room and check whether the NMSE and cosine distance remain better than the same baselines.

Original abstract

Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RIR-Former adds a position-encoded transformer with split early/late decoding that beats baselines on simulated RIR interpolation, but all gains stay inside synthetic data.

The core contribution is a transformer that takes microphone coordinates through sinusoidal encoding and uses a multi-branch decoder to treat early reflections and late reverberation separately. This lets it interpolate RIRs at arbitrary points without a fixed grid. On the simulated rooms the authors tested, it improves NMSE and cosine distance over prior methods across different missing rates and linear array setups. That is a clean engineering adaptation for the continuous reconstruction task. The split decoder makes sense because early and late parts have different statistical structure, and the position encoding directly addresses the coordinate input problem. The paper reports consistent gains, which is useful for anyone who needs to fill in sparse array measurements in controlled settings.

The main limitation is that every quantitative result comes from image-source or similar synthetic generators with random linear arrays. No real-room measurements from standard corpora appear, so the margins have not been checked against scattering, sensor noise, or non-exponential decay; the abstract itself defers real-world environments to future work. Without those tests, and without reported ablations and error bars, it is hard to know how much of the improvement is tied to the simulation distribution.

The architecture itself looks internally consistent and the citations track the relevant prior acoustic modeling literature. This is the kind of paper that belongs in a reading group for spatial audio or VR acoustics people who already work with simulated data and want a new interpolation baseline. It is worth sending to peer review so referees can ask for real-data experiments and implementation details rather than desk-rejecting it outright.

Referee Report

2 major / 2 minor

Summary. The paper introduces RIR-Former, a grid-free one-step transformer model for continuous RIR reconstruction from sparse microphone measurements. It incorporates a sinusoidal encoding module to embed microphone coordinates and a segmented multi-branch decoder that separately processes early reflections and late reverberation. Experiments on simulated acoustic environments with random linear arrays report consistent outperformance over baselines in NMSE and cosine distance across varying missing rates and configurations.
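To make "random linear arrays" and "missing rates" concrete, here is a small sketch of how one evaluation configuration could be drawn. The helper name `sample_linear_array`, the array length, the microphone count, and the choice of uniform random spacing are all assumptions for illustration, not the paper's protocol.

```python
import random

def sample_linear_array(num_mics, length_m, missing_rate, seed=0):
    """Draw random positions along a line and split them into observed and target sets."""
    rng = random.Random(seed)
    # Random (non-uniform) spacing: sorted uniform draws along the array axis.
    positions = sorted(rng.uniform(0.0, length_m) for _ in range(num_mics))
    # The missing rate is the fraction of microphones whose RIRs are withheld.
    num_missing = round(num_mics * missing_rate)
    missing = set(rng.sample(range(num_mics), num_missing))
    observed = [positions[i] for i in range(num_mics) if i not in missing]
    targets = [positions[i] for i in sorted(missing)]
    return observed, targets

observed, targets = sample_linear_array(num_mics=16, length_m=2.0, missing_rate=0.25)
print(len(observed), len(targets))  # 12 4
```

In such a setup the model would receive the RIRs at the `observed` positions and be scored on the RIRs it predicts at the `targets` positions.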

Significance. If the gains are robust, the approach could enable efficient interpolation of RIRs at arbitrary positions without dense sampling, benefiting spatial audio, VR, and acoustic modeling. The combination of coordinate-guided attention and segmented decoding addresses the distinct temporal characteristics of RIRs in a feed-forward manner. The exclusive reliance on simulated data, however, limits the assessed significance for the practical deployment highlighted in the abstract.

major comments (2)

[Abstract] Abstract: the claim of 'consistent outperformance' is presented without any numerical margins, error bars, statistical tests, or ablation results, preventing verification of the magnitude or reliability of the reported improvements in NMSE and CD.
[Experiments] Experiments section: all quantitative results are confined to synthetic RIRs generated by image-source methods using random linear arrays. No evaluation appears on real measured corpora (e.g., AIR, RWCP), despite the abstract explicitly flagging domain shift to real-world environments as future work; this is load-bearing for the practical-deployment claim.

minor comments (2)

[Methods] Methods: provide the precise formulation of the sinusoidal positional encoding and the training loss (including any weighting between early and late segments) to support reproducibility.
[Evaluation] Notation: define NMSE and CD explicitly with their normalization and reference signals in the evaluation section.
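The training loss is not stated on this page, which is what the first minor comment asks about. Purely as an illustration of the kind of weighting in question, the sketch below computes a mean square error separately on an early and a late segment and combines them with weights. The split index, the weights, and the use of MSE are all assumptions.

```python
def segmented_mse(reference, estimate, split_index, early_weight=1.0, late_weight=1.0):
    """Weighted sum of per-segment MSE; samples before split_index count as 'early'."""
    if not 0 < split_index < len(reference):
        raise ValueError("split_index must fall strictly inside the response")

    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    early = mse(reference[:split_index], estimate[:split_index])
    late = mse(reference[split_index:], estimate[split_index:])
    return early_weight * early + late_weight * late

# Toy RIR (hypothetical values): the estimate matches the early part but zeroes the tail.
rir_true = [1.0, 0.5, 0.1, 0.05]
rir_est = [1.0, 0.5, 0.0, 0.0]
plain = segmented_mse(rir_true, rir_est, split_index=2)
boosted = segmented_mse(rir_true, rir_est, split_index=2, late_weight=10.0)
print(plain, boosted)
```

Since the late tail of an impulse response carries far less energy than the early part, an unweighted loss barely penalizes an estimate that ignores it; reporting the weights would show how the two segments were balanced.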

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point-by-point below, proposing targeted revisions to improve clarity and accuracy without altering the core contributions or experimental scope.

Referee: [Abstract] Abstract: the claim of 'consistent outperformance' is presented without any numerical margins, error bars, statistical tests, or ablation results, preventing verification of the magnitude or reliability of the reported improvements in NMSE and CD.

Authors: We agree that the abstract would benefit from greater specificity. In the revised version, we will incorporate concrete quantitative margins drawn directly from the existing experimental results (e.g., average NMSE reductions and CD improvements across missing rates), along with a brief statement that values are means over multiple random configurations. This change requires only textual editing and will allow readers to assess the scale of the reported gains. revision: yes

Referee: [Experiments] Experiments section: all quantitative results are confined to synthetic RIRs generated by image-source methods using random linear arrays. No evaluation appears on real measured corpora (e.g., AIR, RWCP), despite the abstract explicitly flagging domain shift to real-world environments as future work; this is load-bearing for the practical-deployment claim.

Authors: We acknowledge the exclusive use of simulated data generated via the image-source method. This choice follows standard practice in the RIR reconstruction literature, enabling precise control over room parameters, array geometries, and missing rates for rigorous benchmarking. The abstract already qualifies results as simulated and explicitly lists real-world evaluation as future work. To address the concern, we will revise the abstract to state that the method demonstrates effectiveness in simulated settings with potential for practical deployment, pending validation on measured data. We cannot add real-corpus experiments at this stage, as they would require new data acquisition and annotation beyond the current scope. revision: partial

Circularity Check

0 steps flagged

No circularity: data-driven model with experimental validation only

The manuscript presents RIR-Former as a coordinate-guided transformer architecture with sinusoidal positional encoding and a segmented early/late decoder. All performance claims rest on direct experimental comparisons (NMSE, CD) against baselines on synthetically generated RIRs; no derivation chain, uniqueness theorem, or fitted-parameter prediction is invoked. No equations reduce to their inputs by construction, no self-citations are load-bearing for the core method, and the approach is fully self-contained as a supervised learning model. The simulation-to-real gap noted in the abstract is a generalization concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented physical entities are stated. The model itself is a new neural architecture whose internal weights are learned from data.

pith-pipeline@v0.9.0 · 5484 in / 946 out tokens · 23209 ms · 2026-05-16T08:39:09.320793+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

sinusoidal encoding module into a transformer backbone... segmented multi-branch decoder... Experiments on diverse simulated acoustic environments

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

[1]

INTRODUCTION. Room Impulse Responses (RIRs) play a crucial role in acoustic signal processing. They encapsulate the acoustic characteristics of an environment and are essential for tasks such as: 1) quantifying objective metrics for room design [1], 2) enabling applications like sound source localization [2], and 3) supporting immersive experiences in ...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Consider a general three-dimensional acoustic environment exhibiting reverberant characteristics

PROBLEM FORMULATION. The goal of this work is to reconstruct full RIRs at unmeasured locations based on a limited set of measured RIRs within a room. Consider a general three-dimensional acoustic environment exhibiting reverberant characteristics. Let there be M microphones located at positions x_m ≡ (x_m, y_m, z_m) for m = 1, 2, ..., M, and Q sources locate...

work page
[3]

PROPOSED METHOD. Relying on handcrafted geometric priors is often inflexible; per-scene optimization is computationally expensive and lacks generalization; and treating the RIR as an image with local generative models imposes strong locality assumptions, emphasizing pattern completion over understanding spatial relationships. A more principled solut...

work page
[4]

We compare our method against three existing approaches

EXPERIMENTS. In this section, we evaluate the RIR reconstruction performance of our proposed RIR-Former through Monte Carlo simulations under diverse acoustic scenarios. We compare our method against three existing approaches. 4.1. Experiment Setup. We simulate realistic meeting room environments via Monte Carlo tests. A total of 8000 shoebox rooms are gener...

work page 2024
[5]

CONCLUSION. In this paper, we proposed a grid-free, one-step feed-forward model for RIR reconstruction. By incorporating a sinusoidal encoding module into a Transformer architecture, our model effectively encodes microphone positions, enabling accurate reconstruction at arbitrary spatial locations. The segmented multi-branch decoder balances the importan...

work page
[6]

Review of objective room acoustics measures and future needs,

J. S. Bradley, “Review of objective room acoustics measures and future needs,” Appl. Acoust., vol. 72, no. 10, pp. 713–720, 2011

work page 2011
[7]

Acoustic reflector localization: Novel image source reversion and direct localization methods,

L. Remaggi, P. J. B. Jackson, P. Coleman, and W. Wang, “Acoustic reflector localization: Novel image source reversion and direct localization methods,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 2, pp. 296–309, 2017

work page 2017
[8]

Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality

M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality, Springer, Berlin, Heidelberg, 2008

work page 2008
[9]

Generative data augmentation challenge: Synthesis of room acoustics for speaker distance estimation,

J. Lin, G. Götz, H. S. Llopis, H. Hafsteinsson, S. Guðjónsson, D. G. Nielsen, F. Pind, P. Smaragdis, D. Manocha, J. Hershey, T. Kristjansson, and M. Kim, “Generative data augmentation challenge: Synthesis of room acoustics for speaker distance estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. Workshops (ICASSPW), 2025

work page 2025
[10]

Kernel ridge regression with constraint of Helmholtz equation for sound field interpolation,

N. Ueno, S. Koyama, and H. Saruwatari, “Kernel ridge regression with constraint of Helmholtz equation for sound field interpolation,” in Proc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), 2018, pp. 436–440

work page 2018
[11]

Kernel interpolation of incident sound field in region including scattering objects,

S. Koyama, M. Nakada, J. G. C. Ribeiro, and H. Saruwatari, “Kernel interpolation of incident sound field in region including scattering objects,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 2023, pp. 1–5

work page 2023
[12]

Geometry-based spatial sound acquisition using distributed microphone arrays,

O. Thiergart, G. Del Galdo, M. Taseska, and E. A. P. Habets, “Geometry-based spatial sound acquisition using distributed microphone arrays,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 12, pp. 2583–2594, 2013

work page 2013
[13]

A parametric approach to virtual miking for sources of arbitrary directivity,

M. Pezzoli, F. Borra, F. Antonacci, S. Tubaro, and A. Sarti, “A parametric approach to virtual miking for sources of arbitrary directivity,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 2333–2348, 2020

work page 2020
[14]

Compressed sensing of impulse responses in rooms of unknown properties and contents,

E. Zea, “Compressed sensing of impulse responses in rooms of unknown properties and contents,” J. Sound Vib., vol. 459, pp. 114871, 2019

work page 2019
[15]

Sound field separation in a mixed acoustic environment using a sparse array of higher order spherical microphones,

A. Fahim, P. N. Samarasinghe, and T. D. Abhayapala, “Sound field separation in a mixed acoustic environment using a sparse array of higher order spherical microphones,” in Proc. Hands-free Speech Commun. Microphone Arrays, 2017, pp. 151–155

work page 2017
[16]

Sparse sound field representation using complex orthogonal matching pursuit,

S. Xu, J. A. Zhang, T. D. Abhayapala, A. Bastine, W. T. Lai, and P. N. Samarasinghe, “Sparse sound field representation using complex orthogonal matching pursuit,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 1336–1340

work page 2024
[17]

Sparsity-based sound field separation in the spherical harmonics domain,

M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity-based sound field separation in the spherical harmonics domain,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2022, pp. 1051–1055

work page 2022
[18]

Iterative and complex orthogonal matching pursuit for broadband sparse sound field reconstruction,

S. Xu, J. A. Zhang, T. D. Abhayapala, A. Bastine, and P. N. Samarasinghe, “Iterative and complex orthogonal matching pursuit for broadband sparse sound field reconstruction,” in Proc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), 2024, pp. 195–199

work page 2024
[19]

Virtual navigation via higher order distributed sound sources,

T. D. Abhayapala, J. A. Zhang, S. Xu, D. L. Alon, Z. Ben-Hur, and P. N. Samarasinghe, “Virtual navigation via higher order distributed sound sources,” in Proc. Forum Acusticum, Turin, Italy, 2023, pp. 647–653

work page 2023
[20]

Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,

S. Koyama, J. G. C. Ribeiro, T. Nakamura, N. Ueno, and M. Pezzoli, “Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,” IEEE Signal Process. Mag., vol. 41, no. 6, pp. 60–71, 2024

work page 2024
[21]

Generative models for sound field reconstruction,

E. Fernandez-Grande, X. Karakonstantis, D. Caviedes-Nozal, and P. Gerstoft, “Generative models for sound field reconstruction,” J. Acoust. Soc. Am., vol. 153, no. 2, pp. 1179–1190, 2023

work page 2023
[22]

Deep prior approach for room impulse response reconstruction,

M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep prior approach for room impulse response reconstruction,” Sensors, vol. 22, no. 7, pp. 2710, 2022

work page 2022
[23]

Low-rank adaptation of deep prior neural networks for room impulse response reconstruction,

M. Pezzoli, F. Miotello, S. Koyama, and F. Antonacci, “Low-rank adaptation of deep prior neural networks for room impulse response reconstruction,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 2025, pp. 1–4

work page 2025
[24]

A physics-informed neural network approach for nearfield acoustic holography,

M. Olivieri, M. Pezzoli, F. Antonacci, and A. Sarti, “A physics-informed neural network approach for nearfield acoustic holography,” Sensors, vol. 21, no. 23, pp. 7834, 2021

work page 2021
[25]

Room impulse response reconstruction with physics-informed deep learning,

X. Karakonstantis, D. Caviedes-Nozal, A. Richard, and E. Fernandez-Grande, “Room impulse response reconstruction with physics-informed deep learning,” J. Acoust. Soc. Am., vol. 155, no. 2, pp. 1048–1059, 2024

work page 2024
[26]

Reconstruction of sound field through diffusion models,

F. Miotello, L. Comanducci, M. Pezzoli, A. Bernardini, F. Antonacci, and A. Sarti, “Reconstruction of sound field through diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 1476–1480

work page 2024
[27]

INRAS: Implicit neural representation for audio scenes,

K. Su, M. Chen, and E. Shlizerman, “INRAS: Implicit neural representation for audio scenes,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2022, vol. 35, pp. 8144–8158

work page 2022
[28]

Learning neural acoustic fields,

A. Luo, Y. Du, M. Tarr, J. Tenenbaum, A. Torralba, and C. Gan, “Learning neural acoustic fields,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2022, vol. 35, pp. 3165–3177

work page 2022
[29]

DiffusionRIR: Room impulse response interpolation using diffusion models,

S. Della Torre, M. Pezzoli, F. Antonacci, and S. Gannot, “DiffusionRIR: Room impulse response interpolation using diffusion models,” arXiv preprint arXiv:2504.20625, 2025

work page arXiv 2025
[30]

On the evaluation of estimated impulse responses,

D. R. Morgan, J. Benesty, and M. M. Sondhi, “On the evaluation of estimated impulse responses,” IEEE Signal Process. Lett., vol. 5, no. 7, pp. 174–176, 1998

work page 1998
[31]

Attention is all you need,

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, vol. 30, pp. 5998–6008

work page 2017
[32]

Decoupled Weight Decay Regularization

I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[33]

Pyroomacoustics: A Python package for audio room simulation and array processing algorithms,

R. Scheibler, E. Bezzam, and I. Dokmanic, “Pyroomacoustics: A Python package for audio room simulation and array processing algorithms,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018, pp. 351–355

work page 2018
[34]

Image method for efficiently simulating small-room acoustics,

J. Allen and D. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979

work page 1979
[35]

Room impulse response generator,

E. A. P. Habets, “Room impulse response generator,” Tech. Rep. 2.4, Technische Universiteit Eindhoven, 2006

work page 2006
[36]

A Practical Guide to Splines

C. de Boor, A Practical Guide to Splines, vol. 27, Springer, New York, NY, 1978

work page 1978

[1] [1]

INTRODUCTION Room Impulse Responses (RIRs) play a crucial role in acoustic sig- nal processing. They encapsulate the acoustic characteristics of an environment and are essential for tasks such as: 1) quantifying objec- tive metrics for room design [1], 2) enabling applications like sound source localization [2], and 3) supporting immersive experiences in ...

work page internal anchor Pith review Pith/arXiv arXiv 2026

[2] [2]

Consider a general three-dimensional acoustic environment ex- hibiting reverberant characteristics

PROBLEM FORMULA TION The goal of this work is to reconstruct full RIRs at unmeasured lo- cations based on a limited set of measured RIRs within a room. Consider a general three-dimensional acoustic environment ex- hibiting reverberant characteristics. Let there beMmicrophones located at positionsx m ≡(x m, y m, z m)form= 1,2, . . . , M, andQsources locate...

work page

[3] [3]

γ𝐱"γ𝐱"#! γ𝐱

PROPOSED METHOD Relying on handcrafted geometric priors is often inflexible; per- scene optimization is computationally expensive and lacks general- ization; and treating the RIR as an image with local generative mod- els imposes strong locality assumptions, emphasizing pattern com- pletion over understanding spatial relationships. A more principled solut...

work page

[4] [4]

We compare our method against three existing approaches

EXPERIMENTS In this section, we evaluate the RIR reconstruction performance of our proposedRIR-F ormerthrough Monte Carlo simulations under diverse acoustic scenarios. We compare our method against three existing approaches. 4.1. Experiment Setup We simulate realistic meeting room environments via Monte Carlo tests. A total of 8000 shoebox rooms are gener...

work page 2024

[5] [5]

CONCLUSION In this paper, we proposed a grid-free, one-step feed-forward model for RIR reconstruction. By incorporating a sinusoidal encoding module into a Transformer architecture, our model effectively en- codes microphone positions, enabling accurate reconstruction at arbitrary spatial locations. The segmented multi-branch decoder balances the importan...

work page

[6] [6]

Review of objective room acoustics measures and future needs,

J. S. Bradley, “Review of objective room acoustics measures and future needs,”Appl. Acoust., vol. 72, no. 10, pp. 713–720, 2011

work page 2011

[7] [7]

Acoustic reflector localization: Novel image source reversion and direct localization methods,

L. Remaggi, P. J. B. Jackson, P. Coleman, and W. Wang, “Acoustic reflector localization: Novel image source reversion and direct localization methods,”IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 2, pp. 296–309, 2017

work page 2017

[8] [8]

V orländer,Auralization: Fundamentals of Acoustics, Mod- elling, Simulation, Algorithms and Acoustic Virtual Reality, Springer, Berlin, Heidelberg, 2008

M. V orländer,Auralization: Fundamentals of Acoustics, Mod- elling, Simulation, Algorithms and Acoustic Virtual Reality, Springer, Berlin, Heidelberg, 2008

work page 2008

[9] [9]

Generative data augmentation challenge: Synthesis of room acoustics for speaker distance estimation,

J. Lin, G. Götz, H. S. Llopis, H. Hafsteinsson, S. Guðjónsson, D. G. Nielsen, F. Pind, P. Smaragdis, D. Manocha, J. Hershey, T. Kristjansson, and M. Kim, “Generative data augmentation challenge: Synthesis of room acoustics for speaker distance estimation,” inProc. IEEE Int. Conf. Acoust., Speech, Signal Process. Workshops (ICASSPW), 2025

work page 2025

[10] [10]

Kernel ridge re- gression with constraint of Helmholtz equation for sound field interpolation,

N. Ueno, S. Koyama, and H. Saruwatari, “Kernel ridge re- gression with constraint of Helmholtz equation for sound field interpolation,” inProc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), 2018, pp. 436–440

work page 2018

[11] [11]

Kernel interpolation of incident sound field in region includ- ing scattering objects,

S. Koyama, M. Nakada, J. G. C. Ribeiro, and H. Saruwatari, “Kernel interpolation of incident sound field in region includ- ing scattering objects,” inProc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 2023, pp. 1–5

work page 2023

[12] [12]

Geometry-based spatial sound acquisition using distributed microphone arrays,

O. Thiergart, G. Del Galdo, M. Taseska, and E. A. P. Habets, “Geometry-based spatial sound acquisition using distributed microphone arrays,”IEEE Trans. Audio, Speech, Lang. Pro- cess., vol. 21, no. 12, pp. 2583–2594, 2013

work page 2013

[13] [13]

A parametric approach to virtual miking for sources of arbitrary directivity,

M. Pezzoli, F. Borra, F. Antonacci, S. Tubaro, and A. Sarti, “A parametric approach to virtual miking for sources of arbitrary directivity,”IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 28, pp. 2333–2348, 2020

work page 2020

[14] [14]

Compressed sensing of impulse responses in rooms of unknown properties and contents,

E. Zea, “Compressed sensing of impulse responses in rooms of unknown properties and contents,”J. Sound Vib., vol. 459, pp. 114871, 2019

work page 2019

[15] [15]

Sound field separation in a mixed acoustic environment using a sparse array of higher order spherical microphones,

A. Fahim, P. N. Samarasinghe, and T. D. Abhayapala, “Sound field separation in a mixed acoustic environment using a sparse array of higher order spherical microphones,” inProc. Hands- free Speech Commun. Microphone Arrays, 2017, pp. 151–155

work page 2017

[16] [16]

Sparse sound field representation using complex orthogonal matching pursuit,

S. Xu, J. A. Zhang, T. D. Abhayapala, A. Bastine, W. T. Lai, and P. N. Samarasinghe, “Sparse sound field representation using complex orthogonal matching pursuit,” inProc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 1336–1340

work page 2024

[17] [17]

Sparsity- based sound field separation in the spherical harmonics do- main,

M. Pezzoli, M. Cobos, F. Antonacci, and A. Sarti, “Sparsity- based sound field separation in the spherical harmonics do- main,” inProc. IEEE Int. Conf. Acoust., Speech, Signal Pro- cess. (ICASSP), 2022, pp. 1051–1055

work page 2022

[18] [18]

Iterative and complex orthogonal matching pursuit for broadband sparse sound field reconstruction,

S. Xu, J. A. Zhang, T. D. Abhayapala, A. Bastine, and P. N. Samarasinghe, “Iterative and complex orthogonal matching pursuit for broadband sparse sound field reconstruction,” in Proc. Int. Workshop Acoust. Signal Enhanc. (IWAENC), 2024, pp. 195–199

work page 2024

[19] [19]

Virtual navigation via higher order distributed sound sources,

T. D. Abhayapala, J. A. Zhang, S. Xu, D. L. Alon, Z. Ben-Hur, and P. N. Samarasinghe, “Virtual navigation via higher order distributed sound sources,” inProc. F orum Acusticum, Turin, Italy, 2023, pp. 647–653

work page 2023

[20] S. Koyama, J. G. C. Ribeiro, T. Nakamura, N. Ueno, and M. Pezzoli, “Physics-informed machine learning for sound field estimation: Fundamentals, state of the art, and challenges,” IEEE Signal Process. Mag., vol. 41, no. 6, pp. 60–71, 2024

[21] E. Fernandez-Grande, X. Karakonstantis, D. Caviedes-Nozal, and P. Gerstoft, “Generative models for sound field reconstruction,” J. Acoust. Soc. Am., vol. 153, no. 2, pp. 1179–1190, 2023

[22] M. Pezzoli, D. Perini, A. Bernardini, F. Borra, F. Antonacci, and A. Sarti, “Deep prior approach for room impulse response reconstruction,” Sensors, vol. 22, no. 7, pp. 2710, 2022

[23] M. Pezzoli, F. Miotello, S. Koyama, and F. Antonacci, “Low-rank adaptation of deep prior neural networks for room impulse response reconstruction,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust. (WASPAA), 2025, pp. 1–4

[24] M. Olivieri, M. Pezzoli, F. Antonacci, and A. Sarti, “A physics-informed neural network approach for nearfield acoustic holography,” Sensors, vol. 21, no. 23, pp. 7834, 2021

[25] X. Karakonstantis, D. Caviedes-Nozal, A. Richard, and E. Fernandez-Grande, “Room impulse response reconstruction with physics-informed deep learning,” J. Acoust. Soc. Am., vol. 155, no. 2, pp. 1048–1059, 2024

[26] F. Miotello, L. Comanducci, M. Pezzoli, A. Bernardini, F. Antonacci, and A. Sarti, “Reconstruction of sound field through diffusion models,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2024, pp. 1476–1480

[27] K. Su, M. Chen, and E. Shlizerman, “INRAS: Implicit neural representation for audio scenes,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2022, vol. 35, pp. 8144–8158

[28] A. Luo, Y. Du, M. Tarr, J. Tenenbaum, A. Torralba, and C. Gan, “Learning neural acoustic fields,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2022, vol. 35, pp. 3165–3177

[29] S. Della Torre, M. Pezzoli, F. Antonacci, and S. Gannot, “DiffusionRIR: Room impulse response interpolation using diffusion models,” arXiv preprint arXiv:2504.20625, 2025

[30] D. R. Morgan, J. Benesty, and M. M. Sondhi, “On the evaluation of estimated impulse responses,” IEEE Signal Process. Lett., vol. 5, no. 7, pp. 174–176, 1998

[31] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Adv. Neural Inf. Process. Syst. (NeurIPS), 2017, vol. 30, pp. 5998–6008

[32] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017

[33] R. Scheibler, E. Bezzam, and I. Dokmanic, “Pyroomacoustics: A Python package for audio room simulation and array processing algorithms,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2018, pp. 351–355

[34] J. Allen and D. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979

[35] E. A. P. Habets, “Room impulse response generator,” Tech. Rep. 2.4, Technische Universiteit Eindhoven, 2006

[36] C. de Boor, A Practical Guide to Splines, vol. 27, Springer, New York, NY, 1978