Multimodal Deep Learning Method for Real-Time Spatial Room Impulse Response Computing

2026 · eess.AS · arXiv 2604.05545

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We propose a multimodal deep learning model for VR auralization that generates spatial room impulse responses (SRIRs) in real time to reconstruct scene-specific auditory perception. Employing SRIRs as the output reduces computational complexity and facilitates integration with personalized head-related transfer functions. The model takes two modalities as input: scene information and waveforms, where the waveform corresponds to the low-order reflections (LoR). LoR can be efficiently computed using geometrical acoustics (GA) but remains difficult for deep learning models to predict accurately. Scene geometry, acoustic properties, source coordinates, and listener coordinates are first used to compute LoR in real time via GA, and both LoR and these features are subsequently provided as inputs to the model. A new dataset was constructed, consisting of multiple scenes and their corresponding SRIRs. The dataset exhibits greater diversity. Experimental results demonstrate the superior performance of the proposed model.
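The GA step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a rectangular (shoebox) room, a single frequency-independent reflection coefficient, the image-source method, and 1/distance spreading, none of which the abstract specifies. The function names `first_order_reflections` and `render_waveform` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def first_order_reflections(room, source, listener, reflection_coeff=0.8):
    """Return sorted (delay_s, amplitude) pairs for the direct path and the
    six first-order image sources of a shoebox room (assumed geometry)."""
    paths = [(tuple(source), 1.0)]  # direct path, no wall loss
    for axis in range(3):
        for wall in (0.0, room[axis]):
            # Mirror the source across the wall plane on this axis.
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]
            paths.append((tuple(image), reflection_coeff))

    arrivals = []
    for position, gain in paths:
        distance = math.dist(position, listener)
        delay = distance / SPEED_OF_SOUND
        amplitude = gain / max(distance, 1e-6)  # 1/r spreading loss
        arrivals.append((delay, amplitude))
    return sorted(arrivals)


def render_waveform(arrivals, sample_rate=48000, length=4096):
    """Place each arrival as an impulse at its nearest sample index."""
    wave = [0.0] * length
    for delay, amplitude in arrivals:
        index = round(delay * sample_rate)
        if index < length:
            wave[index] += amplitude
    return wave


room = (5.0, 4.0, 3.0)       # room dimensions in metres (illustrative)
source = (1.0, 1.0, 1.0)     # source coordinates (illustrative)
listener = (3.0, 2.0, 1.5)   # listener coordinates (illustrative)

arrivals = first_order_reflections(room, source, listener)
lor_waveform = render_waveform(arrivals)
```

In the pipeline the abstract describes, a waveform of this kind would be passed to the model together with the scene features. How the paper encodes either input is not stated here.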

representative citing papers

Multimodal Deep Learning Method for Real-Time Spatial Room Impulse Response Computing

eess.AS · 2026-04-07 · unverdicted · novelty 5.0

A multimodal neural network generates real-time spatial room impulse responses by feeding geometrically computed low-order reflections and scene features into a deep learning model trained on a new diverse dataset.

