arxiv: 2606.09557 · v1 · pith:H6WIWXYCnew · submitted 2026-06-08 · 📡 eess.AS

Your U-Net Dereverberation Model is Secretly an RIR Encoder

classification 📡 eess.AS
keywords: dereverberation, model, room, ability, diffusion-based, discriminative, embeddings, performance

In this work, we analyze the ability of NCSN++ U-Net-based audio dereverberation models to capture global room characteristics in their intermediate representations. Through an empirical study of both a state-of-the-art diffusion-based model and a discriminative counterpart, we show that deeper layers encode structured room impulse response (RIR)-dependent embeddings. Moreover, the discriminative ability of this implicit room representation correlates with dereverberation performance across objective metrics. Motivated by this observation, we propose a training strategy that explicitly conditions the network on pre-trained RIR embeddings, obtained via self-supervised contrastive learning. Incorporating RIR conditioning improves representation quality, accelerates convergence, and enhances dereverberation performance, while significantly reducing the number of reverse diffusion steps required by the diffusion-based model during inference.
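The abstract does not say which contrastive objective is used to pre-train the RIR embeddings. As a rough illustration only, the sketch below assumes an InfoNCE-style loss, one common choice for self-supervised contrastive learning, in which two embeddings derived from the same room form a positive pair and the other rooms in the batch act as negatives. The function names, the temperature value, and the pairing scheme are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are assumed to come from the same room;
    every positives[j] with j != i serves as a negative for anchors[i].
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the target class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy batch: two "rooms" with orthogonal embeddings.
rooms = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
matched_loss = info_nce(rooms, rooms)        # pairs agree, loss near zero
mismatched_loss = info_nce(rooms, swapped)   # pairs disagree, loss is large
```

The loss is small when embeddings of the same room are close and embeddings of different rooms are far apart, which is the kind of room-discriminative structure the abstract reports finding in the deeper U-Net layers.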
