pith. sign in

arxiv: 2512.16318 · v2 · submitted 2025-12-18 · 📡 eess.AS

Learning Filters in Feedback Delay Networks from Noisy Room Impulse Responses

Pith reviewed 2026-05-16 21:40 UTC · model grok-4.3

classification 📡 eess.AS
keywords feedback delay networksattenuation filtersroom impulse responsesnoise modelingdifferentiable signal processinggradient optimizationartificial reverberation
0
0 comments X

The pith

Explicitly modeling noise during optimization yields accurate attenuation filter estimates for feedback delay networks even when room impulse responses are noisy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method for tuning the recursive attenuation filters inside feedback delay networks when the target room impulse responses contain background noise. Standard gradient-based optimization produces biased filter estimates because noise creates spurious loss minima; the proposed approach adds an explicit stationary noise term to the loss so that the optimizer attributes energy decay to the filters rather than to the noise floor. This matters for real-world reverberation design, where measured impulse responses are rarely clean, and the method is shown to reduce estimation error on both synthetic and measured targets while also revealing how frequency-independent parameters affect the tuning stability.

Core claim

By augmenting the loss function with an explicit noise model, the optimization of frequency-dependent attenuation filters in a feedback delay network recovers the intended decay rates and spectral shape even when the target impulse response has low signal-to-noise ratio; the same procedure also quantifies the sensitivity of those filters to small changes in the network’s frequency-independent gains and delays.

What carries the argument

Explicit additive noise term inside the differentiable loss used to optimize recursive attenuation filters of a feedback delay network.

If this is right

  • Attenuation filter estimates remain accurate down to lower signal-to-noise ratios than previously possible.
  • Gradient optimization of feedback delay networks becomes more reproducible once frequency-independent parameters are held fixed or jointly optimized with care.
  • The same noise-modeling step can be inserted into any differentiable loss that compares synthesized and measured impulse responses.
  • Statistical tests on both synthetic and real data confirm the accuracy gain is consistent across multiple room geometries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stationary-noise assumption could be relaxed to a slowly varying noise floor without changing the overall optimization architecture.
  • Similar explicit modeling of measurement artifacts may improve other differentiable audio tasks such as equalizer design or modal synthesis from noisy data.
  • The observed sensitivity to frequency-independent parameters suggests that joint optimization schedules or regularization on those parameters would further stabilize filter learning.

Load-bearing premise

Background noise behaves as a simple stationary additive process whose interaction with the reverberant tail does not systematically bias the gradient updates for the attenuation filters.

What would settle it

On a set of measured room impulse responses whose true attenuation filters are known from a controlled anechoic reference, measure whether the noise-aware optimizer recovers filter coefficients within a stated error tolerance while the baseline optimizer without noise modeling does not.

Figures

Figures reproduced from arXiv: 2512.16318 by Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa V\"alim\"aki.

Figure 1
Figure 1. Figure 1: Structure of a SISO FDN with 𝑁 = 3 delay lines, where the attenuation filters are located in blocks Γ𝑖(𝑧). that is, longer delays require stronger attenuation [JC91]. This proportionality allows the definition of a global per-sample attenuation that is valid regardless of the signal’s path through the feedback loop. More specifically, for a target 𝑇60 (𝜔) the prototype magnitude response of the attenuation… view at source ↗
Figure 2
Figure 2. Figure 2: Magnitude response (top) and 𝑇60 curve (bottom) produced by the first-order shelving filter for each delay line with lengths [997, 1153, 1327, 1559, 1801, 2099] samples. The color gradient is chosen so that darker colors correspond to longer delay lines, and hence stronger attenuation. The vertical dashed lines indicate the crossover frequency 𝑓c = 10 kHz. The reverberation times at dc and Nyquist limit ar… view at source ↗
Figure 3
Figure 3. Figure 3: Schematic example of parameter perturbations applied to the FDN. The output of the system for each [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: STFT of (a) the noisy target and (b) the modeled IR [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: (top) Distance J1 between the STFT target ℎ(𝑡) and modeled IR ℎˆ(𝑡; 𝜃 ∗Γ) for different window lengths. (bottom) Zoomed view of the distance over the first 500 ms curves differ slightly, mainly due to differences in echo and modal density, which are more noticeable in the lower frequency bands. The target signal’s background noise appears as a clear plateau, whereas ℎˆ(𝑡; 𝜃 ∗ Γ ) exhibits a clean decay. 3.… view at source ↗
Figure 6
Figure 6. Figure 6: Linear distance J2 between the STFT target ℎ(𝑡) and modeled IR ℎˆ(𝑡; 𝜃 ∗ Γ ) at different window lengths between 128 (top left) and 4096 (bottom right) samples [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: EDC of the of target ℎ(𝑡)and modeled IR ℎˆ(𝑡; 𝜃 ∗ Γ ) at one-octave bands form 31.5 Hz to 16 kHz. Dashed lines indicate the -60 dB level. parameters, under the noise-aware condition, are reported in the first rows of [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Loss profile for 1000 steps between two states of [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Loss profile for 1000 steps between two states of [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Loss profile of LMSS and its componetns LSC and LSM, for 1000 steps between two states of 𝜃𝚪 = (𝑇 dc 60 , 𝑓c), analogous to [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Loss profile for 200 steps between two states of [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Loss profile for 200 steps between two states of [PITH_FULL_IMAGE:figures/full_fig_p020_12.png] view at source ↗
read the original abstract

Recursion is a fundamental concept in the design of filters and audio systems. In particular, artificial reverberation systems that use delay networks depend on recursive paths to control both echo density and the decay rate of modal components. The differentiable digital signal processing framework has shown promise in automatically tuning recursive and non-recursive elements using gradient-based optimization with perceptually or physically motivated loss functions, such as energy decay or spectrogram differences. These representations are highly sensitive to model mismatches, which can lead to spurious loss minima. In particular, discrepancies in background noise can result in inaccurate attenuation estimates. This paper addresses the problem of tuning recursive attenuation filters of a feedback delay network when targets are noisy. We analyze the loss profile associated with different optimization objectives and propose a method that explicitly models noise, improving the accuracy of the estimated attenuation filters under low signal-to-noise conditions. We demonstrate the effectiveness of the proposed approach through statistical analysis on both synthetic and real target data. Furthermore, we identify the sensitivity of attenuation filter parameters tuning to perturbations in frequency-independent parameters. These findings provide practical guidelines for more robust and reproducible gradient-based optimization of feedback delay networks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes explicitly modeling stationary background noise within the loss function used to optimize per-frequency attenuation filters of a feedback delay network (FDN) when the target room impulse responses (RIRs) are noisy. It analyzes how noise creates spurious minima in standard energy-decay or spectrogram losses, introduces a joint optimization over filter coefficients and noise parameters, and reports statistical improvements in filter accuracy on both synthetic and measured RIRs at low SNR. The work additionally quantifies sensitivity of the tuned filters to small perturbations in frequency-independent FDN parameters.

Significance. If the stationary-noise modeling proves robust, the contribution would be a practical, low-overhead improvement to differentiable DSP pipelines for artificial reverberation. It directly mitigates a known source of optimization failure when fitting recursive structures to real acoustic measurements, thereby increasing reproducibility of automated FDN design.

major comments (2)
  1. [§3] §3 (noise-augmented loss): the claim that the added stationary noise term prevents biased gradients rests on the untested assumption that real background noise does not share modal structure or exhibit non-stationarity correlated with the reverberation tail. No simulation or measurement is shown in which the noise floor is allowed to decay or to excite the same modes as the FDN; without this, the reported accuracy gains may not generalize.
  2. [§4.3] §4.3 and associated tables: the statistical validation on real RIRs reports improved attenuation-filter error but does not provide the exact definition of the noise variance parameter, the optimizer hyperparameters, or the baseline loss without the noise term. These omissions make it impossible to verify that the improvement is attributable to the proposed modeling rather than to differences in regularization or initialization.
minor comments (2)
  1. [Figure 2] Figure 2 caption and axis labels: the loss-surface plots would benefit from explicit annotation of the location of the global minimum with and without the noise term.
  2. [Abstract] The abstract states that the method 'improves accuracy' but does not quantify the improvement (e.g., mean dB error reduction); adding a single sentence with the observed effect size would strengthen the summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below, indicating where revisions will be made to improve clarity and completeness.

read point-by-point responses
  1. Referee: [§3] §3 (noise-augmented loss): the claim that the added stationary noise term prevents biased gradients rests on the untested assumption that real background noise does not share modal structure or exhibit non-stationarity correlated with the reverberation tail. No simulation or measurement is shown in which the noise floor is allowed to decay or to excite the same modes as the FDN; without this, the reported accuracy gains may not generalize.

    Authors: We agree that the analysis assumes stationary additive noise independent of the reverberation modes, which is a standard model for background measurement noise in RIRs. The noise-augmented loss is specifically derived to counteract the bias that arises when a stationary floor is present in the target. We will revise §3 to state these assumptions explicitly and add a short discussion of limitations for non-stationary or mode-correlated noise. We will also include a supplementary simulation with a decaying noise floor to illustrate the method's sensitivity at the boundary of the assumption. revision: partial

  2. Referee: [§4.3] §4.3 and associated tables: the statistical validation on real RIRs reports improved attenuation-filter error but does not provide the exact definition of the noise variance parameter, the optimizer hyperparameters, or the baseline loss without the noise term. These omissions make it impossible to verify that the improvement is attributable to the proposed modeling rather than to differences in regularization or initialization.

    Authors: We acknowledge the lack of implementation detail. In the revised manuscript we will expand §4.3 (and the associated tables) to specify: the exact parameterization and initialization of the noise variance, the full optimizer hyperparameters (learning rate, iteration count, convergence criteria), and the precise mathematical form of the baseline loss without the noise term. These additions will enable direct reproduction and confirm that the reported gains arise from the noise modeling. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper proposes adding an explicit stationary noise model to the loss function when optimizing FDN attenuation filters from noisy RIRs. This is presented as an independent modeling choice that improves gradient behavior under low SNR, rather than a quantity fitted to or defined by the target attenuation filters themselves. No equations or steps in the abstract reduce the proposed noise term to a self-definition, a renamed fit, or a load-bearing self-citation whose validity depends on the current result. The method is evaluated on synthetic and real data, keeping the central claim externally falsifiable. The reader's assessment of score 2.0 aligns with a minor possible self-citation that is not load-bearing for the noise-modeling contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work rests on standard assumptions of linear time-invariant systems, differentiability of the DSP operations, and the existence of a stationary noise component separable from the reverberant decay; no new free parameters, axioms, or invented entities are introduced beyond the noise model itself.

pith-pipeline@v0.9.0 · 5506 in / 1036 out tokens · 31128 ms · 2026-05-16T21:40:49.883667+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

  1. [1]

    Differentiable grouped feedback delay networks for learning coupled volume acoustics

    [DDS+25] Orchisama Das, Gloria Dal Santo, Sebastian J Schlecht, Vesa Välimäki, and Zoran Cvetković. Differentiable grouped feedback delay networks for learning coupled volume acoustics. arXiv preprint arXiv:2508.06686,

  2. [2]

    Schlecht, and Vesa Välimäki

    [DPSV25] Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, and Vesa Välimäki. Optimizing tiny colorless feedback delay networks. EURASIP J. Audio Speech Music Process., 2025(13),

  3. [3]

    Ddsp: Differentiable digital signal processing.arXiv preprint arXiv:2001.04643, 2020

    [EHGR20] J. Engel, L. Hantrakul, C. Gu, and A. Roberts. DDSP: Differentiable Digital Signal Processing. arXiv preprint 2001.04643,

  4. [4]

    Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching

    [HSF25] Ben Hayes, Charalampos Saitis, and GyĂśrgy Fazekas. Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching. arXiv preprint arXiv:2506.07199,

  5. [5]

    Data- driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines

    [MGDSB24a] Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, and Alberto Bernardini. Data- driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines. EURASIP J. Audio Speech Music Process. , 2024(1):1–20,

  6. [6]

    Data- driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines

    [MGDSB24b] Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, and Alberto Bernardini. Data- driven room acoustic modeling via differentiable feedback delay networks with learnable delay lines. EURASIP J. Audio Speech Music Process. , 2024(51),

  7. [7]

    Scattering in feedback delay networks

    [SH20] Sebastian J Schlecht and Emanuël AP Habets. Scattering in feedback delay networks. IEEE/ACM Trans. Audio Speech Lang. Process., 28:1915–1924, Oct