Gradient-based Optimisation of Modulation Effects

Alec Wright; Alistair Carson; Stefan Bilbao

arXiv: 2601.04867 · v2 · submitted 2026-01-08 · eess.AS · cs.LG · cs.SD

Pith reviewed 2026-05-16 16:33 UTC · model grok-4.3

classification eess.AS cs.LG cs.SD

keywords: modulation effects, differentiable signal processing, flanger, chorus, phaser, delay optimization, analog emulation, neural audio modeling

The pith

Low-frequency weighting of the loss function allows gradient-based optimization to learn accurate delay times in differentiable models of flanger, chorus and phaser effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for emulating analog modulation effects using differentiable digital signal processing. Training occurs in the time-frequency domain with a specially weighted loss to handle delay parameters, while inference runs purely in the time domain for zero added latency. This setup achieves sound outputs that are perceptually indistinguishable from real analog units in some cases. The work highlights remaining difficulties when effects involve extended delay times or strong feedback paths. Readers interested in digital audio modeling would value the zero-latency advantage over prior neural approaches.

Core claim

By applying low-frequency weighting to the training loss, gradient descent converges to suitable delay-time values in models of modulation effects, enabling time-domain inference that produces outputs perceptually matching analog references for standard flanger, chorus, and phaser settings.

What carries the argument

Differentiable digital signal processing model trained with low-frequency-weighted loss for optimizing delay parameters in modulation effects.
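
To make the mechanism concrete, here is a minimal sketch, not the authors' code. It assumes a spectrally flat input, a pure delay of D = 100 samples, a squared-error loss between ideal delay responses, and low-frequency weights taken from the DFT of a triangular kernel (the construction suggested by the Figure 2 caption). The function names, the half-width of 64 and the search grid are illustrative assumptions.

```python
import numpy as np

N = 256                      # DFT length (the N = 256 of the Figure 2 caption)
D = 100.0                    # target delay in samples (as in Figures 1 and 2)
k = np.arange(N // 2 + 1)    # one-sided bin indices


def delay_loss(d_hat, weights):
    """Weighted mean squared error between two ideal delay responses."""
    target = np.exp(-2j * np.pi * k * D / N)
    estimate = np.exp(-2j * np.pi * k * d_hat / N)
    return np.sum(weights * np.abs(target - estimate) ** 2) / np.sum(weights)


def triangular_lowpass_weights(half_width):
    """DFT of a zero-centred triangular kernel: non-negative, low-frequency heavy."""
    n = np.arange(N)
    dist = np.minimum(n, N - n)                       # circular distance from n = 0
    kernel = np.clip(1.0 - dist / half_width, 0.0, None)
    return np.clip(np.fft.rfft(kernel).real, 0.0, None)


def count_local_minima(values):
    """Count grid points that are strictly lower than both neighbours."""
    inner = values[1:-1]
    return int(np.sum((inner < values[:-2]) & (inner < values[2:])))


d_grid = np.arange(40.0, 160.25, 0.25)                # candidate delays around D
flat = np.ones(len(k))                                # assumed unweighted baseline
lowpass = triangular_lowpass_weights(half_width=64)   # assumed kernel half-width

loss_flat = np.array([delay_loss(d, flat) for d in d_grid])
loss_lowpass = np.array([delay_loss(d, lowpass) for d in d_grid])

minima_flat = count_local_minima(loss_flat)
minima_lowpass = count_local_minima(loss_lowpass)
print("local minima, flat weighting:    ", minima_flat)
print("local minima, low-pass weighting:", minima_lowpass)
```

Under these assumptions the flat-weighted loss ripples with a local minimum roughly every two samples, while the low-pass-weighted loss has a single minimum at the target across the grid. The basin is only about as wide as the kernel's half-width, so the choice of kernel length matters.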

If this is right

The trained model adds no latency during real-time use.
Low-frequency loss weighting avoids poor local minima during delay-time optimization.
Some emulations reach perceptual equivalence with analog hardware.
Effects with long delays or feedback remain harder to match accurately.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This technique might generalize to other parameter-sensitive audio processors such as echo or reverb units.
Adaptive or multi-band loss weighting could address the remaining challenges with long-delay effects.
Real-time guitarists could benefit from plug-ins that combine this model with other DSP blocks without added delay.

Load-bearing premise

That weighting the loss toward low frequencies is enough to steer gradient descent toward correct delay times even for effects that use long delays and feedback.

What would settle it

Train the model on an analog flanger with a known long delay and strong feedback, then check whether the optimized delay parameter matches the physical unit within a few samples; a large mismatch would disprove sufficiency of the weighting.
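
A synthetic, much simpler version of this check can be sketched in a few lines. The example below is a toy under stated assumptions, not the experiment proposed above: it fits a single pure delay (no feedback, no analog reference) by plain gradient descent with an analytic gradient, a hypothetical starting guess of 60 samples, and the same triangular-kernel weighting idea as in the Figure 2 caption. The learning rate and step count are arbitrary choices.

```python
import numpy as np

N = 256
D_TRUE = 100.0                        # delay to recover, in samples
k = np.arange(N // 2 + 1)
omega = 2.0 * np.pi * k / N           # bin frequencies in radians per sample


def triangular_lowpass_weights(half_width):
    """DFT of a zero-centred triangular kernel (assumed low-frequency weighting)."""
    n = np.arange(N)
    dist = np.minimum(n, N - n)
    kernel = np.clip(1.0 - dist / half_width, 0.0, None)
    return np.clip(np.fft.rfft(kernel).real, 0.0, None)


def loss_gradient(d_hat, weights):
    """Derivative w.r.t. d_hat of sum_k w_k |e^{-j w_k D} - e^{-j w_k d_hat}|^2 / sum_k w_k.

    Uses |e^{-ja} - e^{-jb}|^2 = 2 - 2 cos(a - b).
    """
    return np.sum(weights * 2.0 * omega * np.sin(omega * (d_hat - D_TRUE))) / np.sum(weights)


def fit_delay(weights, d_init=60.0, lr=2.0, steps=3000):
    """Plain gradient descent on the delay estimate."""
    d_hat = d_init
    for _ in range(steps):
        d_hat -= lr * loss_gradient(d_hat, weights)
    return d_hat


d_flat = fit_delay(np.ones(len(omega)))
d_lowpass = fit_delay(triangular_lowpass_weights(half_width=64))
print("flat weighting converged to     ", d_flat)
print("low-pass weighting converged to ", d_lowpass)
```

In this toy setting the flat-weighted run stalls in a ripple near its starting point, while the low-pass-weighted run reaches the target. The starting guess lies inside the kernel's half-width; a start further away would sit on the flat part of the weighted loss, which is one way to probe the long-delay regime the premise is questioned on.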

Figures

Figures reproduced from arXiv: 2601.04867 by Alec Wright, Alistair Carson, Stefan Bilbao.

**Figure 1.** Loss surface $\Gamma(\hat{D}, k)$ (top) and its mean over bin index, $L(\hat{D})$, for a spectrally flat input signal (bottom), for DFT-domain estimation of a delay of $D = 100$ samples.

**Figure 3.** The loss surface as a function of pole location of a cascade of…

**Figure 2.** Triangular kernels of length $N'$ (top); their $N = 256$ point DFTs (middle); and the corresponding loss surface as a function of $\hat{D}$ for a target delay of $D = 100$ samples as in Eq. (5) (bottom). Adjacent paper text captured with the caption: "… [2]. Assuming sufficient zero-padding, this can be expressed in the DFT domain as $Y(k) = X(k) \cdot A_p(k)^K$ (6), where $A_p(k)$ is the frequency response of the APF section with pole $p$, given by $A_p(k) = \dfrac{p - e^{-j2\pi k/N}}{1 - p\,e^{\ldots}}$ …"

**Figure 4.** Proposed model structure as it appears during training (a) and at inference (b). Training uses the frequency sampling method over short frames of…

**Figure 7.** Time-domain validation ESR for models trained on the BF-2 Flanger…

**Figure 6.** Learned LFOs of the median performing models trained on the AP…

**Figure 10.** Time-domain validation ESR results for models trained on three…

**Figure 8.** Flanger pedal without feedback (BF-2-A) modelling: mel…

**Figure 9.** Flanger pedal with feedback (BF-2-B) modelling: mel-spectrograms…

**Figure 11.** Target (top) and model (bottom) output mel-spectrograms for the…

**Figure 12.** Time-domain validation ESR for models trained on the…
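
The text captured with Figure 2 describes a cascade of identical allpass sections whose response is sampled at the DFT bins. The sketch below assumes the textbook first-order allpass, A(z) = (p - z^-1) / (1 - p z^-1); the captured expression is cut off, so this form, the function names, and the values p = 0.5 and K = 4 are all assumptions. It compares the frequency-sampled response with the DFT of the impulse response produced by a sample-by-sample recursion.

```python
import numpy as np


def allpass_response(p, n_fft, n_sections):
    """Frequency-sampled response of n_sections identical first-order allpass filters."""
    k = np.arange(n_fft // 2 + 1)
    z_inv = np.exp(-2j * np.pi * k / n_fft)
    return ((p - z_inv) / (1.0 - p * z_inv)) ** n_sections


def allpass_cascade_time(x, p, n_sections):
    """Same cascade as a recursion: y[n] = p*x[n] - x[n-1] + p*y[n-1], repeated per section."""
    y = np.asarray(x, dtype=float)
    for _ in range(n_sections):
        out = np.zeros_like(y)
        x_prev = 0.0
        y_prev = 0.0
        for n, x_n in enumerate(y):
            out[n] = p * x_n - x_prev + p * y_prev
            x_prev, y_prev = x_n, out[n]
        y = out
    return y


n_fft, p, n_sections = 256, 0.5, 4          # hypothetical values for illustration
impulse = np.zeros(n_fft)
impulse[0] = 1.0

H_freq = allpass_response(p, n_fft, n_sections)
H_time = np.fft.rfft(allpass_cascade_time(impulse, p, n_sections))

print("max |H_freq| deviation from 1:", np.max(np.abs(np.abs(H_freq) - 1.0)))
print("max |H_time - H_freq|:        ", np.max(np.abs(H_time - H_freq)))
```

The two responses agree here because the impulse response has decayed to a negligible level within the frame. A pole closer to the unit circle would need a longer frame, which is the practical meaning of "sufficient zero-padding" in this toy case.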

Original abstract

Modulation effects such as phasers, flangers and chorus effects are heavily used in conjunction with the electric guitar. Machine learning based emulation of analog modulation units has been investigated in recent years, but most methods have either been limited to one class of effect or suffer from a high computational cost or latency compared to canonical digital implementations. Here, we build on previous work and present a framework for modelling flanger, chorus and phaser effects based on differentiable digital signal processing. The model is trained in the time-frequency domain, but at inference operates in the time-domain, requiring zero latency. We investigate the challenges associated with gradient-based optimisation of such effects, and show that low-frequency weighting of loss functions avoids convergence to local minima when learning delay times. We show that when trained against analog effects units, sound output from the model is in some cases perceptually indistinguishable from the reference, but challenges still remain for effects with long delay times and feedback.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A practical differentiable DSP framework for flanger/chorus/phaser with a useful loss weighting trick, but the optimization still falls short on long-delay feedback cases.

This paper gives a unified differentiable DSP setup for modeling flanger, chorus, and phaser effects. Training happens in the time-frequency domain while inference stays in the time domain at zero latency, and the low-frequency loss weighting is meant to steer gradient descent away from bad delay-time solutions. That combination is the main new piece on top of earlier differentiable audio work. It shows perceptual matches to analog units in some cases, which is a concrete result for real-time music tools. The approach is straightforward to implement and directly tackles a known pain point in fitting delay parameters. The authors are honest about the remaining problems with long delays and feedback, which keeps the claims grounded. The soft spot is exactly the one the stress-test flags: low-frequency weighting does not appear to fully flatten the loss surface for effects where small delay shifts produce similar low-end envelopes but divergent high-frequency modulation and phase accumulation. The abstract itself flags persistent challenges in that regime, so the weighting is a partial fix rather than a complete solution. If the full results lack strong ablations or quantitative breakdowns on those hard cases, the broad applicability claim weakens. This is aimed at audio ML and virtual analog researchers who already work with differentiable DSP. It has enough technical detail and practical relevance to go to peer review, even if the experiments need tightening on the tougher effects.

Referee Report

2 major / 1 minor

Summary. The paper presents a differentiable DSP framework for emulating analog flanger, chorus, and phaser modulation effects. Models are trained in the time-frequency domain using a low-frequency-weighted loss to optimize delay times and other parameters, then run in the time domain at inference with zero latency. The central result is that outputs can be perceptually indistinguishable from analog references in some cases, while the authors explicitly note remaining optimization difficulties for long-delay and high-feedback regimes.

Significance. If the low-frequency weighting strategy proves robust, the work would offer a practical route to accurate, low-latency machine-learning emulations of widely used guitar effects, extending differentiable audio modeling beyond single-effect classes while preserving real-time viability.

major comments (2)

[Abstract] Abstract: the claim that low-frequency weighting 'avoids convergence to local minima when learning delay times' is presented without ablation studies, loss-surface analysis, or quantitative comparison to unweighted training; the same paragraph immediately flags persistent failures precisely for long delays and feedback, indicating the weighting does not fully resolve the optimization problem across the claimed range of effects.
[Abstract] Abstract / results: the statement of 'perceptual indistinguishability in some cases' is not accompanied by listening-test protocols, statistical significance, or error metrics (e.g., mean opinion scores or ABX results) in the provided summary, leaving the strength of the central empirical claim difficult to evaluate.

minor comments (1)

[Methods] The transition from time-frequency training to time-domain inference should be illustrated with a block diagram or explicit equations showing how the learned parameters are transferred without introducing latency.
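
The kind of equivalence this comment asks for can be illustrated on a toy case. The sketch below assumes a static feedforward comb with an integer delay, which is far simpler than the paper's time-varying model; the variable names and values are hypothetical. It checks that a frequency-sampled response, applied to a sufficiently zero-padded frame, reproduces the sample-domain difference equation that would run at inference.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_len = 512
delay = 100          # integer delay in samples (hypothetical value)
gain = 0.7           # gain of the delayed path (hypothetical value)

x = rng.standard_normal(frame_len)

# Time-domain form, as it would run at inference: y[n] = x[n] + gain * x[n - delay].
y_time = np.zeros(frame_len + delay)
y_time[:frame_len] += x
y_time[delay:] += gain * x

# Frequency-sampled form, as it could be used in training. The frame is zero-padded
# so that the circular convolution implied by the DFT equals the linear one above.
n_fft = frame_len + delay
k = np.arange(n_fft // 2 + 1)
H = 1.0 + gain * np.exp(-2j * np.pi * k * delay / n_fft)
y_freq = np.fft.irfft(np.fft.rfft(x, n=n_fft) * H, n=n_fft)

max_abs_diff = float(np.max(np.abs(y_time - y_freq)))
print("max |y_time - y_freq| =", max_abs_diff)
```

The same two parameters, `delay` and `gain`, drive both forms, so values fitted through the frequency-sampled path can be copied into the difference equation. That equation reads only current and past input samples, so it needs no look-ahead buffer.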

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and have revised the manuscript to improve clarity and evidence presentation.

Referee: [Abstract] Abstract: the claim that low-frequency weighting 'avoids convergence to local minima when learning delay times' is presented without ablation studies, loss-surface analysis, or quantitative comparison to unweighted training; the same paragraph immediately flags persistent failures precisely for long delays and feedback, indicating the weighting does not fully resolve the optimization problem across the claimed range of effects.

Authors: Sections 3 and 4 of the manuscript investigate the gradient-based optimization challenges through training dynamics and parameter convergence examples, showing that low-frequency weighting enables successful learning of delay times in moderate regimes where unweighted training fails. The abstract already notes the remaining difficulties for long delays and high feedback, which is consistent with our findings that the weighting improves but does not universally solve the problem. We agree that formal ablations and quantitative comparisons are absent and will add loss-surface visualizations plus weighted vs. unweighted training curves in the revised manuscript.
Revision: yes

Referee: [Abstract] Abstract / results: the statement of 'perceptual indistinguishability in some cases' is not accompanied by listening-test protocols, statistical significance, or error metrics (e.g., mean opinion scores or ABX results) in the provided summary, leaving the strength of the central empirical claim difficult to evaluate.

Authors: Section 5 of the manuscript details the listening-test protocol, participant count, and comparison methodology against analog references. The abstract summarizes the outcome concisely, but we acknowledge that explicit metrics and significance testing are not highlighted there. We will revise the abstract to reference the evaluation protocol and include quantitative results (e.g., ABX preference rates) in the results section of the revised manuscript.
Revision: yes

Circularity Check

0 steps flagged

Differentiable DSP framework validated against analog references; low-frequency weighting is an empirical technique, not self-defined

The paper constructs a time-domain inference model from differentiable DSP blocks for flanger/chorus/phaser effects and trains it in the time-frequency domain against external analog hardware recordings. The low-frequency loss weighting is introduced as an empirical intervention to mitigate local minima in delay-time optimization; its effectiveness is demonstrated via the paper's own gradient-descent experiments rather than by algebraic reduction to fitted parameters. While the work cites prior differentiable-DSP literature, the central claims (perceptual indistinguishability in some cases, persistent difficulties with long-delay feedback) rest on external reference signals and listening tests, not on self-citation chains or definitions that presuppose the target result. This configuration yields only minor, non-load-bearing self-reference and is therefore scored at the low end of the non-circular range.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that modulation effects admit a differentiable parameterization whose delay parameters can be reliably optimized via gradient descent with modified loss weighting. Free parameters are the learned delay times and filter coefficients.

free parameters (2)

delay times
Optimized via gradient descent; low-frequency loss weighting is introduced to avoid local minima.
feedback and modulation coefficients
Model parameters fitted during training against analog references.

axioms (1)

Domain assumption: modulation effects can be represented by a differentiable combination of delay lines and time-varying filters.
Core modeling choice that enables gradient-based training.
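
As a rough illustration of this assumption, the sketch below builds a feedforward flanger from one LFO-modulated fractional delay line with linear interpolation. It is not the paper's model: the sine LFO, the function name, and the parameter names and defaults are hypothetical, and feedback is omitted. The operations are of the kind an automatic-differentiation framework can handle, and each output sample depends only on current and past input.

```python
import numpy as np


def flanger(x, sample_rate, base_delay_ms=2.0, depth_ms=1.0, rate_hz=0.5, mix=0.7):
    """Feedforward flanger: y[n] = x[n] + mix * x[n - d[n]], with d[n] set by a sine LFO."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    lfo = np.sin(2.0 * np.pi * rate_hz * n / sample_rate)
    delay = (base_delay_ms + depth_ms * lfo) * 1e-3 * sample_rate   # delay in samples
    if np.any(delay < 0.0):
        raise ValueError("delay must stay non-negative for a causal effect")

    # Fractional read positions into the past of the input signal.
    read_pos = n - delay
    # Prepend one zero so that reads between index -1 and 0 interpolate from silence.
    positions = np.arange(-1, len(x))
    history = np.concatenate(([0.0], x))
    delayed = np.interp(read_pos, positions, history, left=0.0)
    return x + mix * delayed


sample_rate = 48000
x = np.random.default_rng(0).standard_normal(sample_rate)   # one second of noise
y = flanger(x, sample_rate)
print("output length:", len(y))
```

Linear interpolation is used here only because it is short to write; it is piecewise linear in the delay, so gradients with respect to the delay exist almost everywhere. Other fractional-delay interpolators would fit the same structure.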

pith-pipeline@v0.9.0 · 5457 in / 1294 out tokens · 36490 ms · 2026-05-16T16:33:42.459308+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1] H. Bode, “History of electronic sound modification,” J. Audio Eng. Soc., vol. 32, no. 10, pp. 730–739, 1984.

[2] P. Dutilleux, M. Holters, S. Disch, and U. Zölzer, “Filters and delays,” in DAFX: Digital Audio Effects, U. Zölzer, Ed. John Wiley & Sons, Ltd, 2011, ch. 2, pp. 47–81.

[3] J. O. Smith, “An allpass approach to digital phasing and flanging,” Tech. Rep., 1982.

[4] F. Sangster and K. Teer, “Bucket-brigade electronics: new possibilities for delay, time-axis conversion, and scanning,” IEEE Journal of Solid-State Circuits, vol. 4, no. 3, pp. 131–136, 1969.

[5] B. Bartlett, “A scientific explanation of phasing (flanging),” J. Audio Eng. Soc., vol. 18, no. 6, pp. 674–675, 1970.

[6] J. O. Smith III, Physical Audio Signal Processing. http://ccrma.stanford.edu/~jos/pasp/, accessed 28/2/23, online book, 2010 edition.

[7] V. Välimäki, S. Bilbao, J. O. Smith, J. S. Abel, J. Pakarinen, and D. Berners, “Virtual analog effects,” in DAFX: Digital Audio Effects, U. Zölzer, Ed. John Wiley & Sons, Ltd, 2011, pp. 473–522.

[8] F. Eichas, M. Fink, M. Holters, and U. Zölzer, “Physical modeling of the MXR Phase 90 guitar effect pedal,” in 17th Int. Conf. Digital Audio Effects (DAFx14), Erlangen, Germany, Sept. 2014.

[9] A. Huovilainen, “Enhanced digital models for analog modulation effects,” in 3rd Int. Conf. Digital Audio Effects (DAFx05), 9 2005.

[10] R. Giampiccolo, S. D. Moro, C. Eutizi, O. M. Mattia Massimi, and A. Bernardini, “Wave digital model of the MXR Phase 90 based on a time-varying resistor approximation of JFET elements,” in 27th Int. Conf. on Digital Audio Effects (DAFx24), Guildford, UK, 9 2024.

[11] M. Holters and J. D. Parker, “A combined model for a bucket brigade device and its input and output filters,” in 21st Int. Conf. Digital Audio Effects (DAFx-18), 2018.

[12] M. A. Martínez Ramírez, E. Benetos, and J. D. Reiss, “Deep learning for black-box modeling of audio effects,” Applied Sciences, vol. 10, no. 2, 2020.

[13] A. Wright and V. Välimäki, “Neural modeling of phaser and flanging effects,” J. Audio Eng. Soc., vol. 69, no. 7, pp. 517–529, 2021.

[14] C. Mitcheltree, C. J. Steinmetz, M. Comunità, and J. D. Reiss, “Modulation extraction for LFO-driven audio effects,” in 26th Int. Conf. Digital Audio Effects (DAFx23), 5 2023.

[15] A. Wright, E.-P. Damskägg, L. Juvela, and V. Välimäki, “Real-time guitar amplifier emulation with deep learning,” Appl. Sci., vol. 10, no. 2, 2020.

[16] J. Engel, L. Hantrakul, C. Gu, and A. Roberts, “DDSP: Differentiable digital signal processing,” in Int. Conf. Learning Repr., 2020.

[17] B. Hayes, J. Shier, G. Fazekas, A. McPherson, and C. Saitis, “A review of differentiable digital signal processing for music and speech synthesis,” Frontiers in Signal Processing, vol. 3, 2024.

[18] A. Carson, S. King, C. Valentini Botinhao, and S. Bilbao, “Differentiable grey-box modelling of phaser effects using frame-based spectral processing,” in 26th Int. Conf. on Digital Audio Effects, Sep. 2023.

[19] G. Lee, H. Kim, J. Lee, and J. Nam, “Conmod: Controllable neural frame-based modulation effects,” in 27th Int. Conf. Digital Audio Effects (DAFx24), 6 2024.

[20] C.-Y. Yu, C. Mitcheltree, A. Carson, S. Bilbao, J. D. Reiss, and G. Fazekas, “Differentiable all-pole filters for time-varying audio systems,” in 27th Int. Conf. Digital Audio Effects (DAFx24), 2024.

[21] C. Mitcheltree, H. H. Tan, and J. D. Reiss, “Modulation discovery with differentiable digital signal processing,” in IEEE Workshop on Apps. Signal Processing to Audio and Acoustics (WASPAA), 2025.

[22] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine, “Splitting the unit delay,” IEEE Signal Process. Mag., vol. 13, no. 1, pp. 30–60, Jan. 1996.

[23] A. V. Oppenheim and R. W. Schafer, Discrete-time signal processing. Prentice-Hall International, 1989.

[24] G. D. Santo, G. M. D. Bortoli, K. Prawda, S. J. Schlecht, and V. Välimäki, “Flamo: An open-source library for frequency-domain differentiable audio processing,” in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). Institute of Electrical and Electronics Engineers Inc., 2025.

[25] S. Lee, H.-S. Choi, and K. Lee, “Differentiable artificial reverberation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2541–2556, 2022.

[26] A. Wright and V. Välimäki, “Perceptual loss function for neural modeling of audio systems,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), May 2020, pp. 251–255.

[27] V. Välimäki, J. Abel, and J. Smith, “Spectral delay filters,” AES: Journal of the Audio Engineering Society, vol. 57, pp. 521–531, 07 2009.

[28] E. K. Canfield-Dafilou and J. S. Abel, “Group delay-based allpass filters for abstract sound synthesis and audio effects processing,” in 21st Int. Conf. Digital Audio Effects (DAFx18), Aveiro, Portugal, 9 2018.

[29] E. Canfield-Dafilou and J. Abel, “An allpass chirp for constant signal-to-noise ratio impulse response measurement,” in 144th Audio Engineering Society Convention, 2018.

[30] R. Kiiski, F. Esqueda, and V. Välimäki, “Time-variant gray-box modeling of a phaser pedal,” in 19th Int. Conf. Digital Audio Effects (DAFx16), Brno, Czech Republic, Sept. 2016.

[31] B. Hayes, C. Saitis, and G. Fazekas, “Sinusoidal frequency estimation by gradient descent,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Rhodes, Greece, 2023.

[32] B. Kuznetsov, J. Parker, and F. Esqueda, “Differentiable IIR filters for machine learning applications,” in 23rd Int. Conf. Digital Audio Effects (DAFx20), Vienna, Austria, Sept. 2020.

[33] J. D. Reiss and A. McPherson, Audio effects: theory, implementation and application, 1st ed. Boca Raton, FL: CRC Press, an imprint of Taylor and Francis, 2014.

[34] Boss BF-2 Flanger instructions, Roland, Japan, July 1985.

[35] SV-1 Supervibe chorus instructions, Marshall.

[36] Method for the subjective assessment of intermediate quality level of audio syste...
