The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
Pith reviewed 2026-05-25 18:06 UTC · model grok-4.3
The pith
Time-frequency scattering coefficients combined with gradient phase retrieval support re-synthesis of audio textures and enable scale-rate effects such as chirp rate inversion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Time-frequency scattering is a convolutional operator that extracts modulations in the time-frequency domain at different rates and scales. After phase retrieval by gradient backpropagation, the coefficients become invertible, allowing both faithful reconstruction of audio textures and a new class of transformations expressed directly in the scattering domain. One such transformation, chirp rate inversion, locally reverses the time direction of sonic events without altering the global arrow of time.
What carries the argument
Time-frequency scattering, a convolutional operator extracting modulations in the time-frequency domain at different rates and scales, combined with gradient-based phase retrieval to invert the coefficients back to audio.
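As a rough illustration of what "extracting modulations at different rates and scales" can mean, here is a minimal NumPy sketch. It is not the authors' implementation (theirs is the MATLAB toolbox linked in the abstract). The function names `scalogram` and `joint_modulation`, the Gaussian filter shapes, and all parameter values are assumptions made for this example. The sketch computes a log-frequency band-pass modulus, filters it with one two-dimensional Gaussian bump located at rate `alpha` and scale `beta` in the Fourier domain, then takes the modulus again.

```python
import numpy as np

def scalogram(x, n_bands=32, sr=8000.0, fmin=100.0, fmax=3000.0, q=8.0):
    """Modulus of Gaussian band-pass filters with log-spaced center frequencies."""
    n = len(x)
    freqs = np.fft.fftfreq(n, d=1.0 / sr)
    centers = np.geomspace(fmin, fmax, n_bands)
    # One real-valued frequency response per band, bandwidth = center / q.
    bank = np.exp(-0.5 * ((freqs[None, :] - centers[:, None])
                          / (centers[:, None] / q)) ** 2)
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=1))

def joint_modulation(U1, alpha, beta, bw=0.5):
    """Modulus of U1 filtered by one 2-D band-pass filter.

    alpha: temporal modulation rate in cycles per sample (nonzero).
    beta: frequential modulation scale in cycles per band (nonzero).
    The signs of alpha and beta select the direction of the chirp.
    """
    n_bands, n = U1.shape
    w_t = np.fft.fftfreq(n)[None, :]
    w_f = np.fft.fftfreq(n_bands)[:, None]
    # Gaussian bump centered at (w_f, w_t) = (beta, alpha) in the 2-D Fourier domain.
    psi_hat = np.exp(-0.5 * (((w_t - alpha) / (bw * abs(alpha))) ** 2
                             + ((w_f - beta) / (bw * abs(beta))) ** 2))
    return np.abs(np.fft.ifft2(np.fft.fft2(U1) * psi_hat))
```

Averaging `joint_modulation(U1, alpha, beta)` along the time axis gives one texture descriptor per frequency band for that (rate, scale) pair. In this toy version, circularly reversing the input in time gives the same total energy as keeping the input and flipping the sign of `alpha`, which is the kind of symmetry that scale-rate effects exploit.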
If this is right
- Audio textures become re-synthesizable from their scattering coefficients alone.
- Scale-rate DAFx such as chirp rate inversion can be applied directly in the coefficient domain.
- The same pipeline supports creation of new electroacoustic works and commercial remixes.
- Source code and audio demos make the transformations reproducible for further music production.
Where Pith is reading between the lines
- The scattering domain might allow quantitative comparison or interpolation between different audio textures.
- Gradient phase retrieval could be tested on other convolutional time-frequency representations beyond scattering.
- Local time manipulations like chirp rate inversion might combine with global tempo changes to produce hybrid rhythmic effects.
Load-bearing premise
The time-frequency scattering representation preserves enough perceptual information about audio textures to allow faithful re-synthesis and musically useful manipulations without additional perceptual modeling.
What would settle it
Re-synthesis from the scattering coefficients either fails to match the original texture perceptually or produces manipulated versions, such as chirp-rate-inverted sounds, that do not exhibit the intended local time reversal.
Original abstract
This article explains how to apply time–frequency scattering, a convolutional operator extracting modulations in the time–frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures. After implementing phase retrieval in the scattering network by gradient backpropagation, we introduce scale–rate DAFx, a class of audio transformations expressed in the domain of time–frequency scattering coefficients. One example of scale–rate DAFx is chirp rate inversion, which causes each sonic event to be locally reversed in time while leaving the arrow of time globally unchanged. Over the past two years, our work has led to the creation of four electroacoustic pieces: “FAVN”; “Modulator (Scattering Transform)”; “Experimental Palimpsest”; “Inspection”; and a remix of Lorenzo Senni's “XAllegroX”, released by Warp Records on a vinyl entitled “The Shape of RemiXXXes to Come”. The source code to reproduce experiments and figures is made freely available at: https://github.com/lostanlen/scattering.m. A companion website containing demos is at: https://lostanlen.com/pubs/dafx2019
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that time-frequency scattering coefficients, when inverted via gradient-based phase retrieval, enable faithful re-synthesis of audio textures and support a new class of scale-rate digital audio effects (DAFx) such as chirp-rate inversion; these are demonstrated through four electroacoustic compositions and a commercial remix, with open-source code provided.
Significance. If the phase-retrieval step succeeds reliably, the work would offer a mathematically grounded domain for texture manipulation that preserves modulation structure at multiple scales and rates, with potential utility in sound design and electroacoustic music. The explicit release of source code and a companion demo site is a clear strength that supports reproducibility.
major comments (3)
- [Phase retrieval subsection] The phase-retrieval procedure (described after the scattering definition) is performed by gradient backpropagation, yet the manuscript supplies neither the explicit loss function, initialization strategy, step-size schedule, nor any convergence analysis or failure cases. Because scattering discards phase, every downstream claim of faithful re-synthesis and musically useful DAFx rests on this uncharacterized optimization step.
- [Experimental results / musical examples] No quantitative reconstruction metrics (SNR, perceptual distances, or spectro-temporal error) or baseline comparisons are reported for the re-synthesized textures, nor are formal listening-test results provided to support the claim that the inverted signals are perceptually faithful.
- [Scale-rate DAFx section] The mathematical definition and implementation of scale-rate DAFx (e.g., chirp-rate inversion) are presented only at a high level; it is not shown how the transformation is applied directly to the scattering coefficients while preserving the global time arrow, nor is any analysis given of the resulting coefficient-domain properties.
minor comments (2)
- [Abstract] The abstract states that four pieces were created but then enumerates five titles; this minor inconsistency should be corrected.
- [Figures] Figure captions and axis labels in the time-frequency scattering visualizations could be expanded to improve readability for readers unfamiliar with the representation.
Simulated Author's Rebuttal
Thank you for the constructive review. We address each major comment below, indicating planned revisions to strengthen the methodological details, evaluation, and mathematical exposition while preserving the manuscript's focus on artistic applications.
-
Referee: [Phase retrieval subsection] The phase-retrieval procedure (described after the scattering definition) is performed by gradient backpropagation, yet the manuscript supplies neither the explicit loss function, initialization strategy, step-size schedule, nor any convergence analysis or failure cases. Because scattering discards phase, every downstream claim of faithful re-synthesis and musically useful DAFx rests on this uncharacterized optimization step.
Authors: We agree that the phase retrieval step requires fuller documentation. In revision we will add a dedicated subsection specifying the loss (L2 distance on scattering coefficients), initialization (random phase), optimizer and schedule (Adam with fixed learning rate), observed convergence behavior, and typical failure cases (e.g., highly transient textures). These details are already implemented in the released code and will be cross-referenced in the text. revision: yes
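The choices named in this response (L2 loss on the coefficients, gradient descent with Adam at a fixed learning rate) can be sketched on a toy problem. The code below is an assumption-laden stand-in, not the released code. It replaces scattering with the modulus of a small Gaussian filter bank, starts from white noise rather than a random-phase signal, and derives the gradient by hand instead of using automatic differentiation. All function names and hyperparameter values are hypothetical.

```python
import numpy as np

def make_bank(n, n_filters=16, q=4.0):
    """Real-valued Gaussian band-pass responses on the FFT frequency grid."""
    w = np.fft.fftfreq(n)
    centers = np.geomspace(0.02, 0.45, n_filters)
    return np.exp(-0.5 * ((w[None, :] - centers[:, None])
                          / (centers[:, None] / q)) ** 2)

def modulus_coeffs(x, bank):
    """Complex band outputs y and their moduli |y| (the phase is discarded)."""
    y = np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=1)
    return y, np.abs(y)

def loss_and_grad(x, target, bank, eps=1e-12):
    """L2 loss between moduli, and its gradient with respect to the real signal x."""
    y, mod = modulus_coeffs(x, bank)
    resid = mod - target
    loss = 0.5 * np.sum(resid ** 2)
    # Chain rule through |y|: back-project resid * y/|y| with the adjoint bank.
    # The adjoint equals the forward filtering here because the responses are real.
    back = np.fft.ifft(np.fft.fft(resid * y / (mod + eps), axis=1) * bank, axis=1)
    return loss, np.real(back.sum(axis=0))

def retrieve(target, bank, n, steps=300, lr=0.01, seed=0):
    """Adam with a fixed learning rate, started from white noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    m, v = np.zeros(n), np.zeros(n)
    b1, b2, tiny = 0.9, 0.999, 1e-8
    losses = []
    for k in range(1, steps + 1):
        loss, g = loss_and_grad(x, target, bank)
        losses.append(loss)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        # Bias-corrected Adam update.
        x = x - lr * (m / (1 - b1 ** k)) / (np.sqrt(v / (1 - b2 ** k)) + tiny)
    return x, losses
```

Plotting `losses` for several seeds is a cheap way to observe the convergence behavior and failure cases that the referee asks about, since the objective is non-convex and the result depends on the starting point.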
-
Referee: [Experimental results / musical examples] No quantitative reconstruction metrics (SNR, perceptual distances, or spectro-temporal error) or baseline comparisons are reported for the re-synthesized textures, nor are formal listening-test results provided to support the claim that the inverted signals are perceptually faithful.
Authors: The manuscript prioritizes creative demonstration through composed works rather than perceptual benchmarking. We will nevertheless add quantitative reconstruction metrics (SNR and spectro-temporal error) and a baseline comparison to Griffin-Lim for the reported examples. Formal listening tests lie outside the artistic scope of the paper; we will instead clarify that perceptual claims rest on the composers' direct use of the outputs. revision: partial
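For concreteness, the two metrics mentioned here could be computed along the following lines. This is a sketch under assumptions: the exact definitions the authors intend are not given, so `snr_db` uses the usual energy-ratio form and `spectral_error` is a relative L2 distance between Hann-windowed magnitude spectrograms. Function names, window length, and hop size are all illustrative.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio 10*log10(||ref||^2 / ||ref - est||^2) in decibels."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / noise)

def magnitude_stft(x, win=256, hop=64):
    """Hann-windowed magnitude spectrogram, shape (frames, win // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    frames = np.stack([x[s:s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_error(reference, estimate, win=256, hop=64):
    """Relative L2 error between magnitude spectrograms (0 means identical)."""
    a = magnitude_stft(reference, win, hop)
    b = magnitude_stft(estimate, win, hop)
    return np.linalg.norm(a - b) / np.linalg.norm(a)
```

Waveform-domain SNR penalizes any phase difference, so a reconstruction that is only a sign flip of the original scores about -6 dB even though its magnitude spectrogram is identical. For phase-retrieval outputs it is therefore more informative to apply `snr_db` to coefficient arrays, or to report `spectral_error`, than to compare raw waveforms.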
-
Referee: [Scale-rate DAFx section] The mathematical definition and implementation of scale-rate DAFx (e.g., chirp-rate inversion) are presented only at a high level; it is not shown how the transformation is applied directly to the scattering coefficients while preserving the global time arrow, nor is any analysis given of the resulting coefficient-domain properties.
Authors: We will expand the Scale-rate DAFx section with explicit equations showing how each transformation (including chirp-rate inversion) is applied element-wise to the scattering coefficients, together with a proof sketch that the global time arrow is preserved. We will also add a short analysis of the resulting coefficient-domain invariants, such as modulation-rate preservation. revision: yes
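One plausible reading of chirp-rate inversion in the coefficient domain, offered as an assumption rather than as the authors' definition, is to exchange the coefficients at frequential scale +β with those at -β while leaving the time axis untouched. The sketch below assumes a hypothetical storage layout `S[t, lambda, alpha, beta]` in which the β grid is sign-symmetric, so that reversing the last axis maps each β to -β.

```python
import numpy as np

def chirp_rate_inversion(S, beta_axis=-1):
    """Swap coefficients at +beta and -beta.

    Assumes the beta axis is ordered symmetrically around zero, for example
    (-b2, -b1, +b1, +b2), so that reversing it maps every beta to -beta.
    The time axis is not touched, so the coarse temporal envelope is kept.
    """
    return np.flip(S, axis=beta_axis)
```

The output is a set of target coefficients, not audio. A waveform would still have to be obtained by running phase retrieval against these targets. Because the operation is a permutation, applying it twice returns the original coefficients, and any quantity summed over β is unchanged.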
Circularity Check
No circularity; derivation relies on established scattering operator and external optimization without self-referential reduction
The paper applies time-frequency scattering (an established convolutional operator) to audio textures, implements phase retrieval via gradient backpropagation, and introduces scale-rate DAFx as new transformations. No equations, fitted parameters, or predictions are described that reduce by construction to inputs or prior self-citations. Artistic outputs (electroacoustic pieces) are presented as applications rather than load-bearing justifications. The central claims depend on the scattering representation and the optimization succeeding for the textures considered, but nothing shows this dependence to be tautological or fitted; the work can be checked independently against the linked code and demos.
Reference graph
Works this paper leans on
-
[1]
INTRODUCTION Several composers have pointed out the lack of a satisfying trade-off between interpretability and flexibility in the parametrization of sound transformations [1, 2, 3]. For example, the constant-Q wavelet transform (CQT) of an audio signal provides an intuitive display of its short-term energy distribution in time and frequency [4], but doe...
work page 2019
-
[2]
The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering
TIME–FREQUENCY SCATTERING In this section, we define the time–frequency scattering transform as a function of four variables (time t, frequency λ, amplitude modulation rate α, and frequency modulation scale β), which we connect to spectrotemporal receptive fields (STRF) in auditory neurophysiology [8]. We refer to [9] for an in-depth mathematical introducti...
work page internal anchor Pith review Pith/arXiv arXiv 1906
-
[3]
AUDIO TEXTURE SYNTHESIS In this section, we describe how to pseudo-invert time–frequency scattering, that is, to generate a waveform whose scattering coefficients match the scattering coefficients of some other, pre-recorded waveform. 3.1. From phase retrieval to texture synthesis Although the invertibility of the convolutional operator involved in the co...
work page 2019
-
[4]
SCALE-RATE DIGITAL AUDIO EFFECTS In this section, we introduce an algorithm to manipulate the finest time scales of spectrotemporal modulations (from 10 ms to 1 s) while preserving both the temporal envelope and spectral envelope at a coarser scale (beyond 1 s). As an example, we implement chirp rate reversal, a new digital audio effect that flips the pitch...
work page 2019
-
[5]
CONCLUSION The past decade has witnessed a breakthrough of deep convolutional architectures for signal classification, with some noteworthy applications in speech, music, and ecoacoustics. Yet there is, to this day, virtually no adoption of any recent deep learning system by electroacoustic music composers. This is due to several shortcomings of deep l...
work page 2016
-
[6]
ACKNOWLEDGMENTS This work is supported by the ERC InvariantClass grant 320959 and NSF awards 1633259 and 1633206. The two authors wish to thank Bob Sturm for putting them in contact with each other; Lorenzo Senni for accepting that his record title, The Shape of RemiXXXes to Come, is being reused as the title of the present article; and the anonymous revi...
-
[7]
Formalizing the concept of sound,
H Kaper and S Tipei, “Formalizing the concept of sound,” in Proc. ICMC, 1999
work page 1999
-
[8]
Fifty years of digital sound for music,
J.-C Risset, “Fifty years of digital sound for music,” in Proc. SMC, 2007
work page 2007
-
[9]
Machine listening intelligence,
C.-E Cella, “Machine listening intelligence,” in Proc. Int. Workshop on Deep Learning for Music, 2017
work page 2017
-
[10]
Constructing an invertible constant-Q transform with nonstationary Gabor frames,
G. A Velasco, N Holighaus, M Dörfler, and T Grill, “Constructing an invertible constant-Q transform with nonstationary Gabor frames,” in Proc. DAFx, 2011
work page 2011
-
[11]
Neural audio synthesis of musical notes with WaveNet autoencoders,
J Engel, C Resnick, A Roberts, S Dieleman, D Eck, K Simonyan, and M Norouzi, “Neural audio synthesis of musical notes with WaveNet autoencoders,” in Proc. ICML, 2017
work page 2017
-
[12]
Understanding deep convolutional networks,
S Mallat, “Understanding deep convolutional networks,” Phil. Trans. R. Soc. A, vol. 374, no. 2065, 2016
work page 2016
-
[13]
Joint time-frequency scattering for audio classification,
J Andén, V Lostanlen, and S Mallat, “Joint time-frequency scattering for audio classification,” in Proc. MLSP. IEEE, 2015, pp. 1–6
work page 2015
-
[14]
Music in our ears: the biological bases of musical timbre perception,
K Patil, D Pressnitzer, S Shamma, and M Elhilali, “Music in our ears: the biological bases of musical timbre perception,” PLoS computational biology, vol. 8, no. 11, 2012
work page 2012
-
[15]
J Andén, V Lostanlen, and S Mallat, “Joint time–frequency scattering,” IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3704–3718, July 2019
work page 2019
-
[16]
Idealized computational models for auditory receptive fields,
T Lindeberg and A Friberg, “Idealized computational models for auditory receptive fields,” PLoS one, vol. 10, no. 3, 2015
work page 2015
-
[17]
S Mallat, “Group invariant scattering,” Comm. Pure Appl. Math., vol. 65, no. 10, pp. 1331–1398, 2012
work page 2012
-
[18]
K Siedenburg, I Fujinaga, and S McAdams, “A comparison of approaches to timbre descriptors in music information retrieval and music psychology,” J. New Music Research, vol. 45, no. 1, pp. 27–41, 2016
work page 2016
-
[19]
M. R Schädler, B. T Meyer, and B Kollmeier, “Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition,” J. Acoust. Soc. of Am., vol. 131, no. 5, pp. 4134–4151, 2012
work page 2012
-
[20]
thesis, École normale supérieure, 2017
V Lostanlen, Convolutional operators in the time-frequency domain, Ph.D. thesis, École normale supérieure, 2017
work page 2017
-
[21]
S Mallat, A wavelet tour of signal processing: the sparse way, Academic Press, 2008
work page 2008
-
[22]
Scattering representation of modulated sounds,
J Andén and S Mallat, “Scattering representation of modulated sounds,” in Proc. DAFx, 2012
work page 2012
-
[23]
Exponential decay of scattering coefficients,
I Waldspurger, “Exponential decay of scattering coefficients,” in Proc. IEEE SampTA, 2017
work page 2017
-
[24]
Phase retrieval for wavelet transforms,
I Waldspurger, “Phase retrieval for wavelet transforms,” IEEE Trans. Inf. Theory, vol. 63, no. 5, pp. 2993–3009, 2017
work page 2017
-
[25]
thesis, École normale supérieure, 2015
I Waldspurger, Wavelet transform modulus: phase retrieval and scattering, Ph.D. thesis, École normale supérieure, 2015
work page 2015
-
[26]
State of the art in sound texture synthesis,
D Schwarz, “State of the art in sound texture synthesis,” in Proc. DAFx, 2011
work page 2011
-
[27]
Concatenative sound texture synthesis methods and evaluation,
D Schwarz, A Röbel, C Yeh, and A Laburthe, “Concatenative sound texture synthesis methods and evaluation,” in Proc. DAFx, 2016
work page 2016
-
[28]
On the importance of initialization and momentum in deep learning,
I Sutskever, J Martens, G Dahl, and G Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. ICML, 2013, pp. 1139–1147
work page 2013
-
[29]
Audio Texture Synthesis with Scattering Moments
J Bruna and S Mallat, “Audio texture synthesis with scattering moments,” arXiv preprint arXiv:1311.0407, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[30]
Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis,
J. H McDermott and E. P Simoncelli, “Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis,” Neuron, vol. 71, no. 5, pp. 926–940, 2011
work page 2011
-
[31]
V. J Müller and N Schafhausen, Eds., Florian Hecker: Halluzination, Perspektive, Synthese, Sternberg Press, Berlin, 2019
work page 2019
-
[32]
Alte Oper, Frankfurt, October 5th, 2016
R Mackay, Program notes to FAVN’s premiere. Alte Oper, Frankfurt, October 5th, 2016
work page 2016
-
[33]
On time-frequency scattering and computer music,
V Lostanlen, “On time-frequency scattering and computer music,” in Florian Hecker: Halluzination, Perspektive, Synthese, N Schafhausen and V. J Müller, Eds. Sternberg Press, Berlin, 2019
work page 2019
-
[34]
Sound out of line: In conversation with Florian Hecker,
F Hecker and R Mackay, “Sound out of line: In conversation with Florian Hecker,” Urbanomic, no. 3, 2009
work page 2009
-
[35]
F Hecker, Ed., Chimerizations, Primary Information, New York, 2013
work page 2013
-
[36]
Exploration of timbre by analysis and synthesis,
J.-C Risset, “Exploration of timbre by analysis and synthesis,” in The Psychology of Music, 2nd Ed., D Deutsch, Ed., chapter 5, pp. 113–169. Elsevier, 1999
work page 1999
-
[37]
A shape-invariant phase vocoder for speech transformation,
A Röbel, “A shape-invariant phase vocoder for speech transformation,” in Proc. DAFx, 2010
work page 2010
-
[38]
The wavelet transform for analysis, synthesis, and processing of speech and music sounds,
R Kronland-Martinet, “The wavelet transform for analysis, synthesis, and processing of speech and music sounds,” Comp. Mus. J., vol. 12, no. 4, pp. 11–20, 1988
work page 1988
-
[39]
A new approach to transient processing in the phase vocoder,
A Röbel, “A new approach to transient processing in the phase vocoder,” in Proc. DAFx, 2003. Proceedings of the 22nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 2–6, 2019
work page 2003
-
[40]
Instrument-specific harmonic atoms for mid-level music representation,
P Leveau, E Vincent, G Richard, and L Daudet, “Instrument-specific harmonic atoms for mid-level music representation,” IEEE Trans. Audio Speech Lang. Proc., vol. 16, no. 1, pp. 116–128, 2008
work page 2008
-
[41]
J Andén and S Mallat, “Deep scattering spectrum,” IEEE Trans. Sig. Proc., vol. 62, no. 16, pp. 4114–4128, 2014
work page 2014
-
[42]
Transforming vibrato extent in monophonic sounds,
A Röbel, S Maller, and J Contreras, “Transforming vibrato extent in monophonic sounds,” in Proc. DAFx 2011, 2011
work page 2011
-
[43]
Short-time time-reversal on audio signals,
H.-S Kim and J. O. I Smith, “Short-time time-reversal on audio signals,” in Proc. DAFx, 2014
work page 2014
-
[44]
Wavelet scattering on the pitch spiral,
V Lostanlen and S Mallat, “Wavelet scattering on the pitch spiral,” in Proc. DAFx, 2015
work page 2015
-
[45]
Pitch circularity from tones comprising full harmonic series,
D Deutsch, K Dooley, and T Henthorn, “Pitch circularity from tones comprising full harmonic series,” J. Acoust. Soc. Am., vol. 124, no. 1, pp. 589–597, 2008
work page 2008
-
[46]
The Snail: a real-time software application to visualize sounds,
T Hélie and C Picasso, “The Snail: a real-time software application to visualize sounds,” in Proc. DAFx, 2017
work page 2017
-
[47]
Synthèse de textures sonores à partir de statistiques temps-fréquence,
H Caracalla, “Synthèse de textures sonores à partir de statistiques temps-fréquence,” M.S. thesis, Ircam, 2016
work page 2016
-
[48]
M Kim and P Smaragdis, “Bitwise neural networks,” in Proc. ICML, 2015
work page 2015
-
[49]
C Donahue, J Macaulay, and M Puckette, “Synthesizing audio with GANs,” in Proc. ICLR, workshop track, 2018
work page 2018
-
[50]
Fader networks: Manipulating images by sliding attributes,
G Lample, N Zeghidour, N Usunier, A Bordes, L Denoyer, et al., “Fader networks: Manipulating images by sliding attributes,” in Proc. NIPS, 2017
work page 2017
-
[51]
Generating similarity-based playlists using traveling salesman algorithms,
T Pohle, E Pampalk, and G Widmer, “Generating similarity-based playlists using traveling salesman algorithms,” in Proc. DAFx, 2005
work page 2005
-
[52]
I Xenakis, “Concerning time,” Perspectives of New Music, vol. 1, no. 27, pp. 84–92, 1989
work page 1989
-
[53]
Le compositeur et ses machines : de la recherche musicale,
J.-C Risset, “Le compositeur et ses machines : de la recherche musicale,” Esprit, vol. 3, no. 99, pp. 59–76, 1985
work page 1985
-
[54]
The C in IRCAM: Coordinating musical research at IRCAM,
A Cont and A Gerzso, “The C in IRCAM: Coordinating musical research at IRCAM,” in Proc. ICMC, 2010
work page 2010
-
[55]
R Mackay, Ed., Florian Hecker: Formulations, Koenig Books, London, 2016
work page 2016
-
[56]
Scattering.m: a matlab toolbox for wavelet scattering,
V Lostanlen, “Scattering.m: a MATLAB toolbox for wavelet scattering,” June 2019.
work page 2019