pith. sign in

arxiv: 1906.09334 · v2 · pith:NAGWYVKSnew · submitted 2019-06-21 · 💻 cs.SD · cs.MM· eess.AS

The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering

Pith reviewed 2026-05-25 18:06 UTC · model grok-4.3

classification 💻 cs.SD cs.MMeess.AS
keywords audio texture synthesistime-frequency scatteringphase retrievaldigital audio effectsscale-rate transformationschirp rate inversionelectroacoustic composition
0
0 comments X

The pith

Time-frequency scattering coefficients combined with gradient phase retrieval support re-synthesis of audio textures and enable scale-rate effects such as chirp rate inversion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to apply time-frequency scattering, which extracts modulations across scales and rates in the time-frequency plane, to the re-synthesis of audio textures. Phase retrieval is performed inside the scattering network by gradient backpropagation, turning the coefficients back into waveforms. The same representation supports a new family of audio transformations called scale-rate DAFx, one instance being chirp rate inversion that reverses individual events locally while the overall time arrow stays intact. These tools were used to produce four electroacoustic pieces and a commercial remix released by Warp Records.

Core claim

Time-frequency scattering is a convolutional operator that extracts modulations in the time-frequency domain at different rates and scales. After phase retrieval by gradient backpropagation, the coefficients become invertible, allowing both faithful reconstruction of audio textures and a new class of transformations expressed directly in the scattering domain. One such transformation, chirp rate inversion, locally reverses the time direction of sonic events without altering the global arrow of time.

What carries the argument

Time-frequency scattering, a convolutional operator extracting modulations in the time-frequency domain at different rates and scales, combined with gradient-based phase retrieval to invert the coefficients back to audio.

If this is right

  • Audio textures become re-synthesizable from their scattering coefficients alone.
  • Scale-rate DAFx such as chirp rate inversion can be applied directly in the coefficient domain.
  • The same pipeline supports creation of new electroacoustic works and commercial remixes.
  • Source code and audio demos make the transformations reproducible for further music production.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The scattering domain might allow quantitative comparison or interpolation between different audio textures.
  • Gradient phase retrieval could be tested on other convolutional time-frequency representations beyond scattering.
  • Local time manipulations like chirp rate inversion might combine with global tempo changes to produce hybrid rhythmic effects.

Load-bearing premise

The time-frequency scattering representation preserves enough perceptual information about audio textures to allow faithful re-synthesis and musically useful manipulations without additional perceptual modeling.

What would settle it

Re-synthesis from the scattering coefficients either fails to match the original texture perceptually or produces manipulated versions, such as chirp-rate-inverted sounds, that do not exhibit the intended local time reversal.

Figures

Figures reproduced from arXiv: 1906.09334 by Florian Hecker, Vincent Lostanlen.

Figure 1
Figure 1. Figure 1: Interference pattern between wavelets ψα(t) and ψβ (log2 λ) in the time–frequency domain (t, log2 λ) for differ￾ent combinations of amplitude modulation rate α and frequency modulation scale β. Darker shades of red (resp. blue) indicate higher positive (resp. lower negative) values of the real part. See Section 2 for details. flexibility. We describe the scattering-based DAFx underlying the synthesis of fi… view at source ↗
Figure 2
Figure 2. Figure 2: Filterbanks of Morlet wavelets in the Fourier domain: [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: An example of chirp rate inversion with time–frequency scattering. Top: original audio material. Bottom: computer-generated [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

This article explains how to apply time--frequency scattering, a convolutional operator extracting modulations in the time--frequency domain at different rates and scales, to the re-synthesis and manipulation of audio textures. After implementing phase retrieval in the scattering network by gradient backpropagation, we introduce scale--rate DAFx, a class of audio transformations expressed in the domain of time--frequency scattering coefficients. One example of scale--rate DAFx is chirp rate inversion, which causes each sonic event to be locally reversed in time while leaving the arrow of time globally unchanged. Over the past two years, our work has led to the creation of four electroacoustic pieces: ``FAVN''; ``Modulator (Scattering Transform)''; ``Experimental Palimpsest''; ``Inspection''; and a remix of Lorenzo Senni's ``XAllegroX'', released by Warp Records on a vinyl entitled ``The Shape of RemiXXXes to Come''. The source code to reproduce experiments and figures is made freely available at: https://github.com/lostanlen/scattering.m. A companion website containing demos is at: https://lostanlen.com/pubs/dafx2019

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that time-frequency scattering coefficients, when inverted via gradient-based phase retrieval, enable faithful re-synthesis of audio textures and support a new class of scale-rate digital audio effects (DAFx) such as chirp-rate inversion; these are demonstrated through four electroacoustic compositions and a commercial remix, with open-source code provided.

Significance. If the phase-retrieval step succeeds reliably, the work would offer a mathematically grounded domain for texture manipulation that preserves modulation structure at multiple scales and rates, with potential utility in sound design and electroacoustic music. The explicit release of source code and a companion demo site is a clear strength that supports reproducibility.

major comments (3)
  1. [Phase retrieval subsection] The phase-retrieval procedure (described after the scattering definition) is performed by gradient backpropagation, yet the manuscript supplies neither the explicit loss function, initialization strategy, step-size schedule, nor any convergence analysis or failure cases. Because scattering discards phase, every downstream claim of faithful re-synthesis and musically useful DAFx rests on this uncharacterized optimization step.
  2. [Experimental results / musical examples] No quantitative reconstruction metrics (SNR, perceptual distances, or spectro-temporal error) or baseline comparisons are reported for the re-synthesized textures, nor are formal listening-test results provided to support the claim that the inverted signals are perceptually faithful.
  3. [Scale-rate DAFx section] The mathematical definition and implementation of scale-rate DAFx (e.g., chirp-rate inversion) are presented only at a high level; it is not shown how the transformation is applied directly to the scattering coefficients while preserving the global time arrow, nor is any analysis given of the resulting coefficient-domain properties.
minor comments (2)
  1. [Abstract] The abstract states that four pieces were created but then enumerates five titles; this minor inconsistency should be corrected.
  2. [Figures] Figure captions and axis labels in the time-frequency scattering visualizations could be expanded to improve readability for readers unfamiliar with the representation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive review. We address each major comment below, indicating planned revisions to strengthen the methodological details, evaluation, and mathematical exposition while preserving the manuscript's focus on artistic applications.

read point-by-point responses
  1. Referee: [Phase retrieval subsection] The phase-retrieval procedure (described after the scattering definition) is performed by gradient backpropagation, yet the manuscript supplies neither the explicit loss function, initialization strategy, step-size schedule, nor any convergence analysis or failure cases. Because scattering discards phase, every downstream claim of faithful re-synthesis and musically useful DAFx rests on this uncharacterized optimization step.

    Authors: We agree that the phase retrieval step requires fuller documentation. In revision we will add a dedicated subsection specifying the loss (L2 distance on scattering coefficients), initialization (random phase), optimizer and schedule (Adam with fixed learning rate), observed convergence behavior, and typical failure cases (e.g., highly transient textures). These details are already implemented in the released code and will be cross-referenced in the text. revision: yes

  2. Referee: [Experimental results / musical examples] No quantitative reconstruction metrics (SNR, perceptual distances, or spectro-temporal error) or baseline comparisons are reported for the re-synthesized textures, nor are formal listening-test results provided to support the claim that the inverted signals are perceptually faithful.

    Authors: The manuscript prioritizes creative demonstration through composed works rather than perceptual benchmarking. We will nevertheless add quantitative reconstruction metrics (SNR and spectro-temporal error) and a baseline comparison to Griffin-Lim for the reported examples. Formal listening tests lie outside the artistic scope of the paper; we will instead clarify that perceptual claims rest on the composers' direct use of the outputs. revision: partial

  3. Referee: [Scale-rate DAFx section] The mathematical definition and implementation of scale-rate DAFx (e.g., chirp-rate inversion) are presented only at a high level; it is not shown how the transformation is applied directly to the scattering coefficients while preserving the global time arrow, nor is any analysis given of the resulting coefficient-domain properties.

    Authors: We will expand the Scale-rate DAFx section with explicit equations showing how each transformation (including chirp-rate inversion) is applied element-wise to the scattering coefficients, together with a proof sketch that the global time arrow is preserved. We will also add a short analysis of the resulting coefficient-domain invariants, such as modulation-rate preservation. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on established scattering operator and external optimization without self-referential reduction

full rationale

The paper applies time-frequency scattering (an established convolutional operator) to audio textures, implements phase retrieval via gradient backpropagation, and introduces scale-rate DAFx as new transformations. No equations, fitted parameters, or predictions are described that reduce by construction to inputs or prior self-citations. Artistic outputs (electroacoustic pieces) are presented as applications rather than load-bearing justifications. The central claims rest on the scattering representation and optimization succeeding for the textures, but this is not shown to be tautological or fitted; the work is self-contained against the linked code and demos.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5748 in / 935 out tokens · 19166 ms · 2026-05-25T18:06:16.607143+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 2 internal anchors

  1. [1]

    scale-rate DAFx

    INTRODUCTION Several composers have pointed out the lack of a satisfying trade- off between interpretability and flexibility in the parametrization of sound transformations [ 1, 2, 3]. For example, the constant-Q wavelet transform (CQT) of an audio signal provides an intuitive display of its short-term energy distribution in time and frequency [4], but doe...

  2. [2]

    The Shape of RemiXXXes to Come: Audio Texture Synthesis with Time-frequency Scattering

    TIME–FREQUENCY SCATTERING In this section, we define the time–frequency scattering transform as a function of four variables — time t, frequencyλ, amplitude modulation rate α, and frequency modulation scale β — which we connect to spectrotemporal receptive fields (STRF) in auditory neurophysiology [8]. We refer to [9] for an in-depth mathematical introducti...

  3. [3]

    bold driver

    AUDIO TEXTURE SYNTHESIS In this section, we describe how to pseudo-invert time–frequency scattering, that is, to generate a waveform whose scattering coeffi- cients match the scattering coefficients of some other, pre-recorded waveform. 3.1. From phase retrieval to texture synthesis Although the invertibility of the convolutional operator involved in the co...

  4. [4]

    axis of time

    SCALE-RATE DIGITAL AUDIO EFFECTS In this section, we introduce an algorithm to manipulate the finest time scales of spectrotemporal modulations (from 10 ms to 1 s) while preserving both the temporal envelope and spectral envelope at a coarser scale (beyond 1 s). As an example, we implement chirp rate reversal, a new digital audio effect that flips the pitch...

  5. [5]

    Yet there is, to this day, virtually no adoption of any recent deep learning system by electroacoustic music composers

    CONCLUSION The past decade has witnessed a breakthrough of deep convolu- tional architectures for signal classification, with some noteworthy applications in speech, music, and ecoacoustics. Yet there is, to this day, virtually no adoption of any recent deep learning system by electroacoustic music composers. This is due to several short- comings of deep l...

  6. [6]

    ACKNOWLEDGMENTS This work is supported by the ERC InvariantClass grant 320959 and NSF awards 1633259 and 1633206. The two authors wish to thank Bob Sturm for putting them in contact with each other; Lorenzo Senni for accepting that his record title, The Shape of RemiXXXes to Come, is being reused as the title of the present article; and the anonymous revi...

  7. [7]

    Formalizing the concept of sound,

    H Kaper and S Tipei, “Formalizing the concept of sound,” in Proc. ICMC, 1999

  8. [8]

    Fifty years of digital sound for music,

    J.-C Risset, “Fifty years of digital sound for music,” in Proc. SMC, 2007

  9. [9]

    Machine listening intelligence,

    C.-E Cella, “Machine listening intelligence,” in Proc. Int. Workshop on Deep Learning for Music , 2017

  10. [10]

    Constructing an invertible constant-Q transform with non- stationary Gabor frames,

    G. A Velasco, N Holighaus, M Dörfler, and T Grill, “Constructing an invertible constant-Q transform with non- stationary Gabor frames,” in Proc. DAFx, 2011

  11. [11]

    Neural audio synthesis of musical notes with WaveNet autoencoders,

    J Engel, C Resnick, A Roberts, S Dieleman, D Eck, K Si- monyan, and M Norouzi, “Neural audio synthesis of musical notes with WaveNet autoencoders,” in Proc. ICML, 2017

  12. [12]

    Understanding deep convolutional networks,

    S Mallat, “Understanding deep convolutional networks,” Phil. Trans. R. Soc. A, vol. 374, no. 2065, 2016

  13. [13]

    Joint time-frequency scattering for audio classification,

    J Andén, V Lostanlen, and S Mallat, “Joint time-frequency scattering for audio classification,” in Proc. MLSP. IEEE, 2015, pp. 1–6

  14. [14]

    Music in our ears: the biological bases of musical timbre perception,

    K Patil, D Pressnitzer, S Shamma, and M Elhilali, “Music in our ears: the biological bases of musical timbre perception,” PLoS computational biology, vol. 8, no. 11, 2012

  15. [15]

    Ieee trans. sig. proc.,

    J Andén, V Lostanlen, and S Mallat, “Ieee trans. sig. proc.,” IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3704–3718, July 2019

  16. [16]

    Idealized computational models for auditory receptive fields,

    T Lindeberg and A Friberg, “Idealized computational models for auditory receptive fields,” PLoS one, vol. 10, no. 3, 2015

  17. [17]

    Group invariant scattering,

    S Mallat, “Group invariant scattering,” Comm. Pure Appl. Math., vol. 65, no. 10, pp. 1331–1398, 2012

  18. [18]

    A comparison of approaches to timbre descriptors in music information retrieval and music psychology,

    K Siedenburg, I Fujinaga, and S McAdams, “A comparison of approaches to timbre descriptors in music information retrieval and music psychology,” J. New Music Research, vol. 45, no. 1, pp. 27–41, 2016

  19. [19]

    Spectro- temporal modulation subspace-spanning filter bank features for robust automatic speech recognition,

    M. R Schädler, B. T Meyer, and B Kollmeier, “Spectro- temporal modulation subspace-spanning filter bank features for robust automatic speech recognition,” J. Acoust. Soc. of Am., vol. 131, no. 5, pp. 4134–4151, 2012

  20. [20]

    thesis, École normale supérieure, 2017

    V Lostanlen, Convolutional operators in the time-frequency domain, Ph.D. thesis, École normale supérieure, 2017

  21. [21]

    S Mallat, A wavelet tour of signal processing: the sparse way , Academic press, 2008

  22. [22]

    Scattering representation of modu- lated sounds,

    J Andén and S Mallat, “Scattering representation of modu- lated sounds,” in Proc. DAFx, 2012

  23. [23]

    Exponential decay of scattering coefficients,

    I Waldspurger, “Exponential decay of scattering coefficients,” in Proc. IEEE SampTA, 2017

  24. [24]

    Phase retrieval for wavelet transforms,

    I Waldspurger, “Phase retrieval for wavelet transforms,” IEEE Trans. Inf. Theory, vol. 63, no. 5, pp. 2993–3009, 2017

  25. [25]

    thesis, École normale supérieure, 2015

    I Waldspurger, Wavelet transform modulus: phase retrieval and scattering, Ph.D. thesis, École normale supérieure, 2015

  26. [26]

    State of the art in sound texture synthesis,

    D Schwarz, “State of the art in sound texture synthesis,” in Proc. DAFx, 2011

  27. [27]

    Concatenative sound texture synthesis methods and evaluation,

    D Schwarz, A Röbel, C Yeh, and A Laburthe, “Concatenative sound texture synthesis methods and evaluation,” in Proc. DAFx, 2016

  28. [28]

    On the importance of initialization and momentum in deep learning,

    I Sutskever, J Martens, G Dahl, and G Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. ICML, 2013, pp. 1139–1147

  29. [29]

    Audio Texture Synthesis with Scattering Moments

    J Bruna and S Mallat, “Audio texture synthesis with scattering moments,” arXiv preprint arXiv:1311.0407, 2013

  30. [30]

    Sound texture percep- tion via statistics of the auditory periphery: evidence from sound synthesis,

    J. H McDermott and E. P Simoncelli, “Sound texture percep- tion via statistics of the auditory periphery: evidence from sound synthesis,” Neuron, vol. 71, no. 5, pp. 926–940, 2011

  31. [31]

    V . J. M Nicolaus Schafhausen, Ed., Florian Hecker: Hal- luzination, Perspektive, Synthese , Sternberg Press, Berlin, 2019

  32. [32]

    Alte Oper, Frankfurt, October 5th, 2016

    R Mackay, Program notes to F A VN’s premiere. Alte Oper, Frankfurt, October 5th, 2016

  33. [33]

    On time-frequency scattering and computer music,

    V Lostanlen, “On time-frequency scattering and computer music,” in Florian Hecker: Halluzination, Perspektive, Syn- these, N Schafhausen and V . J Müller, Eds. Sternberg Press, Berlin, 2019

  34. [34]

    Sound out of line: In conversation with Florian Hecker,

    F Hecker and R Mackay, “Sound out of line: In conversation with Florian Hecker,” Urbanomic, , no. 3, 2009

  35. [35]

    F Hecker, Ed., Chimerizations, Primary Information, New York, 2013

  36. [36]

    Exploration of timbre by analysis and synthe- sis,

    J.-C Risset, “Exploration of timbre by analysis and synthe- sis,” in The Psychology of Music, 2nd Ed. , D Deutsch, Ed., chapter 5, pp. 113–169. Elsevier, 1999

  37. [37]

    A shape-invariant phase vocoder for speech trans- formation,

    A Röbel, “A shape-invariant phase vocoder for speech trans- formation,” in Proc. DAFx, 2010

  38. [38]

    The wavelet transform for analysis, synthesis, and processing of speech and music sounds,

    R Kronland-Martinet, “The wavelet transform for analysis, synthesis, and processing of speech and music sounds,”Comp. Mus. J., vol. 12, no. 4, pp. 11–20, 1988

  39. [39]

    A new approach to transient processing in the phase vocoder,

    A Röbel, “A new approach to transient processing in the phase vocoder,” in Proc. DAFX, 2003. DAFX-7 Proceedings of the 22 nd International Conference on Digital Audio Effects (DAFx-19), Birmingham, UK, September 2–6, 2019

  40. [40]

    Instrument- specific harmonic atoms for mid-level music representation,

    P Leveau, E Vincent, G Richard, and L Daudet, “Instrument- specific harmonic atoms for mid-level music representation,” IEEE Trans. Audio Speech Lang. Proc. , vol. 16, no. 1, pp. 116–128, 2008

  41. [41]

    Deep scattering spectrum,

    J Andén and S Mallat, “Deep scattering spectrum,” IEEE Trans. Sig. Proc., vol. 62, no. 16, pp. 4114–4128, 2014

  42. [42]

    Transforming vibrato extent in monophonic sounds,

    A Röbel, S Maller, and J Contreras, “Transforming vibrato extent in monophonic sounds,” in Proc. DAFx 2011, 2011

  43. [43]

    Short-time time-reversal on audio signals,

    H.-S Kim and J. O. I Smith, “Short-time time-reversal on audio signals,” in Proc. DAFx, 2014

  44. [44]

    Wavelet scattering on the pitch spiral,

    V Lostanlen and S Mallat, “Wavelet scattering on the pitch spiral,” in Proc. DAFx, 2015

  45. [45]

    Pitch circularity from tones comprising full harmonic series,

    D Deutsch, K Dooley, and T Henthorn, “Pitch circularity from tones comprising full harmonic series,” J. Acoust. Soc. Am., vol. 124, no. 1, pp. 589–597, 2008

  46. [46]

    The Snail: a real-time software application to visualize sounds,

    T Hélie and C Picasso, “The Snail: a real-time software application to visualize sounds,” in Proc. DAFx, 2017

  47. [47]

    Synthèse de textures sonores à partir de statis- tiques temps-fréquence,

    H Caracalla, “Synthèse de textures sonores à partir de statis- tiques temps-fréquence,” M.S. thesis, Ircam, 2016

  48. [48]

    Bitwise neural networks,

    M Kim and P Smaragdis, “Bitwise neural networks,” in Proc. ICML, 2015

  49. [49]

    Synthesizing audio with GANs,

    C Donahue, J Macaulay, and M Puckette, “Synthesizing audio with GANs,” in Proc. ICLR, workshop track, 2018

  50. [50]

    Fader networks: Manipulating images by sliding attributes,

    G Lample, N Zeghidour, N Usunier, A Bordes, L Denoyer, et al., “Fader networks: Manipulating images by sliding attributes,” in Proc. NIPS, 2017

  51. [51]

    Generating similarity- based playlists using traveling salesman algorithms,

    T Pohle, E Pampalk, and G Widmer, “Generating similarity- based playlists using traveling salesman algorithms,” in Proc. DAFx, 2005

  52. [52]

    Concerning time,

    I Xenakis, “Concerning time,” Perspectives of New Music, vol. 1, no. 27, pp. 84–92, 1989

  53. [53]

    Le compositeur et ses machines : de la recherche musicale,

    J.-C Risset, “Le compositeur et ses machines : de la recherche musicale,” Esprit, vol. 3, no. 99, pp. 59–76, 1985

  54. [54]

    The C in IRCAM: Coordinating musical research at IRCAM,

    A Cont and A Gerzso, “The C in IRCAM: Coordinating musical research at IRCAM,” in Proc. ICMC, 2010

  55. [55]

    R Mackay, Ed., Florian Hecker: F ormulations , Koenig Books, London, 2016

  56. [56]

    Scattering.m: a matlab toolbox for wavelet scattering,

    V Lostanlen, “Scattering.m: a matlab toolbox for wavelet scattering,” June 2019. DAFX-8