
arxiv: 2605.17582 · v1 · pith:L7WAHJF5new · submitted 2026-05-17 · 💻 cs.LG · cs.CE

Scale-Equivariant Generative Forecasting: Weight-Tied Dilated Convolutions, Wavelet Scattering Inputs, and Spectral-Consistency Training for Self-Similar Time Series

Pith reviewed 2026-05-20 13:20 UTC · model grok-4.3

classification 💻 cs.LG cs.CE
keywords: scale equivariance, dilated convolutions, self-similar time series, generative forecasting, WaveNet, wavelet scattering, scaling collapse, financial returns

The pith

Tying kernel weights across dilation levels makes dilated-convolution stacks scale-equivariant and captures self-similarity in time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines discrete scale equivariance for causal 1D networks and shows that dyadic dilation commutes with any dilated-convolution stack whose kernels are shared across levels. This weight-tying embeds self-similarity as an inductive bias while cutting the convolutional parameter count by a factor of L, the depth. The resulting SE-WaveNet is wrapped with a one-level Daubechies-4 wavelet input, a Hurst-FiLM conditioning block, and a spectral-consistency loss that targets the expected power-law spectrum. On thirty years of S&P 500 daily log-returns the model reproduces the empirical scaling-collapse diagnostic on the Allan-Variance top-25 universe, while a capacity-matched vanilla WaveNet does not.
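The L-fold saving is plain arithmetic. A minimal sketch, assuming each level holds one kernel of width K mapping C channels to C channels, and ignoring biases, 1x1 residual or skip projections, and the flow head. None of these layer details are given on this page, so the function name and the sizes are illustrative only.

```python
def conv_kernel_params(depth: int, kernel_width: int, channels: int, tied: bool) -> int:
    """Count dilated-convolution kernel weights in a stack (biases ignored)."""
    per_level = kernel_width * channels * channels
    # A tied stack stores one kernel; an untied stack stores one per level.
    return per_level if tied else depth * per_level

# Illustrative sizes, not taken from the paper.
untied_count = conv_kernel_params(depth=10, kernel_width=2, channels=64, tied=False)
tied_count = conv_kernel_params(depth=10, kernel_width=2, channels=64, tied=True)
print(untied_count, tied_count, untied_count // tied_count)  # 81920 8192 10
```

The ratio equals the depth only for the dilated kernels themselves; any per-level parameters that remain untied would reduce the overall saving.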

Core claim

A dilated-convolution stack whose kernel weights are shared across all dilation levels is discrete scale-equivariant because dyadic dilation commutes with the stack up to boundary effects; this equivariance, together with wavelet scattering inputs, local Hurst conditioning, and a spectral-consistency term, lets the conditional normalizing-flow head generate samples whose scaling statistics match those observed in real self-similar series such as equity returns.

What carries the argument

The weight-tied dilated convolution stack that enforces scale equivariance by sharing the same kernel at every dilation level.

Load-bearing premise

Dyadic dilation commutes with the weight-tied dilated convolution stack up to boundary effects.
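The premise can be probed numerically. The sketch below is one reading of it, not the paper's construction: it assumes dyadic dilation D means zero-insertion upsampling, that layers are linear, and that causal padding is zero. Under those assumptions a single layer satisfies C_{2d}(w) D = D C_d(w) for any kernel w, so levels 2..L applied to a dilated input reproduce levels 1..L-1 applied to the original input exactly when the kernels at neighbouring levels coincide. All function names are hypothetical.

```python
import numpy as np

def causal_dilated_conv(x, w, d):
    """y[t] = sum_k w[k] * x[t - k*d], treating x[t] as 0 for t < 0."""
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        shift = k * d
        if shift < len(x):
            y[shift:] += wk * x[:len(x) - shift]
    return y

def dilate(x):
    """Assumed dyadic dilation: (Dx)[2s] = x[s], (Dx)[2s+1] = 0."""
    out = np.zeros(2 * len(x))
    out[::2] = x
    return out

def stack(x, kernels, dilations):
    """Linear stack: one causal dilated convolution per level."""
    for w, d in zip(kernels, dilations):
        x = causal_dilated_conv(x, w, d)
    return x

def octave_shift_gap(x, kernels, dilations):
    """Max abs difference between levels 2..L on Dx and D(levels 1..L-1 on x)."""
    upper = stack(dilate(x), kernels[1:], dilations[1:])
    lower = dilate(stack(x, kernels[:-1], dilations[:-1]))
    return float(np.max(np.abs(upper - lower)))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
dilations = [1, 2, 4, 8]
shared = rng.standard_normal(3)
tied = [shared] * len(dilations)                      # one kernel for all levels
untied = [rng.standard_normal(3) for _ in dilations]  # independent kernel per level

gap_tied = octave_shift_gap(x, tied, dilations)
gap_untied = octave_shift_gap(x, untied, dilations)
print(gap_tied, gap_untied)  # tied gap is zero up to rounding; untied gap is not
```

In this idealised setting the mismatch sits only at the ends of the dilation ladder (the first level has no counterpart below it, the last none above it). Finite windows, biases, and nonlinearities with a nonzero response to zero input would add further deviations that this check does not cover.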

What would settle it

An untied dilated-convolution model at matched capacity that achieves a median scaling-collapse diagnostic C* of 0.020 or lower on the same S&P 500 Allan-Variance top-25 universe would show that the equivariance is not required.

Original abstract

Many natural and engineered time series -- equity returns, climate anomalies, turbulent velocities, neural recordings, packet-level network traffic -- are approximately self-similar: their horizon-$T$ distribution is tied to the horizon-$1$ distribution by one scaling exponent $H$. Standard deep generative sequence models (transformers, dilated TCNs, the WaveNet family) ignore this. Their receptive fields are wide, but kernel parameters live independently at every dilation level, yielding a multi-scale architecture, not a scale-equivariant one. We make three contributions. First, we give a precise definition of discrete scale equivariance for 1D causal networks and prove that dyadic dilation commutes (up to boundary effects) with any dilated-convolution stack whose kernel weights are shared across levels. Tying the kernel shrinks the convolutional parameter budget by an $L$-fold factor (where $L$ is depth) and hard-wires self-similarity in as an inductive bias. Second, we wrap this Scale-Equivariant WaveNet (SE-WaveNet) backbone in three components that carry the same prior: a one-level Daubechies-4 wavelet input, a Hurst-FiLM block exposing the local scaling exponent, and a spectral-consistency training term targeting the $|f|^{-(2H+1)}$ power-law spectrum. The head is a conditional normalising flow, chosen to preserve equivariance. Third, on 30 years of S&P 500 daily log-returns, SE-WaveNet samples reproduce the empirical scaling-collapse diagnostic on the Allan-Variance top-25 universe (median $\mathcal{C}^\star = 0.020$), while a vanilla WaveNet at matched capacity does not ($\geq 0.06$). NLL, KS-calibration, and tail energy distance tie or beat the baseline, with $L\times$ fewer convolutional parameters.
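The abstract names the target spectrum of the spectral-consistency term but this page does not give the loss itself, nor whether it is applied to the returns or to their cumulative sum. As a hedged illustration only, the sketch below penalises the squared gap between the fitted log-log slope of a raw periodogram and the target exponent -(2H+1). The function names, the use of a raw periodogram, and the squared-gap form are all assumptions; a trainable version would need a differentiable framework rather than NumPy.

```python
import numpy as np

def loglog_spectral_slope(x):
    """Least-squares slope of log periodogram vs log frequency (zero frequency excluded)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x))
    keep = freqs > 0
    log_power = np.log(np.maximum(power[keep], 1e-300))  # floor avoids log(0)
    slope, _ = np.polyfit(np.log(freqs[keep]), log_power, 1)
    return slope

def spectral_consistency_penalty(x, hurst):
    """Squared gap between the fitted slope and the target exponent -(2H + 1)."""
    target = -(2.0 * hurst + 1.0)
    return (loglog_spectral_slope(x) - target) ** 2

# Synthetic check: build a series whose periodogram is exactly |f|^-(2H+1) with H = 0.5.
n, true_hurst = 1024, 0.5
freqs = np.fft.rfftfreq(n)
amp = np.zeros_like(freqs)
amp[1:] = freqs[1:] ** (-(2 * true_hurst + 1) / 2)
phase = np.random.default_rng(0).uniform(0, 2 * np.pi, len(freqs))
spectrum = amp * np.exp(1j * phase)
spectrum[-1] = amp[-1]  # the Nyquist bin must be real for a real-valued signal
x = np.fft.irfft(spectrum, n)

matched = spectral_consistency_penalty(x, hurst=0.5)
mismatched = spectral_consistency_penalty(x, hurst=0.9)
print(matched, mismatched)  # near zero for the matching H, about 0.64 for H = 0.9
```

Because the target depends on H, the penalty is only as reliable as the exponent fed into it, which is the dependence the circularity check on this page points at.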

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Scale-Equivariant WaveNet (SE-WaveNet) for generative forecasting of approximately self-similar time series. It defines discrete scale equivariance for 1D causal networks and claims to prove that dyadic dilation commutes (up to boundary effects) with dilated-convolution stacks whose kernels are weight-tied across levels. The architecture is augmented with a one-level Daubechies-4 wavelet input, Hurst-FiLM conditioning on the local scaling exponent H, and a spectral-consistency loss targeting the |f|^{-(2H+1)} spectrum; the head is a conditional normalizing flow asserted to preserve equivariance. On 30 years of S&P 500 daily log-returns, SE-WaveNet samples achieve median C^*=0.020 on the Allan-variance scaling-collapse diagnostic for the top-25 universe, versus >=0.06 for a capacity-matched vanilla WaveNet, while tying or beating baselines on NLL, KS-calibration, and tail energy distance, with an L-fold reduction in convolutional parameters.

Significance. If the equivariance property is rigorously established and the observed improvement in scaling diagnostics is attributable to the weight-tying inductive bias rather than auxiliary components, the work would offer a parameter-efficient architectural prior for self-similar processes that complements loss-based regularization. The concrete numerical result on real financial data and the explicit parameter reduction constitute strengths that could influence generative modeling in turbulence, climate, and network traffic domains.

major comments (3)
  1. [Abstract / theoretical contribution] Abstract and theoretical section: the manuscript states a precise definition of discrete scale equivariance and claims a proof that dyadic dilation commutes with any weight-tied dilated-convolution stack (up to boundary effects), but supplies no derivation details, lemmas, or explicit handling of padding/truncation on finite causal sequences. This is load-bearing for the central claim that weight-tying hard-wires self-similarity and explains the Allan-variance improvement.
  2. [Experiments on S&P 500] Experiments section: the reported median C^*=0.020 for SE-WaveNet versus >=0.06 for vanilla WaveNet on the Allan-variance top-25 universe lacks error bars, multiple random seeds, or statistical tests. Without these, it is impossible to determine whether the difference is robust or could be explained by the wavelet inputs or spectral-consistency term alone.
  3. [Model architecture] Architecture and head description: the conditional normalizing-flow head is asserted to preserve equivariance, yet the construction (conditioning via Hurst-FiLM, etc.) is not shown to commute with dyadic dilation, and no verification against scale-dependent boundary or conditioning artifacts is provided. This directly affects whether the performance gap can be credited to the claimed equivariant backbone.
minor comments (2)
  1. [Title / Abstract] The title refers to 'Wavelet Scattering Inputs' while the abstract specifies a 'one-level Daubechies-4 wavelet input'; consistent terminology and a brief diagram of the input pipeline would improve clarity.
  2. [Experiments] No ablation isolating the contribution of weight-tying versus the spectral-consistency loss (which itself depends on the estimated H) is presented; such an experiment would strengthen attribution of the scaling-collapse result.
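Major comment 2 asks for multiple seeds and a statistical test. One candidate, which the authors' response also names, is a paired Wilcoxon signed-rank test. The sketch below computes its exact two-sided p-value by enumerating sign patterns, assuming no zero or tied absolute differences. The input numbers are made up for illustration and are not results from the paper; the function name is hypothetical.

```python
from itertools import product

def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided paired Wilcoxon signed-rank test.

    Assumes no zero differences and no ties among |differences|.
    Returns (statistic, p_value) with statistic = min(W+, W-).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)
    # Under the null every sign pattern over the ranks 1..n is equally likely.
    total = n * (n + 1) // 2
    hits = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(wp, total - wp) <= stat:
            hits += 1
    return stat, hits / 2 ** n

# Made-up placeholder scores for five paired runs (NOT results from the paper).
toy_tied = [0.021, 0.019, 0.022, 0.018, 0.020]
toy_untied = [0.061, 0.066, 0.059, 0.071, 0.064]
stat, p_value = wilcoxon_signed_rank_exact(toy_tied, toy_untied)
print(stat, p_value)  # 0 0.0625
```

Even when one model wins every pair, five pairs cannot push the exact two-sided p-value below 2/32 = 0.0625. If seeds are the pairing unit, at least six are needed to reach the 0.05 level; pairing per asset across the 25-name universe would be another option.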

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions planned for the next manuscript version.

Point-by-point responses
  1. Referee: [Abstract / theoretical contribution] Abstract and theoretical section: the manuscript states a precise definition of discrete scale equivariance and claims a proof that dyadic dilation commutes with any weight-tied dilated-convolution stack (up to boundary effects), but supplies no derivation details, lemmas, or explicit handling of padding/truncation on finite causal sequences. This is load-bearing for the central claim that weight-tying hard-wires self-similarity and explains the Allan-variance improvement.

    Authors: We agree that additional derivation details are required to make the central theoretical claim fully rigorous. In the revised manuscript we will add a dedicated appendix that contains: (i) the complete definition of discrete scale equivariance, (ii) the sequence of lemmas establishing commutation of dyadic dilation with weight-tied dilated-convolution stacks, and (iii) explicit analysis of padding, truncation, and boundary effects on finite-length causal sequences. This will allow readers to verify the inductive-bias argument without ambiguity. revision: yes

  2. Referee: [Experiments on S&P 500] Experiments section: the reported median C^*=0.020 for SE-WaveNet versus >=0.06 for vanilla WaveNet on the Allan-variance top-25 universe lacks error bars, multiple random seeds, or statistical tests. Without these, it is impossible to determine whether the difference is robust or could be explained by the wavelet inputs or spectral-consistency term alone.

    Authors: We acknowledge that the current experimental presentation does not include variability estimates or formal statistical comparison. For the revision we will re-run all models with at least five independent random seeds, report means and standard deviations for the C* diagnostic (and other metrics), and add paired statistical tests (e.g., Wilcoxon signed-rank) between SE-WaveNet and the capacity-matched baseline. These additions will clarify whether the observed gap is robust and attributable to the weight-tying component. revision: yes

  3. Referee: [Model architecture] Architecture and head description: the conditional normalizing-flow head is asserted to preserve equivariance, yet the construction (conditioning via Hurst-FiLM, etc.) is not shown to commute with dyadic dilation, and no verification against scale-dependent boundary or conditioning artifacts is provided. This directly affects whether the performance gap can be credited to the claimed equivariant backbone.

    Authors: We thank the referee for highlighting this gap. While the normalizing-flow head is conditioned on the scale-invariant Hurst exponent through FiLM layers, an explicit commutation argument was omitted. In the revision we will add a short subsection proving that the Hurst-FiLM conditioning commutes with dyadic dilation (subject to the same boundary effects already analyzed for the backbone) and will include numerical verification on synthetic fractional Brownian motion sequences to confirm the absence of scale-dependent artifacts. revision: yes
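The synthetic check promised in the third response needs sequences with a known exponent. A minimal sketch of one way to set that up, assuming the test series are increments of fractional Brownian motion (fractional Gaussian noise) drawn by Cholesky factorisation of their exact covariance, with H then recovered from the scaling Var(sum over m steps) = m^{2H}. The verification protocol itself is not described on this page, so the function names, sizes, and the aggregated-variance estimator are illustrative choices.

```python
import numpy as np

def fgn_paths(hurst, length, n_paths, seed=0):
    """Sample unit-variance fractional Gaussian noise via Cholesky of its exact covariance."""
    k = np.arange(length)
    # Autocovariance: 0.5 * (|k+1|^2H - 2|k|^2H + |k-1|^2H)
    gamma = 0.5 * (np.abs(k + 1) ** (2 * hurst)
                   - 2 * np.abs(k) ** (2 * hurst)
                   + np.abs(k - 1) ** (2 * hurst))
    cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    chol = np.linalg.cholesky(cov)
    z = np.random.default_rng(seed).standard_normal((n_paths, length))
    return z @ chol.T

def aggregated_variance_hurst(paths, scales=(1, 2, 4, 8, 16, 32)):
    """Estimate H from Var(sum over m steps) ~ m^(2H) with a log-log fit."""
    log_var = []
    for m in scales:
        usable = (paths.shape[1] // m) * m
        block_sums = paths[:, :usable].reshape(paths.shape[0], -1, m).sum(axis=2)
        log_var.append(np.log(np.mean(block_sums ** 2)))  # process has zero mean
    slope, _ = np.polyfit(np.log(scales), log_var, 1)
    return slope / 2.0

paths = fgn_paths(hurst=0.7, length=256, n_paths=400)
h_est = aggregated_variance_hurst(paths)
print(round(h_est, 2))
```

Running the same estimator on model samples conditioned on a known H, at several dyadic scales, would be one way to look for the scale-dependent artifacts the referee raises.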

Circularity Check

1 step flagged

Mild dependence in spectral-consistency loss on H estimated via Hurst-FiLM block

specific steps
  1. fitted input called prediction [Abstract]
    "a Hurst-FiLM block exposing the local scaling exponent, and a spectral-consistency training term targeting the |f|^{-(2H+1)} power-law spectrum"

    The loss directly targets the spectrum whose exponent is the H value produced by the Hurst-FiLM block (itself estimated from input data). This makes the scaling-consistency objective dependent on a quantity derived within the model rather than an external fixed target, so reproduction of scaling-collapse diagnostics is partly forced by the training construction that incorporates the model's own H estimate.

full rationale

The paper's central derivation is a mathematical proof that dyadic dilation commutes with weight-tied dilated convolutions (up to boundary effects), presented as an independent property that hard-wires self-similarity. This stands apart from the data. However, the spectral-consistency term uses the same H exposed by the Hurst-FiLM block, creating a mild self-referential dependence when enforcing the scaling spectrum. The empirical reproduction of the Allan-variance diagnostic is therefore partly supported by this construction rather than solely by the equivariant backbone. No self-citation chains, ansatz smuggling, or renaming of known results appear in the provided text. The overall circularity remains limited because the core equivariance claim retains independent mathematical content and the comparison baseline lacks the full set of components.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central construction rests on the commutation property of shared-weight dilated convolutions under dyadic dilation and on the assumption that a conditional normalizing flow can be made to preserve the resulting equivariance; no new physical entities are postulated.

free parameters (1)
  • local Hurst exponent H
    Exposed by the Hurst-FiLM block and used both for conditioning and for the spectral-consistency target; its value is data-dependent.
axioms (2)
  • domain assumption: dyadic dilation commutes with the dilated-convolution stack when kernel weights are shared across levels (up to boundary effects)
    Invoked in the first contribution as the mathematical justification for scale equivariance.
  • domain assumption: the conditional normalizing-flow head preserves the scale equivariance of the backbone
    Stated as a design choice required for the overall architecture to remain equivariant.

pith-pipeline@v0.9.0 · 5890 in / 1527 out tokens · 41438 ms · 2026-05-20T13:20:25.127383+00:00 · methodology


