pith. sign in

arxiv: 2606.24969 · v1 · pith:WK563OOXnew · submitted 2026-06-23 · 💻 cs.LG

Frequency Domain Reservoir Computing

Pith reviewed 2026-06-26 00:20 UTC · model grok-4.3

classification 💻 cs.LG
keywords reservoir computingecho state networksfrequency domainrecurrent neural networksefficient computationtime series forecastingsequential classification
0
0 comments X

The pith

Frequency Domain Reservoir Computing delivers O(N) complexity for echo state network updates while matching state-of-the-art performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces FRESCO, an architecture for echo state networks that performs all operations in the frequency domain. It uses a zero-padding embedding for inputs, a packed readout, and a frequency-domain non-linearity to reach linear complexity in the number of reservoir units. The approach maintains competitive accuracy on tasks involving memory, classification, and forecasting. A reader would care if they want recurrent models that can handle large reservoirs without the usual quadratic cost in computation.

Core claim

FRESCO is an ESN architecture operating entirely in the frequency domain while avoiding domain-shift overheads to achieve O(N) complexity for dense, non-linear recurrent updates. By employing a novel dimensional zero-padding input embedding, a packed FDh readout, and a natively applied frequency-domain non-linearity, FRESCO drastically reduces computational costs and energy consumption of training and inference. Furthermore, FRESCO matches the state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting.

What carries the argument

The combination of dimensional zero-padding input embedding, packed FDh readout, and natively applied frequency-domain non-linearity, which permits full frequency-domain operation with O(N) updates.

Load-bearing premise

The frequency-domain formulation with the zero-padding embedding, packed readout, and native non-linearity works without introducing domain-shift overheads or reducing the model's ability to capture complex dynamics.

What would settle it

Measuring the actual time complexity of the state updates in FRESCO and finding it exceeds linear scaling, or testing on a benchmark where it fails to match standard ESN performance at large sizes.

Figures

Figures reproduced from arXiv: 2606.24969 by Andrea Ceni, Claudio Gallicchio, David Kappel, Klaus Schertler, Xiomara Runge.

Figure 1
Figure 1. Figure 1: FRESCO (top row) accelerates ESNs by operating natively in the frequency domain while [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Performance scaling for Mackey-Glass (left) and NARMA10 (right) for reservoir sizes [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Accuracy, energy, and runtime (train￾ing and inference) for long-horizon forecasting, showing FRESCO’s superior efficiency. To further contextualize these results, [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Standard 1D zero-padding: The input vector [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Dimensional zero-padding: A 1D input vector [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of minimal computational times (lower is better) for different input embedding [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Memory layout of the standard RFFT of a 6x6 matrix in a row-major memory format. Real [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Memory layout of the proposed axis-reordered RFFT. See Proposition 5 for a definition of [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Comparison of minimal computational times for packed frequency-domain readout [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Relationship between the reservoir state representations. The spatial domain state [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗
Figure 11
Figure 11. Figure 11: Crucially, the following mathematical formulation is constructive. Specifically, Step 4 of the proof (“Proving the Basis Spans W over R”) serves as the practical algorithmic recipe for creating these packed vectors. It details exactly how to isolate purely real self-conjugate elements, how to split complex conjugate pairs into real and imaginary components, and precisely how to scale them (e.g., by a fact… view at source ↗
read the original abstract

While the quadratic sequence-length bottleneck of transformers has fueled a resurgence in recurrent models, effectively capturing complex dynamics requires architectures that balance efficient training with highly expressive latent states. Echo State Networks (ESNs) offer a compelling approach by utilizing fixed recurrent weights to circumvent backpropagation through time, enabling a closed-form training solution. However, achieving the expressivity needed for complex tasks demands large reservoirs, exposing an $\mathcal{O}(N^2)$ state-update bottleneck that prevents ESNs from matching the scale of contemporary recurrent models. To address this limitation, we introduce Frequency Domain Reservoir Computing (FRESCO), an ESN architecture operating entirely in the frequency domain while avoiding domain-shift overheads to achieve $\mathcal{O}(N)$ complexity for dense, non-linear recurrent updates. By employing a novel dimensional zero-padding input embedding, a packed \FDh readout, and a natively applied frequency-domain non-linearity, FRESCO drastically reduces computational costs and energy consumption of training and inference. Furthermore, FRESCO matches the state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting, offering a scalable path forward for dense recurrent architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Frequency Domain Reservoir Computing (FRESCO), an Echo State Network variant that operates entirely in the frequency domain. It employs a dimensional zero-padding input embedding, a packed FDh readout, and a natively applied frequency-domain non-linearity to achieve O(N) complexity for dense non-linear recurrent updates while avoiding domain-shift overheads. The approach is claimed to match state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting tasks.

Significance. If the O(N) scaling and performance claims hold without hidden overheads or expressivity loss, FRESCO would offer a practical route to scaling reservoir computing to sizes competitive with modern recurrent models, with potential benefits for energy-efficient training and inference on sequential data.

major comments (2)
  1. [Abstract] Abstract: The central O(N) complexity claim for dense non-linear updates depends on the zero-padding embedding, packed FDh readout, and frequency-domain non-linearity; a load-bearing derivation or complexity table comparing to standard ESN O(N^2) updates is required to confirm no quadratic terms remain after these operations.
  2. [Methods] The weakest assumption—that the frequency-domain formulation preserves expressivity for complex dynamics without domain-shift costs—needs explicit support; if the non-linearity and embedding are defined in a later methods section, that section should include a proof sketch or ablation showing no degradation relative to time-domain ESNs on the reported benchmarks.
minor comments (1)
  1. [Abstract] Abstract: The acronym 'FDh' in 'packed FDh readout' is undefined; expand on first use or add a parenthetical reference to the defining equation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. We address each major point below and will revise the manuscript to provide the requested supporting material.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central O(N) complexity claim for dense non-linear updates depends on the zero-padding embedding, packed FDh readout, and frequency-domain non-linearity; a load-bearing derivation or complexity table comparing to standard ESN O(N^2) updates is required to confirm no quadratic terms remain after these operations.

    Authors: We agree that an explicit breakdown is needed to substantiate the O(N) claim. In the revised manuscript we will add a dedicated complexity subsection in Methods containing (i) a step-by-step derivation showing that the zero-padding embedding, packed FDh readout, and native frequency-domain non-linearity each contribute only linear terms, and (ii) a table contrasting operation counts against the standard ESN O(N^2) dense update. This will confirm that no quadratic terms survive after the three listed operations. revision: yes

  2. Referee: [Methods] The weakest assumption—that the frequency-domain formulation preserves expressivity for complex dynamics without domain-shift costs—needs explicit support; if the non-linearity and embedding are defined in a later methods section, that section should include a proof sketch or ablation showing no degradation relative to time-domain ESNs on the reported benchmarks.

    Authors: The manuscript already reports that FRESCO matches state-of-the-art performance on the memory, classification, and forecasting benchmarks, which empirically indicates that expressivity is retained. To strengthen the presentation, we will augment the Methods section with (i) a short discussion of why the chosen frequency-domain operations avoid domain-shift overhead and (ii) an ablation that directly compares FRESCO against an equivalent time-domain ESN on a representative subset of the benchmarks, confirming no measurable degradation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces FRESCO as a new ESN architecture operating in the frequency domain, relying on explicitly novel components (dimensional zero-padding input embedding, packed FDh readout, natively applied frequency-domain non-linearity) to achieve claimed O(N) complexity. No load-bearing steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the central claims concern architectural design and empirical performance matching rather than predictions derived from the model's own parameters or prior self-referential results. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 3 invented entities

Abstract-only review yields no explicit free parameters, standard axioms, or independently evidenced invented entities beyond the three named architectural components.

invented entities (3)
  • dimensional zero-padding input embedding no independent evidence
    purpose: Enable frequency-domain input processing
    Novel component introduced to support the architecture.
  • packed FDh readout no independent evidence
    purpose: Provide readout in frequency domain
    Novel component introduced to support the architecture.
  • frequency-domain non-linearity no independent evidence
    purpose: Apply non-linearity without domain shift
    Novel component introduced to support the architecture.

pith-pipeline@v0.9.1-grok · 5735 in / 1141 out tokens · 28782 ms · 2026-06-26T00:20:06.423285+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages · 2 internal anchors

  1. [1]

    N. R. Chilkuri and C. Eliasmith. Parallelizing legendre memory unit training. InInternational conference on machine learning, pages 1898–1907. PMLR, 2021

  2. [2]

    Courty, V

    B. Courty, V . Schmidt, B. Feld, J. Lecourt, M. Léval, L. Blanche, A. Cruveiller, F. Zhao, A. Joshi, A. Bogroff, et al. mlco2/codecarbon: v2. 4.1.Zenodo, 2024

  3. [3]

    H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y . Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh. The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6): 1293–1305, 2019. doi: 10.1109/JAS.2019.1911747

  4. [4]

    J. Demšar. Statistical comparisons of classifiers over multiple data sets.Journal of Machine learning research, 7(Jan):1–30, 2006

  5. [5]

    D. Y . Fu, T. Dao, K. K. Saab, A. W. Thomas, A. Rudra, and C. Ré. Hungry hungry hippos: Towards language modeling with state space models.arXiv preprint arXiv:2212.14052, 2022

  6. [6]

    Gallicchio, A

    C. Gallicchio, A. Micheli, and L. Pedrelli. Fast spectral radius initialization for recurrent neural networks. InINNS Big Data and Deep Learning conference, pages 380–390. Springer, 2019

  7. [7]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752, 2023

  8. [8]

    A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Ré. Hippo: Recurrent memory with optimal polynomial projections.Advances in neural information processing systems, 33:1474–1487, 2020

  9. [9]

    A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021

  10. [10]

    Hochreiter and J

    S. Hochreiter and J. Schmidhuber. Long short-term memory.Neural computation, 9(8): 1735–1780, 1997

  11. [11]

    echo state

    H. Jaeger. The "echo state" approach to analysing and training recurrent neural networks-with an erratum note. Technical Report 148, German National Research Center for Information Technology GMD Bonn, Germany, 2001

  12. [12]

    Jaeger and H

    H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication.science, 304(5667):78–80, 2004

  13. [13]

    Jaeger, M

    H. Jaeger, M. Lukoševiˇcius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky-integrator neurons.Neural networks, 20(3):335–352, 2007

  14. [14]

    Lahoti, K

    A. Lahoti, K. Y . Li, B. Chen, C. Wang, A. Bick, J. Z. Kolter, T. Dao, and A. Gu. Mamba-3: Improved sequence modeling using state space principles.arXiv preprint arXiv:2603.15569, 2026

  15. [15]

    Lai, W.-C

    G. Lai, W.-C. Chang, Y . Yang, and H. Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018

  16. [16]

    Y . Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long. itransformer: Inverted transformers are effective for time series forecasting. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=JePfAI8fah

  17. [17]

    Luko ˇseviˇcius and H

    M. Lukoševiˇcius and H. Jaeger. Reservoir computing approaches to recurrent neural network training.Computer Science Review, 3(3):127–149, 2009. ISSN 1574-0137. doi: https:// doi.org/10.1016/j.cosrev.2009.03.005. URL https://www.sciencedirect.com/science/ article/pii/S1574013709000173

  18. [18]

    Middlehurst, A

    M. Middlehurst, A. Ismail-Fawaz, A. Guillaume, C. Holder, D. Guijo-Rubio, G. Bulatova, L. Tsaprounis, L. Mentel, M. Walter, P. Schäfer, and A. Bagnall. aeon: a python toolkit for learning from time series.Journal of Machine Learning Research, 25(289):1–10, 2024. URL http://jmlr.org/papers/v25/23-1444.html. 10

  19. [19]

    Y . Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam. A time series is worth 64 words: Long- term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=Jbdc0vTOcol

  20. [20]

    A. V . Oppenheim, R. W. Schafer, and J. R. Buck.Discrete-Time Signal Processing. Prentice Hall, 2nd edition, 1999

  21. [21]

    Orvieto, S

    A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De. Resurrecting recurrent neural networks for long sequences. InInternational conference on machine learning, pages 26670–26698. PMLR, 2023

  22. [22]

    Singh and B

    P. Singh and B. Raman. Echo state networks as state-space models: A systems perspective. arXiv preprint arXiv:2509.04422, 2025

  23. [23]

    Vaswani, N

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

  24. [24]

    V oelker, I

    A. V oelker, I. Kaji´c, and C. Eliasmith. Legendre memory units: Continuous-time representation in recurrent neural networks.Advances in neural information processing systems, 32, 2019

  25. [25]

    H. Wu, J. Xu, J. Wang, and M. Long. Autoformer: Decomposition transformers with auto- correlation for long-term series forecasting.Advances in neural information processing systems, 34:22419–22430, 2021

  26. [26]

    H. Wu, T. Hu, Y . Liu, H. Zhou, J. Wang, and M. Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=ju_Uqw384Oq

  27. [27]

    M. Yan, C. Huang, P. Bienstman, P. Tino, W. Lin, and J. Sun. Emerging opportunities and challenges for the future of reservoir computing.Nature Communications, 15(1):2056, 2024

  28. [28]

    "" Computes the axis-reordered 2D RFFT for a real matrix X of shape (K1, K2). Returns a complex array of shape (K1//2 + 1, K2)

    T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 27...

  29. [29]

    Proving the Basis Spans W over R

    Although this leads to a global factor on the packed vector p, we have seen in Corollary 1 that this can be compensated for by a rescaled regularization parameterλ. Listing 2 provides an explicit implementation of the packing operation P{·}, which is practically used to accumulate sequential reservoir states into a dense feature matrix for training the re...

  30. [30]

    to conserve energy and satisfy the unitary condition. Note that while the proof explicitly constructs the unitary matrix U to establish theoretical equivalence, this large N×N matrix never needs to be instantiated in software; one simply applies the inexpensive, element-wise packing and scaling rules derived herein. Setup and definitions.Let x be a real-v...

  31. [31]

    We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)

    Construction of the basis vectors.Let em ∈R N denote the standard basis column vector (a 1 at indexm, and0elsewhere). We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)

  32. [32]

    Proving unitarity (U †U=I ).Let U= [ u0 . . . u N−1]. To show that U is unitary, we verify that its columns are orthonormal (u† aub =δ a,b). For unit norms (a=b): • Ifa∈ I 0:∥u a∥2 =e ⊤ a ea = 1. • Ifa∈ I +:∥u a∥2 = 1 2(e⊤ a ea +e ⊤ S(a)eS(a)) = 1. • Ifa∈ I −:∥u a∥2 = −i√ 2 i√ 2 (e⊤ S(a)eS(a) +e ⊤ a ea) = 1. For orthogonality (a̸=b ): If a and b do not be...

  33. [33]

    Freq.” denotes the sampling frequency, “Ch

    Proving the basis spans W over R.To prove U maps RN →W , we must show any X∈W can be uniquely constructed as X= PN−1 m=0 pmum using real coefficients pm ∈R . We define the coefficientsp m as follows: • Form∈ I 0:p m =X[m]. • Form∈ I +:p m = √ 2ℜ(X[m]). • Form∈ I −:p m = √ 2ℑ(X[S(m)]). Evaluating the sum for I0 yields X[m]e m. For a specific symmetric pair...