Frequency Domain Reservoir Computing

Andrea Ceni; Claudio Gallicchio; David Kappel; Klaus Schertler; Xiomara Runge

arxiv: 2606.24969 · v1 · pith:WK563OOXnew · submitted 2026-06-23 · 💻 cs.LG

Frequency Domain Reservoir Computing

Klaus Schertler , Xiomara Runge , Andrea Ceni , David Kappel , Claudio Gallicchio This is my paper

Pith reviewed 2026-06-26 00:20 UTC · model grok-4.3

classification 💻 cs.LG

keywords reservoir computingecho state networksfrequency domainrecurrent neural networksefficient computationtime series forecastingsequential classification

0 comments

The pith

Frequency Domain Reservoir Computing delivers O(N) complexity for echo state network updates while matching state-of-the-art performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces FRESCO, an architecture for echo state networks that performs all operations in the frequency domain. It uses a zero-padding embedding for inputs, a packed readout, and a frequency-domain non-linearity to reach linear complexity in the number of reservoir units. The approach maintains competitive accuracy on tasks involving memory, classification, and forecasting. A reader would care if they want recurrent models that can handle large reservoirs without the usual quadratic cost in computation.

Core claim

FRESCO is an ESN architecture operating entirely in the frequency domain while avoiding domain-shift overheads to achieve O(N) complexity for dense, non-linear recurrent updates. By employing a novel dimensional zero-padding input embedding, a packed FDh readout, and a natively applied frequency-domain non-linearity, FRESCO drastically reduces computational costs and energy consumption of training and inference. Furthermore, FRESCO matches the state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting.

What carries the argument

The combination of dimensional zero-padding input embedding, packed FDh readout, and natively applied frequency-domain non-linearity, which permits full frequency-domain operation with O(N) updates.

Load-bearing premise

The frequency-domain formulation with the zero-padding embedding, packed readout, and native non-linearity works without introducing domain-shift overheads or reducing the model's ability to capture complex dynamics.

What would settle it

Measuring the actual time complexity of the state updates in FRESCO and finding it exceeds linear scaling, or testing on a benchmark where it fails to match standard ESN performance at large sizes.

Figures

Figures reproduced from arXiv: 2606.24969 by Andrea Ceni, Claudio Gallicchio, David Kappel, Klaus Schertler, Xiomara Runge.

**Figure 2.** Figure 2: Performance scaling for Mackey-Glass (left) and NARMA10 (right) for reservoir sizes [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 4.** Figure 4: Accuracy, energy, and runtime (training and inference) for long-horizon forecasting, showing FRESCO’s superior efficiency. To further contextualize these results, [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Standard 1D zero-padding: The input vector [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Dimensional zero-padding: A 1D input vector [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of minimal computational times (lower is better) for different input embedding [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Memory layout of the standard RFFT of a 6x6 matrix in a row-major memory format. Real [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Memory layout of the proposed axis-reordered RFFT. See Proposition 5 for a definition of [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: Comparison of minimal computational times for packed frequency-domain readout [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗

**Figure 11.** Figure 11: Relationship between the reservoir state representations. The spatial domain state [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗

**Figure 11.** Figure 11: Crucially, the following mathematical formulation is constructive. Specifically, Step 4 of the proof (“Proving the Basis Spans W over R”) serves as the practical algorithmic recipe for creating these packed vectors. It details exactly how to isolate purely real self-conjugate elements, how to split complex conjugate pairs into real and imaginary components, and precisely how to scale them (e.g., by a fact… view at source ↗

read the original abstract

While the quadratic sequence-length bottleneck of transformers has fueled a resurgence in recurrent models, effectively capturing complex dynamics requires architectures that balance efficient training with highly expressive latent states. Echo State Networks (ESNs) offer a compelling approach by utilizing fixed recurrent weights to circumvent backpropagation through time, enabling a closed-form training solution. However, achieving the expressivity needed for complex tasks demands large reservoirs, exposing an $\mathcal{O}(N^2)$ state-update bottleneck that prevents ESNs from matching the scale of contemporary recurrent models. To address this limitation, we introduce Frequency Domain Reservoir Computing (FRESCO), an ESN architecture operating entirely in the frequency domain while avoiding domain-shift overheads to achieve $\mathcal{O}(N)$ complexity for dense, non-linear recurrent updates. By employing a novel dimensional zero-padding input embedding, a packed \FDh readout, and a natively applied frequency-domain non-linearity, FRESCO drastically reduces computational costs and energy consumption of training and inference. Furthermore, FRESCO matches the state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting, offering a scalable path forward for dense recurrent architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FRESCO claims an O(N) frequency-domain ESN via zero-padding embedding, packed readout, and native non-linearity that matches benchmark performance, but the no-overhead and expressivity claims need the derivations to confirm.

read the letter

The main takeaway is that this paper puts echo state networks into the frequency domain to reach linear complexity for dense non-linear updates. It uses a dimensional zero-padding input embedding, a packed frequency-domain readout, and applies the non-linearity directly in frequency space, avoiding the usual O(N^2) cost while reportedly keeping performance on memory tasks, sequential classification, and long-horizon forecasting.

The construction is new in the specific combination of those three pieces. The paper does a solid job naming the scaling limit in large reservoirs and giving a concrete route around it that also claims lower energy use.

The soft spot is the assumption that the frequency-domain steps introduce no domain-shift overhead or loss of the dynamics needed for complex sequences. The abstract states this holds, but without the actual equations, proofs, or experimental controls it is hard to judge whether the non-linearity preserves the echo-state property or whether the embedding adds hidden costs. The O(N) claim and the performance parity are the parts that require checking.

This is for people working on efficient recurrent models or alternatives to attention for long sequences. Readers who follow reservoir computing would get the most direct value.

The proposal is specific enough and the problem is real enough that it deserves a serious referee even if revisions on the math and experiments are likely.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Frequency Domain Reservoir Computing (FRESCO), an Echo State Network variant that operates entirely in the frequency domain. It employs a dimensional zero-padding input embedding, a packed FDh readout, and a natively applied frequency-domain non-linearity to achieve O(N) complexity for dense non-linear recurrent updates while avoiding domain-shift overheads. The approach is claimed to match state-of-the-art predictive performance on memory benchmarks, sequential classification, and multivariate long-horizon forecasting tasks.

Significance. If the O(N) scaling and performance claims hold without hidden overheads or expressivity loss, FRESCO would offer a practical route to scaling reservoir computing to sizes competitive with modern recurrent models, with potential benefits for energy-efficient training and inference on sequential data.

major comments (2)

[Abstract] Abstract: The central O(N) complexity claim for dense non-linear updates depends on the zero-padding embedding, packed FDh readout, and frequency-domain non-linearity; a load-bearing derivation or complexity table comparing to standard ESN O(N^2) updates is required to confirm no quadratic terms remain after these operations.
[Methods] The weakest assumption—that the frequency-domain formulation preserves expressivity for complex dynamics without domain-shift costs—needs explicit support; if the non-linearity and embedding are defined in a later methods section, that section should include a proof sketch or ablation showing no degradation relative to time-domain ESNs on the reported benchmarks.

minor comments (1)

[Abstract] Abstract: The acronym 'FDh' in 'packed FDh readout' is undefined; expand on first use or add a parenthetical reference to the defining equation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. We address each major point below and will revise the manuscript to provide the requested supporting material.

read point-by-point responses

Referee: [Abstract] Abstract: The central O(N) complexity claim for dense non-linear updates depends on the zero-padding embedding, packed FDh readout, and frequency-domain non-linearity; a load-bearing derivation or complexity table comparing to standard ESN O(N^2) updates is required to confirm no quadratic terms remain after these operations.

Authors: We agree that an explicit breakdown is needed to substantiate the O(N) claim. In the revised manuscript we will add a dedicated complexity subsection in Methods containing (i) a step-by-step derivation showing that the zero-padding embedding, packed FDh readout, and native frequency-domain non-linearity each contribute only linear terms, and (ii) a table contrasting operation counts against the standard ESN O(N^2) dense update. This will confirm that no quadratic terms survive after the three listed operations. revision: yes
Referee: [Methods] The weakest assumption—that the frequency-domain formulation preserves expressivity for complex dynamics without domain-shift costs—needs explicit support; if the non-linearity and embedding are defined in a later methods section, that section should include a proof sketch or ablation showing no degradation relative to time-domain ESNs on the reported benchmarks.

Authors: The manuscript already reports that FRESCO matches state-of-the-art performance on the memory, classification, and forecasting benchmarks, which empirically indicates that expressivity is retained. To strengthen the presentation, we will augment the Methods section with (i) a short discussion of why the chosen frequency-domain operations avoid domain-shift overhead and (ii) an ablation that directly compares FRESCO against an equivalent time-domain ESN on a representative subset of the benchmarks, confirming no measurable degradation. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper introduces FRESCO as a new ESN architecture operating in the frequency domain, relying on explicitly novel components (dimensional zero-padding input embedding, packed FDh readout, natively applied frequency-domain non-linearity) to achieve claimed O(N) complexity. No load-bearing steps reduce by construction to fitted inputs, self-definitions, or self-citation chains; the central claims concern architectural design and empirical performance matching rather than predictions derived from the model's own parameters or prior self-referential results. The derivation is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 3 invented entities

Abstract-only review yields no explicit free parameters, standard axioms, or independently evidenced invented entities beyond the three named architectural components.

invented entities (3)

dimensional zero-padding input embedding no independent evidence
purpose: Enable frequency-domain input processing
Novel component introduced to support the architecture.
packed FDh readout no independent evidence
purpose: Provide readout in frequency domain
Novel component introduced to support the architecture.
frequency-domain non-linearity no independent evidence
purpose: Apply non-linearity without domain shift
Novel component introduced to support the architecture.

pith-pipeline@v0.9.1-grok · 5735 in / 1141 out tokens · 28782 ms · 2026-06-26T00:20:06.423285+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages · 2 internal anchors

[1]

N. R. Chilkuri and C. Eliasmith. Parallelizing legendre memory unit training. InInternational conference on machine learning, pages 1898–1907. PMLR, 2021

1907
[2]

Courty, V

B. Courty, V . Schmidt, B. Feld, J. Lecourt, M. Léval, L. Blanche, A. Cruveiller, F. Zhao, A. Joshi, A. Bogroff, et al. mlco2/codecarbon: v2. 4.1.Zenodo, 2024

2024
[3]

H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y . Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh. The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6): 1293–1305, 2019. doi: 10.1109/JAS.2019.1911747

work page doi:10.1109/jas.2019.1911747 2019
[4]

J. Demšar. Statistical comparisons of classifiers over multiple data sets.Journal of Machine learning research, 7(Jan):1–30, 2006

2006
[5]

D. Y . Fu, T. Dao, K. K. Saab, A. W. Thomas, A. Rudra, and C. Ré. Hungry hungry hippos: Towards language modeling with state space models.arXiv preprint arXiv:2212.14052, 2022

work page arXiv 2022
[6]

Gallicchio, A

C. Gallicchio, A. Micheli, and L. Pedrelli. Fast spectral radius initialization for recurrent neural networks. InINNS Big Data and Deep Learning conference, pages 380–390. Springer, 2019

2019
[7]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[8]

A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Ré. Hippo: Recurrent memory with optimal polynomial projections.Advances in neural information processing systems, 33:1474–1487, 2020

2020
[9]

A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[10]

Hochreiter and J

S. Hochreiter and J. Schmidhuber. Long short-term memory.Neural computation, 9(8): 1735–1780, 1997

1997
[11]

echo state

H. Jaeger. The "echo state" approach to analysing and training recurrent neural networks-with an erratum note. Technical Report 148, German National Research Center for Information Technology GMD Bonn, Germany, 2001

2001
[12]

Jaeger and H

H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication.science, 304(5667):78–80, 2004

2004
[13]

Jaeger, M

H. Jaeger, M. Lukoševiˇcius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky-integrator neurons.Neural networks, 20(3):335–352, 2007

2007
[14]

Lahoti, K

A. Lahoti, K. Y . Li, B. Chen, C. Wang, A. Bick, J. Z. Kolter, T. Dao, and A. Gu. Mamba-3: Improved sequence modeling using state space principles.arXiv preprint arXiv:2603.15569, 2026

work page arXiv 2026
[15]

Lai, W.-C

G. Lai, W.-C. Chang, Y . Yang, and H. Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018

2018
[16]

Y . Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long. itransformer: Inverted transformers are effective for time series forecasting. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=JePfAI8fah

2024
[17]

Luko ˇseviˇcius and H

M. Lukoševiˇcius and H. Jaeger. Reservoir computing approaches to recurrent neural network training.Computer Science Review, 3(3):127–149, 2009. ISSN 1574-0137. doi: https:// doi.org/10.1016/j.cosrev.2009.03.005. URL https://www.sciencedirect.com/science/ article/pii/S1574013709000173

work page doi:10.1016/j.cosrev.2009.03.005 2009
[18]

Middlehurst, A

M. Middlehurst, A. Ismail-Fawaz, A. Guillaume, C. Holder, D. Guijo-Rubio, G. Bulatova, L. Tsaprounis, L. Mentel, M. Walter, P. Schäfer, and A. Bagnall. aeon: a python toolkit for learning from time series.Journal of Machine Learning Research, 25(289):1–10, 2024. URL http://jmlr.org/papers/v25/23-1444.html. 10

2024
[19]

Y . Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam. A time series is worth 64 words: Long- term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=Jbdc0vTOcol

2023
[20]

A. V . Oppenheim, R. W. Schafer, and J. R. Buck.Discrete-Time Signal Processing. Prentice Hall, 2nd edition, 1999

1999
[21]

Orvieto, S

A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De. Resurrecting recurrent neural networks for long sequences. InInternational conference on machine learning, pages 26670–26698. PMLR, 2023

2023
[22]

Singh and B

P. Singh and B. Raman. Echo state networks as state-space models: A systems perspective. arXiv preprint arXiv:2509.04422, 2025

work page arXiv 2025
[23]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017
[24]

V oelker, I

A. V oelker, I. Kaji´c, and C. Eliasmith. Legendre memory units: Continuous-time representation in recurrent neural networks.Advances in neural information processing systems, 32, 2019

2019
[25]

H. Wu, J. Xu, J. Wang, and M. Long. Autoformer: Decomposition transformers with auto- correlation for long-term series forecasting.Advances in neural information processing systems, 34:22419–22430, 2021

2021
[26]

H. Wu, T. Hu, Y . Liu, H. Zhou, J. Wang, and M. Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=ju_Uqw384Oq

2023
[27]

M. Yan, C. Huang, P. Bienstman, P. Tino, W. Lin, and J. Sun. Emerging opportunities and challenges for the future of reservoir computing.Nature Communications, 15(1):2056, 2024

2056
[28]

"" Computes the axis-reordered 2D RFFT for a real matrix X of shape (K1, K2). Returns a complex array of shape (K1//2 + 1, K2)

T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 27...

2022
[29]

Proving the Basis Spans W over R

Although this leads to a global factor on the packed vector p, we have seen in Corollary 1 that this can be compensated for by a rescaled regularization parameterλ. Listing 2 provides an explicit implementation of the packing operation P{·}, which is practically used to accumulate sequential reservoir states into a dense feature matrix for training the re...

2048
[30]

to conserve energy and satisfy the unitary condition. Note that while the proof explicitly constructs the unitary matrix U to establish theoretical equivalence, this large N×N matrix never needs to be instantiated in software; one simply applies the inexpensive, element-wise packing and scaling rules derived herein. Setup and definitions.Let x be a real-v...
[31]

We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)

Construction of the basis vectors.Let em ∈R N denote the standard basis column vector (a 1 at indexm, and0elsewhere). We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)
[32]

Proving unitarity (U †U=I ).Let U= [ u0 . . . u N−1]. To show that U is unitary, we verify that its columns are orthonormal (u† aub =δ a,b). For unit norms (a=b): • Ifa∈ I 0:∥u a∥2 =e ⊤ a ea = 1. • Ifa∈ I +:∥u a∥2 = 1 2(e⊤ a ea +e ⊤ S(a)eS(a)) = 1. • Ifa∈ I −:∥u a∥2 = −i√ 2 i√ 2 (e⊤ S(a)eS(a) +e ⊤ a ea) = 1. For orthogonality (a̸=b ): If a and b do not be...
[33]

Freq.” denotes the sampling frequency, “Ch

Proving the basis spans W over R.To prove U maps RN →W , we must show any X∈W can be uniquely constructed as X= PN−1 m=0 pmum using real coefficients pm ∈R . We define the coefficientsp m as follows: • Form∈ I 0:p m =X[m]. • Form∈ I +:p m = √ 2ℜ(X[m]). • Form∈ I −:p m = √ 2ℑ(X[S(m)]). Evaluating the sum for I0 yields X[m]e m. For a specific symmetric pair...

2048

[1] [1]

N. R. Chilkuri and C. Eliasmith. Parallelizing legendre memory unit training. InInternational conference on machine learning, pages 1898–1907. PMLR, 2021

1907

[2] [2]

Courty, V

B. Courty, V . Schmidt, B. Feld, J. Lecourt, M. Léval, L. Blanche, A. Cruveiller, F. Zhao, A. Joshi, A. Bogroff, et al. mlco2/codecarbon: v2. 4.1.Zenodo, 2024

2024

[3] [3]

H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y . Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh. The ucr time series archive.IEEE/CAA Journal of Automatica Sinica, 6(6): 1293–1305, 2019. doi: 10.1109/JAS.2019.1911747

work page doi:10.1109/jas.2019.1911747 2019

[4] [4]

J. Demšar. Statistical comparisons of classifiers over multiple data sets.Journal of Machine learning research, 7(Jan):1–30, 2006

2006

[5] [5]

D. Y . Fu, T. Dao, K. K. Saab, A. W. Thomas, A. Rudra, and C. Ré. Hungry hungry hippos: Towards language modeling with state space models.arXiv preprint arXiv:2212.14052, 2022

work page arXiv 2022

[6] [6]

Gallicchio, A

C. Gallicchio, A. Micheli, and L. Pedrelli. Fast spectral radius initialization for recurrent neural networks. InINNS Big Data and Deep Learning conference, pages 380–390. Springer, 2019

2019

[7] [7]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces.arXiv preprint arXiv:2312.00752, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[8] [8]

A. Gu, T. Dao, S. Ermon, A. Rudra, and C. Ré. Hippo: Recurrent memory with optimal polynomial projections.Advances in neural information processing systems, 33:1474–1487, 2020

2020

[9] [9]

A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[10] [10]

Hochreiter and J

S. Hochreiter and J. Schmidhuber. Long short-term memory.Neural computation, 9(8): 1735–1780, 1997

1997

[11] [11]

echo state

H. Jaeger. The "echo state" approach to analysing and training recurrent neural networks-with an erratum note. Technical Report 148, German National Research Center for Information Technology GMD Bonn, Germany, 2001

2001

[12] [12]

Jaeger and H

H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication.science, 304(5667):78–80, 2004

2004

[13] [13]

Jaeger, M

H. Jaeger, M. Lukoševiˇcius, D. Popovici, and U. Siewert. Optimization and applications of echo state networks with leaky-integrator neurons.Neural networks, 20(3):335–352, 2007

2007

[14] [14]

Lahoti, K

A. Lahoti, K. Y . Li, B. Chen, C. Wang, A. Bick, J. Z. Kolter, T. Dao, and A. Gu. Mamba-3: Improved sequence modeling using state space principles.arXiv preprint arXiv:2603.15569, 2026

work page arXiv 2026

[15] [15]

Lai, W.-C

G. Lai, W.-C. Chang, Y . Yang, and H. Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018

2018

[16] [16]

Y . Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long. itransformer: Inverted transformers are effective for time series forecasting. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=JePfAI8fah

2024

[17] [17]

Luko ˇseviˇcius and H

M. Lukoševiˇcius and H. Jaeger. Reservoir computing approaches to recurrent neural network training.Computer Science Review, 3(3):127–149, 2009. ISSN 1574-0137. doi: https:// doi.org/10.1016/j.cosrev.2009.03.005. URL https://www.sciencedirect.com/science/ article/pii/S1574013709000173

work page doi:10.1016/j.cosrev.2009.03.005 2009

[18] [18]

Middlehurst, A

M. Middlehurst, A. Ismail-Fawaz, A. Guillaume, C. Holder, D. Guijo-Rubio, G. Bulatova, L. Tsaprounis, L. Mentel, M. Walter, P. Schäfer, and A. Bagnall. aeon: a python toolkit for learning from time series.Journal of Machine Learning Research, 25(289):1–10, 2024. URL http://jmlr.org/papers/v25/23-1444.html. 10

2024

[19] [19]

Y . Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam. A time series is worth 64 words: Long- term forecasting with transformers. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=Jbdc0vTOcol

2023

[20] [20]

A. V . Oppenheim, R. W. Schafer, and J. R. Buck.Discrete-Time Signal Processing. Prentice Hall, 2nd edition, 1999

1999

[21] [21]

Orvieto, S

A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, and S. De. Resurrecting recurrent neural networks for long sequences. InInternational conference on machine learning, pages 26670–26698. PMLR, 2023

2023

[22] [22]

Singh and B

P. Singh and B. Raman. Echo state networks as state-space models: A systems perspective. arXiv preprint arXiv:2509.04422, 2025

work page arXiv 2025

[23] [23]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017

[24] [24]

V oelker, I

A. V oelker, I. Kaji´c, and C. Eliasmith. Legendre memory units: Continuous-time representation in recurrent neural networks.Advances in neural information processing systems, 32, 2019

2019

[25] [25]

H. Wu, J. Xu, J. Wang, and M. Long. Autoformer: Decomposition transformers with auto- correlation for long-term series forecasting.Advances in neural information processing systems, 34:22419–22430, 2021

2021

[26] [26]

H. Wu, T. Hu, Y . Liu, H. Zhou, J. Wang, and M. Long. Timesnet: Temporal 2d-variation modeling for general time series analysis. InThe Eleventh International Conference on Learning Representations, 2023. URLhttps://openreview.net/forum?id=ju_Uqw384Oq

2023

[27] [27]

M. Yan, C. Huang, P. Bienstman, P. Tino, W. Lin, and J. Sun. Emerging opportunities and challenges for the future of reservoir computing.Nature Communications, 15(1):2056, 2024

2056

[28] [28]

"" Computes the axis-reordered 2D RFFT for a real matrix X of shape (K1, K2). Returns a complex array of shape (K1//2 + 1, K2)

T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors,Proceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 27...

2022

[29] [29]

Proving the Basis Spans W over R

Although this leads to a global factor on the packed vector p, we have seen in Corollary 1 that this can be compensated for by a rescaled regularization parameterλ. Listing 2 provides an explicit implementation of the packing operation P{·}, which is practically used to accumulate sequential reservoir states into a dense feature matrix for training the re...

2048

[30] [30]

to conserve energy and satisfy the unitary condition. Note that while the proof explicitly constructs the unitary matrix U to establish theoretical equivalence, this large N×N matrix never needs to be instantiated in software; one simply applies the inexpensive, element-wise packing and scaling rules derived herein. Setup and definitions.Let x be a real-v...

[31] [31]

We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)

Construction of the basis vectors.Let em ∈R N denote the standard basis column vector (a 1 at indexm, and0elsewhere). We defineNbasis vectorsu m ∈C N piecewise: Form∈ I 0 :u m =e m (23) Form∈ I + :u m = 1√ 2(em +e S(m))(24) Form∈ I − :u m = i√ 2(eS(m) −e m)(25)

[32] [32]

Proving unitarity (U †U=I ).Let U= [ u0 . . . u N−1]. To show that U is unitary, we verify that its columns are orthonormal (u† aub =δ a,b). For unit norms (a=b): • Ifa∈ I 0:∥u a∥2 =e ⊤ a ea = 1. • Ifa∈ I +:∥u a∥2 = 1 2(e⊤ a ea +e ⊤ S(a)eS(a)) = 1. • Ifa∈ I −:∥u a∥2 = −i√ 2 i√ 2 (e⊤ S(a)eS(a) +e ⊤ a ea) = 1. For orthogonality (a̸=b ): If a and b do not be...

[33] [33]

Freq.” denotes the sampling frequency, “Ch

Proving the basis spans W over R.To prove U maps RN →W , we must show any X∈W can be uniquely constructed as X= PN−1 m=0 pmum using real coefficients pm ∈R . We define the coefficientsp m as follows: • Form∈ I 0:p m =X[m]. • Form∈ I +:p m = √ 2ℜ(X[m]). • Form∈ I −:p m = √ 2ℑ(X[S(m)]). Evaluating the sum for I0 yields X[m]e m. For a specific symmetric pair...

2048