Recognition: unknown
Scalable Memristive-Friendly Reservoir Computing for Time Series Classification
Pith reviewed 2026-05-10 01:03 UTC · model grok-4.3
The pith
MARS parallel reservoirs train in seconds to milliseconds and outperform LRU, S5, and Mamba on long-sequence tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MARS combines parallel memristive-friendly reservoirs with subtractive skip connections to enable scalable, deeper reservoir compositions that retain stable dynamics, deliver up to 21x training speedups over baseline echo state networks, and achieve higher accuracy than LRU, S5, and Mamba while completing full training in seconds or milliseconds.
What carries the argument
Memristive-friendly parallelized reservoirs (MARS) connected by subtractive skip connections, which support parallel computation and deeper stacking while training only the readout layer.
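Since only the readout is trained, the whole pipeline reduces to running fixed random reservoirs forward and solving one regularized least-squares problem. A minimal NumPy sketch of that pattern follows; the paper's exact update equations are not reproduced in this summary, so the leaky-tanh dynamics, the reading of the subtractive skip as a branch-wise difference, and all hyperparameters below are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9):
    # Fixed random weights; never trained.
    W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale toward stability
    return W_in, W

def run_reservoir(W_in, W, u, leak=0.5):
    # Leaky-tanh state update over a sequence u of shape (T, n_in).
    x = np.zeros(W.shape[0])
    states = []
    for u_t in u:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u_t + W @ x)
        states.append(x)
    return np.asarray(states)  # (T, n_res)

def mars_features(u, reservoirs):
    # Run equally sized reservoirs in parallel (they are independent, hence
    # trivially parallelizable) and join successive branches by subtraction --
    # one plausible reading of "subtractive skip connections".
    branches = [run_reservoir(W_in, W, u) for W_in, W in reservoirs]
    feats = [branches[0]]
    feats += [cur - prev for prev, cur in zip(branches, branches[1:])]
    return np.concatenate(feats, axis=1)

def fit_readout(X, y, lam=1e-3):
    # Closed-form ridge regression: the only trained component.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

Classification would then use, for example, the final-step features of each sequence: collect them into X, one-hot encode the labels into y, and solve once. That single linear solve, replacing gradient descent through time, is where millisecond-scale training times would come from.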
If this is right
- Training time drops from minutes or hours to seconds or a few hundred milliseconds on long sequence tasks.
- Compact gradient-free models exceed the accuracy of LRU, S5, and Mamba on multiple benchmarks.
- The architecture supplies a direct route to energy-efficient, low-latency implementations on memristive and in-memory hardware.
- Deeper reservoir compositions become practical without the usual stability or training-cost penalties.
Where Pith is reading between the lines
- The parallel reservoir layout could map directly onto memristive crossbar arrays, potentially multiplying the efficiency gains beyond the simulated results.
- Because no gradients are back-propagated through the reservoirs, the approach may tolerate device variability better than back-propagation-based models when realized in hardware (a point probed by the noise-injection sketch after this list).
- The same parallel-plus-skip pattern could be tested on other data types such as images or graphs to check whether the speed and accuracy benefits generalize.
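The variability-tolerance conjecture can be probed cheaply in simulation before any hardware exists: perturb the fixed reservoir weights with multiplicative noise standing in for device variation and measure how far the state trajectories drift. The noise model and magnitudes below are assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_res, T = 3, 200, 100
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def states(W_rec, u, leak=0.5):
    # Same leaky-tanh update as a standard ESN reservoir.
    x = np.zeros(n_res)
    out = []
    for u_t in u:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u_t + W_rec @ x)
        out.append(x)
    return np.asarray(out)

u = rng.normal(size=(T, n_in))
clean = states(W, u)

# Multiplicative weight perturbation as a stand-in for device variability.
for sigma in (0.01, 0.05, 0.10):
    noisy = states(W * (1 + sigma * rng.normal(size=W.shape)), u)
    drift = np.linalg.norm(noisy - clean) / np.linalg.norm(clean)
    print(f"sigma={sigma:.2f}  relative state drift={drift:.3f}")
```

Because the recurrent weights are random rather than learned, static variability of this kind can in principle be absorbed by re-fitting only the readout on the perturbed device's states, an option unavailable to models whose recurrent weights encode trained values.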
Load-bearing premise
The subtractive skip connections and parallel reservoir structure preserve stable dynamics and the reported accuracy when the models run on physical memristive hardware rather than in software simulation.
What would settle it
Deploying the MARS models on physical memristive devices and checking whether accuracy and training-time advantages over LRU, S5, and Mamba are retained on the same long-sequence classification benchmarks.
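Structurally, that decisive experiment is the software evaluation loop with the simulated reservoir swapped for a device driver. A skeleton of such a harness is sketched below; `DeviceReservoir` and its `run` method are hypothetical placeholders, since no device interface is specified in the paper.

```python
import time
import numpy as np

class DeviceReservoir:
    """Hypothetical handle to a physical memristive reservoir: a real
    driver would apply input voltages and read back conductance states."""
    def run(self, u: np.ndarray) -> np.ndarray:  # (T, n_in) -> (T, n_res)
        raise NotImplementedError("stand-in for hardware I/O")

def evaluate(reservoir, X_train, y_train, X_test, y_test, lam=1e-3):
    # Train: collect last-step device states, solve the ridge readout once.
    t0 = time.perf_counter()
    H = np.stack([reservoir.run(u)[-1] for u in X_train])
    W_out = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y_train)
    train_seconds = time.perf_counter() - t0
    # Test: accuracy against one-hot labels.
    H_test = np.stack([reservoir.run(u)[-1] for u in X_test])
    acc = np.mean((H_test @ W_out).argmax(1) == y_test.argmax(1))
    return acc, train_seconds
```

Running the same harness with the software reservoir and with the device, on the same benchmark splits used against LRU, S5, and Mamba, would show directly whether the accuracy and training-time gaps survive hardware non-idealities.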
original abstract
Memristive devices present a promising foundation for next-generation information processing by combining memory and computation within a single physical substrate. This unique characteristic enables efficient, fast, and adaptive computing, particularly well suited for deep learning applications. Among recent developments, the memristive-friendly echo state network (MF-ESN) has emerged as a promising approach that combines memristive-inspired dynamics with the training simplicity of reservoir computing, where only the readout layer is learned. Building on this framework, we propose memristive-friendly parallelized reservoirs (MARS), a simplified yet more effective architecture that enables efficient scalable parallel computation and deeper model composition through novel subtractive skip connections. This design yields two key advantages: substantial training speedups of up to 21x over the inherently lightweight echo state network baseline and significantly improved predictive performance. Moreover, MARS demonstrates what is possible with parallel memristive-friendly reservoir computing: on several long sequence benchmarks our compact gradient-free models substantially outperform strong gradient-based sequence models such as LRU, S5, and Mamba, while reducing full training time from minutes or hours down to seconds or even a few hundred milliseconds. Our work positions parallel memristive-friendly computing as a promising route towards scalable neuromorphic learning systems that combine high predictive capability with radically improved computational efficiency, while providing a clear pathway to energy-efficient, low-latency implementations on emerging memristive and in-memory hardware.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MARS, a memristive-friendly parallelized reservoir architecture extending MF-ESN via parallel reservoirs and novel subtractive skip connections. It claims up to 21x training speedups over ESN baselines, substantially better accuracy than gradient-based models (LRU, S5, Mamba) on long-sequence time-series benchmarks, and training times reduced to seconds or milliseconds, while providing a pathway to energy-efficient neuromorphic implementations on memristive hardware.
Significance. If the performance and stability claims hold under rigorous validation, the work could meaningfully advance hardware-friendly reservoir computing by showing how parallelization and subtractive skips enable both accuracy gains and extreme training efficiency in gradient-free models. The emphasis on memristive compatibility and low-latency training is a clear strength relative to backpropagation-heavy sequence models.
major comments (3)
- [Abstract] The central claim that compact gradient-free MARS models 'substantially outperform' LRU, S5, and Mamba on long-sequence benchmarks is unsupported by any reported datasets, baseline implementations, metrics, statistical tests, or ablation results, directly undermining the headline empirical contribution.
- [Abstract] No device-level experiments, variability modeling, SPICE simulations, or hardware-in-the-loop results are described to confirm that subtractive skip connections and parallel reservoir composition preserve echo-state stability or deliver the claimed accuracy under realistic memristor non-idealities (conductance drift, read noise, device variability), which is load-bearing for the asserted 'clear pathway' to neuromorphic hardware.
- [Architecture description] The paper provides no analysis or bounds showing that the proposed subtractive skip connections maintain the echo-state property or prevent instability when reservoirs are parallelized, leaving the claimed scalability and stability unverified even in simulation.
minor comments (1)
- [Abstract] The abstract would be clearer if it specified the exact long-sequence benchmarks, number of runs, and quantitative metrics (e.g., accuracy deltas) supporting the outperformance and speedup claims.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment point-by-point below, indicating planned revisions where the manuscript can be strengthened without misrepresenting our contributions.
point-by-point responses
- Referee: [Abstract] The central claim that compact gradient-free MARS models 'substantially outperform' LRU, S5, and Mamba on long-sequence benchmarks is unsupported by any reported datasets, baseline implementations, metrics, statistical tests, or ablation results, directly undermining the headline empirical contribution.
  Authors: We appreciate the referee highlighting the need for a clearer link between the abstract and the supporting evidence. The abstract summarizes quantitative results already presented in Section 4 (Experiments), which include direct comparisons on long-sequence time-series benchmarks against LRU, S5, and Mamba, along with ablation studies on parallel reservoirs and skip connections. To address the concern, we will revise the abstract to explicitly name the datasets, report key accuracy and training-time metrics, and reference the statistical comparisons and ablations from the main text. This will make the empirical support immediately visible. revision: partial
- Referee: [Abstract] No device-level experiments, variability modeling, SPICE simulations, or hardware-in-the-loop results are described to confirm that subtractive skip connections and parallel reservoir composition preserve echo-state stability or deliver the claimed accuracy under realistic memristor non-idealities (conductance drift, read noise, device variability), which is load-bearing for the asserted 'clear pathway' to neuromorphic hardware.
  Authors: We agree that hardware-level validation would further strengthen the neuromorphic pathway claim. The current work centers on algorithmic design and floating-point simulation to establish performance and efficiency gains, building on the memristive-friendly properties of the MF-ESN baseline. We will add a dedicated discussion subsection qualitatively addressing the potential impact of non-idealities (e.g., how subtractive skips may help average out read noise) and will moderate the abstract language from 'clear pathway' to 'promising direction toward energy-efficient neuromorphic implementations.' Full device experiments and SPICE modeling lie beyond the scope of this computational study. revision: partial
- Referee: [Architecture description] The paper provides no analysis or bounds showing that the proposed subtractive skip connections maintain the echo-state property or prevent instability when reservoirs are parallelized, leaving the claimed scalability and stability unverified even in simulation.
  Authors: This is a fair critique of the theoretical support. In the revised version we will insert a new subsection (under Architecture) that derives sufficient conditions for preserving the echo-state property under parallel composition and subtractive skip connections, including bounds on the effective spectral radius. These will be complemented by additional simulation sweeps confirming stability across reservoir sizes and skip strengths. We expect this addition to directly verify the scalability claims. revision: yes
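For context, the classical sufficient condition from standard ESN theory that such a subsection would presumably extend: with the leaky-tanh update, the state map is a contraction (and hence the echo-state property holds) whenever the recurrent weight matrix is contractive. The extension to parallel composition with subtractive skips is the authors' promised derivation and is not reproduced here.

```latex
% Leaky ESN update and the classical contraction-based sufficient
% condition for the echo-state property (tanh is 1-Lipschitz):
\begin{aligned}
  \mathbf{x}_t &= (1-\alpha)\,\mathbf{x}_{t-1}
    + \alpha \tanh\!\left(\mathbf{W}_{\mathrm{in}}\mathbf{u}_t + \mathbf{W}\mathbf{x}_{t-1}\right),\\
  \|F(\mathbf{x}) - F(\mathbf{y})\| &\le
    \bigl((1-\alpha) + \alpha\,\sigma_{\max}(\mathbf{W})\bigr)\,\|\mathbf{x}-\mathbf{y}\|,
\end{aligned}
\qquad \text{so } \sigma_{\max}(\mathbf{W}) < 1 \text{ suffices for any } \alpha \in (0,1].
```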
deferred to future work (1)
- Complete physical device-level experiments, variability modeling, and SPICE/hardware-in-the-loop validation under realistic memristor non-idealities, which would require specialized laboratory facilities unavailable for the present study.
Circularity Check
No circularity: the performance claims are empirical, and the architecture is an independent design choice.
full rationale
The paper presents MARS as an architectural extension of MF-ESN using parallel reservoirs and subtractive skip connections. All reported advantages (training speedups up to 21x, outperformance on long-sequence benchmarks versus LRU/S5/Mamba) are stated as outcomes of experimental evaluation rather than any mathematical derivation, prediction formula, or first-principles result. No equations appear that could reduce a claimed quantity to a fitted parameter or self-referential definition. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to force the central claims. The derivation chain is therefore self-contained; the design choices and benchmark results stand as independent content.
Axiom & Free-Parameter Ledger
invented entities (1)
- MARS architecture: no independent evidence
Forward citations
Cited by 1 Pith paper
- phys-MCP: A Control Plane for Heterogeneous Physical Neural Networks
  phys-MCP is a substrate-aware orchestration layer that exposes heterogeneous physical neural networks as invocable resources with standardized capability, lifecycle, telemetry, and digital-twin interfaces.
Reference graph
Works this paper leans on
- [1] Verónica Bolón-Canedo, Laura Morán-Fernández, Brais Cancela, and Amparo Alonso-Betanzos. A review of green artificial intelligence: Towards a more sustainable future. Neurocomputing, page 128096, 2024.
- [2] Catherine D. Schuman, Shruti R. Kulkarni, Maryam Parsa, J. Parker Mitchell, Prasanna Date, and Bill Kay. Opportunities for neuromorphic computing algorithms and applications. Nature Computational Science, 2(1):10–19, 2022.
- [3] Kohei Nakajima and Ingo Fischer. Reservoir Computing. Springer, 2021.
- [4] Gouhei Tanaka, Toshiyuki Yamane, Jean Benoit Héroux, Ryosho Nakane, Naoki Kanazawa, Seiji Takeda, Hidetoshi Numata, Daiju Nakano, and Akira Hirose. Recent advances in physical reservoir computing: A review. Neural Networks, 115:100–123, 2019.
- [5] H. Jaeger. The echo state approach to analysing and training recurrent neural networks. GMD Report 148, GMD - German National Research Institute for Computer Science, 2001.
- [6] Mantas Lukoševičius. A practical guide to applying echo state networks. In Neural Networks: Tricks of the Trade: Second Edition, pages 659–686. Springer, 2012.
- [7] Veronica Pistolesi, Andrea Ceni, Gianluca Milano, Carlo Ricciardi, and Claudio Gallicchio. A memristive computational neural network model for time-series processing. APL Machine Learning, 3(1):016117, 2025. doi: 10.1063/5.0255168.
- [8] Enrique Miranda, Gianluca Milano, and Carlo Ricciardi. Modeling of short-term synaptic plasticity effects in ZnO nanowire-based memristors using a potentiation-depression rate balance equation. IEEE Transactions on Nanotechnology, 19:609–612, 2020.
- [9] Gianluca Milano, Giacomo Pedretti, Kevin Montano, Saverio Ricci, Shahin Hashemkhani, Luca Boarino, Daniele Ielmini, and Carlo Ricciardi. In materia reservoir computing with a fully memristive architecture based on self-organizing nanowire networks. Nature Materials, 21(2):195–202, 2022.
- [10] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. In ICLR, 2022.
- [11] Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, and Soham De. Resurrecting recurrent neural networks for long sequences. arXiv preprint arXiv:2303.06349, 2023. URL https://arxiv.org/abs/2303.06349.
- [12] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces, 2024.
- [13] Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, et al. Griffin: Mixing gated linear recurrences with local attention for efficient language models. arXiv preprint arXiv:2402.19427, 2024.
- [14] Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xLSTM: Extended long short-term memory. arXiv preprint arXiv:2405.04517, 2024.
- [15] Guy E. Blelloch. Prefix sums and their applications. In Synthesis of Parallel Algorithms, pages 35–60. Morgan Kaufmann Publishers Inc., 1990.
- [16] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310–1318. PMLR, 2013.
- [17] Heng Zhang and Danilo Vasconcellos Vargas. A survey on reservoir computing and its interdisciplinary applications beyond traditional machine learning. IEEE Access, 11:81033–81070, 2023.
- [18] Veronica Pistolesi, Andrea Ceni, Gianluca Milano, Carlo Ricciardi, and Claudio Gallicchio. A model of memristive nanowire neuron for recurrent neural networks. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), pages 479–484, Bruges, Belgium, 2025.
- [19] Gianluca Milano, Enrique Miranda, and Carlo Ricciardi. Connectome of memristive nanowire networks through graph theory. Neural Networks, 150:137–148, 2022.
- [20] Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, and Hossein Hajimirsadeghi. Were RNNs all we needed? arXiv preprint arXiv:2410.01201, 2024.
- [21] Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. Combining recurrent, convolutional, and continuous-time models with linear state-space layers. In NeurIPS, 2021.
- [22] Eric Martin and Chris Cundy. Parallelizing linear recurrent neural nets over sequence length. In ICLR, 2018.
- [23] Zhen Qin, Songlin Yang, and Yiran Zhong. Hierarchically gated recurrent neural network for sequence modeling. In NeurIPS, 2023.
- [24] Albert Gu, Ankit Gupta, Karan Goel, and Christopher Ré. On the parameterization and initialization of diagonal state space models. In NeurIPS, 2022.
- [25] Franz A. Heinsen. Efficient parallelization of a ubiquitous sequential computation, 2023.
- [26] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
- [27] Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, and Eamonn Keogh. The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6(6):1293–1305, 2019.
- [28] Benjamin Walker, Andrew D. McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, and Terry Lyons. Log neural controlled differential equations: The Lie brackets make a difference. In ICML, volume 235 of Proceedings of Machine Learning Research, Vienna, Austria, 2024. PMLR.
- [29] Jimmy T. H. Smith, Andrew Warrington, and Scott W. Linderman. Simplified state space layers for sequence modeling. In ICLR, 2023.
- [30] Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. The UEA multivariate time series classification archive. arXiv preprint arXiv:1811.00075, 2018.
- [31] T. Konstantin Rusch and Daniela Rus. Oscillatory state-space models. In ICLR, 2025.
- [32] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
- [33] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Yash Katariya, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/jax-ml/jax.
- [34] Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aäron van den Oord, Alex Graves, and Koray Kavukcuoglu. Neural machine translation in linear time. arXiv preprint arXiv:1610.10099, 2016.
- [35] Sebastian Otte, Martin V. Butz, Danil Koryakin, Fabian Becker, Marcus Liwicki, and Andreas Zell. Optimizing recurrent reservoirs with neuro-evolution. Neurocomputing, 192:128–138, 2016. doi: 10.1016/j.neucom.2016.01.088.