Convolutional Reservoir Computing for World Models

Hanten Chang; Katsuya Futagami

arXiv: 1907.08040 · v1 · pith:GDCBZB2Rnew · submitted 2019-07-18 · 💻 cs.LG · cs.NE · stat.ML


Pith reviewed 2026-05-24 19:43 UTC · model grok-4.3

classification 💻 cs.LG · cs.NE · stat.ML

keywords reinforcement learning · reservoir computing · convolutional neural networks · evolution strategy · fixed random weights · feature extraction · world models


The pith

A reinforcement learning model using random fixed-weight convolutional and reservoir layers achieves state-of-the-art scores without training those layers or storing data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the RCRC model for reinforcement learning that relies on random fixed-weight CNNs to extract visual features and reservoir computing for time-series features. These components do not require training, allowing the model to process data quickly and avoid storing large volumes of past playing data. Actions are decided using an evolution strategy. The approach reaches state-of-the-art performance on a popular RL task. Even simpler networks with only one dense layer and fixed random weights can achieve high scores.
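As a rough illustration of the visual half of this description, the sketch below draws convolution kernels from a Gaussian once and then only ever applies them. The Gaussian sampling follows the Figure 2 caption on this page; the ReLU, the global mean pooling, the kernel sizes, and the helper names `sample_kernels` and `conv_features` are assumptions made for the example, not the paper's architecture.

```python
import random


def sample_kernels(n_kernels, size, seed=0):
    """Draw Gaussian kernels once; no gradient step ever changes them."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(n_kernels)
    ]


def conv_features(image, kernels):
    """Valid cross-correlation, ReLU, then a global mean per kernel.

    `image` is a 2D list of floats (one grayscale channel); the ReLU and
    the pooling step are illustrative choices, not taken from the paper.
    """
    height, width = len(image), len(image[0])
    features = []
    for kernel in kernels:
        size = len(kernel)
        total, count = 0.0, 0
        for i in range(height - size + 1):
            for j in range(width - size + 1):
                acc = sum(
                    kernel[a][b] * image[i + a][j + b]
                    for a in range(size)
                    for b in range(size)
                )
                total += max(acc, 0.0)  # ReLU
                count += 1
        features.append(total / count)
    return features


if __name__ == "__main__":
    toy_image = [[float((i + j) % 3) for j in range(5)] for i in range(5)]
    print(conv_features(toy_image, sample_kernels(4, 3, seed=1)))
```

Because the kernels depend only on the seed, the same image always maps to the same feature vector, which is why no past frames need to be kept for retraining the extractor.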

Core claim

The RCRC model extracts visual and time-series features very quickly because it uses a random fixed-weight CNN and a reservoir computing model. It does not require training data to be stored, because it extracts features without training and decides actions with an evolution strategy. Furthermore, the model achieves a state-of-the-art score on a popular reinforcement learning task. Notably, even simple fixed-random-weight networks, such as a network with a single dense layer, can reach a high score on that task.

What carries the argument

Reinforcement learning with convolutional reservoir computing (RCRC): random fixed-weight CNN and reservoir layers, paired with an evolution strategy for action selection.
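To make the temporal half concrete, here is a minimal sketch of an echo-state-style reservoir whose input and recurrent weights are sampled from a Gaussian once and never updated. The leak-free tanh update, the `scale` factor, the sizes, and the names `make_reservoir`, `step`, and `run` are illustrative assumptions rather than the paper's implementation; in such a setup only a linear readout applied to the state would be searched or learned.

```python
import math
import random


def make_reservoir(n_inputs, n_units, scale=0.1, seed=0):
    """Sample input and recurrent weights once; they are never trained.

    `scale` is a hypothetical knob that keeps the recurrent weights small
    so the state does not saturate; it is not a value from the paper.
    """
    rng = random.Random(seed)
    w_in = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_units)]
    w_res = [[rng.gauss(0.0, scale) for _ in range(n_units)] for _ in range(n_units)]
    return w_in, w_res


def step(state, inputs, w_in, w_res):
    """One reservoir update: x_{t+1} = tanh(W_in u_t + W_res x_t)."""
    new_state = []
    for row_in, row_res in zip(w_in, w_res):
        pre = sum(w * u for w, u in zip(row_in, inputs))
        pre += sum(w * x for w, x in zip(row_res, state))
        new_state.append(math.tanh(pre))
    return new_state


def run(sequence, n_units=8, seed=0):
    """Feed a sequence of input vectors and return the final state."""
    w_in, w_res = make_reservoir(len(sequence[0]), n_units, seed=seed)
    state = [0.0] * n_units
    for inputs in sequence:
        state = step(state, inputs, w_in, w_res)
    return state


if __name__ == "__main__":
    toy_sequence = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]
    print(run(toy_sequence))
```

The final state depends on the order of the inputs as well as their values, which is the sense in which a fixed reservoir can carry time-series information without any training.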

If this is right

Feature extraction occurs without training the CNN or reservoir layers.
Past playing data does not need to be stored.
The model reaches a state-of-the-art score on the RL task tested (CarRacing-v0, according to the figure captions).
Simple fixed-weight networks consisting of only one dense layer perform well on these tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This method could reduce computational resources required for RL in visual environments by eliminating weight training in the feature extractors.
It suggests that many control tasks may not require learned features and that random projections can suffice in the tested settings.
Fixed-weight reservoir approaches might extend to other sequential decision problems if the environments share similar visual and temporal structure.

Load-bearing premise

Random fixed weights in the CNN and reservoir computing layers are sufficient to extract task-relevant visual and temporal features for the RL environments tested, without any training or adaptation of those weights.

What would settle it

A direct comparison on the same RL task showing that a version with trained CNN and reservoir weights significantly outperforms the fixed random version or that the fixed version falls below competitive scores.

Figures

Figures reproduced from arXiv: 1907.08040 by Hanten Chang, Katsuya Futagami.

**Figure 2.** RCRC overview to choose the action for CarRacing-v0: the first and second layers are collectively called the convolutional reservoir computing layer, and both layers’ model weights are sampled from a Gaussian distribution and then fixed.

**Figure 3.** Example environment state image of CarRacing-v0 and three parameters in the environment. The score is added when the car passes through a tile laid on the course. In this process, T represents an update step of the weight matrix Wout, and n is the number of solution candidates Wout generated at each step. The worker is an agent that implements RCRC, and each worker extracts features, takes the action and p…

**Figure 4.** The best average score over 8 randomly created tracks among 16 workers at …
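The loop hinted at in the Figure 3 excerpt (T update steps of Wout, n candidate Wout per step, workers that score candidates) can be sketched roughly as follows. This swaps in a plain elite-averaging Gaussian evolution strategy, which is an assumption and may differ from the variant the paper uses. `toy_score` is a hypothetical stand-in for a worker's episode return, and every name and hyperparameter here is invented for illustration.

```python
import random


def toy_score(w_out):
    """Stand-in for a worker's episode return; higher is better.

    Hypothetical objective: it peaks at 0 when every weight equals 0.5.
    """
    return -sum((w - 0.5) ** 2 for w in w_out)


def evolve(score_fn, dim, steps=30, n_candidates=16, n_elite=4, sigma=0.3, seed=0):
    """Plain elite-averaging Gaussian ES over readout weights.

    `steps` plays the role of T and `n_candidates` the role of n.
    Only the readout vector is searched; feature extractors stay fixed.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best, best_score = list(mean), score_fn(mean)
    for _ in range(steps):
        # Each candidate would be handed to one worker for evaluation.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(n_candidates)
        ]
        ranked = sorted(candidates, key=score_fn, reverse=True)
        elite = ranked[:n_elite]
        # Move the search mean to the average of the elite candidates.
        mean = [sum(c[i] for c in elite) / n_elite for i in range(dim)]
        top_score = score_fn(ranked[0])
        if top_score > best_score:
            best, best_score = list(ranked[0]), top_score
    return best, best_score


if __name__ == "__main__":
    weights, score = evolve(toy_score, dim=5)
    print(score)
```

Note that the search needs only the scalar score of each candidate, so nothing from earlier episodes has to be stored between update steps.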

Original abstract

Recently, reinforcement learning models have achieved great success, completing complex tasks such as mastering Go and other games with higher scores than human players. Many of these models collect considerable data on the tasks and improve accuracy by extracting visual and time-series features using convolutional neural networks (CNNs) and recurrent neural networks, respectively. However, these networks have very high computational costs because they need to be trained by repeatedly using a large volume of past playing data. In this study, we propose a novel practical approach called reinforcement learning with convolutional reservoir computing (RCRC) model. The RCRC model has several desirable features: 1. it can extract visual and time-series features very fast because it uses random fixed-weight CNN and the reservoir computing model; 2. it does not require the training data to be stored because it extracts features without training and decides action with evolution strategy. Furthermore, the model achieves state of the art score in the popular reinforcement learning task. Incredibly, we find the random weight-fixed simple networks like only one dense layer network can also reach high score in the RL task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

RCRC claims fixed random CNN plus reservoir computing plus evolution strategy reaches SOTA on an RL task without training the feature layers or storing data, but the abstract supplies almost no numbers or controls to back that up.


The core idea is to skip training the visual and temporal extractors entirely by freezing a random CNN and a reservoir at initialization, then evolve only the policy parameters. This targets the memory and compute burden of standard visual RL that repeatedly trains CNNs and RNNs on replay buffers. Reservoir computing fits the temporal side naturally since it is designed around fixed random dynamics that still produce usable state representations. The combination for this exact use case is the part presented as new, even though each piece has prior work behind it. The practical motivation around avoiding data storage and repeated training is stated clearly and is worth considering for resource-constrained settings.

The soft spot is the empirical support. The abstract asserts state-of-the-art scores and even claims that a single random dense layer suffices, yet it names neither the task, the baselines, the scores, nor any error bars or run counts. Without those details the central claim cannot be checked. The stress-test point lands: random fixed projections rarely align with task semantics in visual RL, and the paper description gives no sign of ablations, feature visualizations, or direct comparisons to trained CNN baselines that would test whether the random weights actually extract relevant information. If the full experiments section has only single-run results on one unnamed game, that would leave the surprising result under-supported.

This is for readers already working on evolutionary or reservoir-based RL who want to test cheap feature pipelines. A serious referee should see it so the community can verify the numbers and controls; the idea is straightforward enough that proper evidence would make it usable.

Referee Report

2 major / 1 minor

Summary. The paper proposes the RCRC model for reinforcement learning, which extracts visual features via random fixed-weight CNN layers and temporal features via reservoir computing, then uses an evolution strategy to select actions. It claims this avoids the need to store or repeatedly train on large volumes of past data, achieves state-of-the-art scores on popular RL tasks, and that even a single random dense layer can reach high performance.

Significance. If the empirical results hold with proper controls, the work would be significant for demonstrating that untrained random projections can suffice for competitive RL performance, substantially lowering computational cost and memory requirements compared to trained CNN/RNN feature extractors. The data-free aspect and the surprising efficacy of minimal random networks would be notable contributions to efficient world-model approaches in RL.

major comments (2)

[Experiments section (inferred from abstract claims)] The central empirical claim (SOTA performance via untrained random CNN and reservoir layers) is load-bearing yet unsupported by any analysis of why the particular random initialization succeeds; no feature visualizations, ablation on reservoir spectral radius, or comparison against trained CNN baselines appear to be present to address the weakest assumption that random fixed weights extract task-relevant features.
[Abstract and results claims] The assertion that 'only one dense layer network can also reach high score' requires quantitative evidence (e.g., scores, baselines, variance) to be load-bearing; without reported benchmark names, error bars, or statistical tests, the claim that random fixed networks match trained models cannot be evaluated.
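For context on the spectral-radius ablation requested above: in the standard echo state network formulation, which is assumed here because this page does not reproduce the paper's equations, the reservoir state $x_t$ evolves under fixed matrices, and the randomly sampled recurrent matrix $W_0$ is rescaled to a target spectral radius $\rho^{*}$. The symbols $W^{\mathrm{in}}$, $W$, $\alpha$ (leak rate), and $\rho^{*}$ are generic notation and not necessarily the paper's.

```latex
\begin{align}
x_{t+1} &= (1-\alpha)\,x_t + \alpha \tanh\!\left(W^{\mathrm{in}} u_t + W x_t\right), \\
\rho(W_0) &= \max_i \lvert \lambda_i(W_0) \rvert, \qquad
W = \frac{\rho^{*}}{\rho(W_0)}\, W_0 .
\end{align}
```

An ablation in this sense would sweep $\rho^{*}$ (values somewhat below 1 are a common heuristic) while holding everything else fixed, and report the score obtained at each setting.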

minor comments (1)

[Abstract] The abstract would benefit from naming the specific RL environments, baselines, and quantitative scores to allow immediate assessment of the SOTA claim.
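The kind of reporting asked for here could look like the following sketch, which summarises per-run scores as a mean, a sample standard deviation, and a standard error. The run scores are placeholder numbers invented for illustration, not results from the paper, and `summarise` is a hypothetical helper.

```python
import math
import statistics


def summarise(scores):
    """Return mean, sample standard deviation, and standard error."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample (n - 1) standard deviation
    sem = std / math.sqrt(len(scores))
    return mean, std, sem


if __name__ == "__main__":
    # Placeholder scores for five independent runs; NOT from the paper.
    placeholder_runs = [10.0, 12.0, 11.0, 13.0, 14.0]
    mean, std, sem = summarise(placeholder_runs)
    print(f"{mean:.2f} +/- {sem:.2f} (std {std:.2f}, n={len(placeholder_runs)})")
```

Reporting the number of runs alongside the spread lets a reader judge whether a gap between two methods is larger than run-to-run noise.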

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of the empirical results.


Referee: [Experiments section (inferred from abstract claims)] The central empirical claim (SOTA performance via untrained random CNN and reservoir layers) is load-bearing yet unsupported by any analysis of why the particular random initialization succeeds; no feature visualizations, ablation on reservoir spectral radius, or comparison against trained CNN baselines appear to be present to address the weakest assumption that random fixed weights extract task-relevant features.

Authors: We agree that the manuscript would benefit from additional analyses to better support the assumption that random fixed weights extract relevant features. The current work emphasizes the practical advantages and observed performance, but we will add feature visualizations, an ablation study on the reservoir spectral radius, and comparisons against trained CNN baselines in the revised version. revision: yes

Referee: [Abstract and results claims] The assertion that 'only one dense layer network can also reach high score' requires quantitative evidence (e.g., scores, baselines, variance) to be load-bearing; without reported benchmark names, error bars, or statistical tests, the claim that random fixed networks match trained models cannot be evaluated.

Authors: The manuscript reports results on standard reinforcement learning benchmarks, but we acknowledge that more detailed quantitative support—including explicit benchmark names, scores with error bars, variance across runs, and statistical comparisons—would make the claim more readily evaluable. We will expand the results section with these elements in the revision. revision: yes

Circularity Check

0 steps flagged

Empirical proposal with no derivation chain or fitted predictions


The paper presents an empirical model (random fixed-weight CNN + reservoir computing + evolution strategy) and reports experimental RL scores. No equations, derivations, or first-principles results appear; claims are not quantities defined in terms of fitted parameters, self-citations, or ansatzes that reduce to inputs by construction. The central assertion (untrained random weights suffice for SOTA) is an empirical hypothesis tested on environments, not a self-referential prediction. This matches the default case of a self-contained empirical result with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract; the central claim rests on the domain assumption that untrained random networks extract useful features and on the implicit modeling choice that evolution strategies suffice for policy search in the tested environments. No free parameters or invented entities are identifiable from the abstract.

axioms (1)

domain assumption: Random fixed weights in the CNN and reservoir layers extract task-relevant features without training.
The model is built on this premise to avoid training the feature extractors.

pith-pipeline@v0.9.0 · 5717 in / 1244 out tokens · 24605 ms · 2026-05-24T19:43:18.588887+00:00 · methodology


