Plasticity Loss in Deep Reinforcement Learning: A Survey

Christoph Luther; Claudia Plant; Lukas Miklautz; Manus McAuliffe; Sebastian Tschiatschek; Timo Klein

arXiv: 2411.04832 · v3 · submitted 2024-11-07 · cs.AI · cs.LG

Timo Klein, Christoph Luther, Manus McAuliffe, Lukas Miklautz, Claudia Plant, Sebastian Tschiatschek

Pith reviewed 2026-05-23 17:59 UTC · model grok-4.3

keywords: plasticity loss, deep reinforcement learning, regularization techniques, evaluation practices, taxonomy, mitigation strategies, scaling failures, overestimation bias

The pith

General regularization techniques often outperform domain-specific interventions for plasticity loss in deep reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified definition of plasticity loss, examines its drivers and pathologies, and organizes over 50 mitigation strategies into the first comprehensive taxonomy. A sympathetic reader would care because loss of plasticity leads to performance plateaus, scaling failures, overestimation bias, and insufficient exploration in deep RL agents. The analysis identifies gaps in current evaluation practices across the field. It concludes that broad regularization methods tend to work better than tailored, domain-specific solutions.

Core claim

By proposing a unified definition of plasticity and examining its drivers and pathologies, the authors organize over 50 mitigation strategies into the first comprehensive taxonomy. Their analysis reveals gaps in evaluation practices and shows that general regularization techniques often outperform domain-specific interventions. Future research should focus on understanding the underlying mechanisms of plasticity loss.

What carries the argument

The taxonomy of over 50 mitigation strategies for plasticity loss, which groups approaches to enable systematic comparison of effectiveness.

If this is right

Evaluation practices across plasticity research require standardization to reliably compare interventions.
General regularization techniques merit priority as baselines in new mitigation studies.
Addressing plasticity loss can reduce related problems such as overestimation bias and poor exploration.
Mechanistic studies of plasticity loss will support more reliable scaling of deep RL systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Plasticity loss may arise primarily from generic neural network training dynamics rather than reinforcement learning specifics alone.
Widespread adoption of the taxonomy could reduce redundant experiments by providing shared categories.
Applying the taxonomy to emerging large-scale RL benchmarks could test whether the performance pattern holds.

Load-bearing premise

The reviewed set of over 50 papers is representative of the field without systematic selection bias, and the proposed taxonomy accurately captures distinct categories without overlaps or omissions that would change the conclusion on regularization performance.

What would settle it

A controlled study applying both general regularization and multiple domain-specific interventions to identical environments and agents, then measuring whether the specialized methods produce higher final performance and sustained plasticity.

Figures

Figures reproduced from arXiv: 2411.04832 by Christoph Luther, Claudia Plant, Lukas Miklautz, Manus McAuliffe, Sebastian Tschiatschek, Timo Klein.

**Figure 1.** Gradient covariance structure at different time steps on Atari. (a) For the Atari game Freeway, the gradient covariance matrix displays a pronounced structure at one million steps. (b) Later in the training, the structure becomes less noticeable at 3.5M steps. (c) and (d) show the gradient covariance matrices for the game SpaceInvaders at the same time steps. Here, the structure is less pronounced. We can …
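The gradient covariance matrix named in the Figure 1 caption can be illustrated with a small sketch. The caption does not say how the matrix is computed, so the cosine-similarity normalization used here is an assumption. The linear model, the squared-error loss, and the function names `per_sample_gradients` and `gradient_covariance` are hypothetical choices made only for illustration.

```python
import math


def per_sample_gradients(weights, inputs, targets):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w, one vector per sample.

    Assumption: a linear model stands in for the agent's network.
    """
    grads = []
    for x, y in zip(inputs, targets):
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        grads.append([error * xi for xi in x])
    return grads


def gradient_covariance(grads, eps=1e-12):
    """Matrix of cosine similarities between per-sample gradients.

    Assumption: entries are normalized by the gradient norms; eps avoids
    division by zero for vanishing gradients.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    n = len(grads)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(grads[i], grads[j]))
            matrix[i][j] = dot / (norms[i] * norms[j] + eps)
    return matrix


# Toy usage: three samples, two parameters.
weights = [0.5, -1.0]
inputs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
targets = [1.0, 0.0, 0.0]
matrix = gradient_covariance(per_sample_gradients(weights, inputs, targets))
for row in matrix:
    print([round(value, 3) for value in row])
```

Under this normalization, entries near +1 or -1 indicate samples whose updates reinforce or oppose each other, and entries near 0 indicate updates that barely interact. Plotting such a matrix as a heatmap is one way to obtain the kind of structure the caption describes.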

**Figure 2.** Possible connections between factors and causes of plasticity loss in value-based RL. Large-mean regression targets combined with non-stationarity of deep RL training cause large and unstable gradients, leading to an increase in parameter norms. Large parameter norms are known to increase loss sharpness and cause other pathologies, together leading to reduced agent performance. potential causes of plastici…

**Figure 3.** Visualization of categorical losses for deep RL. The two-hot representation [93] proportionally assigns probability mass to the two neighboring bins of a scalar target y. HL-Gauss [49] constructs a Gaussian with fixed standard deviation and integrates over each bin to obtain the corresponding probability mass. Distributional RL algorithms such as C51 [8] model the full return distribution. Detailed descrip…
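The two-hot and HL-Gauss target constructions described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from [93] or [49]: the function names, the example bins, the clipping of `y` to the support, and the renormalization over the finite support are all choices made here.

```python
import math


def two_hot(y, centers):
    """Split the mass of scalar target y between its two neighboring bin centers.

    Assumption: centers are sorted ascending and y is clipped to their range.
    """
    y = min(max(y, centers[0]), centers[-1])
    probs = [0.0] * len(centers)
    # Last center that is <= y, capped so that an upper neighbor exists.
    lower = max(i for i, c in enumerate(centers[:-1]) if c <= y)
    upper = lower + 1
    width = centers[upper] - centers[lower]
    # The closer y is to a center, the more mass that center receives.
    probs[upper] = (y - centers[lower]) / width
    probs[lower] = 1.0 - probs[upper]
    return probs


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hl_gauss(y, edges, sigma):
    """Integrate a Gaussian N(y, sigma^2) over each bin given by sorted edges.

    Assumption: the mass is renormalized over the finite support.
    """
    cdf = [normal_cdf((edge - y) / sigma) for edge in edges]
    total = cdf[-1] - cdf[0]
    return [(cdf[k + 1] - cdf[k]) / total for k in range(len(edges) - 1)]


centers = [0.0, 1.0, 2.0, 3.0]
print(two_hot(1.25, centers))  # [0.0, 0.75, 0.25, 0.0]
print(hl_gauss(1.5, [0.0, 1.0, 2.0, 3.0], sigma=0.75))
```

In this sketch the two-hot target preserves the scalar exactly (its expectation over the bin centers equals the clipped `y`), whereas the HL-Gauss target spreads mass over several bins, with `sigma` controlling how far it spreads.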

Original abstract

Plasticity refers to a network's ability to adapt to changing data distributions, which is crucial for the successful training of deep reinforcement learning agents. Loss of plasticity causes performance plateaus and contributes to scaling failures, overestimation bias, and insufficient exploration. To deepen the understanding of plasticity loss, we propose a unified definition, examine its drivers and pathologies, and organize over 50 mitigation strategies into the first comprehensive taxonomy of the field. Our analysis shows gaps in current evaluation practices and reveals that general regularization techniques often outperform domain-specific interventions. Future research should prioritize understanding the mechanisms underlying plasticity loss.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

A survey that gives a unified definition and the first taxonomy for plasticity loss in RL, but its claim that general regularization outperforms domain-specific interventions rests on undisclosed selection details for the 50+ papers.

The main takeaway is that this is a survey paper offering a unified definition of plasticity loss and the first comprehensive taxonomy for over 50 mitigation strategies in deep RL. That organization is the actual new piece, along with linking the issue to scaling failures, overestimation, and exploration problems. The paper does a reasonable job laying out the drivers like shifting data distributions and noting that current evaluations are inconsistent across methods. If the taxonomy holds without too much overlap, it could help people pick approaches more systematically instead of chasing narrow fixes. The observation that general regularization techniques tend to work better than domain-specific ones is the kind of practical pointer that might cut down on redundant experiments, assuming the comparison is fair.

The soft spot is the missing information on how the literature was gathered. The abstract mentions the analysis but gives no search protocol, inclusion rules, or checks against bias, so the performance conclusion could shift if the sample leans toward papers that already favor regularization. The taxonomy would also need explicit validation to show categories are distinct and exhaustive.

This is aimed at RL researchers working on stability and scaling in larger agents. Someone already in the subfield who needs a map of existing fixes would get value from the structure and the call for mechanism-focused work, though they'd still verify the originals.

It deserves peer review because a solid survey can reduce overlap in the area, and referees could usefully press for the selection details and taxonomy examples without needing new experiments.

Referee Report

2 major / 0 minor

Summary. This survey proposes a unified definition of plasticity loss in deep RL, examines its drivers and pathologies, organizes over 50 mitigation strategies into a taxonomy, identifies gaps in current evaluation practices, and concludes that general regularization techniques often outperform domain-specific interventions, with a call for future work on underlying mechanisms.

Significance. If the taxonomy is exhaustive and the performance comparison is based on a representative, unbiased sample with transparent classification, the survey could usefully consolidate the literature and direct attention to evaluation standards; the absence of such methodological transparency currently limits its utility as a reference.

major comments (2)

[Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.
[Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater methodological transparency. We agree these details strengthen a survey and will revise the manuscript to address both points.

Referee: [Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.

Authors: We agree the abstract (and current manuscript) does not describe the literature review protocol. Our review covered papers from major conferences (NeurIPS, ICML, ICLR, ICRA, CoRL) and journals through mid-2024, identified via keyword searches and citation chaining, but without a pre-registered systematic protocol or formal quantitative aggregation. We will add an explicit 'Survey Methodology' subsection detailing the search strategy, inclusion criteria, and how the performance comparison was derived, while qualifying the claim to reflect the narrative synthesis rather than a meta-analysis. (Revision: yes.)

Referee: [Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.

Authors: The taxonomy was constructed iteratively by grouping strategies according to their primary intervention mechanism (e.g., regularization, architectural changes), with overlaps noted and resolved by primary intent. No formal inter-rater agreement was computed. We will expand the taxonomy section to describe the construction process, overlap handling, and category rationale, and will add a limitations paragraph acknowledging that exhaustiveness cannot be formally verified. The performance observation will be presented with appropriate caveats. (Revision: yes.)

Circularity Check

0 steps flagged

No circularity: survey compiles external literature without self-referential reductions

This is a survey paper that proposes a definition and taxonomy based on review of over 50 external papers, analyzes gaps in evaluation, and compares regularization techniques. No equations, fitted parameters, or derivations are present that could reduce to self-definition or self-citation chains. The central claims rest on the reviewed literature rather than any internal construction that loops back to the paper's own inputs. Per the rules, absence of quoted reductions matching the enumerated patterns yields a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the contribution is organizational synthesis of existing research; no new free parameters, axioms, or invented entities are introduced.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
cs.LG · 2026-04 · unverdicted · novelty 7.0

TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
cs.LG · 2026-05 · unverdicted · novelty 6.0

SPHERE applies a Parseval penalty derived from a Neural Tangent Kernel proxy for spectral plasticity to Mixture-of-Experts policies, raising average success rates by 133% on MetaWorld and 50% on HumanoidBench in conti...

SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
cs.LG · 2026-05 · unverdicted · novelty 6.0

SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.

Safe Continual Reinforcement Learning in Non-stationary Environments
cs.LG · 2026-04 · unverdicted · novelty 6.0

Safe continual RL methods face a fundamental tension between enforcing safety constraints and preventing catastrophic forgetting in non-stationary environments, with regularization providing only partial mitigation.

A Survey of Continual Reinforcement Learning
cs.LG · 2025-06 · accept · novelty 6.0

The paper surveys CRL literature, proposes a taxonomy of methods into four categories based on knowledge storage and transfer, reviews metrics and benchmarks, and outlines challenges and future research directions.

Activation Function Design Sustains Plasticity in Continual Learning
cs.LG · 2025-09 · unverdicted · novelty 5.0

Smooth-Leaky and Randomized Smooth-Leaky activations mitigate loss of plasticity in continual learning by targeting negative-branch shape and saturation behavior.

Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages · cited by 5 Pith papers · 9 internal anchors

[1] Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, and Marlos C. Machado. Loss of plasticity in continual deep reinforcement learning. In Conference on Lifelong Learning Agents (CoLLAs), pages 620–636, 2023.
[2] David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado Philip van Hasselt, and Satinder Singh. A definition of continual reinforcement learning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems...
[3] Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A brief survey of deep reinforcement learning. CoRR, abs/1708.05866, 2017.
[4] Kavosh Asadi, Rasool Fakoor, and Shoham Sabach. Resetting the optimizer in deep RL: an empirical study. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[5] Jordan T. Ash and Ryan P. Adams. On warm-starting neural network training. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[6] Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016.
[7] Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253–279, 2013.
[8] Marc G. Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (ICML), volume 70, pages 449–458, 2017.
[9] Marc G. Bellemare, Will Dabney, and Mark Rowland. Distributional Reinforcement Learning. MIT Press, 2023. http://www.distributional-rl.org
[10] Tudor Berariu, Wojciech Czarnecki, Soham De, Jörg Bornschein, Samuel L. Smith, Razvan Pascanu, and Claudia Clopath. A study on the plasticity of neural networks. CoRR, abs/2106.00042, 2021.
[11] Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Christopher Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Si... Dota 2 with Large Scale Deep Reinforcement Learning.
[12] Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, and Jan Peters. CrossQ: Batch normalization in deep reinforcement learning for greater sample efficiency and simplicity. In International Conference on Learning Representations (ICLR), 2024.
[13] Celeste Biever. ChatGPT broke the Turing test - the race is on for new ways to assess AI. Nature, 619(7971):686–689, 2023.
[14] Johan Bjorck, Carla P. Gomes, and Kilian Q. Weinberger. Towards deeper deep reinforcement learning with spectral normalization, 2021.
[15] Johan Bjorck, Carla P. Gomes, and Kilian Q. Weinberger. Is High Variance Unavoidable in RL? A Case Study in Continuous Control. In International Conference on Learning Representations (ICLR). OpenReview.net, 2022.
[16] James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018.
[17] Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, and Marc G. Bellemare. Dopamine: A research framework for deep reinforcement learning. CoRR, abs/1812.06110, 2018.
[18] Edoardo Cetin and Oya Çeliktutan. Learning pessimism for reinforcement learning. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Conference on Artificial Intelligence (AAAI), pages 6971–6979. AAAI Press, 2023.
[19] Xinyue Chen, Che Wang, Zijian Zhou, and Keith W. Ross. Randomized ensembled double q-learning: Learning fast without a model. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021.
[20] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (ELUs). In Yoshua Bengio and Yann LeCun, editors, International Conference on Learning Representations (ICLR), 2016.
[21] Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 1282–1289. PMLR, 2019.
[22] Quentin Delfosse, Patrick Schramowski, Martin Mundt, Alejandro Molina, and Kristian Kersting. Adaptive rational activations to boost deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2024.
[24] Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, and A. Rupam Mahmood. Maintaining plasticity in deep continual learning. CoRR, abs/2306.13812, 2023.
[25] Pierluca D’Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G. Bellemare, and Aaron C. Courville. Sample-efficient reinforcement learning by breaking the replay ratio barrier. In International Conference on Learning Representations (ICLR). OpenReview.net, 2023.
[26] Mohamed Elsayed and A. Rupam Mahmood. Addressing loss of plasticity and catastrophic forgetting in continual learning. In International Conference on Learning Representations (ICLR), 2024.
[27] Mohamed Elsayed, Qingfeng Lan, Clare Lyle, and A. Rupam Mahmood. Weight clipping for deep continual and reinforcement learning. CoRR, abs/2407.01704, 2024.
[28] Lasse Espeholt, Hubert Soyer, Rémi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, and Koray Kavukcuoglu. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. In International Conference on Machine Learning (ICML), pages 1406–1415, 2018.
[29] Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, and Rishabh Agarwal. Stop regressing: Training value functions via classification for scalable deep RL. In International Conference on Machine Learning (ICML), 2024.
[30] Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, and Pushmeet Kohli. Discovering faster matrix multiplication algorithms with reinforcement learning. Nat., 610(7930):47–53, 2022.
[31] Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In International Conference on Learning Representations (ICLR), 2021.
[32] C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax - a differentiable physics engine for large scale rigid body simulation, 2021.
[33] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning (ICML), pages 1582–1591, 2018.
[34] Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, and Mario Martin. Simplifying deep temporal difference learning. CoRR, abs/2407.04811, 2024.
[35] Luke B. Godfrey. An evaluation of parametric activation functions for deep learning. In International Conference on Systems, Man and Cybernetics (SMC), pages 3006–
[36] Florin Gogianu, Tudor Berariu, Mihaela Rosca, Claudia Clopath, Lucian Busoniu, and Razvan Pascanu. Spectral normalisation for deep reinforcement learning: An optimisation perspective. In International Conference on Machine Learning (ICML), pages 3734–3744, 2021.
[37] Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
[38] Çaglar Gülçehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matthew Hoffman, Razvan Pascanu, and Arnaud Doucet. An empirical study of implicit regularization in deep offline RL. Machine Learning Research, 2022, 2022.
[39] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (ICML), pages 1856–1865, 2018.
[40] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In International Conference on Computer Vision (ICCV), pages 1026–1034, 2015.
[41] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE, 2016.
[42] Qiang He, Tianyi Zhou, Meng Fang, and Setareh Maghsudi. Adaptive regularization of representation rank as an implicit constraint of Bellman equation. In International Conference on Learning Representations (ICLR), 2024.
[43] Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015.
[44] Sepp Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 6(2):107–116, 1998.
[45] Sepp Hochreiter and Jürgen Schmidhuber. Flat Minima. Neural Computation, 9(1):1–42, 01 1997. ISSN 0899-7667.
[46] Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, and Phillip Isola. The low-rank simplicity bias in deep networks. Trans. Mach. Learn. Res., 2023, 2023.
[47] Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, and Shimon Whiteson. Transient non-stationarity and generalisation in deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021.
[48]

Baird III

Leemon C. Baird III. Residual algorithms: Reinforcement learning with function approximation. In Armand Prieditis and Stuart Russell, editors, International Con- ference on Machine Learning (ICML) , pages 30–37. Morgan Kaufmann, 1995. 53 Klein, Miklautz, Sidak, Plant, and Tschiatschek

work page 1995
[49]

Improving regression performance with distribu- tional losses

Ehsan Imani and Martha White. Improving regression performance with distribu- tional losses. In Jennifer G. Dy and Andreas Krause, editors, International Confer- ence on Machine Learning (ICML) , volume 80 of Proceedings of Machine Learning Research, pages 2162–2171. PMLR, 2018

work page 2018
[50]

Batch normalization: Accelerating deep network training by reducing internal covariate shift

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Francis R. Bach and David M. Blei, editors, International Conference on Machine Learning (ICML) , volume 37 of JMLR Workshop and Conference Proceedings, pages 448–456. JMLR.org, 2015

work page 2015
[51]

ACE: off-policy actor-critic with causality-aware entropy regularization

Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, and Huazhe Xu. ACE: off-policy actor-critic with causality-aware entropy regularization. In International Conference on Machine Learning (ICML). OpenReview.net, 2024

work page 2024
[52]

Camp- bell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, and Henryk Michalewski

Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Camp- bell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, and Henryk Michalewski. Model based reinforcement learning for atari. In International Conference on Learn- ing Representations (ICLR). OpenRevie...

work page 2020
[53]

Towards con- tinual reinforcement learning: A review and perspectives

Khimya Khetarpal, Matthew Riemer, Irina Rish, and Doina Precup. Towards con- tinual reinforcement learning: A review and perspectives. J. Artif. Intell. Res. , 75: 1401–1476, 2022

work page 2022
[54]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, International Conference on Learning Representations (ICLR), 2015

work page 2015
[55]

Conservative q- learning for offline reinforcement learning

Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. Conservative q- learning for offline reinforcement learning. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems (NeurIPS) , 2020

work page 2020
[56]

Implicit under- parameterization inhibits data-efficient deep reinforcement learning

Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. Implicit under- parameterization inhibits data-efficient deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021

[57]

Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron C. Courville, George Tucker, and Sergey Levine. DR3: value-based deep reinforcement learning requires explicit regularization. In International Conference on Learning Representations (ICLR). OpenReview.net, 2022

[58]

Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, and Sergey Levine. Offline q-learning on diverse multi-task data both scales and generalizes. In International Conference on Learning Representations (ICLR). OpenReview.net, 2023

[59]

Robert Tjarko Lange. gymnax: A JAX-based reinforcement learning environment library, 2022.

[60]

Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems (NeurIPS), 2020

[61]

Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, and Chulhee Yun. PLASTIC: improving input and label plasticity for sample efficient reinforcement learning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems (N...

[62]

Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, and Clare Lyle. Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks. In International Conference on Machine Learning (ICML). OpenReview.net, 2024

[63]

Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, and Marlos C. Machado. Curvature explains loss of plasticity. CoRR, abs/2312.00246, 2023

[64]

Clare Lyle, Mark Rowland, Georg Ostrovski, and Will Dabney. On the effect of auxiliary tasks on representation dynamics. In Arindam Banerjee and Kenji Fukumizu, editors, International Conference on Artificial Intelligence and Statistics (AISTATS), volume 130 of Proceedings of Machine Learning Research, pages 1–9. PMLR, 2021

[65]

Clare Lyle, Mark Rowland, and Will Dabney. Understanding and preventing capacity loss in reinforcement learning. In International Conference on Learning Representations (ICLR), 2022

[66]

Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Ávila Pires, Razvan Pascanu, and Will Dabney. Understanding plasticity in neural networks. In International Conference on Machine Learning (ICML), volume 202, pages 23190–23211, 2023

[67]

Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, and Will Dabney. Normalization and effective learning rates in reinforcement learning. CoRR, abs/2407.01800, 2024

[68]

Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, and Will Dabney. Disentangling the causes of plasticity loss in neural networks. CoRR, abs/2402.18762, 2024

[69]

Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, and Dacheng Tao. Revisiting plasticity in visual reinforcement learning: Data, modules and training stages. In International Conference on Learning Representations (ICLR). OpenReview.net, 2024

[70]

Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning (ICML), volume 28 of JMLR Workshop and Conference Proceedings. JMLR.org, 2013.

[71]

Andrew Kachites McCallum. Reinforcement learning with selective perception and hidden state. University of Rochester, 1996

[72]

Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pages 109–165. Elsevier, 1989

[73]

Edan Meyer, Adam White, and Marlos C. Machado. Harnessing discrete representations for continual reinforcement learning. CoRR, abs/2312.01203, 2023

[74]

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018

[75]

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement ...

[76]

Aditya Mohan, Amy Zhang, and Marius Lindauer. Structure in deep reinforcement learning: A survey and open problems. J. Artif. Intell. Res., 79:1167–1236, 2024

[77]

Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts. ISBN 978-0-262-04682-4

[79]

Kevin P. Murphy. Probabilistic Machine Learning: Advanced Topics. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts. ISBN 978-0-262-04843-9

[81]

Michal Nauman and Marek Cygan. On the theory of risk-aware agents: Bridging actor-critic and economics. In ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists, 2023

