
arxiv: 2411.04832 · v3 · submitted 2024-11-07 · 💻 cs.AI · cs.LG

Plasticity Loss in Deep Reinforcement Learning: A Survey

Pith reviewed 2026-05-23 17:59 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords plasticity loss · deep reinforcement learning · regularization techniques · evaluation practices · taxonomy · mitigation strategies · scaling failures · overestimation bias

The pith

General regularization techniques often outperform domain-specific interventions for plasticity loss in deep reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified definition of plasticity loss, examines its drivers and pathologies, and organizes over 50 mitigation strategies into the first comprehensive taxonomy. A sympathetic reader would care because loss of plasticity leads to performance plateaus, scaling failures, overestimation bias, and insufficient exploration in deep RL agents. The analysis identifies gaps in current evaluation practices across the field. It concludes that broad regularization methods tend to work better than tailored, domain-specific solutions.

Core claim

By proposing a unified definition of plasticity and examining its drivers and pathologies, the authors organize over 50 mitigation strategies into the first comprehensive taxonomy. Their analysis reveals gaps in evaluation practices and shows that general regularization techniques often outperform domain-specific interventions. Future research should focus on understanding the underlying mechanisms of plasticity loss.

What carries the argument

The taxonomy of over 50 mitigation strategies for plasticity loss, which groups approaches to enable systematic comparison of effectiveness.

If this is right

  • Evaluation practices across plasticity research require standardization to reliably compare interventions.
  • General regularization techniques merit priority as baselines in new mitigation studies.
  • Addressing plasticity loss can reduce related problems such as overestimation bias and poor exploration.
  • Mechanistic studies of plasticity loss will support more reliable scaling of deep RL systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Plasticity loss may arise primarily from generic neural network training dynamics rather than reinforcement learning specifics alone.
  • Widespread adoption of the taxonomy could reduce redundant experiments by providing shared categories.
  • Applying the taxonomy to emerging large-scale RL benchmarks could test whether the performance pattern holds.

Load-bearing premise

The reviewed set of over 50 papers is representative of the field without systematic selection bias, and the proposed taxonomy accurately captures distinct categories without overlaps or omissions that would change the conclusion on regularization performance.

What would settle it

A controlled study applying both general regularization and multiple domain-specific interventions to identical environments and agents, then measuring whether the specialized methods produce higher final performance and sustained plasticity.

Figures

Figures reproduced from arXiv: 2411.04832 by Christoph Luther, Claudia Plant, Lukas Miklautz, Manus McAuliffe, Sebastian Tschiatschek, Timo Klein.

Figure 1: Gradient covariance structure at different time steps on Atari. (a) For the Atari game Freeway, the gradient covariance matrix displays a pronounced structure at one million steps. (b) Later in the training, the structure becomes less noticeable at 3.5M steps. (c) and (d) show the gradient covariance matrices for the game SpaceInvaders at the same time steps. Here, the structure is less pronounced. We can …
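As a rough illustration of what a gradient covariance matrix like the one in Figure 1 might contain, the sketch below computes pairwise cosine similarities between per-sample loss gradients. The normalization is an assumption, since the caption does not define the matrix. The linear model, the squared loss, and the names `squared_loss_grad` and `gradient_covariance` are hypothetical stand-ins for a deep RL agent's network and loss.

```python
import math


def squared_loss_grad(w, x, y):
    """Gradient of (w . x - y)^2 with respect to w for a single sample."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * residual * xi for xi in x]


def gradient_covariance(grads, eps=1e-12):
    """Pairwise cosine similarity between per-sample gradient vectors.

    Assumption: the matrix is normalized by gradient norms, so every
    entry lies in [-1, 1]; eps guards against zero-norm gradients.
    """
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in grads]
    n = len(grads)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(grads[i], grads[j]))
            matrix[i][j] = dot / (norms[i] * norms[j] + eps)
    return matrix


# Toy batch of (input, target) pairs; values are illustrative only.
w = [0.5, -1.0]
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 2.0)]
grads = [squared_loss_grad(w, x, y) for x, y in batch]
C = gradient_covariance(grads)

for row in C:
    print(" ".join(f"{value:+.3f}" for value in row))
```

Entries near +1 or -1 mark pairs of samples whose gradients are strongly aligned or opposed, while a matrix close to the identity means the updates for different samples barely interact.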
Figure 2: Possible connections between factors and causes of plasticity loss in value-based RL. Large-mean regression targets combined with non-stationarity of deep RL training cause large and unstable gradients, leading to an increase in parameter norms. Large parameter norms are known to increase loss sharpness and cause other pathologies, together leading to reduced agent performance. …
Figure 3: Visualization of categorical losses for deep RL. The two-hot representation [93] proportionally assigns probability mass to the two neighboring bins of a scalar target y. HL-Gauss [49] constructs a Gaussian with fixed standard deviation and integrates over each bin to obtain the corresponding probability mass. Distributional RL algorithms such as C51 [8] model the full return distribution. Detailed descrip…
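The two target constructions named in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the survey's or the cited papers' code: the helper names `make_bins`, `two_hot`, and `hl_gauss` are hypothetical, the bins are assumed to be evenly spaced over a fixed range, the two-hot variant places mass on bin centers, and the HL-Gauss variant renormalizes the mass that falls inside the range (so the target is assumed to lie in or near that range).

```python
import math


def make_bins(v_min, v_max, num_bins):
    """Return num_bins + 1 evenly spaced edges and the num_bins bin centers."""
    width = (v_max - v_min) / num_bins
    edges = [v_min + i * width for i in range(num_bins + 1)]
    centers = [(edges[i] + edges[i + 1]) / 2 for i in range(num_bins)]
    return edges, centers


def two_hot(y, centers):
    """Split probability mass between the two bin centers that bracket y."""
    y = min(max(y, centers[0]), centers[-1])  # clip to the supported range
    probs = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        lo, hi = centers[i], centers[i + 1]
        if lo <= y <= hi:
            w_hi = (y - lo) / (hi - lo)  # closer to hi means more mass on hi
            probs[i] += 1.0 - w_hi
            probs[i + 1] += w_hi
            break
    return probs


def normal_cdf(x, mean, sigma):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def hl_gauss(y, edges, sigma):
    """Integrate a Gaussian centered at y (fixed sigma) over each bin."""
    cdf = [normal_cdf(e, y, sigma) for e in edges]
    mass = [cdf[i + 1] - cdf[i] for i in range(len(edges) - 1)]
    total = cdf[-1] - cdf[0]  # mass inside [v_min, v_max], used to renormalize
    return [m / total for m in mass]


# Illustrative values only: 10 bins on [0, 10], scalar target y = 3.0.
edges, centers = make_bins(0.0, 10.0, 10)
print([round(p, 3) for p in two_hot(3.0, centers)])
print([round(p, 3) for p in hl_gauss(3.0, edges, sigma=0.75)])
```

Either vector can serve as the target of a cross-entropy loss over the bin logits. The two-hot target recovers y exactly as the expectation over bin centers, whereas the HL-Gauss target spreads mass over several neighboring bins.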
Original abstract

Plasticity refers to a network's ability to adapt to changing data distributions, which is crucial for the successful training of deep reinforcement learning agents. Loss of plasticity causes performance plateaus and contributes to scaling failures, overestimation bias, and insufficient exploration. To deepen the understanding of plasticity loss, we propose a unified definition, examine its drivers and pathologies, and organize over 50 mitigation strategies into the first comprehensive taxonomy of the field. Our analysis shows gaps in current evaluation practices and reveals that general regularization techniques often outperform domain-specific interventions. Future research should prioritize understanding the mechanisms underlying plasticity loss.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. This survey proposes a unified definition of plasticity loss in deep RL, examines its drivers and pathologies, organizes over 50 mitigation strategies into a taxonomy, identifies gaps in current evaluation practices, and concludes that general regularization techniques often outperform domain-specific interventions, with a call for future work on underlying mechanisms.

Significance. If the taxonomy is exhaustive and the performance comparison is based on a representative, unbiased sample with transparent classification, the survey could usefully consolidate the literature and direct attention to evaluation standards; the absence of such methodological transparency currently limits its utility as a reference.

major comments (2)
  1. [Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.
  2. [Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater methodological transparency. We agree these details strengthen a survey and will revise the manuscript to address both points.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.

    Authors: We agree the abstract (and current manuscript) does not describe the literature review protocol. Our review covered papers from major conferences (NeurIPS, ICML, ICLR, ICRA, CoRL) and journals through mid-2024, identified via keyword searches and citation chaining, but without a pre-registered systematic protocol or formal quantitative aggregation. We will add an explicit 'Survey Methodology' subsection detailing the search strategy, inclusion criteria, and how the performance comparison was derived, while qualifying the claim to reflect the narrative synthesis rather than a meta-analysis. revision: yes

  2. Referee: [Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.

    Authors: The taxonomy was constructed iteratively by grouping strategies according to their primary intervention mechanism (e.g., regularization, architectural changes), with overlaps noted and resolved by primary intent. No formal inter-rater agreement was computed. We will expand the taxonomy section to describe the construction process, overlap handling, and category rationale, and will add a limitations paragraph acknowledging that exhaustiveness cannot be formally verified. The performance observation will be presented with appropriate caveats. revision: yes

Circularity Check

0 steps flagged

No circularity: survey compiles external literature without self-referential reductions

Full rationale

This is a survey paper that proposes a definition and taxonomy based on review of over 50 external papers, analyzes gaps in evaluation, and compares regularization techniques. No equations, fitted parameters, or derivations are present that could reduce to self-definition or self-citation chains. The central claims rest on the reviewed literature rather than any internal construction that loops back to the paper's own inputs. Per the rules, absence of quoted reductions matching the enumerated patterns yields a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the contribution is organizational synthesis of existing research; no new free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5633 in / 1090 out tokens · 25901 ms · 2026-05-23T17:59:52.287403+00:00 · methodology


Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

    cs.LG 2026-04 unverdicted novelty 7.0

    TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.

  2. SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    SPHERE applies a Parseval penalty derived from a Neural Tangent Kernel proxy for spectral plasticity to Mixture-of-Experts policies, raising average success rates by 133% on MetaWorld and 50% on HumanoidBench in conti...

  3. SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.

  4. Safe Continual Reinforcement Learning in Non-stationary Environments

    cs.LG 2026-04 unverdicted novelty 6.0

    Safe continual RL methods face a fundamental tension between enforcing safety constraints and preventing catastrophic forgetting in non-stationary environments, with regularization providing only partial mitigation.

  5. A Survey of Continual Reinforcement Learning

    cs.LG 2025-06 accept novelty 6.0

    The paper surveys CRL literature, proposes a taxonomy of methods into four categories based on knowledge storage and transfer, reviews metrics and benchmarks, and outlines challenges and future research directions.

  6. Activation Function Design Sustains Plasticity in Continual Learning

    cs.LG 2025-09 unverdicted novelty 5.0

    Smooth-Leaky and Randomized Smooth-Leaky activations mitigate loss of plasticity in continual learning by targeting negative-branch shape and saturation behavior.

Reference graph

Works this paper leans on

113 extracted references · 113 canonical work pages · cited by 5 Pith papers · 9 internal anchors

  1. [1]

    Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, and Marlos C. Machado. Loss of plasticity in continual deep reinforcement learning. In Conference on Lifelong Learning Agents (CoLLAs), pages 620–636, 2023

  2. [2]

    A definition of continual reinforcement learning

    David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado Philip van Hasselt, and Satinder Singh. A definition of continual reinforcement learning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems...

  3. [3]

    A Brief Survey of Deep Reinforcement Learning

    Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A brief survey of deep reinforcement learning. CoRR, abs/1708.05866, 2017

  4. [4]

    Resetting the optimizer in deep RL: an empirical study

    Kavosh Asadi, Rasool Fakoor, and Shoham Sabach. Resetting the optimizer in deep RL: an empirical study. In Advances in Neural Information Processing Systems (NeurIPS), 2023

  5. [5]

    On warm-starting neural network training

    Jordan T. Ash and Ryan P. Adams. On warm-starting neural network training. In Advances in Neural Information Processing Systems (NeurIPS), 2020

  6. [6]

    Lei Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. CoRR, abs/1607.06450, 2016

  7. [7]

    The arcade learning environment: An evaluation platform for general agents

    Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253–279, 2013

  8. [8]

    A distributional perspective on reinforcement learning

    Marc G. Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. In International Conference on Machine Learning (ICML), volume 70, pages 449–458, 2017

  9. [9]

    Distributional Reinforcement Learning

    Marc G. Bellemare, Will Dabney, and Mark Rowland. Distributional Reinforcement Learning. MIT Press, 2023. http://www.distributional-rl.org

  10. [10]

    A study on the plasticity of neural networks

    Tudor Berariu, Wojciech Czarnecki, Soham De, Jörg Bornschein, Samuel L. Smith, Razvan Pascanu, and Claudia Clopath. A study on the plasticity of neural networks. CoRR, abs/2106.00042, 2021

  11. [11]

    Dota 2 with Large Scale Deep Reinforcement Learning

    Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemyslaw Debiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Christopher Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Si...

  12. [12]

    Crossq: Batch normalization in deep reinforcement learning for greater sample efficiency and simplicity

    Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, and Jan Peters. Crossq: Batch normalization in deep reinforcement learning for greater sample efficiency and simplicity. In International Conference on Learning Representations (ICLR), 2024

  13. [13]

    Chatgpt broke the turing test-the race is on for new ways to assess ai

    Celeste Biever. Chatgpt broke the turing test-the race is on for new ways to assess ai. Nature, 619(7971):686–689, 2023

  14. [14]

    Towards deeper deep reinforcement learning with spectral normalization

    Johan Bjorck, Carla P. Gomes, and Kilian Q. Weinberger. Towards deeper deep reinforcement learning with spectral normalization, 2021.

  15. [15]

    Is High Variance Unavoidable in RL? A Case Study in Continuous Control

    Johan Bjorck, Carla P. Gomes, and Kilian Q. Weinberger. Is High Variance Unavoidable in RL? A Case Study in Continuous Control. In International Conference on Learning Representations (ICLR). OpenReview.net, 2022

  16. [16]

    JAX: composable transformations of Python+NumPy programs, 2018

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. JAX: composable transformations of Python+NumPy programs, 2018

  17. [17]

    Dopamine: A Research Framework for Deep Reinforcement Learning

    Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, and Marc G. Bellemare. Dopamine: A research framework for deep reinforcement learning. CoRR, abs/1812.06110, 2018

  18. [18]

    Learning pessimism for reinforcement learning

    Edoardo Cetin and Oya Çeliktutan. Learning pessimism for reinforcement learning. In Brian Williams, Yiling Chen, and Jennifer Neville, editors, Conference on Artificial Intelligence (AAAI), pages 6971–6979. AAAI Press, 2023

  19. [19]

    Xinyue Chen, Che Wang, Zijian Zhou, and Keith W. Ross. Randomized ensembled double q-learning: Learning fast without a model. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021

  20. [20]

    Fast and accurate deep network learning by exponential linear units (elus)

    Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). In Yoshua Bengio and Yann LeCun, editors, International Conference on Learning Representations (ICLR), 2016

  21. [21]

    Quantifying generalization in reinforcement learning

    Karl Cobbe, Oleg Klimov, Christopher Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 1282–1289. PMLR, 2019

  22. [22]

    Adaptive rational activations to boost deep reinforcement learning

    Quentin Delfosse, Patrick Schramowski, Martin Mundt, Alejandro Molina, and Kristian Kersting. Adaptive rational activations to boost deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2024

  23. [24]

    Maintaining plasticity in deep continual learning

    Shibhansh Dohare, J. Fernando Hernandez-Garcia, Parash Rahman, Richard S. Sutton, and A. Rupam Mahmood. Maintaining plasticity in deep continual learning. CoRR, abs/2306.13812, 2023

  24. [25]

    Sample-efficient reinforcement learning by breaking the replay ratio barrier

    Pierluca D’Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G. Bellemare, and Aaron C. Courville. Sample-efficient reinforcement learning by breaking the replay ratio barrier. In International Conference on Learning Representations (ICLR). OpenReview.net, 2023.

  25. [26]

    Addressing loss of plasticity and catastrophic forgetting in continual learning

    Mohamed Elsayed and A. Rupam Mahmood. Addressing loss of plasticity and catastrophic forgetting in continual learning. In International Conference on Learning Representations (ICLR), 2024

  26. [27]

    Weight clipping for deep continual and reinforcement learning

    Mohamed Elsayed, Qingfeng Lan, Clare Lyle, and A. Rupam Mahmood. Weight clipping for deep continual and reinforcement learning. CoRR, abs/2407.01704, 2024

  27. [28]

    IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

    Lasse Espeholt, Hubert Soyer, Rémi Munos, Karen Simonyan, Volodymyr Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, and Koray Kavukcuoglu. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. In International Conference on Machine Learning (ICML), pages 1406–1415, 2018

  28. [29]

    Stop regressing: Training value functions via classification for scalable deep RL

    Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, and Rishabh Agarwal. Stop regressing: Training value functions via classification for scalable deep RL. In International Conference on Machine Learning (ICML), 2024

  29. [30]

    Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, and Pushmeet Kohli. Discovering faster matrix multiplication algorithms with reinforcement learning. Nat., 610(7930):47–53, 2022

  30. [31]

    Sharpness-aware minimization for efficiently improving generalization

    Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. Sharpness-aware minimization for efficiently improving generalization. In International Conference on Learning Representations (ICLR), 2021

  31. [32]

    Brax - a differentiable physics engine for large scale rigid body simulation

    C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax - a differentiable physics engine for large scale rigid body simulation, 2021

  32. [33]

    Addressing function approximation error in actor-critic methods

    Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning (ICML), pages 1582–1591, 2018

  33. [34]

    Simplifying deep temporal difference learning

    Matteo Gallici, Mattie Fellows, Benjamin Ellis, Bartomeu Pou, Ivan Masmitja, Jakob Nicolaus Foerster, and Mario Martin. Simplifying deep temporal difference learning. CoRR, abs/2407.04811, 2024

  34. [35]

    Luke B. Godfrey. An evaluation of parametric activation functions for deep learning. In International Conference on Systems, Man and Cybernetics (SMC) , pages 3006–

  35. [36]

    Spectral normalisation for deep reinforcement learning: An optimisation perspective

    Florin Gogianu, Tudor Berariu, Mihaela Rosca, Claudia Clopath, Lucian Busoniu, and Razvan Pascanu. Spectral normalisation for deep reinforcement learning: An optimisation perspective. In International Conference on Machine Learning (ICML), pages 3734–3744, 2021.

  36. [37]

    An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

    Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013

  37. [38]

    An empirical study of implicit regularization in deep offline RL

    Çaglar Gülçehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matthew Hoffman, Razvan Pascanu, and Arnaud Doucet. An empirical study of implicit regularization in deep offline RL. Machine Learning Research, 2022, 2022

  38. [39]

    Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (ICML), pages 1856–1865, 2018

  39. [40]

    Delving deep into rectifiers: Surpassing human-level performance on imagenet classification

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In International Conference on Computer Vision (ICCV), pages 1026–1034, 2015

  40. [41]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE, 2016

  41. [42]

    Adaptive regularization of representation rank as an implicit constraint of bellman equation

    Qiang He, Tianyi Zhou, Meng Fang, and Setareh Maghsudi. Adaptive regularization of representation rank as an implicit constraint of bellman equation. In International Conference on Learning Representations (ICLR), 2024

  42. [43]

    Distilling the Knowledge in a Neural Network

    Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. CoRR, abs/1503.02531, 2015

  43. [44]

    The vanishing gradient problem during learning recurrent neural nets and problem solutions

    Sepp Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 6(2):107–116, 1998

  44. [45]

    Flat Minima

    Sepp Hochreiter and Jürgen Schmidhuber. Flat Minima. Neural Computation, 9(1):1–42, 01 1997. ISSN 0899-7667

  45. [46]

    The low-rank simplicity bias in deep networks

    Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, and Phillip Isola. The low-rank simplicity bias in deep networks. Trans. Mach. Learn. Res., 2023, 2023

  46. [47]

    Transient non-stationarity and generalisation in deep reinforcement learning

    Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, and Shimon Whiteson. Transient non-stationarity and generalisation in deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021

  47. [48]

    Residual algorithms: Reinforcement learning with function approximation

    Leemon C. Baird III. Residual algorithms: Reinforcement learning with function approximation. In Armand Prieditis and Stuart Russell, editors, International Conference on Machine Learning (ICML), pages 30–37. Morgan Kaufmann, 1995.

  48. [49]

    Improving regression performance with distributional losses

    Ehsan Imani and Martha White. Improving regression performance with distributional losses. In Jennifer G. Dy and Andreas Krause, editors, International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 2162–2171. PMLR, 2018

  49. [50]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift

    Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Francis R. Bach and David M. Blei, editors, International Conference on Machine Learning (ICML), volume 37 of JMLR Workshop and Conference Proceedings, pages 448–456. JMLR.org, 2015

  50. [51]

    ACE: off-policy actor-critic with causality-aware entropy regularization

    Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, and Huazhe Xu. ACE: off-policy actor-critic with causality-aware entropy regularization. In International Conference on Machine Learning (ICML). OpenReview.net, 2024

  51. [52]

    Model based reinforcement learning for atari

    Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, and Henryk Michalewski. Model based reinforcement learning for atari. In International Conference on Learning Representations (ICLR). OpenRevie...

  52. [53]

    Towards continual reinforcement learning: A review and perspectives

    Khimya Khetarpal, Matthew Riemer, Irina Rish, and Doina Precup. Towards continual reinforcement learning: A review and perspectives. J. Artif. Intell. Res., 75:1401–1476, 2022

  53. [54]

    Adam: A method for stochastic optimization

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun, editors, International Conference on Learning Representations (ICLR), 2015

  54. [55]

    Conservative q-learning for offline reinforcement learning

    Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. Conservative q-learning for offline reinforcement learning. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems (NeurIPS), 2020

  55. [56]

    Implicit under-parameterization inhibits data-efficient deep reinforcement learning

    Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, and Sergey Levine. Implicit under-parameterization inhibits data-efficient deep reinforcement learning. In International Conference on Learning Representations (ICLR). OpenReview.net, 2021

  56. [57]

    DR3: value-based deep reinforcement learning requires explicit regularization

    Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron C. Courville, George Tucker, and Sergey Levine. DR3: value-based deep reinforcement learning requires explicit regularization. In International Conference on Learning Representations (ICLR). OpenReview.net, 2022

  57. [58]

    Offline q-learning on diverse multi-task data both scales and generalizes

    Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, and Sergey Levine. Offline q-learning on diverse multi-task data both scales and generalizes. In International Conference on Learning Representations (ICLR). OpenReview.net, 2023

  58. [59]

    gymnax: A JAX-based reinforcement learning environment library, 2022

    Robert Tjarko Lange. gymnax: A JAX-based reinforcement learning environment library, 2022.

  59. [60]

    Reinforcement learning with augmented data

    Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, and Aravind Srinivas. Reinforcement learning with augmented data. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems (NeurIPS), 2020

  60. [61]

    PLASTIC: improving input and label plasticity for sample efficient reinforcement learning

    Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, and Chulhee Yun. PLASTIC: improving input and label plasticity for sample efficient reinforcement learning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems (N...

  61. [62]

    Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks

    Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, and Clare Lyle. Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks. In International Conference on Machine Learning (ICML). OpenReview.net, 2024

  62. [63]

    Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, and Marlos C. Machado. Curvature explains loss of plasticity. CoRR, abs/2312.00246, 2023

  63. [64]

    On the effect of aux- iliary tasks on representation dynamics

    Clare Lyle, Mark Rowland, Georg Ostrovski, and Will Dabney. On the effect of auxiliary tasks on representation dynamics. In Arindam Banerjee and Kenji Fukumizu, editors, International Conference on Artificial Intelligence and Statistics (AISTATS), volume 130 of Proceedings of Machine Learning Research, pages 1–9. PMLR, 2021

  64. [65]

    Understanding and preventing capacity loss in reinforcement learning

    Clare Lyle, Mark Rowland, and Will Dabney. Understanding and preventing capacity loss in reinforcement learning. In International Conference on Learning Representations (ICLR), 2022

  65. [66]

    Understanding plasticity in neural networks

    Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Ávila Pires, Razvan Pascanu, and Will Dabney. Understanding plasticity in neural networks. In International Conference on Machine Learning (ICML), volume 202, pages 23190–23211, 2023

  66. [67]

    Normalization and effective learning rates in reinforcement learning

    Clare Lyle, Zeyu Zheng, Khimya Khetarpal, James Martens, Hado van Hasselt, Razvan Pascanu, and Will Dabney. Normalization and effective learning rates in reinforcement learning. CoRR, abs/2407.01800, 2024

  67. [68]

    Disentangling the Causes of Plasticity Loss in Neural Networks, February 2024

    Clare Lyle, Zeyu Zheng, Khimya Khetarpal, Hado van Hasselt, Razvan Pascanu, James Martens, and Will Dabney. Disentangling the causes of plasticity loss in neural networks. CoRR, abs/2402.18762, 2024

  68. [69]

    Revisiting plasticity in visual reinforcement learning: Data, modules and training stages

    Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, and Dacheng Tao. Revisiting plasticity in visual reinforcement learning: Data, modules and training stages. In International Conference on Learning Representations (ICLR). OpenReview.net, 2024

  69. [70]

    Rectifier nonlinearities improve neural network acoustic models

    Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning (ICML), volume 28 of JMLR Workshop and Conference Proceedings. JMLR.org, 2013.

  70. [71]

    Reinforcement learning with selective perception and hidden state

    Andrew Kachites McCallum. Reinforcement learning with selective perception and hidden state. University of Rochester, 1996

  71. [72]

    Catastrophic interference in connectionist networks: The sequential learning problem

    Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pages 109–165. Elsevier, 1989

  72. [73]

    Edan Meyer, Adam White, and Marlos C. Machado. Harnessing discrete representations for continual reinforcement learning. CoRR, abs/2312.01203, 2023

  73. [74]

    Spectral normalization for generative adversarial networks

    Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018

  74. [75]

    Human-level control through deep reinforcement learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement ...

  75. [76]

    Structure in deep reinforcement learning: A survey and open problems

    Aditya Mohan, Amy Zhang, and Marius Lindauer. Structure in deep reinforcement learning: A survey and open problems. J. Artif. Intell. Res., 79:1167–1236, 2024

  76. [77]

    Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts,

  77. [78]

    ISBN 978-0-262-04682-4

  78. [79]

    Kevin P. Murphy. Probabilistic Machine Learning: Advanced Topics. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, Massachusetts,

  79. [80]

    ISBN 978-0-262-04843-9

  80. [81]

    On the theory of risk-aware agents: Bridging actor-critic and economics

    Michal Nauman and Marek Cygan. On the theory of risk-aware agents: Bridging actor-critic and economics. In ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists, 2023
