Plasticity Loss in Deep Reinforcement Learning: A Survey
Pith reviewed 2026-05-23 17:59 UTC · model grok-4.3
The pith
General regularization techniques often outperform domain-specific interventions for plasticity loss in deep reinforcement learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors propose a unified definition of plasticity, examine its drivers and pathologies, and organize over 50 mitigation strategies into what they describe as the first comprehensive taxonomy of the field. Their analysis reveals gaps in evaluation practices and suggests that general regularization techniques often outperform domain-specific interventions. They recommend that future research focus on understanding the underlying mechanisms of plasticity loss.
What carries the argument
The taxonomy of over 50 mitigation strategies for plasticity loss, which groups approaches to enable systematic comparison of effectiveness.
If this is right
- Evaluation practices across plasticity research require standardization to reliably compare interventions.
- General regularization techniques merit priority as baselines in new mitigation studies.
- Addressing plasticity loss can reduce related problems such as overestimation bias and poor exploration.
- Mechanistic studies of plasticity loss will support more reliable scaling of deep RL systems.
Where Pith is reading between the lines
- Plasticity loss may arise primarily from generic neural network training dynamics rather than reinforcement learning specifics alone.
- Widespread adoption of the taxonomy could reduce redundant experiments by providing shared categories.
- Applying the taxonomy to emerging large-scale RL benchmarks could test whether the performance pattern holds.
Load-bearing premise
The reviewed set of over 50 papers is representative of the field without systematic selection bias, and the proposed taxonomy accurately captures distinct categories without overlaps or omissions that would change the conclusion on regularization performance.
What would settle it
A controlled study applying both general regularization and multiple domain-specific interventions to identical environments and agents, then measuring whether the specialized methods produce higher final performance and sustained plasticity.
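The controlled comparison described above could be tabulated along the following lines. This is a minimal sketch, not a protocol from the paper: the function name `paired_comparison`, the `(environment, seed)` keying, and all scores are hypothetical placeholders. It covers only final performance; a measure of sustained plasticity would need its own metric, which the source does not specify.

```python
from statistics import mean


def paired_comparison(general, specialized):
    """Compare two interventions run on the same (environment, seed) pairs.

    Both arguments map an (environment, seed) key to a final-performance score.
    Returns the mean paired difference (specialized minus general) and the
    fraction of shared runs in which the specialized method scored higher.
    """
    # Only runs present for both methods can be compared fairly.
    shared = sorted(set(general) & set(specialized))
    if not shared:
        raise ValueError("no shared (environment, seed) runs to compare")
    diffs = [specialized[key] - general[key] for key in shared]
    wins = sum(d > 0 for d in diffs)
    return mean(diffs), wins / len(shared)


# Placeholder scores for illustration only; these are not results from any paper.
general_scores = {("env_a", 0): 1.0, ("env_a", 1): 0.8, ("env_b", 0): 0.5}
specialized_scores = {("env_a", 0): 0.9, ("env_a", 1): 0.9, ("env_b", 0): 0.4}

mean_diff, win_rate = paired_comparison(general_scores, specialized_scores)
print(f"mean difference: {mean_diff:+.3f}, specialized win rate: {win_rate:.2f}")
```

Pairing on identical environments and seeds keeps the comparison from being dominated by differences between benchmarks, which is the confound the proposed study is meant to remove.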
Original abstract
Plasticity refers to a network's ability to adapt to changing data distributions, which is crucial for the successful training of deep reinforcement learning agents. Loss of plasticity causes performance plateaus and contributes to scaling failures, overestimation bias, and insufficient exploration. To deepen the understanding of plasticity loss, we propose a unified definition, examine its drivers and pathologies, and organize over 50 mitigation strategies into the first comprehensive taxonomy of the field. Our analysis shows gaps in current evaluation practices and reveals that general regularization techniques often outperform domain-specific interventions. Future research should prioritize understanding the mechanisms underlying plasticity loss.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey proposes a unified definition of plasticity loss in deep RL, examines its drivers and pathologies, organizes over 50 mitigation strategies into a taxonomy, identifies gaps in current evaluation practices, and concludes that general regularization techniques often outperform domain-specific interventions, with a call for future work on underlying mechanisms.
Significance. If the taxonomy is exhaustive and the performance comparison is based on a representative, unbiased sample with transparent classification, the survey could usefully consolidate the literature and direct attention to evaluation standards; the absence of such methodological transparency currently limits its utility as a reference.
major comments (2)
- [Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.
- [Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.
Simulated Authors' Rebuttal
We thank the referee for the constructive comments highlighting the need for greater methodological transparency. We agree these details strengthen a survey and will revise the manuscript to address both points.
- Referee: [Abstract] Abstract: the claim that 'general regularization techniques often outperform domain-specific interventions' is presented without any description of the paper-selection protocol, search strategy, inclusion/exclusion criteria, or quantitative aggregation method used across the >50 papers; this directly undermines the reliability of the comparative conclusion.
  Authors: We agree the abstract (and current manuscript) does not describe the literature review protocol. Our review covered papers from major conferences (NeurIPS, ICML, ICLR, ICRA, CoRL) and journals through mid-2024, identified via keyword searches and citation chaining, but without a pre-registered systematic protocol or formal quantitative aggregation. We will add an explicit 'Survey Methodology' subsection detailing the search strategy, inclusion criteria, and how the performance comparison was derived, while qualifying the claim to reflect the narrative synthesis rather than a meta-analysis.
  Revision: yes
- Referee: [Abstract] Abstract / taxonomy description: no information is given on how the taxonomy was constructed (e.g., inter-rater agreement, handling of overlapping strategies, or verification that categories are exhaustive), so it is impossible to determine whether the reported performance advantage is an artifact of classification choices rather than an empirical pattern.
  Authors: The taxonomy was constructed iteratively by grouping strategies according to their primary intervention mechanism (e.g., regularization, architectural changes), with overlaps noted and resolved by primary intent. No formal inter-rater agreement was computed. We will expand the taxonomy section to describe the construction process, overlap handling, and category rationale, and will add a limitations paragraph acknowledging that exhaustiveness cannot be formally verified. The performance observation will be presented with appropriate caveats.
  Revision: yes
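For readers unfamiliar with the inter-rater agreement the referee asks about, one common statistic is Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The sketch below is illustrative only: the rebuttal states that no such statistic was computed, and the two label lists are hypothetical placeholders, not classifications from the survey.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning one category to each item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each rater's marginal category frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four strategies; category names are placeholders.
rater_1 = ["regularization", "architecture", "regularization", "other"]
rater_2 = ["regularization", "architecture", "other", "other"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance; reporting it would let readers judge how sensitive the taxonomy is to who did the classifying.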
Circularity Check
No circularity: survey compiles external literature without self-referential reductions
This is a survey paper that proposes a definition and taxonomy based on review of over 50 external papers, analyzes gaps in evaluation, and compares regularization techniques. No equations, fitted parameters, or derivations are present that could reduce to self-definition or self-citation chains. The central claims rest on the reviewed literature rather than any internal construction that loops back to the paper's own inputs. Per the rules, absence of quoted reductions matching the enumerated patterns yields a score of 0.
Forward citations
Cited by 6 Pith papers
- Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
  TeLAPA maintains archives of behaviorally diverse yet competent policies aligned in a shared latent space to preserve plasticity and enable faster recovery after interference in continual reinforcement learning.
- SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
  SPHERE applies a Parseval penalty derived from a Neural Tangent Kernel proxy for spectral plasticity to Mixture-of-Experts policies, raising average success rates by 133% on MetaWorld and 50% on HumanoidBench in conti...
- SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
  SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.
- Safe Continual Reinforcement Learning in Non-stationary Environments
  Safe continual RL methods face a fundamental tension between enforcing safety constraints and preventing catastrophic forgetting in non-stationary environments, with regularization providing only partial mitigation.
- A Survey of Continual Reinforcement Learning
  The paper surveys CRL literature, proposes a taxonomy of methods into four categories based on knowledge storage and transfer, reviews metrics and benchmarks, and outlines challenges and future research directions.
- Activation Function Design Sustains Plasticity in Continual Learning
  Smooth-Leaky and Randomized Smooth-Leaky activations mitigate loss of plasticity in continual learning by targeting negative-branch shape and saturation behavior.