Balancing Plasticity and Stability with Fast and Slow Successor Features

Blake Richards; Doina Precup; Raymond Chua

arXiv: 2605.26357 · v2 · pith:B7ILAIKG · submitted 2026-05-25 · 💻 cs.LG

Pith reviewed 2026-06-29 22:13 UTC · model grok-4.3

classification 💻 cs.LG

keywords continual learning · reinforcement learning · successor features · synaptic consolidation · stability-plasticity dilemma · non-stationary environments · multi-timescale learning · gradual drift

The pith

Stabilizing successor features with multi-timescale synaptic consolidation outperforms plasticity methods in gradually drifting environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the stability-plasticity dilemma for deep RL agents facing gradual rather than abrupt environmental change. It creates modified 3D Miniworld and MuJoCo settings with naturalistic continual drift to test consolidation versus resetting approaches. Findings indicate that synaptic consolidation applied to successor features produces better performance than consolidation of Q-values or parameter resets, and that consolidating across multiple timescales is most effective because the timescales together address different rates of environmental change. This points to stability mechanisms being particularly useful when non-stationarity unfolds slowly and continuously.

Core claim

In environments modified to feature gradual continual drift, applying neuro-inspired synaptic consolidation to successor features produces superior performance on continually changing tasks compared with methods that reset parameters or consolidate Q-values directly, with the largest gains arising when consolidation targets operate across multiple timescales that together capture complementary aspects of the drift.

What carries the argument

Successor Features stabilized via synaptic consolidation at multiple (fast and slow) timescales
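For reference, the standard successor-feature decomposition of Barreto et al. (2017), reference [4], is written out here. The feature map φ, the reward weights w, and the discount γ follow that paper's notation. Figures 22 and 31 indicate that each consolidation variable holds its own copy of the SFs, but this review does not give the fast/slow construction itself, so only the standard definitions are shown.

```latex
\begin{aligned}
r(s,a,s') &= \phi(s,a,s')^{\top}\mathbf{w}, \qquad \phi_{i+1} = \phi(S_i,A_i,S_{i+1}),\\
\psi^{\pi}(s,a) &= \mathbb{E}^{\pi}\Big[\textstyle\sum_{i=t}^{\infty}\gamma^{\,i-t}\,\phi_{i+1}\ \Big|\ S_t=s,\ A_t=a\Big],\\
Q^{\pi}(s,a) &= \mathbb{E}^{\pi}\Big[\textstyle\sum_{i=t}^{\infty}\gamma^{\,i-t}\,r_{i+1}\ \Big|\ S_t=s,\ A_t=a\Big]
  = \psi^{\pi}(s,a)^{\top}\mathbf{w},\\
\psi^{\pi}(s,a) &= \mathbb{E}^{\pi}\big[\phi_{t+1} + \gamma\,\psi^{\pi}(S_{t+1},A_{t+1})\ \big|\ S_t=s,\ A_t=a\big].
\end{aligned}
```

Because Q is linear in ψ for a fixed w, one reading of the paper's motivation is that consolidating ψ stabilizes the dynamics-dependent part of the value, while w remains free to track reward changes such as the reward reversal in Slippery Four Rooms.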

If this is right

Stability-focused methods outperform plasticity-focused methods when environmental change occurs gradually rather than through discrete jumps.
Successor features serve as more effective consolidation targets than raw Q-values because they reduce interference across changing conditions.
Consolidation at multiple timescales captures complementary rates of environmental drift more effectively than any single timescale.
The performance advantage appears in both discrete navigation and continuous control domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same multi-timescale consolidation principle could be tested on other predictive representations such as successor measures or latent dynamics models.
Robotics or autonomous driving systems that encounter slow seasonal or wear-induced drift might benefit from explicit fast-slow consolidation schedules.
Benchmarks that rely only on abrupt task boundaries may systematically underestimate the value of stability mechanisms.

Load-bearing premise

The artificial gradual drift added to the modified environments accurately models real-world non-stationarity without introducing implementation artifacts that favor the consolidation methods.

What would settle it

Running the same consolidation experiments but replacing the gradual drift with abrupt task switches and finding that multi-timescale successor-feature consolidation no longer outperforms single-timescale or Q-value consolidation.

Figures

Figures reproduced from arXiv: 2605.26357 by Blake Richards, Doina Precup, Raymond Chua.

**Figure 1.** Motivating stability-plasticity tradeoffs in naturalistic, continually non-stationary RL where the environment evolves gradually, rather than abruptly. To illustrate, we show (a) the Humanoid walking forward task and (b) an example of the noisy sine function used to generate smooth changes in its mass. (c) Average episode return plot and (d) Area under the curve (AUC) show that stability-preserving methods…
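Figures 1(d) and 9 summarize learning curves by their area under the curve (AUC). The review does not say how the AUC is computed, so the sketch below assumes a trapezoidal rule over evaluation checkpoints. The function name `area_under_curve` and its `normalize` option are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def area_under_curve(steps: Sequence[float], returns: Sequence[float],
                     normalize: bool = False) -> float:
    """Trapezoidal area under an episode-return curve.

    steps:     environment steps at which returns were recorded (non-decreasing)
    returns:   average episode return measured at each of those steps
    normalize: if True, divide by the step span, giving a time-averaged return
    """
    if len(steps) != len(returns):
        raise ValueError("steps and returns must have the same length")
    if len(steps) < 2:
        return 0.0
    area = 0.0
    for i in range(1, len(steps)):
        width = steps[i] - steps[i - 1]
        if width < 0:
            raise ValueError("steps must be non-decreasing")
        # Trapezoid between two consecutive evaluation points.
        area += 0.5 * width * (returns[i] + returns[i - 1])
    if normalize:
        span = steps[-1] - steps[0]
        return area / span if span > 0 else 0.0
    return area


# Hypothetical curves: the faster learner accumulates more area.
fast = area_under_curve([0, 1, 2, 3], [0.0, 0.8, 0.9, 1.0])
slow = area_under_curve([0, 1, 2, 3], [0.0, 0.2, 0.5, 1.0])
```

Two agents that end at the same return can differ sharply in AUC. This is why the captions pair AUC with final-return plots.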

**Figure 2.** (a) Neuro-inspired synaptic consolidation model adapted from Benna & Fusi (2016). The visible variable, u_1, represents the synaptic efficacy v, while downstream hidden variables u_2, u_3, ... interact bidirectionally across timescales, with beaker capacities C_1 < C_2 < ... < C_K and tube widths representing flow strengths g_{1,2} > g_{2,3} > ... > g_{K,K+1}, which control the rate of interaction between the variables. To…
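Figure 2's caption describes the Benna-Fusi beaker model: a visible variable u_1 coupled to progressively slower hidden variables. The sketch below is a single-weight, explicit-Euler discretization of such a chain, and it rests on several assumptions. Capacities double and flows halve per level, a hypothetical choice that matches only the orderings given in the caption. The learning update touches only the visible variable. The deepest variable leaks toward zero. This is not the paper's update rule, which the review does not reproduce. `make_chain` and `benna_fusi_step` are illustrative names, and list index 0 corresponds to u_1.

```python
from typing import List, Tuple


def make_chain(num_vars: int, g12: float = 0.25) -> Tuple[List[float], List[float]]:
    """Hypothetical parameterization: capacities double and flows halve per level.

    Returns (capacities C_1..C_K, flows g_{1,2}..g_{K,K+1}) so that
    C_1 < C_2 < ... < C_K and g_{1,2} > g_{2,3} > ... > g_{K,K+1}.
    """
    capacities = [2.0 ** k for k in range(num_vars)]
    flows = [g12 * 2.0 ** (-k) for k in range(num_vars)]
    return capacities, flows


def benna_fusi_step(u: List[float], delta: float, capacities: List[float],
                    flows: List[float], dt: float = 1.0) -> List[float]:
    """One explicit Euler step of a Benna-Fusi-style chain for a single weight.

    u[0] is the visible variable (the weight the network uses); u[1:] are hidden.
    delta is the learning update (e.g. a gradient step) applied to u[0].
    flows[k] couples u[k] and u[k+1]; the last flow couples u[K-1] to an
    implicit u[K] = 0, i.e. a slow leak out of the deepest variable.
    """
    num_vars = len(u)
    new_u = list(u)
    for k in range(num_vars):
        # Exchange with the faster neighbour (none for the visible variable).
        exchange = flows[k - 1] * (u[k - 1] - u[k]) if k > 0 else 0.0
        # Exchange with the slower neighbour, or leak to zero at the end.
        slower = u[k + 1] if k + 1 < num_vars else 0.0
        exchange += flows[k] * (slower - u[k])
        new_u[k] = u[k] + dt * exchange / capacities[k]
    new_u[0] += delta  # the learning signal only touches the visible variable
    return new_u


# Usage: one large update, then let the chain evolve with no further learning.
caps, gs = make_chain(3)
state = benna_fusi_step([0.0, 0.0, 0.0], delta=1.0, capacities=caps, flows=gs)
for _ in range(20):
    state = benna_fusi_step(state, delta=0.0, capacities=caps, flows=gs)
```

Deep variables have large capacities and weak couplings. As a result, a single update is gradually copied into them and later flows back toward u_1. This slow-to-fast backflow is the recall route that Figures 8 and 33 replace with cross-attention.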

**Figure 3.** Results from Slippery Four Rooms with naturalistic, continually evolving slip dynamics that randomly replace actions. (a) Average return across two sequential tasks (Task 1 and 2), each repeated twice (Exposure 1 and 2). In DQN+P-last (yellow), plasticity injection is applied midway through training by randomly re-initializing the last layer’s parameters. (b) Steps to reach a predefined performance thresho…
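Figure 3(b) and the later learning-efficiency panels report how many steps an agent needs to reach a predefined performance threshold. The threshold value and any smoothing are not given in this review. The sketch therefore assumes a trailing moving average, with a hypothetical `window` parameter and a hypothetical function name.

```python
from typing import Optional, Sequence


def steps_to_threshold(steps: Sequence[int], returns: Sequence[float],
                       threshold: float, window: int = 1) -> Optional[int]:
    """First step at which the trailing-window mean return reaches `threshold`.

    Returns None if the threshold is never reached; otherwise lower is better.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    for i in range(len(returns)):
        start = max(0, i - window + 1)
        recent = returns[start:i + 1]  # last `window` evaluations up to step i
        if sum(recent) / len(recent) >= threshold:
            return steps[i]
    return None
```

A wider window makes the metric less sensitive to a single lucky evaluation, at the cost of reporting a slightly later step.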

**Figure 4.** Slippery Four Rooms. For base models, we use Double Deep Q-Network (DQN) (Van Hasselt et al., 2016) for the Slippery Four Rooms environment and, for MuJoCo, the Deterministic Policy Gradient algorithm (Silver et al., 2014) with twin critics (TD3) (Fujimoto et al., 2018). For SFs, we use Simple SFs (Chua et al., 2024), which can be learned without auxiliary losses. We selected these models due to their flexibilit…

**Figure 5.**

**Figure 6.** Quantification of mass changes for the Humanoid embodiment. We consider three levels of mass dynamics variation: mild (25%), moderate (50%), and severe (100%), corresponding to the maximum change allowed before the physical simulation becomes unstable. Across these settings, plasticity-preserving methods (CBP, P-last) are less effective than approaches incorporating synaptic consolidation (SC). Under moder…

**Figure 7.** Analysis of timescales in MuJoCo using our model, SF + SC. More consolidation variables (6–9) improve learning efficiency, highlighting the benefit of slower timescales. Zero variables correspond to the Simple SF agent. See Appendix M for results in the Slippery Four Rooms environment. 5.4. Robustness to Non-Periodic and Stochastic Drift To test robustness beyond periodic drift, we replace the noisy sine m…

**Figure 8.** Cross-attention analysis of individual consolidation variables, replacing memory recall via backflow. (a) Implementation design. (b–c) Attention probabilities over consolidation variables, where higher probability indicates greater contribution to learning. Full results are provided in Appendix O. serve as keys and values. The softmax probabilities from the cross-attention mechanism (Figure 8b and c) showed that …

**Figure 9.** Capacity analysis using Humanoid. The x-axis shows the number of parameters, and the y-axis shows performance measured by area under the curve (AUC). Increasing the parameter count of TD3 and its variants did not consistently improve performance compared to SF + SC (star), suggesting that consolidating SFs contributes more than network capacity scaling alone. Since each consolidation variable introduces an…

**Figure 10.** (a) The slippery variant of the 3D Four Rooms environment. The agent alternates between two tasks: in Task 1, reaching the green box produces +1 reward and the yellow box -1; in Task 2, the reward assignment is reversed. At each step, the agent’s chosen action may be randomly replaced with a probability sampled from the noisy sine function shown in (b). The agent receives only egocentric pixel observations.…
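Figure 10 describes the slip mechanism: at each step, the chosen action may be replaced with a probability taken from a drifting schedule. Below is a minimal sketch of one step of that mechanism. It assumes the replacement is drawn uniformly from the action set; the review does not say how the replacement action is chosen. The 0.45 value is the maximum slip probability quoted in Figure 41, and `slippery_action` is a hypothetical name.

```python
import random


def slippery_action(intended: int, num_actions: int, slip_prob: float,
                    rng: random.Random) -> int:
    """With probability slip_prob, replace the intended action by a random one.

    Assumption: the replacement is drawn uniformly from all actions, so it can
    coincide with the intended action.
    """
    if not 0.0 <= slip_prob <= 1.0:
        raise ValueError("slip_prob must be in [0, 1]")
    if rng.random() < slip_prob:
        return rng.randrange(num_actions)
    return intended


# Under uniform replacement with 3 actions and slip_prob 0.45, the intended
# action is executed with probability 0.55 + 0.45 / 3 = 0.70.
rng = random.Random(0)
executed = [slippery_action(2, num_actions=3, slip_prob=0.45, rng=rng) for _ in range(10_000)]
kept_fraction = sum(a == 2 for a in executed) / len(executed)
```

In a full environment, `slip_prob` would be recomputed every step from the drifting schedule rather than held fixed as in this usage example.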

**Figure 11.** MuJoCo suite with continuous periodic mass changes during training and evaluation for (a) Humanoid, (b) Walker, (c) Quadruped, (d) Half-Cheetah. A periodic noisy sine wave generates continuously varying mass values, which are used to stochastically scale the agent’s mass during training and evaluation.
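Figures 1(b) and 11 describe a periodic noisy sine wave that stochastically scales the agent's mass. The exact period, amplitude, and noise model are not given here. The sketch below assumes additive Gaussian noise on a sinusoid and uses hypothetical parameter names. Reading `amplitude` as the mild, moderate, and severe levels of Figure 6 is also an assumption. In MuJoCo, the scale would multiply the nominal body masses, for example the `body_mass` array of an `MjModel`. The `scaled_masses` helper here keeps the example library-free.

```python
import math
import random


def noisy_sine_scale(step: int, period: float, amplitude: float, noise_std: float,
                     rng: random.Random, min_scale: float = 0.1) -> float:
    """Multiplicative mass scale 1 + amplitude*sin(2*pi*step/period) + noise.

    amplitude: hypothetical fraction of the nominal mass the drift may add/remove
    noise_std: standard deviation of Gaussian noise added at each step
    min_scale: floor that keeps the scaled mass positive (an assumption)
    """
    drift = amplitude * math.sin(2.0 * math.pi * step / period)
    return max(min_scale, 1.0 + drift + rng.gauss(0.0, noise_std))


def scaled_masses(nominal: list, scale: float) -> list:
    """Apply one scale factor to every body mass (hypothetical helper)."""
    return [m * scale for m in nominal]


rng = random.Random(0)
schedule = [noisy_sine_scale(t, period=1000, amplitude=0.5, noise_std=0.02, rng=rng)
            for t in range(2000)]
```

A smooth drift term with small per-step noise produces the gradual change the paper contrasts with abrupt task switches. Replacing `drift` with a step function would recover the abrupt setting.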

**Figure 12.** (a) Partial view over the first 1 million environment steps and (b) complete view over the full 10 million environment training steps of the noisy non-periodic sine wave used to generate continuously varying slip probabilities that stochastically replace the agent’s intended actions.

**Figure 13.**

**Figure 14.** (a) Partial view over the first 1 million environment steps and (b) complete view over the full 10 million training steps of the Ornstein–Uhlenbeck (OU) process used to generate varying slip probabilities that stochastically replace the agent’s intended actions.
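Figure 14 uses an Ornstein–Uhlenbeck (OU) process to generate slip probabilities. A minimal sketch uses the standard Euler–Maruyama discretization, dx = θ(μ − x)dt + σ dW. Each value is clipped to [0, 0.45] so it can serve as a probability. That bound is borrowed from the maximum quoted in Figure 41, and clipping is an assumption about how the paper keeps values in range. `ou_sequence` is a hypothetical name.

```python
import math
import random
from typing import List


def ou_sequence(n: int, x0: float, mu: float, theta: float, sigma: float,
                dt: float, rng: random.Random,
                low: float = 0.0, high: float = 0.45) -> List[float]:
    """Euler-Maruyama samples of an OU process, clipped to [low, high].

    theta: mean-reversion rate toward mu
    sigma: noise scale of the Wiener increment
    Clipping makes the result usable as a slip probability but means the
    sequence is no longer a pure OU process near the bounds.
    """
    xs = []
    x = x0
    for _ in range(n):
        x = x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = min(high, max(low, x))
        xs.append(x)
    return xs


rng = random.Random(1)
slip_probs = ou_sequence(1000, x0=0.2, mu=0.2, theta=0.01, sigma=0.05, dt=1.0, rng=rng)
```

Unlike the periodic sine schedule, an OU process has no fixed period, so an agent cannot exploit a repeating pattern. A small `theta` keeps the drift slow.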

**Figure 15.**

**Figure 16.** Slippery Four Rooms environment. We extended the environment used in Simple Successor Features (Chua et al., 2024), which was built upon the original 3D Miniworld environment (Chevalier-Boisvert et al., 2023). In the slippery variant of the Four Rooms environment, we mimic wet or icy conditions in all four rooms, rather than just two rooms (top right and bottom left). The key difference is, unlike in…

**Figure 17.** Plasticity–stability analysis in the MuJoCo suite under continuous mass changes induced by a periodic noisy sine function during training and evaluation. We compare a baseline TD3 agent with three variants: (i) Continual Backprop (CBP), which selectively resets least-active weights; (ii) plasticity injection by resetting the weights in the last layer (P-last); and (iii) synaptic consolidation (SC). Across…
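Figure 17 contrasts Continual Backprop (CBP), which selectively resets least-active weights, with P-last, which re-initializes the whole last layer. The sketch below is a heavily simplified, hypothetical version of the CBP idea for one hidden layer. It scores each unit by a contribution-style utility and resets the lowest-scoring fraction. The original algorithm's running-average utilities, maturity threshold, and replacement-rate bookkeeping are omitted. Nothing here is claimed to match the paper's baseline.

```python
import random
from typing import List


def reset_low_utility_units(w_in: List[List[float]], w_out: List[List[float]],
                            mean_abs_activation: List[float], replace_frac: float,
                            rng: random.Random, init_scale: float = 0.1) -> List[int]:
    """Simplified CBP-style reset for one hidden layer (modifies w_in/w_out in place).

    w_in[j]:  incoming weights of hidden unit j
    w_out[j]: outgoing weights of hidden unit j
    Utility of unit j is taken as mean |activation_j| * sum_k |w_out[j][k]|.
    The lowest-utility units get fresh random incoming weights and zeroed
    outgoing weights, so the reset does not immediately change the output.
    Returns the indices of the reset units.
    """
    num_units = len(w_in)
    utilities = [mean_abs_activation[j] * sum(abs(w) for w in w_out[j])
                 for j in range(num_units)]
    num_reset = int(replace_frac * num_units)
    lowest = sorted(range(num_units), key=lambda j: utilities[j])[:num_reset]
    for j in lowest:
        w_in[j] = [rng.uniform(-init_scale, init_scale) for _ in w_in[j]]
        w_out[j] = [0.0 for _ in w_out[j]]
    return lowest
```

The design contrast with P-last is scope. This selective reset touches only units judged unused, whereas P-last discards the entire last layer at once. Under gradual drift, both inject plasticity at the cost of forgetting.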

**Figure 18.** Plasticity–stability analysis in the Slippery Four Rooms environment using a non-periodic noisy sinusoidal function. The agent undergoes two exposures; after each learning phase, the reward mapping is reversed. (a) Average return per episode. (b) Learning efficiency (steps to reach a good policy; lower is better). For the plasticity-injection agent, plasticity was injected once at 10 million environment ste…

**Figure 19.** Plasticity–stability analysis in the MuJoCo suite under continuous mass changes induced by a non-periodic noisy sine function during training and evaluation. We compare a baseline TD3 agent with three variants: (i) Continual Backprop (CBP), which selectively resets least-active weights; (ii) plasticity injection by resetting the weights in the last layer (P-last); and (iii) synaptic consolidation (SC). Ac…

**Figure 20.** Plasticity–stability analysis in the Slippery Four Rooms environment using OU processes. The agent undergoes two exposures; after each learning phase, the reward mapping is reversed. (a) Average return per episode. (b) Learning efficiency (steps to reach a good policy; lower is better). For the plasticity-injection agent, plasticity was injected once at 10 million environment steps (end of Exposure 1). In…

**Figure 21.** Plasticity–stability analysis in the MuJoCo suite under continuous mass changes induced by Ornstein–Uhlenbeck processes during training and evaluation. We compare a baseline TD3 agent with three variants: (i) Continual Backprop (CBP), which selectively resets least-active weights; (ii) plasticity injection by resetting the weights in the last layer (P-last); and (iii) synaptic consolidation (SC). Across e…

**Figure 22.** Schematic of synaptic consolidation applied to Q-values and to Successor Features (SFs). (a) Kaplanis et al. (2018) showed that adapting the synaptic consolidation mechanism of Benna & Fusi (2016) to Q-values improves robustness in continual RL. (b) Here, we extend this approach to predictive, generalizable representations using Simple Successor Features (Chua et al., 2024). Consolidated variables (e.…

**Figure 23.** Comparison of Q-values and Successor Features (SFs), with synaptic consolidation (SC) or elastic weight consolidation (EWC), in the 3D Slippery Four Rooms environment during training and evaluation. Applying SC to Q-values (green) and SFs (purple) offers higher learning efficiency than their EWC counterparts, requiring fewer steps to learn a good policy. This demonstrates that SC is more effective than EW…
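Figure 23 compares synaptic consolidation with elastic weight consolidation (EWC). EWC's standard penalty anchors each parameter to a reference value θ*, weighted by a diagonal Fisher estimate: (λ/2) Σ_i F_i (θ_i − θ*_i)². A minimal, framework-free sketch follows. The function names and the empirical-Fisher estimate (mean of squared per-sample gradients) are illustrative assumptions rather than the paper's configuration.

```python
from typing import List, Sequence


def fisher_diagonal(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher estimate: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params: Sequence[float], anchor: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """(lam / 2) * sum_i F_i * (theta_i - theta*_i)^2, added to the task loss."""
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))
```

One design difference is worth noting. In EWC, the anchor and the Fisher estimate are typically refreshed at task boundaries, which gradual drift does not supply. The Benna-Fusi chain, by contrast, runs continuously without a snapshot.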

**Figure 24.** Comparison of Q-values and Successor Features (SFs), with and without synaptic consolidation, on the MuJoCo suite under continuous mass changes during training and evaluation. Interestingly, unlike Q-values, applying synaptic consolidation to SFs (purple) yields consistently higher learning efficiency.

**Figure 25.** Comparison of consolidating the parameters of Q-values and SFs with Synaptic Consolidation (SC) in the 3D Slippery Four Rooms environment. (Left) Average episode return. (Right) Number of training steps needed to reach a predetermined good policy; fewer steps is better. Applying SC to the SFs (purple) yields better learning performance overall.

**Figure 26.** Comparison of consolidating the parameters of Q-values and SFs with Synaptic Consolidation on the MuJoCo suite. Interestingly, compared with TD3 (blue), SFs (orange) learn well in Half-Cheetah and Walker but not in Quadruped and Humanoid. This is probably due to the higher complexity of Quadruped and Humanoid, which have larger state and action spaces. Overall, applying SC to the SFs (purple) yields bett…

**Figure 27.** Comparison of consolidating the parameters of Q-values and SFs with Synaptic Consolidation on the MuJoCo suite when embodiments undergo non-periodic mass changes. Interestingly, compared with TD3 (blue), SFs (orange) learn well in Half-Cheetah and Walker but not in Quadruped and Humanoid. This is probably due to the higher complexity of Quadruped and Humanoid, which have larger state and action spaces. U…

**Figure 28.** Comparison of consolidating the parameters of Q-values and SFs with Synaptic Consolidation on the MuJoCo suite when embodiments undergo Ornstein–Uhlenbeck mass changes. In this setting, applying SC to the SFs (purple) only yields better learning performance compared to applying SC to Q-values.

**Figure 29.** Analysis of fast and slow timescale variables in the 3D Slippery Four Rooms environment during training and evaluation. Using synaptic consolidation clearly leads to better learning efficiency, but there is no clear advantage between six and nine consolidation variables.

**Figure 30.** Analysis of fast and slow timescale variables on the MuJoCo suite under continuous mass changes during training and evaluation. Using more consolidation variables (six, eight or nine) yields consistently higher learning efficiency, highlighting the importance of slower-timescale variables.

**Figure 31.** Using cross-attention to recall information from the SF consolidation modules. (a) A high-level schematic of how the cross-attention mechanism is used. (b) The computations for the cross-attention mechanism. We used the reward weight vector w as the query, and the SF consolidation variables except the most plastic one (SF_u2, SF_u3, ..., SF_uK) as keys and values. Because these SFs consolidation variables…
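Figure 31 describes recall through cross-attention. The reward weight vector w is the query, and the SF consolidation variables other than the most plastic one serve as keys and values. Below is a minimal sketch for a single state-action pair. It assumes plain scaled dot-product attention without learned projections; the truncated caption may describe projections that this omits. `recall_by_attention` is a hypothetical name. The returned probabilities correspond to what Figures 8 and 32 to 33 plot.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_by_attention(w: Sequence[float],
                        consolidated_sfs: Sequence[Sequence[float]]
                        ) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention with the reward weights w as the query.

    consolidated_sfs: SF vectors held by the slower consolidation variables
    (u_2..u_K), used as both keys and values. No learned projections are
    applied; that is a simplifying assumption.
    Returns (attention probabilities, recalled SF vector).
    """
    d = len(w)
    scores = [sum(q * k for q, k in zip(w, key)) / math.sqrt(d) for key in consolidated_sfs]
    probs = softmax(scores)
    recalled = [sum(p * v[i] for p, v in zip(probs, consolidated_sfs)) for i in range(d)]
    return probs, recalled


# Usage: recall an SF vector, then form a value estimate via Q = psi^T w.
w = [1.0, 0.5]
probs, psi = recall_by_attention(w, [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
q_value = sum(p * wi for p, wi in zip(psi, w))
```

Attention lets recall draw on slow variables immediately, instead of waiting for values to diffuse back through the chain.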

**Figure 32.** Analysis of all consolidated variables using cross-attention during training in the 3D Slippery Four Rooms environment. The cross-attention probabilities indicate that fast and slow timescale variables were attended to similarly, suggesting nearly equal contribution. This may be due to the sparse reward structure in the 3D Slippery Four Rooms environment, which affects how discriminative the SFs are given t…

**Figure 33.** Analysis of all consolidated variables using cross-attention in the MuJoCo suite under continuous mass changes. Memory recall was performed solely through the cross-attention mechanism, rather than by waiting for information to propagate from slower to faster timescale variables. Unsurprisingly, faster timescale variables were attended to more than slower ones. Notably, Half-Cheetah and Walker benefited f…

**Figure 34.** Learning curves in the MuJoCo suite under continuous mass changes with cross-attention over consolidated variables. Faster timescale variables were generally attended to more strongly than slower ones as shown in…

**Figure 35.** Simple SFs with synaptic consolidation architecture. Simple SFs were adapted from Chua et al. (2024), with TD3 (Lillicrap et al., 2015) as the base model. The synaptic consolidation variables are updated analytically (see Section 4 for more details on the consolidation variables). We swept the task learning rate of the reward weight vector across the values {10⁻⁵, 10⁻⁶, ..., 10⁻¹⁰} when optimizing th…

**Figure 36.** Comparison of training throughput (FPS) for all models in the Slippery Four Rooms environment. Higher FPS reflects more efficient computation.

**Figure 37.** Comparison of training throughput (FPS) for different numbers of SF consolidation variables in the Slippery Four Rooms environment. Higher FPS reflects more efficient computation.

**Figure 38.** Comparison of training throughput (FPS) for all models on the Humanoid embodiment in the MuJoCo environment. Higher FPS reflects more efficient computation.

**Figure 39.** Comparison of training throughput (FPS) for different numbers of SF consolidation variables on the Humanoid embodiment in the MuJoCo environment. Higher FPS reflects more efficient computation.

**Figure 40.** Comparison of PPO (teal) with TD3, SF, and their variants with plasticity preservation or stability enhancement mechanisms under continuous mass changes. (Left) Average episode return over training. (Middle) Area under the curve (AUC) of the return, summarizing overall performance. (Right) Total number of environment samples used during training. While PPO leverages parallelized data collection and uses m…

**Figure 41.** Quantification of slippery dynamics in the 3D Four Rooms environment. (a) Average episode return, (b) minimum number of environment steps required to learn a successful policy, and (c) area under the curve (AUC) of the episode returns. We consider three levels of slippery probability variation: mild (25%), moderate (50%), and severe (100%), where the maximum corresponds to a 0.45 probability that the sele…

**Figure 42.** Quantification of mass changes for the Humanoid embodiment. We consider three levels of mass dynamics variation: mild (25%), moderate (50%), and severe (100%), corresponding to the maximum change allowed before the physical simulation becomes unstable. Across these settings, plasticity-preserving methods (CBP, P-last) are less effective than approaches incorporating synaptic consolidation (SC). Under mode…

**Figure 43.** Quantification of mass changes for the Quadruped embodiment. We consider three levels of mass dynamics variation: mild (25%), moderate (50%), and severe (100%), corresponding to the maximum change allowed before the physical simulation becomes unstable. Across these settings, plasticity-preserving methods (CBP, P-last) are less effective than approaches incorporating synaptic consolidation (SC). Under sev…

**Figure 44.** Quantification of mass changes for the Half-Cheetah embodiment. We consider three levels of mass dynamics variation: mild (25%), moderate (50%), and severe (100%), corresponding to the maximum change allowed before the physical simulation becomes unstable. Across these settings, plasticity-preserving methods (CBP, P-last) are less effective than approaches incorporating synaptic consolidation (SC). Under …

**Figure 45.** Quantification of mass changes for the Walker embodiment. We consider three levels of mass dynamics variation: mild (25%), moderate (50%), and severe (100%), corresponding to the maximum change allowed before the physical simulation becomes unstable. Across these settings, plasticity-preserving methods (CBP, P-last) are less effective than approaches incorporating synaptic consolidation (SC). Under severe…

Original abstract

A hallmark of intelligence is the ability to adapt in non-stationary environments, yet deep Reinforcement Learning (RL) agents often struggle in such settings. Prior studies introduce non-stationarity through abrupt shifts in features or dynamics, whereas real-world environments often evolve gradually through continual drift. This distinction has important implications for the "stability-plasticity dilemma" in RL, as abrupt task changes may demand more plasticity than naturalistic settings. To address this, we modify existing 3D Miniworld and MuJoCo environments to incorporate naturalistic, continual non-stationarity, and use them to examine how stability and adaptation affect performance under continuous environmental change. We find that methods favoring stability, such as synaptic consolidation, outperform approaches focused on plasticity, such as parameter resetting. Motivated by this result, and prior evidence that Successor Features (SFs) reduce interference, we investigate whether SFs are better consolidation targets than Q-values. Across both environments, applying neuro-inspired synaptic consolidation to SFs yields superior performance in continually changing settings. Moreover, consolidation is most effective when SFs are stabilized across multiple timescales, which capture complementary aspects of gradual environmental change. Together, these results suggest that stability is more critical in continual learning when changes are gradual, and that multi-timescale consolidation of predictive representations is an effective approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds multi-timescale consolidation on successor features beats plasticity baselines in gradual-drift versions of Miniworld and MuJoCo, but the abstract gives almost no experimental details so the result is hard to evaluate.

The core observation here is that gradual environmental drift favors stability mechanisms over pure plasticity ones, and that successor features make better consolidation targets than raw Q-values when changes unfold slowly. They modify two standard environments to add continuous non-stationarity instead of the usual abrupt task switches, then compare synaptic consolidation applied to SFs against parameter resetting and single-timescale variants. The claim is that multi-timescale SF consolidation captures complementary aspects of the drift and yields the best performance.

What stands out is the framing: most continual RL work still uses discrete task boundaries, so testing gradual drift is a reasonable shift even if the environments themselves are not brand new. The motivation from the stability-plasticity dilemma is clear and the choice to stabilize predictive representations rather than value estimates follows from earlier SF literature.

The soft spots are obvious from the abstract alone. There are no numbers on runs, variance, statistical tests, or exact baseline implementations. The functional form of the drift (reward, dynamics, or visuals; linear or stochastic) is not described, which matters because any interaction between the drift schedule and the consolidation rule could produce the reported ordering without it generalizing. The stress-test concern about environment artifacts therefore lands; until the methods section shows the drift is naturalistic and the controls rule out implementation confounds, the superiority claim stays provisional.

This is the kind of paper that belongs in a reading group on continual RL if the full experiments hold up, but it is not yet ready for a strong citation. A serious editor should send it to review once the experimental details and ablations are filled in; the question it asks is worth referee time even if the current evidence is thin.

Referee Report

2 major / 1 minor

Summary. The manuscript modifies 3D Miniworld and MuJoCo environments to include gradual, continual non-stationarity and compares synaptic consolidation applied to successor features (SFs) against plasticity-focused baselines such as parameter resetting. It reports that consolidation on SFs, particularly when performed across multiple timescales, yields superior performance under these drifting conditions and concludes that stability is more critical than plasticity for gradual environmental change.

Significance. If the empirical results are robust, the work supplies concrete evidence that predictive representations such as SFs can serve as effective consolidation targets and that multi-timescale stabilization captures complementary aspects of slow environmental drift. This would strengthen the case for stability-oriented mechanisms in continual RL and motivate further investigation of timescale-separated representations.

major comments (2)

[Environment modification] Environment modification section: the functional form, rate, and scope of the introduced drift (whether applied to rewards, transitions, or visual features; linear, sinusoidal, or stochastic) are not specified. Because the central claim that multi-timescale SF consolidation outperforms plasticity baselines rests on these environments faithfully instantiating naturalistic gradual change, the absence of this detail leaves open the possibility that observed gains are artifacts of the particular drift implementation.
[Experimental results] Experimental results: the abstract asserts empirical superiority of consolidation on SFs, yet the manuscript supplies no information on the number of independent runs, statistical tests performed, or controls for confounding implementation choices. Without these, the reported performance differences cannot be assessed as reliable support for the stability-plasticity claim.

minor comments (1)

[Method] Notation for the fast and slow successor-feature components is introduced without an explicit equation relating them to the standard SF definition; adding this would clarify the multi-timescale construction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify areas where additional detail will strengthen the manuscript's clarity and support for its claims. We address each point below and will revise accordingly.

Referee: [Environment modification] Environment modification section: the functional form, rate, and scope of the introduced drift (whether applied to rewards, transitions, or visual features; linear, sinusoidal, or stochastic) are not specified. Because the central claim that multi-timescale SF consolidation outperforms plasticity baselines rests on these environments faithfully instantiating naturalistic gradual change, the absence of this detail leaves open the possibility that observed gains are artifacts of the particular drift implementation.

Authors: We agree that precise specification of the drift is essential for reproducibility and to substantiate that the environments capture gradual naturalistic change. In the revised manuscript we will expand the Environment modification section with the exact functional forms, rates, scopes (rewards, transitions, visual features), and any stochastic components used in both the 3D Miniworld and MuJoCo setups. revision: yes
Referee: [Experimental results] Experimental results: the abstract asserts empirical superiority of consolidation on SFs, yet the manuscript supplies no information on the number of independent runs, statistical tests performed, or controls for confounding implementation choices. Without these, the reported performance differences cannot be assessed as reliable support for the stability-plasticity claim.

Authors: We acknowledge that reporting the number of independent runs, statistical tests, and controls is required to evaluate reliability. The revised manuscript will include these details (number of random seeds, statistical tests with p-values, and controls for implementation choices) in the Experimental results section and figure captions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; paper is purely empirical.

The manuscript contains no derivations, equations, or fitted parameters presented as predictions. All claims rest on experimental comparisons of consolidation methods versus baselines in modified environments. These results are externally falsifiable through replication and do not reduce to self-definition, self-citation load-bearing, or renaming of known results. The central premise (superiority of multi-timescale SF consolidation under gradual drift) is supported by performance metrics rather than by construction from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details are absent.

pith-pipeline@v0.9.1-grok · 5760 in / 1022 out tokens · 24739 ms · 2026-06-29T22:13:55.392498+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 17 canonical work pages · 9 internal anchors

[1] Abbas, Z., Zhao, R., Modayil, J., White, A., and Machado, M. C. Loss of plasticity in continual deep reinforcement learning. In Conference on Lifelong Learning Agents, pp. 620–636. PMLR, 2023.
[2] Abel, D., Barreto, A., Van Roy, B., Precup, D., van Hasselt, H. P., and Singh, S. A definition of continual reinforcement learning. Advances in Neural Information Processing Systems, 36:50377–50407, 2023.
[3] Anand, N. and Precup, D. Prediction and control in continual reinforcement learning. Advances in Neural Information Processing Systems, 36:63779–63817, 2023.
[4] Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., van Hasselt, H. P., and Silver, D. Successor features for transfer in reinforcement learning. Advances in Neural Information Processing Systems, 30, 2017.
[5] Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
[6] Ben-Iwhiwhu, E., Nath, S., Pilly, P. K., Kolouri, S., and Soltoggio, A. Lifelong reinforcement learning with modulating masks. arXiv preprint arXiv:2212.11110, 2022.
[7] Benna, M. K. and Fusi, S. Computational principles of synaptic memory consolidation. Nature Neuroscience, 19(12):1697–1706, 2016.
[8] Biewald, L. Experiment tracking with weights and biases, 2020. URL https://www.wandb.com/. Software available from wandb.com.
[9] Borsa, D., Barreto, A., Quan, J., Mankowitz, D., Munos, R., Van Hasselt, H., Silver, D., and Schaul, T. Universal successor features approximators. arXiv preprint arXiv:1812.07626, 2018.
[10] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q. JAX: composable transformations of Python+NumPy programs, 2018. URL http://github.com/google/jax.
[11] Caccia, M., Mueller, J., Kim, T., Charlin, L., and Fakoor, R. Task-agnostic continual reinforcement learning: Gaining insights and overcoming challenges. In Conference on Lifelong Learning Agents, pp. 89–119. PMLR, 2023.
[12] Chevalier-Boisvert, M., Dai, B., Towers, M., de Lazcano, R., Willems, L., Lahlou, S., Pal, S., Castro, P. S., and Terry, J. Minigrid & Miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
[13] Chua, R., Ghosh, A., Kaplanis, C., Richards, B. A., and Precup, D. Learning successor features the simple way. Advances in Neural Information Processing Systems, 37:49957–50030, 2024.
[14] Dohare, S., Hernandez-Garcia, J. F., Lan, Q., Rahman, P., Mahmood, A. R., and Sutton, R. S. Loss of plasticity in deep continual learning. Nature, 632(8026):768–774, 2024.
[15] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[16] French, R. M. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, 3(4):128–135, 1999.
[17] Fujimoto, S., van Hoof, H., and Meger, D. Addressing function approximation error in actor-critic methods, 2018.
[18] Godwin*, J., Keck*, T., Battaglia, P., Bapst, V., Kipf, T., Li, Y., Stachenfeld, K., Veličković, P., and Sanchez-Gonzalez, A. Jraph: A library for graph neural networks in JAX, 2020. URL http://github.com/deepmind/jraph.
[19] Heek, J., Levskaya, A., Oliver, A., Ritter, M., Rondepierre, B., Steiner, A., and van Zee, M. Flax: A neural network library and ecosystem for JAX, 2024. URL http://github.com/google/flax.
[20] Hunter, J. D. Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007. doi:10.1109/MCSE.2007.55.
[21] Kaplanis, C., Shanahan, M., and Clopath, C. Continual reinforcement learning with complex synapses. In International Conference on Machine Learning, pp. 2497–2506. PMLR, 2018.
[22] Kaplanis, C., Shanahan, M., and Clopath, C. Policy consolidation for continual reinforcement learning. arXiv preprint arXiv:1902.00255, 2019.
[23] Kaplanis, C., Clopath, C., and Shanahan, M. Continual reinforcement learning with multi-timescale replay, 2020. DOI: https://doi.org/10.48550/arXiv.
[24] Khetarpal, K., Riemer, M., Rish, I., and Precup, D. Towards continual reinforcement learning: A review and perspectives. Journal of Artificial Intelligence Research, 75:1401–1476, 2022.
[25] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[26] Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[27] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
[28] Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., and Willing, C. Jupyter notebooks – a publishing format for reproducible computational workflows. In Loizides, F. and Schmidt, B. (eds.), Positioning and Power in Academic Publishing:..., 2016.
[29] Lee, H., Cho, H., Kim, H., Kim, D., Min, D., Choo, J., and Lyle, C. Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks. arXiv preprint arXiv:2406.02596, 2024.
[30] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
[31] Lyle, C., Zheng, Z., Khetarpal, K., van Hasselt, H., Pascanu, R., Martens, J., and Dabney, W. Disentangling the causes of plasticity loss in neural networks. arXiv preprint arXiv:2402.18762, 2024.
[32] McClelland, J. L., McNaughton, B. L., and O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419, 1995.
[33] McCloskey, M. and Cohen, N. J. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pp. 109–165. Elsevier, 1989.
[34] Nikishin, E., Schwarzer, M., D'Oro, P., Bacon, P.-L., and Courville, A. The primacy bias in deep reinforcement learning. In International Conference on Machine Learning, pp. 16828–16847. PMLR, 2022.
[35] Nikishin, E., Oh, J., Ostrovski, G., Lyle, C., Pascanu, R., Dabney, W., and Barreto, A. Deep reinforcement learning with plasticity injection. Advances in Neural Information Processing Systems, 36:37142–37159, 2023.
[36] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
[37] Powers, S., Xing, E., and Gupta, A. Self-activating neural ensembles for continual reinforcement learning. In Conference on Lifelong Learning Agents, pp. 683–704. PMLR, 2022.
[38] Riemer, M., Cases, I., Ajemian, R., Liu, M., Rish, I., Tu, Y., and Tesauro, G. Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910, 2018.
[39] Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., and Wayne, G. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, 2019.
[40] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[41] Schwarz, J., Czarnecki, W., Luketina, J., Grabska-Barwinska, A., Teh, Y. W., Pascanu, R., and Hadsell, R. Progress & compress: A scalable framework for continual learning. In International Conference on Machine Learning, pp. 4528–4537. PMLR, 2018.
[42] Silver, D. and Sutton, R. S. Welcome to the era of experience. Google AI, 1, 2025.
[43] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. Deterministic policy gradient algorithms. In International Conference on Machine Learning, pp. 387–395. PMLR, 2014.
[44] Sokar, G., Agarwal, R., Castro, P. S., and Evci, U. The dormant neuron phenomenon in deep reinforcement learning. In International Conference on Machine Learning, pp. 32145–32168. PMLR, 2023.
[45] Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT Press, 2018.
[46] Todorov, E., Erez, T., and Tassa, Y. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE, 2012.
[47] Tunyasuvunakool, S., Muldal, A., Doron, Y., Liu, S., Bohez, S., Merel, J., Erez, T., Lillicrap, T., Heess, N., and Tassa, Y. dm_control: Software and tasks for continuous control. Software Impacts, 6:100022, 2020.
[48] Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[49] Van Rossum, G. and Drake, F. L. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA, 2009. ISBN 1441412697.
[50] Waskom, M. L. seaborn: statistical data visualization. Journal of Open Source Software, 6(60):3021, 2021. doi:10.21105/joss.03021. URL https://doi.org/10.21105/joss.03021.
[51] Xie, A., Harrison, J., and Finn, C. Deep reinforcement learning amidst lifelong non-stationarity. arXiv preprint arXiv:2006.10701, 2020.
[52] Yadan, O. Hydra - a framework for elegantly configuring complex applications. Github, 2019. URL https://github.com/facebookresearch/hydra.
[53] Yarats, D., Fergus, R., Lazaric, A., and Pinto, L. Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021.
[54] Zenke, F., Poole, B., and Ganguli, S. Continual learning through synaptic intelligence. In International Conference on Machine Learning, pp. 3987–3995. PMLR, 2017.
[55]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

[1] [1]

Abbas, Z., Zhao, R., Modayil, J., White, A., and Machado, M. C. Loss of plasticity in continual deep reinforcement learning. In Conference on lifelong learning agents, pp.\ 620--636. PMLR, 2023

2023

[2] [2]

P., and Singh, S

Abel, D., Barreto, A., Van Roy, B., Precup, D., van Hasselt, H. P., and Singh, S. A definition of continual reinforcement learning. Advances in Neural Information Processing Systems, 36: 0 50377--50407, 2023

2023

[3] [3]

and Precup, D

Anand, N. and Precup, D. Prediction and control in continual reinforcement learning. Advances in Neural Information Processing Systems, 36: 0 63779--63817, 2023

2023

[4] [4]

J., Schaul, T., van Hasselt, H

Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., van Hasselt, H. P., and Silver, D. Successor features for transfer in reinforcement learning. Advances in neural information processing systems, 30, 2017

2017

[5] [5]

G., Naddaf, Y., Veness, J., and Bowling, M

Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The arcade learning environment: An evaluation platform for general agents. Journal of artificial intelligence research, 47: 0 253--279, 2013

2013

[6] [6]

K., Kolouri, S., and Soltoggio, A

Ben-Iwhiwhu, E., Nath, S., Pilly, P. K., Kolouri, S., and Soltoggio, A. Lifelong reinforcement learning with modulating masks. arXiv preprint arXiv:2212.11110, 2022

work page arXiv 2022

[7] [7]

Benna, M. K. and Fusi, S. Computational principles of synaptic memory consolidation. Nature neuroscience, 19 0 (12): 0 1697--1706, 2016

2016

[8] [8]

Experiment tracking with weights and biases, 2020

Biewald, L. Experiment tracking with weights and biases, 2020. URL https://www.wandb.com/. Software available from wandb.com

2020

[9] [9]

Universal Successor Features Approximators

Borsa, D., Barreto, A., Quan, J., Mankowitz, D., Munos, R., Van Hasselt, H., Silver, D., and Schaul, T. Universal successor features approximators. arXiv preprint arXiv:1812.07626, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., Vander P las, J., Wanderman- M ilne, S., and Zhang, Q

Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., Vander P las, J., Wanderman- M ilne, S., and Zhang, Q. JAX : composable transformations of P ython+ N um P y programs, 2018. URL http://github.com/google/jax

2018

[11] [11]

Task-agnostic continual reinforcement learning: Gaining insights and overcoming challenges

Caccia, M., Mueller, J., Kim, T., Charlin, L., and Fakoor, R. Task-agnostic continual reinforcement learning: Gaining insights and overcoming challenges. In Conference on Lifelong Learning Agents, pp.\ 89--119. PMLR, 2023

2023

[12] [12]

S., and Terry, J

Chevalier-Boisvert, M., Dai, B., Towers, M., de Lazcano, R., Willems, L., Lahlou, S., Pal, S., Castro, P. S., and Terry, J. Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023

work page arXiv 2023

[13] [13]

A., and Precup, D

Chua, R., Ghosh, A., Kaplanis, C., Richards, B. A., and Precup, D. Learning successor features the simple way. Advances in Neural Information Processing Systems, 37: 0 49957--50030, 2024

2024

[14] [14]

F., Lan, Q., Rahman, P., Mahmood, A

Dohare, S., Hernandez-Garcia, J. F., Lan, Q., Rahman, P., Mahmood, A. R., and Sutton, R. S. Loss of plasticity in deep continual learning. Nature, 632 0 (8026): 0 768--774, 2024

2024

[15] [15]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010

[16] [16]

French, R. M. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3 0 (4): 0 128--135, 1999

1999

[17] [17]

Addressing function approximation error in actor-critic methods, 2018

Fujimoto, S., van Hoof, H., and Meger, D. Addressing function approximation error in actor-critic methods, 2018

2018

[18] [18]

J raph: A library for graph neural networks in jax., 2020

Godwin*, J., Keck*, T., Battaglia, P., Bapst, V., Kipf, T., Li, Y., Stachenfeld, K., Veli c kovi\' c , P., and Sanchez-Gonzalez, A. J raph: A library for graph neural networks in jax., 2020. URL http://github.com/deepmind/jraph

2020

[19] [19]

F lax: A neural network library and ecosystem for JAX , 2024

Heek, J., Levskaya, A., Oliver, A., Ritter, M., Rondepierre, B., Steiner, A., and van Z ee, M. F lax: A neural network library and ecosystem for JAX , 2024. URL http://github.com/google/flax

2024

[20] [20]

Hunter, J. D. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 9 0 (3): 0 90--95, 2007. doi:10.1109/MCSE.2007.55

work page doi:10.1109/mcse.2007.55 2007

[21] [21]

Continual reinforcement learning with complex synapses

Kaplanis, C., Shanahan, M., and Clopath, C. Continual reinforcement learning with complex synapses. In International Conference on Machine Learning, pp.\ 2497--2506. PMLR, 2018

2018

[22] [22]

Policy Consolidation for Continual Reinforcement Learning

Kaplanis, C., Shanahan, M., and Clopath, C. Policy consolidation for continual reinforcement learning. arXiv preprint arXiv:1902.00255, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[23] [23]

Zaletel, and Joel E

Kaplanis, C., Clopath, C., and Shanahan, M. Continual reinforcement learning with multi-timescale replay (2020). DOI: https://doi. org/10.48550/arXiv, 2020

work page internal anchor Pith review doi:10.48550/arxiv 2020

[24] [24]

Towards continual reinforcement learning: A review and perspectives

Khetarpal, K., Riemer, M., Rish, I., and Precup, D. Towards continual reinforcement learning: A review and perspectives. Journal of Artificial Intelligence Research, 75: 0 1401--1476, 2022

2022

[25] [25]

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[26] [26]

Kingma, D. P. and Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[27] [27]

A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114 0 (13): 0 3521--3526, 2017

2017

[28] [28]

Jupyter notebooks -- a publishing format for reproducible computational workflows

Kluyver, T., Ragan-Kelley, B., P \'e rez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., and Willing, C. Jupyter notebooks -- a publishing format for reproducible computational workflows. In Loizides, F. and Schmidt, B. (eds.), Positioning and Power in Academic Publishing:...

2016

[29] [29]

Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks

Lee, H., Cho, H., Kim, H., Kim, D., Min, D., Choo, J., and Lyle, C. Slow and steady wins the race: Maintaining plasticity with hare and tortoise networks. arXiv preprint arXiv:2406.02596, 2024

work page arXiv 2024

[30] [30]

Continuous control with deep reinforcement learning

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[31] [31]

Disentangling the causes of plasticity loss in neural networks

Lyle, C., Zheng, Z., Khetarpal, K., van Hasselt, H., Pascanu, R., Martens, J., and Dabney, W. Disentangling the causes of plasticity loss in neural networks. arXiv preprint arXiv:2402.18762, 2024

work page arXiv 2024

[32] [32]

L., McNaughton, B

McClelland, J. L., McNaughton, B. L., and O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological review, 102 0 (3): 0 419, 1995

1995

[33] [33]

and Cohen, N

McCloskey, M. and Cohen, N. J. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of learning and motivation, volume 24, pp.\ 109--165. Elsevier, 1989

1989

[34] [34]

The primacy bias in deep reinforcement learning

Nikishin, E., Schwarzer, M., D’Oro, P., Bacon, P.-L., and Courville, A. The primacy bias in deep reinforcement learning. In International conference on machine learning, pp.\ 16828--16847. PMLR, 2022

2022

[35] [35]

Deep reinforcement learning with plasticity injection

Nikishin, E., Oh, J., Ostrovski, G., Lyle, C., Pascanu, R., Dabney, W., and Barreto, A. Deep reinforcement learning with plasticity injection. Advances in Neural Information Processing Systems, 36: 0 37142--37159, 2023

2023

[36] [36]

Pytorch: An imperative style, high-performance deep learning library

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019

2019

[37] [37]

Self-activating neural ensembles for continual reinforcement learning

Powers, S., Xing, E., and Gupta, A. Self-activating neural ensembles for continual reinforcement learning. In Conference on Lifelong Learning Agents, pp.\ 683--704. PMLR, 2022

2022

[38] [38]

Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference

Riemer, M., Cases, I., Ajemian, R., Liu, M., Rish, I., Tu, Y., and Tesauro, G. Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[39] [39]

Experience replay for continual learning

Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., and Wayne, G. Experience replay for continual learning. Advances in neural information processing systems, 32, 2019

2019

[40] [40]

Proximal Policy Optimization Algorithms

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[41] [41]

W., Pascanu, R., and Hadsell, R

Schwarz, J., Czarnecki, W., Luketina, J., Grabska-Barwinska, A., Teh, Y. W., Pascanu, R., and Hadsell, R. Progress & compress: A scalable framework for continual learning. In International conference on machine learning, pp.\ 4528--4537. PMLR, 2018

2018

[42] [42]

and Sutton, R

Silver, D. and Sutton, R. S. Welcome to the era of experience. Google AI, 1, 2025

2025

[43] [43]

Deterministic policy gradient algorithms

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. Deterministic policy gradient algorithms. In International conference on machine learning, pp.\ 387--395. Pmlr, 2014

2014

[44] [44]

S., and Evci, U

Sokar, G., Agarwal, R., Castro, P. S., and Evci, U. The dormant neuron phenomenon in deep reinforcement learning. In International Conference on Machine Learning, pp.\ 32145--32168. PMLR, 2023

2023

[45] [45]

Sutton, R. S. and Barto, A. G. Reinforcement learning: An introduction. MIT press, 2018

2018

[46] [46]

Mujoco: A physics engine for model-based control

Todorov, E., Erez, T., and Tassa, Y. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ international conference on intelligent robots and systems, pp.\ 5026--5033. IEEE, 2012

2012

[47] [47]

dm\_control: Software and tasks for continuous control

Tunyasuvunakool, S., Muldal, A., Doron, Y., Liu, S., Bohez, S., Merel, J., Erez, T., Lillicrap, T., Heess, N., and Tassa, Y. dm\_control: Software and tasks for continuous control. Software Impacts, 6: 0 100022, 2020

2020

[48] [48]

Deep reinforcement learning with double q-learning

Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence, 2016

2016

[49] [49]

and Drake, F

Van Rossum, G. and Drake, F. L. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA, 2009. ISBN 1441412697

2009

[50] [50]

Waskom, M. L. seaborn: statistical data visualization. Journal of Open Source Software, 6 0 (60): 0 3021, 2021. doi:10.21105/joss.03021. URL https://doi.org/10.21105/joss.03021

work page doi:10.21105/joss.03021 2021

[51] [51]

Deep reinforcement learning amidst lifelong non-stationarity

Xie, A., Harrison, J., and Finn, C. Deep reinforcement learning amidst lifelong non-stationarity. arXiv preprint arXiv:2006.10701, 2020

work page arXiv 2006

[52] [52]

Hydra - a framework for elegantly configuring complex applications

Yadan, O. Hydra - a framework for elegantly configuring complex applications. Github, 2019. URL https://github.com/facebookresearch/hydra

2019

[53] [53]

Mastering visual continuous control: Improved data-augmented reinforcement learning

Yarats, D., Fergus, R., Lazaric, A., and Pinto, L. Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv preprint arXiv:2107.09645, 2021

work page arXiv 2021

[54] [54]

Continual learning through synaptic intelligence

Zenke, F., Poole, B., and Ganguli, S. Continual learning through synaptic intelligence. In International conference on machine learning, pp.\ 3987--3995. PMLR, 2017

2017

[55] [55]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...