
arxiv: 2606.11797 · v1 · pith:QMGXYHSXnew · submitted 2026-06-10 · 💻 cs.LG

Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning

classification 💻 cs.LG
keywords deep · drift · forgetting · non-stationary · behavior · changing · decay · effects

Studies on rodents such as mice have shown that these animals can adapt their behavior to changing environment parameters ("drift") even when no information about the change is provided (uncertainty), a behavior that can be modeled by forgetting mechanisms. Non-stationary Reinforcement Learning (NSRL) adapts state-of-the-art RL methods to changing environments; these methods, however, usually require (partially) perfect information about the drift, such as "task IDs" or "context". To mitigate the effects of drift, this work develops Space-sampled Value Decay, a simple yet effective explicit forgetting mechanism for value-based deep RL architectures. In particular, we demonstrate and discuss positive effects, but also limitations, in achieved returns for modifications of Deep Q-networks (DQN) and Soft Actor-Critic (SAC) evaluated on non-stationary environments.
