Forager is a lightweight partially-observable continual RL environment that exposes loss of plasticity in current agents and highlights the value of state construction for ongoing learning.
International Conference on Learning Representations , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 3years
2026 3verdicts
UNVERDICTED 3roles
method 1polarities
use method 1representative citing papers
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.
citing papers explorer
-
Forager: a lightweight testbed for continual learning with partial observability in RL
Forager is a lightweight partially-observable continual RL environment that exposes loss of plasticity in current agents and highlights the value of state construction for ongoing learning.
-
Behavior-Consistent Deep Reinforcement Learning
QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
-
SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
SPHERE applies a Parseval penalty to MoE policies in continual RL to maintain spectral plasticity, yielding 133% and 50% higher average success on MetaWorld and HumanoidBench versus unregularized MoE baselines.