Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

Ali H. Sayed; Malek Khammassi; Michael Muehlebach; Onno Eberhard; Yike Zhao

arxiv: 2605.31261 · v1 · pith:ML6GFPHTnew · submitted 2026-05-29 · cs.LG · cs.AI · stat.ML


Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach

classification: cs.LG, cs.AI, stat.ML

keywords: linear, learning, recurrent, reinforcement, deterministic, filters, matrix, memory


The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning; and (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
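To make claim (i) concrete, here is a minimal sketch of why a linear recurrence can track belief logits exactly. It is not the paper's construction, and it rests on two assumptions that the abstract does not state: the deterministic transition is a permutation of the states, and every emission probability is strictly positive. Under those assumptions, taking the log of the Bayes update gives z' = P z + log O[:, o], and softmax(z) recovers the belief. All names (`perm`, `emission`, `max_gap`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bayes_filter_step(belief, perm, emission, obs):
    """Exact HMM belief update when state s moves deterministically to perm[s]."""
    n = len(belief)
    predicted = [0.0] * n
    for s in range(n):
        predicted[perm[s]] += belief[s]
    unnorm = [predicted[s] * emission[s][obs] for s in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def linear_filter_step(logits, perm, log_emission, obs):
    """Linear recurrence on unnormalized logits: z' = P z + log_emission[:, obs]."""
    n = len(logits)
    new_logits = [0.0] * n
    for s in range(n):
        target = perm[s]
        new_logits[target] = logits[s] + log_emission[target][obs]
    return new_logits


def max_gap(n_states=4, n_obs=3, steps=50, seed=0):
    """Largest absolute gap between the Bayes belief and softmax of the linear state."""
    rng = random.Random(seed)

    # Assumption: the deterministic transition is a permutation of the states.
    perm = list(range(n_states))
    rng.shuffle(perm)

    # Assumption: strictly positive emission probabilities, so all logs are finite.
    emission = []
    for _ in range(n_states):
        row = [rng.uniform(0.1, 1.0) for _ in range(n_obs)]
        total = sum(row)
        emission.append([p / total for p in row])
    log_emission = [[math.log(p) for p in row] for row in emission]

    belief = [1.0 / n_states] * n_states
    logits = [math.log(b) for b in belief]

    gap = 0.0
    for _ in range(steps):
        obs = rng.randrange(n_obs)  # arbitrary observation sequence
        belief = bayes_filter_step(belief, perm, emission, obs)
        logits = linear_filter_step(logits, perm, log_emission, obs)
        decoded = softmax(logits)
        gap = max(gap, max(abs(a - b) for a, b in zip(belief, decoded)))
    return gap
```

Calling `max_gap()` compares the two filters over a random observation sequence; the gap should sit at floating-point rounding level. The linear state is never normalized, since softmax is invariant to adding a constant to every logit, which is what lets the recurrence stay linear.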
