https://arxiv.org/pdf/1901.11275.pdf A Theory of regularized Markov decision processes

Matthieu Geist, Bruno Scherrer, Olivier Pietquin · 2019 · cs.LG · arXiv 1901.11275

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.
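
To make the role of the Legendre-Fenchel transform concrete, here is a minimal sketch for a single state. It assumes the regularizer is the negative entropy with temperature 1, one of the entropy-based cases the abstract alludes to. Under that assumption the transform of the regularizer at a vector of action values is the log-sum-exp, and the maximizing policy is the softmax. The function names and the numbers in `q_values` are hypothetical, chosen only for illustration; this is not code from the paper.

```python
import math

def logsumexp(q):
    """Legendre-Fenchel transform of the negative entropy, evaluated at q."""
    m = max(q)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in q))

def softmax(q):
    """Gradient of logsumexp: the policy that attains the maximum."""
    m = max(q)
    weights = [math.exp(x - m) for x in q]
    total = sum(weights)
    return [w / total for w in weights]

def regularized_objective(pi, q):
    """<pi, q> - Omega(pi), with Omega(pi) = sum_a pi(a) * log pi(a)."""
    expected_q = sum(p * x for p, x in zip(pi, q))
    neg_entropy = sum(p * math.log(p) for p in pi if p > 0)
    return expected_q - neg_entropy

# Hypothetical action values for one state (assumption, not from the paper).
q_values = [1.0, 2.0, 0.5]

pi_star = softmax(q_values)
conjugate = logsumexp(q_values)

# The softmax policy attains the conjugate value; any other policy falls short.
uniform = [1.0 / len(q_values)] * len(q_values)
print(regularized_objective(pi_star, q_values), conjugate)
print(regularized_objective(uniform, q_values) < conjugate)
```

Applying this maximization state by state to the usual one-step lookahead values is what turns a standard Bellman backup into a regularized one in this entropy-regularized special case.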

representative citing papers

Planning in entropy-regularized Markov decision processes and games

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

SmoothCruiser achieves O~(1/epsilon^4) problem-independent sample complexity for value estimation in entropy-regularized MDPs and games via a generative model.

Entropic Regularization of Markov Decision Processes

cs.LG · 2019-07-06 · unverdicted · novelty 6.0

Using alpha-divergences for entropic regularization in MDPs unifies actor-critic architectures via closed-form policy improvement and provides asymptotic analysis on standard RL problems.
