Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.
arXiv preprint arXiv:2402.09992
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.
citing papers explorer
- Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.
- DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.