Derives an SDE describing the infinitesimal change in the state distribution at each gradient step for neural actor-critic RL in continuous environments, under a vanishing learning rate and in the infinite-width limit.
Value iteration in continuous actions, states and time
2 Pith papers cite this work. Polarity classification is still indexing.
Representative citing papers
Policy iteration converges for entropy-regularized stochastic control via novel Hölder-Sobolev estimates yielding uniform bounds on value functions.
citing papers explorer
- Convergence of Policy Iteration for Entropy-Regularized Stochastic Control Problems