2 Pith papers cite this work. Polarity classification is still indexing.
fields
math.OC 2
verdicts
UNVERDICTED 2
representative citing papers
Entropy-regularized stochastic games are defined with proofs of value existence for N-stage and discounted cases, sufficiency of Markovian and stationary strategies, and convex optimization algorithms for computation.
citing papers explorer
- Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
  Q-value iteration enters an invariant tube around Q* plus the all-ones vector in finite time, with distance decaying at rate given by the joint spectral radius of the transverse projected switching family, which can be strictly faster than the discount factor.
- Entropy-Regularized Stochastic Games
  Entropy-regularized stochastic games are defined with proofs of value existence for N-stage and discounted cases, sufficiency of Markovian and stationary strategies, and convex optimization algorithms for computation.