Q-value iteration enters an invariant tube around Q* plus the all-ones vector in finite time, with distance decaying at rate given by the joint spectral radius of the transverse projected switching family, which can be strictly faster than the discount factor.
Thus, after finite-time identification of X*, the transverse component admits exponential upper bounds at any rate larger than the JSR ρ̄* of the restricted optimal family.
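A small numerical sketch can make the two claims above concrete: finite-time identification of the greedy policy, and decay of the distance to Q* measured modulo the all-ones vector. Everything in it is an illustrative assumption rather than the paper's construction. The MDP is random and hypothetical, the span seminorm stands in for the transverse distance, Q* is approximated by a late iterate, and the joint spectral radius itself is not computed.

```python
import random


def random_mdp(n_states, n_actions, seed):
    """Hypothetical random MDP: P[s][a] is a distribution over next states."""
    rng = random.Random(seed)
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R


def q_value_iteration(P, R, gamma, iters):
    """Synchronous Q-value iteration from Q = 0; returns all iterates."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    history = [Q]
    for _ in range(iters):
        V = [max(row) for row in Q]  # greedy state values
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        history.append(Q)
    return history


def span_distance(Q, Q_ref):
    """Span seminorm of Q - Q_ref; zero when they differ by a constant shift."""
    diffs = [q - r for row, ref in zip(Q, Q_ref) for q, r in zip(row, ref)]
    return max(diffs) - min(diffs)


def greedy(Q):
    """Greedy action per state (ties broken by lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]


gamma = 0.9
P, R = random_mdp(4, 2, seed=0)
history = q_value_iteration(P, R, gamma, 400)
Q_star = history[-1]          # numerical proxy for Q*, not the exact fixed point
pi_star = greedy(Q_star)

# First iteration after which the greedy policy never changes again.
mismatches = [k for k, Q in enumerate(history) if greedy(Q) != pi_star]
ident_step = mismatches[-1] + 1 if mismatches else 0

# Per-step contraction ratios of the span distance (bounded by gamma).
dists = [span_distance(Q, Q_star) for Q in history[:60]]
ratios = [b / a for a, b in zip(dists, dists[1:]) if a > 1e-12]

print("policy identified at iteration", ident_step)
print("largest observed ratio:", max(ratios), "vs gamma =", gamma)
```

The span seminorm always contracts by at least the discount factor per step, so the ratios printed here never exceed gamma up to rounding. Ratios well below gamma on a given instance would be consistent with the faster transverse rate the summary describes, but this sketch does not verify that rate.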
Citing papers: 1 (field: math.OC; year: 2026; verdict: UNVERDICTED)
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration