Switching-Geometry Analysis of Deflated Q-Value Iteration
Pith reviewed 2026-05-20 22:01 UTC · model grok-4.3
The pith
Deflated Q-value iteration admits a convergence rate given by a joint spectral radius that can be strictly smaller than the discount factor.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The standard Q-VI switching system has joint spectral radius exactly gamma because all its subsystems share the all-ones vector as an invariant direction. Quotienting out this direction yields a projected switching system whose joint spectral radius governs the relevant error dynamics and may be strictly smaller than gamma. The all-ones residual correction is equivalent to subtracting a scalar multiple of the all-ones vector from each ordinary Q-VI iterate, leaving both the projected trajectory and the induced greedy-policy sequence unchanged.
What carries the argument
The projected switching system obtained by quotienting the standard Q-VI model by its all-ones invariant direction.
If this is right
- The convergence rate of deflated Q-VI is characterized by the joint spectral radius of the projected system.
- This rate can be strictly smaller than the discount factor, yielding a sharper iteration-complexity bound.
- The sequence of greedy policies generated by deflated Q-VI is identical to the sequence generated by ordinary Q-VI started at the same point.
- The only practical benefit of deflation is a more precise geometric description of convergence after the redundant all-ones component is removed.
Where Pith is reading between the lines
- The same quotient-space reduction may apply to other value-iteration variants whose linear maps share an obvious invariant direction.
- Numerical computation of the projected joint spectral radius on small MDPs could produce instance-specific convergence certificates that are tighter than the uniform gamma bound.
- The equivalence result suggests that any recentering operation preserving the greedy policy map can be analyzed by an analogous projection.
Load-bearing premise
Every admissible subsystem of the standard Q-VI switching model leaves the all-ones vector invariant, forcing the ambient joint spectral radius to equal gamma.
What would settle it
For a concrete finite MDP, compute or bound the joint spectral radius of the projected switching matrices and check whether the norm of the deflated error vector contracts at that rate rather than at gamma.
Figures
read the original abstract
This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision processes. It shows that the standard Q-VI switching system has JSR exactly equal to the discount factor γ because every admissible mode matrix shares the all-ones vector as an invariant direction. Passing to the quotient space orthogonal to this direction yields a projected switching system whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The deflation operation is proven equivalent to a scalar recentering of standard Q-VI, which leaves the greedy-policy sequence unchanged.
Significance. If the central derivations hold, the manuscript supplies the first JSR-based convergence analysis of deflated Q-VI and a geometrically sharper rate characterization than the ambient γ-bound. The equivalence result clarifies that deflation improves the description of convergence geometry without altering the induced decision-making problem. This switching-system perspective on rank-one corrections could prove useful for refined convergence analysis in approximate dynamic programming and policy optimization.
major comments (1)
- [standard Q-VI switching system model] The section introducing the standard Q-VI switching system model: the claim that all admissible subsystems share the all-ones vector as an invariant direction (forcing ambient JSR exactly γ) is load-bearing for the ambient-versus-projected distinction. While the row-stochastic structure induced by the Bellman operator is standard, the manuscript should explicitly verify that this invariance holds uniformly over all possible per-state greedy action selections without further restrictions on the MDP or action sets.
minor comments (3)
- [quotient reduction] The definition of the projected norm on the quotient space R^n / span{1} should be stated explicitly (including the choice of representative or inner product) to make the JSR comparison fully rigorous.
- Consider adding a small numerical example (e.g., a two-state MDP) that computes both the ambient JSR and the quotient JSR to illustrate that the latter can indeed be strictly smaller than γ.
- The abstract states that proofs exist for the invariance property, quotient reduction, and equivalence to scalar recentering; ensure each proof is self-contained with clearly labeled assumptions.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the recommendation of minor revision. The single major comment concerns the need for an explicit verification that the all-ones invariance holds uniformly across all admissible modes. We address the point directly below and will revise the manuscript to incorporate the requested clarification.
read point-by-point responses
-
Referee: The section introducing the standard Q-VI switching system model: the claim that all admissible subsystems share the all-ones vector as an invariant direction (forcing ambient JSR exactly γ) is load-bearing for the ambient-versus-projected distinction. While the row-stochastic structure induced by the Bellman operator is standard, the manuscript should explicitly verify that this invariance holds uniformly over all possible per-state greedy action selections without further restrictions on the MDP or action sets.
Authors: We agree that an explicit verification is desirable for clarity. In the Q-VI switching model each mode corresponds to a choice, for every state, of which action realizes the maximum. The associated linear operator acts componentwise as L(Q)(s,a) = γ max_{a' ∈ A(s')} Q(s',a'), where s' is the successor state reached from (s,a). Substituting the all-ones vector yields max = 1 for every component, independently of which action is selected as the argmax. Hence every admissible mode matrix satisfies M 1 = 1 (or, after scaling by γ, eigenvalue γ), with no further restrictions on the MDP, transition kernel, or action sets required. We will add a short lemma or remark in the section on the standard switching-system model that records this argument verbatim. revision: yes
Circularity Check
No significant circularity identified
full rationale
The derivation rests on standard facts about joint spectral radii of switching systems whose modes share a common eigenvector (here the all-ones vector, induced by the row-stochastic structure of the Bellman optimality operator). The claim that ambient JSR equals gamma follows directly from this invariant direction and is not fitted or redefined inside the paper. The quotient-space projection and its potentially smaller JSR are obtained by standard linear-algebraic reduction, while the equivalence of deflation to scalar recentering is shown by an explicit separate proof that leaves the greedy-policy sequence unchanged. No step reduces by construction to a fitted parameter, a self-citation chain, or a renaming of the target quantity; the central geometric claim is independent of quantities internal to the present manuscript.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The set of admissible update matrices for Q-VI forms a finite switching set whose joint spectral radius equals the discount factor gamma because every matrix leaves the all-ones vector invariant.
- standard math The quotient space obtained by removing the all-ones direction yields a well-defined projected linear system whose spectral radius governs the error dynamics orthogonal to that direction.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the standard Q-VI switching system model has JSR exactly the discount factor γ … By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanabsolute_floor_iff_bare_distinguishability unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
all admissible subsystems share the all-ones vector as an invariant direction … ρdef = ρ̄
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1]
-
[2]
Dynamic programming and optimal con trol 4th edition, volume ii
Dimitri P Bertsekas. Dynamic programming and optimal con trol 4th edition, volume ii. Athena Scientific , 2015
work page 2015
-
[3]
Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-dynamic programming. Athena Scientific Belmont, MA, 1996
work page 1996
-
[4]
Computationally effi cient approximations of the joint spectral radius
Vincent D Blondel and Yurii Nesterov. Computationally effi cient approximations of the joint spectral radius. SIAM Journal on Matrix Analysis and Applications , 27(1):256–272, 2005
work page 2005
-
[5]
A. Cicone. A note on the joint spectral radius. arXiv preprint arXiv:1502.01506 , 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[6]
V. Goyal and J. Grand-Clément. A first-order approach to a ccelerated value iteration. Opera- tions Research, 71(2):517–535, 2023
work page 2023
-
[7]
J. P. Hespanha and A. S. Morse. Stability of switched syst ems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Cont rol, pages 2655–2660, 1999
work page 1999
-
[8]
The joint spectral radius: Theory and applications , volume 385
Raphaël Jungers. The joint spectral radius: Theory and applications , volume 385. Springer Science & Business Media, 2009
work page 2009
-
[9]
A. S. Kolarijani, T. Ok, P. Mohajerin Esfahani, and M. A. S . Kolarijani. Rank-one modified value iteration. In Proceedings of the 42nd International Conference on Machin e Learning , volume 267 of Proceedings of Machine Learning Research, pages 31182–31201, 2025
work page 2025
-
[10]
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
Donghwan Lee. Beyond the bellman fixed point: geometry an d fast policy identification in value iteration. arXiv preprint arXiv:2604.17457v4 , 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[11]
Lyapunov-Certified Direct Switching Theory for Q-Learning
Donghwan Lee. Lyapunov-certified direct switching the ory for q-learning. arXiv preprint arXiv:2604.19569v2, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[12]
J. Lee, A. Rakhsha, E. K. Ryu, and A.-m. Farahmand. Deflat ed dynamics value iteration. Transactions on Machine Learning Research , 2025
work page 2025
-
[13]
Switching in systems and control
Daniel Liberzon. Switching in systems and control . Springer Science & Business Media, 2003
work page 2003
-
[14]
Stability and stabilizab ility of switched linear systems: A survey of recent results
Hai Lin and Panos J Antsaklis. Stability and stabilizab ility of switched linear systems: A survey of recent results. IEEE Transactions on Automatic Control , 54(2):308–322, 2009. 36
work page 2009
-
[15]
Carl D. Meyer. Matrix Analysis and Applied Linear Algebra . SIAM, 2000
work page 2000
- [16]
-
[17]
A note on the joint s pectral radius
Gian-Carlo Rota and Gilbert Strang. A note on the joint s pectral radius. Indag. Math , 22(4): 379–381, 1960
work page 1960
-
[18]
Stability criteria for switched and hybrid systems
Robert Shorten, Fabian Wirth, Oliver Mason, Kai Wulff, a nd Christopher King. Stability criteria for switched and hybrid systems. SIAM Review, 49(4):545–592, 2007
work page 2007
-
[19]
John N Tsitsiklis and Vincent D Blondel. The lyapunov exp onent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute a nd to approximate. Mathematics of Control, Signals and Systems , 10(1):31–40, 1997. 37
work page 1997
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.