Switching-Geometry Analysis of Deflated Q-Value Iteration

Donghwan Lee

arxiv: 2605.10811 · v2 · pith:FIYHKU37new · submitted 2026-05-11 · 🧮 math.OC · cs.AI

Switching-Geometry Analysis of Deflated Q-Value Iteration

Donghwan Lee This is my paper

Pith reviewed 2026-05-20 22:01 UTC · model grok-4.3

classification 🧮 math.OC cs.AI

keywords deflated Q-value iterationjoint spectral radiusswitching systemsquotient spaceMarkov decision processesconvergence ratepolicy optimization

0 comments

The pith

Deflated Q-value iteration admits a convergence rate given by a joint spectral radius that can be strictly smaller than the discount factor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models both standard and deflated Q-value iteration as switching linear systems whose contraction is measured by the joint spectral radius. It first shows that the ordinary model always has joint spectral radius exactly equal to the discount factor gamma because every admissible matrix leaves the all-ones vector invariant. Removing that direction by passing to the quotient space produces a projected switching system whose joint spectral radius governs the actual error decay and can be smaller than gamma. The same analysis proves that the deflation correction is only a scalar recentering of ordinary iterates, so the sequence of greedy policies stays identical while the geometric description of convergence becomes tighter.

Core claim

The standard Q-VI switching system has joint spectral radius exactly gamma because all its subsystems share the all-ones vector as an invariant direction. Quotienting out this direction yields a projected switching system whose joint spectral radius governs the relevant error dynamics and may be strictly smaller than gamma. The all-ones residual correction is equivalent to subtracting a scalar multiple of the all-ones vector from each ordinary Q-VI iterate, leaving both the projected trajectory and the induced greedy-policy sequence unchanged.

What carries the argument

The projected switching system obtained by quotienting the standard Q-VI model by its all-ones invariant direction.

If this is right

The convergence rate of deflated Q-VI is characterized by the joint spectral radius of the projected system.
This rate can be strictly smaller than the discount factor, yielding a sharper iteration-complexity bound.
The sequence of greedy policies generated by deflated Q-VI is identical to the sequence generated by ordinary Q-VI started at the same point.
The only practical benefit of deflation is a more precise geometric description of convergence after the redundant all-ones component is removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same quotient-space reduction may apply to other value-iteration variants whose linear maps share an obvious invariant direction.
Numerical computation of the projected joint spectral radius on small MDPs could produce instance-specific convergence certificates that are tighter than the uniform gamma bound.
The equivalence result suggests that any recentering operation preserving the greedy policy map can be analyzed by an analogous projection.

Load-bearing premise

Every admissible subsystem of the standard Q-VI switching model leaves the all-ones vector invariant, forcing the ambient joint spectral radius to equal gamma.

What would settle it

For a concrete finite MDP, compute or bound the joint spectral radius of the projected switching matrices and check whether the norm of the deflated error vector contracts at that rate rather than at gamma.

Figures

Figures reproduced from arXiv: 2605.10811 by Donghwan Lee.

**Figure 2.** Figure 2: Full Q-function error for standard Q-VI and deflated Q-VI with uniform and nonuniform [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗

**Figure 3.** Figure 3: Convergence of the full Q-function error [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗

**Figure 4.** Figure 4: Full Q-function error kQk − Q⋆k∞ for standard Q-VI and deflated Q-VI on the Taxi-v3 benchmark. Standard Q-VI exhibits a slow tail decay, while the deflated Q-VI update removes the common-shift mode and reaches numerical precision after a short transient. does not change the Bellman optimality fixed point, but it removes the slowly decaying all-ones component from the iteration. Consequently, the full Q-fun… view at source ↗

read the original abstract

This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision process control. Focusing on an all-ones residual correction, we interpret the resulting algorithm through the geometry of switching systems and, to the best of our knowledge, give the first JSR-based convergence analysis of deflated Q-VI for policy optimization problems. Our analysis reveals that the standard Q-VI switching system model has JSR exactly the discount factor $\gamma\in (0,1)$, since all admissible subsystems share the all-ones vector as an invariant direction. By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics and may be strictly smaller than $\gamma$. Therefore, the deflated Q-VI admits a potentially sharper convergence-rate characterization than the ambient-space $\gamma$-bound. Finally, we prove that the correction is equivalent to a scalar recentering of standard Q-VI. Hence, the projected trajectory, and therefore the greedy-policy sequence, is unchanged relative to standard Q-VI initialized from the same point. The benefit of deflation is not a change in the induced decision-making problem, but a more precise JSR-based description of the convergence geometry after the redundant all-ones component is removed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. This paper develops a joint spectral radius (JSR) framework for analyzing rank-one deflated Q-value iteration (Q-VI) in discounted Markov decision processes. It shows that the standard Q-VI switching system has JSR exactly equal to the discount factor γ because every admissible mode matrix shares the all-ones vector as an invariant direction. Passing to the quotient space orthogonal to this direction yields a projected switching system whose JSR governs the relevant error dynamics and may be strictly smaller than γ. The deflation operation is proven equivalent to a scalar recentering of standard Q-VI, which leaves the greedy-policy sequence unchanged.

Significance. If the central derivations hold, the manuscript supplies the first JSR-based convergence analysis of deflated Q-VI and a geometrically sharper rate characterization than the ambient γ-bound. The equivalence result clarifies that deflation improves the description of convergence geometry without altering the induced decision-making problem. This switching-system perspective on rank-one corrections could prove useful for refined convergence analysis in approximate dynamic programming and policy optimization.

major comments (1)

[standard Q-VI switching system model] The section introducing the standard Q-VI switching system model: the claim that all admissible subsystems share the all-ones vector as an invariant direction (forcing ambient JSR exactly γ) is load-bearing for the ambient-versus-projected distinction. While the row-stochastic structure induced by the Bellman operator is standard, the manuscript should explicitly verify that this invariance holds uniformly over all possible per-state greedy action selections without further restrictions on the MDP or action sets.

minor comments (3)

[quotient reduction] The definition of the projected norm on the quotient space R^n / span{1} should be stated explicitly (including the choice of representative or inner product) to make the JSR comparison fully rigorous.
Consider adding a small numerical example (e.g., a two-state MDP) that computes both the ambient JSR and the quotient JSR to illustrate that the latter can indeed be strictly smaller than γ.
The abstract states that proofs exist for the invariance property, quotient reduction, and equivalence to scalar recentering; ensure each proof is self-contained with clearly labeled assumptions.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and the recommendation of minor revision. The single major comment concerns the need for an explicit verification that the all-ones invariance holds uniformly across all admissible modes. We address the point directly below and will revise the manuscript to incorporate the requested clarification.

read point-by-point responses

Referee: The section introducing the standard Q-VI switching system model: the claim that all admissible subsystems share the all-ones vector as an invariant direction (forcing ambient JSR exactly γ) is load-bearing for the ambient-versus-projected distinction. While the row-stochastic structure induced by the Bellman operator is standard, the manuscript should explicitly verify that this invariance holds uniformly over all possible per-state greedy action selections without further restrictions on the MDP or action sets.

Authors: We agree that an explicit verification is desirable for clarity. In the Q-VI switching model each mode corresponds to a choice, for every state, of which action realizes the maximum. The associated linear operator acts componentwise as L(Q)(s,a) = γ max_{a' ∈ A(s')} Q(s',a'), where s' is the successor state reached from (s,a). Substituting the all-ones vector yields max = 1 for every component, independently of which action is selected as the argmax. Hence every admissible mode matrix satisfies M 1 = 1 (or, after scaling by γ, eigenvalue γ), with no further restrictions on the MDP, transition kernel, or action sets required. We will add a short lemma or remark in the section on the standard switching-system model that records this argument verbatim. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation rests on standard facts about joint spectral radii of switching systems whose modes share a common eigenvector (here the all-ones vector, induced by the row-stochastic structure of the Bellman optimality operator). The claim that ambient JSR equals gamma follows directly from this invariant direction and is not fitted or redefined inside the paper. The quotient-space projection and its potentially smaller JSR are obtained by standard linear-algebraic reduction, while the equivalence of deflation to scalar recentering is shown by an explicit separate proof that leaves the greedy-policy sequence unchanged. No step reduces by construction to a fitted parameter, a self-citation chain, or a renaming of the target quantity; the central geometric claim is independent of quantities internal to the present manuscript.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis relies on background facts about joint spectral radius of switched linear systems and on the modeling choice that Q-VI updates form a finite set of linear subsystems; no free parameters or new entities are introduced.

axioms (2)

domain assumption The set of admissible update matrices for Q-VI forms a finite switching set whose joint spectral radius equals the discount factor gamma because every matrix leaves the all-ones vector invariant.
Invoked when the abstract states that the standard Q-VI switching system model has JSR exactly gamma.
standard math The quotient space obtained by removing the all-ones direction yields a well-defined projected linear system whose spectral radius governs the error dynamics orthogonal to that direction.
Used to justify that the projected JSR controls the relevant convergence rate.

pith-pipeline@v0.9.0 · 5760 in / 1562 out tokens · 54862 ms · 2026-05-20T22:01:00.060117+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the standard Q-VI switching system model has JSR exactly the discount factor γ … By passing to the quotient space that removes this direction, we obtain a projected switching system model whose JSR governs the relevant error dynamics
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

all admissible subsystems share the all-ones vector as an invariant direction … ρdef = ρ̄

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 3 internal anchors

[1]

Bertsekas

Dimitri P. Bertsekas. Generic rank-one corrections for v alue iteration in markovian decision problems. Operations Research Letters, 17(3):111–119, 1995

work page 1995
[2]

Dynamic programming and optimal con trol 4th edition, volume ii

Dimitri P Bertsekas. Dynamic programming and optimal con trol 4th edition, volume ii. Athena Scientiﬁc , 2015

work page 2015
[3]

Bertsekas and John N

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-dynamic programming. Athena Scientiﬁc Belmont, MA, 1996

work page 1996
[4]

Computationally eﬃ cient approximations of the joint spectral radius

Vincent D Blondel and Yurii Nesterov. Computationally eﬃ cient approximations of the joint spectral radius. SIAM Journal on Matrix Analysis and Applications , 27(1):256–272, 2005

work page 2005
[5]

A. Cicone. A note on the joint spectral radius. arXiv preprint arXiv:1502.01506 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[6]

Goyal and J

V. Goyal and J. Grand-Clément. A ﬁrst-order approach to a ccelerated value iteration. Opera- tions Research, 71(2):517–535, 2023

work page 2023
[7]

J. P. Hespanha and A. S. Morse. Stability of switched syst ems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Cont rol, pages 2655–2660, 1999

work page 1999
[8]

The joint spectral radius: Theory and applications , volume 385

Raphaël Jungers. The joint spectral radius: Theory and applications , volume 385. Springer Science & Business Media, 2009

work page 2009
[9]

A. S. Kolarijani, T. Ok, P. Mohajerin Esfahani, and M. A. S . Kolarijani. Rank-one modiﬁed value iteration. In Proceedings of the 42nd International Conference on Machin e Learning , volume 267 of Proceedings of Machine Learning Research, pages 31182–31201, 2025

work page 2025
[10]

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Donghwan Lee. Beyond the bellman ﬁxed point: geometry an d fast policy identiﬁcation in value iteration. arXiv preprint arXiv:2604.17457v4 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[11]

Lyapunov-Certified Direct Switching Theory for Q-Learning

Donghwan Lee. Lyapunov-certiﬁed direct switching the ory for q-learning. arXiv preprint arXiv:2604.19569v2, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[12]

J. Lee, A. Rakhsha, E. K. Ryu, and A.-m. Farahmand. Deﬂat ed dynamics value iteration. Transactions on Machine Learning Research , 2025

work page 2025
[13]

Switching in systems and control

Daniel Liberzon. Switching in systems and control . Springer Science & Business Media, 2003

work page 2003
[14]

Stability and stabilizab ility of switched linear systems: A survey of recent results

Hai Lin and Panos J Antsaklis. Stability and stabilizab ility of switched linear systems: A survey of recent results. IEEE Transactions on Automatic Control , 54(2):308–322, 2009. 36

work page 2009
[15]

Carl D. Meyer. Matrix Analysis and Applied Linear Algebra . SIAM, 2000

work page 2000
[16]

Puterman

Martin L. Puterman. Markov decision processes: Discrete stochastic dynamic pr ogramming. John Wiley & Sons, 2014

work page 2014
[17]

A note on the joint s pectral radius

Gian-Carlo Rota and Gilbert Strang. A note on the joint s pectral radius. Indag. Math , 22(4): 379–381, 1960

work page 1960
[18]

Stability criteria for switched and hybrid systems

Robert Shorten, Fabian Wirth, Oliver Mason, Kai Wulﬀ, a nd Christopher King. Stability criteria for switched and hybrid systems. SIAM Review, 49(4):545–592, 2007

work page 2007
[19]

The lyapunov exp onent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute a nd to approximate

John N Tsitsiklis and Vincent D Blondel. The lyapunov exp onent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute a nd to approximate. Mathematics of Control, Signals and Systems , 10(1):31–40, 1997. 37

work page 1997

[1] [1]

Bertsekas

Dimitri P. Bertsekas. Generic rank-one corrections for v alue iteration in markovian decision problems. Operations Research Letters, 17(3):111–119, 1995

work page 1995

[2] [2]

Dynamic programming and optimal con trol 4th edition, volume ii

Dimitri P Bertsekas. Dynamic programming and optimal con trol 4th edition, volume ii. Athena Scientiﬁc , 2015

work page 2015

[3] [3]

Bertsekas and John N

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-dynamic programming. Athena Scientiﬁc Belmont, MA, 1996

work page 1996

[4] [4]

Computationally eﬃ cient approximations of the joint spectral radius

Vincent D Blondel and Yurii Nesterov. Computationally eﬃ cient approximations of the joint spectral radius. SIAM Journal on Matrix Analysis and Applications , 27(1):256–272, 2005

work page 2005

[5] [5]

A. Cicone. A note on the joint spectral radius. arXiv preprint arXiv:1502.01506 , 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[6] [6]

Goyal and J

V. Goyal and J. Grand-Clément. A ﬁrst-order approach to a ccelerated value iteration. Opera- tions Research, 71(2):517–535, 2023

work page 2023

[7] [7]

J. P. Hespanha and A. S. Morse. Stability of switched syst ems with average dwell-time. In Proceedings of the 38th IEEE Conference on Decision and Cont rol, pages 2655–2660, 1999

work page 1999

[8] [8]

The joint spectral radius: Theory and applications , volume 385

Raphaël Jungers. The joint spectral radius: Theory and applications , volume 385. Springer Science & Business Media, 2009

work page 2009

[9] [9]

A. S. Kolarijani, T. Ok, P. Mohajerin Esfahani, and M. A. S . Kolarijani. Rank-one modiﬁed value iteration. In Proceedings of the 42nd International Conference on Machin e Learning , volume 267 of Proceedings of Machine Learning Research, pages 31182–31201, 2025

work page 2025

[10] [10]

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Donghwan Lee. Beyond the bellman ﬁxed point: geometry an d fast policy identiﬁcation in value iteration. arXiv preprint arXiv:2604.17457v4 , 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[11] [11]

Lyapunov-Certified Direct Switching Theory for Q-Learning

Donghwan Lee. Lyapunov-certiﬁed direct switching the ory for q-learning. arXiv preprint arXiv:2604.19569v2, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[12] [12]

J. Lee, A. Rakhsha, E. K. Ryu, and A.-m. Farahmand. Deﬂat ed dynamics value iteration. Transactions on Machine Learning Research , 2025

work page 2025

[13] [13]

Switching in systems and control

Daniel Liberzon. Switching in systems and control . Springer Science & Business Media, 2003

work page 2003

[14] [14]

Stability and stabilizab ility of switched linear systems: A survey of recent results

Hai Lin and Panos J Antsaklis. Stability and stabilizab ility of switched linear systems: A survey of recent results. IEEE Transactions on Automatic Control , 54(2):308–322, 2009. 36

work page 2009

[15] [15]

Carl D. Meyer. Matrix Analysis and Applied Linear Algebra . SIAM, 2000

work page 2000

[16] [16]

Puterman

Martin L. Puterman. Markov decision processes: Discrete stochastic dynamic pr ogramming. John Wiley & Sons, 2014

work page 2014

[17] [17]

A note on the joint s pectral radius

Gian-Carlo Rota and Gilbert Strang. A note on the joint s pectral radius. Indag. Math , 22(4): 379–381, 1960

work page 1960

[18] [18]

Stability criteria for switched and hybrid systems

Robert Shorten, Fabian Wirth, Oliver Mason, Kai Wulﬀ, a nd Christopher King. Stability criteria for switched and hybrid systems. SIAM Review, 49(4):545–592, 2007

work page 2007

[19] [19]

The lyapunov exp onent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute a nd to approximate

John N Tsitsiklis and Vincent D Blondel. The lyapunov exp onent and joint spectral radius of pairs of matrices are hard—when not impossible—to compute a nd to approximate. Mathematics of Control, Signals and Systems , 10(1):31–40, 1997. 37

work page 1997