Value iteration with stopping criterion: finite iterations, stability, and near-optimality guarantees
Pith reviewed 2026-06-26 07:42 UTC · model grok-4.3
The pith
Value iteration with a generalized stopping criterion terminates finitely and yields stabilizing near-optimal policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under mild assumptions, value iteration terminates in a finite number of iterations. By properly designing the generalized stopping criterion, the policies at the final iteration stabilize the closed-loop system, and explicit near-optimality bounds for the policies and value functions are derived that depend on the stopping choice.
What carries the argument
Generalized stopping criterion for value iteration, which stops when the difference between successive value functions or related quantities meets a threshold, enabling finite termination, stability, and performance analysis.
Load-bearing premise
The plant is a deterministic discrete-time system with infinite-horizon costs and the algorithm generates the inputs via value iteration.
What would settle it
Apply value iteration to a known unstable or non-terminating deterministic system and check if the stopping criterion fails to produce a stabilizing policy or if termination does not occur.
read the original abstract
Value iteration (VI) is a cornerstone of dynamic programming that allows computing near-optimal feedback laws for general plant dynamics and cost functions. In practice, however, it must be stopped after finitely many iterations. This raises the question of when to stop the algorithm so that the resulting policies and value functions achieve desirable properties, like given near-optimality bounds and stability. In this context, we study deterministic, discrete-time systems with infinite-horizon (possibly discounted) costs whose inputs are generated by VI. We equip VI with a generalized stopping criterion that encompasses existing choices while allowing new ones. Our aim is to analyze the properties of the policies and value functions at the final iteration. Under mild assumptions, we first show that VI indeed terminates in a finite number of iterations. We then establish that the final policies are stabilizing by properly designing the stopping criterion, and derive explicit near-optimality bounds characterized by this choice. These results offer a design framework for the stopping criteria that balances computational effort with stability and performance guarantees.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper equips value iteration with a generalized stopping criterion for deterministic discrete-time infinite-horizon (possibly discounted) optimal control problems. It proves finite termination under mild assumptions, shows that the terminal policy is stabilizing when the stopping rule is suitably designed, and derives explicit near-optimality bounds on the final value function and policy that are parameterized by the chosen threshold.
Significance. If the claims hold, the work supplies a practical design framework that lets practitioners select a stopping criterion to balance computational cost against explicit stability and performance guarantees. The results rest on standard monotonicity and contraction properties of the Bellman operator together with the existence of a stabilizing policy, without introducing new technical machinery.
minor comments (2)
- [§3] §3: the generalized stopping criterion is introduced via an abstract threshold condition; an explicit comparison (e.g., a short table) with the classical residual and span criteria would clarify the scope of the generalization.
- [§2] The statement of the mild assumptions (existence of a stabilizing policy, bounded costs, etc.) appears in §2; moving a concise bullet list of these assumptions to the abstract or introduction would improve readability for control-oriented readers.
Simulated Author's Rebuttal
We thank the referee for their positive summary, recognition of the paper's significance, and recommendation to accept the manuscript. We appreciate the assessment that the results rest on standard properties without new technical machinery and provide a practical design framework.
Circularity Check
No significant circularity identified
full rationale
The paper derives finite termination of value iteration, stability of the terminal policy, and explicit near-optimality bounds via standard dynamic-programming arguments (Bellman operator monotonicity, existence of a stabilizing policy, and a threshold-based stopping rule). These steps rely on the stated mild assumptions for deterministic discrete-time plants and do not reduce to fitted quantities, self-definitional constructs, or load-bearing self-citations. The central claims are self-contained proofs that follow directly from the problem setup without circular reduction to inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol
D. Bertsekas,Dynamic Programming and Optimal Control, Volume I, vol. 4. Athena Scientific, Belmont, U.S.A., 2012
2012
-
[2]
Sutton and A
R. Sutton and A. Barto,Reinforcement Learning: An Introduction. MIT Press, Cambridge, U.S.A., 1998
1998
-
[3]
D. P. Bertsekas,Reinforcement Learning and Optimal Control, vol. 1. Athena Scientific, Belmont, U.S.A., 2019
2019
-
[4]
Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,
F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, “Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers,”IEEE Control Systems Magazine, vol. 32, no. 6, pp. 76–105, 2012
2012
-
[5]
M. L. Puterman,Markov Decision Processes: Discrete Stochastic Dy- namic Programming. John Wiley & Sons, Hoboken, U.S.A., 2014
2014
-
[6]
On an iterative technique for riccati equation computa- tions,
D. Kleinman, “On an iterative technique for riccati equation computa- tions,”IEEE Trans. on Aut. Control, vol. 13, no. 1, pp. 114–115, 1968
1968
-
[7]
Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,
T. Bian, Y . Jiang, and Z.-P. Jiang, “Adaptive dynamic programming and optimal control of nonlinear nonaffine systems,”Automatica, vol. 50, no. 10, pp. 2624–2632, 2014
2014
-
[8]
Computing stabilizing linear controllers via policy iteration,
A. Lamperski, “Computing stabilizing linear controllers via policy iteration,” inIEEE Conference on Decision and Control, Jeju Island, South Korea, pp. 1902–1907, 2020. 16
1902
-
[9]
Early termination of NMPC interior point solvers: Relating the duality gap to stability,
A. Pavlov, I. Shames, and C. Manzie, “Early termination of NMPC interior point solvers: Relating the duality gap to stability,” inEuropean Control Conference, Naples, Italy, pp. 805–810, 2019
2019
-
[10]
Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,
A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, “Discrete-time non- linear HJB solution using approximate dynamic programming: Conver- gence proof,”IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 943–949, 2008
2008
-
[11]
Revisiting approximate dynamic programming and its convergence,
A. Heydari, “Revisiting approximate dynamic programming and its convergence,”IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2733–2743, 2014
2014
-
[12]
Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,
Q. Wei, D. Liu, and H. Lin, “Value iteration adaptive dynamic pro- gramming for optimal control of discrete-time nonlinear systems,”IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 840–853, 2015
2015
-
[13]
Stopping criteria for value iteration on stochastic games with quantitative objectives,
J. K ˇret´ınsk`y, T. Meggendorfer, and M. Weininger, “Stopping criteria for value iteration on stochastic games with quantitative objectives,” in Annual ACM/IEEE Symposium on Logic in Computer Science, Boston, U.S.A., pp. 1–14, 2023
2023
-
[14]
Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,
T. Bian and Z.-P. Jiang, “Value iteration and adaptive dynamic pro- gramming for data-driven adaptive optimal control design,”Automatica, vol. 71, pp. 348–360, 2016
2016
-
[15]
Value and policy iterations in optimal control and adaptive dynamic programming,
D. P. Bertsekas, “Value and policy iterations in optimal control and adaptive dynamic programming,”IEEE Transactions on Neural networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015
2015
-
[16]
Model predictive control: For want of a local control Lyapunov function, all is not lost,
G. Grimm, M. Messina, S. Tuna, and A. Teel, “Model predictive control: For want of a local control Lyapunov function, all is not lost,”IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 546–558, 2005
2005
-
[17]
Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,
R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time infinite-horizon optimal control with discounted cost,” IEEE Trans. on Aut. Control, vol. 62, no. 6, pp. 2736–2749, 2017
2017
-
[18]
Finite-horizon discounted optimal control: stability and performance,
M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Finite-horizon discounted optimal control: stability and performance,” IEEE Trans. on Aut. Control, vol. 66, no. 2, pp. 550–565, 2020
2020
-
[19]
NMPC without terminal constraints,
L. Gr ¨une, “NMPC without terminal constraints,”IFAC Proceedings Volumes, vol. 45, no. 17, pp. 1–13, 2012
2012
-
[20]
Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,
V . Gaitsgory, L. Gr¨une, M. H ¨oger, C. M. Kellett, and S. R. Weller, “Sta- bilization of strictly dissipative discrete time systems with discounted optimal control,”Automatica, vol. 93, pp. 311–320, 2018
2018
-
[21]
B. D. Anderson and J. B. Moore,Optimal Control: Linear Quadratic Methods. Prentice-Hall, Inc., Englewood Cliffs, U.S.A., 1971
1971
-
[22]
Value and policy iterations in optimal control and adap- tive dynamic programming,
D. Bertsekas, “Value and policy iterations in optimal control and adap- tive dynamic programming,”IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 500–509, 2015
2015
-
[23]
When to stop value iteration: stability and near-optimality versus computation,
M. Granzotto, R. Postoyan, D. Ne ˇsi´c, L. Bus ¸oniu, and J. Daafouz, “When to stop value iteration: stability and near-optimality versus computation,” in3rd Conference on Learning for Dynamics and Control, pp. 412–424, PMLR, 2021
2021
-
[24]
Goebel, R
R. Goebel, R. Sanfelice, and A. Teel,Hybrid Dynamical Systems. Princeton University Press, Princeton, U.S.A., 2012
2012
-
[25]
Dynamical properties of hybrid systems simulators,
R. Sanfelice and A. Teel, “Dynamical properties of hybrid systems simulators,”Automatica, vol. 46, no. 2, pp. 239–248, 2010
2010
-
[26]
Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,
M. Granzotto, O. L. De Silva, R. Postoyan, D. Ne ˇsi´c, and Z.-P. Jiang, “Robust stability and near-optimality for policy iteration: for want of recursive feasibility, all is not lost,”IEEE Transactions on Automatic Control, vol. 69, no. 12, pp. 8247–8262, 2024
2024
-
[27]
Survey constrained model predictive control: Stability and optimality,
D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Survey constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000
2000
-
[28]
Rockafellar and R.-B
R. Rockafellar and R.-B. Wets,Variational Analysis, vol. 317. Springer, Dordrecht, Germany, 3rd ed., 1998
1998
-
[29]
An existence theorem for discrete-time infinite-horizon optimal control problems,
S. Keerthi and E. Gilbert, “An existence theorem for discrete-time infinite-horizon optimal control problems,”IEEE Transactions on Au- tomatic Control, vol. 30, no. 9, pp. 907–909, 1985
1985
-
[30]
Rollout algorithms for discrete optimization: A survey,
D. P. Bertsekas, “Rollout algorithms for discrete optimization: A survey,” inHandbook of Combinatorial Optimization, vol. 5, pp. 2989–3013, Springer, New York, U.S.A., 2013
2013
-
[31]
Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,
D. P. Bertsekas, “Model predictive control and reinforcement learn- ing: A unified framework based on dynamic programming,”IFAC- PapersOnLine, vol. 58, no. 18, pp. 363–383, 2024
2024
-
[32]
On the infinite horizon performance of receding horizon controllers,
L. Grune and A. Rantzer, “On the infinite horizon performance of receding horizon controllers,”IEEE Transactions on Automatic Control, vol. 53, no. 9, pp. 2100–2111, 2008
2008
-
[33]
Stability analysis of discrete-time finite-horizon discounted optimal control,
M. Granzotto, R. Postoyan, L. Bus ¸oniu, D. Ne ˇsi´c, and J. Daafouz, “Stability analysis of discrete-time finite-horizon discounted optimal control,” inIEEE Conf. on Dec. and Control, Miami, U.S.A., 2018
2018
-
[34]
Stability and recurrence of nonlinear stochastic systems with optimal control,
R. Moldenhauer, D. Ne ˇsi´c, M. Granzotto, R. Postoyan, and A. Teel, “Stability and recurrence of nonlinear stochastic systems with optimal control,”Submitted for journal publication, 2026
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.