Policy Iteration for Stationary Discounted Hamilton--Jacobi--Bellman Equations: A Viscosity Approach
Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3
The pith
A space-discrete scheme with artificial viscosity of order h makes policy iteration well-defined for stationary discounted HJB equations and guarantees geometric convergence to the discrete solution, which itself lies within C√h of the viscosity solution in the sup norm.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing artificial viscosity of order O(h) into a space-discrete approximation of the stationary discounted HJB equation, policy iteration becomes a well-posed monotone contraction mapping on the discrete grid. For each fixed h > 0 the iterates converge monotonically and geometrically to the unique discrete solution because the discount produces a resolvent contraction. The discrete solution satisfies a sharp vanishing-viscosity bound ||V^h - V||_∞ ≤ C √h, and the total error can be decomposed into a policy-iteration component that decays geometrically in the number of iterations and a discretization component of order √h.
What carries the argument
The monotone semi-discrete operator obtained by adding artificial viscosity of order O(h) to the space-discrete Hamiltonian, which permits a pointwise policy-improvement step using discrete gradients.
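The mechanism described above can be sketched on a toy problem. Everything specific in the block below is an assumption made for illustration and is not taken from the paper: the one-dimensional model (dynamics x' = a with |a| ≤ 1, running cost x² + a²/2, discount `lam`), the central-difference discrete gradient, the viscosity coefficient `theta * h`, and the ghost-value boundary treatment. With `theta` ≥ 1/2 every policy-evaluation matrix has nonpositive off-diagonal entries and is strictly diagonally dominant, which is the discrete monotonicity the improvement step relies on.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; adequate for the strictly diagonally dominant
    systems assembled below (no pivoting)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def policy_iteration(n=101, lam=1.0, theta=1.0, max_iter=50, tol=1e-10):
    """Toy policy iteration for lam*V - min_a {a*V' + x^2 + a^2/2} = 0 on [-1, 1].

    Hypothetical model and scheme (assumptions, not the paper's): central
    differences plus an artificial viscosity term theta*h*V''.
    """
    h = 2.0 / (n - 1)
    xs = [-1.0 + i * h for i in range(n)]
    a = [0.0] * n          # initial policy: zero control everywhere
    V = None
    history = []           # sup-norm differences between successive iterates
    for _ in range(max_iter):
        # Policy evaluation: lam*V - a*D0 V - theta*h*D2 V = x^2 + a^2/2.
        lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
        for i in range(n):
            lo = a[i] / (2 * h) - theta / h     # coefficient of V[i-1]
            up = -a[i] / (2 * h) - theta / h    # coefficient of V[i+1]
            diag[i] = lam + 2 * theta / h
            if i == 0:
                diag[i] += lo                   # ghost value V[-1] = V[0]
            else:
                lower[i] = lo
            if i == n - 1:
                diag[i] += up                   # ghost value V[n] = V[n-1]
            else:
                upper[i] = up
            rhs[i] = xs[i] ** 2 + 0.5 * a[i] ** 2
        V_new = solve_tridiagonal(lower, diag, upper, rhs)
        if V is not None:
            history.append(max(abs(u - v) for u, v in zip(V_new, V)))
        V = V_new
        # Policy improvement: pointwise minimizer of a*g + a^2/2 over [-1, 1],
        # where g is the same discrete gradient used in the evaluation step.
        for i in range(n):
            left = V[i - 1] if i > 0 else V[0]
            right = V[i + 1] if i < n - 1 else V[n - 1]
            grad = (right - left) / (2 * h)
            a[i] = max(-1.0, min(1.0, -grad))
        if history and history[-1] < tol:
            break
    return xs, V, a, history
```

Calling `policy_iteration()` returns the grid, the final value iterate, the final policy, and the list of sup-norm differences between successive iterates, which can be inspected to see how quickly the iteration settles for a fixed mesh.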
If this is right
- For any fixed mesh size h the policy iteration sequence converges geometrically because of the resolvent structure induced by the discount factor.
- The total approximation error decomposes into a geometrically decaying policy-iteration contribution and an O(√h) discretization contribution that can be balanced by choice of iteration count.
- The vanishing-viscosity limit as h tends to zero recovers the original continuous viscosity solution.
- Numerical experiments on nonlinear one- and two-dimensional control problems reproduce the predicted geometric convergence followed by a plateau at the discretization error level.
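The trade-off between iteration count and mesh size in the list above can be made concrete with a small helper. This is a sketch under an assumed error model, total error ≤ `initial_error * rho**n + C * sqrt(h)`; the names `rho`, `initial_error`, and `C` are placeholders, and in the paper the contraction factor may itself depend on h, which this helper does not model.

```python
import math


def iterations_to_plateau(h, rho, initial_error=1.0, C=1.0, max_iter=10_000):
    """Smallest n with initial_error * rho**n <= C * sqrt(h).

    All parameters are hypothetical inputs to an assumed bound; they are
    not constants supplied by the paper.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    if h <= 0.0 or C <= 0.0:
        raise ValueError("h and C must be positive")
    target = C * math.sqrt(h)   # assumed discretization-error level
    err = initial_error
    n = 0
    while err > target and n < max_iter:
        err *= rho              # one geometric contraction step
        n += 1
    return n
```

Under this model, iterating beyond the returned count only reduces a term that is already below the discretization level, which is one way to read the decay-then-plateau behavior.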
Where Pith is reading between the lines
- The same artificial-viscosity regularization may allow policy iteration to be applied to other stationary viscosity problems where the continuous improvement step is formally undefined.
- Balancing the number of iterations against mesh size could yield near-optimal computational cost for high-dimensional infinite-horizon control problems.
- The resolvent-based contraction mechanism may extend the convergence analysis to related discounted problems such as stochastic control or mean-field games.
Load-bearing premise
That an artificial viscosity term of size proportional to the mesh size h is sufficient to restore the comparison principle for the discrete operator while still allowing the policy improvement step to be performed pointwise with discrete gradients.
What would settle it
Numerical computation on a sequence of successively finer meshes showing that the observed L^∞ error between the discrete solution and a reference solution fails to decrease proportionally to the square root of h, or that the policy iteration sequence does not exhibit geometric contraction for some fixed positive h.
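A mesh-refinement check of this kind is usually summarized by observed convergence orders. The helper below is a generic sketch: `mesh_sizes` and `errors` are hypothetical inputs (errors measured against some reference solution), and no data from the paper is reproduced.

```python
import math


def observed_orders(mesh_sizes, errors):
    """Observed order p between consecutive meshes, from e ~ C * h**p."""
    if len(mesh_sizes) != len(errors):
        raise ValueError("mesh_sizes and errors must have the same length")
    orders = []
    for k in range(len(mesh_sizes) - 1):
        ratio_e = errors[k] / errors[k + 1]
        ratio_h = mesh_sizes[k] / mesh_sizes[k + 1]
        orders.append(math.log(ratio_e) / math.log(ratio_h))
    return orders
```

Observed orders that stay near 0.5 or above under refinement are consistent with a C√h upper bound. Orders that settle clearly below 0.5 would be the kind of evidence this paragraph describes.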
Original abstract
We study policy iteration (PI) for deterministic infinite-horizon discounted optimal control problems, whose value function is characterized by a stationary Hamilton--Jacobi--Bellman (HJB) equation. At the PDE level, PI is fundamentally ill-posed: the improvement step requires pointwise evaluation of $\nabla V$, which is not well defined for viscosity solutions, and thus the associated nonlinear operator cannot be interpreted in a stable functional sense. We develop a monotone semi-discrete formulation for the stationary discounted setting by introducing a space-discrete scheme with artificial viscosity of order $O(h)$. This regularization restores comparison, ensures monotonicity of the discrete operator, and yields a well-defined pointwise policy improvement via discrete gradients. Our analysis reveals a convergence mechanism fundamentally different from the finite-horizon case. For each fixed mesh size $h>0$, we prove that the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, where the contraction is induced by the resolvent structure of the discounted operator. We further establish the sharp vanishing-viscosity estimate $\|V^h - V\|_{L^\infty} \leq C\sqrt{h}$, and derive a quantitative error decomposition that separates policy iteration error from discretization error, exhibiting a nontrivial coupling between iteration count and mesh size. Numerical experiments in nonlinear one and two-dimensional control problems confirm the theoretical predictions, including geometric convergence and the characteristic decay-then-plateau behavior of the total error.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a monotone semi-discrete scheme with O(h) artificial viscosity for policy iteration applied to stationary discounted HJB equations arising from infinite-horizon deterministic optimal control. It proves that, for each fixed mesh size h > 0, the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, with the contraction induced by the resolvent structure of the discounted operator. The analysis further yields the sharp vanishing-viscosity bound ||V^h - V||_{L^∞} ≤ C√h together with a quantitative error decomposition that separates policy-iteration error from discretization error and exhibits their coupling. Numerical experiments on nonlinear 1D and 2D problems confirm the predicted geometric rates and the characteristic decay-then-plateau behavior of the total error.
Significance. If the central claims hold, the work supplies a rigorous viscosity-theoretic foundation for policy iteration in the stationary discounted setting, where the PDE-level formulation is otherwise ill-posed because of the need for pointwise gradients. The resolvent-based contraction argument, the sharp O(√h) rate obtained via doubling-variables techniques, and the explicit iteration-discretization error split are all load-bearing contributions that distinguish the infinite-horizon case from existing finite-horizon analyses and are directly useful for practical implementation.
minor comments (2)
- [§2] §2 (or the section introducing the semi-discrete scheme): the precise form of the artificial viscosity term and the definition of the discrete gradient used in the policy-improvement step should be stated explicitly before the monotonicity proof, so that the comparison principle can be verified directly from the scheme.
- [Numerical experiments] The numerical section: the reported error tables would benefit from an additional column or plot that isolates the pure discretization error (by running PI to machine precision for each h) to make the quantitative decomposition visually verifiable.
Simulated Author's Rebuttal
We thank the referee for the positive assessment, the accurate summary of our contributions, and the recommendation for minor revision. The recognition of the resolvent-based contraction, the sharp O(√h) bound via doubling-variables techniques, and the iteration-discretization error split is appreciated. As the report lists no specific major comments, we have no points requiring point-by-point rebuttal.
Circularity Check
No significant circularity identified
The paper's central claims rest on a standard resolvent contraction induced by the positive discount factor for the semi-discrete scheme and on vanishing-viscosity estimates obtained via doubling-variables arguments. These are independent mathematical facts external to the paper's own constructions; the O(h) artificial viscosity is introduced explicitly as a regularization to restore monotonicity and comparison, not derived from the target result. No step reduces a prediction or uniqueness claim to a fitted parameter, self-citation chain, or definitional tautology. The quantitative error decomposition follows directly from the contraction and comparison principles without circular reduction.
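One standard way a discount-induced resolvent bound arises on a finite grid is the following estimate. It is a sketch under assumptions that are not taken from the paper: $A$ stands for a hypothetical matrix representing the discretized drift and viscosity terms under a fixed policy, with nonpositive off-diagonal entries and nonnegative row sums, and $\lambda > 0$ is the discount rate.

```latex
% Assumed setting: (\lambda I + A) u = g, with A_{ij} \le 0 for i \ne j
% and \sum_j A_{ij} \ge 0, so that A_{ii} \ge \sum_{j \ne i} |A_{ij}|.
% Choose an index i with |u_i| = \|u\|_\infty. Then
\begin{align*}
|g_i| &= \Bigl| (\lambda + A_{ii})\,u_i + \sum_{j \neq i} A_{ij}\,u_j \Bigr| \\
      &\ge (\lambda + A_{ii})\,|u_i| - \sum_{j \neq i} |A_{ij}|\,|u_j| \\
      &\ge \Bigl( \lambda + A_{ii} - \sum_{j \neq i} |A_{ij}| \Bigr)\,\|u\|_\infty
       \;\ge\; \lambda\,\|u\|_\infty ,
\end{align*}
% hence \|(\lambda I + A)^{-1}\|_{\infty \to \infty} \le 1/\lambda .
```

The bound is uniform in the policy as long as every policy yields a matrix with this sign structure, which is what a sufficiently large artificial viscosity is meant to secure.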
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Viscosity solutions to HJB equations satisfy comparison principles under suitable conditions.