Policy Iteration for Stationary Discounted Hamilton--Jacobi--Bellman Equations: A Viscosity Approach

Namkyeong Cho; Yeoneung Kim

arXiv: 2604.10191 · v1 · submitted 2026-04-11 · 🧮 math.OC · cs.NA · math.NA

Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3

classification: 🧮 math.OC · cs.NA · math.NA

keywords: policy iteration · Hamilton-Jacobi-Bellman equation · viscosity solution · optimal control · artificial viscosity · discretization · convergence rate · infinite horizon

The pith

A space-discrete scheme with artificial viscosity of order h makes policy iteration well defined for stationary discounted HJB equations. For each fixed h, it guarantees geometric convergence to the discrete solution, and that discrete solution lies within C sqrt(h) of the true viscosity solution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Policy iteration for stationary discounted Hamilton-Jacobi-Bellman equations is ill-posed at the continuous level because the improvement step requires the gradient of a viscosity solution that may not exist classically. The authors regularize the problem by discretizing space and adding artificial viscosity of order O(h). This produces a monotone discrete operator for which policy iteration is well-defined and converges geometrically to the unique discrete solution for any fixed mesh size. The discrete solution approximates the true viscosity solution with an error of order sqrt(h), and the total error admits a decomposition that isolates the contribution from the number of iterations.
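The pipeline described above can be sketched in a few lines. It discretizes space, adds O(h) viscosity, then alternates policy evaluation with pointwise improvement. This is a minimal illustration under our own assumptions, not the authors' implementation:

- a periodic one-dimensional grid on [0, 1);
- dynamics f(x, a) = a with a finite control set in [-1, 1];
- a hypothetical running cost ℓ(x, a) = 1 + sin(2πx) + a²/2, to be minimized;
- central differences for the discrete gradient;
- viscosity ε = θh with θ = 1/2 = max|f|/2.

The function names `build_matrix` and `policy_iteration` are illustrative.

```python
import numpy as np


def build_matrix(f, lam, eps, h):
    """Matrix of V -> lam*V - f*D0(V) - eps*Lap_h(V) on a periodic grid.

    D0 is the central difference and Lap_h the 3-point Laplacian.
    With eps >= h*max|f|/2 every off-diagonal entry is <= 0 (an M-matrix).
    """
    n = f.size
    idx = np.arange(n)
    M = np.zeros((n, n))
    M[idx, idx] = lam + 2.0 * eps / h**2
    M[idx, (idx + 1) % n] = -f / (2.0 * h) - eps / h**2   # coefficient of V_{i+1}
    M[idx, (idx - 1) % n] = f / (2.0 * h) - eps / h**2    # coefficient of V_{i-1}
    return M


def policy_iteration(n=256, lam=1.0, theta=0.5, max_iter=100, tol=1e-12):
    h = 1.0 / n
    x = np.arange(n) * h                          # periodic grid on [0, 1)
    controls = np.linspace(-1.0, 1.0, 41)         # hypothetical finite control set
    eps = theta * h                               # artificial viscosity of order O(h)
    # hypothetical running cost l(x, a) = 1 + sin(2*pi*x) + a^2/2, shape (n, |A|)
    cost = 1.0 + np.sin(2 * np.pi * x)[:, None] + 0.5 * controls[None, :] ** 2
    a_idx = np.full(n, controls.size // 2)        # initial policy: a = 0 everywhere
    history = []
    for _ in range(max_iter):
        # policy evaluation: one linear solve for the current policy
        M = build_matrix(controls[a_idx], lam, eps, h)
        V = np.linalg.solve(M, cost[np.arange(n), a_idx])
        history.append(V)
        if len(history) > 1 and np.max(np.abs(history[-2] - V)) < tol:
            break
        # policy improvement: pointwise argmin using the discrete gradient
        grad = (np.roll(V, -1) - np.roll(V, 1)) / (2.0 * h)
        new_idx = np.argmin(controls[None, :] * grad[:, None] + cost, axis=1)
        if np.array_equal(new_idx, a_idx):
            break                                 # stable policy: V solves the discrete HJB
        a_idx = new_idx
    return x, history


x, history = policy_iteration()
print(len(history), history[-1].min(), history[-1].max())
```

In this setting each policy-evaluation matrix is a strictly diagonally dominant M-matrix. The evaluated values should therefore decrease monotonically from one iteration to the next, which is the discrete analogue of the monotone convergence claimed above.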

Core claim

By introducing artificial viscosity of order O(h) into a space-discrete approximation of the stationary discounted HJB equation, the discrete operator becomes monotone and policy iteration becomes well posed on the discrete grid. For each fixed h > 0, the iterates converge monotonically and geometrically to the unique discrete solution, with the contraction supplied by the resolvent structure of the discounted operator. The discrete solution satisfies the sharp vanishing-viscosity bound ||V^h - V||_∞ ≤ C √h. The total error decomposes into a policy-iteration component that decays geometrically in the number of iterations and a discretization component of order √h.
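Written out, the decomposition is a triangle inequality. The notation below is ours and not necessarily the paper's:

- V^{h,k} is the k-th policy-iteration iterate at mesh size h;
- V^h is the discrete solution;
- V is the viscosity solution;
- ρ_h < 1 is a geometric rate for fixed h.

The precise constants, and how ρ_h depends on h, are assumptions of this sketch.

$$
\|V^{h,k} - V\|_{\infty}
\;\le\; \underbrace{\|V^{h,k} - V^{h}\|_{\infty}}_{\text{policy iteration}}
\;+\; \underbrace{\|V^{h} - V\|_{\infty}}_{\text{discretization}}
\;\le\; \rho_h^{\,k}\,\|V^{h,0} - V^{h}\|_{\infty} \;+\; C\sqrt{h}.
$$

At fixed h the first term decays geometrically in k, while the second does not depend on k. Once ρ_h^k ‖V^{h,0} − V^h‖∞ drops below C√h, further iterations no longer reduce the bound. This is the decay-then-plateau pattern described for the experiments.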

What carries the argument

The monotone semi-discrete operator obtained by adding artificial viscosity of order O(h) to the space-discrete Hamiltonian, which permits a pointwise policy-improvement step using discrete gradients.
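To make this concrete, here is one standard monotone scheme of this type in one space dimension. It is an illustrative assumption, not the paper's exact discretization. Take the discounted-cost HJB equation λV − min_{a∈A}{ f(x,a)V′ + ℓ(x,a) } = 0 on a uniform grid x_i with spacing h. Discretize it with a central difference and an artificial viscosity θh times the discrete Laplacian, and define the policy-improvement step as a pointwise argmin:

$$
\lambda V_i - \min_{a \in A}\Big\{ f(x_i,a)\,\frac{V_{i+1}-V_{i-1}}{2h} + \ell(x_i,a) \Big\} - \theta h\,\frac{V_{i+1}-2V_i+V_{i-1}}{h^2} = 0,
\qquad \theta \ge \tfrac12 \max_{x,a}|f(x,a)|,
$$

$$
a_i^{k+1} \in \operatorname*{arg\,min}_{a \in A}\Big\{ f(x_i,a)\,\frac{V^k_{i+1}-V^k_{i-1}}{2h} + \ell(x_i,a) \Big\}.
$$

For each fixed a, the coefficients of V_{i+1} and V_{i−1} are −f(x_i,a)/(2h) − θ/h and +f(x_i,a)/(2h) − θ/h. Both are nonpositive exactly when θ ≥ |f(x_i,a)|/2, which is the sense in which viscosity of order h restores monotonicity. The improvement step needs only the discrete gradient at grid points, so it is always defined.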

If this is right

For any fixed mesh size h, the policy iteration sequence converges geometrically because of the resolvent structure induced by the discount factor.
The total approximation error decomposes into a geometrically decaying policy-iteration contribution and an O(√h) discretization contribution that can be balanced by choice of iteration count.
The vanishing-viscosity limit as h tends to zero recovers the original continuous viscosity solution.
Numerical experiments on nonlinear one- and two-dimensional control problems reproduce the predicted geometric convergence followed by a plateau at the discretization error level.
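For a concrete scheme, the resolvent contraction and the coupling between iteration count and mesh size can be made explicit. Assume (our assumption, not necessarily the paper's scheme) central differences with viscosity θh on a uniform 1D grid, with θ ≥ max|f|/2. Each policy-evaluation equation can then be rewritten as V_i = (ℓ_i + p⁺V_{i+1} + p⁻V_{i−1})/(λ + 2θ/h) with p^± ≥ 0 and p⁺ + p⁻ = 2θ/h. The discrete Bellman operator is therefore a sup-norm contraction with factor ρ_h = 2θ/(2θ + λh).

The standard argument that policy iteration converges at least as fast as value iteration then gives ‖V^{h,k} − V^h‖∞ ≤ ρ_h^k ‖V^{h,0} − V^h‖∞. Because ρ_h → 1 as h → 0, the iteration count needed to push this bound below C√h grows as the mesh is refined. In the functions below, `initial_gap` and `C` are hypothetical placeholders. The count is a worst-case bound, and observed iteration counts may be much smaller.

```python
import math


def contraction_factor(h, lam, theta):
    """Sup-norm contraction factor rho_h = d / (lam + d), with d = 2*theta/h,
    of the discrete Bellman operator for the sketched 1D scheme."""
    d = 2.0 * theta / h
    return d / (lam + d)


def iterations_to_plateau(h, lam, theta, initial_gap=1.0, C=1.0):
    """Smallest k with rho_h**k * initial_gap <= C * sqrt(h) (worst-case bound)."""
    rho = contraction_factor(h, lam, theta)
    target = C * math.sqrt(h)
    if initial_gap <= target:
        return 0
    return math.ceil(math.log(target / initial_gap) / math.log(rho))


for h in (0.1, 0.01, 0.001):
    print(h, contraction_factor(h, 1.0, 0.5), iterations_to_plateau(h, 1.0, 0.5))
```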

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same artificial-viscosity regularization may allow policy iteration to be applied to other stationary viscosity problems where the continuous improvement step is formally undefined.
Balancing the number of iterations against mesh size could yield near-optimal computational cost for high-dimensional infinite-horizon control problems.
The resolvent-based contraction mechanism may extend the convergence analysis to related discounted problems such as stochastic control or mean-field games.

Load-bearing premise

That an artificial viscosity term of size proportional to the mesh size h is sufficient to restore the comparison principle for the discrete operator while still allowing the policy improvement step to be performed pointwise with discrete gradients.
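For a concrete scheme, this premise can be checked mechanically. The sketch below assumes a hypothetical example, not the paper's exact discretization: a central-difference gradient plus a viscosity term ε·Δ_h on a periodic 1D grid. `is_monotone_scheme` (a hypothetical name) tests the M-matrix sign conditions row by row. For this scheme, the off-diagonal coefficients are nonpositive exactly when ε ≥ h|f|/2.

```python
import numpy as np


def is_monotone_scheme(f, eps, h, lam=1.0):
    """Sign conditions for the hypothetical policy-evaluation matrix
    lam*I - f*D0 - eps*Lap_h (central differences, periodic grid).

    Returns True when every off-diagonal coefficient is <= 0 and every row is
    strictly diagonally dominant, i.e. the matrix is an M-matrix, which gives
    a discrete comparison principle (its inverse is entrywise nonnegative).
    """
    f = np.asarray(f, dtype=float)
    upper = -f / (2.0 * h) - eps / h**2     # coefficient of V_{i+1}
    lower = f / (2.0 * h) - eps / h**2      # coefficient of V_{i-1}
    diag = lam + 2.0 * eps / h**2
    signs_ok = np.all(upper <= 0.0) and np.all(lower <= 0.0)
    dominant = np.all(diag > np.abs(upper) + np.abs(lower))
    return bool(signs_ok and dominant)


h = 1.0 / 64
f = np.linspace(-1.0, 1.0, 9)               # sampled drift values f(x_i, a)
eps_min = 0.5 * h * np.max(np.abs(f))       # eps = theta*h with theta = max|f|/2
print(is_monotone_scheme(f, eps_min, h))        # True: O(h) viscosity
print(is_monotone_scheme(f, 0.0, h))            # False: no viscosity
print(is_monotone_scheme(f, 0.5 * eps_min, h))  # False: too little viscosity
```

The last two calls show that, for this discretization, viscosity smaller than h·max|f|/2 loses the sign structure on which comparison rests.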

What would settle it

Numerical computation on a sequence of successively finer meshes could settle it in two ways. The observed L^∞ error between the discrete solution and a reference solution might decay more slowly than the square root of h. Alternatively, the policy iteration sequence might fail to exhibit geometric contraction for some fixed positive h.
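Such a mesh-refinement study usually reports the observed order p in err ≈ C·h^p. The helper below is a generic sketch, and the name `observed_orders` is ours. It computes pairwise orders between consecutive meshes and a least-squares fit on log-log axes. The data in the usage lines are synthetic, generated from 0.3√h purely to exercise the code; they are not results from the paper. Since C√h is an upper bound, an observed order above 1/2 is consistent with it. Only errors decaying persistently more slowly than √h would contradict the estimate.

```python
import numpy as np


def observed_orders(hs, errors):
    """Estimate the convergence order p in err ≈ C * h**p.

    Returns the pairwise orders between consecutive meshes and a global
    least-squares fit (p, C) on log-log axes.
    """
    hs = np.asarray(hs, dtype=float)
    errors = np.asarray(errors, dtype=float)
    pairwise = np.log(errors[:-1] / errors[1:]) / np.log(hs[:-1] / hs[1:])
    p, log_c = np.polyfit(np.log(hs), np.log(errors), 1)  # slope, intercept
    return pairwise, p, np.exp(log_c)


# synthetic illustration only: errors generated as 0.3 * sqrt(h)
hs = [0.1, 0.05, 0.025, 0.0125]
errors = [0.3 * np.sqrt(h) for h in hs]
pairwise, p, C = observed_orders(hs, errors)
print(pairwise, p, C)
```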

Figures

Figures reproduced from arXiv: 2604.10191 by Namkyeong Cho, Yeoneung Kim.

**Figure 1.** Fixed-h policy iteration for the one-dimensional discounted quadratic control problem. The left panel shows convergence of the value iterates, the middle panel illustrates the decay-then-plateau behavior of the error, and the right panel confirms geometric decay of the PI residual. Together, these results clearly separate iteration error from discretization error, in agreement with the theoretical bound. Error m…

**Figure 2.** Fixed-h PI convergence in the nonlinear 2D benchmark. The three panels show two representative one-dimensional slices of the value iterates and the decay of the global error to the reference solution. (a) PINN prediction on the slice $y \mapsto V_\theta(x_0, y)$ at $x_0 = 0.80$. (b) PINN prediction on the slice $x \mapsto V_\theta(x, y_0)$ at $y_0 = -0.80$. (c) Error to the manufactured reference solution versus training step.

**Figure 3.** Boundary-free PINN experiment for the same nonlinear 2D manufactured benchmark.

Original abstract

We study policy iteration (PI) for deterministic infinite-horizon discounted optimal control problems, whose value function is characterized by a stationary Hamilton--Jacobi--Bellman (HJB) equation. At the PDE level, PI is fundamentally ill-posed: the improvement step requires pointwise evaluation of $\nabla V$, which is not well defined for viscosity solutions, and thus the associated nonlinear operator cannot be interpreted in a stable functional sense. We develop a monotone semi-discrete formulation for the stationary discounted setting by introducing a space-discrete scheme with artificial viscosity of order $O(h)$. This regularization restores comparison, ensures monotonicity of the discrete operator, and yields a well-defined pointwise policy improvement via discrete gradients. Our analysis reveals a convergence mechanism fundamentally different from the finite-horizon case. For each fixed mesh size $h>0$, we prove that the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, where the contraction is induced by the resolvent structure of the discounted operator. We further establish the sharp vanishing-viscosity estimate $\|V^h - V\|_{L^\infty} \leq C\sqrt{h}$, and derive a quantitative error decomposition that separates policy iteration error from discretization error, exhibiting a nontrivial coupling between iteration count and mesh size. Numerical experiments in nonlinear one- and two-dimensional control problems confirm the theoretical predictions, including geometric convergence and the characteristic decay-then-plateau behavior of the total error.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper stabilizes policy iteration for stationary discounted HJB equations by adding O(h) artificial viscosity on a grid, yielding geometric convergence for fixed h and a sharp O(sqrt(h)) error bound.

The core advance is a monotone semi-discrete scheme for the infinite-horizon discounted case. By inserting artificial viscosity of order O(h), the discrete operator regains monotonicity and comparison, so policy improvement becomes well-defined via discrete gradients. For any fixed mesh size the iterates converge monotonically and geometrically to the unique discrete solution because the discount factor produces a resolvent contraction. They also prove the vanishing-viscosity error is at most C sqrt(h) and give an explicit decomposition that separates iteration error from discretization error while tracking their coupling with mesh size and iteration count. The 1D and 2D nonlinear control experiments reproduce the predicted geometric rate and the decay-then-plateau error pattern.

Referee Report

0 major / 2 minor

Summary. The paper develops a monotone semi-discrete scheme with O(h) artificial viscosity for policy iteration applied to stationary discounted HJB equations arising from infinite-horizon deterministic optimal control. It proves that, for each fixed mesh size h>0, the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, with the contraction induced by the resolvent structure of the discounted operator. The analysis further yields the sharp vanishing-viscosity bound ||V^h - V||_{L^∞} ≤ C√h together with a quantitative error decomposition that separates policy-iteration error from discretization error and exhibits their coupling; numerical experiments on nonlinear 1D and 2D problems confirm the predicted geometric rates and the characteristic decay-then-plateau total-error behavior.

Significance. If the central claims hold, the work supplies a rigorous viscosity-theoretic foundation for policy iteration in the stationary discounted setting, where the PDE-level formulation is otherwise ill-posed because of the need for pointwise gradients. The resolvent-based contraction argument, the sharp O(√h) rate obtained via doubling-variables techniques, and the explicit iteration-discretization error split are all load-bearing contributions that distinguish the infinite-horizon case from existing finite-horizon analyses and are directly useful for practical implementation.

minor comments (2)

[§2] §2 (or the section introducing the semi-discrete scheme): the precise form of the artificial viscosity term and the definition of the discrete gradient used in the policy-improvement step should be stated explicitly before the monotonicity proof, so that the comparison principle can be verified directly from the scheme.
[Numerical experiments] The numerical section: the reported error tables would benefit from an additional column or plot that isolates the pure discretization error (by running PI to machine precision for each h) to make the quantitative decomposition visually verifiable.
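The referee's suggestion amounts to computing two quantities per mesh size. The first is the discretization error ‖V^h − V_ref‖∞, obtained by running PI to convergence. The second is the iteration error ‖V^{h,k} − V^h‖∞ of each iterate. The sketch below does this bookkeeping; `split_errors` is a hypothetical name. By the triangle inequality the total error never exceeds the sum of the two parts, so plotting the three sequences against k exposes the decay-then-plateau structure. The arrays in the usage lines are synthetic placeholders, not data from the paper.

```python
import numpy as np


def split_errors(iterates, V_h, V_ref):
    """Split sup-norm errors of PI iterates at a fixed mesh size.

    iterates : list of arrays V^{h,k}
    V_h      : discrete solution (PI run to convergence at this h)
    V_ref    : reference solution sampled on the same grid
    """
    disc = float(np.max(np.abs(V_h - V_ref)))                       # discretization part
    iteration = [float(np.max(np.abs(V - V_h))) for V in iterates]  # PI part
    total = [float(np.max(np.abs(V - V_ref))) for V in iterates]    # what is observed
    return iteration, disc, total


# synthetic placeholders: discretization error 0.1, PI error halving each step
V_ref = np.zeros(5)
V_h = np.full(5, 0.1)
iterates = [V_h + 0.5**k for k in range(8)]
iteration, disc, total = split_errors(iterates, V_h, V_ref)
print(iteration, disc, total)
```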

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, the accurate summary of our contributions, and the recommendation for minor revision. The recognition of the resolvent-based contraction, the sharp O(√h) bound via doubling-variables techniques, and the iteration-discretization error split is appreciated. As the report lists no specific major comments, we have no points requiring point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central claims rest on a standard resolvent contraction induced by the positive discount factor for the semi-discrete scheme and on vanishing-viscosity estimates obtained via doubling-variables arguments. These are independent mathematical facts external to the paper's own constructions; the O(h) artificial viscosity is introduced explicitly as a regularization to restore monotonicity and comparison, not derived from the target result. No step reduces a prediction or uniqueness claim to a fitted parameter, self-citation chain, or definitional tautology. The quantitative error decomposition follows directly from the contraction and comparison principles without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on standard viscosity solution theory for HJB equations and properties of discounted resolvents; no free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: viscosity solutions to HJB equations satisfy comparison principles under suitable conditions.
Invoked to ensure the regularized discrete operator restores comparison and monotonicity.

pith-pipeline@v0.9.0 · 5566 in / 1237 out tokens · 44367 ms · 2026-05-10T16:09:35.169601+00:00 · methodology

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] Martino Bardi and Italo Capuzzo-Dolcetta. Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhäuser, 1997.

[2] Guy Barles and Panagiotis E. Souganidis. Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis, 4(3):271–283, 1991.

[3] M. G. Crandall and P. L. Lions. Two approximations of solutions of Hamilton–Jacobi equations. Mathematics of Computation, 43(167):1–19, 1984.

[4] Michael G. Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User's guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992.

[5] Wendell H. Fleming and H. Mete Soner. Controlled Markov Processes and Viscosity Solutions. Springer, 2006.

[6] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.

[7] Ronald A. Howard. Dynamic Programming and Markov Processes. 1960.

[8] Yu-Jui Huang, Zhenhua Wang, and Zhou Zhou. Convergence of policy iteration for entropy-regularized stochastic control problems. SIAM Journal on Control and Optimization, 63(2):752–777, 2025.

[9] Bekzhan Kerimkulov, David Siska, and Lukasz Szpruch. Exponential convergence and stability of Howard's policy improvement algorithm for controlled diffusions. SIAM Journal on Control and Optimization, 58(3):1314–1340, 2020.

[10] Yeongjong Kim, Namkyeong Cho, Minseok Kim, and Yeoneung Kim. Physics-informed approach for exploratory Hamilton–Jacobi–Bellman equations via policy iterations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 22609–22616, 2026.

[11] Yeongjong Kim, Yeoneung Kim, Minseok Kim, and Namkyeong Cho. Neural policy iteration for stochastic optimal control: A physics-informed approach. arXiv preprint arXiv:2508.01718, 2025.

[12] David Kleinman. On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control, 13(1):114–115, 1968.

[13] Jae Yong Lee and Yeoneung Kim. Hamilton–Jacobi based policy-iteration via deep operator learning. Neurocomputing, page 130515, 2025.

[14] Martin L. Puterman. Markov decision processes. Handbooks in Operations Research and Management Science, 2:331–434, 1990.

[15] M. L. Puterman. On the convergence of policy iteration for controlled diffusions. Journal of Optimization Theory and Applications, 33(1):137–144, 1981.

[16] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks. Journal of Computational Physics, 2019.

[17] Manuel S. Santos and John Rust. Convergence properties of policy iteration. SIAM Journal on Control and Optimization, 42(6):2094–2115, 2004.

[18] Richard S. Sutton, Andrew G. Barto, et al. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge, 1998.

[19] Wenpin Tang, Hung Vinh Tran, and Yuming Zhang. Policy iteration for deterministic control problems: A viscosity approach. SIAM Journal on Control and Optimization, 2025.

[20] Hung V. Tran. Hamilton–Jacobi Equations: Theory and Applications, volume 213. American Mathematical Society, 2021.

[21] Hung Vinh Tran, Zhenhua Wang, and Yuming Zhang. Policy iteration for exploratory HJB equations. Applied Mathematics and Optimization, 2025.

[22] Draguna Vrabie, Octavian Pastravanu, Murad Abu-Khalaf, and Frank L. Lewis. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2):477–484, 2009.
