pith.

arxiv: 2604.10191 · v1 · submitted 2026-04-11 · 🧮 math.OC · cs.NA · math.NA

Policy Iteration for Stationary Discounted Hamilton--Jacobi--Bellman Equations: A Viscosity Approach

Pith reviewed 2026-05-10 16:09 UTC · model grok-4.3

classification 🧮 math.OC · cs.NA · math.NA
keywords policy iteration · Hamilton-Jacobi-Bellman equation · viscosity solution · optimal control · artificial viscosity · discretization · convergence rate · infinite horizon

The pith

A space-discrete scheme with artificial viscosity of order h makes policy iteration well-defined for stationary discounted HJB equations, guarantees geometric convergence to the discrete solution for each fixed h, and keeps the discrete solution within C√h of the true viscosity solution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Policy iteration for stationary discounted Hamilton-Jacobi-Bellman equations is ill-posed at the continuous level because the improvement step requires the gradient of a viscosity solution, which need not exist in the classical sense. The authors regularize the problem by discretizing space and adding artificial viscosity of order O(h). This produces a monotone discrete operator for which policy iteration is well-defined and, for any fixed mesh size, converges geometrically to the unique discrete solution. The discrete solution approximates the true viscosity solution with an error of order √h, and the total error decomposes into a part governed by the number of iterations and a part governed by the mesh size.
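To make the mechanism concrete, here is a minimal Python sketch of semi-discrete policy iteration (Howard's algorithm) on a hypothetical one-dimensional problem. It is not the authors' code, and every modelling choice is our assumption: a periodic state space [0, 1), dynamics x' = a with a on a finite grid of [−1, 1], discount λ = 1, running cost ℓ(x, a) = 1 − cos 2πx + a²/2, a central discrete gradient, and artificial viscosity ε = h·max|a|/2. The names `evaluation_matrix`, `hamiltonian_min` and `policy_iteration` are illustrative.

```python
import numpy as np

# Hypothetical test problem (our assumption, not the paper's benchmark):
# periodic state x in [0, 1), dynamics x' = a, controls a on a grid of [-1, 1],
# discount LAM, running cost l(x, a) = 1 - cos(2*pi*x) + a^2 / 2.
# Discrete HJB: LAM*V - eps*Lap_h V - min_a { a * D_h V + l(x, a) } = 0.
LAM = 1.0
CONTROLS = np.linspace(-1.0, 1.0, 41)


def running_cost(x, a):
    return 1.0 - np.cos(2.0 * np.pi * x) + 0.5 * a**2


def evaluation_matrix(b, h, eps):
    """Matrix of V -> LAM*V - eps*Lap_h V - b*D_h V on a periodic grid (central D_h)."""
    n = b.size
    i = np.arange(n)
    M = np.zeros((n, n))
    M[i, i] = LAM + 2.0 * eps / h**2
    M[i, (i + 1) % n] = -eps / h**2 - b / (2.0 * h)
    M[i, (i - 1) % n] = -eps / h**2 + b / (2.0 * h)
    return M


def hamiltonian_min(V, x, h):
    """Pointwise min over controls of a * D_h V + l(x, a), and the minimizing control."""
    grad = (np.roll(V, -1) - np.roll(V, 1)) / (2.0 * h)  # central discrete gradient
    q = CONTROLS[None, :] * grad[:, None] + running_cost(x[:, None], CONTROLS[None, :])
    j = np.argmin(q, axis=1)
    return q[np.arange(V.size), j], CONTROLS[j]


def policy_iteration(n=100, max_iter=200):
    h = 1.0 / n
    x = h * np.arange(n)
    eps = 0.5 * np.abs(CONTROLS).max() * h  # O(h) viscosity: keeps every M an M-matrix
    alpha = np.zeros(n)                     # initial feedback policy a(x) = 0
    iterates = []
    for _ in range(max_iter):
        # Policy evaluation: one linear solve for the current policy.
        V = np.linalg.solve(evaluation_matrix(alpha, h, eps), running_cost(x, alpha))
        iterates.append(V)
        # Policy improvement: pointwise argmin with the discrete gradient.
        _, new_alpha = hamiltonian_min(V, x, h)
        if np.array_equal(new_alpha, alpha):
            break
        alpha = new_alpha
    # Sup-norm residual of the discrete HJB equation at the final iterate.
    lap = (np.roll(V, -1) - 2.0 * V + np.roll(V, 1)) / h**2
    residual = LAM * V - eps * lap - hamiltonian_min(V, x, h)[0]
    return x, V, iterates, float(np.max(np.abs(residual)))


x, V, iterates, res = policy_iteration(n=100)
print(len(iterates), res)
```

Each sweep solves one linear system (policy evaluation) and then updates the feedback control node by node (policy improvement). Because every evaluation matrix in this sketch is an M-matrix, the value iterates decrease monotonically, and with a finite control set the loop stops once the policy repeats.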

Core claim

By introducing artificial viscosity of order O(h) into a space-discrete approximation of the stationary discounted HJB equation, policy iteration becomes well-posed on the discrete grid: the regularized operator is monotone and satisfies a comparison principle. For each fixed h > 0 the iterates converge monotonically and geometrically to the unique discrete solution, with the contraction supplied by the resolvent structure of the discounted operator. The discrete solution satisfies a sharp vanishing-viscosity bound ||V^h - V||_∞ ≤ C√h, and the total error decomposes into a policy-iteration component that decays geometrically in the number of iterations and a discretization component of order √h.
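Written out, the decomposition is the triangle inequality applied to the k-th policy-iteration iterate. The labels are ours, not the paper's: V^{h,k} is the k-th iterate on mesh size h, ρ_h ∈ (0, 1) is the fixed-h contraction factor (it may depend on h, which is one possible form of the iteration/mesh coupling mentioned in the abstract), and C_1, C are constants independent of k. The paper's exact constants are not reproduced on this page.

```latex
\|V^{h,k} - V\|_{L^\infty}
  \le \underbrace{\|V^{h,k} - V^{h}\|_{L^\infty}}_{\text{policy-iteration error}}
    + \underbrace{\|V^{h} - V\|_{L^\infty}}_{\text{discretization error}}
  \le C_1\,\rho_h^{\,k} + C\sqrt{h}.
```

The first term drives the initial geometric decay; the second sets the plateau once C_1 ρ_h^k falls below C√h.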

What carries the argument

The monotone semi-discrete operator obtained by adding artificial viscosity of order O(h) to the space-discrete Hamiltonian, which permits a pointwise policy-improvement step using discrete gradients.
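A quick numerical sanity check of the monotonicity claim, again on a hypothetical periodic 1D grid with a central discrete gradient (our assumption). For a fixed policy with drift values b_i, write the evaluation matrix as M = λI − A_h with A_h = εΔ_h + b D_h (our notation). With ε = h·max|b|/2, M has nonpositive off-diagonal entries and every row sums to λ, so M⁻¹ is entrywise nonnegative and ‖M⁻¹‖_∞ = 1/λ; as we read it, this resolvent bound is what makes the fixed-h iteration contract. Dropping the viscosity (ε = 0) breaks the sign pattern. The helper names are illustrative, not the paper's.

```python
import numpy as np


def evaluation_matrix(b, h, lam, eps):
    """Periodic-grid matrix of V -> lam*V - eps*Lap_h V - b*D_h V (central D_h)."""
    n = b.size
    i = np.arange(n)
    M = np.zeros((n, n))
    M[i, i] = lam + 2.0 * eps / h**2
    M[i, (i + 1) % n] = -eps / h**2 - b / (2.0 * h)
    M[i, (i - 1) % n] = -eps / h**2 + b / (2.0 * h)
    return M


def is_monotone(M):
    """Nonpositive off-diagonals plus strict diagonal dominance => M-matrix, M^{-1} >= 0."""
    off = M - np.diag(np.diag(M))
    return bool(np.all(off <= 0.0) and np.all(np.diag(M) > np.abs(off).sum(axis=1)))


rng = np.random.default_rng(0)
n, lam = 64, 0.5
h = 1.0 / n
b = rng.uniform(-2.0, 2.0, size=n)  # drift values b(x_i, alpha(x_i)) for one fixed policy
B = np.abs(b).max()

M_visc = evaluation_matrix(b, h, lam, eps=0.5 * B * h)  # artificial viscosity of order h
M_bare = evaluation_matrix(b, h, lam, eps=0.0)          # central differences, no viscosity

print(is_monotone(M_visc), is_monotone(M_bare))          # True False
Minv = np.linalg.inv(M_visc)
print(Minv.min() > -1e-10, np.abs(Minv).sum(axis=1).max())  # True, approximately 1/lam = 2
```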

If this is right

  • For any fixed mesh size h the policy iteration sequence converges geometrically because of the resolvent structure induced by the discount factor.
  • The total approximation error decomposes into a geometrically decaying policy-iteration contribution and an O(√h) discretization contribution that can be balanced by choice of iteration count.
  • The vanishing-viscosity limit as h tends to zero recovers the original continuous viscosity solution.
  • Numerical experiments on nonlinear one- and two-dimensional control problems reproduce the predicted geometric convergence followed by a plateau at the discretization error level.
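The balancing in the second bullet reduces to a short calculation: stop iterating once the policy-iteration term C_1 ρ^k drops below the discretization term C√h. In this sketch `c_pi`, `c_disc` and `rho` are hypothetical inputs that the paper's analysis (not reproduced here) would have to supply, and ρ is treated as independent of h, which the coupling noted in the abstract may not allow.

```python
import math


def iterations_to_match(h, rho, c_pi=1.0, c_disc=1.0):
    """Smallest k >= 0 with c_pi * rho**k <= c_disc * sqrt(h).

    All constants are hypothetical placeholders, not values from the paper.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    if h <= 0.0:
        raise ValueError("h must be positive")
    target = c_disc * math.sqrt(h)
    if c_pi <= target:
        return 0
    k = max(0, math.ceil(math.log(target / c_pi) / math.log(rho)))
    # Guard against floating-point rounding in the logarithms.
    while c_pi * rho**k > target:
        k += 1
    while k > 0 and c_pi * rho ** (k - 1) <= target:
        k -= 1
    return k


for h in (1e-2, 1e-4, 1e-6):
    print(h, iterations_to_match(h, rho=0.5))
```

With ρ = 0.5 this prints 4, 7 and 10: under these assumptions the required iteration count grows only like log(1/h) / (2 log(1/ρ)), whereas a ρ_h that approaches 1 as h shrinks would make it grow faster.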

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same artificial-viscosity regularization may allow policy iteration to be applied to other stationary viscosity problems where the continuous improvement step is formally undefined.
  • Balancing the number of iterations against mesh size could yield near-optimal computational cost for high-dimensional infinite-horizon control problems.
  • The resolvent-based contraction mechanism may extend the convergence analysis to related discounted problems such as stochastic control or mean-field games.

Load-bearing premise

That an artificial viscosity term of size proportional to the mesh size h is sufficient to restore the comparison principle for the discrete operator while still allowing the policy improvement step to be performed pointwise with discrete gradients.
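A standard one-dimensional computation shows why viscosity proportional to h is enough. The fixed-policy scheme below (central discrete gradient, discrete Laplacian, drift b_i = b(x_i, α(x_i)) under a fixed policy α) is our assumed form for illustration, since this page does not state the paper's exact viscosity term.

```latex
% Fixed-policy scheme (assumed form): central difference plus artificial viscosity eps
\lambda V_i - \varepsilon\,\frac{V_{i+1} - 2V_i + V_{i-1}}{h^2}
  - b_i\,\frac{V_{i+1} - V_{i-1}}{2h} = \ell_i ,
\qquad
\text{coeff. of } V_{i\pm 1} = -\frac{\varepsilon}{h^2} \mp \frac{b_i}{2h} \le 0
\iff \varepsilon \ge \frac{|b_i|\,h}{2}.

% With eps_i = |b_i| h / 2 the two terms collapse to upwinding, b^{\pm} = \max(\pm b, 0):
b_i\,\frac{V_{i+1} - V_{i-1}}{2h} + \frac{|b_i|\,h}{2}\,\frac{V_{i+1} - 2V_i + V_{i-1}}{h^2}
  = b_i^{+}\,\frac{V_{i+1} - V_i}{h} - b_i^{-}\,\frac{V_i - V_{i-1}}{h}.
```

With ε = (h/2)·max|b|, every off-diagonal coefficient is nonpositive and the diagonal λ + 2ε/h² exceeds the sum of their absolute values, 2ε/h², by exactly λ. That is the M-matrix property behind discrete comparison. The identity in the second line shows that, in this setting, the O(h) viscosity acts as the classical upwind correction.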

What would settle it

Numerical computation on a sequence of successively finer meshes showing that the observed L^∞ error between the discrete solution and a reference solution decreases more slowly than the square root of h, or that the policy iteration sequence does not exhibit geometric contraction for some fixed positive h.
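Both checks reduce to simple ratios. In this minimal Python sketch, the helper names are hypothetical: `observed_orders` estimates the empirical order p from errors on successively refined meshes, and `contraction_factors` estimates the fixed-h policy-iteration rate from successive residuals. Because C√h is an upper bound, observed orders that stay below 1/2 under refinement would count against it, while orders above 1/2 are consistent with it. The numbers in the usage lines are synthetic and only exercise the functions; they are not results from the paper.

```python
import numpy as np


def observed_orders(hs, errors):
    """Empirical orders p_k = log(e_k / e_{k+1}) / log(h_k / h_{k+1})."""
    hs = np.asarray(hs, dtype=float)
    errors = np.asarray(errors, dtype=float)
    return np.log(errors[:-1] / errors[1:]) / np.log(hs[:-1] / hs[1:])


def contraction_factors(residuals):
    """Ratios r_{k+1} / r_k of successive PI residuals at fixed h.

    Ratios that settle at a constant below 1 indicate geometric convergence.
    """
    r = np.asarray(residuals, dtype=float)
    return r[1:] / r[:-1]


# Synthetic data only (not from the paper): errors behaving exactly like 0.3*sqrt(h),
# residuals behaving exactly like 2 * 0.4**k.
hs = np.array([1 / 20, 1 / 40, 1 / 80, 1 / 160])
errs = 0.3 * np.sqrt(hs)
print(observed_orders(hs, errs))                        # each entry is 0.5 up to rounding
print(contraction_factors(2.0 * 0.4 ** np.arange(6)))   # each entry is 0.4 up to rounding
```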

Figures

Figures reproduced from arXiv: 2604.10191 by Namkyeong Cho, Yeoneung Kim.

Figure 1. Fixed-h policy iteration for the one-dimensional discounted quadratic control problem. The left panel shows convergence of the value iterates, the middle panel illustrates the decay-then-plateau behavior of the error, and the right confirms geometric decay of the PI residual. Together, these results clearly separate iteration error from discretization error, in agreement with the theoretical bound. Error m…

Figure 2. Fixed-h PI convergence in the nonlinear 2D benchmark. The three panels show two representative one-dimensional slices of the value iterates and the decay of the global error to the reference solution. (a) PINN prediction on the slice y ↦ V_θ(x_0, y) at x_0 = 0.80. (b) PINN prediction on the slice x ↦ V_θ(x, y_0) at y_0 = −0.80. (c) Error to the manufactured reference solution versus training step …

Figure 3. Boundary-free PINN experiment for the same nonlinear 2D manufactured benchmark.
Original abstract

We study policy iteration (PI) for deterministic infinite-horizon discounted optimal control problems, whose value function is characterized by a stationary Hamilton--Jacobi--Bellman (HJB) equation. At the PDE level, PI is fundamentally ill-posed: the improvement step requires pointwise evaluation of $\nabla V$, which is not well defined for viscosity solutions, and thus the associated nonlinear operator cannot be interpreted in a stable functional sense. We develop a monotone semi-discrete formulation for the stationary discounted setting by introducing a space-discrete scheme with artificial viscosity of order $O(h)$. This regularization restores comparison, ensures monotonicity of the discrete operator, and yields a well-defined pointwise policy improvement via discrete gradients. Our analysis reveals a convergence mechanism fundamentally different from the finite-horizon case. For each fixed mesh size $h>0$, we prove that the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, where the contraction is induced by the resolvent structure of the discounted operator. We further establish the sharp vanishing-viscosity estimate $\|V^h - V\|_{L^\infty} \leq C\sqrt{h}$, and derive a quantitative error decomposition that separates policy iteration error from discretization error, exhibiting a nontrivial coupling between iteration count and mesh size. Numerical experiments in nonlinear one- and two-dimensional control problems confirm the theoretical predictions, including geometric convergence and the characteristic decay-then-plateau behavior of the total error.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper develops a monotone semi-discrete scheme with O(h) artificial viscosity for policy iteration applied to stationary discounted HJB equations arising from infinite-horizon deterministic optimal control. It proves that, for each fixed mesh size h>0, the semi-discrete PI sequence converges monotonically and geometrically to the unique discrete solution, with the contraction induced by the resolvent structure of the discounted operator. The analysis further yields the sharp vanishing-viscosity bound ||V^h - V||_{L^∞} ≤ C√h together with a quantitative error decomposition that separates policy-iteration error from discretization error and exhibits their coupling; numerical experiments on nonlinear 1D and 2D problems confirm the predicted geometric rates and the characteristic decay-then-plateau total-error behavior.

Significance. If the central claims hold, the work supplies a rigorous viscosity-theoretic foundation for policy iteration in the stationary discounted setting, where the PDE-level formulation is otherwise ill-posed because of the need for pointwise gradients. The resolvent-based contraction argument, the sharp O(√h) rate obtained via doubling-variables techniques, and the explicit iteration-discretization error split are all load-bearing contributions that distinguish the infinite-horizon case from existing finite-horizon analyses and are directly useful for practical implementation.

minor comments (2)
  1. [§2] In §2 (or whichever section introduces the semi-discrete scheme), the precise form of the artificial viscosity term and the definition of the discrete gradient used in the policy-improvement step should be stated explicitly before the monotonicity proof, so that the comparison principle can be verified directly from the scheme.
  2. [Numerical experiments] The numerical section: the reported error tables would benefit from an additional column or plot that isolates the pure discretization error (by running PI to machine precision for each h) to make the quantitative decomposition visually verifiable.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment, the accurate summary of our contributions, and the recommendation for minor revision. The recognition of the resolvent-based contraction, the sharp O(√h) bound via doubling-variables techniques, and the iteration-discretization error split is appreciated. As the report lists no specific major comments, we have no points requiring point-by-point rebuttal.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central claims rest on a standard resolvent contraction induced by the positive discount factor for the semi-discrete scheme and on vanishing-viscosity estimates obtained via doubling-variables arguments. These are independent mathematical facts external to the paper's own constructions; the O(h) artificial viscosity is introduced explicitly as a regularization to restore monotonicity and comparison, not derived from the target result. No step reduces a prediction or uniqueness claim to a fitted parameter, self-citation chain, or definitional tautology. The quantitative error decomposition follows directly from the contraction and comparison principles without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on standard viscosity solution theory for HJB equations and properties of discounted resolvents; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption: viscosity solutions to HJB equations satisfy comparison principles under suitable conditions
    Invoked to ensure the regularized discrete operator restores comparison and monotonicity.

pith-pipeline@v0.9.0 · 5566 in / 1237 out tokens · 44367 ms · 2026-05-10T16:09:35.169601+00:00 · methodology
