The Cesàro Value Iteration
Pith reviewed 2026-05-22 20:56 UTC · model grok-4.3
The pith
For systems with periodic optimal behavior, Cesàro value iteration converges and recovers the undiscounted infinite-horizon cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For systems with periodic optimal operating behavior, the Cesàro value iteration converges and the Cesàro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.
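For orientation, the Cesàro mean behind this claim can be written out for a plain real sequence. The symbols a_k, a and N below are illustrative and not taken from the paper. The second line is the standard regularity property of Cesàro summation, which is one way to see why recovering an existing ordinary limit is plausible. The third line is an oscillating sequence with no ordinary limit whose averages still settle.

```latex
% Cesàro mean of a real sequence (a_k), k = 1, 2, ...
\[
  \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} a_k
\]
% Regularity: if the ordinary limit exists, the Cesàro mean agrees with it.
\[
  \lim_{k \to \infty} a_k = a
  \quad\Longrightarrow\quad
  \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} a_k = a
\]
% Oscillating example: a_k = 1, 0, 1, 0, ... has no limit, but its averages do.
\[
  a_k = \tfrac{1}{2}\bigl(1 - (-1)^k\bigr)
  \quad\Longrightarrow\quad
  \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} a_k = \tfrac{1}{2}
\]
```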
What carries the argument
The Cesàro value iteration, which replaces the ordinary limit in value iteration with the Cesàro mean of the sequence of value-function iterates.
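A minimal sketch of the averaging idea, under loud assumptions: the two-state system, its costs, and every name below are hypothetical, the state space is finite rather than uncountable, and the code simply averages ordinary value-iteration iterates as described above. It is not the paper's exact recursion. The optimal behavior is a period-2 cycle with zero average cost, so the ordinary iterates alternate forever while their running means settle.

```python
# Hypothetical toy system (not from the paper): two states, two inputs.
# "move" goes around a 2-cycle, "stay" keeps the state at a high cost.
STATES = (0, 1)
INPUTS = ("move", "stay")

def step(x, u):
    """Deterministic dynamics x+ = f(x, u)."""
    return 1 - x if u == "move" else x

def stage_cost(x, u):
    """Stage cost l(x, u); moving costs +1 from state 0 and -1 from state 1."""
    if u == "stay":
        return 2.0
    return 1.0 if x == 0 else -1.0

def value_iterates(n_steps):
    """Return [V_1, ..., V_n] with V_0 = 0 and
    V_{k+1}(x) = min_u ( l(x, u) + V_k(f(x, u)) )."""
    v = {x: 0.0 for x in STATES}
    iterates = []
    for _ in range(n_steps):
        v = {
            x: min(stage_cost(x, u) + v[step(x, u)] for u in INPUTS)
            for x in STATES
        }
        iterates.append(v)
    return iterates

def cesaro_means(iterates):
    """Running Cesàro means (1/N) * sum_{k=1}^{N} V_k, one dict per N."""
    totals = {x: 0.0 for x in STATES}
    means = []
    for n, v in enumerate(iterates, start=1):
        for x in STATES:
            totals[x] += v[x]
        means.append({x: totals[x] / n for x in STATES})
    return means

iterates = value_iterates(1000)
means = cesaro_means(iterates)
print(iterates[-2][0], iterates[-1][0])  # ordinary iterates at state 0 keep alternating
print(means[-1])                         # averaged iterates have settled
```

In this toy the ordinary iterates at state 0 alternate between 1 and 0, and the averaged value sits midway between the two. The averaged value depends on the phase of the cycle at which the state starts.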
If this is right
- Optimal policies can be extracted from the limit of the Cesàro iterates for this class of problems.
- The method supplies a numerically stable way to solve infinite-horizon problems whose total cost does not converge.
- The recovered value function can serve as a terminal cost in receding-horizon schemes that target periodic operation.
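The first consequence can be illustrated with a one-step greedy lookahead, which is the standard way to read a policy off a value function. This is a sketch under assumptions: the two-state system, the numbers in `V`, and all names are hypothetical, and the summary does not say which extraction rule the paper itself uses.

```python
# Hypothetical two-state example; all names and numbers are illustrative.
STATES = (0, 1)
INPUTS = ("move", "stay")

def step(x, u):
    """Deterministic dynamics: "move" switches the state, "stay" keeps it."""
    return 1 - x if u == "move" else x

def stage_cost(x, u):
    """Stage cost: staying costs 2, moving costs +1 from state 0 and -1 from state 1."""
    if u == "stay":
        return 2.0
    return 1.0 if x == 0 else -1.0

# Assumed value function, for example the output of an averaged value iteration.
V = {0: 0.5, 1: -0.5}

def greedy_policy(value):
    """In each state, pick the input minimising stage cost plus successor value."""
    return {
        x: min(INPUTS, key=lambda u: stage_cost(x, u) + value[step(x, u)])
        for x in STATES
    }

policy = greedy_policy(V)
print(policy)
```

With these assumed values the greedy rule selects "move" in both states, which reproduces the period-2 cycle of the toy system.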
Where Pith is reading between the lines
- The same averaging idea may extend to systems whose optimal trajectories are almost periodic rather than strictly periodic.
- Connections exist to average-cost optimal control, where the Cesàro construction already appears implicitly.
- Implementation on continuous-state systems will require function approximation whose error propagation under Cesàro averaging remains to be quantified.
Load-bearing premise
The system must exhibit periodic optimal operating behavior.
What would settle it
A concrete system with periodic optimal operating behavior in which the Cesàro value iteration either diverges or produces a value function that differs from the true undiscounted optimal cost.
Original abstract
In this paper, we consider undiscounted infinite-horizon optimal control for deterministic systems with an uncountable state and input space. We specifically address the case when the classic value iteration does not converge. For such systems, we use the Cesàro mean to define the infinite-horizon optimal control problem and the corresponding infinite-horizon value function. Moreover, for this value function, we introduce the Cesàro value iteration and prove its convergence for the special case of systems with periodic optimal operating behavior. For this instance, we also show that the Cesàro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers undiscounted infinite-horizon optimal control for deterministic systems with uncountable state and input spaces. It defines the infinite-horizon problem and value function via the Cesàro mean in cases where standard value iteration fails to converge, introduces the Cesàro value iteration, proves convergence of this iteration for the special case of systems with periodic optimal operating behavior, and shows that the resulting value function recovers the undiscounted optimal cost when the latter is well-defined.
Significance. If the stated proofs hold, the work supplies a targeted theoretical construction for a nontrivial subclass of undiscounted problems on uncountable spaces. The restriction to periodic optimal operating behavior is explicitly acknowledged, and the manuscript provides machine-checked-style proofs (as claimed) together with a recovery result for the optimal cost; these are concrete strengths when the derivations are correct.
minor comments (2)
- The abstract and introduction should state the precise measurability and topological assumptions on the state and input spaces at the outset, as these are central to well-posedness on uncountable domains.
- Notation for the Cesàro mean operator and the associated value function should be introduced with a dedicated preliminary section or table to improve readability.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary accurately captures the scope and contributions of the work on Cesàro value iteration for undiscounted infinite-horizon optimal control problems.
Circularity Check
No significant circularity
The paper supplies an explicit proof of convergence for the Cesàro value iteration under the stated special case of periodic optimal operating behavior, together with recovery of the undiscounted cost when the latter is well-defined. No equations or steps are shown to reduce by definition to fitted inputs, self-citations, or ansatzes imported from prior work by the same authors. The central claim is therefore a self-contained theoretical result whose validity rests on the supplied proof rather than on any circular reduction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the system has periodic optimal operating behavior.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, alexander_duality_circle_linking (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "prove its convergence for the special case of systems with periodic optimal operating behavior... optimal periodic orbit Π⋆ (Def. 12, Ass. 13 strict dissipativity)"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "Cesàro mean... lim N→∞ V^ces_N(x) ... recovers the undiscounted infinite-horizon optimal cost"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.