pith. sign in

arxiv: 2504.04889 · v4 · submitted 2025-04-07 · 📡 eess.SY · cs.SY· math.OC

The Ces\`aro Value Iteration

Pith reviewed 2026-05-22 20:56 UTC · model grok-4.3

classification 📡 eess.SY cs.SYmath.OC
keywords Cesàro meanvalue iterationundiscounted optimal controlperiodic operationinfinite horizondeterministic systems
0
0 comments X

The pith

For systems with periodic optimal behavior, Cesàro value iteration converges and recovers the undiscounted infinite-horizon cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles undiscounted infinite-horizon optimal control for deterministic systems whose state and input spaces are uncountable, a setting where classic value iteration often fails to converge. It replaces the standard limit with the Cesàro mean to define both the problem and its value function, then introduces the Cesàro value iteration as the corresponding algorithm. The central result is a convergence proof for the special case of periodic optimal operating behavior, together with the guarantee that the obtained value function equals the true undiscounted optimal cost whenever that cost exists. A reader should care because many practical control problems exhibit costs that oscillate or grow without settling to a limit, yet possess well-defined long-run averages that can be recovered by this averaging technique.

Core claim

For systems with periodic optimal operating behavior, the Cesàro value iteration converges and the Cesàro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.

What carries the argument

The Cesàro value iteration, which replaces the ordinary limit in value iteration with the Cesàro mean of the sequence of value-function iterates.

If this is right

  • Optimal policies can be extracted from the limit of the Cesàro iterates for this class of problems.
  • The method supplies a numerically stable way to solve infinite-horizon problems whose total cost does not converge.
  • The recovered value function can serve as a terminal cost in receding-horizon schemes that target periodic operation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same averaging idea may extend to systems whose optimal trajectories are almost periodic rather than strictly periodic.
  • Connections exist to average-cost optimal control, where the Cesàro construction already appears implicitly.
  • Implementation on continuous-state systems will require function approximation whose error propagation under Cesàro averaging remains to be quantified.

Load-bearing premise

The system must exhibit periodic optimal operating behavior.

What would settle it

A concrete system with periodic optimal operating behavior in which the Cesàro value iteration either diverges or produces a value function that differs from the true undiscounted optimal cost.

Figures

Figures reproduced from arXiv: 2504.04889 by Frank Allg\"ower, Jonas Mair, Lukas Schwenkel, Matthias A. M\"uller.

Figure 1
Figure 1. Figure 1: Illustration of the states xi (nodes) and feasible transitions (edges) with corresponding inputs uij and costs ℓ(xi, uij ) for a system which is optimally operated at a periodic orbit with average cost ℓ ⋆ = 0. x1 x0 x3 x2 ℓ(x1, u11) = 0 ℓ(x2, u22) = 0 ℓ(x3, u01) = 2 ℓ(x3, u13) = −3 ℓ(x3, u02) = 1 ℓ(x3, u23) = −1 ℓ(x3, u30) = 5 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of the states xi (nodes) and feasible transitions (edges) with corresponding inputs uij and costs ℓ(xi, uij ) for a system which is optimally operated at a periodic orbit with average cost ℓ ⋆ = 0. then, under certain assumptions, it can be proven that V¯∞(x) = V¯ ces ∞ (x) for all x ∈ X (cf. Theorem 23). Our main conceptual contribution is a recursive procedure to compute the Cesaro value fun… view at source ↗
Figure 3
Figure 3. Figure 3: Values of value functions for x = 6 in dependence of N. 0.6 0.7 0.8 0.9 1 101 102 γ N⋆ [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Number of VIs N⋆ of V¯ ces N (red) and V¯ γ N (blue) that are required for the resulting policy to converge. A dashed-dotted line indicates that the resulting input is sub-optimal w.r.t. Cesaro infinite-horizon performance. ` depends on γ and the limit is therefore not equal to V¯ ces ∞ as can be seen in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

In this paper, we consider undiscouted infinite-horizon optimal control for deterministic systems with an uncountable state and input space. We specifically address the case when the classic value iteration does not converge. For such systems, we use the Ces`aro mean to define the infinite-horizon optimal control problem and the corresponding infinite-horizon value function. Moreover, for this value function, we introduce the Ces\`aro value iteration and prove its convergence for the special case of systems with periodic optimal operating behavior. For this instance, we also show that the Ces\`aro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper considers undiscounted infinite-horizon optimal control for deterministic systems with uncountable state and input spaces. It defines the infinite-horizon problem and value function via the Cesàro mean in cases where standard value iteration fails to converge, introduces the Cesàro value iteration, proves convergence of this iteration for the special case of systems with periodic optimal operating behavior, and shows that the resulting value function recovers the undiscounted optimal cost when the latter is well-defined.

Significance. If the stated proofs hold, the work supplies a targeted theoretical construction for a nontrivial subclass of undiscounted problems on uncountable spaces. The restriction to periodic optimal operating behavior is explicitly acknowledged, and the manuscript provides machine-checked-style proofs (as claimed) together with a recovery result for the optimal cost; these are concrete strengths when the derivations are correct.

minor comments (2)
  1. The abstract and introduction should state the precise measurability and topological assumptions on the state and input spaces at the outset, as these are central to well-posedness on uncountable domains.
  2. Notation for the Cesàro mean operator and the associated value function should be introduced with a dedicated preliminary section or table to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary accurately captures the scope and contributions of the work on Cesàro value iteration for undiscounted infinite-horizon optimal control problems.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper supplies an explicit proof of convergence for the Cesàro value iteration under the stated special case of periodic optimal operating behavior, together with recovery of the undiscounted cost when finite. No equations or steps are shown to reduce by definition to fitted inputs, self-citations, or ansatzes imported from prior work by the same authors. The central claim is therefore a self-contained theoretical result whose validity rests on the supplied proof rather than on any circular reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that optimal behavior is periodic; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption The system has periodic optimal operating behavior
    Convergence of Cesàro value iteration and recovery of optimal cost are proved specifically for this case.

pith-pipeline@v0.9.0 · 5646 in / 1112 out tokens · 40707 ms · 2026-05-22T20:56:12.409417+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    Optimal decision procedures for finite Markov chains. Part II: Communicating systems,

    J. Bather, “Optimal decision procedures for finite Markov chains. Part II: Communicating systems,” Advances in Applied Probability , vol. 5, no. 3, pp. 521–540, 1973

  2. [2]

    A modified form of the iterative method of dynamic programming,

    A. Hordijk and H. Tijms, “A modified form of the iterative method of dynamic programming,” The Annals of Statistics , pp. 203–208, 1975

  3. [3]

    Bertsekas, Dynamic Programming and Optimal Control , 3rd ed

    D. Bertsekas, Dynamic Programming and Optimal Control , 3rd ed. Belmont, MA, USA: Athena Scientific, 2005, vol. 1 and vol. 2

  4. [4]

    R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 2nd ed. Cambridge, MA, USA: MIT press, 2018

  5. [5]

    Infinite time optimal control and periodicity,

    F. Colonius and W. Kliemann, “Infinite time optimal control and periodicity,” Applied Mathematics & Optimization , vol. 20, pp. 113– 130, 1989

  6. [6]

    A uniform Tauberian theorem in dynamic programming,

    E. Lehrer and S. Sorin, “A uniform Tauberian theorem in dynamic programming,” Mathematics of Operations Research , vol. 17, no. 2, pp. 303–307, 1992

  7. [7]

    Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,

    V . Gaitsgory, A. Parkinson, and I. Shvartsman, “Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,” Discrete and Continuous Dynamical Systems. Series B, vol. 22, no. 10, pp. 3821–3838, 2017

  8. [8]

    Examining average and discounted reward optimality criteria in reinforcement learning,

    V . Dewanto and M. Gallagher, “Examining average and discounted reward optimality criteria in reinforcement learning,” in Australasian Joint Conference on Artificial Intelligence . Springer, 2022, pp. 800– 813

  9. [9]

    G. H. Hardy, Divergent Series. Oxford University Press, 1949

  10. [10]

    Economic Nonlinear Model Predictive Control,

    T. Faulwasser, L. Gr ¨une, and M. A. M ¨uller, “Economic Nonlinear Model Predictive Control,” Foundations and Trends in Systems and Control, vol. 5, no. 1, pp. 224–409, 2018

  11. [11]

    Economic model predictive control with- out terminal constraints for optimal periodic behavior,

    M. A. M ¨uller and L. Gr¨une, “Economic model predictive control with- out terminal constraints for optimal periodic behavior,” Automatica, vol. 70, pp. 128–139, Aug. 2016

  12. [12]

    Linearly discounted economic MPC without terminal conditions for periodic optimal operation,

    L. Schwenkel, A. Hadorn, M. A. M ¨uller, and F. Allg ¨ower, “Linearly discounted economic MPC without terminal conditions for periodic optimal operation,” Automatica, vol. 159, p. 111393, 2024

  13. [13]

    Asymptotic stability and transient optimality of economic mpc without terminal conditions,

    L. Gr ¨une and M. Stieler, “Asymptotic stability and transient optimality of economic mpc without terminal conditions,” Journal of Process Control, vol. 24, no. 8, pp. 1187–1196, Aug. 2014

  14. [14]

    On discount functions for economic model predictive control without terminal conditions,

    L. Schwenkel, D. Briem, M. A. M ¨uller, and F. Allg¨ower, “On discount functions for economic model predictive control without terminal conditions,” arXiv:2405.14361, 2024

  15. [15]

    On the role of dissipativity in economic model predictive control,

    M. A. M ¨uller, L. Gr¨une, and F. Allg¨ower, “On the role of dissipativity in economic model predictive control,”Proc. 5th IFAC Conf. Nonlinear Model Predictive Control (NMPC) , pp. 110–116, 2015

  16. [16]

    The Ces\`aro Value Iteration

    J. Mair, L. Schwenkel, M. A. M ¨uller, and F. Allg ¨ower, “The Ces `aro Value Iteration,” arXiv:2504.04889, 2025

  17. [17]

    Dissipative dynamical systems part I: General theory,

    J. C. Willems, “Dissipative dynamical systems part I: General theory,” Archive for rational mechanics and analysis , vol. 45, no. 5, pp. 321– 351, 1972. APPENDIX Lemma 35: Let Assumptions 1, 13, 14, 15 and 18 hold. Then, there exists ˜C < ∞ such that ˜V β N(x) ≤ ˜C for all N ∈ N and for all x ∈ X. Proof: By Assumptions 14 and 15, for all x ∈ X, there exis...