The Ces\`aro Value Iteration

Frank Allg\"ower; Jonas Mair; Lukas Schwenkel; Matthias A. M\"uller

arxiv: 2504.04889 · v4 · submitted 2025-04-07 · 📡 eess.SY · cs.SY· math.OC

The Ces\`aro Value Iteration

Jonas Mair , Lukas Schwenkel , Matthias A. M\"uller , Frank Allg\"ower This is my paper

Pith reviewed 2026-05-22 20:56 UTC · model grok-4.3

classification 📡 eess.SY cs.SYmath.OC

keywords Cesàro meanvalue iterationundiscounted optimal controlperiodic operationinfinite horizondeterministic systems

0 comments

The pith

For systems with periodic optimal behavior, Cesàro value iteration converges and recovers the undiscounted infinite-horizon cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles undiscounted infinite-horizon optimal control for deterministic systems whose state and input spaces are uncountable, a setting where classic value iteration often fails to converge. It replaces the standard limit with the Cesàro mean to define both the problem and its value function, then introduces the Cesàro value iteration as the corresponding algorithm. The central result is a convergence proof for the special case of periodic optimal operating behavior, together with the guarantee that the obtained value function equals the true undiscounted optimal cost whenever that cost exists. A reader should care because many practical control problems exhibit costs that oscillate or grow without settling to a limit, yet possess well-defined long-run averages that can be recovered by this averaging technique.

Core claim

For systems with periodic optimal operating behavior, the Cesàro value iteration converges and the Cesàro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.

What carries the argument

The Cesàro value iteration, which replaces the ordinary limit in value iteration with the Cesàro mean of the sequence of value-function iterates.

If this is right

Optimal policies can be extracted from the limit of the Cesàro iterates for this class of problems.
The method supplies a numerically stable way to solve infinite-horizon problems whose total cost does not converge.
The recovered value function can serve as a terminal cost in receding-horizon schemes that target periodic operation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same averaging idea may extend to systems whose optimal trajectories are almost periodic rather than strictly periodic.
Connections exist to average-cost optimal control, where the Cesàro construction already appears implicitly.
Implementation on continuous-state systems will require function approximation whose error propagation under Cesàro averaging remains to be quantified.

Load-bearing premise

The system must exhibit periodic optimal operating behavior.

What would settle it

A concrete system with periodic optimal operating behavior in which the Cesàro value iteration either diverges or produces a value function that differs from the true undiscounted optimal cost.

Figures

Figures reproduced from arXiv: 2504.04889 by Frank Allg\"ower, Jonas Mair, Lukas Schwenkel, Matthias A. M\"uller.

**Figure 1.** Figure 1: Illustration of the states xi (nodes) and feasible transitions (edges) with corresponding inputs uij and costs ℓ(xi, uij ) for a system which is optimally operated at a periodic orbit with average cost ℓ ⋆ = 0. x1 x0 x3 x2 ℓ(x1, u11) = 0 ℓ(x2, u22) = 0 ℓ(x3, u01) = 2 ℓ(x3, u13) = −3 ℓ(x3, u02) = 1 ℓ(x3, u23) = −1 ℓ(x3, u30) = 5 [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 2.** Figure 2: Illustration of the states xi (nodes) and feasible transitions (edges) with corresponding inputs uij and costs ℓ(xi, uij ) for a system which is optimally operated at a periodic orbit with average cost ℓ ⋆ = 0. then, under certain assumptions, it can be proven that V¯∞(x) = V¯ ces ∞ (x) for all x ∈ X (cf. Theorem 23). Our main conceptual contribution is a recursive procedure to compute the Cesaro value fun… view at source ↗

**Figure 3.** Figure 3: Values of value functions for x = 6 in dependence of N. 0.6 0.7 0.8 0.9 1 101 102 γ N⋆ [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Number of VIs N⋆ of V¯ ces N (red) and V¯ γ N (blue) that are required for the resulting policy to converge. A dashed-dotted line indicates that the resulting input is sub-optimal w.r.t. Cesaro infinite-horizon performance. ` depends on γ and the limit is therefore not equal to V¯ ces ∞ as can be seen in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

In this paper, we consider undiscouted infinite-horizon optimal control for deterministic systems with an uncountable state and input space. We specifically address the case when the classic value iteration does not converge. For such systems, we use the Ces`aro mean to define the infinite-horizon optimal control problem and the corresponding infinite-horizon value function. Moreover, for this value function, we introduce the Ces\`aro value iteration and prove its convergence for the special case of systems with periodic optimal operating behavior. For this instance, we also show that the Ces\`aro value function recovers the undiscounted infinite-horizon optimal cost, if the latter is well-defined.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper considers undiscounted infinite-horizon optimal control for deterministic systems with uncountable state and input spaces. It defines the infinite-horizon problem and value function via the Cesàro mean in cases where standard value iteration fails to converge, introduces the Cesàro value iteration, proves convergence of this iteration for the special case of systems with periodic optimal operating behavior, and shows that the resulting value function recovers the undiscounted optimal cost when the latter is well-defined.

Significance. If the stated proofs hold, the work supplies a targeted theoretical construction for a nontrivial subclass of undiscounted problems on uncountable spaces. The restriction to periodic optimal operating behavior is explicitly acknowledged, and the manuscript provides machine-checked-style proofs (as claimed) together with a recovery result for the optimal cost; these are concrete strengths when the derivations are correct.

minor comments (2)

The abstract and introduction should state the precise measurability and topological assumptions on the state and input spaces at the outset, as these are central to well-posedness on uncountable domains.
Notation for the Cesàro mean operator and the associated value function should be introduced with a dedicated preliminary section or table to improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending minor revision. The referee's summary accurately captures the scope and contributions of the work on Cesàro value iteration for undiscounted infinite-horizon optimal control problems.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper supplies an explicit proof of convergence for the Cesàro value iteration under the stated special case of periodic optimal operating behavior, together with recovery of the undiscounted cost when finite. No equations or steps are shown to reduce by definition to fitted inputs, self-citations, or ansatzes imported from prior work by the same authors. The central claim is therefore a self-contained theoretical result whose validity rests on the supplied proof rather than on any circular reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that optimal behavior is periodic; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The system has periodic optimal operating behavior
Convergence of Cesàro value iteration and recovery of optimal cost are proved specifically for this case.

pith-pipeline@v0.9.0 · 5646 in / 1112 out tokens · 40707 ms · 2026-05-22T20:56:12.409417+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

prove its convergence for the special case of systems with periodic optimal operating behavior... optimal periodic orbit Π⋆ (Def. 12, Ass. 13 strict dissipativity)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Cesàro mean... lim N→∞ VcesN(x) ... recovers the undiscounted infinite-horizon optimal cost

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1]

Optimal decision procedures for finite Markov chains. Part II: Communicating systems,

J. Bather, “Optimal decision procedures for finite Markov chains. Part II: Communicating systems,” Advances in Applied Probability , vol. 5, no. 3, pp. 521–540, 1973

work page 1973
[2]

A modified form of the iterative method of dynamic programming,

A. Hordijk and H. Tijms, “A modified form of the iterative method of dynamic programming,” The Annals of Statistics , pp. 203–208, 1975

work page 1975
[3]

Bertsekas, Dynamic Programming and Optimal Control , 3rd ed

D. Bertsekas, Dynamic Programming and Optimal Control , 3rd ed. Belmont, MA, USA: Athena Scientific, 2005, vol. 1 and vol. 2

work page 2005
[4]

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 2nd ed. Cambridge, MA, USA: MIT press, 2018

work page 2018
[5]

Infinite time optimal control and periodicity,

F. Colonius and W. Kliemann, “Infinite time optimal control and periodicity,” Applied Mathematics & Optimization , vol. 20, pp. 113– 130, 1989

work page 1989
[6]

A uniform Tauberian theorem in dynamic programming,

E. Lehrer and S. Sorin, “A uniform Tauberian theorem in dynamic programming,” Mathematics of Operations Research , vol. 17, no. 2, pp. 303–307, 1992

work page 1992
[7]

Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,

V . Gaitsgory, A. Parkinson, and I. Shvartsman, “Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,” Discrete and Continuous Dynamical Systems. Series B, vol. 22, no. 10, pp. 3821–3838, 2017

work page 2017
[8]

Examining average and discounted reward optimality criteria in reinforcement learning,

V . Dewanto and M. Gallagher, “Examining average and discounted reward optimality criteria in reinforcement learning,” in Australasian Joint Conference on Artificial Intelligence . Springer, 2022, pp. 800– 813

work page 2022
[9]

G. H. Hardy, Divergent Series. Oxford University Press, 1949

work page 1949
[10]

Economic Nonlinear Model Predictive Control,

T. Faulwasser, L. Gr ¨une, and M. A. M ¨uller, “Economic Nonlinear Model Predictive Control,” Foundations and Trends in Systems and Control, vol. 5, no. 1, pp. 224–409, 2018

work page 2018
[11]

Economic model predictive control with- out terminal constraints for optimal periodic behavior,

M. A. M ¨uller and L. Gr¨une, “Economic model predictive control with- out terminal constraints for optimal periodic behavior,” Automatica, vol. 70, pp. 128–139, Aug. 2016

work page 2016
[12]

Linearly discounted economic MPC without terminal conditions for periodic optimal operation,

L. Schwenkel, A. Hadorn, M. A. M ¨uller, and F. Allg ¨ower, “Linearly discounted economic MPC without terminal conditions for periodic optimal operation,” Automatica, vol. 159, p. 111393, 2024

work page 2024
[13]

Asymptotic stability and transient optimality of economic mpc without terminal conditions,

L. Gr ¨une and M. Stieler, “Asymptotic stability and transient optimality of economic mpc without terminal conditions,” Journal of Process Control, vol. 24, no. 8, pp. 1187–1196, Aug. 2014

work page 2014
[14]

On discount functions for economic model predictive control without terminal conditions,

L. Schwenkel, D. Briem, M. A. M ¨uller, and F. Allg¨ower, “On discount functions for economic model predictive control without terminal conditions,” arXiv:2405.14361, 2024

work page arXiv 2024
[15]

On the role of dissipativity in economic model predictive control,

M. A. M ¨uller, L. Gr¨une, and F. Allg¨ower, “On the role of dissipativity in economic model predictive control,”Proc. 5th IFAC Conf. Nonlinear Model Predictive Control (NMPC) , pp. 110–116, 2015

work page 2015
[16]

The Ces\`aro Value Iteration

J. Mair, L. Schwenkel, M. A. M ¨uller, and F. Allg ¨ower, “The Ces `aro Value Iteration,” arXiv:2504.04889, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Dissipative dynamical systems part I: General theory,

J. C. Willems, “Dissipative dynamical systems part I: General theory,” Archive for rational mechanics and analysis , vol. 45, no. 5, pp. 321– 351, 1972. APPENDIX Lemma 35: Let Assumptions 1, 13, 14, 15 and 18 hold. Then, there exists ˜C < ∞ such that ˜V β N(x) ≤ ˜C for all N ∈ N and for all x ∈ X. Proof: By Assumptions 14 and 15, for all x ∈ X, there exis...

work page 1972

[1] [1]

Optimal decision procedures for finite Markov chains. Part II: Communicating systems,

J. Bather, “Optimal decision procedures for finite Markov chains. Part II: Communicating systems,” Advances in Applied Probability , vol. 5, no. 3, pp. 521–540, 1973

work page 1973

[2] [2]

A modified form of the iterative method of dynamic programming,

A. Hordijk and H. Tijms, “A modified form of the iterative method of dynamic programming,” The Annals of Statistics , pp. 203–208, 1975

work page 1975

[3] [3]

Bertsekas, Dynamic Programming and Optimal Control , 3rd ed

D. Bertsekas, Dynamic Programming and Optimal Control , 3rd ed. Belmont, MA, USA: Athena Scientific, 2005, vol. 1 and vol. 2

work page 2005

[4] [4]

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 2nd ed. Cambridge, MA, USA: MIT press, 2018

work page 2018

[5] [5]

Infinite time optimal control and periodicity,

F. Colonius and W. Kliemann, “Infinite time optimal control and periodicity,” Applied Mathematics & Optimization , vol. 20, pp. 113– 130, 1989

work page 1989

[6] [6]

A uniform Tauberian theorem in dynamic programming,

E. Lehrer and S. Sorin, “A uniform Tauberian theorem in dynamic programming,” Mathematics of Operations Research , vol. 17, no. 2, pp. 303–307, 1992

work page 1992

[7] [7]

Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,

V . Gaitsgory, A. Parkinson, and I. Shvartsman, “Linear programming formulations of deterministic infinite horizon optimal control problems in discrete time,” Discrete and Continuous Dynamical Systems. Series B, vol. 22, no. 10, pp. 3821–3838, 2017

work page 2017

[8] [8]

Examining average and discounted reward optimality criteria in reinforcement learning,

V . Dewanto and M. Gallagher, “Examining average and discounted reward optimality criteria in reinforcement learning,” in Australasian Joint Conference on Artificial Intelligence . Springer, 2022, pp. 800– 813

work page 2022

[9] [9]

G. H. Hardy, Divergent Series. Oxford University Press, 1949

work page 1949

[10] [10]

Economic Nonlinear Model Predictive Control,

T. Faulwasser, L. Gr ¨une, and M. A. M ¨uller, “Economic Nonlinear Model Predictive Control,” Foundations and Trends in Systems and Control, vol. 5, no. 1, pp. 224–409, 2018

work page 2018

[11] [11]

Economic model predictive control with- out terminal constraints for optimal periodic behavior,

M. A. M ¨uller and L. Gr¨une, “Economic model predictive control with- out terminal constraints for optimal periodic behavior,” Automatica, vol. 70, pp. 128–139, Aug. 2016

work page 2016

[12] [12]

Linearly discounted economic MPC without terminal conditions for periodic optimal operation,

L. Schwenkel, A. Hadorn, M. A. M ¨uller, and F. Allg ¨ower, “Linearly discounted economic MPC without terminal conditions for periodic optimal operation,” Automatica, vol. 159, p. 111393, 2024

work page 2024

[13] [13]

Asymptotic stability and transient optimality of economic mpc without terminal conditions,

L. Gr ¨une and M. Stieler, “Asymptotic stability and transient optimality of economic mpc without terminal conditions,” Journal of Process Control, vol. 24, no. 8, pp. 1187–1196, Aug. 2014

work page 2014

[14] [14]

On discount functions for economic model predictive control without terminal conditions,

L. Schwenkel, D. Briem, M. A. M ¨uller, and F. Allg¨ower, “On discount functions for economic model predictive control without terminal conditions,” arXiv:2405.14361, 2024

work page arXiv 2024

[15] [15]

On the role of dissipativity in economic model predictive control,

M. A. M ¨uller, L. Gr¨une, and F. Allg¨ower, “On the role of dissipativity in economic model predictive control,”Proc. 5th IFAC Conf. Nonlinear Model Predictive Control (NMPC) , pp. 110–116, 2015

work page 2015

[16] [16]

The Ces\`aro Value Iteration

J. Mair, L. Schwenkel, M. A. M ¨uller, and F. Allg ¨ower, “The Ces `aro Value Iteration,” arXiv:2504.04889, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

Dissipative dynamical systems part I: General theory,

J. C. Willems, “Dissipative dynamical systems part I: General theory,” Archive for rational mechanics and analysis , vol. 45, no. 5, pp. 321– 351, 1972. APPENDIX Lemma 35: Let Assumptions 1, 13, 14, 15 and 18 hold. Then, there exists ˜C < ∞ such that ˜V β N(x) ≤ ˜C for all N ∈ N and for all x ∈ X. Proof: By Assumptions 14 and 15, for all x ∈ X, there exis...

work page 1972