pith. sign in

arxiv: 2508.01942 · v2 · submitted 2025-08-03 · 🧮 math.OC · math.ST· stat.TH

Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control

Pith reviewed 2026-05-19 01:17 UTC · model grok-4.3

classification 🧮 math.OC math.STstat.TH
keywords central limit theoremsample average approximationstochastic optimal controldynamic programmingvalue functionasymptotic normalityvariance decomposition
0
0 comments X

The pith

Sample average approximations of value functions in stochastic optimal control obey central limit theorems with recursively characterized Gaussian limits under unique policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops central limit theorems for the sample average approximation method applied to discrete-time finite-horizon stochastic optimal control problems. It introduces an abstract limit theorem for stochastic backward recursions that gives a recursive way to characterize the limiting distributions. When applied to the dynamic programming recursion, this yields Gaussian limits for the approximated value functions, but only when the optimal policy at each stage is unique. The limiting variance at each time step splits into a term from the current stage's sampling noise and a term carried forward from later stages. This decomposition reveals how estimation errors propagate backward in time through the control problem.

Core claim

We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time.

What carries the argument

Abstract limit theorem for stochastic backward recursions that provides a recursive characterization of the limiting laws when applied to the dynamic programming principle.

Load-bearing premise

The derivation of Gaussian limits relies on the assumption that optimal policies are unique at each stage.

What would settle it

Simulating multiple SAA solutions in a simple stochastic control problem with a known unique optimal policy and checking whether the scaled error converges in distribution to a normal random variable with the predicted variance.

Figures

Figures reproduced from arXiv: 2508.01942 by Alexander Shapiro, Johannes Milz.

Figure 1
Figure 1. Figure 1: Empirical distributions of the normalized value function estimation error at different time [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Normal probability plots of the normalized value function estimation error, [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Visualization of the quadratic asymptotic variance function. The asymptotic variance is given [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Decomposition of the total asymptotic variance [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
read the original abstract

We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time. We also apply the framework to the linear quadratic regulator, derive explicit limiting laws and variance formulas, and provide numerical illustrations of the resulting variance decomposition. Finally, we discuss the form of the limit laws under nonunique optimal policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes central limit theorems for sample average approximations (SAA) in discrete-time finite-horizon stochastic optimal control. It introduces an abstract limit theorem for stochastic backward recursions and applies it to the dynamic programming principle, yielding Gaussian limiting distributions for SAA value functions under the assumption of unique optimal policies. The asymptotic variance at each stage is characterized recursively as the sum of a current-stage variance term and a propagated future variance. The framework is specialized to the linear-quadratic regulator to obtain explicit limiting laws and variance formulas, accompanied by numerical illustrations of the variance decomposition. The form of the limiting laws under nonunique optimal policies is discussed separately.

Significance. If the central results hold, the paper provides a valuable rigorous asymptotic theory for SAA methods in stochastic dynamic programming. The recursive variance decomposition offers a precise description of how statistical error propagates backward through the stages, which is useful for understanding the reliability of approximate solutions in control problems. The explicit LQR specialization and numerical examples make the abstract framework more concrete and applicable. The work connects stochastic programming limit theorems with dynamic programming in a way that could inform the design and analysis of approximation schemes.

major comments (2)
  1. [§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.
  2. [§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.
minor comments (2)
  1. [Abstract / Introduction] The abstract states that the framework 'gives Gaussian limits for SAA value functions under unique optimal policies'; a brief parenthetical reminder of this assumption in the introduction would improve readability for readers who skip directly to the main results.
  2. [Numerical section] In the numerical illustrations, the reported variance decomposition plots would benefit from error bars or multiple replications to visually confirm the match with the theoretical asymptotic variance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each of the major comments below.

read point-by-point responses
  1. Referee: [§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.

    Authors: We agree that an explicit verification would strengthen the derivation. Under the unique optimal policy assumption and the smoothness conditions on the stage costs and kernels, the Bellman operator is differentiable at the true value function, so the abstract theorem applies directly without subdifferential corrections. In the revised manuscript we will add a short remark in §4 confirming that the differentiability condition holds for the dynamic programming recursion. revision: yes

  2. Referee: [§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.

    Authors: The closed-form expressions derived in §5 rely on uniqueness of the optimal policy. As noted in the general discussion of nonunique policies, the limiting law takes a different form (potentially involving subdifferentials) when multiple optimal policies exist. We will revise §5 to state explicitly that the formulas are specific to the unique-policy case and to cross-reference the nonunique discussion. revision: yes

Circularity Check

0 steps flagged

Derivation grounded in standard limit theorems with no reduction to inputs by construction

full rationale

The paper develops its central result by applying an abstract limit theorem for stochastic backward recursions to the dynamic programming principle, obtaining Gaussian limits for SAA value functions when optimal policies are unique. The asymptotic variance is characterized recursively as the sum of a current-stage term and a propagated future term. This structure follows directly from the recursion and standard CLT/delta-method arguments rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. Uniqueness is stated as an explicit assumption required for the Gaussian form (with a separate discussion of the non-unique case), but the assumption is not smuggled in via prior self-work or used to define the result circularly. The derivation remains self-contained against external probabilistic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard mathematical assumptions from probability theory (central limit theorems for dependent processes) and dynamic programming (Bellman principle under unique policies). No free parameters or invented entities are introduced; the work adds a new recursive limit theorem rather than fitting constants.

axioms (2)
  • domain assumption Existence of unique optimal policies for the Gaussian limit characterization to hold without qualification
    Invoked when applying the abstract backward recursion theorem to the dynamic programming principle; the paper notes separate discussion for nonunique cases.
  • standard math Standard regularity conditions for interchanging limits and expectations in stochastic recursions
    Required for the abstract limit theorem for stochastic backward recursions to apply.

pith-pipeline@v0.9.0 · 5647 in / 1427 out tokens · 35070 ms · 2026-05-19T01:17:22.154905+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1]

    D. P. Bertsekas. Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA, 1st edition, 1995

  2. [2]

    Bertsekas

    D. Bertsekas. Convergence of discretization procedures in dynamic programming. IEEE Transactions on Automatic Control, 20(3):415–419, 1975. doi:10.1109/TAC.1975.1100984

  3. [3]

    Bertsekas and S

    D. Bertsekas and S. Shreve. Stochastic Optimal Control, The Discrete Time Case . Academic Press, New York, 1978

  4. [4]

    J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Ser. Oper. Res. Springer, New York, 2000. doi:10.1007/978-1-4612-1394-9

  5. [5]

    L. Ding, S. Ahmed, and A. Shapiro. A Python package for multi-stage stochastic programming. Op- timization online , 2019. URL: http://www.optimization-online.org/DB_FILE/2019/05/7199. pdf

  6. [6]

    Eichhorn and W

    A. Eichhorn and W. R¨ omisch. Stochastic integer programming: Limit theorems and confidence intervals. Math. Oper. Res., 32(1):118–135, 2007. doi:10.1287/moor.1060.0222

  7. [7]

    A. J. Kleywegt, A. Shapiro, and T. Homem-de Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. , 12(2):479–502, 2002. doi:10.1137/ S1052623499363220

  8. [8]

    Ledoux and M

    M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes . Ergeb. Math. Grenzgeb. 23. Springer, Berlin, 2013. doi:10.1007/978-3-642-20212-4

  9. [9]

    J. Milz. Supplementary code for the manuscript: Central limit theorems for sample average approx- imations in stochastic optimal control, August 2025. doi:10.5281/zenodo.16733442. 20

  10. [10]

    W. K. Newey. Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4):1161–1167, 1991. doi:10.2307/2938179

  11. [11]

    D. Pollard. Empirical processes: Theory and applications , volume 2 of NSF-CBMS Regional Con- ference Series in Probability and Statistics . Institute of Mathematical Statistics, Hayward, CA; American Statistical Association, Alexandria, VA, 1990. doi:10.1214/cbms/1462061091

  12. [12]

    A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169–186,

  13. [13]

    doi:10.1007/BF02204815

  14. [14]

    A. Shapiro. On complexity of multistage stochastic programs. Oper. Res. Lett. , 34(1):1–8, 2006. doi:10.1016/j.orl.2005.02.003

  15. [15]

    Shapiro and A

    A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In V. Jeyakumar and A. Rubinov, editors,Continuous Optimization: Current Trends and Applications, pages 111–144. Springer-Verlag, 2005. doi:10.1007/0-387-26771-9_4

  16. [16]

    Shapiro and Y

    A. Shapiro and Y. Cheng. Central limit theorem and sample complexity of stationary stochastic programs. Oper. Res. Lett., 49(5):676–681, 2021. doi:10.1016/j.orl.2021.06.019

  17. [17]

    Scalable anomaly ranking of attributed neighborhoods

    A. Shapiro, D. Dentcheva, and A. Ruszczy´ nski. Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Ser. Optim. SIAM, Philadelphia, PA, 3rd edition, 2021. doi:10.1137/1. 9781611976595

  18. [18]

    A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. With Ap- plications to Statistics . Springer Ser. Stat. Cham: Springer, 2nd edition, 2023. doi:10.1007/ 978-3-031-29040-4

  19. [19]

    A. W. Van der Vaart. Asymptotic Statistics , volume 3. Cambridge University Press, 2000. doi: 10.1017/CBO9780511802256

  20. [20]

    Zhang, Z.-S

    X. Zhang, Z.-S. Ye, and W. B. Haskell. Error propagation in asymptotic analysis of the data-driven (s, S) inventory policy. Operations Research, 73(1):1–21, 2025. doi:10.1287/opre.2020.0568

  21. [21]

    P. Zipkin. Foundation of inventory management . McGraw-Hill, Boston, MA, 2000. 21