Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control

Alexander Shapiro; Johannes Milz

arxiv: 2508.01942 · v2 · submitted 2025-08-03 · 🧮 math.OC · math.ST· stat.TH

Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control

Johannes Milz , Alexander Shapiro This is my paper

Pith reviewed 2026-05-19 01:17 UTC · model grok-4.3

classification 🧮 math.OC math.STstat.TH

keywords central limit theoremsample average approximationstochastic optimal controldynamic programmingvalue functionasymptotic normalityvariance decomposition

0 comments

The pith

Sample average approximations of value functions in stochastic optimal control obey central limit theorems with recursively characterized Gaussian limits under unique policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops central limit theorems for the sample average approximation method applied to discrete-time finite-horizon stochastic optimal control problems. It introduces an abstract limit theorem for stochastic backward recursions that gives a recursive way to characterize the limiting distributions. When applied to the dynamic programming recursion, this yields Gaussian limits for the approximated value functions, but only when the optimal policy at each stage is unique. The limiting variance at each time step splits into a term from the current stage's sampling noise and a term carried forward from later stages. This decomposition reveals how estimation errors propagate backward in time through the control problem.

Core claim

What carries the argument

Abstract limit theorem for stochastic backward recursions that provides a recursive characterization of the limiting laws when applied to the dynamic programming principle.

Load-bearing premise

The derivation of Gaussian limits relies on the assumption that optimal policies are unique at each stage.

What would settle it

Simulating multiple SAA solutions in a simple stochastic control problem with a known unique optimal policy and checking whether the scaled error converges in distribution to a normal random variable with the predicted variance.

Figures

Figures reproduced from arXiv: 2508.01942 by Alexander Shapiro, Johannes Milz.

**Figure 2.** Figure 2: Normal probability plots of the normalized value function estimation error, [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Visualization of the quadratic asymptotic variance function. The asymptotic variance is given [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗

**Figure 4.** Figure 4: Decomposition of the total asymptotic variance [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

read the original abstract

We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time. We also apply the framework to the linear quadratic regulator, derive explicit limiting laws and variance formulas, and provide numerical illustrations of the resulting variance decomposition. Finally, we discuss the form of the limit laws under nonunique optimal policies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a recursive CLT for SAA value functions in finite-horizon stochastic control, with an explicit variance decomposition that tracks how sampling error propagates backward under unique policies.

read the letter

This paper establishes central limit theorems for sample-average approximations of finite-horizon stochastic optimal control problems. The main new element is an abstract limit theorem for stochastic backward recursions that produces a recursive characterization of the limiting laws. When applied to the dynamic programming principle, it yields Gaussian limits for the SAA value functions together with a variance formula that splits each stage into a local sampling variance plus a propagated term from later stages. That decomposition is the part that was not already available in the SAA-for-control literature they cite. They also carry the same framework through the linear-quadratic regulator, obtain closed-form limiting distributions and variances, and illustrate the recursion numerically. The derivations rest on standard probability limit theorems plus the usual dynamic-programming assumptions, so the technical backbone looks reliable. The clearest limitation is the standing assumption of a unique optimal policy. The paper notes that the limit law changes when multiple policies are optimal, and the Gaussian result requires the argmin to be single-valued so that the usual delta-method style argument applies without subdifferential adjustments. That condition is stated up front, so readers who work with non-unique policies will simply use the more general form they sketch. The work is aimed at researchers who already use SAA or approximate dynamic programming and want asymptotic error bounds rather than finite-sample guarantees. It supplies a clean tool for tracking how statistical uncertainty accumulates over the horizon. I would send it to a referee who knows the SAA literature in stochastic optimization and control.

Referee Report

2 major / 2 minor

Summary. The manuscript establishes central limit theorems for sample average approximations (SAA) in discrete-time finite-horizon stochastic optimal control. It introduces an abstract limit theorem for stochastic backward recursions and applies it to the dynamic programming principle, yielding Gaussian limiting distributions for SAA value functions under the assumption of unique optimal policies. The asymptotic variance at each stage is characterized recursively as the sum of a current-stage variance term and a propagated future variance. The framework is specialized to the linear-quadratic regulator to obtain explicit limiting laws and variance formulas, accompanied by numerical illustrations of the variance decomposition. The form of the limiting laws under nonunique optimal policies is discussed separately.

Significance. If the central results hold, the paper provides a valuable rigorous asymptotic theory for SAA methods in stochastic dynamic programming. The recursive variance decomposition offers a precise description of how statistical error propagates backward through the stages, which is useful for understanding the reliability of approximate solutions in control problems. The explicit LQR specialization and numerical examples make the abstract framework more concrete and applicable. The work connects stochastic programming limit theorems with dynamic programming in a way that could inform the design and analysis of approximation schemes.

major comments (2)

[§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.
[§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.

minor comments (2)

[Abstract / Introduction] The abstract states that the framework 'gives Gaussian limits for SAA value functions under unique optimal policies'; a brief parenthetical reminder of this assumption in the introduction would improve readability for readers who skip directly to the main results.
[Numerical section] In the numerical illustrations, the reported variance decomposition plots would benefit from error bars or multiple replications to visually confirm the match with the theoretical asymptotic variance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each of the major comments below.

read point-by-point responses

Referee: [§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.

Authors: We agree that an explicit verification would strengthen the derivation. Under the unique optimal policy assumption and the smoothness conditions on the stage costs and kernels, the Bellman operator is differentiable at the true value function, so the abstract theorem applies directly without subdifferential corrections. In the revised manuscript we will add a short remark in §4 confirming that the differentiability condition holds for the dynamic programming recursion. revision: yes
Referee: [§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.

Authors: The closed-form expressions derived in §5 rely on uniqueness of the optimal policy. As noted in the general discussion of nonunique policies, the limiting law takes a different form (potentially involving subdifferentials) when multiple optimal policies exist. We will revise §5 to state explicitly that the formulas are specific to the unique-policy case and to cross-reference the nonunique discussion. revision: yes

Circularity Check

0 steps flagged

Derivation grounded in standard limit theorems with no reduction to inputs by construction

full rationale

The paper develops its central result by applying an abstract limit theorem for stochastic backward recursions to the dynamic programming principle, obtaining Gaussian limits for SAA value functions when optimal policies are unique. The asymptotic variance is characterized recursively as the sum of a current-stage term and a propagated future term. This structure follows directly from the recursion and standard CLT/delta-method arguments rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. Uniqueness is stated as an explicit assumption required for the Gaussian form (with a separate discussion of the non-unique case), but the assumption is not smuggled in via prior self-work or used to define the result circularly. The derivation remains self-contained against external probabilistic benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard mathematical assumptions from probability theory (central limit theorems for dependent processes) and dynamic programming (Bellman principle under unique policies). No free parameters or invented entities are introduced; the work adds a new recursive limit theorem rather than fitting constants.

axioms (2)

domain assumption Existence of unique optimal policies for the Gaussian limit characterization to hold without qualification
Invoked when applying the abstract backward recursion theorem to the dynamic programming principle; the paper notes separate discussion for nonunique cases.
standard math Standard regularity conditions for interchanging limits and expectations in stochastic recursions
Required for the abstract limit theorem for stochastic backward recursions to apply.

pith-pipeline@v0.9.0 · 5647 in / 1427 out tokens · 35070 ms · 2026-05-19T01:17:22.154905+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

N^{1/2}(hat V_t,N - V_t) converges to Gaussian process G_t with covariance Gamma_t decomposing into Cov(Psi(G_{t+1})) + Cov(Phi_t) (current + propagated variance)
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 2.2(iv) unique minimizer u^*_t = pi_t(x_t) required for directional derivative to be linear and for asymptotic normality

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

D. P. Bertsekas. Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA, 1st edition, 1995

work page 1995
[2]

Bertsekas

D. Bertsekas. Convergence of discretization procedures in dynamic programming. IEEE Transactions on Automatic Control, 20(3):415–419, 1975. doi:10.1109/TAC.1975.1100984

work page doi:10.1109/tac.1975.1100984 1975
[3]

Bertsekas and S

D. Bertsekas and S. Shreve. Stochastic Optimal Control, The Discrete Time Case . Academic Press, New York, 1978

work page 1978
[4]

J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Ser. Oper. Res. Springer, New York, 2000. doi:10.1007/978-1-4612-1394-9

work page doi:10.1007/978-1-4612-1394-9 2000
[5]

L. Ding, S. Ahmed, and A. Shapiro. A Python package for multi-stage stochastic programming. Op- timization online , 2019. URL: http://www.optimization-online.org/DB_FILE/2019/05/7199. pdf

work page 2019
[6]

Eichhorn and W

A. Eichhorn and W. R¨ omisch. Stochastic integer programming: Limit theorems and confidence intervals. Math. Oper. Res., 32(1):118–135, 2007. doi:10.1287/moor.1060.0222

work page doi:10.1287/moor.1060.0222 2007
[7]

A. J. Kleywegt, A. Shapiro, and T. Homem-de Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. , 12(2):479–502, 2002. doi:10.1137/ S1052623499363220

work page 2002
[8]

Ledoux and M

M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes . Ergeb. Math. Grenzgeb. 23. Springer, Berlin, 2013. doi:10.1007/978-3-642-20212-4

work page doi:10.1007/978-3-642-20212-4 2013
[9]

J. Milz. Supplementary code for the manuscript: Central limit theorems for sample average approx- imations in stochastic optimal control, August 2025. doi:10.5281/zenodo.16733442. 20

work page doi:10.5281/zenodo.16733442 2025
[10]

W. K. Newey. Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4):1161–1167, 1991. doi:10.2307/2938179

work page doi:10.2307/2938179 1991
[11]

D. Pollard. Empirical processes: Theory and applications , volume 2 of NSF-CBMS Regional Con- ference Series in Probability and Statistics . Institute of Mathematical Statistics, Hayward, CA; American Statistical Association, Alexandria, VA, 1990. doi:10.1214/cbms/1462061091

work page doi:10.1214/cbms/1462061091 1990
[12]

A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169–186,

work page
[13]

doi:10.1007/BF02204815

work page doi:10.1007/bf02204815
[14]

A. Shapiro. On complexity of multistage stochastic programs. Oper. Res. Lett. , 34(1):1–8, 2006. doi:10.1016/j.orl.2005.02.003

work page doi:10.1016/j.orl.2005.02.003 2006
[15]

Shapiro and A

A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In V. Jeyakumar and A. Rubinov, editors,Continuous Optimization: Current Trends and Applications, pages 111–144. Springer-Verlag, 2005. doi:10.1007/0-387-26771-9_4

work page doi:10.1007/0-387-26771-9_4 2005
[16]

Shapiro and Y

A. Shapiro and Y. Cheng. Central limit theorem and sample complexity of stationary stochastic programs. Oper. Res. Lett., 49(5):676–681, 2021. doi:10.1016/j.orl.2021.06.019

work page doi:10.1016/j.orl.2021.06.019 2021
[17]

Scalable anomaly ranking of attributed neighborhoods

A. Shapiro, D. Dentcheva, and A. Ruszczy´ nski. Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Ser. Optim. SIAM, Philadelphia, PA, 3rd edition, 2021. doi:10.1137/1. 9781611976595

work page doi:10.1137/1 2021
[18]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. With Ap- plications to Statistics . Springer Ser. Stat. Cham: Springer, 2nd edition, 2023. doi:10.1007/ 978-3-031-29040-4

work page 2023
[19]

A. W. Van der Vaart. Asymptotic Statistics , volume 3. Cambridge University Press, 2000. doi: 10.1017/CBO9780511802256

work page doi:10.1017/cbo9780511802256 2000
[20]

Zhang, Z.-S

X. Zhang, Z.-S. Ye, and W. B. Haskell. Error propagation in asymptotic analysis of the data-driven (s, S) inventory policy. Operations Research, 73(1):1–21, 2025. doi:10.1287/opre.2020.0568

work page doi:10.1287/opre.2020.0568 2025
[21]

P. Zipkin. Foundation of inventory management . McGraw-Hill, Boston, MA, 2000. 21

work page 2000

[1] [1]

D. P. Bertsekas. Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA, 1st edition, 1995

work page 1995

[2] [2]

Bertsekas

D. Bertsekas. Convergence of discretization procedures in dynamic programming. IEEE Transactions on Automatic Control, 20(3):415–419, 1975. doi:10.1109/TAC.1975.1100984

work page doi:10.1109/tac.1975.1100984 1975

[3] [3]

Bertsekas and S

D. Bertsekas and S. Shreve. Stochastic Optimal Control, The Discrete Time Case . Academic Press, New York, 1978

work page 1978

[4] [4]

J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Ser. Oper. Res. Springer, New York, 2000. doi:10.1007/978-1-4612-1394-9

work page doi:10.1007/978-1-4612-1394-9 2000

[5] [5]

L. Ding, S. Ahmed, and A. Shapiro. A Python package for multi-stage stochastic programming. Op- timization online , 2019. URL: http://www.optimization-online.org/DB_FILE/2019/05/7199. pdf

work page 2019

[6] [6]

Eichhorn and W

A. Eichhorn and W. R¨ omisch. Stochastic integer programming: Limit theorems and confidence intervals. Math. Oper. Res., 32(1):118–135, 2007. doi:10.1287/moor.1060.0222

work page doi:10.1287/moor.1060.0222 2007

[7] [7]

A. J. Kleywegt, A. Shapiro, and T. Homem-de Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. , 12(2):479–502, 2002. doi:10.1137/ S1052623499363220

work page 2002

[8] [8]

Ledoux and M

M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes . Ergeb. Math. Grenzgeb. 23. Springer, Berlin, 2013. doi:10.1007/978-3-642-20212-4

work page doi:10.1007/978-3-642-20212-4 2013

[9] [9]

J. Milz. Supplementary code for the manuscript: Central limit theorems for sample average approx- imations in stochastic optimal control, August 2025. doi:10.5281/zenodo.16733442. 20

work page doi:10.5281/zenodo.16733442 2025

[10] [10]

W. K. Newey. Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4):1161–1167, 1991. doi:10.2307/2938179

work page doi:10.2307/2938179 1991

[11] [11]

D. Pollard. Empirical processes: Theory and applications , volume 2 of NSF-CBMS Regional Con- ference Series in Probability and Statistics . Institute of Mathematical Statistics, Hayward, CA; American Statistical Association, Alexandria, VA, 1990. doi:10.1214/cbms/1462061091

work page doi:10.1214/cbms/1462061091 1990

[12] [12]

A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169–186,

work page

[13] [13]

doi:10.1007/BF02204815

work page doi:10.1007/bf02204815

[14] [14]

A. Shapiro. On complexity of multistage stochastic programs. Oper. Res. Lett. , 34(1):1–8, 2006. doi:10.1016/j.orl.2005.02.003

work page doi:10.1016/j.orl.2005.02.003 2006

[15] [15]

Shapiro and A

A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In V. Jeyakumar and A. Rubinov, editors,Continuous Optimization: Current Trends and Applications, pages 111–144. Springer-Verlag, 2005. doi:10.1007/0-387-26771-9_4

work page doi:10.1007/0-387-26771-9_4 2005

[16] [16]

Shapiro and Y

A. Shapiro and Y. Cheng. Central limit theorem and sample complexity of stationary stochastic programs. Oper. Res. Lett., 49(5):676–681, 2021. doi:10.1016/j.orl.2021.06.019

work page doi:10.1016/j.orl.2021.06.019 2021

[17] [17]

Scalable anomaly ranking of attributed neighborhoods

A. Shapiro, D. Dentcheva, and A. Ruszczy´ nski. Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Ser. Optim. SIAM, Philadelphia, PA, 3rd edition, 2021. doi:10.1137/1. 9781611976595

work page doi:10.1137/1 2021

[18] [18]

A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. With Ap- plications to Statistics . Springer Ser. Stat. Cham: Springer, 2nd edition, 2023. doi:10.1007/ 978-3-031-29040-4

work page 2023

[19] [19]

A. W. Van der Vaart. Asymptotic Statistics , volume 3. Cambridge University Press, 2000. doi: 10.1017/CBO9780511802256

work page doi:10.1017/cbo9780511802256 2000

[20] [20]

Zhang, Z.-S

X. Zhang, Z.-S. Ye, and W. B. Haskell. Error propagation in asymptotic analysis of the data-driven (s, S) inventory policy. Operations Research, 73(1):1–21, 2025. doi:10.1287/opre.2020.0568

work page doi:10.1287/opre.2020.0568 2025

[21] [21]

P. Zipkin. Foundation of inventory management . McGraw-Hill, Boston, MA, 2000. 21

work page 2000