Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control
Pith reviewed 2026-05-19 01:17 UTC · model grok-4.3
The pith
Sample average approximations of value functions in stochastic optimal control obey central limit theorems with recursively characterized Gaussian limits under unique policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time.
What carries the argument
Abstract limit theorem for stochastic backward recursions that provides a recursive characterization of the limiting laws when applied to the dynamic programming principle.
Load-bearing premise
The derivation of Gaussian limits relies on the assumption that optimal policies are unique at each stage.
What would settle it
Simulating multiple SAA solutions in a simple stochastic control problem with a known unique optimal policy and checking whether the scaled error converges in distribution to a normal random variable with the predicted variance.
Figures
read the original abstract
We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time. We also apply the framework to the linear quadratic regulator, derive explicit limiting laws and variance formulas, and provide numerical illustrations of the resulting variance decomposition. Finally, we discuss the form of the limit laws under nonunique optimal policies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes central limit theorems for sample average approximations (SAA) in discrete-time finite-horizon stochastic optimal control. It introduces an abstract limit theorem for stochastic backward recursions and applies it to the dynamic programming principle, yielding Gaussian limiting distributions for SAA value functions under the assumption of unique optimal policies. The asymptotic variance at each stage is characterized recursively as the sum of a current-stage variance term and a propagated future variance. The framework is specialized to the linear-quadratic regulator to obtain explicit limiting laws and variance formulas, accompanied by numerical illustrations of the variance decomposition. The form of the limiting laws under nonunique optimal policies is discussed separately.
Significance. If the central results hold, the paper provides a valuable rigorous asymptotic theory for SAA methods in stochastic dynamic programming. The recursive variance decomposition offers a precise description of how statistical error propagates backward through the stages, which is useful for understanding the reliability of approximate solutions in control problems. The explicit LQR specialization and numerical examples make the abstract framework more concrete and applicable. The work connects stochastic programming limit theorems with dynamic programming in a way that could inform the design and analysis of approximation schemes.
major comments (2)
- [§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.
- [§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.
minor comments (2)
- [Abstract / Introduction] The abstract states that the framework 'gives Gaussian limits for SAA value functions under unique optimal policies'; a brief parenthetical reminder of this assumption in the introduction would improve readability for readers who skip directly to the main results.
- [Numerical section] In the numerical illustrations, the reported variance decomposition plots would benefit from error bars or multiple replications to visually confirm the match with the theoretical asymptotic variance.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and the constructive comments. We address each of the major comments below.
read point-by-point responses
-
Referee: [§3 and §4] §3 (abstract limit theorem) and its application in §4 (dynamic programming): the Gaussian characterization requires the recursion map to be differentiable at the true value function and the argmin to be single-valued. While the paper assumes unique optimal policies for the main result, an explicit verification that the Bellman operator satisfies the differentiability condition of the abstract theorem (or a reference to a standard result establishing it) would strengthen the derivation and confirm that no subdifferential corrections are needed.
Authors: We agree that an explicit verification would strengthen the derivation. Under the unique optimal policy assumption and the smoothness conditions on the stage costs and kernels, the Bellman operator is differentiable at the true value function, so the abstract theorem applies directly without subdifferential corrections. In the revised manuscript we will add a short remark in §4 confirming that the differentiability condition holds for the dynamic programming recursion. revision: yes
-
Referee: [§5] §5 (LQR specialization): the explicit limiting variance formulas are derived under the unique-policy assumption; it would be useful to state clearly whether the same closed-form expressions continue to hold (perhaps with qualification) when the LQR admits multiple optimal policies, given that the paper separately discusses the nonunique case in general.
Authors: The closed-form expressions derived in §5 rely on uniqueness of the optimal policy. As noted in the general discussion of nonunique policies, the limiting law takes a different form (potentially involving subdifferentials) when multiple optimal policies exist. We will revise §5 to state explicitly that the formulas are specific to the unique-policy case and to cross-reference the nonunique discussion. revision: yes
Circularity Check
Derivation grounded in standard limit theorems with no reduction to inputs by construction
full rationale
The paper develops its central result by applying an abstract limit theorem for stochastic backward recursions to the dynamic programming principle, obtaining Gaussian limits for SAA value functions when optimal policies are unique. The asymptotic variance is characterized recursively as the sum of a current-stage term and a propagated future term. This structure follows directly from the recursion and standard CLT/delta-method arguments rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. Uniqueness is stated as an explicit assumption required for the Gaussian form (with a separate discussion of the non-unique case), but the assumption is not smuggled in via prior self-work or used to define the result circularly. The derivation remains self-contained against external probabilistic benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Existence of unique optimal policies for the Gaussian limit characterization to hold without qualification
- standard math Standard regularity conditions for interchanging limits and expectations in stochastic recursions
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
N^{1/2}(hat V_t,N - V_t) converges to Gaussian process G_t with covariance Gamma_t decomposing into Cov(Psi(G_{t+1})) + Cov(Phi_t) (current + propagated variance)
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Assumption 2.2(iv) unique minimizer u^*_t = pi_t(x_t) required for directional derivative to be linear and for asymptotic normality
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
D. P. Bertsekas. Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA, 1st edition, 1995
work page 1995
-
[2]
D. Bertsekas. Convergence of discretization procedures in dynamic programming. IEEE Transactions on Automatic Control, 20(3):415–419, 1975. doi:10.1109/TAC.1975.1100984
-
[3]
D. Bertsekas and S. Shreve. Stochastic Optimal Control, The Discrete Time Case . Academic Press, New York, 1978
work page 1978
-
[4]
J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems . Springer Ser. Oper. Res. Springer, New York, 2000. doi:10.1007/978-1-4612-1394-9
-
[5]
L. Ding, S. Ahmed, and A. Shapiro. A Python package for multi-stage stochastic programming. Op- timization online , 2019. URL: http://www.optimization-online.org/DB_FILE/2019/05/7199. pdf
work page 2019
-
[6]
A. Eichhorn and W. R¨ omisch. Stochastic integer programming: Limit theorems and confidence intervals. Math. Oper. Res., 32(1):118–135, 2007. doi:10.1287/moor.1060.0222
-
[7]
A. J. Kleywegt, A. Shapiro, and T. Homem-de Mello. The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. , 12(2):479–502, 2002. doi:10.1137/ S1052623499363220
work page 2002
-
[8]
M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes . Ergeb. Math. Grenzgeb. 23. Springer, Berlin, 2013. doi:10.1007/978-3-642-20212-4
-
[9]
J. Milz. Supplementary code for the manuscript: Central limit theorems for sample average approx- imations in stochastic optimal control, August 2025. doi:10.5281/zenodo.16733442. 20
-
[10]
W. K. Newey. Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4):1161–1167, 1991. doi:10.2307/2938179
-
[11]
D. Pollard. Empirical processes: Theory and applications , volume 2 of NSF-CBMS Regional Con- ference Series in Probability and Statistics . Institute of Mathematical Statistics, Hayward, CA; American Statistical Association, Alexandria, VA, 1990. doi:10.1214/cbms/1462061091
-
[12]
A. Shapiro. Asymptotic analysis of stochastic programs. Annals of Operations Research, 30:169–186,
-
[13]
doi:10.1007/BF02204815
-
[14]
A. Shapiro. On complexity of multistage stochastic programs. Oper. Res. Lett. , 34(1):1–8, 2006. doi:10.1016/j.orl.2005.02.003
-
[15]
A. Shapiro and A. Nemirovski. On complexity of stochastic programming problems. In V. Jeyakumar and A. Rubinov, editors,Continuous Optimization: Current Trends and Applications, pages 111–144. Springer-Verlag, 2005. doi:10.1007/0-387-26771-9_4
-
[16]
A. Shapiro and Y. Cheng. Central limit theorem and sample complexity of stationary stochastic programs. Oper. Res. Lett., 49(5):676–681, 2021. doi:10.1016/j.orl.2021.06.019
-
[17]
Scalable anomaly ranking of attributed neighborhoods
A. Shapiro, D. Dentcheva, and A. Ruszczy´ nski. Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Ser. Optim. SIAM, Philadelphia, PA, 3rd edition, 2021. doi:10.1137/1. 9781611976595
work page doi:10.1137/1 2021
-
[18]
A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. With Ap- plications to Statistics . Springer Ser. Stat. Cham: Springer, 2nd edition, 2023. doi:10.1007/ 978-3-031-29040-4
work page 2023
-
[19]
A. W. Van der Vaart. Asymptotic Statistics , volume 3. Cambridge University Press, 2000. doi: 10.1017/CBO9780511802256
-
[20]
X. Zhang, Z.-S. Ye, and W. B. Haskell. Error propagation in asymptotic analysis of the data-driven (s, S) inventory policy. Operations Research, 73(1):1–21, 2025. doi:10.1287/opre.2020.0568
-
[21]
P. Zipkin. Foundation of inventory management . McGraw-Hill, Boston, MA, 2000. 21
work page 2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.