Adaptive Distributionally Robust Optimal Control with Bayesian Ambiguity Sets

Enlu Zhou; Huifu Xu; Wentao Ma; Zhiping Chen

arxiv: 2604.06936 · v2 · submitted 2026-04-08 · 🧮 math.OC

Wentao Ma, Zhiping Chen, Huifu Xu, Enlu Zhou

Pith reviewed 2026-05-10 17:25 UTC · model grok-4.3

keywords: adaptive distributionally robust optimal control · Bayesian ambiguity sets · episodic Bayesian learning · stochastic optimal control · risk-averse reformulation · consistency guarantees · cutting-plane algorithm · inventory control

The pith

Bayesian learning from episodic data produces consistent and adaptive distributionally robust control policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an adaptive distributionally robust optimal control model whose ambiguity set is refined by Bayesian learning from data that arrives in separate episodes. This addresses the excessive conservatism of offline models when data are limited and extends applicability to settings where samples are collected episodically rather than all at once. Under moderate conditions the model admits a tractable risk-averse reformulation. The authors prove that the optimal value function and policy converge to those of the true distribution in the infinite-horizon case and supply finite-sample posterior credibility bounds on the value attained by the learned policy. They further establish stability under data perturbations and supply a convergent Bellman-operator cutting-plane algorithm.

Core claim

By updating the ambiguity set of a distributionally robust optimal control problem via Bayesian posteriors computed from episodic samples, one obtains a tractable risk-averse reformulation together with consistency of the optimal value function and optimal policy for infinite-horizon problems and finite-sample posterior credibility guarantees for the policy value; the resulting model is stable to sample perturbations and can be solved by a convergent Bellman-operator cutting-plane algorithm.

What carries the argument

The episodic Bayesian DROC model whose ambiguity set is updated by Bayesian posterior distributions computed from successive episodes of samples, which enables the adaptive reduction of conservatism while preserving robustness.
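
The updating mechanism can be illustrated with a minimal sketch. It assumes, purely for illustration, a finite demand support with a Dirichlet prior, so that each episode's posterior is obtained by adding bin counts to the prior parameters. The paper's own update rule and ambiguity-set construction are not reproduced on this page, and `update_dirichlet`, `true_p`, and the episode sizes below are hypothetical.

```python
import numpy as np

def update_dirichlet(alpha, episode_samples, n_bins):
    """Return Dirichlet parameters after observing one episode of bin indices."""
    counts = np.bincount(episode_samples, minlength=n_bins)
    return alpha + counts

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.3])   # hypothetical true demand distribution
alpha = np.ones(3)                    # uniform Dirichlet prior (an assumption)

for episode in range(50):
    # Each episode delivers a fresh batch of i.i.d. samples from the true distribution.
    samples = rng.choice(3, size=20, p=true_p)
    alpha = update_dirichlet(alpha, samples, 3)

posterior_mean = alpha / alpha.sum()
print(posterior_mean)
```

As episodes accumulate, the Dirichlet posterior concentrates around the empirical frequencies, which is the sense in which an ambiguity set built from it could shrink with data.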

If this is right

The optimal value function and policy converge to their true counterparts for infinite-horizon stochastic optimal control.
The policy value satisfies finite-sample posterior credibility guarantees.
The model remains stable and statistically robust under perturbations of the observed samples.
Solutions can be computed efficiently by the Bellman-operator cutting-plane algorithm with proven convergence.
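
To make the role of the Bellman operator concrete, the sketch below runs plain value iteration on a toy lost-sales inventory problem. It is not the paper's BOCP algorithm and not its reformulation: the risk functional is a simple tail average over hypothetical posterior draws, chosen because it is monotone and translation-equivariant, and the cost parameters, `tail_average`, `bellman`, and `value_iteration` are all illustrative assumptions.

```python
import numpy as np

def tail_average(values, level):
    """Mean of the largest ceil((1 - level) * n) entries: a discrete upper-tail CVaR."""
    k = max(1, int(np.ceil((1.0 - level) * len(values))))
    return np.sort(values)[-k:].mean()

def bellman(V, thetas, gamma=0.9, level=0.8, c=1.0, h=0.5, b=4.0):
    """Apply a risk-averse Bellman operator once (toy lost-sales inventory model)."""
    S = len(V) - 1
    demand = np.arange(thetas.shape[1])
    TV = np.empty(S + 1)
    for s in range(S + 1):
        best = np.inf
        for y in range(s, S + 1):                  # y is the order-up-to level
            left = np.maximum(y - demand, 0)       # stock left after demand
            cost = c * (y - s) + h * left + b * np.maximum(demand - y, 0) + gamma * V[left]
            per_model = thetas @ cost              # expected cost under each posterior draw
            best = min(best, tail_average(per_model, level))
        TV[s] = best
    return TV

def value_iteration(thetas, S=5, gamma=0.9, tol=1e-8):
    """Iterate the operator until successive iterates differ by less than tol."""
    V = np.zeros(S + 1)
    while True:
        V_new = bellman(V, thetas, gamma)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

rng = np.random.default_rng(0)
thetas = rng.dirichlet([3.0, 6.0, 8.0, 4.0], size=200)   # hypothetical posterior draws
V_star = value_iteration(thetas)
print(V_star)
```

Because the tail average is monotone and translation-equivariant, this operator is a contraction with modulus `gamma` in the sup norm, so the loop terminates; a cutting-plane scheme would replace the tabular sweep with piecewise-linear lower approximations.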

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same Bayesian updating mechanism could be applied to other sequential decision problems where data arrives in batches, such as certain reinforcement-learning tasks under distributional uncertainty.
The finite-sample credibility bounds give a practical way to decide when enough episodes have been observed for the policy to be deployed with quantified reliability.
Testing the moderate conditions on concrete problem structures would clarify how large the data requirement is in specific applications.
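
The deployment idea in the second point can be sketched as follows. This is a generic Monte Carlo credible bound, not the paper's finite-sample guarantee: it assumes a Dirichlet posterior over a finite demand distribution and a fixed order-up-to policy, and `policy_value`, `credible_upper_bound`, and the cost constants are hypothetical.

```python
import numpy as np

C_ORDER, C_HOLD, C_SHORT, GAMMA = 1.0, 0.5, 4.0, 0.9   # hypothetical cost parameters

def policy_value(y, p):
    """Discounted cost, starting from empty stock, of always ordering up to level y."""
    demand = np.arange(len(p))
    left = np.maximum(y - demand, 0)                       # stock left after demand
    stage = C_HOLD * left + C_SHORT * np.maximum(demand - y, 0)
    row = np.bincount(left, weights=p, minlength=y + 1)    # next-state distribution
    P = np.tile(row, (y + 1, 1))                           # same row for every state
    r = C_ORDER * (y - np.arange(y + 1)) + p @ stage
    V = np.linalg.solve(np.eye(y + 1) - GAMMA * P, r)      # solves V = r + GAMMA * P V
    return float(V[0])

def credible_upper_bound(alpha, y, beta=0.05, n_draws=2000, seed=0):
    """(1 - beta) posterior quantile of the policy value under Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    thetas = rng.dirichlet(alpha, size=n_draws)
    values = np.array([policy_value(y, theta) for theta in thetas])
    return float(np.quantile(values, 1.0 - beta))

true_p = np.array([0.1, 0.3, 0.4, 0.2])
few = credible_upper_bound(20 * true_p, y=3)        # posterior after little data
many = credible_upper_bound(10_000 * true_p, y=3)   # posterior after much data
print(policy_value(3, true_p), few, many)
```

A practitioner could deploy once the gap between such a bound and the posterior-mean value falls below a tolerance; the bound tightens as the posterior concentrates.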

Load-bearing premise

Moderate conditions hold that permit the tractable risk-averse reformulation, the consistency proofs, and the credibility bounds, and that samples are generated episodically from the true underlying distribution.

What would settle it

Numerical simulations in which the policy value computed from the episodic Bayesian model fails to approach the policy value obtained under the true distribution as the number of episodes grows to infinity would falsify the consistency claim.
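
A simulation of this kind might look like the sketch below. It substitutes a plug-in policy based on the posterior mean for the episodic Bayesian DROC policy, which is a deliberate simplification, and the toy inventory model, cost constants, and function names are all assumptions rather than the paper's experimental setup.

```python
import numpy as np

S, D, GAMMA = 5, 4, 0.9                      # max stock, demand levels, discount (hypothetical)
C_ORDER, C_HOLD, C_SHORT = 1.0, 0.5, 4.0     # hypothetical cost parameters
demand = np.arange(D)

def q_values(V, p):
    """Q[s, y]: expected cost of ordering up to y from stock s under demand pmf p."""
    Q = np.full((S + 1, S + 1), np.inf)      # y < s stays infeasible (inf)
    for y in range(S + 1):
        left = np.maximum(y - demand, 0)
        tail = p @ (C_HOLD * left + C_SHORT * np.maximum(demand - y, 0) + GAMMA * V[left])
        for s in range(y + 1):
            Q[s, y] = C_ORDER * (y - s) + tail
    return Q

def solve(p, tol=1e-10):
    """Value iteration: optimal values and order-up-to policy under pmf p."""
    V = np.zeros(S + 1)
    while True:
        Q = q_values(V, p)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

def evaluate(policy, p, tol=1e-10):
    """Discounted cost of a fixed policy under pmf p, by fixed-point iteration."""
    V = np.zeros(S + 1)
    while True:
        V_new = q_values(V, p)[np.arange(S + 1), policy]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

rng = np.random.default_rng(1)
true_p = np.array([0.1, 0.3, 0.4, 0.2])
V_true, _ = solve(true_p)
alpha = np.ones(D)                            # uniform Dirichlet prior (an assumption)
gaps = []
for n in range(1, 201):
    alpha += np.bincount(rng.choice(D, size=10, p=true_p), minlength=D)
    if n in (1, 10, 50, 200):
        _, policy_n = solve(alpha / alpha.sum())
        # Worst-case excess cost of the learned policy under the true distribution.
        gaps.append(float(np.max(evaluate(policy_n, true_p) - V_true)))
print(gaps)
```

The recorded gaps are nonnegative by optimality of `V_true`; a consistency failure would show up as gaps that stay bounded away from zero as the episode count grows.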

Figures

Figures reproduced from arXiv: 2604.06936 by Enlu Zhou, Huifu Xu, Wentao Ma, Zhiping Chen.

**Figure 2.** Quantitative statistical robustness experiment at [PITH_FULL_IMAGE:figures/full_fig_p035_2.png]

**Figure 3.** BOCP warm-start vs. cold-start for the episodic Bayesian DROC. [PITH_FULL_IMAGE:figures/full_fig_p036_3.png]

**Figure 4.** Out-of-sample discounted cost under the true environment versus episode index [PITH_FULL_IMAGE:figures/full_fig_p037_4.png]

**Figure 5.** Out-of-sample discounted cost under the contaminated environment versus episode index [PITH_FULL_IMAGE:figures/full_fig_p038_5.png]

Original abstract

In stochastic optimal control (SOC), uncertainty may arise from incomplete knowledge of the true probability distribution of the underlying environment, which is known as Knightian or epistemic uncertainty. Distributionally robust optimal control (DROC) models are subsequently proposed to tackle this source of uncertainty. While such models are effective in some practical applications, most existing DROC models are offline and can be overly conservative when data are scarce. Moreover, they cannot be applied to the case when samples are generated episodically. Motivated by the Bayesian SOC framework recently proposed by Shapiro et al.~\cite{shapiro2025episodic}, we propose an adaptive DROC model in which the ambiguity set is updated via Bayesian learning from new data. Under some moderate conditions, we derive a tractable risk-averse reformulation, establish consistency of the optimal value function and optimal policy for an infinite-horizon SOC and establish a finite-sample posterior credibility guarantee for the policy value induced by the proposed episodic Bayesian DROC model. We also study the stability and statistical robustness of the proposed model with respect to sample perturbations that often arise in data-driven environments. To solve the episodic Bayesian DROC model, we propose a Bellman-operator cutting-plane (BOCP) algorithm that is computationally efficient and provably convergent. Numerical results on an inventory control problem demonstrate the effectiveness, adaptivity, and robust performance of the proposed model and algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This adapts Bayesian SOC to episodic DROC with consistency and credibility claims, but everything hinges on unspecified moderate conditions that need checking.

The core of the paper is extending Shapiro et al.'s episodic Bayesian framework into distributionally robust optimal control. It updates the ambiguity set with new data, claims a tractable risk-averse reformulation, proves consistency of the value function and policy for infinite-horizon problems, and adds a finite-sample posterior credibility guarantee, plus a Bellman-operator cutting-plane algorithm and some stability checks against sample noise. The inventory example illustrates adaptivity in practice.

Referee Report

2 major / 2 minor

Summary. The paper proposes an adaptive distributionally robust optimal control (DROC) framework for stochastic optimal control (SOC) under epistemic uncertainty. The ambiguity set is updated via Bayesian learning from episodic data samples. Under unspecified moderate conditions, the authors derive a tractable risk-averse reformulation, establish consistency of the optimal value function and policy for infinite-horizon problems, provide a finite-sample posterior credibility guarantee, analyze stability and robustness to sample perturbations, and introduce a provably convergent Bellman-operator cutting-plane (BOCP) algorithm. Numerical validation is provided on an inventory control example.

Significance. If the moderate conditions hold and the proofs are complete, the work meaningfully extends offline DROC by enabling online Bayesian adaptation, reducing conservatism with data while retaining robustness guarantees. The consistency and credibility results, combined with the convergent BOCP algorithm, offer both theoretical and computational contributions to data-driven control. The inventory example illustrates practical relevance in operations research settings. The integration of Bayesian updating with distributionally robust control is a clear strength when the assumptions align with the application.

major comments (2)

[Abstract] Abstract: All three central claims (tractable risk-averse reformulation, consistency of value function/policy for infinite-horizon SOC, and finite-sample posterior credibility guarantee) are stated to hold only 'under some moderate conditions,' yet these conditions are never enumerated or characterized. This is load-bearing because the conditions control whether the Bayesian update remains tractable, whether the Bellman operator is a contraction, and whether the credibility bound applies; without an explicit list (e.g., requirements on the ambiguity-set family, uniform integrability, or moment conditions), the scope and practical utility of the results cannot be assessed.
[Abstract and model formulation] Episodic sampling assumption (referenced in abstract and likely §3): The framework requires that samples are generated episodically from the true underlying distribution for the posterior update and finite-sample guarantee to be well-defined. If this assumption is violated (common in non-stationary or biased data environments), both the consistency result and the credibility guarantee may fail to hold, narrowing the applicability of the adaptive DROC model.

minor comments (2)

[Abstract] Abstract: The reference to Shapiro et al. is cited as shapiro2025episodic; ensure the full bibliographic entry is provided in the reference list and that any dependence on prior results is clearly delineated.
[Numerical results] Numerical section: The inventory control example demonstrates effectiveness, but additional details on how the moderate conditions are satisfied in the example (e.g., specific distribution family or integrability) would strengthen the link between theory and numerics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major comment point by point below, indicating planned revisions to improve clarity and scope without misrepresenting the contributions.

Referee: [Abstract] Abstract: All three central claims (tractable risk-averse reformulation, consistency of value function/policy for infinite-horizon SOC, and finite-sample posterior credibility guarantee) are stated to hold only 'under some moderate conditions,' yet these conditions are never enumerated or characterized. This is load-bearing because the conditions control whether the Bayesian update remains tractable, whether the Bellman operator is a contraction, and whether the credibility bound applies; without an explicit list (e.g., requirements on the ambiguity-set family, uniform integrability, or moment conditions), the scope and practical utility of the results cannot be assessed.

Authors: We agree that the abstract would benefit from an explicit enumeration of the moderate conditions to immediately convey scope. These conditions are fully specified in the manuscript: Assumption 2.1 requires the ambiguity set to be a weakly compact, convex collection of measures with uniformly bounded first moments; Assumption 3.2 imposes uniform integrability and Lipschitz continuity on the stage costs; and Assumption 4.1 ensures the prior has full support with the posterior concentrating under i.i.d. episodic sampling. The Bellman operator is shown to be a contraction under a discount factor strictly less than one combined with the moment bounds. We will revise the abstract to include a concise parenthetical list of these conditions and add a short summary paragraph at the end of the introduction that cross-references their locations. This change directly addresses the concern while preserving the original claims. revision: yes
Referee: [Abstract and model formulation] Episodic sampling assumption (referenced in abstract and likely §3): The framework requires that samples are generated episodically from the true underlying distribution for the posterior update and finite-sample guarantee to be well-defined. If this assumption is violated (common in non-stationary or biased data environments), both the consistency result and the credibility guarantee may fail to hold, narrowing the applicability of the adaptive DROC model.

Authors: The episodic i.i.d. sampling assumption is indeed foundational, as it underpins both the Bayesian posterior update (Section 3) and the finite-sample credibility bound (Theorem 4.3), which rely on independent episodes drawn from the true distribution. We acknowledge that the consistency and guarantee results do not automatically extend to non-stationary or biased sampling regimes. In the revised manuscript we will add an explicit paragraph in the introduction and a dedicated limitations subsection in the conclusion that states the assumption, illustrates its role via the inventory example, and outlines future extensions such as sliding-window posteriors or ambiguity-set inflation for non-stationarity. This clarifies applicability without weakening the core episodic setting, which remains relevant for many operations-research control problems. revision: partial
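
The contraction property mentioned in the first response follows a standard pattern, sketched here under simplifying assumptions that are not taken from the paper: bounded value functions, a generic risk functional $\rho$ that is monotone and translation-equivariant, and hypothetical symbols $c$ (stage cost), $f$ (state transition), $\xi$ (randomness), and $\gamma$ (discount factor).

```latex
\begin{align*}
(\mathcal{T}V)(s) &= \min_{a \in A(s)} \rho\bigl( c(s,a,\xi) + \gamma V(f(s,a,\xi)) \bigr),
  \qquad \gamma \in (0,1). \\
\intertext{For bounded $V, W$ set $\delta = \|V - W\|_\infty$, so that $V \le W + \delta$ pointwise. Monotonicity and translation equivariance give}
\rho\bigl( c + \gamma V(f) \bigr)
  &\le \rho\bigl( c + \gamma W(f) + \gamma\delta \bigr)
   = \rho\bigl( c + \gamma W(f) \bigr) + \gamma\delta, \\
\intertext{and taking the minimum over $a$ on both sides yields $\mathcal{T}V \le \mathcal{T}W + \gamma\delta$. Exchanging the roles of $V$ and $W$,}
\|\mathcal{T}V - \mathcal{T}W\|_\infty &\le \gamma \, \|V - W\|_\infty .
\end{align*}
```

With unbounded costs, which is where the moment bounds cited in the response would matter, the same argument is usually carried out in a weighted supremum norm rather than the plain sup norm.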

Circularity Check

0 steps flagged

No significant circularity; central results build on external Bayesian SOC framework without self-referential reduction

The derivation chain starts from the external Bayesian SOC framework of Shapiro et al. (cited as motivation) and applies standard Bayesian updating to construct ambiguity sets for DROC. Tractable risk-averse reformulation, infinite-horizon consistency of value function/policy, and finite-sample posterior credibility guarantees are all stated to hold only under unspecified moderate conditions, but these are not shown to reduce by the paper's own equations to fitted inputs or self-citations. The BOCP algorithm is a proposed solver with claimed convergence, independent of the guarantees. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear; the framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on Bayesian updating of ambiguity sets and standard assumptions from stochastic optimal control; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: moderate conditions on the ambiguity set, data generation process, and problem structure.
Invoked to derive the tractable risk-averse reformulation and to establish consistency and credibility guarantees.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages

[1] and the monograph treatments in [26, 5]. 3.2 Asymptotic convergence of value function and optimal policy. Recall that (2.11) in Section 2.2 assumes the existence of a solution $\hat{V}^*_N$ to the Bellman equation (2.10). However, the underlying rationale has not yet been fully established. In the following, we first demonstrate the existence and uniqueness of $\hat{V}$...
[2] The normalized bin probabilities are $p_j = [F(u_j) - F(u_{j-1})]/F(U)$. Using the closed-form solution (6.1), we approximate the benchmark value function $V^*$ with $10^5$ samples. In episode $N$, we update the Bayesian posterior by (2.3) using the observed data and construct the ambiguity set (2.5) corresponding to the posterior distribution. We then compute the episode...
[3] M. Abeille and A. Lazaric. Improved regret bounds for Thompson sampling in linear quadratic control problems. In International Conference on Machine Learning, pages 1–9. PMLR, 2018.
[4] C. D. Aliprantis and K. C. Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 2006.
[5] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013.
[6] D. Bertsekas. Dynamic Programming and Optimal Control: Volume I, volume 4. Athena Scientific, 2012.
[7] D. Bertsekas. Abstract Dynamic Programming. Athena Scientific, 2022.
[8] D. Bertsekas and S. E. Shreve. Stochastic Optimal Control: The Discrete-time Case, volume 5. Athena Scientific, 1996.
[9] D. Bertsimas, V. Gupta, and N. Kallus. Robust sample average approximation. Mathematical Programming, 171:217–282, 2018.
[10] P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and P. Girardeau. Dynamic consistency for stochastic optimal control problems. Annals of Operations Research, 200:247–263, 2012.
[11] C. Castaing and M. Valadier. Convex Analysis and Measurable Multifunctions. Springer, 1977.
[12] Z. Chen and W. Ma. A Bayesian approach to data-driven multi-stage stochastic optimization. Journal of Global Optimization, pages 1–28, 2024.
[13] Z. Chen, W. Ma, and B. Ji. Data-driven approximation of distributionally robust chance constraints using Bayesian credible intervals. OR Spectrum, 47(3):969–1009, 2025.
[14] W. L. Cooper and B. Rangarajan. Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Operations Research, 60(5):1267–1281, 2012.
[15] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
[16] A. Dibiasi and D. Iselin. Measuring Knightian uncertainty. Empirical Economics, 61(4):2113–2141, 2021.
[17] B. Efron and T. Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science, volume 6. Cambridge University Press, 2021.
[18] C. Füllner and S. Rebennack. Stochastic dual dynamic programming and its variants: A review. SIAM Review, 67(3):415–539, 2025.
[19] R. Gao. Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Operations Research, 71(6):2291–2306, 2023.
[20] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 1995.
[21] V. Guigues, A. Shapiro, and Y. Cheng. Risk-averse stochastic optimal control: an efficiently computable statistical upper bound. Operations Research Letters, 51(4):393–400, 2023.
[22] S. Guo and H. Xu. Distributionally robust shortfall risk optimization model and its approximation. Mathematical Programming, 174(1):473–498, 2019.
[23] S. Guo and H. Xu. Statistical robustness in utility preference robust optimization models. Mathematical Programming, 190(1):679–720, 2021.
[24] V. Gupta. Near-optimal Bayesian ambiguity sets for distributionally robust optimization. Management Science, 65(9):4242–4260, 2019.
[25] F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6):1887–1896, 1971.
[26] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann. A distributionally robust perspective on uncertainty quantification and chance constrained programming. Mathematical Programming, 151(1):35–62, 2015.
[27] W. B. Haskell, R. Jain, and D. Kalathil. Empirical dynamic programming. Mathematics of Operations Research, 41(2):402–429, 2016.
[28] O. Hernández-Lerma and J. B. Lasserre. Further Topics on Discrete-time Markov Control Processes, volume 42. Springer Science & Business Media, 2012.
[29] J. Huang, K. Zhou, and Y. Guan. A study of distributionally robust multistage stochastic optimization. arXiv preprint arXiv:1708.07930, 2017.
[30] P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons, 2011.
[31] R. Jiang and Y. Guan. Risk-averse two-stage stochastic program with distributional ambiguity. Operations Research, 66(5):1390–1405, 2018.
[32] P. Kern, A. Simroth, and H. Zähle. First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function. Mathematical Methods of Operations Research, 92(1):165–197, 2020.
[33] K. Kim and I. Yang. Distributional robustness in minimax linear quadratic control with Wasserstein distance. SIAM Journal on Control and Optimization, 61(2):458–483, 2023.
[34] H. Lam. Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. Operations Research, 67(4):1090–1105, 2019.
[35] M. Li, X. Tong, and H. Sun. Discretization and quantification for distributionally robust optimization with decision-dependent ambiguity sets. Optimization Methods and Software, pages 1–30, 2024.
[36] P. Li, M. Yang, and Q. Wu. Confidence interval based distributionally robust real-time economic dispatch approach considering wind power accommodation risk. IEEE Transactions on Sustainable Energy, 12(1):58–69, 2020.
[37] Y. Li, Y. Lin, E. Zhou, and F. Zhang. Risk-aware model predictive control enabled by Bayesian learning. In 2022 American Control Conference (ACC), pages 108–113. IEEE, 2022.
[38] Y. Li, Y. Lin, E. Zhou, and F. Zhang. Bayesian risk-averse model predictive control with consistency and stability guarantees. arXiv preprint arXiv:2511.21871, 2025.
[39] Y. Lin, Y. Ren, and E. Zhou. Bayesian risk Markov decision processes. Advances in Neural Information Processing Systems, 35:17430–17442, 2022.
[40] Y. Liu and H. Xu. Stability analysis of stochastic programs with second order dominance constraints. Mathematical Programming, 142:435–460, 2013.
[41] W. Ma, Z. Chen, and X. Chen. Bayesian distributionally robust variational inequalities: regularization and quantification. arXiv preprint arXiv:2509.16537, 2025.
[42] W. Ma, Z. Chen, and H. Xu. A Bayesian composite risk approach for stochastic optimal control and Markov decision processes. arXiv preprint arXiv:2412.16488, 2024.
[43] S. Mehrotra and H. Zhang. Models and algorithms for distributionally robust least squares problems. Mathematical Programming, 146(1):123–141, 2014.
[44] P. Mohajerin Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166, 2018.
[45] A. Nilim and L. El Ghaoui. Robust Markov decision processes with uncertain transition matrices. PhD thesis, University of California, Berkeley, 2004.
[46] I. Osband, D. Russo, and B. Van Roy. (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems, 26, 2013.
[47] L. Pfeiffer. Two approaches to stochastic optimal control problems with a final-time expectation constraint. Applied Mathematics & Optimization, 77:377–404, 2018.
[48] G. C. Pflug and A. Pichler. Multistage Stochastic Optimization, volume 1104. Springer, 2014.
[49] A. B. Philpott, V. L. de Matos, and L. Kapelevich. Distributionally robust SDDP. Computational Management Science, 15:431–454, 2018.
[50] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008.
[51] A. Pichler and H. Xu. Quantitative stability analysis for minimax distributionally robust risk optimization. Mathematical Programming, 191(1):47–77, 2022.
[52] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
[53] H. Rahimian, G. Bayraksan, and T. H. De-Mello. Effective scenarios in multistage distributionally robust optimization with a focus on total variation distance. SIAM Journal on Optimization, 32(3):1698–1727, 2022.
[54] U. Rieder. Bayesian dynamic programming. Advances in Applied Probability, 7(2):330–348, 1975.
[55] R. T. Rockafellar. Convex Analysis. Princeton University Press, 2015.
[56] R. T. Rockafellar, S. Uryasev, et al. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[57] W. Römisch. Stability of stochastic programming problems. In Handbooks in Operations Research and Management Science, volume 10, pages 483–554. Elsevier, 2003.
[58] A. Shapiro. Minimax and risk averse multistage stochastic programming. European Journal of Operational Research, 219(3):719–726, 2012.
[59] A. Shapiro, D. Dentcheva, and A. Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2021.
[60] A. Shapiro, E. Zhou, Y. Lin, and Y. Wang. Episodic Bayesian optimal control with unknown randomness distributions. Operations Research, 2025.
[61] M. Strens. A Bayesian framework for reinforcement learning. In International Conference on Machine Learning, volume 2000, pages 943–950, 2000.
[62] B. Taskesen, D. Iancu, Ç. Koçyiğit, and D. Kuhn. Distributionally robust linear quadratic control. Advances in Neural Information Processing Systems, 36:18613–18632, 2023.
[63] I. Tzortzis, C. D. Charalambous, and T. Charalambous. Infinite horizon average cost dynamic programming subject to total variation distance ambiguity. SIAM Journal on Control and Optimization, 57(4):2843–2872, 2019.
[64] A. W. Van Der Vaart and J. A. Wellner. Weak convergence. In Weak convergence and empirical processes: with applications to statistics, pages 16–28. Springer, 1996.
[65] B. P. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari. Distributionally robust control of constrained stochastic systems. IEEE Transactions on Automatic Control, 61(2):430–442, 2015.
[66] H. Wang, L. He, R. Gao, and F. Calmon. Aleatoric and epistemic discrimination: Fundamental limits of fairness interventions. Advances in Neural Information Processing Systems, 36, 2024.
[67] Y. Wang and E. Zhou. Bayesian risk-averse Q-learning with streaming observations. Advances in Neural Information Processing Systems, 36:75967–75992, 2023.
[68] Z. Wang, P. W. Glynn, and Y. Ye. Likelihood robust optimization for data-driven problems. Computational Management Science, 13:241–261, 2016.
[69] J. Wessels. Markov programming by successive approximations with respect to weighted supremum norms. Journal of Mathematical Analysis and Applications, 58(2):326–335, 1977.
[70] W. Xie, C. Li, Y. Wu, and P. Zhang. A nonparametric Bayesian framework for uncertainty quantification in stochastic simulation. SIAM/ASA Journal on Uncertainty Quantification, 9(4):1527–1552, 2021.
[71] H. Xu and S. Mannor. Distributionally robust Markov decision processes. Advances in Neural Information Processing Systems, 23, 2010.
[72] H. Xu and S. Zhang. Quantitative statistical robustness in distributionally robust optimization models. Pacific Journal of Optimization Special Issue, 2021.
[73] I. Yang. Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Transactions on Automatic Control, 66(8):3863–3870, 2020.
[74] Z. Yang, Z. Chen, and H. Xu. Stability analysis of an integrated multistage stochastic programming and Markov decision process problem. arXiv preprint arXiv:2509.22194, 2025.

[1] [1]

3.2 Asymptotic convergence of value function and optimal policy Recall that (2.11) in Section 2.2 assumes the existence of a solution ˆV ∗ N to the Bellman equation (2.10)

and the monograph treatments in [26, 5]. 3.2 Asymptotic convergence of value function and optimal policy Recall that (2.11) in Section 2.2 assumes the existence of a solution ˆV ∗ N to the Bellman equation (2.10). However, the underlying rationale has not yet been fully established. In the following, we first demonstrate the existence and uniqueness of ˆV...

work page

[2] [2]

Using the closed-form solution (6.1), we approximate the benchmark value functionV ∗ with 105 samples

The normalized bin probabilities arep j = F(u j)−F(u j−1) /F(U). Using the closed-form solution (6.1), we approximate the benchmark value functionV ∗ with 105 samples. In episodeN, we update the Bayesian posterior by (2.3) using the observed data and construct the ambiguity set (2.5) corresponding to the posterior distribution. We then compute the episode...

work page 2000

[3] [3]

Abeille and A

M. Abeille and A. Lazaric. Improved regret bounds for thompson sampling in linear quadratic control problems. InInternational Conference on Machine Learning, pages 1–9. PMLR, 2018

work page 2018

[4] [4]

C. D. Aliprantis and K. C. Border.Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 2006

work page 2006

[5] [5]

J. O. Berger.Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013

work page 2013

[6] [6]

Bertsekas.Dynamic Programming and Optimal Control: Volume I, volume 4

D. Bertsekas.Dynamic Programming and Optimal Control: Volume I, volume 4. Athena Scientific, 2012

work page 2012

[7] [7]

Bertsekas.Abstract Dynamic Programming

D. Bertsekas.Abstract Dynamic Programming. Athena Scientific, 2022

work page 2022

[8] [8]

Bertsekas and S

D. Bertsekas and S. E. Shreve.Stochastic Optimal Control: The Discrete-time Case, volume 5. Athena Scientific, 1996

work page 1996

[9] [9]

Bertsimas, V

D. Bertsimas, V. Gupta, and N. Kallus. Robust sample average approximation.Mathematical Programming, 171:217–282, 2018

work page 2018

[10] [10]

Carpentier, J.-P

P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and P. Girardeau. Dynamic consistency for stochastic optimal control problems.Annals of Operations Research, 200:247–263, 2012. 39

work page 2012

[11] [11]

Castaing and M

C. Castaing and M. Valadier.Convex Analysis and Measurable Multifunctions. Springer, 1977

work page 1977

[12] Z. Chen and W. Ma. A Bayesian approach to data-driven multi-stage stochastic optimization. Journal of Global Optimization, pages 1–28, 2024.

[13] Z. Chen, W. Ma, and B. Ji. Data-driven approximation of distributionally robust chance constraints using Bayesian credible intervals. OR Spectrum, 47(3):969–1009, 2025.

[14] W. L. Cooper and B. Rangarajan. Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Operations Research, 60(5):1267–1281, 2012.

[15] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.

[16] A. Dibiasi and D. Iselin. Measuring Knightian uncertainty. Empirical Economics, 61(4):2113–2141, 2021.

[17] B. Efron and T. Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science, volume 6. Cambridge University Press, 2021.

[18] C. Füllner and S. Rebennack. Stochastic dual dynamic programming and its variants: A review. SIAM Review, 67(3):415–539, 2025.

[19] R. Gao. Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Operations Research, 71(6):2291–2306, 2023.

[20] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 1995.

[21] V. Guigues, A. Shapiro, and Y. Cheng. Risk-averse stochastic optimal control: an efficiently computable statistical upper bound. Operations Research Letters, 51(4):393–400, 2023.

[22] S. Guo and H. Xu. Distributionally robust shortfall risk optimization model and its approximation. Mathematical Programming, 174(1):473–498, 2019.

[23] S. Guo and H. Xu. Statistical robustness in utility preference robust optimization models. Mathematical Programming, 190(1):679–720, 2021.

[24] V. Gupta. Near-optimal Bayesian ambiguity sets for distributionally robust optimization. Management Science, 65(9):4242–4260, 2019.

[25] F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6):1887–1896, 1971.

[26] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann. A distributionally robust perspective on uncertainty quantification and chance constrained programming. Mathematical Programming, 151(1):35–62, 2015.

[27] W. B. Haskell, R. Jain, and D. Kalathil. Empirical dynamic programming. Mathematics of Operations Research, 41(2):402–429, 2016.

[28] O. Hernández-Lerma and J. B. Lasserre. Further Topics on Discrete-time Markov Control Processes, volume 42. Springer Science & Business Media, 2012.

[29] J. Huang, K. Zhou, and Y. Guan. A study of distributionally robust multistage stochastic optimization. arXiv preprint arXiv:1708.07930, 2017.

[30] P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons, 2011.

[31] R. Jiang and Y. Guan. Risk-averse two-stage stochastic program with distributional ambiguity. Operations Research, 66(5):1390–1405, 2018.

[32] P. Kern, A. Simroth, and H. Zähle. First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function. Mathematical Methods of Operations Research, 92(1):165–197, 2020.

[33] K. Kim and I. Yang. Distributional robustness in minimax linear quadratic control with Wasserstein distance. SIAM Journal on Control and Optimization, 61(2):458–483, 2023.

[34] H. Lam. Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. Operations Research, 67(4):1090–1105, 2019.

[35] M. Li, X. Tong, and H. Sun. Discretization and quantification for distributionally robust optimization with decision-dependent ambiguity sets. Optimization Methods and Software, pages 1–30, 2024.

[36] P. Li, M. Yang, and Q. Wu. Confidence interval based distributionally robust real-time economic dispatch approach considering wind power accommodation risk. IEEE Transactions on Sustainable Energy, 12(1):58–69, 2020.

[37] Y. Li, Y. Lin, E. Zhou, and F. Zhang. Risk-aware model predictive control enabled by Bayesian learning. In 2022 American Control Conference (ACC), pages 108–113. IEEE, 2022.

[38] Y. Li, Y. Lin, E. Zhou, and F. Zhang. Bayesian risk-averse model predictive control with consistency and stability guarantees. arXiv preprint arXiv:2511.21871, 2025.

[39] Y. Lin, Y. Ren, and E. Zhou. Bayesian risk Markov decision processes. Advances in Neural Information Processing Systems, 35:17430–17442, 2022.

[40] Y. Liu and H. Xu. Stability analysis of stochastic programs with second order dominance constraints. Mathematical Programming, 142:435–460, 2013.

[41] W. Ma, Z. Chen, and X. Chen. Bayesian distributionally robust variational inequalities: regularization and quantification. arXiv preprint arXiv:2509.16537, 2025.

[42] W. Ma, Z. Chen, and H. Xu. A Bayesian composite risk approach for stochastic optimal control and Markov decision processes. arXiv preprint arXiv:2412.16488, 2024.

[43] S. Mehrotra and H. Zhang. Models and algorithms for distributionally robust least squares problems. Mathematical Programming, 146(1):123–141, 2014.

[44] P. Mohajerin Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166, 2018.

[45] A. Nilim and L. El Ghaoui. Robust Markov decision processes with uncertain transition matrices. PhD thesis, University of California, Berkeley, 2004.

[46] I. Osband, D. Russo, and B. Van Roy. (More) efficient reinforcement learning via posterior sampling. Advances in Neural Information Processing Systems, 26, 2013.

[47] L. Pfeiffer. Two approaches to stochastic optimal control problems with a final-time expectation constraint. Applied Mathematics & Optimization, 77:377–404, 2018.

[48] G. C. Pflug and A. Pichler. Multistage Stochastic Optimization, volume 1104. Springer, 2014.

[49] A. B. Philpott, V. L. de Matos, and L. Kapelevich. Distributionally robust SDDP. Computational Management Science, 15:431–454, 2018.

[50] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008.

[51] A. Pichler and H. Xu. Quantitative stability analysis for minimax distributionally robust risk optimization. Mathematical Programming, 191(1):47–77, 2022.

[52] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.

[53] H. Rahimian, G. Bayraksan, and T. H. De-Mello. Effective scenarios in multistage distributionally robust optimization with a focus on total variation distance. SIAM Journal on Optimization, 32(3):1698–1727, 2022.

[54] U. Rieder. Bayesian dynamic programming. Advances in Applied Probability, 7(2):330–348, 1975.

[55] R. T. Rockafellar. Convex Analysis. Princeton University Press, 2015.

[56] R. T. Rockafellar, S. Uryasev, et al. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.

[57] W. Römisch. Stability of stochastic programming problems. In Handbooks in Operations Research and Management Science, volume 10, pages 483–554. Elsevier, 2003.

[58] A. Shapiro. Minimax and risk averse multistage stochastic programming. European Journal of Operational Research, 219(3):719–726, 2012.

[59] A. Shapiro, D. Dentcheva, and A. Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2021.

[60] A. Shapiro, E. Zhou, Y. Lin, and Y. Wang. Episodic Bayesian optimal control with unknown randomness distributions. Operations Research, 2025.

[61] M. Strens. A Bayesian framework for reinforcement learning. In International Conference on Machine Learning, volume 2000, pages 943–950, 2000.

[62] B. Taskesen, D. Iancu, Ç. Koçyiğit, and D. Kuhn. Distributionally robust linear quadratic control. Advances in Neural Information Processing Systems, 36:18613–18632, 2023.

[63] I. Tzortzis, C. D. Charalambous, and T. Charalambous. Infinite horizon average cost dynamic programming subject to total variation distance ambiguity. SIAM Journal on Control and Optimization, 57(4):2843–2872, 2019.

[64] A. W. Van Der Vaart and J. A. Wellner. Weak convergence. In Weak Convergence and Empirical Processes: With Applications to Statistics, pages 16–28. Springer, 1996.

[65] B. P. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari. Distributionally robust control of constrained stochastic systems. IEEE Transactions on Automatic Control, 61(2):430–442, 2015.

[66] H. Wang, L. He, R. Gao, and F. Calmon. Aleatoric and epistemic discrimination: Fundamental limits of fairness interventions. Advances in Neural Information Processing Systems, 36, 2024.

[67] Y. Wang and E. Zhou. Bayesian risk-averse Q-learning with streaming observations. Advances in Neural Information Processing Systems, 36:75967–75992, 2023.

[68] Z. Wang, P. W. Glynn, and Y. Ye. Likelihood robust optimization for data-driven problems. Computational Management Science, 13:241–261, 2016.

[69] J. Wessels. Markov programming by successive approximations with respect to weighted supremum norms. Journal of Mathematical Analysis and Applications, 58(2):326–335, 1977.

[70] W. Xie, C. Li, Y. Wu, and P. Zhang. A nonparametric Bayesian framework for uncertainty quantification in stochastic simulation. SIAM/ASA Journal on Uncertainty Quantification, 9(4):1527–1552, 2021.

[71] H. Xu and S. Mannor. Distributionally robust Markov decision processes. Advances in Neural Information Processing Systems, 23, 2010.

[72] H. Xu and S. Zhang. Quantitative statistical robustness in distributionally robust optimization models. Pacific Journal of Optimization Special Issue, 2021.

[73] I. Yang. Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Transactions on Automatic Control, 66(8):3863–3870, 2020.

[74] Z. Yang, Z. Chen, and H. Xu. Stability analysis of an integrated multistage stochastic programming and Markov decision process problem. arXiv preprint arXiv:2509.22194, 2025.