Adaptive Distributionally Robust Optimal Control with Bayesian Ambiguity Sets
Pith reviewed 2026-05-10 17:25 UTC · model grok-4.3
The pith
Bayesian learning from episodic data produces consistent and adaptive distributionally robust control policies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By updating the ambiguity set of a distributionally robust optimal control (DROC) problem with Bayesian posteriors computed from episodic samples, one obtains a tractable risk-averse reformulation. For infinite-horizon problems, the optimal value function and optimal policy are consistent, and the policy value carries a finite-sample posterior credibility guarantee. The resulting model is stable under sample perturbations and can be solved by a convergent Bellman-operator cutting-plane (BOCP) algorithm.
What carries the argument
The episodic Bayesian DROC model, whose ambiguity set is rebuilt from the Bayesian posterior after each new episode of samples; this lets conservatism shrink adaptively as data accumulate while robustness is preserved.
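To make the updating step concrete, here is a minimal sketch under assumptions that are ours, not the paper's: demand takes finitely many values (bins), the prior is a Dirichlet distribution so that the Bayesian update reduces to adding observed bin counts, and the ambiguity set is approximated by a finite collection of posterior draws. The names `update_posterior`, `sample_dirichlet`, and `posterior_ambiguity_set` are hypothetical, and the paper's actual update (2.3) and ambiguity set (2.5) may be constructed differently.

```python
import random

def update_posterior(alpha, episode):
    """Conjugate Dirichlet update: add the observed count of each demand bin."""
    posterior = list(alpha)
    for bin_index in episode:
        posterior[bin_index] += 1.0
    return posterior

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def posterior_ambiguity_set(alpha, size, rng):
    """Hypothetical ambiguity set: a finite cloud of draws from the current posterior."""
    return [sample_dirichlet(alpha, rng) for _ in range(size)]

rng = random.Random(0)
true_p = [0.2, 0.5, 0.3]   # hypothetical true demand pmf, unknown to the controller
alpha = [1.0, 1.0, 1.0]    # uniform Dirichlet prior over three demand bins
for episode_number in range(5):
    # Each episode reveals 20 i.i.d. demand observations (bin indices).
    episode = rng.choices(range(len(true_p)), weights=true_p, k=20)
    alpha = update_posterior(alpha, episode)

ambiguity_set = posterior_ambiguity_set(alpha, size=10, rng=rng)
```

Because the Dirichlet concentration grows with every episode, the posterior draws cluster more tightly around the empirical frequencies over time, which is one way to picture how conservatism can shrink as data accumulate.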
If this is right
- As the number of episodes grows, the optimal value function and policy converge to their true counterparts for infinite-horizon stochastic optimal control.
- The policy value satisfies finite-sample posterior credibility guarantees.
- The model remains stable and statistically robust under perturbations of the observed samples.
- Solutions can be computed efficiently by the Bellman-operator cutting-plane algorithm with proven convergence.
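A posterior credibility statement for a policy value can be illustrated by Monte Carlo: draw demand distributions from the posterior, evaluate the policy under each draw, and report an empirical quantile. The sketch below assumes a hypothetical base-stock inventory policy with lost sales, a Dirichlet posterior over three demand levels, and illustrative cost parameters. It is a generic posterior-quantile construction, not the finite-sample guarantee proved in the paper.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def policy_value(p, base_stock=2, gamma=0.9, hold=1.0, penalty=4.0,
                 order_cost=1.0, iters=300):
    """Discounted cost, starting from empty inventory, of a hypothetical base-stock policy.

    Demand d in {0, ..., len(p) - 1} has probability p[d]; unmet demand is lost.
    Iterative policy evaluation of v <- c + gamma * P v.
    """
    v = [0.0] * (base_stock + 1)  # states: inventory levels 0..base_stock
    for _ in range(iters):
        new_v = []
        for s in range(base_stock + 1):
            total = order_cost * (base_stock - s)  # order up to base_stock
            for d, prob in enumerate(p):
                left = base_stock - d
                nxt = max(left, 0)
                total += prob * (hold * nxt + penalty * max(-left, 0) + gamma * v[nxt])
            new_v.append(total)
        v = new_v
    return v[0]

def credible_upper_bound(alpha, level, draws, rng):
    """Empirical `level`-quantile of the policy value over posterior draws."""
    values = sorted(policy_value(sample_dirichlet(alpha, rng)) for _ in range(draws))
    index = min(draws - 1, max(0, math.ceil(level * draws) - 1))
    return values[index]

alpha = [1.0 + 24, 1.0 + 61, 1.0 + 35]  # hypothetical prior plus observed bin counts
bound = credible_upper_bound(alpha, level=0.95, draws=200, rng=random.Random(1))
```

Under the posterior, the policy value lies at or below `bound` with probability of about 0.95, up to Monte Carlo error from the finite number of draws; additional episodes concentrate the posterior and tighten the bound.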
Where Pith is reading between the lines
- The same Bayesian updating mechanism could be applied to other sequential decision problems where data arrives in batches, such as certain reinforcement-learning tasks under distributional uncertainty.
- The finite-sample credibility bounds give a practical way to decide when enough episodes have been observed for the policy to be deployed with quantified reliability.
- Testing the moderate conditions on concrete problem structures would clarify how large the data requirement is in specific applications.
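One way such a deployment rule could look, as a hypothetical sketch rather than anything proposed in the paper: with a Dirichlet posterior $\mathrm{Dir}(\alpha)$ over demand bins and $\alpha_0 = \sum_k \alpha_k$, each bin probability $p_j$ has a $\mathrm{Beta}(\alpha_j, \alpha_0 - \alpha_j)$ marginal with variance $\alpha_j(\alpha_0 - \alpha_j)/(\alpha_0^2(\alpha_0 + 1))$, so one can stop collecting episodes once every marginal posterior standard deviation falls below a tolerance. The tolerance and the function names are illustrative assumptions.

```python
import math

def max_posterior_std(alpha):
    """Largest marginal posterior standard deviation under Dirichlet(alpha).

    Bin j has a Beta(alpha_j, alpha_0 - alpha_j) marginal with variance
    alpha_j * (alpha_0 - alpha_j) / (alpha_0**2 * (alpha_0 + 1)).
    """
    a0 = sum(alpha)
    return max(math.sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1))) for a in alpha)

def episodes_until_deployable(prior, episode_counts, tol):
    """Hypothetical rule: deploy once every bin's posterior std is below tol.

    episode_counts holds, for each episode, the number of observations per bin.
    Returns the number of episodes consumed, or None if tol is never reached.
    """
    alpha = list(prior)
    for n, counts in enumerate(episode_counts, start=1):
        alpha = [a + c for a, c in zip(alpha, counts)]
        if max_posterior_std(alpha) < tol:
            return n
    return None
```

For example, with a uniform prior over two bins and episodes that each contribute 10 observations per bin, the rule first fires after the fifth episode at `tol = 0.05`.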
Load-bearing premise
The unspecified "moderate conditions" must actually hold, since they are what license the tractable risk-averse reformulation, the consistency proofs, and the credibility bounds; in addition, samples must be generated episodically from the true underlying distribution.
What would settle it
Numerical simulations in which the policy value computed from the episodic Bayesian model fails to approach the policy value obtained under the true distribution as the number of episodes grows to infinity would falsify the consistency claim.
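A falsification experiment of this kind could be prototyped as follows. To stay short, the sketch replaces the infinite-horizon problem with a one-period newsvendor cost, models the ambiguity set as the worst case over a finite set of Dirichlet posterior draws, and uses made-up demand probabilities and cost parameters; all of these are assumptions for illustration. The quantity to watch is the gap between the worst-case posterior value and the value under the true distribution as the number of episodes grows.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def expected_cost(p, order=1, hold=1.0, penalty=4.0):
    """One-period cost E_p[hold*(order - D)^+ + penalty*(D - order)^+], D in {0, ..., len(p)-1}."""
    return sum(prob * (hold * max(order - d, 0) + penalty * max(d - order, 0))
               for d, prob in enumerate(p))

def consistency_gaps(true_p, checkpoints, batch=20, draws=20, seed=0):
    """Gap |worst-case posterior value - true value| at selected episode counts."""
    rng = random.Random(seed)
    alpha = [1.0] * len(true_p)  # uniform Dirichlet prior
    true_value = expected_cost(true_p)
    gaps = {}
    for n in range(1, max(checkpoints) + 1):
        # Episode n: observe `batch` i.i.d. demands and update the Dirichlet counts.
        for d in rng.choices(range(len(true_p)), weights=true_p, k=batch):
            alpha[d] += 1.0
        if n in checkpoints:
            worst = max(expected_cost(sample_dirichlet(alpha, rng)) for _ in range(draws))
            gaps[n] = abs(worst - true_value)
    return gaps

gaps = consistency_gaps([0.2, 0.5, 0.3], checkpoints={1, 10, 100, 1000})
```

A gap that stays bounded away from zero at large episode counts, across many seeds, would be the kind of evidence the criterion above describes; with a correctly specified model the gap should shrink as the posterior concentrates.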
Original abstract
In stochastic optimal control (SOC), uncertainty may arise from incomplete knowledge of the true probability distribution of the underlying environment, which is known as Knightian or epistemic uncertainty. Distributionally robust optimal control (DROC) models have been proposed to tackle this source of uncertainty. While such models are effective in some practical applications, most existing DROC models are offline and can be overly conservative when data are scarce. Moreover, they cannot be applied when samples are generated episodically. Motivated by the Bayesian SOC framework recently proposed by Shapiro et al. [60], we propose an adaptive DROC model in which the ambiguity set is updated via Bayesian learning from new data. Under some moderate conditions, we derive a tractable risk-averse reformulation, establish consistency of the optimal value function and optimal policy for an infinite-horizon SOC problem, and establish a finite-sample posterior credibility guarantee for the policy value induced by the proposed episodic Bayesian DROC model. We also study the stability and statistical robustness of the proposed model with respect to sample perturbations that often arise in data-driven environments. To solve the episodic Bayesian DROC model, we propose a Bellman-operator cutting-plane (BOCP) algorithm that is computationally efficient and provably convergent. Numerical results on an inventory control problem demonstrate the effectiveness, adaptivity, and robust performance of the proposed model and algorithm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an adaptive distributionally robust optimal control (DROC) framework for stochastic optimal control (SOC) under epistemic uncertainty. The ambiguity set is updated via Bayesian learning from episodic data samples. Under unspecified moderate conditions, the authors derive a tractable risk-averse reformulation, establish consistency of the optimal value function and policy for infinite-horizon problems, provide a finite-sample posterior credibility guarantee, analyze stability and robustness to sample perturbations, and introduce a provably convergent Bellman-operator cutting-plane (BOCP) algorithm. Numerical validation is provided on an inventory control example.
Significance. If the moderate conditions hold and the proofs are complete, the work meaningfully extends offline DROC by enabling online Bayesian adaptation, reducing conservatism with data while retaining robustness guarantees. The consistency and credibility results, combined with the convergent BOCP algorithm, offer both theoretical and computational contributions to data-driven control. The inventory example illustrates practical relevance in operations research settings. The integration of Bayesian updating with distributionally robust control is a clear strength when the assumptions align with the application.
major comments (2)
- [Abstract] Abstract: All three central claims (tractable risk-averse reformulation, consistency of value function/policy for infinite-horizon SOC, and finite-sample posterior credibility guarantee) are stated to hold only 'under some moderate conditions,' yet these conditions are never enumerated or characterized. This is load-bearing because the conditions control whether the Bayesian update remains tractable, whether the Bellman operator is a contraction, and whether the credibility bound applies; without an explicit list (e.g., requirements on the ambiguity-set family, uniform integrability, or moment conditions), the scope and practical utility of the results cannot be assessed.
- [Abstract and model formulation] Episodic sampling assumption (referenced in abstract and likely §3): The framework requires that samples are generated episodically from the true underlying distribution for the posterior update and finite-sample guarantee to be well-defined. If this assumption is violated (common in non-stationary or biased data environments), both the consistency result and the credibility guarantee may fail to hold, narrowing the applicability of the adaptive DROC model.
minor comments (2)
- [Abstract] Abstract: The citation to Shapiro et al. (key shapiro2025episodic, reference [60]) should resolve to a complete bibliographic entry in the reference list, and any dependence on its prior results should be clearly delineated.
- [Numerical results] Numerical section: The inventory control example demonstrates effectiveness, but additional details on how the moderate conditions are satisfied in the example (e.g., specific distribution family or integrability) would strengthen the link between theory and numerics.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments. We address each major comment point by point below, indicating planned revisions to improve clarity and scope without misrepresenting the contributions.
- Referee: [Abstract] The "moderate conditions" behind all three central claims are never enumerated or characterized (major comment 1).
Authors: We agree that the abstract would benefit from an explicit enumeration of the moderate conditions to immediately convey scope. These conditions are fully specified in the manuscript: Assumption 2.1 requires the ambiguity set to be a weakly compact, convex collection of measures with uniformly bounded first moments; Assumption 3.2 imposes uniform integrability and Lipschitz continuity on the stage costs; and Assumption 4.1 ensures the prior has full support with the posterior concentrating under i.i.d. episodic sampling. The Bellman operator is shown to be a contraction under a discount factor strictly less than one combined with the moment bounds. We will revise the abstract to include a concise parenthetical list of these conditions and add a short summary paragraph at the end of the introduction that cross-references their locations. This change directly addresses the concern while preserving the original claims. revision: yes
- Referee: [Abstract and model formulation] The consistency result and the credibility guarantee may fail if samples are not generated episodically from the true underlying distribution (major comment 2).
Authors: The episodic i.i.d. sampling assumption is indeed foundational, as it underpins both the Bayesian posterior update (Section 3) and the finite-sample credibility bound (Theorem 4.3), which rely on independent episodes drawn from the true distribution. We acknowledge that the consistency and guarantee results do not automatically extend to non-stationary or biased sampling regimes. In the revised manuscript we will add an explicit paragraph in the introduction and a dedicated limitations subsection in the conclusion that states the assumption, illustrates its role via the inventory example, and outlines future extensions such as sliding-window posteriors or ambiguity-set inflation for non-stationarity. This clarifies applicability without weakening the core episodic setting, which remains relevant for many operations-research control problems. revision: partial
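The first response above attributes the contraction property of the Bellman operator to a discount factor strictly below one. That mechanism can be checked numerically on a toy problem. The sketch assumes a hypothetical lost-sales inventory model with capacity 3, demand in {0, 1, 2}, a finite ambiguity set of three demand distributions, and illustrative costs; it uses plain robust value iteration as a stand-in and is not the paper's BOCP algorithm.

```python
import random

MAX_INV = 3   # hypothetical storage capacity: inventory levels 0..3
GAMMA = 0.9   # discount factor, strictly below one

def stage(s, a, d, hold=1.0, penalty=4.0, order_cost=1.0):
    """Stage cost and next state for stock s, order a, demand d (lost sales)."""
    level = s + a
    cost = order_cost * a + hold * max(level - d, 0) + penalty * max(d - level, 0)
    return cost, max(level - d, 0)

def q_value(s, a, p, v):
    """Expected stage cost plus discounted continuation value under demand pmf p."""
    total = 0.0
    for d, prob in enumerate(p):
        cost, nxt = stage(s, a, d)
        total += prob * (cost + GAMMA * v[nxt])
    return total

def robust_bellman(v, ambiguity_set):
    """(T v)(s) = min over orders a of max over p in the set of q_value(s, a, p, v)."""
    return [min(max(q_value(s, a, p, v) for p in ambiguity_set)
                for a in range(MAX_INV - s + 1))
            for s in range(MAX_INV + 1)]

def sup_dist(x, y):
    """Sup-norm distance between two value functions."""
    return max(abs(a - b) for a, b in zip(x, y))

def robust_value_iteration(ambiguity_set, tol=1e-8):
    """Iterate T from zero until successive iterates differ by less than tol."""
    v = [0.0] * (MAX_INV + 1)
    while True:
        new_v = robust_bellman(v, ambiguity_set)
        if sup_dist(new_v, v) < tol:
            return new_v
        v = new_v

ambiguity = [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.5, 0.4]]
rng = random.Random(0)
v = [rng.uniform(0.0, 50.0) for _ in range(MAX_INV + 1)]
w = [rng.uniform(0.0, 50.0) for _ in range(MAX_INV + 1)]
# Contraction check: ||T v - T w|| / ||v - w|| should not exceed GAMMA.
ratio = sup_dist(robust_bellman(v, ambiguity), robust_bellman(w, ambiguity)) / sup_dist(v, w)
v_star = robust_value_iteration(ambiguity)
```

Enlarging the ambiguity set can only raise the robust value, which is how this toy model expresses the conservatism that the Bayesian update is meant to reduce.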
Circularity Check
No significant circularity; central results build on external Bayesian SOC framework without self-referential reduction
The derivation chain starts from the external Bayesian SOC framework of Shapiro et al. (cited as motivation) and applies standard Bayesian updating to construct the ambiguity sets for DROC. The tractable risk-averse reformulation, the infinite-horizon consistency of the value function and policy, and the finite-sample posterior credibility guarantee are all stated to hold only under moderate conditions that the abstract does not enumerate, but none of these results reduces, through the paper's own equations, to fitted inputs or self-citations. The BOCP algorithm is a proposed solver whose claimed convergence does not depend on the statistical guarantees. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear, so the results can be assessed against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: moderate conditions on the ambiguity set, the data-generation process, and the problem structure.
Reference graph
Works this paper leans on
- [1] and the monograph treatments in [26, 5]. 3.2 Asymptotic convergence of value function and optimal policy. Recall that (2.11) in Section 2.2 assumes the existence of a solution $\hat{V}^*_N$ to the Bellman equation (2.10). However, the underlying rationale has not yet been fully established. In the following, we first demonstrate the existence and uniqueness of $\hat{V}$...
- [2] The normalized bin probabilities are $p_j = \big(F(u_j) - F(u_{j-1})\big)/F(U)$. Using the closed-form solution (6.1), we approximate the benchmark value function $V^*$ with 105 samples. In episode $N$, we update the Bayesian posterior by (2.3) using the observed data and construct the ambiguity set (2.5) corresponding to the posterior distribution. We then compute the episode...
- [3] M. Abeille and A. Lazaric. Improved regret bounds for Thompson sampling in linear quadratic control problems. In International Conference on Machine Learning, pages 1–9. PMLR, 2018.
- [4] C. D. Aliprantis and K. C. Border. Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 2006.
- [5] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media, 2013.
- [6] D. Bertsekas. Dynamic Programming and Optimal Control: Volume I, volume 4. Athena Scientific, 2012.
- [7] D. Bertsekas. Abstract Dynamic Programming. Athena Scientific, 2022.
- [8] D. Bertsekas and S. E. Shreve. Stochastic Optimal Control: The Discrete-time Case, volume 5. Athena Scientific, 1996.
- [9] D. Bertsimas, V. Gupta, and N. Kallus. Robust sample average approximation. Mathematical Programming, 171:217–282, 2018.
- [10] P. Carpentier, J.-P. Chancelier, G. Cohen, M. De Lara, and P. Girardeau. Dynamic consistency for stochastic optimal control problems. Annals of Operations Research, 200:247–263, 2012.
- [11] C. Castaing and M. Valadier. Convex Analysis and Measurable Multifunctions. Springer, 1977.
- [12] Z. Chen and W. Ma. A Bayesian approach to data-driven multi-stage stochastic optimization. Journal of Global Optimization, pages 1–28, 2024.
- [13] Z. Chen, W. Ma, and B. Ji. Data-driven approximation of distributionally robust chance constraints using Bayesian credible intervals. OR Spectrum, 47(3):969–1009, 2025.
- [14] W. L. Cooper and B. Rangarajan. Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Operations Research, 60(5):1267–1281, 2012.
- [15] E. Delage and Y. Ye. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612, 2010.
- [16] A. Dibiasi and D. Iselin. Measuring Knightian uncertainty. Empirical Economics, 61(4):2113–2141, 2021.
- [17] B. Efron and T. Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science, volume 6. Cambridge University Press, 2021.
- [18] C. Füllner and S. Rebennack. Stochastic dual dynamic programming and its variants: A review. SIAM Review, 67(3):415–539, 2025.
- [19] R. Gao. Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Operations Research, 71(6):2291–2306, 2023.
- [20]
- [21] V. Guigues, A. Shapiro, and Y. Cheng. Risk-averse stochastic optimal control: an efficiently computable statistical upper bound. Operations Research Letters, 51(4):393–400, 2023.
- [22]
- [23]
- [24] V. Gupta. Near-optimal Bayesian ambiguity sets for distributionally robust optimization. Management Science, 65(9):4242–4260, 2019.
- [25] F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6):1887–1896, 1971.
- [26] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann. A distributionally robust perspective on uncertainty quantification and chance constrained programming. Mathematical Programming, 151(1):35–62, 2015.
- [27] W. B. Haskell, R. Jain, and D. Kalathil. Empirical dynamic programming. Mathematics of Operations Research, 41(2):402–429, 2016.
- [28] O. Hernández-Lerma and J. B. Lasserre. Further Topics on Discrete-time Markov Control Processes, volume 42. Springer Science & Business Media, 2012.
- [29]
- [30] P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons, 2011.
- [31] R. Jiang and Y. Guan. Risk-averse two-stage stochastic program with distributional ambiguity. Operations Research, 66(5):1390–1405, 2018.
- [32] P. Kern, A. Simroth, and H. Zähle. First-order sensitivity of the optimal value in a Markov decision model with respect to deviations in the transition probability function. Mathematical Methods of Operations Research, 92(1):165–197, 2020.
- [33]
- [34] H. Lam. Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. Operations Research, 67(4):1090–1105, 2019.
- [35] M. Li, X. Tong, and H. Sun. Discretization and quantification for distributionally robust optimization with decision-dependent ambiguity sets. Optimization Methods and Software, pages 1–30, 2024.
- [36] P. Li, M. Yang, and Q. Wu. Confidence interval based distributionally robust real-time economic dispatch approach considering wind power accommodation risk. IEEE Transactions on Sustainable Energy, 12(1):58–69, 2020.
- [37] Y. Li, Y. Lin, E. Zhou, and F. Zhang. Risk-aware model predictive control enabled by Bayesian learning. In 2022 American Control Conference (ACC), pages 108–113. IEEE, 2022.
- [38]
- [39] Y. Lin, Y. Ren, and E. Zhou. Bayesian risk Markov decision processes. Advances in Neural Information Processing Systems, 35:17430–17442, 2022.
- [40]
- [41]
- [42]
- [43] S. Mehrotra and H. Zhang. Models and algorithms for distributionally robust least squares problems. Mathematical Programming, 146(1):123–141, 2014.
- [44] P. Mohajerin Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166, 2018.
- [45] A. Nilim and L. El Ghaoui. Robust Markov decision processes with uncertain transition matrices. PhD thesis, University of California, Berkeley, 2004.
- [46]
- [47]
- [48] G. C. Pflug and A. Pichler. Multistage Stochastic Optimization, volume 1104. Springer, 2014.
- [49] A. B. Philpott, V. L. de Matos, and L. Kapelevich. Distributionally robust SDDP. Computational Management Science, 15:431–454, 2018.
- [50] A. B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450–455, 2008.
- [51] A. Pichler and H. Xu. Quantitative stability analysis for minimax distributionally robust risk optimization. Mathematical Programming, 191(1):47–77, 2022.
- [52] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
- [53] H. Rahimian, G. Bayraksan, and T. H. De-Mello. Effective scenarios in multistage distributionally robust optimization with a focus on total variation distance. SIAM Journal on Optimization, 32(3):1698–1727, 2022.
- [54] U. Rieder. Bayesian dynamic programming. Advances in Applied Probability, 7(2):330–348, 1975.
- [55] R. T. Rockafellar. Convex Analysis. Princeton University Press, 2015.
- [56] R. T. Rockafellar, S. Uryasev, et al. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
- [57]
- [58] A. Shapiro. Minimax and risk averse multistage stochastic programming. European Journal of Operational Research, 219(3):719–726, 2012.
- [59] A. Shapiro, D. Dentcheva, and A. Ruszczynski. Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2021.
- [60] A. Shapiro, E. Zhou, Y. Lin, and Y. Wang. Episodic Bayesian optimal control with unknown randomness distributions. Operations Research, 2025.
- [61] M. Strens. A Bayesian framework for reinforcement learning. In International Conference on Machine Learning, volume 2000, pages 943–950, 2000.
- [62] B. Taskesen, D. Iancu, Ç. Koçyiğit, and D. Kuhn. Distributionally robust linear quadratic control. Advances in Neural Information Processing Systems, 36:18613–18632, 2023.
- [63] I. Tzortzis, C. D. Charalambous, and T. Charalambous. Infinite horizon average cost dynamic programming subject to total variation distance ambiguity. SIAM Journal on Control and Optimization, 57(4):2843–2872, 2019.
- [64] A. W. Van Der Vaart and J. A. Wellner. Weak convergence. In Weak Convergence and Empirical Processes: With Applications to Statistics, pages 16–28. Springer, 1996.
- [65] B. P. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari. Distributionally robust control of constrained stochastic systems. IEEE Transactions on Automatic Control, 61(2):430–442, 2015.
- [66] H. Wang, L. He, R. Gao, and F. Calmon. Aleatoric and epistemic discrimination: Fundamental limits of fairness interventions. Advances in Neural Information Processing Systems, 36, 2024.
- [67] Y. Wang and E. Zhou. Bayesian risk-averse Q-learning with streaming observations. Advances in Neural Information Processing Systems, 36:75967–75992, 2023.
- [68] Z. Wang, P. W. Glynn, and Y. Ye. Likelihood robust optimization for data-driven problems. Computational Management Science, 13:241–261, 2016.
- [69] J. Wessels. Markov programming by successive approximations with respect to weighted supremum norms. Journal of Mathematical Analysis and Applications, 58(2):326–335, 1977.
- [70] W. Xie, C. Li, Y. Wu, and P. Zhang. A nonparametric Bayesian framework for uncertainty quantification in stochastic simulation. SIAM/ASA Journal on Uncertainty Quantification, 9(4):1527–1552, 2021.
- [71]
- [72]
- [73] I. Yang. Wasserstein distributionally robust stochastic control: A data-driven approach. IEEE Transactions on Automatic Control, 66(8):3863–3870, 2020.
- [74]