Constrained Policy Optimization for Stochastic Optimal Control under Nonstationary Uncertainties

Emil Constantinescu; François Pacaud; Mihai Anitescu; Sungho Shin

arXiv: 2209.13050 · v1 · submitted 2022-09-26 · math.OC



Pith reviewed 2026-05-24 11:22 UTC · model grok-4.3

classification: math.OC

keywords: stochastic optimal control, policy optimization, Markov embeddability, nonstationary uncertainties, nonlinear programming, interior-point methods, automatic differentiation


The pith

Stochastic optimal control under nonstationary uncertainties reduces to constrained policy optimization via Markov embeddability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that by assuming Markov embeddability, one can reformulate stochastic optimal control as a policy optimization problem in an augmented state space that includes the uncertainty process. This infinite-dimensional problem is then discretized into a finite nonlinear program through function approximation, deterministic sampling, and truncation in time. Solving this program with automatic differentiation and interior-point methods yields control policies, as demonstrated in a numerical example. The approach addresses systems where uncertainties change over time, which standard stationary methods cannot handle directly.

Core claim

Under the Markov embeddability assumption, the stochastic optimal control problem is cast as a policy optimization problem over the augmented state space. This infinite-dimensional problem is approximated as a finite-dimensional nonlinear program by applying function approximation, deterministic sampling, and temporal truncation. The approximated problem is solved using automatic differentiation and condensed-space interior-point methods.
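To make the three approximation steps concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a scalar linear system x_{t+1} = a x_t + b u_t + w_t, an AR(1) disturbance w_t treated as part of the augmented state, a two-parameter linear policy u_t = -k x_t - m w_t standing in for the function approximator, a fixed-seed sample set standing in for deterministic sampling, and a finite horizon as the temporal truncation. The names `ar1_paths` and `sampled_objective` are hypothetical. The point is only that the result is an ordinary function of the finite parameter vector (k, m), which is the kind of object an NLP solver then optimizes.

```python
import random


def ar1_paths(n_paths, horizon, rho=0.8, sigma=0.1, seed=0):
    """Fixed sample set of AR(1) disturbance paths w_{t+1} = rho*w_t + sigma*xi_t.

    Drawn once with a fixed seed, so the objective below is a deterministic
    function of the policy parameters (hypothetical stand-in for the paper's
    deterministic sampling). Each path stops at `horizon` (temporal truncation).
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        w, path = 0.0, []
        for _ in range(horizon):
            path.append(w)
            w = rho * w + sigma * rng.gauss(0.0, 1.0)
        paths.append(path)
    return paths


def sampled_objective(theta, paths, x0=1.0, a=1.0, b=1.0, q=1.0, r=0.1):
    """Sample-average, truncated-horizon cost of the linear policy
    u_t = -k*x_t - m*w_t acting on the augmented state (x_t, w_t)."""
    k, m = theta
    total = 0.0
    for path in paths:
        x = x0
        for w in path:
            u = -k * x - m * w
            total += q * x * x + r * u * u
            x = a * x + b * u + w
    return total / len(paths)


paths = ar1_paths(n_paths=50, horizon=30)
print(sampled_objective((0.0, 0.0), paths))  # no feedback
print(sampled_objective((0.9, 0.9), paths))  # feedback on both x and w
```

Because the sample set is generated once and reused, repeated evaluations at the same (k, m) return identical values, which is what lets a gradient-based NLP solver treat the objective as deterministic.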

What carries the argument

Markov embeddability assumption, which embeds the nonstationary uncertainty process into an augmented Markov state to allow policy optimization.
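As an illustration of the idea (not the paper's formal definition, which is not reproduced here), consider a disturbance with a periodic mean plus AR(1) noise, w_t = A sin(2πt/P) + e_t. Indexed by t alone, its law changes with time. Adding a phase counter to the noise state gives an augmented uncertainty state z_t = (e_t, t mod P) whose transition no longer depends on t, so a policy can act on (x_t, z_t) without an explicit time argument. All parameter values and function names below are hypothetical; the sketch checks that the time-invariant augmented recursion reproduces the time-indexed disturbance for the same noise draws.

```python
import math
import random

PERIOD, AMP, RHO, SIGMA = 24, 1.0, 0.8, 0.2  # hypothetical parameters


def w_time_indexed(noise):
    """Nonstationary disturbance generated with an explicit time index t."""
    e, out = 0.0, []
    for t, xi in enumerate(noise):
        out.append(AMP * math.sin(2 * math.pi * t / PERIOD) + e)
        e = RHO * e + SIGMA * xi
    return out


def z_step(z, xi):
    """Time-invariant transition of the augmented uncertainty state z = (e, phase)."""
    e, phase = z
    return (RHO * e + SIGMA * xi, (phase + 1) % PERIOD)


def w_from_z(z):
    """Disturbance read off the augmented state; no time index needed."""
    e, phase = z
    return AMP * math.sin(2 * math.pi * phase / PERIOD) + e


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]
direct = w_time_indexed(noise)

z, lifted = (0.0, 0), []
for xi in noise:
    lifted.append(w_from_z(z))
    z = z_step(z, xi)

# The two sequences agree up to floating-point rounding inside sin().
print(max(abs(p - q) for p, q in zip(direct, lifted)))
```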

If this is right

The stochastic optimal control problem becomes equivalent to optimizing a policy over the augmented state space.
The infinite-dimensional problem is reduced to a tractable finite nonlinear program.
Automatic differentiation supplies the gradients needed for condensed-space interior-point solvers.
A numerical example, presented as a proof of concept, illustrates the performance of the resulting policy on a test system.
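The gradients mentioned in the list above are what make the finite NLP usable by a Newton-type solver. A minimal forward-mode automatic differentiation sketch with dual numbers is shown below; the `Dual` class, `closed_loop_cost`, and `grad_k` are hypothetical illustrations, not the tooling used in the paper. Because the cost only uses +, - and *, the same code returns the cost for a float gain and its exact derivative for a `Dual` gain, which can be checked against a central finite difference.

```python
class Dual:
    """Forward-mode AD number val + der*eps with eps**2 = 0."""

    __slots__ = ("val", "der")

    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule carried in the eps coefficient.
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.der)


def closed_loop_cost(k, disturbances, x0=1.0, a=1.0, b=1.0, q=1.0, r=0.1):
    """Truncated quadratic cost of the policy u = -k*x along one disturbance path."""
    x, cost = x0, 0.0
    for w in disturbances:
        u = -k * x
        cost = cost + q * x * x + r * u * u
        x = a * x + b * u + w
    return cost


def grad_k(k, disturbances):
    """dJ/dk obtained by seeding the input gain with derivative 1."""
    return closed_loop_cost(Dual(k, 1.0), disturbances).der


path = [0.1, -0.05, 0.02, 0.0, 0.03, -0.01, 0.04, -0.02, 0.01, 0.0]
h = 1e-6
fd = (closed_loop_cost(0.5 + h, path) - closed_loop_cost(0.5 - h, path)) / (2 * h)
print(grad_k(0.5, path), fd)  # the two estimates should agree closely
```

Forward mode costs one pass per parameter, so for policies with many parameters reverse mode is usually preferred; the dual-number version is shown only because it is the shortest correct illustration.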

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If Markov embeddability can be verified for common classes of time-varying disturbances, the reformulation would extend to many engineering control tasks.
The open questions on asymptotic exactness indicate that convergence rates under increasing sample size and horizon length remain to be quantified.
The deterministic sampling step could be replaced by higher-order quadrature rules or other integration schemes to improve accuracy.
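The last bullet mentions quadrature rules. As one concrete instance (our choice, not necessarily the paper's scheme), a 3-point Gauss-Hermite rule for a standard normal disturbance uses nodes 0 and ±√3 with weights 2/3 and 1/6, and integrates polynomials up to degree 5 exactly, whereas a Monte Carlo average carries random error. The function names below are hypothetical.

```python
import math
import random

# 3-point Gauss-Hermite rule (probabilists' form) for E[f(w)] with w ~ N(0, 1).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def expect_quadrature(f):
    """Deterministic estimate: weighted sum over three fixed nodes."""
    return sum(wt * f(x) for x, wt in zip(NODES, WEIGHTS))


def expect_monte_carlo(f, n, seed=0):
    """Sampling estimate: plain average over n pseudo-random draws."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(0.0, 1.0)) for _ in range(n)) / n


# E[w**4] = 3 exactly for a standard normal.
print(expect_quadrature(lambda w: w ** 4))         # 3 up to rounding, from 3 evaluations
print(expect_monte_carlo(lambda w: w ** 4, 1000))  # sampling error, std. dev. about 0.3 here
```

A tensor-product version of this rule needs 3**d nodes for a d-dimensional disturbance, so its cost grows quickly with dimension.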

Load-bearing premise

The nonstationary uncertainty process must satisfy Markov embeddability so that the augmented state captures the dynamics without loss of information.

What would settle it

A concrete nonstationary uncertainty process that violates Markov embeddability, for which the method produces a policy whose achieved cost differs from the true optimum by a measurable amount.

Figures

Figures reproduced from arXiv: 2209.13050 by Emil Constantinescu, François Pacaud, Mihai Anitescu, Sungho Shin.

**Figure 2.** Closed-loop simulation of PO, LQR, and MPC (noisy).

The paper also contains Fig. 1 (closed-loop simulation of PO, LQR, and MPC, nominal case) and Table I (performance comparison of PO, LQR, and MPC, nominal case).
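LQR is one of the baselines compared in Figure 2. For readers who want that baseline spelled out, the sketch below computes an infinite-horizon LQR gain in the simplest scalar case by iterating the discrete-time Riccati equation to a fixed point. The function name and parameter values are hypothetical and unrelated to the paper's example; standard LQR, unlike a constrained policy or MPC, does not enforce state or input constraints.

```python
def scalar_lqr(a, b, q, r, iters=1000, tol=1e-12):
    """Infinite-horizon discrete-time LQR for x_{t+1} = a*x_t + b*u_t with
    stage cost q*x**2 + r*u**2 (q >= 0, r > 0).

    Iterates the scalar Riccati recursion to a fixed point P and returns
    (P, K) such that the optimal feedback is u = -K*x.
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k


P, K = scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
print(P, K)  # P -> (1 + 5**0.5) / 2, K -> (5**0.5 - 1) / 2
```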

Original abstract

This article presents a constrained policy optimization approach for the optimal control of systems under nonstationary uncertainties. We introduce an assumption that we call Markov embeddability that allows us to cast the stochastic optimal control problem as a policy optimization problem over the augmented state space. Then, the infinite-dimensional policy optimization problem is approximated as a finite-dimensional nonlinear program by applying function approximation, deterministic sampling, and temporal truncation. The approximated problem is solved by using automatic differentiation and condensed-space interior-point methods. We formulate several conceptual and practical open questions regarding the asymptotic exactness of the approximation and the solution strategies for the approximated problem. As a proof of concept, we provide a numerical example demonstrating the performance of the control policy obtained by the proposed method.
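The abstract names condensed-space interior-point methods as the solver. Those are primal-dual methods; the sketch below shows only the underlying log-barrier idea on a one-variable toy problem of our own, minimize (k - 2)^2 subject to k <= 1, whose solution is k = 1. The barrier parameter mu is driven toward zero, and each barrier subproblem is solved with damped Newton steps that keep the iterate strictly feasible. `barrier_solve` is hypothetical and far simpler than a production interior-point solver.

```python
def barrier_solve(mu0=1.0, shrink=0.1, mu_min=1e-9, k0=0.0, tol=1e-12):
    """Toy log-barrier method for: minimize (k - 2)**2 subject to k <= 1.

    For a decreasing sequence of barrier parameters mu, approximately minimize
    phi(k) = (k - 2)**2 - mu * log(1 - k) with Newton steps that keep k < 1.
    """
    k, mu = k0, mu0
    while mu > mu_min:
        for _ in range(50):
            s = 1.0 - k                      # slack of the constraint, kept > 0
            grad = 2.0 * (k - 2.0) + mu / s  # phi'(k)
            hess = 2.0 + mu / (s * s)        # phi''(k) > 0, so phi is convex
            step = -grad / hess
            if step > 0.0:
                step = min(step, 0.99 * s)   # fraction-to-boundary safeguard
            k += step
            if abs(step) < tol:
                break
        mu *= shrink                         # tighten the barrier
    return k


k_star = barrier_solve()
print(k_star, 1.0 - k_star)  # approaches the constrained optimum k = 1 from inside
```

The fraction-to-boundary cap (0.99 here) keeps the slack 1 - k positive, so the logarithm in phi stays defined at every iterate.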

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

Markov embeddability reformulates the nonstationary SOC problem as policy optimization over augmented states, and the result is then approximated by a finite NLP; the convergence of that approximation is left open, with no analysis.


The paper introduces Markov embeddability to cast stochastic optimal control under nonstationary uncertainties as a policy optimization problem over an augmented state space. From there it approximates the infinite-dimensional problem by a finite nonlinear program through function approximation, deterministic sampling, and temporal truncation, then solves the NLP with automatic differentiation and condensed-space interior-point methods. One numerical example serves as proof of concept, and the authors list open questions on asymptotic exactness and solution strategies.

Referee Report

2 major / 0 minor

Summary. The paper proposes casting stochastic optimal control problems under nonstationary uncertainties as constrained policy optimization problems over an augmented state space, enabled by a Markov embeddability assumption. The resulting infinite-dimensional problem is approximated as a finite-dimensional nonlinear program via function approximation, deterministic sampling, and temporal truncation; the NLP is then solved using automatic differentiation and condensed-space interior-point methods. Several open questions on asymptotic exactness of the approximation are explicitly formulated, and the method is illustrated on a single numerical example.

Significance. If the open questions on asymptotic exactness were resolved with positive convergence results, the framework could offer a systematic way to apply modern nonlinear programming tools to constrained stochastic control with nonstationary uncertainty. The explicit use of automatic differentiation and interior-point methods is a practical strength, and the formulation of open questions provides a clear research agenda. At present, however, the absence of any error bounds or consistency analysis limits the result to a conceptual proposal whose practical significance remains to be demonstrated.

major comments (2)

[Abstract] The central claim that the finite NLP obtained by function approximation, deterministic sampling, and temporal truncation can be used to solve the original constrained stochastic optimal control problem rests on the asymptotic exactness of this scheme, yet the manuscript poses this exactness as an open question and supplies no error bounds, consistency proof, or convergence analysis.
[Numerical example] Validation is limited to a single numerical example. Apart from the LQR and MPC baselines in the closed-loop simulations (Figure 2), there is no comparison against methods designed for nonstationary stochastic control or against the infinite-dimensional problem, so it is impossible to assess whether the interior-point solution of the approximated NLP faithfully represents the original problem.
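Of the approximation steps behind the first comment, temporal truncation is the easiest to examine in a toy case. For a stable scalar closed loop x_{t+1} = c x_t with |c| < 1 and cost sum_t x_t^2, the tail dropped by truncating at horizon T is exactly c^{2T} x_0^2 / (1 - c^2), so it decays geometrically in T. The sketch below checks this numerically; it is our illustration under these strong assumptions (hypothetical function names, linear dynamics, no constraints, no sampling) and does not address the nonlinear, constrained, sampled setting the referee asks about.

```python
def truncated_cost(c, horizon, x0=1.0):
    """Cost sum_{t < horizon} x_t**2 along the closed loop x_{t+1} = c*x_t."""
    x, total = x0, 0.0
    for _ in range(horizon):
        total += x * x
        x = c * x
    return total


def untruncated_cost(c, x0=1.0):
    """Closed form of the infinite-horizon cost; valid only for |c| < 1."""
    return x0 * x0 / (1.0 - c * c)


c = 0.9
for T in (10, 20, 40, 80):
    error = untruncated_cost(c) - truncated_cost(c, T)
    predicted = c ** (2 * T) / (1.0 - c * c)  # exact tail for x0 = 1
    print(T, error, predicted)
```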

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for the detailed review. The manuscript is explicitly framed as a conceptual proposal that formulates open questions on asymptotic exactness rather than claiming to resolve them. We respond point by point to the major comments below.


Referee: [Abstract] The central claim that the finite NLP obtained by function approximation, deterministic sampling, and temporal truncation can be used to solve the original constrained stochastic optimal control problem rests on the asymptotic exactness of this scheme, yet the manuscript poses this exactness as an open question and supplies no error bounds, consistency proof, or convergence analysis.

Authors: The abstract does not advance a claim that the finite NLP solves the original problem. It describes the Markov-embeddability formulation, the approximation steps, the use of automatic differentiation and interior-point methods to solve the resulting NLP, and then states that open questions on asymptotic exactness are formulated. The contribution is therefore the casting of the problem and the practical solution procedure for the approximation, with the open questions serving as an explicit research agenda. No error bounds are supplied because their derivation is left open. revision: no
Referee: [Numerical example] Validation is limited to a single numerical example. Apart from the LQR and MPC baselines in the closed-loop simulations (Figure 2), there is no comparison against methods designed for nonstationary stochastic control or against the infinite-dimensional problem, so it is impossible to assess whether the interior-point solution of the approximated NLP faithfully represents the original problem.

Authors: The numerical example is presented solely as a proof of concept, consistent with the abstract wording. A single illustrative instance is appropriate for demonstrating that the overall pipeline (formulation, approximation, and solver) can be executed. Systematic comparisons to other nonstationary stochastic control methods or to the infinite-dimensional problem would require additional theoretical and computational machinery that lies outside the scope of the current conceptual contribution. revision: no

Standing simulated objections (not resolved)

Derivation of error bounds, consistency proofs, or convergence analysis for the approximation scheme
Empirical comparisons, beyond the LQR and MPC baselines, against methods designed for nonstationary stochastic control or against the infinite-dimensional formulation

Circularity Check

0 steps flagged

No significant circularity: the derivation rests on an explicitly stated assumption and standard methods, and its open questions are acknowledged.


The paper introduces Markov embeddability as a new assumption to recast the SOC problem, then applies function approximation, deterministic sampling, and temporal truncation to obtain a finite NLP solved via automatic differentiation and interior-point methods. It explicitly formulates open questions on asymptotic exactness rather than claiming convergence. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or in the described chain of reasoning. The central claim is an approximation approach whose validity is partly left open, and the derivation is self-contained: nothing in it is justified by its own outputs, so it can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the Markov embeddability assumption introduced in the paper.

axioms (1)

Markov embeddability assumption (domain assumption)
Allows casting SOC as policy optimization over augmented state space.

pith-pipeline@v0.9.0 · 5664 in / 922 out tokens · 20258 ms · 2026-05-24T11:22:44.724031+00:00 · methodology



[1] [1]

Bertsekas, Dynamic programming and optimal control: Volume I

D. Bertsekas, Dynamic programming and optimal control: Volume I . Athena scientiﬁc, 2012, vol. 1

work page 2012

[2] [2]

A note on certainty equivalence in dynamic planning,

H. Theil, “A note on certainty equivalence in dynamic planning,” Econometrica: Journal of the Econometric Society , pp. 346–349, 1957

work page 1957

[3] [3]

Dynamic programming under uncertainty with a quadratic criterion function,

H. A. Simon, “Dynamic programming under uncertainty with a quadratic criterion function,” Econometrica, Journal of the Econometric Society, pp. 74–81, 1956

work page 1956

[4] [4]

Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games,

D. Jacobson, “Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games,” IEEE Transactions on Automatic Control , vol. 18, no. 2, pp. 124–131, 1973

work page 1973

[5] [5]

Decomposition and partitioning methods for multistage stochastic linear programs,

J. R. Birge, “Decomposition and partitioning methods for multistage stochastic linear programs,” Operations research, vol. 33, no. 5, pp. 989–1007, 1985

work page 1985

[6] [6]

Nested decomposition for dynamic models,

J. K. Ho and A. S. Manne, “Nested decomposition for dynamic models,” Mathematical Programming, vol. 6, no. 1, pp. 121–140, 1974

work page 1974

[7] [7]

Applying the progressive hedging algorithm to stochastic generalized networks,

J. M. Mulvey and H. Vladimirou, “Applying the progressive hedging algorithm to stochastic generalized networks,” Annals of Operations Research, vol. 31, no. 1, pp. 399–424, 1991

work page 1991

[8] [8]

Scenarios and policy aggrega- tion in optimization under uncertainty,

R. T. Rockafellar and R. J.-B. Wets, “Scenarios and policy aggrega- tion in optimization under uncertainty,” Mathematics of Operations Research, vol. 16, no. 1, pp. 119–147, 1991

work page 1991

[9] [9]

Multi-stage stochastic optimization applied to energy planning,

M. V . Pereira and L. M. Pinto, “Multi-stage stochastic optimization applied to energy planning,” Mathematical Programming, vol. 52, no. 1, pp. 359–375, 1991

work page 1991

[10] [10]

When to trust your model: Model-based policy optimization,

M. Janner, J. Fu, M. Zhang, and S. Levine, “When to trust your model: Model-based policy optimization,” in Advances in Neural Information Processing Systems, 2019, pp. 12 498–12 509

work page 2019

[11] [11]

Optimization methods for large- scale machine learning,

L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization methods for large- scale machine learning,” SIAM Review, vol. 60, no. 2, pp. 223–311, 2018

work page 2018

[12] [12]

Nocedal and S

J. Nocedal and S. J. Wright, Numerical optimization. Springer, 1999

work page 1999

[13] [13]

Global convergence of policy gradient methods for the linear quadratic regulator,

M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning . PMLR, 2018, pp. 1467–1476

work page 2018

[14] [14]

Direct policy optimization using deterministic sampling and collocation,

T. A. Howell, C. Fu, and Z. Manchester, “Direct policy optimization using deterministic sampling and collocation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5324–5331, 2021

work page 2021

[15] [15]

Stochastic model predictive control with joint chance constraints,

J. A. Paulson, E. A. Buehler, R. D. Braatz, and A. Mesbah, “Stochastic model predictive control with joint chance constraints,” International Journal of Control , vol. 93, no. 1, pp. 126–139, 2020

work page 2020

[16] [16]

J. R. Birge and F. Louveaux, Introduction to stochastic programming . Springer Science & Business Media, 2011

work page 2011

[17] [17]

Reinforcement learning for selective key applications in power systems: Recent advances and future challenges,

X. Chen, G. Qu, Y . Tang, S. Low, and N. Li, “Reinforcement learning for selective key applications in power systems: Recent advances and future challenges,” IEEE Transactions on Smart Grid , 2022

work page 2022

[18] [18]

Economic opportunities for industrial systems from frequency regulation markets,

A. W. Dowling and V . M. Zavala, “Economic opportunities for industrial systems from frequency regulation markets,” Computers & Chemical Engineering, vol. 114, pp. 254–264, 2018

work page 2018

[19] [19]

Optimal demand response scheduling of an industrial air separation unit using data-driven dynamic models,

C. Tsay, A. Kumar, J. Flores-Cerrillo, and M. Baldea, “Optimal demand response scheduling of an industrial air separation unit using data-driven dynamic models,” Computers & Chemical Engineering , vol. 126, pp. 22–34, 2019

work page 2019

[20] [20]

On differential stability in stochastic programming,

A. Shapiro, “On differential stability in stochastic programming,” Mathematical Programming, vol. 47, no. 1, pp. 107–116, 1990

work page 1990

[21] [21]

On a time consistency concept in risk averse multistage stochastic programming,

——, “On a time consistency concept in risk averse multistage stochastic programming,” Operations Research Letters, vol. 37, no. 3, pp. 143–147, 2009

work page 2009

[22] [22]

Detecting strange attractors in turbulence,

F. Takens, “Detecting strange attractors in turbulence,” in Dynamical systems and turbulence, Warwick 1980 . Springer, 1981, pp. 366–381

work page 1980

[23] [23]

Data-driven model reduction, Wiener projections, and the Koopman-Mori-Zwanzig formalism,

K. K. Lin and F. Lu, “Data-driven model reduction, Wiener projections, and the Koopman-Mori-Zwanzig formalism,” Journal of Computational Physics, vol. 424, p. 109864, 2021. Fig. 1. Closed-loop simulation of PO, LQR, and MPC (nominal). Fig. 2. Closed-loop simulation of PO, LQR, and MPC (noisy). TABLE I PERFORMANCE COMPARISON OF PO, LQR, AND MPC ( NOMINAL )...

work page 2021

[24] [24]

A discrete approach to stochastic parametrization and dimensional reduction in nonlinear dynamics,

A. Chorin and F. Lu, “A discrete approach to stochastic parametrization and dimensional reduction in nonlinear dynamics,” Proceedings of the National Academy of Sciences , 2015

work page 2015

[25] W. I. T. Uy and B. Peherstorfer, “Operator inference of non-Markovian terms for learning reduced models from partially observed state trajectories,” Journal of Scientific Computing, vol. 88, no. 3, pp. 1–31, 2021

[26] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, “A data–driven approximation of the Koopman operator: Extending dynamic mode decomposition,” Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015

[27] S. A. Renganathan, V. Rao, and I. M. Navon, “Camera: A method for cost-aware, adaptive, multifidelity, efficient reliability analysis,” arXiv preprint arXiv:2203.01436, 2022

[28] A. J. Chandy and S. H. Frankel, “The t-model as a large eddy simulation model for the Navier–Stokes equations,” Multiscale Modeling & Simulation, vol. 8, no. 2, pp. 445–462, 2010

[29] E. Constantinescu and M. Anitescu, “Physics-based covariance models for Gaussian processes with multiple outputs,” International Journal for Uncertainty Quantification, vol. 3, no. 1, pp. 47–71, 2013

[30] D. P. Bertsekas and S. E. Shreve, Stochastic optimal control: the discrete-time case. Athena Scientific, 1996, vol. 5

[31] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014

[32] A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos, “The explicit linear quadratic regulator for constrained systems,” Automatica, vol. 38, no. 1, pp. 3–20, 2002

[33] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, 1989

[34] A. Shapiro, “Asymptotic behavior of optimal solutions in stochastic programming,” Mathematics of Operations Research, vol. 18, no. 4, pp. 829–845, 1993

[35] A. Shapiro and T. Homem-de Mello, “On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs,” SIAM Journal on Optimization, vol. 11, no. 1, pp. 70–86, 2000

[36] R. I. Oliveira and P. Thompson, “Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints,” Mathematical Programming, pp. 1–48, 2022

[37] S. Na and M. Anitescu, “Exponential decay in the sensitivity analysis of nonlinear dynamic programming,” SIAM Journal on Optimization, vol. 30, no. 2, pp. 1527–1554, 2020

[38] S. Shin, M. Anitescu, and V. M. Zavala, “Exponential decay of sensitivity in graph-structured nonlinear programs,” SIAM Journal on Optimization, 2022

[39] Y. Lin, Y. Hu, G. Shi, H. Sun, G. Qu, and A. Wierman, “Perturbation-based regret analysis of predictive control in linear time varying systems,” Advances in Neural Information Processing Systems, vol. 34, 2021

[40] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017

[41] M. Innes, “Don’t unroll adjoint: Differentiating SSA-form programs,” CoRR, vol. abs/1810.07951, 2018

[42] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne et al., “JAX: composable transformations of Python+NumPy programs,” Version 0.2, vol. 5, pp. 14–24, 2018

[43] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” 2017

[44] S. Shin, C. Coffrin, K. Sundar, and V. M. Zavala, “Graph-based modeling and decomposition of energy infrastructures,” arXiv preprint arXiv:2010.02404, 2020

[45] J. Revels, M. Lubin, and T. Papamarkou, “Forward-mode automatic differentiation in Julia,” arXiv:1607.07892 [cs.MS], 2016

[46] D. Orban, A. S. Siqueira, and contributors, “NLPModels.jl: Data structures for optimization models,” https://github.com/JuliaSmoothOptimizers/NLPModels.jl, July 2020

[47] [Online]. Available: https://github.com/sshin23/con-pol-opt-code

[48] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model predictive control: theory, computation, and design. Nob Hill Publishing, Madison, WI, 2017, vol. 2

[49] A. Mesbah, “Stochastic model predictive control: An overview and perspectives for future research,” IEEE Control Systems Magazine, vol. 36, no. 6, pp. 30–44, 2016

[50] S. Lucia, S. Subramanian, D. Limon, and S. Engell, “Stability properties of multi-stage nonlinear model predictive control,” Systems & Control Letters, vol. 143, p. 104743, 2020

[51] D. Bernardini and A. Bemporad, “Scenario-based model predictive control of stochastic constrained linear systems,” in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference. IEEE, 2009, pp. 6333–6338

Government License: The submitted manuscript has been created by UChicago Argonne, L...
