MDP modeling for multi-stage stochastic programs

Bernardo K. Pagnoncelli; David P. Morton; Oscar Dowson

arXiv: 2509.22981 · v2 · submitted 2025-09-26 · cs.LG · math.OC

Pith reviewed 2026-05-18 12:47 UTC · model grok-4.3

keywords: multi-stage stochastic programming · Markov decision processes · policy graphs · decision-dependent uncertainty · stochastic dual dynamic programming · non-convex optimization · statistical learning

The pith

Multi-stage stochastic programs incorporate MDP features by extending policy graphs to handle decision-dependent uncertainty and limited statistical learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a modeling approach that merges multi-stage stochastic programs with Markov decision process ideas to handle more realistic sequential decisions. It extends policy graphs so transition probabilities can depend on the decisions taken and includes a basic way to incorporate statistical learning from data. Examples of growing complexity illustrate how this works for problems with continuous action and state spaces. New variants of stochastic dual dynamic programming are introduced to find approximate solutions even when the extensions create non-convex problems. A reader would care because this lets modelers capture cases where choices themselves change the probabilities of future events.

Core claim

By extending policy graphs to include decision-dependent uncertainty for one-step transition probabilities as well as a limited form of statistical learning, the approach allows multi-stage stochastic programs to represent structured MDPs with continuous action and state spaces. New variants of stochastic dual dynamic programming are developed as solution methods, with approximations introduced to address the non-convexities that arise.

What carries the argument

Extended policy graphs that encode decision-dependent one-step transition probabilities together with limited statistical learning, solved approximately by new variants of stochastic dual dynamic programming.

If this is right

Structured MDPs with continuous spaces become expressible as multi-stage stochastic programs.
Decision-dependent uncertainty can be modeled directly in the transition structure.
Limited statistical learning integrates into the policy graph without requiring full retraining at each stage.
Approximate solutions remain available through adapted stochastic dual dynamic programming despite non-convexities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This modeling style could improve sequential planning in domains such as inventory control or energy systems where actions alter risk distributions.
The limited learning component might be tested by comparing learned transitions against held-out data on real sequential decision traces.
Similar extensions could be explored for other forms of state-dependent or history-dependent uncertainty beyond one-step transitions.

Load-bearing premise

The new variants of stochastic dual dynamic programming can effectively approximate solutions for the non-convex models that arise from the extended policy graphs with decision-dependent uncertainty.

What would settle it

A concrete numerical test, on a problem with a known optimal policy, in which the proposed SDDP variants produce policies whose expected cost deviates by more than a small tolerance from the true optimum under decision-dependent transitions.
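A test of this kind can be sketched on a tiny tabular problem where the optimum is computable exactly. The sketch below is illustrative only: the two-state, two-action instance, the cost and transition numbers, and the function names are all hypothetical and do not come from the paper. Value iteration gives the true optimum, and any candidate policy (for example, one produced by an SDDP variant) is scored by its gap to that optimum.

```python
import itertools
import numpy as np

# Hypothetical toy instance (not from the paper): 2 states, 2 actions.
# TRANS[a, s, t] = P(next = t | state = s, action = a). Rows sum to 0.9,
# so the process terminates with probability 0.1 at every step.
COST = np.array([[1.0, 1.5], [3.0, 3.5]])  # COST[s, a]
TRANS = np.array([
    [[0.5, 0.4], [0.2, 0.7]],  # action 0
    [[0.8, 0.1], [0.6, 0.3]],  # action 1 shifts probability toward state 0
])

def value_iteration(cost, trans, tol=1e-10, max_iter=10_000):
    """Return the optimal expected cost-to-go and a greedy optimal policy."""
    v = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        q = cost + np.einsum("ast,t->sa", trans, v)
        v_new = q.min(axis=1)
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    q = cost + np.einsum("ast,t->sa", trans, v)
    return v, q.argmin(axis=1)

def evaluate_policy(cost, trans, policy):
    """Exact expected cost of a fixed policy: solve (I - P_pi) v = c_pi."""
    states = np.arange(cost.shape[0])
    p = trans[policy, states, :]   # row s is the transition row under policy[s]
    c = cost[states, policy]
    return np.linalg.solve(np.eye(len(states)) - p, c)

def optimality_gap(cost, trans, policy):
    """Per-state gap between a candidate policy and the true optimum."""
    v_star, _ = value_iteration(cost, trans)
    return evaluate_policy(cost, trans, np.asarray(policy)) - v_star

if __name__ == "__main__":
    for policy in itertools.product([0, 1], repeat=2):
        print(policy, optimality_gap(COST, TRANS, policy))
```

A policy passes the check when every entry of its gap is below the chosen tolerance. On this toy instance the exact enumeration is trivial; the point is only the shape of the test, not its scale.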

Figures

Figures reproduced from arXiv: 2509.22981 by Bernardo K. Pagnoncelli, David P. Morton, Oscar Dowson.

**Figure 1.** A schematic of a node in a policy graph. The incoming physical state is denoted by $x$ and an incoming realization by $\omega_i$. The decision rule $\pi_i(x, \omega_i)$ specifies the control, $u$, and outgoing physical state, $x'$. We incur a one-step cost $C_i(u, x', \omega_i)$, and transition according to one-step probabilities $\phi_{ij}$ (not shown) to a child node, or terminate with probability $1 - \sum_{j \in i^+} \phi_{ij}$.

**Figure 2.** The policy graph for Example 1. While the problem is formulated with continuous actions and physical state, in some problem variants we instead require integer-valued decisions, e.g., $u \in \mathbb{Z}_+$ at node 1 and $u \in \{0, 1, \ldots, \omega\}$ at node 2. Figure labels: root state $x_R = 0$; node 1 solves $\min_{u, x'} 2u$ s.t. $x' = x + u$, $u \ge 0$; node 2, with random variable $\omega$, solves $\min_{u, x'} -5u_s + 2u_b + 0.1x'$ s.t. $u_s \le x$, $x' = x - u_s + u_b$, $0 \le u_s \le \omega$, $u_b \ge 0$; transition probabilities $\phi_{R,1} = 1$, $\phi_{1,2} = 1$, $\phi_{2,2} = \rho$.
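To make the Figure 2 structure concrete, the following sketch simulates one trajectory through the graph: node 1 once, then node 2 repeatedly, returning to node 2 with probability ρ and terminating otherwise. The one-step costs and constraints are taken from the figure labels. Everything else is an assumption for illustration: the order-up-to rule is a simple heuristic and not the paper's optimal policy, and the function names and demand support are hypothetical.

```python
import random

def node1_step(x, u):
    """Node 1: buy u >= 0 units at unit cost 2; returns (cost, x')."""
    if u < 0:
        raise ValueError("u must be nonnegative")
    return 2.0 * u, x + u

def node2_step(x, u_s, u_b, omega):
    """Node 2: sell u_s (revenue 5 each), buy u_b (cost 2 each), hold x' at 0.1 each."""
    if not (0 <= u_s <= min(x, omega)) or u_b < 0:
        raise ValueError("infeasible decision")
    x_next = x - u_s + u_b
    return -5.0 * u_s + 2.0 * u_b + 0.1 * x_next, x_next

def simulate(order_up_to, rho, demand_support, rng):
    """Total cost of one trajectory under an assumed order-up-to heuristic."""
    total, x = node1_step(0.0, order_up_to)        # root state x_R = 0
    while True:
        omega = rng.choice(demand_support)         # demand is seen before deciding
        u_s = min(x, omega)                        # sell as much as allowed
        u_b = max(0.0, order_up_to - (x - u_s))    # restock to the target level
        cost, x = node2_step(x, u_s, u_b, omega)
        total += cost
        if rng.random() >= rho:                    # leave the loop w.p. 1 - rho
            return total

if __name__ == "__main__":
    rng = random.Random(0)
    costs = [simulate(4.0, 0.9, [1, 2, 3], rng) for _ in range(1000)]
    print(sum(costs) / len(costs))
```

Averaging many such trajectories gives a Monte Carlo estimate of the expected cost of the heuristic, which is the quantity a policy from any solver would be compared on.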

**Figure 3.** The policy graph for Example 2. The same comments from Example 1 hold regarding continuity of $u$. … is governed by a favorable random variable $\omega_s$, and when it's cloudy, the random demand is instead $\omega_c$. The subproblems are the same as in node 2 of Example 2, and the root state is again $x_R = 0$. The graph structure is shown in …

**Figure 4.** The policy graph structure of Figure …

**Figure 5.** The policy graph for Example 3. When the process exits the sunny node, it returns with probability $\phi_{s,s}$, transitions to the cloudy node with probability $\phi_{s,c}$, and the process terminates with probability $1 - \phi_{s,s} - \phi_{s,c}$. Analogous probabilistic transitions occur out of the cloudy market state. … more general model, $y$ could compete with $u$ for limited resources and the set $Y$ may differ between nodes. For the on…
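The termination structure in the Figure 5 caption has a simple consequence that can be checked numerically. When each row of the one-step transition matrix $\Phi$ sums to less than one, the expected number of visits to each node before termination is given by the matrix $(I - \Phi)^{-1}$, a standard result for absorbing Markov chains. The sketch below assumes hypothetical probabilities for the sunny and cloudy nodes; the numbers and the function name are not from the paper.

```python
import numpy as np

def expected_visits(phi):
    """Expected visits N[i, j] to node j before termination, starting at node i.

    phi[i][j] is the one-step probability of moving from node i to node j.
    Rows may sum to less than 1; the shortfall is the termination probability.
    """
    phi = np.asarray(phi, dtype=float)
    if np.any(phi < 0) or np.any(phi.sum(axis=1) > 1 + 1e-12):
        raise ValueError("rows must be nonnegative and sum to at most 1")
    return np.linalg.inv(np.eye(phi.shape[0]) - phi)

# Hypothetical values: index 0 = sunny, index 1 = cloudy.
# Each row sums to 0.9, so the process terminates w.p. 0.1 at every step.
PHI = [[0.5, 0.4],
       [0.2, 0.7]]

if __name__ == "__main__":
    visits = expected_visits(PHI)
    print(visits)               # per-node expected visit counts
    print(visits.sum(axis=1))   # expected trajectory length from each start
```

With a constant termination probability of 0.1, the expected trajectory length is 10 from either starting node, which is a quick sanity check on the matrix.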

**Figure 6.** The policy graph for Example 4. This example is a variant of the cyclic newsvendor problem in Examples 2 and 3, with random production amounts, along with a binary decision that dictates at which inventory levels we transition to market, i.e., transition to node 2. We can form equivalent models for Example 4 that do not use decision-dependent transition matrices. For example, we could have added $y$ as a sta…

**Figure 7.** The policy graph for Example 5. This variant of Example 4 uses a binary variable $y$ to determine inventory levels at which we should advertise to produce a favorable distribution of demand, $\omega_d^H$ versus $\omega_d^L$, at the market. Example 6 (Advertising). Consider a Markovian newsvendor problem from Example 3, with the nodes corresponding to high and low distributions of demand rather than sunny and cloudy condi…

**Figure 8.** The policy graph for Example 7. The agent is aware of whether the current node is sunny (yellow) or cloudy (gray), but is unaware of whether model $m = 1$ (left) or $m = 2$ (right) is correct. The one-step transition probabilities differ in the left and right halves of the graph, as illustrated by the $\Phi^1$ and $\Phi^2$ labels on arcs. While the supports in the two models are identical, their pmfs differ. The agent o…
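One natural way to reason about the two-model setting in the Figure 8 caption is a Bayesian update of the agent's belief over models after each observed transition. The sketch below shows that update. Whether the paper uses exactly this rule is an assumption, and the transition matrices, prior, and function name are hypothetical.

```python
def update_belief(belief, phi_models, i, j):
    """Posterior over models after observing a transition from node i to node j.

    belief[m] is the prior probability that model m is correct and
    phi_models[m][i][j] is the one-step probability of i -> j under model m.
    """
    weighted = [b * phi[i][j] for b, phi in zip(belief, phi_models)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("observed transition has zero probability under every model")
    return [w / total for w in weighted]

# Hypothetical matrices: index 0 = sunny, index 1 = cloudy.
# The models disagree only on transitions out of the sunny node.
PHI_1 = [[0.8, 0.1], [0.4, 0.5]]
PHI_2 = [[0.5, 0.4], [0.4, 0.5]]

if __name__ == "__main__":
    prior = [0.5, 0.5]
    print(update_belief(prior, [PHI_1, PHI_2], 0, 0))  # sunny -> sunny favors model 1
    print(update_belief(prior, [PHI_1, PHI_2], 1, 0))  # cloudy rows agree: no change
```

The second call illustrates a limit of this kind of learning: transitions on which the models agree carry no information, so the belief stays at the prior.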

**Figure 9.** The policy graph for Example 9. The two dashed nodes form an ambiguity set, and the two solid nodes form an ambiguity set. Figure labels: nodes $l$, $ll$, $lr$, $r$, $rl$, $rr$; random variables $\omega_l$, $\omega_r$; root transitions $\phi_{R,l} = 0.5$, $\phi_{R,r} = 0.5$; arc labels $y_l$, $y_r$, and $\rho \cdot (1 - y_l - y_r)$.

**Figure 10.** The tiger problem formulated as a decision-dependent learning model.

**Figure 11.** One simulation trajectory of the cheese producer policy over 50 time-steps. The red crosses …

**Figure 12.** Violin plots of the distribution of 1000 simulated objective values for Example …

**Figure 13.** One hundred simulations of the tiger policy with a false positive rate of 15%. The green circles …

**Figure 14.** Violin plots of the distribution of 100 simulated objective values for the tiger problem with varying …

Original abstract

We study a class of multi-stage stochastic programs, which incorporate modeling features from Markov decision processes (MDPs). This class includes structured MDPs with continuous action and state spaces. We extend policy graphs to include decision-dependent uncertainty for one-step transition probabilities as well as a limited form of statistical learning. We focus on the expressiveness of our modeling approach, illustrating ideas with a series of examples of increasing complexity. As a solution method, we develop new variants of stochastic dual dynamic programming, including approximations to handle non-convexities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper extends policy graphs to handle decision-dependent uncertainty and limited statistical learning in multi-stage stochastic programs, then proposes SDDP variants with non-convexity approximations.

The main thing here is that the authors extend policy graphs to let one-step transition probabilities depend on decisions, while folding in a limited form of statistical learning. They illustrate the modeling with examples of increasing complexity and develop new SDDP variants that include approximations for the non-convex models that result. That is the core contribution on the page. It builds directly on existing SDDP literature without claiming to reinvent the wheel.

The modeling side is where the work is strongest. The examples make a reasonable case that this setup can capture more adaptive uncertainty structures than standard multi-stage programs, especially for continuous state and action spaces. The connection to MDP ideas feels natural and the citation pattern looks appropriate for the subfield.

The softer part is the algorithmic claim. The abstract notes approximations to handle non-convexities but does not identify their source or describe the form they take, nor does it give error bounds or convergence arguments. If the non-convexities arise from bilinear terms or parameters inside the recourse functions, it is not clear that the usual SDDP cut machinery still produces valid underestimators. That gap matches the stress-test note and would need concrete schemes or numerical checks in the full text to carry the central claim.

This is aimed at researchers in stochastic optimization who already use policy graphs or SDDP and want to model decision-dependent uncertainty in applications like energy or supply chains. A reader focused on modeling expressiveness rather than a complete algorithmic package would find usable ideas.

I would send it to peer review. The modeling extension is substantive enough to merit referee time even if the approximation details require revision.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a modeling framework that integrates features from Markov decision processes into multi-stage stochastic programs, including structured MDPs with continuous action and state spaces. It extends policy graphs to incorporate decision-dependent uncertainty in one-step transition probabilities along with a limited form of statistical learning. The work emphasizes expressiveness through a series of illustrative examples of increasing complexity and develops new variants of stochastic dual dynamic programming that include approximations to handle resulting non-convexities.

Significance. If the modeling extensions and SDDP variants are shown to be effective with rigorous analysis, the paper could meaningfully increase the expressiveness of multi-stage stochastic programs by allowing decision-dependent uncertainties and learning elements, potentially connecting MDP and stochastic optimization literature. The emphasis on illustrative examples is a constructive element for demonstrating applicability. However, the absence of detailed derivations, identification of non-convexity sources, or convergence properties for the approximations reduces the assessed significance at present.

major comments (1)

The central claim that new SDDP variants can effectively approximate solutions for the non-convex models arising from extended policy graphs with decision-dependent transitions is load-bearing but unsupported by any specification of the non-convexity source (e.g., bilinear terms in kernels or learned parameters in recourse), the form of the approximation (e.g., outer approximation or sampling-based cuts), or convergence/bound-quality guarantees. This directly affects validation of the algorithmic contribution.

minor comments (1)

The abstract would benefit from a brief statement clarifying the precise scope and assumptions of the 'limited form of statistical learning' component to aid reader expectations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for their insightful comments on our manuscript. We provide a point-by-point response to the major comment below and will incorporate revisions to address the concerns raised.

Referee: The central claim that new SDDP variants can effectively approximate solutions for the non-convex models arising from extended policy graphs with decision-dependent transitions is load-bearing but unsupported by any specification of the non-convexity source (e.g., bilinear terms in kernels or learned parameters in recourse), the form of the approximation (e.g., outer approximation or sampling-based cuts), or convergence/bound-quality guarantees. This directly affects validation of the algorithmic contribution.

Authors: We thank the referee for highlighting this important point. The primary source of non-convexity arises from the decision-dependent one-step transition probabilities in the extended policy graphs, which introduce nonlinear dependencies (potentially bilinear when kernels depend linearly on decisions). Our SDDP variants employ sampling-based cut generation combined with outer linearizations to handle these during forward and backward passes. We acknowledge that the current manuscript does not include detailed derivations or formal convergence guarantees, as the emphasis is on modeling expressiveness through illustrative examples rather than full algorithmic analysis. We will revise the paper to add a dedicated subsection explicitly identifying the non-convexity sources, describing the approximation forms used, and discussing bound quality where applicable.

Revision: yes
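The bilinear structure mentioned in this exchange can be made concrete with a sketch of the node recursion. The symbols $C_i$, $\phi_{ij}$, and $i^+$ follow the Figure 1 caption. Writing the transition probability as $\phi_{ij}(y)$ for some decision $y$, and taking an expectation over the child realization $\omega_j$, are illustrative assumptions rather than the paper's exact formulation.

```latex
V_i(x, \omega_i) \;=\; \min_{u,\, x',\, y} \; C_i(u, x', \omega_i)
  \;+\; \sum_{j \in i^+} \phi_{ij}(y) \, \mathbb{E}_{\omega_j}\!\left[ V_j(x', \omega_j) \right]
```

Each term in the sum multiplies two quantities that both depend on decisions: the probability $\phi_{ij}(y)$ and the expected cost-to-go at $x'$. A product of this kind is generally not jointly convex even when each factor is well behaved on its own; the simplest instance, $(y, v) \mapsto y \cdot v$, is bilinear and has an indefinite Hessian. This is why cutting planes built for the convex case are not automatically valid lower bounds here.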

Circularity Check

0 steps flagged

No circularity: modeling extensions and algorithmic variants remain independent of fitted inputs or self-referential definitions

full rationale

The paper's core contribution is an extension of policy graphs to decision-dependent transition probabilities and limited statistical learning, illustrated via examples of increasing complexity, followed by development of new SDDP variants that include approximations for resulting non-convexities. No load-bearing step reduces a claimed prediction or result to a parameter fitted from the same data, a self-citation chain, or an ansatz smuggled in by definition. The derivation chain is self-contained against external benchmarks of SDDP and MDP modeling, with the abstract and described approach showing independent content rather than tautological renaming or construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard assumptions from stochastic programming and MDP theory without introducing new free parameters or invented entities based on the abstract description.

axioms (1)

domain assumption: Multi-stage stochastic programs can incorporate MDP features such as policy graphs while preserving solvability via extended SDDP methods.
The paper assumes this integration is valid for the class of problems with continuous spaces and decision-dependent uncertainty.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We develop new variants of stochastic dual dynamic programming, including approximations to handle non-convexities.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: Decision-dependent one-step transition probabilities introduce non-convexities.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Füllner, C., Rebennack, S.: Stochastic dual dynamic programming and its variants. SIAM Review67, 415–539 (2025)

work page 2025

[19] [19]

IISE Transactions55(6), 588–601 (2023)

Ghatrani, Z., Ghate, A.: Inverse Markov decision processes with unknown transition probabilities. IISE Transactions55(6), 588–601 (2023)

work page 2023

[20] [20]

Mathematics of Operations Research40(1), 130–145 (2015)

Girardeau, P., Leclère, V., Philpott, A.B.: On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research40(1), 130–145 (2015)

work page 2015

[21] [21]

Mathe- matical programming108(2), 355–394 (2006)

Goel, V., Grossmann, I.E.: A class of stochastic programs with decision dependent uncertainty. Mathe- matical programming108(2), 355–394 (2006)

work page 2006

[22] [22]

In: International Conference on Algorithmic Learning Theory, pp

Grünwald, P.: The safe Bayesian: learning the learning rate via the mixability gap. In: International Conference on Algorithmic Learning Theory, pp. 169–183. Springer (2012)

work page 2012

[23] [23]

SIAM Journal on Optimization26(4), 2468–2494 (2016)

Guigues, V.: Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM Journal on Optimization26(4), 2468–2494 (2016)

work page 2016

[24] [24]

Contextual Markov Decision Processes

Hallak, A., Di Castro, D., Mannor, S.: Contextual Markov decision processes. arXiv preprint arXiv:1502.02259 (2015) 25

work page internal anchor Pith review Pith/arXiv arXiv 2015

[25] [25]

Computational Management Science15, 369–395 (2018)

Hellemo, L., Barton, P.I., Tomasgard, A.: Decision-dependent probabilities in stochastic programs with recourse. Computational Management Science15, 369–395 (2018)

work page 2018

[26] [26]

MIT Press, Cambridge, MA (1960)

Howard, R.: Dynamic programming and Markov processes. MIT Press, Cambridge, MA (1960)

work page 1960

[27] [27]

Statistical Science pp

Ibrahim, J.G., Chen, M.H.: Power prior distributions for regression models. Statistical Science pp. 46–60 (2000)

work page 2000

[28] [28]

Artificial Intelligence101, 99–134 (1998)

Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence101, 99–134 (1998)

work page 1998

[29] [29]

Springer (2012)

King, A.J., Wallace, S.W.: Modeling with stochastic programming. Springer (2012)

work page 2012

[30] [30]

European Journal of Operational Research314(2), 792–806 (2024)

Lamas, P., Goycoolea, M., Pagnoncelli, B., Newman, A.: A target-time-windows technique for project scheduling under uncertainty. European Journal of Operational Research314(2), 792–806 (2024)

work page 2024

[31] [31]

Mathematical Programming198(1), 1059–1106 (2023)

Lan, G.: Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes. Mathematical Programming198(1), 1059–1106 (2023)

work page 2023

[32] [32]

European Journal of Operational Research271(3), 1037–1054 (2018)

Lara, C.L., Mallapragada, D.S., Papageorgiou, D.J., Venkatesh, A., Grossmann, I.E.: Deterministic electric power infrastructure planning: Mixed-integer programming model and nested decomposition algorithm. European Journal of Operational Research271(3), 1037–1054 (2018)

work page 2018

[33] [33]

Optimization and Engineering21, 1243–1281 (2020)

Lara, C.L., Siirola, J.D., Grossmann, I.E.: Electric power infrastructure planning under uncertainty: stochastic dual dynamic integer programming (SDDiP) and parallelization scheme. Optimization and Engineering21, 1243–1281 (2020)

work page 2020

[34] [34]

IEEE Access6, 49089–49102 (2018)

Le, T.P., Vien, N.A., Chung, T.: A deep hierarchical reinforcement learning algorithm in partially observable Markov decision processes. IEEE Access6, 49089–49102 (2018)

work page 2018

[35] [35]

In: Proc

Lejeune, M., Margot, F., de Oliveira, A.D.: Chance-constrained programming with decision-dependent uncertainty. In: Proc. Workshop New Directions Stochastic Optim. (2018)

work page 2018

[36] [36]

arXiv preprint arXiv:1908.06973 (2019)

Li, Y.: Reinforcement learning applications. arXiv preprint arXiv:1908.06973 (2019)

work page arXiv 1908

[37] [37]

Mathematical Programming Computation (2023)

Lubin, M., Dowson, O., Dias Garcia, J., Huchette, J., Legat, B., Vielma, J.P.: JuMP 1.0: Recent improvements to a modeling language for mathematical optimization. Mathematical Programming Computation (2023). DOI 10.1007/s12532-023-00239-3

work page doi:10.1007/s12532-023-00239-3 2023

[38] [38]

Mamon, R.S., Elliott, R.J.: Hidden Markov models in finance, vol. 4. Springer, New York, NY (2007)

work page 2007

[39] [39]

INFORMS Journal on Computing22(2), 266–281 (2010)

Maxwell, M.S., Restrepo, M., Henderson, S.G., Topaloglu, H.: Approximate dynamic programming for ambulance redeployment. INFORMS Journal on Computing22(2), 266–281 (2010)

work page 2010

[40] [40]

Journal of the Royal Statistical Society: Series B (Methodological)57(1), 99–118 (1995)

O’Hagan, A.: Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological)57(1), 99–118 (1995)

work page 1995

[41] [41]

Mathematical Programming52, 359–375 (1991)

Pereira, M.V.F., Pinto, L.M.V.G.: Multi-stage stochastic optimization applied to energy planning. Mathematical Programming52, 359–375 (1991)

work page 1991

[42] [42]

Operations Research Letters36, 450–455 (2008)

Philpott, A.B., Guan, Z.: On the convergence of sampling-based methods for multi-stage stochastic linear programs. Operations Research Letters36, 450–455 (2008)

work page 2008

[43] [43]

Wiley Series in Probability and Statistics

Powell, W.B.: Approximate dynamic programming: Solving the curses of dimensionality, 2nd ed edn. Wiley Series in Probability and Statistics. Wiley, Hoboken, N.J (2011)

work page 2011

[44] [44]

In: Bridging data and decisions, pp

Powell, W.B.: Clearing the jungle of stochastic optimization. In: Bridging data and decisions, pp. 109–137. INFORMS (2014)

work page 2014

[45] [45]

John Wiley & Sons, Hoboken, NJ (2014) 26

Puterman, M.L.: Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, Hoboken, NJ (2014) 26

work page 2014

[46] [46]

IEEE Transactions on Sustainable Energy13(1), 196–206 (2022)

Rosemberg, A., Street, A., Garcia, J.D., Silva, T., Valladão, D., Dowson, O.: Assessing the cost of network simplifications in hydrothermal power systems. IEEE Transactions on Sustainable Energy13(1), 196–206 (2022)

work page 2022

[47] [47]

Seranilla, B.K.: On the applications of stochastic dual dynamic programming. Ph.D. thesis, Université du Luxembourg, Luxembourg City, Luxembourg (2023)

work page 2023

[48] [48]

SIAM, Philadelphia, PA (2021)

Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on stochastic programming: Modeling and theory. SIAM, Philadelphia, PA (2021)

work page 2021

[49] [49]

In: Proceedings of the 2021 IISE Annual Conference (2021)

Siddig, M., Song, Y., Khademi, A.: Maximum-posterior evaluation for partially observable multistage stochastic programming. In: Proceedings of the 2021 IISE Annual Conference (2021)

work page 2021

[50] [50]

IISE Transactions 53(10), 1124–1139 (2021)

Steimle, L.N., Kaufman, D.L., Denton, B.T.: Multi-model Markov decision processes. IISE Transactions 53(10), 1124–1139 (2021)

work page 2021

[51] [51]

INFORMS Journal on Computing18(1), 31–42 (2006)

Topaloglu, H., Powell, W.B.: Dynamic-programming approximations for stochastic time-staged integer multicommodity-flow problems. INFORMS Journal on Computing18(1), 31–42 (2006)

work page 2006

[52] [52]

Advances in Neural Information Processing Systems34, 8795–8806 (2021)

Wang, K., Shah, S., Chen, H., Perrault, A., Doshi-Velez, F., Tambe, M.: Learning MDPs from features: Predict-then-optimize for sequential decision making by reinforcement learning. Advances in Neural Information Processing Systems34, 8795–8806 (2021)

work page 2021

[53] [53]

Mathematics of Operations Research38(1), 153–183 (2013)

Wiesemann, W., Kuhn, D., Rustem, B.: Robust Markov decision processes. Mathematics of Operations Research38(1), 153–183 (2013)

work page 2013

[54] [54]

IEEE Systems Journal17(2), 2247–2258 (2022)

Yin, W., Li, Y., Hou, J., Miao, M., Hou, Y.: Coordinated planning of wind power generation and energy storage with decision-dependent uncertainty induced by spatial correlation. IEEE Systems Journal17(2), 2247–2258 (2022)

work page 2022

[55] [55]

Mathematical Program- ming175(1-2), 461–502 (2019) 27

Zou, J., Ahmed, S., Sun, X.A.: Stochastic dual dynamic integer programming. Mathematical Program- ming175(1-2), 461–502 (2019) 27

work page 2019