Residuals-Based Contextual Distributionally Robust Optimization with Decision-Dependent Uncertainty: Theoretical Guarantees and Decomposition Algorithm

Guzin Bayraksan; Qing Zhu; Xian Yu

arxiv: 2406.20004 · v2 · pith:7MEDOLS5new · submitted 2024-06-28 · 🧮 math.OC

Pith reviewed 2026-05-23 23:42 UTC · model grok-4.3

classification 🧮 math.OC

keywords: distributionally robust optimization · decision-dependent uncertainty · contextual optimization · Wasserstein ambiguity set · Benders decomposition · regression residuals · asymptotic optimality · finite convergence

The pith

The residuals-based contextual DRO model with decision-dependent uncertainty satisfies asymptotic optimality, rate of convergence, and finite sample guarantees under specified conditions, and the Benders decomposition algorithm with nonline

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a distributionally robust optimization model in which uncertainty depends on both observed covariates and the decisions themselves. Parametric or nonparametric regression is used to learn this latent dependency, after which empirical residuals define a nominal distribution around which a Wasserstein ambiguity set is built. Under stated conditions the model delivers asymptotic optimality, convergence rates, and finite-sample guarantees. A specialized Benders decomposition algorithm equipped with nonlinear cuts is shown to reach an optimal solution in finitely many iterations. Numerical experiments illustrate the practical gains from explicitly modeling the decision dependency.

Core claim

By learning decision-dependent uncertainty through regression and centering Wasserstein ambiguity sets on the resulting empirical residuals, the contextual DRO model attains asymptotic optimality, rates of convergence, and finite-sample guarantees. The resulting optimization problem is solved to optimality in a finite number of steps by a Benders decomposition algorithm that generates nonlinear cuts.
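The construction of the nominal distribution can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a scalar covariate `x`, a scalar decision `z`, an additive-error model, and ordinary least squares as the regression. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_ols(X, Z, Y):
    """Least-squares fit of Y on [1, X, Z]; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(Y)), X, Z])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return beta

def predict(beta, x, z):
    """Fitted regression value at covariate x and decision z."""
    return beta[0] + beta[1] * x + beta[2] * z

def residual_scenarios(X, Z, Y, x_new, z_new):
    """Atoms of the nominal distribution: prediction at (x_new, z_new)
    shifted by each in-sample residual."""
    beta = fit_ols(X, Z, Y)
    residuals = Y - predict(beta, X, Z)
    return predict(beta, x_new, z_new) + residuals

# Toy data (assumed): the outcome falls as the decision z rises.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, n)
Z = rng.uniform(0.0, 1.0, n)
Y = 10.0 + 3.0 * X - 4.0 * Z + rng.normal(0.0, 0.5, n)

low = residual_scenarios(X, Z, Y, x_new=0.5, z_new=0.2)
high = residual_scenarios(X, Z, Y, x_new=0.5, z_new=0.8)
```

Changing `z_new` shifts every atom of the nominal distribution by the same amount, which is how the decision enters the distribution in this simplified setting.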

What carries the argument

The residuals-based Wasserstein ambiguity set whose nominal distribution depends on both covariates and decisions, solved via Benders decomposition with nonlinear cuts
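Written out, the object described above might look as follows. This is a sketch under an assumed additive-error form; the symbols $\hat f_n$ (fitted regression), $\varepsilon_n$ (radius), $c$ (cost), and $\mathcal{Z}$ (feasible set) are notation chosen here for illustration, while $\hat P^{ER}_n(x,z)$ and $d_{W,p}$ follow the paper's notation.

```latex
\begin{align*}
\hat\varepsilon^{\,i}_n &= y^i - \hat f_n(x^i, z^i), \qquad i = 1, \dots, n, \\
\hat P^{ER}_n(x, z) &= \frac{1}{n} \sum_{i=1}^{n} \delta_{\hat f_n(x, z) + \hat\varepsilon^{\,i}_n}, \\
\hat{\mathcal{P}}_n(x, z) &= \bigl\{ Q : d_{W,p}\bigl(Q, \hat P^{ER}_n(x, z)\bigr) \le \varepsilon_n \bigr\}, \\
&\min_{z \in \mathcal{Z}} \; \sup_{Q \in \hat{\mathcal{P}}_n(x, z)} \mathbb{E}_{Y \sim Q}\bigl[ c(z, Y) \bigr].
\end{align*}
```

Because $z$ appears both in the cost and in the center of the ball, the inner supremum is taken over a set that moves with the decision.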

If this is right

Asymptotic optimality holds for the optimal value and solutions as sample size tends to infinity under the model conditions.
Rates of convergence and finite-sample guarantees are obtained for the robust solutions.
The Benders decomposition algorithm with nonlinear cuts converges to an optimal solution in finitely many iterations.
Numerical experiments confirm that incorporating decision dependency improves performance over models that ignore it.
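To give a feel for why a cut-generating scheme can stop after finitely many iterations, here is a generic Kelley-style cutting-plane loop over a finite candidate set. It is a swapped-in illustration with linear cuts on a one-dimensional convex function, not the paper's Benders algorithm with nonlinear cuts; the function `cutting_plane` and the test problem are hypothetical.

```python
import numpy as np

def cutting_plane(f, grad, candidates, tol=1e-9):
    """Minimize a convex f over a finite candidate set using linear cuts.

    Returns (best_z, best_value, iterations)."""
    candidates = np.asarray(candidates, dtype=float)
    cuts = []                          # each cut: (z_k, f(z_k), subgradient at z_k)
    z = candidates[0]
    best_z, upper = z, np.inf
    for it in range(1, len(candidates) + 2):
        fz, gz = f(z), grad(z)         # "subproblem": evaluate the cost and a cut
        if fz < upper:
            best_z, upper = z, fz
        cuts.append((z, fz, gz))
        # "Master problem": minimize the pointwise maximum of all cuts.
        model = np.max([fk + gk * (candidates - zk) for zk, fk, gk in cuts], axis=0)
        j = int(np.argmin(model))
        lower = model[j]
        if upper - lower <= tol:       # bounds meet: best_z is optimal on the grid
            return best_z, upper, it
        z = candidates[j]
    raise RuntimeError("cutting-plane loop did not converge")

# Hypothetical test problem: a convex quadratic on a grid of 31 decisions.
f = lambda z: (z - 1.3) ** 2 + 2.0
grad = lambda z: 2.0 * (z - 1.3)
grid = np.linspace(0.0, 3.0, 31)
z_star, value, iterations = cutting_plane(f, grad, grid)
```

Each cut is tight at the point that generated it, so if the master problem ever returns an already visited candidate the bounds coincide and the loop stops. With finitely many candidates this bounds the iteration count by the size of the candidate set plus one.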

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The finite-convergence property suggests the method can be embedded in repeated or online optimization loops.
Replacing the Wasserstein metric with another divergence while preserving the regression step may retain similar statistical guarantees.
The framework offers a direct route to embed learned residuals from modern regression techniques into robust optimization.

Load-bearing premise

The regression models accurately capture the latent decision dependency in the uncertainty so that the empirical residuals form a valid nominal distribution around which the Wasserstein ambiguity set is constructed.
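A small noiseless example illustrates what goes wrong when this premise fails. It is a sketch under assumed data: the true dependence on the decision `z` is quadratic, and the helper `ols_residuals` is hypothetical. A linear fit leaves residuals that still depend on `z`, while a correctly specified fit leaves none.

```python
import numpy as np

# Assumed toy setting: the outcome depends on the decision z quadratically, no noise.
z = np.linspace(-1.0, 1.0, 21)
y = z ** 2

def ols_residuals(features, y):
    """Residuals of a least-squares fit of y on an intercept plus the given features."""
    A = np.column_stack([np.ones(len(y))] + features)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

res_linear = ols_residuals([z], y)             # misspecified: linear in z
res_quadratic = ols_residuals([z, z ** 2], y)  # correctly specified

# Correlation between the misspecified residuals and the omitted term z^2.
bias_corr = np.corrcoef(res_linear, z ** 2)[0, 1]
```

In the misspecified case the residuals are a deterministic function of `z`, so re-centering them at a new decision places the nominal distribution in the wrong location, which is the failure mode the premise rules out.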

What would settle it

A dataset or counterexample in which the regression fails to capture the true decision dependency and the claimed asymptotic optimality or finite-sample guarantees do not hold, or the algorithm requires more than finitely many steps.

Figures

Figures reproduced from arXiv: 2406.20004 by Guzin Bayraksan, Qing Zhu, Xian Yu.

**Figure 2.** Out-of-sample cost comparison between ER-D…

**Figure 3.** Out-of-sample cost comparison between Algorithm…

**Figure 4.** Out-of-sample cost comparison of ER-D3RO-W between OLS, Lasso, and Ridge regression with different sample size n.

Original abstract

We consider a residuals-based distributionally robust optimization (DRO) model, where the underlying uncertainty depends on both covariate information and our decisions. We adopt both parametric and nonparametric regression models to learn the latent decision dependency and construct a nominal distribution (thereby ambiguity sets) around the learned model using empirical residuals from the regressions. We formulate the ambiguity set via the Wasserstein distance, where the nominal distribution is both decision- and covariate-dependent. We provide conditions under which desired statistical properties such as asymptotic optimality, rate of convergence, and finite sample guarantees are satisfied. To solve the resulting DRO model, we develop a specialized Bender's decomposition algorithm with nonlinear cuts and prove its finite convergence. Through numerical experiments, we illustrate the effectiveness of our approach and the benefits of integrating decision dependency into a residuals-based DRO framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper builds a residuals-based contextual DRO model with decision-dependent uncertainty via Wasserstein balls around regression residuals and proves finite convergence for its Benders algorithm, but the statistical guarantees require correct regression specification.

The core advance is extending residuals-based contextual DRO to cases where the uncertainty law depends on the decision variable itself. They learn that dependence with parametric or nonparametric regression, form a nominal distribution from the residuals, wrap it in a Wasserstein ambiguity set that is both covariate- and decision-dependent, and give conditions for asymptotic optimality, convergence rates, and finite-sample guarantees. They also supply a Benders decomposition with nonlinear cuts and prove it terminates in finite steps.

That combination of decision dependency, residuals construction, and the specialized algorithm looks new relative to prior contextual DRO work. The finite-convergence result for the algorithm is a concrete plus if the cuts are implemented correctly. The numerical experiments are said to illustrate the value of including decision dependency, which is the right place to check practical payoff.

The main limitation is that the statistical claims hold only when the chosen regressor fully recovers the latent map from covariates and decisions to the conditional distribution of uncertainty. Any misspecification leaves systematic bias in the residuals, so the true distribution need not lie inside the ambiguity set even asymptotically. The paper states the results under “conditions,” but those conditions amount to correct specification. That is a standard modeling assumption rather than a hidden flaw, yet it limits how far the robustness interpretation travels when the regression is imperfect.

This work is aimed at researchers in distributionally robust optimization and stochastic programming who deal with decision-dependent uncertainty in operations research settings. A reader already working on contextual or residuals-based DRO will get the most out of it.

The paper shows clear engagement with the relevant literature and supplies both theory and an implementable algorithm, so it deserves a serious referee even if revisions are needed on the assumption discussion and experiment details.

Referee Report

1 major / 2 minor

Summary. The paper introduces a residuals-based contextual DRO framework in which uncertainty depends on both covariates and decisions. Parametric or nonparametric regression is used to learn the decision-dependent map; empirical residuals then form a nominal distribution around which a Wasserstein ambiguity set is centered. Under stated conditions the authors claim asymptotic optimality, convergence rates, and finite-sample guarantees for the resulting DRO problem. They also develop a Benders decomposition algorithm with nonlinear cuts that is proved to converge in finitely many steps and illustrate the method numerically.

Significance. If the derivations are correct, the work supplies a principled way to incorporate decision-dependent uncertainty into DRO via regression residuals, together with non-asymptotic guarantees and an algorithm whose finite termination is a concrete algorithmic contribution. The explicit conditioning of all statistical claims on correct specification of the regression model is a strength that keeps the robustness interpretation well-defined.

major comments (1)

[§4] §4 (statistical analysis): the finite-sample and asymptotic guarantees are stated to hold under 'conditions' that include correct specification of the regression model for the latent map from (covariates, decisions) to the conditional law of uncertainty. The manuscript should add an explicit remark (perhaps a dedicated paragraph or remark box) clarifying that, when this condition fails, the Wasserstein ball is centered at the wrong location and the DRO solution loses its out-of-sample robustness guarantee; this is load-bearing for the central claim.

minor comments (2)

[Abstract] Abstract, line 8: 'Bender's decomposition' should read 'Benders decomposition'.
[§3] Notation for the decision-dependent nominal distribution and the radius of the Wasserstein ball should be introduced once in §3 and used consistently thereafter to avoid re-definition in later sections.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and the constructive suggestion regarding the statistical guarantees. We address the single major comment below and will incorporate the requested clarification.

Referee: [§4] §4 (statistical analysis): the finite-sample and asymptotic guarantees are stated to hold under 'conditions' that include correct specification of the regression model for the latent map from (covariates, decisions) to the conditional law of uncertainty. The manuscript should add an explicit remark (perhaps a dedicated paragraph or remark box) clarifying that, when this condition fails, the Wasserstein ball is centered at the wrong location and the DRO solution loses its out-of-sample robustness guarantee; this is load-bearing for the central claim.

Authors: We agree that an explicit clarification is warranted and will strengthen the manuscript. In the revised version we will add a new dedicated Remark 4.3 immediately after the statement of the main statistical results (following Theorem 4.2). The remark will read: 'All finite-sample and asymptotic guarantees in this section are derived under the maintained assumption of correct specification of the regression model. When this assumption is violated, the empirical residuals do not converge to the true conditional law of the uncertainty; consequently the Wasserstein ball is centered at an incorrect nominal distribution and the out-of-sample robustness interpretation of the DRO solution no longer holds.' This addition makes the dependence on correct specification fully transparent while leaving the existing proofs unchanged.

revision: yes

Circularity Check

0 steps flagged

No circularity; statistical guarantees derived from standard residual and Wasserstein properties under explicit assumptions

The paper states conditions under which asymptotic optimality, convergence rates, and finite-sample guarantees hold for the residuals-based contextual DRO model. These conditions require the regression (parametric or nonparametric) to accurately recover the latent decision dependency so that empirical residuals form a valid nominal distribution; the Wasserstein ambiguity set and its properties then follow from established theory on residuals and optimal transport. The Benders decomposition with nonlinear cuts is shown to converge in finite steps via direct proof. No step reduces a claimed prediction or result to a fitted input by construction, no load-bearing self-citation chain appears, and the derivation remains self-contained against external benchmarks on Wasserstein DRO and regression residuals. The reader's circularity score of 2.0 is consistent with this assessment.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on regression models accurately learning decision dependency and standard properties of the Wasserstein metric for ambiguity sets; free parameters include the ambiguity radius and regression coefficients fitted from data.

free parameters (2)

Wasserstein ambiguity radius
Controls the size of the ambiguity set around the nominal distribution and is typically chosen or tuned.
Regression model parameters
Coefficients or hyperparameters of parametric/nonparametric regressions fitted to data to learn decision dependency.
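The role of the radius can be made concrete with a standard closed form from the Wasserstein DRO literature, used here as an illustration rather than as the paper's formulation. Assuming a type-1 Wasserstein ball with the absolute-value ground metric, support on the whole real line, and a convex Lipschitz cost, the worst-case expected cost equals the nominal expected cost plus the radius times the Lipschitz constant. The newsvendor-style cost and its parameters `h` and `b` are hypothetical.

```python
import numpy as np

def newsvendor_cost(z, y, h=1.0, b=3.0):
    """Holding cost h per unit of excess order, backorder cost b per unit of shortfall."""
    return np.maximum(h * (z - y), b * (y - z))

def worst_case_cost(z, scenarios, radius, h=1.0, b=3.0):
    """Worst-case expected cost over a type-1 Wasserstein ball (assumed setting).

    The cost is convex and Lipschitz in y with constant max(h, b), so the
    supremum is the nominal mean plus radius * max(h, b)."""
    nominal = newsvendor_cost(z, scenarios, h, b).mean()
    return nominal + radius * max(h, b)

# Nominal atoms (assumed), for example prediction plus residuals at a given decision.
scenarios = np.array([8.0, 10.0, 12.0])
nominal_cost = worst_case_cost(10.0, scenarios, radius=0.0)
robust_cost = worst_case_cost(10.0, scenarios, radius=0.5)
```

A radius of zero recovers the plain sample-average cost, and each unit of radius adds `max(h, b)` to the objective, which is why the radius is usually tuned rather than fixed in advance.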

axioms (2)

standard math: Wasserstein distance defines a valid metric on probability distributions
Invoked to construct the ambiguity set around the decision- and covariate-dependent nominal distribution.
domain assumption: Regression residuals provide a valid empirical basis for the nominal distribution
Assumes the learned regression captures the latent dependency sufficiently for residuals to represent uncertainty.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We adopt both parametric and nonparametric regression models to learn the latent decision dependency and construct a nominal distribution (thereby ambiguity sets) around the learned model using empirical residuals from the regressions. We formulate the ambiguity set via the Wasserstein distance..."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We provide conditions under which desired statistical properties such as asymptotic optimality, rate of convergence, and finite sample guarantees are satisfied."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

78 extracted references · 78 canonical work pages · 1 internal anchor

[1]

Ban, G.-Y., Gallien, J., and Mersereau, A. J. (2019). Dyn amic procurement of new products with covariate information: The residual tree method. Manufacturing & Service Operations Management, 21(4):798–815

work page 2019
[2]

and Rudin, C

Ban, G.-Y. and Rudin, C. (2019). The big data newsvendor: Practical insights from machine learning. Operations Research, 67(1):90–108

work page 2019
[3]

Basciftci, B., Ahmed, S., and Shen, S. (2021). Distribut ionally robust facility location prob- lem under decision-dependent stochastic demand. European Journal of Operational Research , 292(2):548–561

work page 2021
[4]

and Love, D

Bayraksan, G. and Love, D. K. (2015). Data-driven stocha stic programming using phi- divergences. In The operations research revolution , pages 1–19. Informs

work page 2015
[5]

Ben-Tal, A., Den Hertog, D., De Waegenaere, A., Melenber g, B., and Rennen, G. (2013). Robust solutions of optimization problems aﬀected by uncert ain probabilities. Management Science, 59(2):341–357

work page 2013
[6]

and Dunn, J

Bertsimas, D. and Dunn, J. (2017). Optimal classiﬁcatio n trees. Machine Learning, 106:1039– 1082

work page 2017
[7]

and Kallus, N

Bertsimas, D. and Kallus, N. (2020). From predictive to p rescriptive analytics. Management Science, 66(3):1025–1044

work page 2020
[8]

and Koduri, N

Bertsimas, D. and Koduri, N. (2022). Data-driven optimi zation: A reproducing kernel hilbert space approach. Operations Research, 70(1):454–471

work page 2022
[9]

and McCord, C

Bertsimas, D. and McCord, C. (2019). From predictions to prescriptions in multistage opti- mization problems

work page 2019
[10]

Bertsimas, D., McCord, C., and Sturt, B. (2023a). Dynam ic optimization with side information. European Journal of Operational Research , 304(2):634–651. 35

work page
[11]

Bertsimas, D., Shtern, S., and Sturt, B. (2022). Two-st age sample robust optimization. Oper- ations Research, 70(1):624–640

work page 2022
[12]

Bertsimas, D., Shtern, S., and Sturt, B. (2023b). A data -driven approach to multistage stochas- tic linear optimization. Management Science , 69(1):51–74

work page
[13]

and Van Parys, B

Bertsimas, D. and Van Parys, B. (2022). Bootstrap robus t prescriptive analytics. Mathematical Programming, 195(1):39–78

work page 2022
[14]

and Devroye, L

Biau, G. and Devroye, L. (2015). Lectures on the nearest neighbor method , volume 246. Springer

work page 2015
[15]

Birge, J. R. and Louveaux, F. (2011). Introduction to Stochastic Programming . Springer Science & Business Media

work page 2011
[16]

Breiman, L. (2017). Classiﬁcation and Regression Trees. Routledge

work page 2017
[17]

B., and Wegkamp, M

Bunea, F., Tsybakov, A. B., and Wegkamp, M. H. (2007). Sp arsity oracle inequalities for the lasso. Electronic Journal of Statistics , 1:169–194

work page 2007
[18]

Chatterjee, S. (2014). Assumptionless consistency of the lasso

work page 2014
[19]

Chen, X., Sim, M., Simchi-Levi, D., and Sun, P. (2007). R isk aversion in inventory management. Operations Research, 55(5):828–842

work page 2007
[20]

and Simchi-Levi, D

Chen, X. and Simchi-Levi, D. (2004). Coordinating inve ntory control and pricing strategies with random demand and ﬁxed ordering cost: The ﬁnite horizon case. Operations Research, 52(6):887–896

work page 2004
[21]

and Ye, Y

Delage, E. and Ye, Y. (2010). Distributionally robust o ptimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612

work page 2010
[22]

and Anitescu, M

Dou, X. and Anitescu, M. (2019). Distributionally robu st optimization with correlated data from vector autoregressive processes. Operations Research Letters, 47(4):294–299

work page 2019
[23]

N., Grigas, P., and Tewa ri, A

El Balghiti, O., Elmachtoub, A. N., Grigas, P., and Tewa ri, A. (2019). Generalization bounds in the predict-then-optimize framework. Advances in Neural Information Processing Systems , 32

work page 2019
[24]

predict , then optimize

Elmachtoub, A. N. and Grigas, P. (2022). Smart “predict , then optimize”. Management Science, 68(1):9–26

work page 2022
[25]

Esfahani, P. M. and Kuhn, D. (2018). Data-driven distri butionally robust optimization using the Wasserstein metric: Performance guarantees and tracta ble reformulations. Mathematical Programming, 171(1):115–166. 36

work page 2018
[26]

and Morales, J

Esteban-P´ erez, A. and Morales, J. M. (2022). Distribu tionally robust stochastic programs with side information based on trimmings. Mathematical Programming, 195(1):1069–1105

work page 2022
[27]

Estes, A. S. and Richard, J.-P. P. (2023). Smart predict -then-optimize for two-stage linear programs with side information. INFORMS Journal on Optimization , 5(3):295–320

work page 2023
[28]

and Junca, M

Fonseca, D. and Junca, M. (2023). Decision-dependent d istributionally robust optimization. arXiv preprint arXiv:2303.03971

work page arXiv 2023
[29]

and Guillin, A

Fournier, N. and Guillin, A. (2015). On the rate of conve rgence in Wasserstein distance of the empirical measure. Probability theory and related ﬁelds , 162(3-4):707–738

work page 2015
[30]

Gao, R. (2023). Finite-sample guarantees for Wasserst ein distributionally robust optimization: Breaking the curse of dimensionality. Operations Research, 71(6):2291–2306

work page 2023
[31]

and Kleywegt, A

Gao, R. and Kleywegt, A. (2023). Distributionally robu st stochastic optimization with Wasser- stein distance. Mathematics of Operations Research , 48(2):603–655

work page 2023
[32]

and Grossmann, I

Goel, V. and Grossmann, I. E. (2006). A class of stochast ic programs with decision dependent uncertainty. Mathematical Programming, 108(2):355–394

work page 2006
[33]

Hanasusanto, G. A. and Kuhn, D. (2013). Robust data-dri ven dynamic programming. In Advances in Neural Information Processing Systems , pages 827–835

work page 2013
[34]

Hanasusanto, G. A. and Kuhn, D. (2018). Conic programmi ng reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Operations Research, 66(3):849– 869

work page 2018
[35]

H., and Friedm an, J

Hastie, T., Tibshirani, R., Friedman, J. H., and Friedm an, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Predicti on, volume 2. Springer

work page 2009
[36]

I., and Tomasgard, A

Hellemo, L., Barton, P. I., and Tomasgard, A. (2018). De cision-dependent probabilities in stochastic programs with recourse. Computational Management Science , 15(3):1619–6988

work page 2018
[37]

M., and Zhang, T

Hsu, D., Kakade, S. M., and Zhang, T. (2012). Random desi gn analysis of ridge regression. In Conference on Learning Theory , pages 9–1. JMLR Workshop and Conference Proceedings

work page 2012
[38]

and Guan, Y

Jiang, R. and Guan, Y. (2018). Risk-averse two-stage st ochastic program with distributional ambiguity. Operations Research, 66(5):1390–1405

work page 2018
[39]

Kallenberg, O. (1997). Foundations of Modern Probability , volume 2. Springer

work page 1997
[40]

and Mao, X

Kallus, N. and Mao, X. (2023). Stochastic optimization forests. Management Science , 69(4):1975–1994

work page 2023
[41]

Kannan, R., Bayraksan, G., and Luedtke, J. R. (2022). Da ta-driven sample average approxi- mation with covariate information. arXiv preprint arXiv:2207.13554 . 37

work page arXiv 2022
[42]

Kannan, R., Bayraksan, G., and Luedtke, J. R. (2023). Re siduals-based distributionally robust optimization with covariate information. Mathematical Programming, pages 1–57

work page 2023
[43]

M., Nguyen, V

Kuhn, D., Esfahani, P. M., Nguyen, V. A., and Shaﬁeezade h-Abadeh, S. (2019). Wasserstein distributionally robust optimization: Theory and applica tions in machine learning. In Tutorials in Operations Research: Operations Research & Management S cience in The Age of Analytics , pages 130–166. INFORMS

work page 2019
[44]

Lee, S., Homem-de Mello, T., and Kleywegt, A. J. (2012). Newsvendor-type models with decision-dependent uncertainty. Mathematical Methods of Operations Research , 76:189–221

work page 2012
[45]

Liu, J., Li, G., and Sen, S. (2022). Coupled learning ena bled stochastic programming with endogenous uncertainty. Mathematics of Operations Research , 47(2):1681–1705

work page 2022
[46]

and Zhang, Z

Liu, W. and Zhang, Z. (2023). Solving data-driven newsv endor pricing problems with decision- dependent eﬀect. arXiv preprint arXiv:2304.13924

work page arXiv 2023
[47]

and Mehrotra, S

Luo, F. and Mehrotra, S. (2020). Distributionally robu st optimization with decision dependent ambiguity sets. Optimization Letters , 14(8):2565–2594

work page 2020
[48]

McCormick, G. P. (1976). Computability of global solut ions to factorable nonconvex programs: Part i—convex underestimating problems. Mathematical Programming, 10(1):147–175

work page 1976
[49]

and Papp, D

Mehrotra, S. and Papp, D. (2014). A cutting surface algo rithm for semi-inﬁnite convex pro- gramming with an application to moment robust optimization . SIAM Journal on Optimization , 24(4):1670–1697

work page 2014
[50]

and Kuhn, D

Mohajerin Esfahani, P. and Kuhn, D. (2018). Data-drive n distributionally robust optimization using the Wasserstein metric: Performance guarantees and t ractable reformulations. Mathemat- ical Programming, 171(1-2):115–166

work page 2018
[51]

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications , 9(1):141–142

work page 1964
[52]

A., Zhang, F., Blanchet, J., Delage, E., and Y e, Y

Nguyen, V. A., Zhang, F., Blanchet, J., Delage, E., and Y e, Y. (2020). Distributionally robust local non-parametric conditional estimation. Advances in Neural Information Processing Systems, 33:15232–15242

work page 2020
[53]

and Sharma, K

Nohadani, O. and Sharma, K. (2018). Optimization under decision-dependent uncertainty. SIAM Journal on Optimization , 28(2):1773–1795

work page 2018
[54]

Noyan, N., Rudolf, G., and Lejeune, M. (2022). Distribu tionally robust optimization under a decision-dependent ambiguity set with applications to ma chine scheduling and humanitarian logistics. INFORMS Journal on Computing , 34(2):729–751

work page 2022
[55]

V., and Tak´ aˇ c, M

Oroojlooyjadid, A., Snyder, L. V., and Tak´ aˇ c, M. (202 0). Applying deep learning to the newsvendor problem. IISE Transactions, 52(4):444–463. 38

work page
[56]

Petruzzi, N. C. and Dada, M. (1999). Pricing and the news vendor problem: A review with extensions. Operations Research, 47(2):183–194

work page 1999
[57]

Poss, M. (2013). Robust combinatorial optimization wi th variable budgeted uncertainty. 4OR, 11:75–92

work page 2013
[58]

and Shen, Z.-J

Qi, M. and Shen, Z.-J. (2022). Integrating prediction/ estimation and optimization with appli- cations in operations management. In Tutorials in Operations Research: Emerging and Impactful Topics in Operations , pages 36–58. INFORMS

work page 2022
[59]

M., and Zheng, Z

Qi, M., Shen, Z.-J. M., and Zheng, Z. (2024). Learning ne wsvendor problem with intertemporal dependence and moderate non-stationarities. Production and Operations Management

work page 2024
[60]

and Mehrotra, S

Rahimian, H. and Mehrotra, S. (2022). Frameworks and re sults in distributionally robust optimization. Open Journal of Mathematical Optimization , 3:1–85

work page 2022
[61]

and H¨ utter, J.-C

Rigollet, P. and H¨ utter, J.-C. (2017). High dimension al statistics. Lecture Notes for MIT’s 18.657 Course. URL: http://www-math.mit.edu/ ~rigollet/PDFs/RigNotes17.pdf

work page 2017
[62]

T., Uryasev, S., et al

Rockafellar, R. T., Uryasev, S., et al. (2000). Optimiz ation of conditional value-at-risk. Journal of risk , 2:21–42

work page 2000
[63]

Sadana, U., Chenreddy, A., Delage, E., Forel, A., Freji nger, E., and Vidal, T. (2024). A survey of contextual optimization methods for decision-making un der uncertainty. European Journal of Operational Research, pages 1–19. Article in press

work page 2024
[64]

and Deng, Y

Sen, S. and Deng, Y. (2022). Predictive stochastic prog ramming. Computational Management Science, 19:1–45

work page 2022
[65]

Shapiro, A., Dentcheva, D., and Ruszczynski, A. (2021) . Lectures on Stochastic Programming: Modeling and Theory . SIAM

work page 2021
[66]

Trillos, N. G. and Slepˇ cev, D. (2015). On the rate of con vergence of empirical measures in ∞ -transportation distance. Canadian Journal of Mathematics , 67(6):1358–1383

work page 2015
[67]

Vayanos, P., Georghiou, A., and Yu, H. (2020). Robust op timization with decision-dependent information discovery. arXiv preprint arXiv:2004.08490

work page arXiv 2020
[68]

Vayanos, P., Kuhn, D., and Rustem, B. (2011). Decision r ules for information discovery in multi-stage stochastic programming. In 2011 50th IEEE Conference on Decision and Control and European Control Conference , pages 7368–7373. IEEE

work page 2011
[69]

Villani, C. et al. (2009). Optimal Transport: Old and New , volume 338. Springer

work page 2009
[70]

Watson, G. S. (1964). Smooth regression analysis. Sankhy¯ a: The Indian Journal of Statistics, Series A , pages 359–372. 39

work page 1964
[71]

Webster, M., Santen, N., and Parpas, P. (2012). An appro ximate dynamic programming frame- work for modeling global climate policy under decision-dep endent uncertainty. Computational Management Science , 9:339–362

work page 2012
[72]

White, H. (2014). Asymptotic theory for econometricians . Academic press

work page 2014
[73]

Xie, W. (2020). Tractable reformulations of two-stage distributionally robust linear programs over the type- ∞ Wasserstein ball. Operations Research Letters, 48(4):513–523

work page 2020
[74]

Yang, J., Zhang, L., Chen, N., Gao, R., and Hu, M. (2022). Decision-making with side information: A causal transport robust approach. Optimization Online. URL: https://optimization-online.org/?p=20639, pages=1–40

work page 2022
[75]

and Shen, S

Yu, X. and Shen, S. (2022). Multistage distributionall y robust mixed-integer programming with decision-dependent moment-based ambiguity sets. Mathematical Programming, 196(1):1025– 1064

work page 2022
[76]

Zhang, L., Yang, J., and Gao, R. (2023). Optimal robust p olicy for feature-based newsvendor. Management Science

work page 2023
[77]

Zhang, Y., Jiang, R., and Shen, S. (2018). Ambiguous cha nce-constrained binary programs under mean-covariance information. SIAM Journal on Optimization , 28(4):2922–2944. A Omitted proofs A.1 Proof of Proposition 1 Proof. By triangle inequality, we have dW, p ( ˆP ER n (x, z ), P Y |X=x,Z =z) ≤ dW, p ( ˆP ER n (x, z ), P ∗ n (x, z )) + dW, p (P ∗ n (x, z...

work page 2018
[78]

Residuals-Based Contextual Distributionally Robust Optimization with Decision-Dependent Uncertainty: Theoretical Guarantees and Decomposition Algorithm

satisﬁes 2L1(z∗(x))κ(1) p,n (α, x ) ≤ κ 2 . Furthermore, by Eq. ( 15), we know for a.e. x ∈ X , if α ≥ c1(exp(− c2n( κ 4L1(z∗(x)) )1/s ) with s = min {p/d y, 1/ 2} or p/a , then we have 2 L1(z∗(x))κ(2) p,n (α ) ≤ κ 2 . Therefore, there exist positive constants ˜Ω1(κ, x ), ˜ω 1(κ, x ) s.t. the solution of the ER-D 3RO problem ( 6) with risk level α = ˜Ω1(κ...

work page internal anchor Pith review Pith/arXiv arXiv

[1] [1]

Ban, G.-Y., Gallien, J., and Mersereau, A. J. (2019). Dyn amic procurement of new products with covariate information: The residual tree method. Manufacturing & Service Operations Management, 21(4):798–815

work page 2019

[2] [2]

and Rudin, C

Ban, G.-Y. and Rudin, C. (2019). The big data newsvendor: Practical insights from machine learning. Operations Research, 67(1):90–108

work page 2019

[3] [3]

Basciftci, B., Ahmed, S., and Shen, S. (2021). Distribut ionally robust facility location prob- lem under decision-dependent stochastic demand. European Journal of Operational Research , 292(2):548–561

work page 2021

[4] Bayraksan, G. and Love, D. K. (2015). Data-driven stochastic programming using phi-divergences. In The Operations Research Revolution, pages 1–19. INFORMS.

[5] Ben-Tal, A., Den Hertog, D., De Waegenaere, A., Melenberg, B., and Rennen, G. (2013). Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357.

[6] Bertsimas, D. and Dunn, J. (2017). Optimal classification trees. Machine Learning, 106:1039–1082.

[7] Bertsimas, D. and Kallus, N. (2020). From predictive to prescriptive analytics. Management Science, 66(3):1025–1044.

[8] Bertsimas, D. and Koduri, N. (2022). Data-driven optimization: A reproducing kernel Hilbert space approach. Operations Research, 70(1):454–471.

[9] Bertsimas, D. and McCord, C. (2019). From predictions to prescriptions in multistage optimization problems.

[10] Bertsimas, D., McCord, C., and Sturt, B. (2023a). Dynamic optimization with side information. European Journal of Operational Research, 304(2):634–651.

[11] Bertsimas, D., Shtern, S., and Sturt, B. (2022). Two-stage sample robust optimization. Operations Research, 70(1):624–640.

[12] Bertsimas, D., Shtern, S., and Sturt, B. (2023b). A data-driven approach to multistage stochastic linear optimization. Management Science, 69(1):51–74.

[13] Bertsimas, D. and Van Parys, B. (2022). Bootstrap robust prescriptive analytics. Mathematical Programming, 195(1):39–78.

[14] Biau, G. and Devroye, L. (2015). Lectures on the Nearest Neighbor Method, volume 246. Springer.

[15] Birge, J. R. and Louveaux, F. (2011). Introduction to Stochastic Programming. Springer Science & Business Media.

[16] Breiman, L. (2017). Classification and Regression Trees. Routledge.

[17] Bunea, F., Tsybakov, A. B., and Wegkamp, M. H. (2007). Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1:169–194.

[18] Chatterjee, S. (2014). Assumptionless consistency of the lasso.

[19] Chen, X., Sim, M., Simchi-Levi, D., and Sun, P. (2007). Risk aversion in inventory management. Operations Research, 55(5):828–842.

[20] Chen, X. and Simchi-Levi, D. (2004). Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: The finite horizon case. Operations Research, 52(6):887–896.

[21] Delage, E. and Ye, Y. (2010). Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research, 58(3):595–612.

[22] Dou, X. and Anitescu, M. (2019). Distributionally robust optimization with correlated data from vector autoregressive processes. Operations Research Letters, 47(4):294–299.

[23] El Balghiti, O., Elmachtoub, A. N., Grigas, P., and Tewari, A. (2019). Generalization bounds in the predict-then-optimize framework. Advances in Neural Information Processing Systems, 32.

[24] Elmachtoub, A. N. and Grigas, P. (2022). Smart “predict, then optimize”. Management Science, 68(1):9–26.

[25] Esfahani, P. M. and Kuhn, D. (2018). Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1):115–166.

[26] Esteban-Pérez, A. and Morales, J. M. (2022). Distributionally robust stochastic programs with side information based on trimmings. Mathematical Programming, 195(1):1069–1105.

[27] Estes, A. S. and Richard, J.-P. P. (2023). Smart predict-then-optimize for two-stage linear programs with side information. INFORMS Journal on Optimization, 5(3):295–320.

[28] Fonseca, D. and Junca, M. (2023). Decision-dependent distributionally robust optimization. arXiv preprint arXiv:2303.03971.

[29] Fournier, N. and Guillin, A. (2015). On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707–738.

[30] Gao, R. (2023). Finite-sample guarantees for Wasserstein distributionally robust optimization: Breaking the curse of dimensionality. Operations Research, 71(6):2291–2306.

[31] Gao, R. and Kleywegt, A. (2023). Distributionally robust stochastic optimization with Wasserstein distance. Mathematics of Operations Research, 48(2):603–655.

[32] Goel, V. and Grossmann, I. E. (2006). A class of stochastic programs with decision dependent uncertainty. Mathematical Programming, 108(2):355–394.

[33] Hanasusanto, G. A. and Kuhn, D. (2013). Robust data-driven dynamic programming. In Advances in Neural Information Processing Systems, pages 827–835.

[34] Hanasusanto, G. A. and Kuhn, D. (2018). Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Operations Research, 66(3):849–869.

[35] Hastie, T., Tibshirani, R., and Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, volume 2. Springer.

[36] Hellemo, L., Barton, P. I., and Tomasgard, A. (2018). Decision-dependent probabilities in stochastic programs with recourse. Computational Management Science, 15(3):1619–6988.

[37] Hsu, D., Kakade, S. M., and Zhang, T. (2012). Random design analysis of ridge regression. In Conference on Learning Theory, pages 9–1. JMLR Workshop and Conference Proceedings.

[38] Jiang, R. and Guan, Y. (2018). Risk-averse two-stage stochastic program with distributional ambiguity. Operations Research, 66(5):1390–1405.

[39] Kallenberg, O. (1997). Foundations of Modern Probability, volume 2. Springer.

[40] Kallus, N. and Mao, X. (2023). Stochastic optimization forests. Management Science, 69(4):1975–1994.

[41] Kannan, R., Bayraksan, G., and Luedtke, J. R. (2022). Data-driven sample average approximation with covariate information. arXiv preprint arXiv:2207.13554.

[42] Kannan, R., Bayraksan, G., and Luedtke, J. R. (2023). Residuals-based distributionally robust optimization with covariate information. Mathematical Programming, pages 1–57.

[43] Kuhn, D., Esfahani, P. M., Nguyen, V. A., and Shafieezadeh-Abadeh, S. (2019). Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Tutorials in Operations Research: Operations Research & Management Science in the Age of Analytics, pages 130–166. INFORMS.

[44] Lee, S., Homem-de-Mello, T., and Kleywegt, A. J. (2012). Newsvendor-type models with decision-dependent uncertainty. Mathematical Methods of Operations Research, 76:189–221.

[45] Liu, J., Li, G., and Sen, S. (2022). Coupled learning enabled stochastic programming with endogenous uncertainty. Mathematics of Operations Research, 47(2):1681–1705.

[46] Liu, W. and Zhang, Z. (2023). Solving data-driven newsvendor pricing problems with decision-dependent effect. arXiv preprint arXiv:2304.13924.

[47] Luo, F. and Mehrotra, S. (2020). Distributionally robust optimization with decision dependent ambiguity sets. Optimization Letters, 14(8):2565–2594.

[48] McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part I - Convex underestimating problems. Mathematical Programming, 10(1):147–175.

[49] Mehrotra, S. and Papp, D. (2014). A cutting surface algorithm for semi-infinite convex programming with an application to moment robust optimization. SIAM Journal on Optimization, 24(4):1670–1697.

[50] Mohajerin Esfahani, P. and Kuhn, D. (2018). Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166.

[51] Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications, 9(1):141–142.

[52] Nguyen, V. A., Zhang, F., Blanchet, J., Delage, E., and Ye, Y. (2020). Distributionally robust local non-parametric conditional estimation. Advances in Neural Information Processing Systems, 33:15232–15242.

[53] Nohadani, O. and Sharma, K. (2018). Optimization under decision-dependent uncertainty. SIAM Journal on Optimization, 28(2):1773–1795.

[54] Noyan, N., Rudolf, G., and Lejeune, M. (2022). Distributionally robust optimization under a decision-dependent ambiguity set with applications to machine scheduling and humanitarian logistics. INFORMS Journal on Computing, 34(2):729–751.

[55] Oroojlooyjadid, A., Snyder, L. V., and Takáč, M. (2020). Applying deep learning to the newsvendor problem. IISE Transactions, 52(4):444–463.

[56] Petruzzi, N. C. and Dada, M. (1999). Pricing and the newsvendor problem: A review with extensions. Operations Research, 47(2):183–194.

[57] Poss, M. (2013). Robust combinatorial optimization with variable budgeted uncertainty. 4OR, 11:75–92.

[58] Qi, M. and Shen, Z.-J. (2022). Integrating prediction/estimation and optimization with applications in operations management. In Tutorials in Operations Research: Emerging and Impactful Topics in Operations, pages 36–58. INFORMS.

[59] Qi, M., Shen, Z.-J. M., and Zheng, Z. (2024). Learning newsvendor problem with intertemporal dependence and moderate non-stationarities. Production and Operations Management.

[60] Rahimian, H. and Mehrotra, S. (2022). Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization, 3:1–85.

[61] Rigollet, P. and Hütter, J.-C. (2017). High dimensional statistics. Lecture Notes for MIT’s 18.657 Course. URL: http://www-math.mit.edu/~rigollet/PDFs/RigNotes17.pdf

[62] Rockafellar, R. T., Uryasev, S., et al. (2000). Optimization of conditional value-at-risk. Journal of Risk, 2:21–42.

[63] Sadana, U., Chenreddy, A., Delage, E., Forel, A., Frejinger, E., and Vidal, T. (2024). A survey of contextual optimization methods for decision-making under uncertainty. European Journal of Operational Research, pages 1–19. Article in press.

[64] Sen, S. and Deng, Y. (2022). Predictive stochastic programming. Computational Management Science, 19:1–45.

[65] Shapiro, A., Dentcheva, D., and Ruszczynski, A. (2021). Lectures on Stochastic Programming: Modeling and Theory. SIAM.

[66] Trillos, N. G. and Slepčev, D. (2015). On the rate of convergence of empirical measures in ∞-transportation distance. Canadian Journal of Mathematics, 67(6):1358–1383.

[67] Vayanos, P., Georghiou, A., and Yu, H. (2020). Robust optimization with decision-dependent information discovery. arXiv preprint arXiv:2004.08490.

[68] Vayanos, P., Kuhn, D., and Rustem, B. (2011). Decision rules for information discovery in multi-stage stochastic programming. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 7368–7373. IEEE.

[69] Villani, C. et al. (2009). Optimal Transport: Old and New, volume 338. Springer.

[70] Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, pages 359–372.

[71] Webster, M., Santen, N., and Parpas, P. (2012). An approximate dynamic programming framework for modeling global climate policy under decision-dependent uncertainty. Computational Management Science, 9:339–362.

[72] White, H. (2014). Asymptotic Theory for Econometricians. Academic Press.

[73] Xie, W. (2020). Tractable reformulations of two-stage distributionally robust linear programs over the type-∞ Wasserstein ball. Operations Research Letters, 48(4):513–523.

[74] Yang, J., Zhang, L., Chen, N., Gao, R., and Hu, M. (2022). Decision-making with side information: A causal transport robust approach. Optimization Online, pages 1–40. URL: https://optimization-online.org/?p=20639

[75] Yu, X. and Shen, S. (2022). Multistage distributionally robust mixed-integer programming with decision-dependent moment-based ambiguity sets. Mathematical Programming, 196(1):1025–1064.

[76] Zhang, L., Yang, J., and Gao, R. (2023). Optimal robust policy for feature-based newsvendor. Management Science.

[77] Zhang, Y., Jiang, R., and Shen, S. (2018). Ambiguous chance-constrained binary programs under mean-covariance information. SIAM Journal on Optimization, 28(4):2922–2944.

A Omitted proofs

A.1 Proof of Proposition 1

Proof. By the triangle inequality, we have $d_{W,p}(\hat{P}^{ER}_n(x,z), P_{Y|X=x,Z=z}) \le d_{W,p}(\hat{P}^{ER}_n(x,z), P^*_n(x,z)) + d_{W,p}(P^*_n(x, z$...

... satisfies $2L_1(z^*(x))\,\kappa^{(1)}_{p,n}(\alpha, x) \le \kappa/2$. Furthermore, by Eq. (15), we know that for a.e. $x \in \mathcal{X}$, if $\alpha \ge c_1 \exp\big(-c_2 n \big(\kappa / (4L_1(z^*(x)))\big)^{1/s}\big)$ with $s = \min\{p/d_y, 1/2\}$ or $p/a$, then we have $2L_1(z^*(x))\,\kappa^{(2)}_{p,n}(\alpha) \le \kappa/2$. Therefore, there exist positive constants $\tilde{\Omega}_1(\kappa, x)$, $\tilde{\omega}_1(\kappa, x)$ such that the solution of the ER-D$^3$RO problem (6) with risk level $\alpha = \tilde{\Omega}_1(\kappa$...