
arxiv: 2406.20004 · v2 · pith:7MEDOLS5new · submitted 2024-06-28 · 🧮 math.OC

Residuals-Based Contextual Distributionally Robust Optimization with Decision-Dependent Uncertainty: Theoretical Guarantees and Decomposition Algorithm

Pith reviewed 2026-05-23 23:42 UTC · model grok-4.3

classification 🧮 math.OC
keywords distributionally robust optimization · decision-dependent uncertainty · contextual optimization · Wasserstein ambiguity set · Benders decomposition · regression residuals · asymptotic optimality · finite convergence

The pith

The residuals-based contextual DRO model with decision-dependent uncertainty satisfies asymptotic optimality, rate of convergence, and finite sample guarantees under specified conditions, and the Benders decomposition algorithm with nonline

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a distributionally robust optimization model in which uncertainty depends on both observed covariates and the decisions themselves. Parametric or nonparametric regression is used to learn this latent dependency, after which empirical residuals define a nominal distribution around which a Wasserstein ambiguity set is built. Under stated conditions the model delivers asymptotic optimality, convergence rates, and finite-sample guarantees. A specialized Benders decomposition algorithm equipped with nonlinear cuts is shown to reach an optimal solution in finitely many iterations. Numerical experiments illustrate the practical gains from explicitly modeling the decision dependency.
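
As a rough illustration of the construction described above, the sketch below fits an ordinary least-squares model of the uncertain quantity on covariates and decisions, collects the empirical residuals, and re-centers them at the prediction for a new (covariate, decision) pair. The linear model, the synthetic data, and the names `fit_ols`, `predict`, and `nominal_scenarios` are assumptions made for illustration only; the paper allows other parametric or nonparametric regressions.

```python
import numpy as np

def fit_ols(X, Z, Y):
    """Least-squares fit of Y on [1, X, Z]; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(Y)), X, Z])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict(coef, x, z):
    """Point prediction for one covariate vector x and one decision vector z."""
    features = np.concatenate([np.atleast_1d(x), np.atleast_1d(z)])
    return coef[0] + features @ coef[1:]

def nominal_scenarios(coef, X, Z, Y, x_new, z_new):
    """Equally weighted scenarios: prediction at (x_new, z_new) plus each training residual."""
    A = np.column_stack([np.ones(len(Y)), X, Z])
    residuals = Y - A @ coef
    return predict(coef, x_new, z_new) + residuals

# Synthetic data (hypothetical): the outcome depends on a covariate and on the decision.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
Z = rng.uniform(0.0, 1.0, size=(n, 1))
Y = 2.0 + 1.5 * X[:, 0] - 3.0 * Z[:, 0] + 0.1 * rng.normal(size=n)

coef = fit_ols(X, Z, Y)
scenarios = nominal_scenarios(coef, X, Z, Y, x_new=[0.5], z_new=[0.2])
```

Changing `z_new` shifts every scenario, which is the sense in which the nominal distribution is decision-dependent in this toy setting.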

Core claim

By learning decision-dependent uncertainty through regression and centering Wasserstein ambiguity sets on the resulting empirical residuals, the contextual DRO model attains asymptotic optimality, rates of convergence, and finite-sample guarantees. The resulting optimization problem is solved to optimality in a finite number of steps by a Benders decomposition algorithm that generates nonlinear cuts.

What carries the argument

The residuals-based Wasserstein ambiguity set whose nominal distribution depends on both covariates and decisions, solved via Benders decomposition with nonlinear cuts
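
One way to write this object down is sketched below. The symbols $\hat{P}^{ER}_n(x,z)$ and $d_{W,p}$ follow the notation visible in the paper's appendix excerpt; the radius $\epsilon_n$, the cost function $c$, the outcome space $\mathcal{Y}$, and the feasible set $\mathcal{Z}$ are placeholder names assumed here, and the paper's exact formulation may differ.

```latex
\hat{\mathcal{P}}_n(x, z)
  = \left\{ Q \in \mathcal{P}(\mathcal{Y}) :
      d_{W,p}\bigl(Q, \hat{P}^{ER}_n(x, z)\bigr) \le \epsilon_n \right\},
\qquad
\min_{z \in \mathcal{Z}} \;
  \sup_{Q \in \hat{\mathcal{P}}_n(x, z)}
  \mathbb{E}_{Y \sim Q}\bigl[ c(z, Y) \bigr].
```

Because the center $\hat{P}^{ER}_n(x,z)$ moves with $z$, the inner supremum is taken over a different ball for each candidate decision.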

If this is right

  • Asymptotic optimality holds for the optimal value and solutions as sample size tends to infinity under the model conditions.
  • Rates of convergence and finite-sample guarantees are obtained for the robust solutions.
  • The Benders decomposition algorithm with nonlinear cuts converges to an optimal solution in finitely many iterations.
  • Numerical experiments confirm that incorporating decision dependency improves performance over models that ignore it.
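
The finite-termination idea can be illustrated with a generic Kelley-style cutting-plane loop. This is a deliberately simplified stand-in, not the paper's algorithm: it uses linear subgradient cuts rather than nonlinear cuts, a one-dimensional convex function in place of the worst-case expected cost, and a finite candidate set. The function `cutting_plane` and the test function are hypothetical.

```python
def cutting_plane(value, slope, candidates, tol=1e-9, max_iter=100):
    """Minimize a convex function over a finite candidate set with cutting planes.

    value(z) returns the function value at z and slope(z) a subgradient there.
    Returns the best point found, its value, and the number of iterations used.
    """
    cuts = []                                  # each cut is (z_k, value_k, slope_k)
    z = candidates[0]
    best_z, upper = z, float("inf")
    for iteration in range(1, max_iter + 1):
        v, g = value(z), slope(z)
        if v < upper:                          # track the incumbent (upper bound)
            best_z, upper = z, v
        cuts.append((z, v, g))

        def model(c):
            # Lower approximation: pointwise maximum of all cuts collected so far.
            return max(vk + gk * (c - zk) for zk, vk, gk in cuts)

        z = min(candidates, key=model)         # master problem by enumeration
        lower = model(z)
        if upper - lower <= tol:               # bounds meet: incumbent is optimal
            return best_z, upper, iteration
    raise RuntimeError("no convergence within max_iter")

candidates = list(range(0, 11))
best_z, best_value, iterations = cutting_plane(
    value=lambda z: (z - 3.3) ** 2,
    slope=lambda z: 2.0 * (z - 3.3),
    candidates=candidates,
)
```

In this toy setting the loop stops after at most as many iterations as there are candidates: each cut is tight at the point that generated it, so revisiting a point closes the gap between the bounds.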

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The finite-convergence property suggests the method can be embedded in repeated or online optimization loops.
  • Replacing the Wasserstein metric with another divergence while preserving the regression step may retain similar statistical guarantees.
  • The framework offers a direct route to embed learned residuals from modern regression techniques into robust optimization.

Load-bearing premise

The regression models accurately capture the latent decision dependency in the uncertainty so that the empirical residuals form a valid nominal distribution around which the Wasserstein ambiguity set is constructed.

What would settle it

A dataset or counterexample in which the regression fails to capture the true decision dependency and the claimed asymptotic optimality or finite-sample guarantees do not hold, or the algorithm requires more than finitely many steps.

Figures

Figures reproduced from arXiv: 2406.20004 by Guzin Bayraksan, Qing Zhu, Xian Yu.

Figure 1: Out-of-sample cost comparison between ER-DD-SAA …
Figure 2: Out-of-sample cost comparison between ER-D …
Figure 3: Out-of-sample cost comparison between Algorithm …
Figure 4: Out-of-sample cost comparison of ER-D3RO-W between OLS, Lasso, and Ridge regression with different sample size n.
Abstract

We consider a residuals-based distributionally robust optimization (DRO) model, where the underlying uncertainty depends on both covariate information and our decisions. We adopt both parametric and nonparametric regression models to learn the latent decision dependency and construct a nominal distribution (thereby ambiguity sets) around the learned model using empirical residuals from the regressions. We formulate the ambiguity set via the Wasserstein distance, where the nominal distribution is both decision- and covariate-dependent. We provide conditions under which desired statistical properties such as asymptotic optimality, rate of convergence, and finite sample guarantees are satisfied. To solve the resulting DRO model, we develop a specialized Bender's decomposition algorithm with nonlinear cuts and prove its finite convergence. Through numerical experiments, we illustrate the effectiveness of our approach and the benefits of integrating decision dependency into a residuals-based DRO framework.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a residuals-based contextual DRO framework in which uncertainty depends on both covariates and decisions. Parametric or nonparametric regression is used to learn the decision-dependent map; empirical residuals then form a nominal distribution around which a Wasserstein ambiguity set is centered. Under stated conditions the authors claim asymptotic optimality, convergence rates, and finite-sample guarantees for the resulting DRO problem. They also develop a Benders decomposition algorithm with nonlinear cuts that is proved to converge in finitely many steps and illustrate the method numerically.

Significance. If the derivations are correct, the work supplies a principled way to incorporate decision-dependent uncertainty into DRO via regression residuals, together with non-asymptotic guarantees and an algorithm whose finite termination is a concrete algorithmic contribution. The explicit conditioning of all statistical claims on correct specification of the regression model is a strength that keeps the robustness interpretation well-defined.

major comments (1)
  1. [§4] §4 (statistical analysis): the finite-sample and asymptotic guarantees are stated to hold under 'conditions' that include correct specification of the regression model for the latent map from (covariates, decisions) to the conditional law of uncertainty. The manuscript should add an explicit remark (perhaps a dedicated paragraph or remark box) clarifying that, when this condition fails, the Wasserstein ball is centered at the wrong location and the DRO solution loses its out-of-sample robustness guarantee; this is load-bearing for the central claim.
minor comments (2)
  1. [Abstract] Abstract, line 8: 'Bender's decomposition' should read 'Benders decomposition'.
  2. [§3] Notation for the decision-dependent nominal distribution and the radius of the Wasserstein ball should be introduced once in §3 and used consistently thereafter to avoid re-definition in later sections.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and the constructive suggestion regarding the statistical guarantees. We address the single major comment below and will incorporate the requested clarification.

Point-by-point responses
  1. Referee: [§4] §4 (statistical analysis): the finite-sample and asymptotic guarantees are stated to hold under 'conditions' that include correct specification of the regression model for the latent map from (covariates, decisions) to the conditional law of uncertainty. The manuscript should add an explicit remark (perhaps a dedicated paragraph or remark box) clarifying that, when this condition fails, the Wasserstein ball is centered at the wrong location and the DRO solution loses its out-of-sample robustness guarantee; this is load-bearing for the central claim.

    Authors: We agree that an explicit clarification is warranted and will strengthen the manuscript. In the revised version we will add a new dedicated Remark 4.3 immediately after the statement of the main statistical results (following Theorem 4.2). The remark will read: 'All finite-sample and asymptotic guarantees in this section are derived under the maintained assumption of correct specification of the regression model. When this assumption is violated, the empirical residuals do not converge to the true conditional law of the uncertainty; consequently the Wasserstein ball is centered at an incorrect nominal distribution and the out-of-sample robustness interpretation of the DRO solution no longer holds.' This addition makes the dependence on correct specification fully transparent while leaving the existing proofs unchanged.

    revision: yes
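
The effect of misspecification on the center of the ambiguity set can be seen in a small synthetic experiment. The sketch below is an illustration on made-up data, not an experiment from the paper: the outcome is quadratic in a scalar decision, and a linear fit is compared with a quadratic fit. The helper `scenario_center` and both feature maps are hypothetical names.

```python
import numpy as np

def scenario_center(features, z_train, y_train, z_new):
    """Fit least squares on features(z); return the mean of the residual-based scenarios at z_new."""
    A = features(z_train)
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    residuals = y_train - A @ coef
    prediction = (features(np.array([z_new])) @ coef)[0]
    return float(np.mean(prediction + residuals))

def linear(z):
    """Misspecified feature map: intercept and z only."""
    return np.column_stack([np.ones_like(z), z])

def quadratic(z):
    """Correctly specified feature map: intercept, z, and z squared."""
    return np.column_stack([np.ones_like(z), z, z ** 2])

# Synthetic data (hypothetical): the true conditional mean of y given z is z**2.
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 2.0, size=500)
y = z ** 2 + 0.1 * rng.normal(size=500)

true_mean = 2.0 ** 2
center_linear = scenario_center(linear, z, y, 2.0)
center_quadratic = scenario_center(quadratic, z, y, 2.0)
```

Under the linear fit the scenarios at `z_new = 2.0` are centered well below the true conditional mean, so a Wasserstein ball of small radius around them would not contain the true conditional distribution; the quadratic fit does not show this bias.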

Circularity Check

0 steps flagged

No circularity; statistical guarantees derived from standard residual and Wasserstein properties under explicit assumptions

full rationale

The paper states conditions under which asymptotic optimality, convergence rates, and finite-sample guarantees hold for the residuals-based contextual DRO model. These conditions require the regression (parametric or nonparametric) to accurately recover the latent decision dependency so that empirical residuals form a valid nominal distribution; the Wasserstein ambiguity set and its properties then follow from established theory on residuals and optimal transport. The Benders decomposition with nonlinear cuts is shown to converge in finite steps via direct proof. No step reduces a claimed prediction or result to a fitted input by construction, no load-bearing self-citation chain appears, and the derivation remains self-contained against external benchmarks on Wasserstein DRO and regression residuals. The reader's circularity score of 2.0 is consistent with this assessment.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on regression models accurately learning decision dependency and standard properties of the Wasserstein metric for ambiguity sets; free parameters include the ambiguity radius and regression coefficients fitted from data.

free parameters (2)
  • Wasserstein ambiguity radius
    Controls the size of the ambiguity set around the nominal distribution and is typically chosen or tuned.
  • Regression model parameters
    Coefficients or hyperparameters of parametric/nonparametric regressions fitted to data to learn decision dependency.
axioms (2)
  • [standard math] The p-Wasserstein distance defines a valid metric on probability distributions with finite p-th moments
    Invoked to construct the ambiguity set around the decision- and covariate-dependent nominal distribution.
  • [domain assumption] Regression residuals provide a valid empirical basis for the nominal distribution
    Assumes the learned regression captures the latent dependency sufficiently for residuals to represent uncertainty.
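
To make the role of the radius concrete, the sketch below computes the p-Wasserstein distance between two equally weighted samples of the same size on the real line, where the optimal coupling simply matches sorted values. The function name `wasserstein_1d`, the sample values, and the radius are illustrative assumptions; the paper's setting may be multivariate, where no such closed form is available.

```python
import numpy as np

def wasserstein_1d(a, b, p=1):
    """p-Wasserstein distance between two equally weighted, equal-size samples on the real line."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same length")
    # In one dimension the optimal transport plan pairs the sorted values.
    return float(np.mean(np.abs(a - b) ** p) ** (1.0 / p))

residual_scenarios = np.array([-1.0, 0.0, 1.0])   # hypothetical nominal scenarios
shifted = residual_scenarios + 0.5                # candidate distribution, shifted by 0.5
radius = 0.6                                      # hypothetical ambiguity radius
inside_ball = wasserstein_1d(residual_scenarios, shifted) <= radius
```

A pure shift by 0.5 moves every unit of mass by 0.5, so the candidate lies inside a ball of radius 0.6 and would fall outside one of radius 0.4.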



