pith.

arxiv: 2604.20147 · v1 · submitted 2026-04-22 · 🧮 math.OC · cs.LG

Robust Out-of-Distribution Stochastic Optimization

Pith reviewed 2026-05-10 00:30 UTC · model grok-4.3

classification 🧮 math.OC cs.LG
keywords: out-of-distribution generalization · robust stochastic optimization · uncertainty set · reproducing kernel Hilbert space · meta-distribution · min-max optimization · data-driven decision making · newsvendor problem

The pith

Assuming distributions are drawn from a meta-distribution allows construction of a data-driven uncertainty set in RKHS that delivers rigorous out-of-distribution generalization bounds for robust stochastic decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses decision-making when no historical data exists from the actual target distribution. It assumes instead that all available distributions are randomly sampled from an unknown meta-distribution over distributions. From this assumption the authors build a conservative uncertainty set inside a reproducing kernel Hilbert space that encloses plausible future distributions with high probability. This set is then inserted into a min-max stochastic program whose solution inherits explicit generalization guarantees. Numerical tests on newsvendor and portfolio problems show the resulting decisions outperform standard approaches on held-out distributions even when only modest numbers of source distributions are available.

Core claim

Under the randomness assumption on distribution generation, the framework learns a data-driven uncertainty set in RKHS whose radius can be tuned for adjustable conservatism; the corresponding min-max stochastic program then produces decisions whose out-of-distribution performance is bounded by explicit generalization inequalities that hold simultaneously for the uncertainty set itself and for the obtained solution.
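As a reading aid only, the min-max program can be written schematically as below. The notation is assumed here rather than taken from the paper: x is the decision, ξ the random outcome, f the cost, μ_P the kernel mean embedding of a distribution P in the RKHS H_k, c a learned center (for example, a weighted combination of the source embeddings), and R the radius that sets the level of conservatism.

```latex
\min_{x \in \mathcal{X}} \;
\sup_{P \,:\, \|\mu_P - c\|_{\mathcal{H}_k} \le R} \;
\mathbb{E}_{\xi \sim P}\big[ f(x, \xi) \big],
\qquad
\mu_P = \mathbb{E}_{\xi \sim P}\big[ k(\xi, \cdot) \big] \in \mathcal{H}_k .
```

Enlarging R admits more distributions into the inner supremum, so the optimal value can only increase; this is the robustness versus average-case trade-off that the conservatism parameter controls.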

What carries the argument

The data-driven uncertainty set constructed in a reproducing kernel Hilbert space from relevant source distributions, embedded inside a min-max stochastic program.
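A minimal sketch of the RKHS geometry behind such a set, under stated assumptions: each source distribution is represented by its empirical kernel mean embedding under a Gaussian kernel, and the set is a ball around a weighted combination of those embeddings. The uniform weights, the bandwidth, the sample data, and the "largest source distance" radius are illustrative stand-ins, not the paper's learned center and radius. The only quantities needed are inner products between embeddings, so every distance reduces to Gram-matrix arithmetic.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalar outcomes; the bandwidth is an illustrative choice."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def embedding_inner(xs, ys, bandwidth=1.0):
    """<mu_P, mu_Q> for empirical P, Q: the average kernel value over all sample pairs."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def gram_matrix(samples, bandwidth=1.0):
    """K[i][j] = <mu_i, mu_j> between the empirical embeddings of the source distributions."""
    m = len(samples)
    return [[embedding_inner(samples[i], samples[j], bandwidth) for j in range(m)]
            for i in range(m)]


def squared_distances_to_center(K, alpha):
    """||mu_i - sum_l alpha_l mu_l||^2 = K_ii - 2 sum_l alpha_l K_li + alpha^T K alpha, per source i."""
    m = len(K)
    quad = sum(alpha[l] * K[l][r] * alpha[r] for l in range(m) for r in range(m))
    return [K[i][i] - 2.0 * sum(alpha[l] * K[l][i] for l in range(m)) + quad
            for i in range(m)]


# Hypothetical samples, one list per source distribution.
sources = [[0.1, 0.4, -0.2], [1.0, 1.3, 0.8], [0.5, 0.2, 0.9]]
K = gram_matrix(sources)
alpha = [1.0 / len(sources)] * len(sources)  # uniform weights: the plain centroid (assumed)
d2 = squared_distances_to_center(K, alpha)
radius = math.sqrt(max(d2))  # smallest ball around this centroid enclosing every source
```

Replacing the uniform weights with fitted ones, or the maximum with a quantile of the distances, changes where the ball sits and how conservative it is; which rule the paper uses is not shown in this summary.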

If this is right

  • Robust decisions become feasible even when zero samples from the target distribution are ever observed.
  • Both the learned uncertainty set and the resulting decision enjoy explicit finite-sample out-of-distribution bounds that scale with the number of source distributions.
  • The conservatism parameter in the RKHS uncertainty set directly trades off robustness against average-case performance.
  • An approximate finite-dimensional parametrization with provable suboptimality gap reduces the infinite-dimensional problem to a tractable row-generation algorithm.
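The row-generation idea in the last bullet can be illustrated on a toy problem. This is a hedged sketch, not the paper's algorithm: the uncertainty set is replaced by a short list of hypothetical candidate demand samples, the master problem is solved by grid search, and the newsvendor overage and underage costs h and b are illustrative values. Each round solves the problem against the currently active candidates, asks an oracle for the most adverse candidate at that decision, and adds it as a new row until none is violated.

```python
def expected_cost(q, demands, h=1.0, b=3.0):
    """Empirical newsvendor cost: overage h per unit left over, underage b per unit short."""
    return sum(h * max(q - d, 0.0) + b * max(d - q, 0.0) for d in demands) / len(demands)


def worst_case(q, candidates, indices):
    """Largest expected cost of order quantity q over the candidates listed in `indices`."""
    return max(expected_cost(q, candidates[i]) for i in indices)


def row_generation(candidates, grid, tol=1e-9, max_iter=100):
    """Solve min_q max_i E_{P_i}[cost] by adding one worst-case candidate (a 'row') per round."""
    active = [0]  # start from a single candidate distribution
    everyone = range(len(candidates))
    for _ in range(max_iter):
        # Master problem over the active rows only; grid search stands in for a real solver.
        q = min(grid, key=lambda z: worst_case(z, candidates, active))
        master_val = worst_case(q, candidates, active)
        # Separation oracle: the candidate that is most adverse for the current decision.
        worst = max(everyone, key=lambda i: expected_cost(q, candidates[i]))
        worst_val = expected_cost(q, candidates[worst])
        if worst_val <= master_val + tol:
            return q, worst_val, active  # no violated row left: q is optimal on the grid
        active.append(worst)
    raise RuntimeError("row generation did not converge within max_iter rounds")


# Hypothetical demand samples, one list per candidate distribution.
candidates = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [1.0, 8.0]]
grid = [0.5 * i for i in range(21)]  # order quantities 0.0, 0.5, ..., 10.0
q_star, value, active_rows = row_generation(candidates, grid)
```

The stopping test is sound because the master value is a lower bound on the full min-max value (it sees fewer rows), while the oracle value at the current decision is an upper bound; once they meet, the decision is optimal over the grid.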

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same meta-distribution assumption could be tested empirically by checking whether held-out source distributions fall inside the learned uncertainty set at the predicted rate.
  • If the meta-distribution assumption fails in practice, the framework's guarantees collapse, suggesting a diagnostic that measures how well new sources fit the learned RKHS ball.
  • The RKHS construction might be replaced by a neural-network feature map for higher-dimensional or structured data while preserving the same generalization argument.
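The first check above can be simulated in a few lines. The assumptions are deliberately loud: the meta-distribution is a hypothetical Gaussian location family; the kernel is linear, so each embedding reduces to the sample mean (a characteristic kernel such as the Gaussian would be needed to separate distributions beyond their means); and the radius is a split-conformal quantile whose miscoverage level `eps` stands in for the conservatism parameter. The paper's own radius and predicted rate come from its generalization bounds, which are not reproduced here.

```python
import math
import random


def draw_source(rng, n_samples=50):
    """Hypothetical meta-distribution: a location m ~ N(0, 1), then n_samples draws from N(m, 1)."""
    m = rng.gauss(0.0, 1.0)
    return [rng.gauss(m, 1.0) for _ in range(n_samples)]


def embed(samples):
    """Kernel mean embedding under the linear kernel k(x, y) = x * y, i.e. the sample mean."""
    return sum(samples) / len(samples)


def coverage_check(eps=0.1, n_train=50, n_cal=200, n_test=2000, seed=0):
    """Fit a ball on source distributions, then measure how often fresh ones land inside it."""
    rng = random.Random(seed)
    train = [embed(draw_source(rng)) for _ in range(n_train)]
    center = sum(train) / n_train  # centroid of the training embeddings
    cal = sorted(abs(embed(draw_source(rng)) - center) for _ in range(n_cal))
    # Split-conformal radius: the ceil((1 - eps)(n_cal + 1))-th smallest calibration distance.
    k = math.ceil((1.0 - eps) * (n_cal + 1))
    radius = cal[k - 1] if k <= n_cal else math.inf
    hits = sum(abs(embed(draw_source(rng)) - center) <= radius for _ in range(n_test))
    return radius, hits / n_test


radius, coverage = coverage_check(eps=0.1)  # coverage should sit near or above 0.9
```

A held-out coverage rate well below the targeted level would be the diagnostic signal described in the second bullet.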

Load-bearing premise

All observed data distributions are randomly generated from a single unknown meta-distribution over distributions.

What would settle it

Draw a fresh target distribution from the same meta-distribution, solve the robust program, and check whether its realized cost exceeds the non-robust empirical optimum by more than the paper's derived generalization bound with probability greater than the claimed failure rate.
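Because the claim concerns a probability, one draw cannot settle it: the experiment has to be repeated over many independent target draws and the exceedance frequency compared with the claimed failure rate delta. A hedged sketch of that comparison using an exact one-sided binomial tail; the per-trial outcomes are hypothetical placeholders for "the realized cost broke the bound", and the significance level `alpha` is an assumed choice.

```python
import math


def binomial_upper_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p) with 0 < p < 1, summed in log space to avoid overflow."""
    total = 0.0
    for j in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def exceedance_test(exceeded, delta, alpha=0.05):
    """exceeded: booleans, True when a fresh target draw's realized cost broke the bound.

    Returns the observed exceedance rate, the one-sided p-value for H0: rate <= delta,
    and whether H0 is rejected at level alpha (evidence that the guarantee fails).
    """
    n = len(exceeded)
    k = sum(exceeded)
    p_value = binomial_upper_tail(k, n, delta)
    return k / n, p_value, p_value < alpha


# Hypothetical outcomes from 200 repeated target draws (placeholders, not results from the paper).
outcomes = [True] * 14 + [False] * 186
rate, p_value, reject = exceedance_test(outcomes, delta=0.05)
```

A rejection means the bound is violated more often than the stated failure rate allows; a non-rejection is consistent with the guarantee but does not prove it.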

Figures

Figures reproduced from arXiv: 2604.20147 by Chao Shang, Huan Xu, Xianyu Li, Xiaolin Huang.

Figure 1. Illustrative example: a retail cold-start problem.
Figure 2. Schematic of prior methods for aggregating multiple data distributions.
Figure 3. Our proposal: meta-distributional modeling and embeddings in RKHS.
Figure 4. Computational performance of RooD-SO on the two-item newsvendor task.
Original abstract

Data-driven decision-making under uncertainty typically presumes the collection of historical data from an unknown target probability distribution. However, one may have no access to any data from the target distribution prior to decision-making. To address this challenge, we propose robust out-of-distribution stochastic optimization, a novel data-driven framework that effectively utilizes relevant data distributions for robust decision-making under unseen distributions. A key feature of our framework is that all data distributions are assumed to be randomly generated from a meta-distribution over distributions. To describe uncertainty in distribution generation, we propose to learn a data-driven uncertainty set in a reproducing kernel Hilbert space (RKHS) from relevant data distributions, with adjustable conservatism. We then incorporate this set into a min-max stochastic program to derive robust decisions. Notably, under randomness of distribution generation, we establish rigorous out-of-distribution generalization guarantees for the uncertainty set as well as the solution. To ease problem-solving in RKHS, an approximate parametrization with a provably bounded suboptimality and a row generation strategy are presented. Extensive numerical experiments on multi-item newsvendor and portfolio optimization demonstrate the superior out-of-distribution performance of our decision-making framework under unseen data distributions, even when only a small or moderate number of relevant sources are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a framework for robust out-of-distribution stochastic optimization. It assumes that all data distributions are randomly generated from a meta-distribution over distributions. The approach learns a data-driven uncertainty set in a reproducing kernel Hilbert space (RKHS) from relevant distributions with adjustable conservatism, incorporates it into a min-max stochastic program, establishes rigorous out-of-distribution generalization guarantees for the uncertainty set and the solution under the randomness assumption, provides an approximate parametrization with bounded suboptimality and a row-generation strategy, and demonstrates superior performance on multi-item newsvendor and portfolio optimization problems.

Significance. If the generalization guarantees hold under the stated meta-distribution assumption, this work contributes a theoretically grounded method for making robust decisions when the target distribution is unseen but related distributions are available. The use of RKHS for uncertainty sets and the provision of approximation algorithms with provable bounds are notable strengths. The empirical results on standard problems suggest practical applicability in operations research and finance.

minor comments (2)
  1. [Numerical Experiments] The numerical experiments on the multi-item newsvendor and portfolio optimization problems would benefit from explicit details on the baselines used for comparison, the number of replications or random instances, and any statistical significance testing to better support the claims of superior out-of-distribution performance.
  2. [Method] A short discussion on how the adjustable conservatism parameter in the RKHS uncertainty set is selected in practice, or its sensitivity in the reported experiments, would improve reproducibility and clarity.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary of our manuscript, the positive assessment of its significance, and the recommendation for minor revision. The referee's description accurately reflects the core contributions of the proposed meta-distribution-based robust optimization framework, including the RKHS uncertainty sets, out-of-distribution generalization guarantees, approximation schemes, and empirical results on newsvendor and portfolio problems.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper explicitly states the meta-distribution assumption as a modeling choice upfront and derives OOD generalization bounds for the RKHS uncertainty set and min-max solution via standard concentration inequalities under that assumption. No load-bearing step reduces a claimed prediction or guarantee to a fitted parameter by construction, nor imports uniqueness via self-citation chains, nor renames known results. The derivation chain remains self-contained once the stated randomness assumption is granted, with no internal reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on the meta-distribution assumption for data generation and the embedding of distributions into an RKHS; no free parameters are explicitly fitted in the abstract description beyond an adjustable conservatism level.

free parameters (1)
  • adjustable conservatism parameter
    Controls the size of the learned uncertainty set in RKHS; its value is chosen to balance robustness and performance.
axioms (1)
  • domain assumption: All relevant data distributions are randomly generated from an unknown meta-distribution over distributions
    Invoked to derive the out-of-distribution generalization guarantees for the uncertainty set and the resulting decisions.

pith-pipeline@v0.9.0 · 5515 in / 1210 out tokens · 26489 ms · 2026-05-10T00:30:30.069158+00:00 · methodology

