Robust Out-of-Distribution Stochastic Optimization

Chao Shang; Huan Xu; Xianyu Li; Xiaolin Huang

arxiv: 2604.20147 · v1 · submitted 2026-04-22 · 🧮 math.OC · cs.LG

Pith reviewed 2026-05-10 00:30 UTC · model grok-4.3

keywords: out-of-distribution generalization · robust stochastic optimization · uncertainty set · reproducing kernel Hilbert space · meta-distribution · min-max optimization · data-driven decision making · newsvendor problem

The pith

Assuming distributions are drawn from a meta-distribution allows construction of a data-driven uncertainty set in RKHS that delivers rigorous out-of-distribution generalization bounds for robust stochastic decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses decision-making when no historical data exists from the actual target distribution. It assumes instead that all available distributions are randomly sampled from an unknown meta-distribution over distributions. From this assumption the authors build a conservative uncertainty set inside a reproducing kernel Hilbert space that encloses plausible future distributions with high probability. This set is then inserted into a min-max stochastic program whose solution inherits explicit generalization guarantees. Numerical tests on newsvendor and portfolio problems show the resulting decisions outperform standard approaches on held-out distributions even when only modest numbers of source distributions are available.

Core claim

Under the randomness assumption on distribution generation, the framework learns a data-driven uncertainty set in RKHS whose radius can be tuned for adjustable conservatism; the corresponding min-max stochastic program then produces decisions whose out-of-distribution performance is bounded by explicit generalization inequalities that hold simultaneously for the uncertainty set itself and for the obtained solution.
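
To make the min-max structure concrete, here is a minimal Python sketch on a single-item newsvendor. It is a deliberate simplification and not the paper's method: the RKHS uncertainty set is replaced by the finite set of observed source distributions, and the unit costs, the order grid, the demand samples, and the function names (`newsvendor_cost`, `robust_order`, `pooled_order`) are all illustrative assumptions.

```python
def newsvendor_cost(q, d, overage=1.0, underage=3.0):
    """Cost of ordering q when demand is d (illustrative unit costs)."""
    return overage * max(q - d, 0.0) + underage * max(d - q, 0.0)


def expected_cost(q, demands):
    """Average cost of order q under one empirical demand distribution."""
    return sum(newsvendor_cost(q, d) for d in demands) / len(demands)


def worst_case_cost(q, sources):
    """Inner maximization: worst expected cost over candidate distributions."""
    return max(expected_cost(q, s) for s in sources)


def robust_order(sources, grid):
    """Outer minimization over a grid of candidate order quantities."""
    return min(grid, key=lambda q: worst_case_cost(q, sources))


def pooled_order(sources, grid):
    """Non-robust baseline: pool all samples into one empirical distribution."""
    pooled = [d for s in sources for d in s]
    return min(grid, key=lambda q: expected_cost(q, pooled))


# Hypothetical demand samples from three source distributions.
sources = [[8, 10, 12], [18, 20, 22], [9, 11, 13]]
grid = range(0, 31)
q_rob = robust_order(sources, grid)
q_pool = pooled_order(sources, grid)
print("robust order:", q_rob, "worst case:", worst_case_cost(q_rob, sources))
print("pooled order:", q_pool, "worst case:", worst_case_cost(q_pool, sources))
```

On the grid, the robust order cannot have a higher worst-case cost than the pooled order, since it minimizes exactly that quantity. The price is a possibly higher average cost when the target resembles the typical source.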

What carries the argument

The data-driven uncertainty set constructed in a reproducing kernel Hilbert space from relevant source distributions, embedded inside a min-max stochastic program.
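
As a rough illustration of what a set "in RKHS built from source distributions" can look like, the sketch below embeds each source sample as an empirical kernel mean and draws a ball around the average embedding. The summary does not give the paper's actual construction, so everything here is an assumption: the Gaussian kernel, the bandwidth, the choice of the uniform average as center, and the rule "radius = conservatism × largest source distance".

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def embedding_inner(xs, ys, bandwidth=1.0):
    """Inner product of the empirical kernel mean embeddings of xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def gram_matrix(sources, bandwidth=1.0):
    """Gram matrix K[i][j] = <mu_i, mu_j> over all source embeddings."""
    m = len(sources)
    return [[embedding_inner(sources[i], sources[j], bandwidth)
             for j in range(m)] for i in range(m)]


def dist_to_center(K, i):
    """RKHS distance from embedding i to the uniform average of all embeddings."""
    m = len(K)
    sq = K[i][i] - 2.0 * sum(K[i]) / m + sum(map(sum, K)) / m ** 2
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def ball_radius(K, conservatism=1.0):
    """Assumed rule: scale the largest source-to-center distance."""
    return conservatism * max(dist_to_center(K, i) for i in range(len(K)))


# Hypothetical samples from three source distributions.
sources = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
K = gram_matrix(sources)
print([round(dist_to_center(K, i), 4) for i in range(len(K))])
print(round(ball_radius(K, conservatism=1.2), 4))
```

Only inner products between embeddings are needed, which is why the whole computation runs on the Gram matrix `K` rather than on explicit feature vectors.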

If this is right

Robust decisions become feasible even when zero samples from the target distribution are ever observed.
Both the learned uncertainty set and the resulting decision enjoy explicit finite-sample out-of-distribution bounds that scale with the number of source distributions.
The conservatism parameter in the RKHS uncertainty set directly trades off robustness against average-case performance.
An approximate finite-dimensional parametrization with provable suboptimality gap reduces the infinite-dimensional problem to a tractable row-generation algorithm.
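
The row-generation idea in the last point can be sketched generically. The version below is a plain cutting-set loop over a finite scenario list with a brute-force master problem; the paper's algorithm works on its finite-dimensional parametrization of the RKHS problem, whose details are not given in this summary. The quadratic cost, the decision grid, and the scenario list are hypothetical.

```python
def row_generation(cost, decisions, scenarios, tol=1e-9, max_iter=100):
    """Solve min over decisions of max over scenarios by adding rows lazily."""
    active = [scenarios[0]]  # start from a single scenario (row)
    for _ in range(max_iter):
        # Master problem: best decision against the active scenarios only.
        x = min(decisions, key=lambda d: max(cost(d, s) for s in active))
        lower = max(cost(x, s) for s in active)
        # Separation oracle: worst scenario for x over the full set.
        worst = max(scenarios, key=lambda s: cost(x, s))
        upper = cost(x, worst)
        if upper - lower <= tol:
            break  # no violated row remains, so x is optimal up to tol
        active.append(worst)
    return x, upper, active


def squared_error(d, s):
    """Hypothetical cost: squared gap between decision and scenario."""
    return (d - s) ** 2


decisions = [0.5 * i for i in range(21)]        # grid on [0, 10]
scenarios = [float(s) for s in range(1, 10)]    # 1.0, 2.0, ..., 9.0
x_star, value, active = row_generation(squared_error, decisions, scenarios)
print(x_star, value, len(active), "of", len(scenarios), "rows used")
```

The loop stops when the worst scenario over the full set costs no more than the worst active one, at which point the master solution is optimal for the full problem up to `tol`. Typically only a small subset of rows is ever generated.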

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same meta-distribution assumption could be tested empirically by checking whether held-out source distributions fall inside the learned uncertainty set at the predicted rate.
If the meta-distribution assumption fails in practice, the framework's guarantees collapse, suggesting a diagnostic that measures how well new sources fit the learned RKHS ball.
The RKHS construction might be replaced by a neural-network feature map for higher-dimensional or structured data while preserving the same generalization argument.
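
The coverage check suggested in the first point can be sketched as a leave-one-out loop. This is a sketch under stated assumptions, not the paper's procedure: the set is taken to be a ball around the mean embedding of the remaining sources, the radius rule is "conservatism × largest remaining distance", and the demo uses a linear kernel on scalars as a stand-in for a Gram matrix of kernel mean embeddings. The resulting rate would then be compared with whatever coverage level the paper's bound predicts, which this summary does not state.

```python
def sq_dist_to_mean(K, i, members):
    """Squared RKHS distance from embedding i to the mean of `members`."""
    n = len(members)
    cross = sum(K[i][j] for j in members) / n
    within = sum(K[j][l] for j in members for l in members) / n ** 2
    return max(K[i][i] - 2.0 * cross + within, 0.0)


def loo_coverage(K, conservatism=1.0):
    """Fraction of sources that fall inside the ball learned without them."""
    m = len(K)
    hits = 0
    for i in range(m):
        rest = [j for j in range(m) if j != i]
        radius_sq = conservatism ** 2 * max(
            sq_dist_to_mean(K, j, rest) for j in rest)
        if sq_dist_to_mean(K, i, rest) <= radius_sq:
            hits += 1
    return hits / m


# Hypothetical one-dimensional "embeddings"; the last one is an outlier.
points = [0.0, 1.0, 2.0, 3.0, 10.0]
K = [[a * b for b in points] for a in points]  # linear-kernel Gram matrix
print(loo_coverage(K, conservatism=1.0))
print(loo_coverage(K, conservatism=10.0))
```

A coverage rate well below the predicted level would be a warning sign that the sources do not behave like draws from a single meta-distribution.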

Load-bearing premise

All observed data distributions are randomly generated from a single unknown meta-distribution over distributions.

What would settle it

Draw a fresh target distribution from the same meta-distribution, solve the robust program, and check whether its realized cost exceeds the non-robust empirical optimum by more than the paper's derived generalization bound with probability greater than the claimed failure rate.
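
One way to run this check over many fresh target draws is to count how often the gap exceeds the bound and compare that count with the claimed failure rate using an exact binomial tail. The sketch below assumes independent trials; the gap values, the bound, and the failure rate are hypothetical placeholders, since the paper's actual bound is not reproduced in this summary.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))


def exceedance_check(gaps, bound, failure_rate):
    """Return (empirical exceedance rate, one-sided binomial p-value).

    gaps: realized robust cost minus the reference cost, one entry per
    fresh target distribution (assumed independent).
    """
    n = len(gaps)
    k = sum(1 for g in gaps if g > bound)
    return k / n, binomial_tail(k, n, failure_rate)


# Hypothetical gaps from four fresh target draws.
gaps = [0.1, 0.2, 5.0, 0.3]
rate, p_value = exceedance_check(gaps, bound=1.0, failure_rate=0.05)
print(rate, p_value)
```

A small p-value indicates that exceedances occur more often than the claimed failure rate would allow, which would count as evidence against the guarantee (or against the meta-distribution assumption behind it).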

Figures

Figures reproduced from arXiv: 2604.20147 by Chao Shang, Huan Xu, Xianyu Li, Xiaolin Huang.

**Figure 2.** Schematic of Prior Methods for Aggregating Multiple Data Distributions

**Figure 3.** Our proposal: meta-distributional modeling and embeddings in RKHS.

**Figure 4.** Computational performance of RooD-SO on the two-item newsvendor task

Original abstract

Data-driven decision-making under uncertainty typically presumes the collection of historical data from an unknown target probability distribution. However, one may have no access to any data from the target distribution prior to decision-making. To address this challenge, we propose robust out-of-distribution stochastic optimization, a novel data-driven framework that effectively utilizes relevant data distributions for robust decision-making under unseen distributions. A key feature of our framework is that all data distributions are assumed to be randomly generated from a meta-distribution over distributions. To describe uncertainty in distribution generation, we propose to learn a data-driven uncertainty set in a reproducing kernel Hilbert space (RKHS) from relevant data distributions, with adjustable conservatism. We then incorporate this set into a min-max stochastic program to derive robust decisions. Notably, under randomness of distribution generation, we establish rigorous out-of-distribution generalization guarantees for the uncertainty set as well as the solution. To ease problem-solving in RKHS, an approximate parametrization with a provably bounded suboptimality and a row generation strategy are presented. Extensive numerical experiments on multi-item newsvendor and portfolio optimization demonstrate the superior out-of-distribution performance of our decision-making framework under unseen data distribution, even when only a small or moderate number of relevant sources are available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a clean way to do robust stochastic optimization with no target data by assuming distributions come from a meta-distribution and learning an RKHS uncertainty set, with generalization bounds that follow once the assumption is granted.

The core contribution is a min-max formulation that builds a data-driven uncertainty set in RKHS from source distributions, then solves for decisions that are robust to unseen targets. They add an adjustable conservatism knob and supply an approximate parametrization plus row-generation solver with a suboptimality bound. Under the meta-distribution randomness, they claim OOD generalization for both the set and the solution. That combination of meta-distribution modeling with RKHS sets is not standard in the DRO literature they cite, so the integration is new on its face. The experiments on multi-item newsvendor and portfolio problems are presented as showing better out-of-distribution performance than baselines when only a few sources are available, which is the practical angle they emphasize.

The theory is straightforward once the modeling assumption is accepted, and the algorithmic pieces look workable for implementation. The main limitation is that the entire guarantee structure collapses if the meta-distribution assumption does not hold in the data; it is stated upfront but remains a strong modeling choice rather than something tested or relaxed. The numerical results are described at a high level, so it is unclear how carefully baselines were matched or whether statistical significance was checked across the reported runs. No obvious internal contradictions appear in the argument as summarized.

This is aimed at researchers already working in distributionally robust optimization who are willing to adopt the meta-distribution framing for OOD cases. Readers who want formal bounds under a specific generative model and a practical solver will find usable material. It has enough formal content and a distinct angle to merit sending to referees rather than a desk reject, though any review should press on how sensitive the results are to the core assumption and on the experimental controls.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes a framework for robust out-of-distribution stochastic optimization. It assumes that all data distributions are randomly generated from a meta-distribution over distributions. The approach learns a data-driven uncertainty set in a reproducing kernel Hilbert space (RKHS) from relevant distributions with adjustable conservatism, incorporates it into a min-max stochastic program, establishes rigorous out-of-distribution generalization guarantees for the uncertainty set and the solution under the randomness assumption, provides an approximate parametrization with bounded suboptimality and a row-generation strategy, and demonstrates superior performance on multi-item newsvendor and portfolio optimization problems.

Significance. If the generalization guarantees hold under the stated meta-distribution assumption, this work contributes a theoretically grounded method for making robust decisions when the target distribution is unseen but related distributions are available. The use of RKHS for uncertainty sets and the provision of approximation algorithms with provable bounds are notable strengths. The empirical results on standard problems suggest practical applicability in operations research and finance.

minor comments (2)

[Numerical Experiments] The numerical experiments on the multi-item newsvendor and portfolio optimization problems would benefit from explicit details on the baselines used for comparison, the number of replications or random instances, and any statistical significance testing to better support the claims of superior out-of-distribution performance.
[Method] A short discussion on how the adjustable conservatism parameter in the RKHS uncertainty set is selected in practice, or its sensitivity in the reported experiments, would improve reproducibility and clarity.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the supportive summary of our manuscript, the positive assessment of its significance, and the recommendation for minor revision. The referee's description accurately reflects the core contributions of the proposed meta-distribution-based robust optimization framework, including the RKHS uncertainty sets, out-of-distribution generalization guarantees, approximation schemes, and empirical results on newsvendor and portfolio problems.

Circularity Check

0 steps flagged

No significant circularity detected

The paper explicitly states the meta-distribution assumption as a modeling choice upfront and derives OOD generalization bounds for the RKHS uncertainty set and min-max solution via standard concentration inequalities under that assumption. No load-bearing step reduces a claimed prediction or guarantee to a fitted parameter by construction, nor imports uniqueness via self-citation chains, nor renames known results. The derivation chain remains self-contained once the stated randomness assumption is granted, with no internal reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on the meta-distribution assumption for data generation and the embedding of distributions into an RKHS; no free parameters are explicitly fitted in the abstract description beyond an adjustable conservatism level.

free parameters (1)

adjustable conservatism parameter
Controls the size of the learned uncertainty set in RKHS; its value is chosen to balance robustness and performance.

axioms (1)

domain assumption: All relevant data distributions are randomly generated from an unknown meta-distribution over distributions.
Invoked to derive the out-of-distribution generalization guarantees for the uncertainty set and the resulting decisions.

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages

[1]

Theory of reproducing kernels.Transactions of the American Mathe- matical Society, 68(3):337–404, 1950

Nachman Aronszajn. Theory of reproducing kernels.Transactions of the American Mathe- matical Society, 68(3):337–404, 1950

work page 1950
[2]

Distributionally robust data join

Pranjal Awasthi, Christopher Jung, and Jamie Morgenstern. Distributionally robust data join. arXiv:2202.05797, 2022. 28

work page arXiv 2022
[3]

On the equivalence between kernel quadrature rules and random feature expan- sions.Journal of Machine Learning Research, 18(21):1–38, 2017

Francis Bach. On the equivalence between kernel quadrature rules and random feature expan- sions.Journal of Machine Learning Research, 18(21):1–38, 2017

work page 2017
[4]

Robust solutions of optimization problems affected by uncertain probabilities.Manage- ment Science, 59(2):341–357, 2013

Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Ren- nen. Robust solutions of optimization problems affected by uncertain probabilities.Manage- ment Science, 59(2):341–357, 2013

work page 2013
[5]

Deriving robust counterparts of nonlinear uncertain inequalities.Mathematical Programming, 149(1):265–299, 2015

Aharon Ben-Tal, Dick Den Hertog, and Jean-Philippe Vial. Deriving robust counterparts of nonlinear uncertain inequalities.Mathematical Programming, 149(1):265–299, 2015

work page 2015
[6]

Infinitely constrained optimization problems.Journal of Optimization Theory and Applications, 19(2):261–281, 1976

Jerry W Blankenship and James E Falk. Infinitely constrained optimization problems.Journal of Optimization Theory and Applications, 19(2):261–281, 1976

work page 1976
[7]

Distributionally robust optimization via ball oracle acceler- ation

Yair Carmon and Danielle Hausler. Distributionally robust optimization via ball oracle acceler- ation. InAdvances in Neural Information Processing Systems, volume 35, pages 35866–35879, 2022

work page 2022
[8]

Super-samples from kernel herding

Yutian Chen, Max Welling, and Alex Smola. Super-samples from kernel herding. InProceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, page 109–116, 2010

work page 2010
[9]

Fast computation of Wasserstein barycenters

Marco Cuturi and Arnaud Doucet. Fast computation of Wasserstein barycenters. InInterna- tional Conference on Machine Learning, pages 685–693. PMLR, 2014

work page 2014
[10]

Distributionally robust federated averaging.Advances in Neural Information Processing Systems, 33:15111–15122, 2020

Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. Distributionally robust federated averaging.Advances in Neural Information Processing Systems, 33:15111–15122, 2020

work page 2020
[11]

A permutation-based kernel conditional independence test

Gary Doran, Krikamol Muandet, Kun Zhang, and Bernhard Sch¨ olkopf. A permutation-based kernel conditional independence test. InProceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, page 132–141, 2014

work page 2014
[12]

Learning models with uniform performance via distributionally robust optimization.The Annals of Statistics, 49(3):1378–1406, 2021

John C Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization.The Annals of Statistics, 49(3):1378–1406, 2021

work page 2021
[13]

Data pooling for multiple single- component systems under population heterogeneity.International Journal of Production Eco- nomics, 250:108665, 2022

˙Ipek Dursun, Alp Ak¸ cay, and Geert-Jan Van Houtum. Data pooling for multiple single- component systems under population heterogeneity.International Journal of Production Eco- nomics, 250:108665, 2022

work page 2022
[14]

Yara Kayyali Elalem, Sebastian Maier, and Ralf W. Seifert. A machine learning-based frame- work for forecasting sales of new products with short life cycles using deep neural networks. International Journal of Forecasting, 39(4):1874–1894, 2023

work page 2023
[15]

Imperial College London, London, 2020

Neil M Ferguson, Daniel Laydon, Gemma Nedjati-Gilani, Natsuko Imai, Kylie Ainslie, Marc Baguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, Zulma Cucunub´ a, Gina Cuomo- Dannenburg, et al.Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce 29 COVID19 mortality and healthcare demand, volume 16. Imperial College London, London, 2020

work page 2020
[16]

A stochastic approach to the gamma function.The American Mathematical Monthly, 101(9):858–865, 1994

Louis Gordon. A stochastic approach to the gamma function.The American Mathematical Monthly, 101(9):858–865, 1994

work page 1994
[17]

A kernel method for the two-sample-problem

Arthur Gretton, Karsten Borgwardt, Malte Rasch, Bernhard Sch¨ olkopf, and Alex Smola. A kernel method for the two-sample-problem. InAdvances in Neural Information Processing Systems, volume 19, 2006

work page 2006
[18]

A kernel two-sample test.Journal of Machine Learning Research, 13(1):723–773, 2012

Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Sch¨ olkopf, and Alexander Smola. A kernel two-sample test.Journal of Machine Learning Research, 13(1):723–773, 2012

work page 2012
[19]

Support measure data description for group anomaly detection

Jorge Guevara, Stephane Canu, and Roberto Hirata. Support measure data description for group anomaly detection. InODDx3 Workshop on Outlier Definition, Detection, and Descrip- tion at the 21st ACM SIGKDD International Conference On Knowledge Discovery And Data Mining (KDD2015), 2015

work page 2015
[20]

Statistical analysis of conditional group distributionally robust optimization with cross-entropy loss.arXiv:2507.09905, 2026

Zijian Guo, Zhenyu Wang, Yifan Hu, and Francis Bach. Statistical analysis of conditional group distributionally robust optimization with cross-entropy loss.arXiv:2507.09905, 2026

work page arXiv 2026
[21]

Data pooling in stochastic optimization.Management Science, 68(3):1595–1615, 2022

Vishal Gupta and Nathan Kallus. Data pooling in stochastic optimization.Management Science, 68(3):1595–1615, 2022

work page 2022
[22]

Fairness without demographics in repeated loss minimization

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. InInternational Conference on Machine Learning, pages 1929–1938, 2018

work page 1929
[23]

Thomas Hofmann, Bernhard Sch¨ olkopf, and Alexander J. Smola. Kernel methods in machine learning.The Annals of Statistics, 36(1):1171–1220, 2008

work page 2008
[24]

Cambridge university press, 2012

Roger A Horn and Charles R Johnson.Matrix Analysis. Cambridge university press, 2012

work page 2012
[25]

Portfolio optimization with condi- tional value-at-risk objective and constraints.Journal of Risk, 4(2):43–68, 2002

Pavlo Krokhmal, Jonas Palmquist, and Stanislav Uryasev. Portfolio optimization with condi- tional value-at-risk objective and constraints.Journal of Risk, 4(2):43–68, 2002

work page 2002
[26]

Wasserstein distributionally robust optimization: Theory and applications in machine learning.Operations Research & Management Science in the Age of Analytics, pages 130–166, 2019

Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, and Soroosh Shafieezadeh- Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning.Operations Research & Management Science in the Age of Analytics, pages 130–166, 2019

work page 2019
[27]

Distributionally robust optimization

Daniel Kuhn, Soroosh Shafiee, and Wolfram Wiesemann. Distributionally robust optimization. Acta Numerica, 34:579–804, 2025. 30

work page 2025
[28]

Fairness without demographics through adversarially reweighted learning

Preethi Lahoti, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, and Ed Chi. Fairness without demographics through adversarially reweighted learning. InAdvances in Neural Information Processing Systems, volume 33, pages 728–740, 2020

work page 2020
[29]

Springer Science & Business Media, 2013

Michel Ledoux and Michel Talagrand.Probability in Banach Spaces: Isoperimetry and Pro- cesses. Springer Science & Business Media, 2013

work page 2013
[30]

Temporally and distributionally robust optimization for cold-start recommendation

Xinyu Lin, Wenjie Wang, Jujia Zhao, Yongqi Li, Fuli Feng, and Tat-Seng Chua. Temporally and distributionally robust optimization for cold-start recommendation. InProceedings of the AAAI Conference on Artificial Intelligence, 2024

work page 2024
[31]

Multi-source conformal infer- ence under distribution shift

Yi Liu, Alexander Levis, Sharon-Lise Normand, and Larry Han. Multi-source conformal infer- ence under distribution shift. InProceedings of the 41st International Conference on Machine Learning, volume 235, pages 31344–31382, 2024

work page 2024
[32]

On the method of bounded differences.Surveys in Combinatorics, 141(1):148–188, 1989

Colin McDiarmid et al. On the method of bounded differences.Surveys in Combinatorics, 141(1):148–188, 1989

work page 1989
[33]

A comparison of three methods for selecting values of input variables in the analysis of output from a computer code

Michael D McKay, Richard J Beckman, and William J Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 42(1):55–61, 2000

work page 2000
[34]

Peyman Mohajerin Esfahani and Daniel Kuhn. Data-driven distributionally robust optimiza- tion using the Wasserstein metric: Performance guarantees and tractable reformulations.Math- ematical Programming, 171(1):115–166, 2018

work page 2018
[35]

Agnostic federated learning

Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. Agnostic federated learning. In International Conference on Machine Learning, pages 4615–4625, 2019

work page 2019
[36]

Wasserstein barycenter for multi-source domain adaptation

Eduardo Fernandes Montesuma and Fred Maurice Ngole Mboula. Wasserstein barycenter for multi-source domain adaptation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16785–16793, 2021

work page 2021
[37]

Ker- nel mean embedding of distributions: A review and beyond.Foundations and Trends®in Machine Learning, 10(1-2):1–141, 2017

Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, Bernhard Sch¨ olkopf, et al. Ker- nel mean embedding of distributions: A review and beyond.Foundations and Trends®in Machine Learning, 10(1-2):1–141, 2017

work page 2017
[38]

One-class support measure machines for group anomaly detection

Krikamol Muandet and Bernhard Sch¨ olkopf. One-class support measure machines for group anomaly detection. InProceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 449–458, 2013

work page 2013
[39]

Cutting-set methods for robust convex optimization with pessimizing oracles.Optimization Methods & Software, 24(3):381–406, 2009

Almir Mutapcic and Stephen Boyd. Cutting-set methods for robust convex optimization with pessimizing oracles.Optimization Methods & Software, 24(3):381–406, 2009

work page 2009
[40]

Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994

Iosif Pinelis. Optimum bounds for the distributions of martingales in banach spaces.The Annals of Probability, pages 1679–1706, 1994. 31

work page 1994
[41]

Sequential minimal optimization: A fast algorithm for training support vector machines.Advances in Kernel Methods-Support Vector Learning, 208, 1998

John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines.Advances in Kernel Methods-Support Vector Learning, 208, 1998

work page 1998
[42]

Potra and Stephen J

Florian A. Potra and Stephen J. Wright. Interior-point methods.Journal of Computational and Applied Mathematics, 124(1):281–302, 2000

work page 2000
[43]

Tyrrell Rockafellar and Stanislav Uryasev

R. Tyrrell Rockafellar and Stanislav Uryasev. Optimization of conditional value-at risk.Journal of Risk, 3:21–41, 2000

work page 2000
[44]

Rychener, A

Yves Rychener, Adri´ an Esteban-P´ erez, Juan M Morales, and Daniel Kuhn. Wasserstein dis- tributionally robust optimization with heterogeneous data sources.arXiv:2407.13582, 2024

work page arXiv 2024
[45]

A survey of contextual optimization methods for decision making under uncertainty.European Journal of Operational Research, 2024

Utsav Sadana, Abhilash Reddy Chenreddy, Erick Delage, Alexandre Forel, Emma Frejinger, and Thibaut Vidal. A survey of contextual optimization methods for decision making under uncertainty.European Journal of Operational Research, 2024

work page 2024
[46]

Hashimoto, and Percy Liang

Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, and Percy Liang. Distributionally robust neural networks. InInternational Conference on Learning Representations, 2020

work page 2020
[47]

Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning.SIAM Journal on Imaging Sci- ences, 11(1):643–678, 2018

Morgan A Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Ngole, David Coeurjolly, Marco Cuturi, Gabriel Peyr´ e, and Jean-Luc Starck. Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning.SIAM Journal on Imaging Sci- ences, 11(1):643–678, 2018

work page 2018
[48]

A generalized representer theorem

Bernhard Sch¨ olkopf, Ralf Herbrich, and Alex J Smola. A generalized representer theorem. In International Conference on Computational Learning Theory, pages 416–426. Springer, 2001

work page 2001
[49]

MIT press, 2002

Bernhard Sch¨ olkopf and Alexander J Smola.Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT press, 2002

work page 2002
[50]

Potluru, Tucker Balch, and Manuela Veloso

Aras Selvi, Eleonora Kreacic, Mohsen Ghassemi, Vamsi K. Potluru, Tucker Balch, and Manuela Veloso. Distributionally and adversarially robust logistic regression via intersecting Wasserstein balls. InProceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, volume 286, pages 3641–3674, 2025

work page 2025
[51]

SIAM, 2021

Alexander Shapiro, Darinka Dentcheva, and Andrzej Ruszczynski.Lectures on Stochastic Programming: Modeling and Theory. SIAM, 2021

work page 2021
[52]

Siebes, and Siamak Mehrkanoon

Jie Shi, Arno P.J.M. Siebes, and Siamak Mehrkanoon. Transcoralnet: A two-stream trans- former coral networks for supply chain credit assessment cold start.Expert Systems with Applications, 282:127581, 2025

work page 2025
[53]

A Hilbert space embedding for distributions

Alex Smola, Arthur Gretton, Le Song, and Bernhard Sch¨ olkopf. A Hilbert space embedding for distributions. InInternational Conference on Algorithmic Learning Theory, pages 13–31, 2007. 32

work page 2007
[54]

Hilbert space embeddings and metrics on probability measures.Jour- nal of Machine Learning Research, 11(50):1517–1561, 2010

Bharath K Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Sch¨ olkopf, and Gert RG Lanckriet. Hilbert space embeddings and metrics on probability measures.Jour- nal of Machine Learning Research, 11(50):1517–1561, 2010

work page 2010
[55]

Scalable Bayes via barycenter in Wasser- stein space.Journal of Machine Learning Research, 19(8):1–35, 2018

Sanvesh Srivastava, Cheng Li, and David B Dunson. Scalable Bayes via barycenter in Wasser- stein space.Journal of Machine Learning Research, 19(8):1–35, 2018

work page 2018
[56]

Distributionally robust optimization and generalization in kernel methods

Matthew Staib and Stefanie Jegelka. Distributionally robust optimization and generalization in kernel methods. InAdvances in Neural Information Processing Systems, volume 32, 2019

work page 2019
[57]

Se- quential domain adaptation by synthesizing distributionally robust experts

Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, and Viet Anh Nguyen. Se- quential domain adaptation by synthesizing distributionally robust experts. InProceedings of the 38th International Conference on Machine Learning, volume 139, pages 10162–10172, 2021

work page 2021
[58]

Minimax estimation of kernel mean embeddings.Journal of Machine Learning Research, 18(86):1–47, 2017

Ilya Tolstikhin, Bharath K Sriperumbudur, Krikamol Mu, et al. Minimax estimation of kernel mean embeddings.Journal of Machine Learning Research, 18(86):1–47, 2017

work page 2017
[59]

The multi-product newsvendor problem: Review, extensions, and directions for future research.Handbook of Newsvendor Problems: Models, Extensions and Applications, pages 3–39, 2012

Nazli Turken, Yinliang Tan, Asoo J Vakharia, Lan Wang, Ruoxuan Wang, and Arda Yeni- pazarli. The multi-product newsvendor problem: Review, extensions, and directions for future research.Handbook of Newsvendor Problems: Models, Extensions and Applications, pages 3–39, 2012

work page 2012
[60]

Springer Science & Business Media, 2006

Vladimir Vapnik.Estimation of Dependences Based on Empirical Data. Springer Science & Business Media, 2006

work page 2006
[61]

Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges.Statistical Science, 30(2):199, 2015

Sof´ ıa S Villar, Jack Bowden, and James Wason. Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges.Statistical Science, 30(2):199, 2015

work page 2015
[62]

Contextual optimization under covariate shift: A robust approach by intersecting Wasserstein balls.arXiv:2406.02426, 2024

Tianyu Wang, Ningyuan Chen, and Chun Wang. Contextual optimization under covariate shift: A robust approach by intersecting Wasserstein balls.arXiv:2406.02426, 2024

work page arXiv 2024
[63]

Gaussian mixture model based distri- butionally robust optimal power flow with CVaR constraints.arXiv:2110.13336, 2021

Lei You, Hui Ma, Tapan Kumar Saha, and Gang Liu. Gaussian mixture model based distri- butionally robust optimal power flow with CVaR constraints.arXiv:2110.13336, 2021

work page arXiv 2021
[64]

Efficient algorithms for empirical group distributionally robust optimization and beyond

Dingzhi Yu, Yunuo Cai, Wei Jiang, and Lijun Zhang. Efficient algorithms for empirical group distributionally robust optimization and beyond. InProceedings of the 41st International Conference on Machine Learning, volume 235, pages 57384–57414, 2024

work page 2024
[65]

Stochastic approximation approaches to group distributionally robust optimization

Lijun Zhang, Peng Zhao, Zhen-Hua Zhuang, Tianbao Yang, and Zhi-Hua Zhou. Stochastic approximation approaches to group distributionally robust optimization. InAdvances in Neural Information Processing Systems, volume 36, pages 52490–52522, 2023

work page 2023
[66]

Kernel distribu- tionally robust optimization: Generalized duality theorem and stochastic approximation

Jia-Jie Zhu, Wittawat Jitkrittum, Moritz Diehl, and Bernhard Sch¨ olkopf. Kernel distribu- tionally robust optimization: Generalized duality theorem and stochastic approximation. In International Conference on Artificial Intelligence and Statistics, pages 280–288, 2021. 33

work page 2021
[67]

Fatima Zohra Benhamida, Ouahiba Kaddouri, Tahar Ouhrouche, Mohammed Benaichouche, Diego Casado-Mansilla, and Diego López-de Ipina. Demand forecasting tool for inventory control smart systems. Journal of Communications Software and Systems, 17(2):185–196, 2021.

