Sample Average Approximation for Distributionally Robust Optimization with φ-divergences
Pith reviewed 2026-05-10 15:03 UTC · model grok-4.3
The pith
When the divergence function φ grows superlinearly, the sample complexity for approximating worst-case expectations via sample average approximation becomes independent of the nominal distribution P.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that the growth rate of the divergence function φ completely determines whether the sample complexity of sample average approximation for the worst-case expectation can be made independent of the reference measure P. Under superlinear growth, explicit upper and lower bounds on the required sample size are derived that depend solely on φ, the divergence radius, and the target precision, with optimality shown for common φ-divergences.
What carries the argument
The φ-divergence ball centered at a nominal measure P, with the superlinear growth property of the convex function φ governing the divergence.
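For concreteness, the objects the argument rests on can be written out (notation is ours; the paper's own symbols may differ):

```latex
D_\phi(Q \,\|\, P) = \int \phi\!\left(\frac{\mathrm{d}Q}{\mathrm{d}P}\right)\mathrm{d}P,
\qquad
\mathcal{B}_\rho(P) = \bigl\{\, Q \ll P \;:\; D_\phi(Q \,\|\, P) \le \rho \,\bigr\},
```

and the quantity being estimated is the worst-case expectation

```latex
W(P) = \sup_{Q \in \mathcal{B}_\rho(P)} \mathbb{E}_Q[X],
\qquad
\text{superlinear growth:}\quad \lim_{t \to \infty} \frac{\phi(t)}{t} = +\infty .
```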
Load-bearing premise
The random variable of interest takes values in a fixed bounded interval [-B, B].
What would settle it
For a φ with superlinear growth, fix ε and the ball radius, then vary the nominal P; if the minimal number of samples needed for ε-accurate estimation stays bounded independently of P, the claim holds; otherwise it is falsified.
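The settling experiment above can be run numerically for a superlinear φ. The sketch below uses the KL divergence (φ(t) = t log t − t + 1, which grows superlinearly) together with the standard dual representation sup{E_Q[X] : KL(Q‖P) ≤ ρ} = inf_{λ>0} {λρ + λ log E_P[exp(X/λ)]}; the SAA estimator replaces E_P with an empirical mean. The function name and the crude grid-search minimizer are our illustration, not the paper's estimator.

```python
import numpy as np

def kl_worst_case_saa(samples, rho, lam_grid=None):
    """SAA estimate of sup {E_Q[X] : KL(Q||P) <= rho} via the standard dual
    inf_{lam > 0} lam*rho + lam*log E_P[exp(X/lam)], with E_P replaced by the
    empirical mean. Grid search over lam is a crude illustrative minimizer."""
    x = np.asarray(samples, dtype=float)
    if lam_grid is None:
        lam_grid = np.logspace(-3, 3, 2000)
    m = x.max()
    best = np.inf
    for lam in lam_grid:
        # log-mean-exp with the max shifted out for numerical stability
        log_mgf = m / lam + np.log(np.mean(np.exp((x - m) / lam)))
        best = min(best, lam * rho + lam * log_mgf)
    return float(best)

# Vary the nominal P while keeping the bound B = 1, the radius, and the
# sample size fixed: for superlinear phi the claim is that accuracy does
# not degrade as P changes.
rng = np.random.default_rng(0)
for p in (0.1, 0.5, 0.9):  # Bernoulli-type nominals on {-1, +1}
    x = rng.choice([-1.0, 1.0], size=2000, p=[1 - p, p])
    est = kl_worst_case_saa(x, rho=0.1)
    assert x.mean() < est <= 1.001  # dual value sits between mean and max
```

The dual value always dominates the empirical mean (Jensen plus the λρ term) and is capped near the essential supremum B, which is what the bracketing assertion checks.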
Original abstract
It is well known that estimating the expectation of any given bounded random variable with values in $[-B, B]$ has a sample complexity of $\mathrm{O}(B^2/\epsilon^2)$ that is independent of the underlying probability measure. We show that this property can no longer hold when evaluating the worst-case expectation of the random variable, where the probability measures defining the expectation belong to a $\phi$-divergence ball centered at some nominal measure $P$. Specifically, the sample complexity and its dependence on the nominal measure can be completely characterized by the growth of the divergence function. When the divergence function $\phi$ exhibits superlinear growth, a $P$-independent sample complexity can be obtained for sample average approximation, which depends only on the growth of $\phi$, the radius of the divergence ball, and the target precision. We also provide sample complexity lower bounds and demonstrate the optimality of the obtained bounds for commonly used $\phi$-divergences. On the other hand, when superlinear growth does not hold for $\phi$, we show that for any estimation method, evaluating the worst-case expectation has a $P$-dependent sample complexity lower bound that can be made arbitrarily large by changing $P$.
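The P-independent baseline the abstract invokes is the standard Hoeffding bound: for i.i.d. $X_1, \dots, X_n \in [-B, B]$,

```latex
\Pr\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mathbb{E}[X] \right| \ge \epsilon \right)
\le 2\exp\!\left(-\frac{n\epsilon^2}{2B^2}\right),
```

so $n = \mathrm{O}\bigl((B^2/\epsilon^2)\log(1/\alpha)\bigr)$ samples give precision $\epsilon$ with confidence $1-\alpha$ for every P supported on $[-B, B]$. The paper asks when the worst-case expectation retains this P-independence.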
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the sample complexity of sample average approximation (SAA) for estimating the worst-case expectation of a bounded random variable (taking values in [-B, B]) over a φ-divergence ball of given radius centered at a nominal distribution P. It establishes that this complexity is independent of P precisely when φ exhibits superlinear growth, in which case the bound depends only on the growth properties of φ, the ball radius, and the target accuracy ε. Matching lower bounds are derived and shown to be optimal for standard choices such as KL divergence and total variation. When φ lacks superlinear growth, any estimator has a P-dependent sample complexity that can be made arbitrarily large by a suitable choice of P.
Significance. If the stated upper and lower bounds hold, the work supplies a sharp, growth-rate-based characterization of statistical complexity in distributionally robust optimization with φ-divergences. The explicit P-independence result for superlinear φ, together with the complementary lower-bound construction, clarifies when DRO problems inherit the distribution-free sample complexity of ordinary expectation estimation. The optimality demonstrations for common divergences and the clean separation by growth rate constitute concrete strengths.
major comments (2)
- [§3, Theorem 3.2] Upper bound: The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.
- [§5, Theorem 5.1] Lower bound construction: The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.
minor comments (2)
- Notation: the radius of the φ-divergence ball is denoted both by δ and by ρ in different sections; a single symbol would improve readability.
- [Assumption 2.1] The statement of Assumption 2.1 should explicitly record that B is known to the estimator, as this is used in the sample-size formulas.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and have incorporated revisions to improve clarity.
Point-by-point responses
- Referee [§3, Theorem 3.2] (upper bound): The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.
  Authors: We appreciate this observation. The proof of Theorem 3.2 indeed uses the boundedness of the random variable (values in [-B, B]) within the concentration step for the empirical measure. The resulting sample-complexity bound scales with B (analogous to the classical Hoeffding dependence) while remaining independent of P and depending on the growth rate of φ, the ball radius, and ε. We will revise the statement of Theorem 3.2, the abstract, and the discussion in Section 3 to explicitly list B among the parameters on which the bound depends, thereby making the full parameter dependence transparent. revision: yes
- Referee [§5, Theorem 5.1] (lower bound construction): The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.
  Authors: We thank the referee for this helpful suggestion. The sequence {P_n} is constructed so that each P_n is a valid probability measure supported on the same finite set, and the φ-divergence ball of fixed radius around P_n is well-defined. To make the argument fully self-contained, we will insert a short remark immediately after the construction in the proof of Theorem 5.1 that explicitly verifies (i) each P_n is a probability measure and (ii) the worst-case expectations under the ball are attained at extreme points that remain admissible for all n. This verification follows directly from the explicit form of the measures but will now be written out for the reader. revision: yes
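The flavor of the non-superlinear lower bound can be reproduced in a toy setting. The sketch below uses total variation (φ(t) = |t − 1|/2, linear growth) with the ball restricted to Q ≪ P, and a nominal P_n that hides mass 1/n at the endpoint B = 1. This construction and the function name are our illustration, not the paper's exact P_n.

```python
import numpy as np

def tv_worst_case(xs, ps, delta):
    """Worst-case E_Q[X] over {Q << P : TV(Q, P) <= delta} on a finite support:
    shift up to delta probability mass from the lowest-value atoms to the highest."""
    order = np.argsort(xs)
    xs = np.asarray(xs, dtype=float)[order]
    ps = np.asarray(ps, dtype=float)[order].copy()
    move = min(delta, float(ps[:-1].sum()))  # cannot move more mass than lies below the top
    remaining = move
    for i in range(len(ps) - 1):
        take = min(ps[i], remaining)
        ps[i] -= take
        remaining -= take
    ps[-1] += move
    return float((xs * ps).sum())

# Nominal P_n: mass 1 - 1/n at 0, mass 1/n at B = 1. The true worst case is
# (1/n + delta) * B, but with n samples the atom at B goes unseen with probability
# (1 - 1/n)^n ~ 1/e, in which case the ball around the empirical measure {0}
# contains only the point mass at 0 and the SAA value is 0: an error of order
# delta * B that no fixed sample size removes uniformly over n.
n, delta = 50, 0.2
true_wc = tv_worst_case([0.0, 1.0], [1 - 1 / n, 1 / n], delta)
assert abs(true_wc - (1 / n + delta)) < 1e-9
assert tv_worst_case([0.0], [1.0], delta) == 0.0  # degenerate empirical support
```

Because the gap persists for every n with constant probability, the sample size required for ε-accuracy at P_n grows with n, matching the P-dependent lower bound the theorem asserts.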
Circularity Check
No significant circularity
Full rationale
The derivation chain begins from the standard sample complexity for bounded random variables in [-B, B] (a well-known external fact) and extends it to worst-case expectations over φ-divergence balls by characterizing dependence on the growth rate of φ. This extension uses explicit concentration arguments and divergence properties that do not reduce the claimed P-independent bounds (when φ grows superlinearly) to quantities defined in terms of the target result itself. Lower bounds are constructed separately by varying P when superlinear growth fails, without self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The analysis remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the random variable takes values in the bounded interval [-B, B].
Reference graph
Works this paper leans on
- [1] Güzin Bayraksan and David K. Love. Data-driven stochastic programming using phi-divergences. In The Operations Research Revolution, pages 1–19. INFORMS, 2015.
- [2] Cristian Chávez and Yan Li. Sample complexity for Markov decision processes and stochastic optimal control with static risk measures. arXiv preprint arXiv:2604.04795, 2026.
- [3] John C. Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 49(3):1378–1406, 2021.
- [4] Daniel Levy, Yair Carmon, John C. Duchi, and Aaron Sidford. Large-scale methods for distributionally robust optimization. Advances in Neural Information Processing Systems, 33:8847–8860, 2020.
- [5] Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
- [6] Alexander Shapiro. Distributionally robust stochastic programming. SIAM Journal on Optimization, 27(4):2258–2275, 2017.
- [7] Laixi Shi and Yuejie Chi. Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity. Journal of Machine Learning Research, 25(200):1–91, 2024.
- [8] Shengbo Wang, Nian Si, Jose Blanchet, and Zhengyuan Zhou. Sample complexity of variance-reduced distributionally robust Q-learning. Journal of Machine Learning Research, 25(341):1–77, 2024.
- [9] Zaiyan Xu, Kishan Panaganti, and Dileep Kalathil. Improved sample complexity bounds for distributionally robust reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pages 9728–9754. PMLR, 2023.
discussion (0)