pith. machine review for the scientific record.

arxiv: 2604.10855 · v3 · submitted 2026-04-12 · 🧮 math.OC · math.ST · stat.TH

Recognition: unknown

Sample Average Approximation for Distributionally Robust Optimization with φ-divergences

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:03 UTC · model grok-4.3

classification 🧮 math.OC math.ST stat.TH
keywords sample average approximation · distributionally robust optimization · φ-divergences · sample complexity · worst-case expectation · divergence balls · bounded random variables

The pith

When the divergence function φ grows superlinearly, the sample complexity for approximating worst-case expectations via sample average approximation becomes independent of the nominal distribution P.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Estimating the expectation of a bounded random variable requires a number of samples that is independent of the underlying probability measure. This paper shows that this independence can fail for the worst-case expectation over a φ-divergence ball. When φ grows faster than linearly, sample average approximation recovers a sample complexity that depends only on the growth of φ, the ball radius, and the accuracy level. The authors prove matching lower bounds establishing optimality for standard divergences such as the KL divergence. If φ does not grow superlinearly, any estimation procedure suffers a sample complexity lower bound that can be made arbitrarily large by choosing an appropriate nominal measure P.
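
For orientation, a minimal formal statement of the quantity at stake, in standard notation; the symbols δ (ball radius), W(·), and the empirical measure are our shorthand and may differ from the paper's.

```latex
% phi-divergence of Q from P, for convex phi with phi(1) = 0
D_\phi(Q \,\|\, P) \;=\; \int \phi\!\left(\frac{\mathrm{d}Q}{\mathrm{d}P}\right)\mathrm{d}P,
\qquad
% worst-case expectation over the ball of radius delta
W(P) \;=\; \sup_{Q \,:\, D_\phi(Q \,\|\, P) \le \delta} \mathbb{E}_Q[X],
\qquad X \in [-B, B].
```

Sample average approximation draws n i.i.d. samples from P, forms the empirical measure P̂_n, and reports W(P̂_n); the sample complexity is the smallest n for which |W(P̂_n) − W(P)| ≤ ε holds with high probability.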

Core claim

The central discovery is that the growth rate of the divergence function φ completely determines whether the sample complexity of sample average approximation for the worst-case expectation can be made independent of the nominal measure P. Under superlinear growth, explicit upper and lower bounds on the required sample size are derived that depend solely on φ, the divergence radius, and the target precision, with optimality shown for common φ-divergences.
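
The dividing line admits a one-line statement; the following is our paraphrase of the standard superlinear-growth condition, with two familiar generators for contrast.

```latex
% superlinear growth of the divergence generator
\lim_{t \to \infty} \frac{\phi(t)}{t} \;=\; +\infty
% e.g. KL divergence, phi(t) = t*log(t) - t + 1: superlinear;
% total variation, phi(t) = |t - 1|/2: only linear growth, so outside the regime.
```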

What carries the argument

The φ-divergence ball centered at a nominal measure P, with the superlinear growth property of the convex function φ governing the divergence.

Load-bearing premise

The random variable of interest takes values in a fixed bounded interval [-B, B].

What would settle it

For a φ with superlinear growth, fix ε and the ball radius, then vary the nominal P; if the minimal number of samples needed for ε-accurate estimation stays bounded independently of P, the claim holds; otherwise it is falsified.
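
A hypothetical way to run this test for the KL ball is sketched below. Everything in it is our assumption, not the paper's procedure: the standard dual identity sup_{KL(Q‖P)≤δ} E_Q[X] = inf_{λ>0} {λδ + λ log E_P[e^{X/λ}]}, a two-point family of nominal measures, and an ad hoc sample-size doubling protocol; all names and constants are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

B, DELTA, EPS = 1.0, 0.1, 0.05  # bound on |X|, KL radius, target accuracy (our choices)

def kl_dual(log_mgf, delta):
    """Worst-case expectation over {Q : KL(Q||P) <= delta} via the standard dual
    inf_{lam > 0} lam*delta + lam*log E_P[exp(X/lam)]."""
    obj = lambda lam: lam * delta + lam * log_mgf(lam)
    return minimize_scalar(obj, bounds=(1e-3, 1e3), method="bounded").fun

def log_mgf_empirical(x):
    """Numerically stable lam -> log((1/n) * sum_i exp(x_i / lam))."""
    def f(lam):
        z = x / lam
        m = z.max()
        return m + np.log(np.mean(np.exp(z - m)))
    return f

def log_mgf_twopoint(p):
    """Exact log-MGF when P puts mass p on +B and 1-p on -B."""
    def f(lam):
        z = np.array([B, -B]) / lam
        w = np.array([p, 1.0 - p])
        m = z.max()
        return m + np.log(np.sum(w * np.exp(z - m)))
    return f

rng = np.random.default_rng(0)
for p in (0.5, 0.05, 0.005):                     # vary the nominal measure P
    truth = kl_dual(log_mgf_twopoint(p), DELTA)
    n = 64
    while n < 2**20:                             # double n until SAA is EPS-accurate
        errs = [abs(kl_dual(log_mgf_empirical(
                    np.where(rng.random(n) < p, B, -B)), DELTA) - truth)
                for _ in range(100)]
        if np.quantile(errs, 0.9) <= EPS:
            break
        n *= 2
    print(f"p={p}: n ≈ {n}")  # KL is superlinear, so n should stay bounded across p
```

If the paper's claim holds, the printed n should stay of the same order across the three nominal measures; for a φ with merely linear growth, the analogous experiment should see n blow up as the nominal measure degenerates.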

read the original abstract

It is well known that estimating the expectation of any given bounded random variable with values in $[-B, B]$ has a sample complexity of $\mathrm{O}(B^2/\epsilon^2)$ that is independent of the underlying probability measure. We show that this property can no longer hold when evaluating the worst-case expectation of the random variable, where the probability measures defining the expectation belong to a $\phi$-divergence ball centered at some nominal measure $P$. Specifically, the sample complexity and its dependence on the nominal measure can be completely characterized by the growth of the divergence function. When the divergence function $\phi$ exhibits superlinear growth, a $P$-independent sample complexity can be obtained for sample average approximation, which depends only on the growth of $\phi$, the radius of the divergence ball, and the target precision. We also provide sample complexity lower bounds and demonstrate the optimality of the obtained bounds for commonly used $\phi$-divergences. On the other hand, when superlinear growth does not hold for $\phi$, we show that for any estimation method, evaluating the worst-case expectation has a $P$-dependent sample complexity lower bound that can be made arbitrarily large by changing $P$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the sample complexity of sample average approximation (SAA) for estimating the worst-case expectation of a bounded random variable (taking values in [-B, B]) under a φ-divergence ball of given radius around a nominal distribution P. It establishes that this complexity is independent of P precisely when φ exhibits superlinear growth, in which case the bound depends only on the growth properties of φ, the ball radius, and the target accuracy ε. Matching lower bounds are derived and shown to be optimal for commonly used superlinear divergences such as the KL divergence. When φ lacks superlinear growth, any estimator has a P-dependent sample complexity that can be made arbitrarily large by a suitable choice of P.

Significance. If the stated upper and lower bounds hold, the work supplies a sharp, growth-rate-based characterization of statistical complexity in distributionally robust optimization with φ-divergences. The explicit P-independence result for superlinear φ, together with the complementary lower-bound construction, clarifies when DRO problems inherit the distribution-free sample complexity of ordinary expectation estimation. The optimality demonstrations for common divergences and the clean separation by growth rate constitute concrete strengths.

major comments (2)
  1. [§3, Theorem 3.2] Upper bound: The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.
  2. [§5, Theorem 5.1] Lower-bound construction: The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.
minor comments (2)
  1. Notation: the radius of the φ-divergence ball is denoted both by δ and by ρ in different sections; a single symbol would improve readability.
  2. [Assumption 2.1] The statement of Assumption 2.1 should explicitly record that B is known to the estimator, as this is used in the sample-size formulas.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and have incorporated revisions to improve clarity.

read point-by-point responses
  1. Referee: [§3, Theorem 3.2] Upper bound: The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.

    Authors: We appreciate this observation. The proof of Theorem 3.2 indeed uses the boundedness of the random variable (values in [-B, B]) within the concentration step for the empirical measure. The resulting sample-complexity bound scales with B (analogous to the classical Hoeffding dependence) while remaining independent of P and depending on the growth rate of φ, the ball radius, and ε. We will revise the statement of Theorem 3.2, the abstract, and the discussion in Section 3 to explicitly list B among the parameters on which the bound depends, thereby making the full parameter dependence transparent. revision: yes

  2. Referee: [§5, Theorem 5.1] Lower-bound construction: The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.

    Authors: We thank the referee for this helpful suggestion. The sequence {P_n} is constructed so that each P_n is a valid probability measure supported on the same finite set, and the φ-divergence ball of fixed radius around P_n is well-defined. To make the argument fully self-contained, we will insert a short remark immediately after the construction in the proof of Theorem 5.1 that explicitly verifies (i) each P_n is a probability measure and (ii) the worst-case expectations under the ball are attained at extreme points that remain admissible for all n. This verification follows directly from the explicit form of the measures but will now be written out for the reader. revision: yes
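
For intuition on why admissibility matters in such constructions, a hedged illustration of the non-superlinear mechanism (ours, not the paper's): take total variation, φ(t) = |t − 1|/2, and nominal measures P_η placing mass η on +B and 1 − η on 0.

```latex
% our illustration, phi(t) = |t - 1|/2 (linear growth):
\sup_{Q \ll P_\eta,\; \mathrm{TV}(Q, P_\eta) \le \delta} \mathbb{E}_Q[X]
\;=\;
\begin{cases}
\min(\eta + \delta,\, 1)\, B, & \eta > 0,\\
0, & \eta = 0.
\end{cases}
% the value jumps by roughly delta*B as eta -> 0+, yet distinguishing P_eta
% from P_0 requires Omega(1/eta) samples, so no P-independent bound can hold.
```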

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain begins from the standard sample complexity for bounded random variables in [-B, B] (a well-known external fact) and extends it to worst-case expectations over φ-divergence balls by characterizing dependence on the growth rate of φ. This extension uses explicit concentration arguments and divergence properties that do not reduce the claimed P-independent bounds (when φ grows superlinearly) to quantities defined in terms of the target result itself. Lower bounds are constructed separately by varying P when superlinear growth fails, without self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The analysis remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the mathematical properties of φ-divergences and the boundedness of the random variable; no data-fitting parameters are introduced.

axioms (1)
  • domain assumption The random variable takes values in the bounded interval [-B, B]
    This boundedness is invoked to obtain the classical O(B²/ε²) bound and is carried over to the worst-case analysis.
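
For reference, the classical fact this axiom feeds, in its standard Hoeffding form (generic constants, not quoted from the paper):

```latex
% i.i.d. X_1, ..., X_n with values in [-B, B]:
\Pr\!\left( \Big| \tfrac{1}{n} \sum_{i=1}^{n} X_i - \mathbb{E}[X] \Big| \ge \epsilon \right)
\;\le\; 2 \exp\!\left( -\frac{n \epsilon^2}{2 B^2} \right),
% so n = O(B^2 \log(1/\alpha) / \epsilon^2) samples give accuracy epsilon with
% confidence 1 - alpha, uniformly over every P supported on [-B, B].
```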

pith-pipeline@v0.9.0 · 5510 in / 1346 out tokens · 47148 ms · 2026-05-10T15:03:22.025549+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

9 extracted references · 1 canonical work page · 1 internal anchor

  1. [1]

    Data-driven stochastic programming using phi-divergences

    Güzin Bayraksan and David K. Love. Data-driven stochastic programming using phi-divergences. In The Operations Research Revolution, pages 1–19. INFORMS, 2015.

  2. [2]

    Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures

    Cristian Chávez and Yan Li. Sample complexity for Markov decision processes and stochastic optimal control with static risk measures. arXiv preprint arXiv:2604.04795, 2026.

  3. [3]

    Learning models with uniform performance via distributionally robust optimization

    John C. Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 49(3):1378–1406, 2021.

  4. [4]

    Large-scale methods for distributionally robust optimization

    Daniel Levy, Yair Carmon, John C. Duchi, and Aaron Sidford. Large-scale methods for distributionally robust optimization. Advances in Neural Information Processing Systems, 33:8847–8860, 2020.

  5. [5]

    Robust stochastic approximation approach to stochastic programming

    Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.

  6. [6]

    Distributionally robust stochastic programming

    Alexander Shapiro. Distributionally robust stochastic programming. SIAM Journal on Optimization, 27(4):2258–2275, 2017.

  7. [7]

    Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity

    Laixi Shi and Yuejie Chi. Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity. Journal of Machine Learning Research, 25(200):1–91, 2024.

  8. [8]

    Sample complexity of variance-reduced distributionally robust Q-learning

    Shengbo Wang, Nian Si, Jose Blanchet, and Zhengyuan Zhou. Sample complexity of variance-reduced distributionally robust Q-learning. Journal of Machine Learning Research, 25(341):1–77, 2024.

  9. [9]

    Improved sample complexity bounds for distributionally robust reinforcement learning

    Zaiyan Xu, Kishan Panaganti, and Dileep Kalathil. Improved sample complexity bounds for distributionally robust reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pages 9728–9754. PMLR, 2023.