pith. machine review for the scientific record.

arxiv: 2604.10855 · v3 · submitted 2026-04-12 · 🧮 math.OC · math.ST · stat.TH

Recognition: unknown

Sample Average Approximation for Distributionally Robust Optimization with φ-divergences

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:03 UTC · model grok-4.3

classification 🧮 math.OC math.ST stat.TH
keywords sample average approximation · distributionally robust optimization · φ-divergences · sample complexity · worst-case expectation · divergence balls · bounded random variables

The pith

When the divergence function φ grows superlinearly, the sample complexity for approximating worst-case expectations via sample average approximation becomes independent of the nominal distribution P.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Estimating the expectation of a bounded random variable requires a number of samples that is independent of the underlying probability measure. This paper shows that this independence can fail for the worst-case expectation over a φ-divergence ball. When φ grows faster than linearly, sample average approximation recovers a sample complexity that depends only on the growth of φ, the ball radius, and the accuracy level. The authors prove matching lower bounds establishing optimality for standard divergences such as the KL divergence. If φ does not grow superlinearly, any estimation procedure suffers a sample complexity lower bound that can be made arbitrarily large by choosing an appropriate nominal measure P.
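
For orientation, a minimal formal statement of the quantity at stake, in standard notation; the symbols δ (ball radius), W(·), and the empirical measure are our shorthand and may differ from the paper's.

```latex
% phi-divergence of Q from P, for convex phi with phi(1) = 0
D_\phi(Q \,\|\, P) \;=\; \int \phi\!\left(\frac{\mathrm{d}Q}{\mathrm{d}P}\right)\mathrm{d}P,
\qquad
% worst-case expectation over the ball of radius delta
W(P) \;=\; \sup_{Q \,:\, D_\phi(Q \,\|\, P) \le \delta} \mathbb{E}_Q[X],
\qquad X \in [-B, B].
```

Sample average approximation draws n i.i.d. samples from P, forms the empirical measure P̂_n, and reports W(P̂_n); the sample complexity is the smallest n for which |W(P̂_n) − W(P)| ≤ ε holds with high probability.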

Core claim

The central discovery is that the growth rate of the divergence function φ completely determines whether the sample complexity of sample average approximation for the worst-case expectation can be made independent of the nominal measure P. Under superlinear growth, explicit upper and lower bounds on the required sample size are derived that depend solely on φ, the divergence radius, and the target precision, with optimality shown for common φ-divergences.
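
The dividing line admits a one-line statement; the following is our paraphrase of the standard superlinear-growth condition, with two familiar generators for contrast.

```latex
% superlinear growth of the divergence generator
\lim_{t \to \infty} \frac{\phi(t)}{t} \;=\; +\infty
% e.g. KL divergence, phi(t) = t*log(t) - t + 1: superlinear;
% total variation, phi(t) = |t - 1|/2: only linear growth, so outside the regime.
```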

What carries the argument

The φ-divergence ball centered at a nominal measure P, with the superlinear growth property of the convex function φ governing the divergence.

Load-bearing premise

The random variable of interest takes values in a fixed bounded interval [-B, B].

What would settle it

For a φ with superlinear growth, fix ε and the ball radius, then vary the nominal P; if the minimal number of samples needed for ε-accurate estimation stays bounded independently of P, the claim holds; otherwise it is falsified.
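
A hypothetical way to run this test for the KL ball is sketched below. Everything in it is our assumption, not the paper's procedure: the standard dual identity sup_{KL(Q‖P)≤δ} E_Q[X] = inf_{λ>0} {λδ + λ log E_P[e^{X/λ}]}, a two-point family of nominal measures, and an ad hoc sample-size doubling protocol; all names and constants are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

B, DELTA, EPS = 1.0, 0.1, 0.05  # bound on |X|, KL radius, target accuracy (our choices)

def kl_dual(log_mgf, delta):
    """Worst-case expectation over {Q : KL(Q||P) <= delta} via the standard dual
    inf_{lam > 0} lam*delta + lam*log E_P[exp(X/lam)]."""
    obj = lambda lam: lam * delta + lam * log_mgf(lam)
    return minimize_scalar(obj, bounds=(1e-3, 1e3), method="bounded").fun

def log_mgf_empirical(x):
    """Numerically stable lam -> log((1/n) * sum_i exp(x_i / lam))."""
    def f(lam):
        z = x / lam
        m = z.max()
        return m + np.log(np.mean(np.exp(z - m)))
    return f

def log_mgf_twopoint(p):
    """Exact log-MGF when P puts mass p on +B and 1-p on -B."""
    def f(lam):
        z = np.array([B, -B]) / lam
        w = np.array([p, 1.0 - p])
        m = z.max()
        return m + np.log(np.sum(w * np.exp(z - m)))
    return f

rng = np.random.default_rng(0)
for p in (0.5, 0.05, 0.005):                     # vary the nominal measure P
    truth = kl_dual(log_mgf_twopoint(p), DELTA)
    n = 64
    while n < 2**20:                             # double n until SAA is EPS-accurate
        errs = [abs(kl_dual(log_mgf_empirical(
                    np.where(rng.random(n) < p, B, -B)), DELTA) - truth)
                for _ in range(100)]
        if np.quantile(errs, 0.9) <= EPS:
            break
        n *= 2
    print(f"p={p}: n ≈ {n}")  # KL is superlinear, so n should stay bounded across p
```

If the paper's claim holds, the printed n should stay of the same order across the three nominal measures; for a φ with merely linear growth, the analogous experiment should see n blow up as the nominal measure degenerates.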

read the original abstract

It is well known that estimating the expectation of any given bounded random variable with values in $[-B, B]$ has a sample complexity of $\mathrm{O}(B^2/\epsilon^2)$ that is independent of the underlying probability measure. We show that this property can no longer hold when evaluating the worst-case expectation of the random variable, where the probability measures defining the expectation belong to a $\phi$-divergence ball centered at some nominal measure $P$. Specifically, the sample complexity and its dependence on the nominal measure can be completely characterized by the growth of the divergence function. When the divergence function $\phi$ exhibits superlinear growth, a $P$-independent sample complexity can be obtained for sample average approximation, which depends only on the growth of $\phi$, the radius of the divergence ball, and the target precision. We also provide sample complexity lower bounds and demonstrate the optimality of the obtained bounds for commonly used $\phi$-divergences. On the other hand, when superlinear growth does not hold for $\phi$, we show that for any estimation method, evaluating the worst-case expectation has a $P$-dependent sample complexity lower bound that can be made arbitrarily large by changing $P$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes the sample complexity of sample average approximation (SAA) for estimating the worst-case expectation of a bounded random variable (taking values in [-B, B]) under a φ-divergence ball of given radius around a nominal distribution P. It establishes that this complexity is independent of P precisely when φ exhibits superlinear growth, in which case the bound depends only on the growth properties of φ, the ball radius, and the target accuracy ε. Matching lower bounds are derived and shown to be optimal for commonly used superlinear divergences such as the KL divergence. When φ lacks superlinear growth, any estimator has a P-dependent sample complexity that can be made arbitrarily large by a suitable choice of P.

Significance. If the stated upper and lower bounds hold, the work supplies a sharp, growth-rate-based characterization of statistical complexity in distributionally robust optimization with φ-divergences. The explicit P-independence result for superlinear φ, together with the complementary lower-bound construction, clarifies when DRO problems inherit the distribution-free sample complexity of ordinary expectation estimation. The optimality demonstrations for common divergences and the clean separation by growth rate constitute concrete strengths.

major comments (2)
  1. [§3, Theorem 3.2] Upper bound: The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.
  2. [§5, Theorem 5.1] Lower-bound construction: The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.
minor comments (2)
  1. Notation: the radius of the φ-divergence ball is denoted both by δ and by ρ in different sections; a single symbol would improve readability.
  2. [Assumption 2.1] The statement of Assumption 2.1 should explicitly record that B is known to the estimator, as this is used in the sample-size formulas.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and constructive suggestions. We address each major comment below and have incorporated revisions to improve clarity.

read point-by-point responses
  1. Referee: [§3, Theorem 3.2] Upper bound: The claimed P-independent sample complexity for superlinear φ is derived via concentration of the empirical measure under the divergence ball; the proof invokes the boundedness constant B inside the deviation term. It is unclear whether B is absorbed into the stated dependence on φ-growth, radius, and ε, or remains an explicit factor; this affects whether the bound is truly parameter-free beyond the listed quantities.

    Authors: We appreciate this observation. The proof of Theorem 3.2 indeed uses the boundedness of the random variable (values in [-B, B]) within the concentration step for the empirical measure. The resulting sample-complexity bound scales with B (analogous to the classical Hoeffding dependence) while remaining independent of P and depending on the growth rate of φ, the ball radius, and ε. We will revise the statement of Theorem 3.2, the abstract, and the discussion in Section 3 to explicitly list B among the parameters on which the bound depends, thereby making the full parameter dependence transparent. revision: yes

  2. Referee: [§5, Theorem 5.1] Lower-bound construction: The argument that the sample complexity can be made arbitrarily large for non-superlinear φ proceeds by constructing a sequence of nominal measures P_n whose worst-case expectations differ by Ω(1) while the divergence ball remains fixed. The construction relies on the specific form of φ near zero; a short verification that the constructed P_n indeed lie inside the admissible set for every n would strengthen the claim.

    Authors: We thank the referee for this helpful suggestion. The sequence {P_n} is constructed so that each P_n is a valid probability measure supported on the same finite set, and the φ-divergence ball of fixed radius around P_n is well-defined. To make the argument fully self-contained, we will insert a short remark immediately after the construction in the proof of Theorem 5.1 that explicitly verifies (i) each P_n is a probability measure and (ii) the worst-case expectations under the ball are attained at extreme points that remain admissible for all n. This verification follows directly from the explicit form of the measures but will now be written out for the reader. revision: yes
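
For intuition on why admissibility matters in such constructions, a hedged illustration of the non-superlinear mechanism (ours, not the paper's): take total variation, φ(t) = |t − 1|/2, and nominal measures P_η placing mass η on +B and 1 − η on 0.

```latex
% our illustration, phi(t) = |t - 1|/2 (linear growth):
\sup_{Q \ll P_\eta,\; \mathrm{TV}(Q, P_\eta) \le \delta} \mathbb{E}_Q[X]
\;=\;
\begin{cases}
\min(\eta + \delta,\, 1)\, B, & \eta > 0,\\
0, & \eta = 0.
\end{cases}
% the value jumps by roughly delta*B as eta -> 0+, yet distinguishing P_eta
% from P_0 requires Omega(1/eta) samples, so no P-independent bound can hold.
```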

Circularity Check

0 steps flagged

No significant circularity

full rationale

The derivation chain begins from the standard sample complexity for bounded random variables in [-B, B] (a well-known external fact) and extends it to worst-case expectations over φ-divergence balls by characterizing dependence on the growth rate of φ. This extension uses explicit concentration arguments and divergence properties that do not reduce the claimed P-independent bounds (when φ grows superlinearly) to quantities defined in terms of the target result itself. Lower bounds are constructed separately by varying P when superlinear growth fails, without self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The analysis remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the mathematical properties of φ-divergences and the boundedness of the random variable; no data-fitting parameters are introduced.

axioms (1)
  • domain assumption The random variable takes values in the bounded interval [-B, B]
    This boundedness is invoked to obtain the classical O(B²/ε²) bound and is carried over to the worst-case analysis.
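
For reference, the classical fact this axiom feeds, in its standard Hoeffding form (generic constants, not quoted from the paper):

```latex
% i.i.d. X_1, ..., X_n with values in [-B, B]:
\Pr\!\left( \Big| \tfrac{1}{n} \sum_{i=1}^{n} X_i - \mathbb{E}[X] \Big| \ge \epsilon \right)
\;\le\; 2 \exp\!\left( -\frac{n \epsilon^2}{2 B^2} \right),
% so n = O(B^2 \log(1/\alpha) / \epsilon^2) samples give accuracy epsilon with
% confidence 1 - alpha, uniformly over every P supported on [-B, B].
```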

pith-pipeline@v0.9.0 · 5510 in / 1346 out tokens · 47148 ms · 2026-05-10T15:03:22.025549+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

9 extracted references · 1 canonical work page · 1 internal anchor

  1. [1]

    Data-driven stochastic programming using phi-divergences

    Güzin Bayraksan and David K. Love. Data-driven stochastic programming using phi-divergences. In The Operations Research Revolution, pages 1–19. INFORMS, 2015.

  2. [2]

    Sample Complexity for Markov Decision Processes and Stochastic Optimal Control with Static Risk Measures

    Cristian Chávez and Yan Li. Sample complexity for Markov decision processes and stochastic optimal control with static risk measures. arXiv preprint arXiv:2604.04795, 2026.

  3. [3]

    Learning models with uniform performance via distributionally robust optimization

    John C. Duchi and Hongseok Namkoong. Learning models with uniform performance via distributionally robust optimization. The Annals of Statistics, 49(3):1378–1406, 2021.

  4. [4]

    Large-scale methods for distributionally robust optimization

    Daniel Levy, Yair Carmon, John C. Duchi, and Aaron Sidford. Large-scale methods for distributionally robust optimization. Advances in Neural Information Processing Systems, 33:8847–8860, 2020.

  5. [5]

    Robust stochastic approximation approach to stochastic programming

    Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.

  6. [6]

    Distributionally robust stochastic programming

    Alexander Shapiro. Distributionally robust stochastic programming. SIAM Journal on Optimization, 27(4):2258–2275, 2017.

  7. [7]

    Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity

    Laixi Shi and Yuejie Chi. Distributionally robust model-based offline reinforcement learning with near-optimal sample complexity. Journal of Machine Learning Research, 25(200):1–91, 2024.

  8. [8]

    Sample complexity of variance-reduced distributionally robust Q-learning

    Shengbo Wang, Nian Si, Jose Blanchet, and Zhengyuan Zhou. Sample complexity of variance-reduced distributionally robust Q-learning. Journal of Machine Learning Research, 25(341):1–77, 2024.

  9. [9]

    Improved sample complexity bounds for distributionally robust reinforcement learning

    Zaiyan Xu, Kishan Panaganti, and Dileep Kalathil. Improved sample complexity bounds for distributionally robust reinforcement learning. In International Conference on Artificial Intelligence and Statistics, pages 9728–9754. PMLR, 2023.