Mean Robust Optimization

Bartolomeo Stellato; Bart Van Parys; Cole Becker; Irina Wang

arxiv: 2207.10820 · v5 · submitted 2022-07-21 · 🧮 math.OC

Mean Robust Optimization

Irina Wang , Cole Becker , Bart Van Parys , Bartolomeo Stellato This is my paper

Pith reviewed 2026-05-24 11:06 UTC · model grok-4.3

classification 🧮 math.OC

keywords mean robust optimizationclustered uncertainty setsWasserstein distributionally robust optimizationfinite-sample guaranteesrobust optimizationclusteringuncertainty sets

0 comments

The pith

Mean robust optimization builds uncertainty sets from clustered data to trade off problem size against conservatism.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents mean robust optimization as a framework that replaces individual data points with cluster centers when forming uncertainty sets. This shrinks the resulting optimization problems while still delivering finite-sample performance guarantees and an explicit bound on any extra conservatism introduced by the clustering step. By changing the number of clusters the method moves continuously between classical robust optimization and Wasserstein distributionally robust optimization. When the uncertain parameters appear linearly inside the constraints, the paper shows conditions under which the optimal decision stays exactly the same after clustering. The approach therefore makes data-driven robust decisions feasible on larger instances without sacrificing the theoretical safety properties.

Core claim

Mean robust optimization constructs uncertainty sets from clustered data rather than from the raw observations. This construction reduces problem size, supplies finite-sample performance guarantees, and lets the user control the additional pessimism caused by any clustering procedure. When uncertainty enters the constraints linearly, the paper proves conditions under which clustering leaves the optimal solution unchanged. Adjusting the number of clusters therefore interpolates between robust optimization and Wasserstein distributionally robust optimization.

What carries the argument

Uncertainty sets constructed from clustered data rather than from individual observations, which reduce problem size while preserving finite-sample guarantees.

If this is right

Varying the number of clusters creates a continuous trade-off between computation time and conservatism.
Finite-sample performance guarantees hold for the decisions obtained from the clustered formulation.
Any extra pessimism introduced by clustering can be bounded explicitly.
Under the stated conditions, linear uncertainty in the constraints makes the optimal solution invariant to clustering.
Numerical instances exhibit orders-of-magnitude reductions in solve time with negligible change in solution quality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same clustering reduction could be applied to other distributionally robust formulations that currently scale poorly with data size.
Adaptive or problem-specific clustering rules might further tighten the conservatism bound without losing the guarantees.
The linear-invariance result suggests the method is exact for many standard linear and mixed-integer programs that appear in practice.

Load-bearing premise

The finite-sample guarantees and the bound on extra pessimism are assumed to remain valid for any clustering procedure.

What would settle it

A counter-example in which clustering changes the optimal solution on a linearly constrained problem, or an empirical test in which out-of-sample performance falls outside the stated finite-sample bounds, would falsify the central claims.

read the original abstract

Robust optimization is a tractable and expressive technique for decision-making under uncertainty, but it can lead to overly conservative decisions when pessimistic assumptions are made on the uncertain parameters. Wasserstein distributionally robust optimization can reduce conservatism by being data-driven, but it often leads to very large problems with prohibitive solution times. We introduce mean robust optimization, a general framework that combines the best of both worlds by providing a trade-off between computational effort and conservatism. We propose uncertainty sets constructed based on clustered data rather than on observed data points directly thereby significantly reducing problem size. By varying the number of clusters, our method bridges between robust and Wasserstein distributionally robust optimization. We show finite-sample performance guarantees and explicitly control the potential additional pessimism introduced by any clustering procedure. In addition, we prove conditions for which, when the uncertainty enters linearly in the constraints, clustering does not affect the optimal solution. We illustrate the efficiency and performance preservation of our method on several numerical examples, obtaining multiple orders of magnitude speedups in solution time with little-to-no effect on the solution quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Clustering-based uncertainty sets give a tunable size reduction between RO and Wasserstein DRO, but the finite-sample guarantees for arbitrary clustering need the proofs to hold without hidden bias terms.

read the letter

The paper introduces mean robust optimization by building uncertainty sets around cluster centers rather than individual data points. This shrinks the resulting optimization problems while varying the cluster count to interpolate between classical robust optimization and Wasserstein DRO. The claimed finite-sample performance guarantees plus explicit control of extra pessimism from clustering are the main technical points, along with an invariance result that leaves the optimal solution unchanged when uncertainty enters linearly in the constraints. The numerical examples report multiple orders of magnitude faster solve times with little change in solution quality, which matches the practical motivation. The construction itself does not appear to restate prior work directly. The soft spot is whether the finite-sample bounds truly cover arbitrary clustering procedures. If a clustering step merges points in a way that distorts the empirical measure beyond the paper's additive pessimism term, the claimed trade-off between size and conservatism could weaken or the guarantees could fail to apply. The abstract states the control holds for any clustering, but that premise is load-bearing and would need the derivations to confirm it does not rely on post-hoc restrictions. The linear-invariance result looks narrower but cleaner. This paper is for readers working on scalable distributionally robust methods in operations research who already know the Wasserstein and robust baselines. It deserves a serious referee because the efficiency gains are demonstrated and the framework is motivated by a real computational bottleneck, even if the theoretical claims require close verification in review.

Referee Report

2 major / 2 minor

Summary. The paper introduces mean robust optimization, a framework that constructs uncertainty sets from clustered data points rather than raw observations. By varying the number of clusters, the method interpolates between classical robust optimization and Wasserstein distributionally robust optimization, claiming a tunable trade-off between computational tractability and conservatism. It asserts finite-sample performance guarantees that remain valid for arbitrary clustering procedures, provides an explicit additive term to control extra pessimism induced by clustering, and proves that clustering leaves the optimal solution unchanged when uncertainty enters the constraints linearly. Numerical examples are used to illustrate speedups of multiple orders of magnitude with negligible degradation in solution quality.

Significance. If the finite-sample guarantees and the explicit control of clustering-induced pessimism hold for arbitrary procedures, the framework would supply a practical, scalable bridge between robust optimization and data-driven DRO methods, particularly valuable for large-scale problems where full Wasserstein DRO is intractable. The result on invariance of the optimum under linear uncertainty (when the stated conditions hold) is a clean theoretical contribution that could be cited in follow-on work on clustering-based approximations.

major comments (2)

[Abstract and finite-sample guarantee section] Abstract and the section deriving the finite-sample bounds: the claim that performance guarantees plus explicit control of additional pessimism hold for any clustering procedure is load-bearing for the central trade-off. The skeptic correctly flags that if a clustering distorts the empirical measure such that the Wasserstein contribution of merged points is not linearly bounded by the cluster radius, the additive pessimism term may fail to restore the guarantee; the manuscript must exhibit the precise inequality that absorbs all such distortions.
[Theorem on linear uncertainty] The theorem establishing conditions under which clustering does not affect the optimal solution (linear uncertainty case): the proof appears to rely on the uncertainty set being defined via cluster centers; it is unclear whether the same invariance holds when the clustering procedure is chosen adversarially or when the radius term interacts with the linear structure in a non-trivial way. This needs an explicit counter-example or a tightened hypothesis.

minor comments (2)

Notation for the cluster-radius term and the pessimism additive should be introduced once and used consistently; multiple ad-hoc symbols appear in the abstract and early sections.
The numerical examples would benefit from an explicit table reporting both solution time and objective value for the full Wasserstein DRO baseline, the clustered version at several cluster counts, and classical RO, so that the claimed “little-to-no effect on solution quality” can be verified quantitatively.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The two major comments raise important points about the generality of the finite-sample bounds and the linear-uncertainty invariance result. We address each below and will revise the manuscript to strengthen the exposition.

read point-by-point responses

Referee: [Abstract and finite-sample guarantee section] Abstract and the section deriving the finite-sample bounds: the claim that performance guarantees plus explicit control of additional pessimism hold for any clustering procedure is load-bearing for the central trade-off. The skeptic correctly flags that if a clustering distorts the empirical measure such that the Wasserstein contribution of merged points is not linearly bounded by the cluster radius, the additive pessimism term may fail to restore the guarantee; the manuscript must exhibit the precise inequality that absorbs all such distortions.

Authors: The finite-sample guarantee is obtained via the triangle inequality W(P*, hat{P}_C) <= W(P*, hat{P}) + W(hat{P}, hat{P}_C), where hat{P}_C is the empirical measure on the cluster centers. The second term is bounded above by the sum over clusters of (radius of cluster i) times (number of points in cluster i divided by total sample size). This bound depends only on the radii and holds for an arbitrary clustering procedure; it does not require any further assumption on how points are merged. The explicit additive pessimism term we introduce is precisely this quantity (or its maximum-radius relaxation). We will insert the displayed inequality W(hat{P}, hat{P}_C) <= sum_i (r_i * |C_i| / N) immediately before the statement of the main guarantee theorem in the revision. revision: yes
Referee: [Theorem on linear uncertainty] The theorem establishing conditions under which clustering does not affect the optimal solution (linear uncertainty case): the proof appears to rely on the uncertainty set being defined via cluster centers; it is unclear whether the same invariance holds when the clustering procedure is chosen adversarially or when the radius term interacts with the linear structure in a non-trivial way. This needs an explicit counter-example or a tightened hypothesis.

Authors: The invariance theorem assumes only that the uncertainty set is the union of balls centered at the cluster centers (with the given radii) and that the uncertain parameters enter the constraints linearly. Under these conditions the worst-case value of each linear expression is attained at a cluster center; the radius contribution can be moved to the right-hand side without changing the feasible set for the decision variable. Because the statement is for any fixed clustering, the result is independent of how the clusters were obtained and therefore holds even if the clustering procedure is chosen adversarially (provided the clustering is performed on the data before the optimization problem is solved). We will add a short remark after the theorem clarifying that the hypothesis already covers arbitrary clustering procedures and that no further restriction on the clustering algorithm is required. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation introduces independent bounds and parameterization

full rationale

The paper defines mean robust optimization via new uncertainty sets constructed from cluster centers, then states finite-sample performance guarantees and an explicit additive control on extra pessimism induced by clustering. These guarantees are presented as derived results rather than fitted quantities or self-citations; the interpolation between robust optimization and Wasserstein DRO is achieved by varying the (user-chosen) number of clusters, which is an explicit design parameter, not a quantity recovered from the output. No equation or claim reduces by construction to a prior fitted value or to a self-citation chain whose validity depends on the present work. The central claims therefore remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claim rests on the existence of finite-sample bounds that survive clustering and on the ability to control extra pessimism without further assumptions on the data-generating process.

free parameters (1)

number of clusters
Varying the number of clusters is the explicit control knob that trades computation for conservatism; its value is chosen by the user.

axioms (1)

domain assumption Finite-sample performance guarantees can be established for uncertainty sets constructed from arbitrary clustering procedures.
Stated in the abstract as a shown result; the specific statistical assumptions required for the bounds are not visible.

pith-pipeline@v0.9.0 · 5709 in / 1325 out tokens · 24263 ms · 2026-05-24T11:06:49.405264+00:00 · methodology

Mean Robust Optimization

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)