arxiv: 2504.10796 · v4 · pith:XSPM4HSLnew · submitted 2025-04-15 · 🧮 math.OC · cs.LG

Wasserstein Distributionally Robust Regret Optimization

classification: math.OC, cs.LG
keywords: wasserstein, wdrro, distributionally, drro, optimization, regret, robust, convex

Distributionally robust optimization (DRO) is widely used for decision-making under uncertainty, but its adversarial focus on worst-case loss can lead to overly conservative policies. To mitigate this, we study ex-ante Distributionally Robust Regret Optimization (DRRO) with Wasserstein ambiguity sets, designed to balance robustness with upside potential. We develop a theory of Wasserstein DRRO (WDRRO) paralleling Wasserstein DRO. Under smoothness and regularity, WDRRO selects among empirical risk minimization (ERM) optima by a first-order gradient-discrepancy rule. If the ERM optimizer is unique, first-order sensitivity vanishes and a second-order expansion governs deviations. For convex quadratics, ERM and DRRO coincide for any radius. We then study regimes where these assumptions fail: nondifferentiable max-affine losses, discrete references, and larger radii, where WDRRO can differ from ERM and WDRO. We show that computing WDRRO regret is NP-hard even without bilinear terms. Nevertheless, we develop exact algorithms, a tractable convex relaxation with guarantees, and experiments showing tightness and loss-dependent behavior.
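
For orientation, the contrast the abstract draws between worst-case loss and worst-case regret can be sketched as follows. The notation is an assumption of this sketch and is not taken from the listing: \theta is the decision, \xi the uncertain outcome, \ell the loss, \hat P_n the empirical (reference) distribution, W a Wasserstein distance, and \delta the ambiguity radius. The paper's own formulation may differ in details.

```latex
% Sketch only: symbols are illustrative assumptions, not the paper's notation.
\begin{align*}
\text{WDRO:}\quad
  & \min_{\theta \in \Theta}\ \sup_{P:\, W(P, \hat P_n) \le \delta}\
    \mathbb{E}_{P}\big[\ell(\theta, \xi)\big] \\
\text{WDRRO:}\quad
  & \min_{\theta \in \Theta}\ \sup_{P:\, W(P, \hat P_n) \le \delta}\
    \Big( \mathbb{E}_{P}\big[\ell(\theta, \xi)\big]
          - \inf_{\theta' \in \Theta} \mathbb{E}_{P}\big[\ell(\theta', \xi)\big] \Big)
\end{align*}
```

Under this sketch, setting \delta = 0 leaves only P = \hat P_n, so the WDRRO objective reduces to the empirical optimality gap and every ERM minimizer attains regret zero.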

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity

    math.OC 2026-04 unverdicted novelty 7.0

    The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.

  2. Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

    cs.LG 2026-04 unverdicted novelty 6.0

    DRRO for RLHF replaces worst-case value with worst-case regret in Wasserstein DRO, producing an exact water-filling solution under l1 ambiguity and a practical sampled-bonus algorithm that reduces proxy over-optimization.

  3. Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

    cs.LG 2026-04 unverdicted novelty 6.0

    DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a...