Wasserstein Distributionally Robust Regret Optimization
Distributionally robust optimization (DRO) is widely used for decision-making under uncertainty, but its adversarial focus on worst-case loss can lead to overly conservative policies. To mitigate this, we study ex-ante Distributionally Robust Regret Optimization (DRRO) with Wasserstein ambiguity sets, designed to balance robustness with upside potential. We develop a theory of Wasserstein DRRO (WDRRO) paralleling that of Wasserstein DRO (WDRO). Under smoothness and regularity conditions, WDRRO selects among the optima of empirical risk minimization (ERM) by a first-order gradient-discrepancy rule. If the ERM optimizer is unique, the first-order sensitivity vanishes and a second-order expansion governs deviations. For convex quadratics, ERM and DRRO coincide for any radius. We then study regimes where these assumptions fail: nondifferentiable max-affine losses, discrete reference distributions, and larger radii, where WDRRO can differ from both ERM and WDRO. We show that computing WDRRO regret is NP-hard even without bilinear terms. Nevertheless, we develop exact algorithms, a tractable convex relaxation with guarantees, and experiments showing tightness and loss-dependent behavior.
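For orientation, the three problems the abstract compares can be written side by side. This is a sketch in assumed notation that the abstract itself does not give: \ell(\theta, \xi) is the loss of decision \theta under outcome \xi, \hat{P}_n is the reference (empirical) distribution, W is a Wasserstein distance, and \delta is the ambiguity radius. The paper's own definitions may differ in detail.

```latex
% Assumed notation, not taken from the abstract.
\begin{align*}
\mathcal{B}_\delta(\hat{P}_n) &= \{\, P : W(P, \hat{P}_n) \le \delta \,\}
  && \text{(Wasserstein ambiguity set)} \\
\text{ERM:}\quad
  & \min_{\theta}\; \mathbb{E}_{\hat{P}_n}\!\left[\ell(\theta,\xi)\right] \\
\text{WDRO:}\quad
  & \min_{\theta}\; \sup_{P \in \mathcal{B}_\delta(\hat{P}_n)}
    \mathbb{E}_{P}\!\left[\ell(\theta,\xi)\right] \\
\text{WDRRO:}\quad
  & \min_{\theta}\; \sup_{P \in \mathcal{B}_\delta(\hat{P}_n)}
    \left\{ \mathbb{E}_{P}\!\left[\ell(\theta,\xi)\right]
      - \inf_{\theta'} \mathbb{E}_{P}\!\left[\ell(\theta',\xi)\right] \right\}
\end{align*}
```

Under this reading, WDRO penalizes the worst-case expected loss itself, while WDRRO penalizes the worst-case gap to the best decision for the same distribution P, which is the ex-ante regret. At \delta = 0 the regret term subtracts a constant that does not depend on \theta, so all three problems share the ERM minimizers.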
Forward citations
Cited by 3 Pith papers
- Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity
  The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.
- Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
  DRRO for RLHF replaces worst-case value with worst-case regret in Wasserstein DRO, producing an exact water-filling solution under l1 ambiguity and a practical sampled-bonus algorithm that reduces proxy over-optimization.
- Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
  DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a...