Wasserstein distributionally robust regret optimization. arXiv preprint arXiv:2504.10796

Lukas-Benedikt Fiechtner, Jose Blanchet · 2025 · arXiv 2504.10796

2 Pith papers cite this work.

representative citing papers

Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity

math.OC · 2026-04-07 · unverdicted · novelty 7.0

The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.

citing papers explorer

Showing 2 of 2 citing papers.

Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity

math.OC · 2026-04-07 · unverdicted · none · ref 17

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

cs.LG · 2026-04-30 · unverdicted · none · ref 9