Wasserstein distributionally robust regret optimization. arXiv preprint arXiv:2504.10796
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity
  The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.
- Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
  DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.