Minimax regret optimization for robust machine learning under distribution shift. arXiv preprint arXiv:2202.05436

Alekh Agarwal, Tong Zhang · 2022 · arXiv 2202.05436

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity

math.OC · 2026-04-07 · unverdicted · novelty 7.0

The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.

citing papers explorer

Showing 3 of 3 citing papers.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning · math.ST · 2026-05-10 · unverdicted · none · ref 126
Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity · math.OC · 2026-04-07 · unverdicted · none · ref 4
Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback · cs.LG · 2026-04-30 · unverdicted · none · ref 2
