Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
Minimax regret optimization for robust machine learning under distribution shift.arXiv preprint arXiv:2202.05436
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.
DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.
citing papers explorer
-
The Statistical Cost of Adaptation in Multi-Source Transfer Learning
Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
-
Distributionally Robust Regret Optimal LQR with Common Stage-Law Ambiguity
The multistage DRRO-LQR problem over linear disturbance-feedback policies admits an exact SDP reformulation whose solution is the nominal certainty-equivalent LQR law plus a strictly causal empirical-mean correction.
-
Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.