Advances in Neural Information Processing Systems, year=
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
citing papers explorer
- Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback
  DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations, yielding an exact inner solution and water-filling policy structure for the promptwise simplex model plus a practical policy-gradient algorithm.
- Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment
  In agentic AI, safety and fairness are governed by interaction topology rather than model scale or alignment.
- Ranking Abuse via Strategic Pairwise Data Perturbations
  MLE-based pairwise ranking systems exhibit a sharp phase-transition vulnerability where limited strategic perturbations can substantially alter global rankings.