DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations. For the promptwise simplex model, this yields an exact inner solution and a water-filling policy structure, along with a practical policy-gradient algorithm.
Advances in Neural Information Processing Systems
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3
verdicts
UNVERDICTED 3
representative citing papers
In agentic AI, safety and fairness are governed by interaction topology rather than model scale or alignment.
MLE-based pairwise ranking systems exhibit a sharp phase-transition vulnerability where limited strategic perturbations can substantially alter global rankings.
citing papers explorer
- Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment
In agentic AI, safety and fairness are governed by interaction topology rather than model scale or alignment.