Deep Reinforcement Learning from Human Preferences

Advances in Neural Information Processing Systems

3 Pith papers cite this work.

representative citing papers

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

DRRO for RLHF minimizes worst-case regret relative to the best policy under Wasserstein reward perturbations. For the promptwise simplex model, this yields an exact solution to the inner problem and a water-filling policy structure; the paper also gives a practical policy-gradient algorithm.

Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment

cs.AI · 2026-05-01 · unverdicted · novelty 5.0

In agentic AI, safety and fairness are governed by interaction topology rather than model scale or alignment.

Ranking Abuse via Strategic Pairwise Data Perturbations

cs.LG · 2026-04-20 · unverdicted · novelty 4.0

MLE-based pairwise ranking systems exhibit a sharp phase-transition vulnerability in which limited strategic perturbations can substantially alter global rankings.

