2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
citing papers explorer
- A Unified Perturbation Framework for Analyzing Leaderboard Stability and Manipulation
  A perturbation framework with Drop/Add/Flip and player-removal operations demonstrates that Bradley-Terry leaderboards are non-robust to sub-1% targeted changes that alter top ranks, Kendall tau, and confidence intervals.
- Reinforcement Learning from Human Feedback: A Statistical Perspective
  A statistical survey of RLHF for LLM alignment that connects preference learning and policy optimization to models like Bradley-Terry-Luce while reviewing methods, extensions, and open challenges.