Advances in Neural Information Processing Systems, volume =
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
SCARV improves global and local stability of sample rankings in redundant NLP datasets by layering robust multi-seed aggregation with structure-aware allocation over redundancy clusters.
citing papers explorer
- A Unified Perturbation Framework for Analyzing Leaderboard Stability and Manipulation
  A perturbation framework with Drop/Add/Flip and player-removal operations demonstrates that Bradley-Terry leaderboards are non-robust to sub-1% targeted changes that alter top ranks, Kendall tau, and confidence intervals.
- SCARV: Structure-Constrained Aggregation for Stable Sample Ranking in Redundant NLP Datasets
  SCARV improves global and local stability of sample rankings in redundant NLP datasets by layering robust multi-seed aggregation with structure-aware allocation over redundancy clusters.