pith. sign in

arxiv: 2605.23854 · v1 · pith:MCPTHF4Hnew · submitted 2026-05-22 · 💻 cs.LG · math.ST· stat.ML· stat.TH

Entrywise Error Bounds for Spectral Ranking with Semi-Random Adversaries

Pith reviewed 2026-05-25 04:42 UTC · model grok-4.3

classification 💻 cs.LG math.STstat.MLstat.TH
keywords spectralestimationgraphsperformanceadversaryedgeserrorsampled
0
0 comments X

The pith

Entrywise error bounds for spectral methods on BTL ranking under semi-random adversaries depend on graph spectral properties and can be recovered asymptotically via reweighting to restore the spectral gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Ranking items from pairwise comparisons often uses the Bradley-Terry-Luce model, where each pair has a win probability based on hidden strengths. Spectral methods turn the comparison graph into an eigenvector problem to estimate those strengths. When an adversary makes some edges more likely to appear, the standard unweighted spectral estimator's accuracy varies with how well-connected the resulting graph is. The paper shows that reweighting the observed edges to offset the boosted probabilities can bring the error bounds back to levels seen under uniform random sampling, supported by theory and simulations.

Core claim

asymptotic performance approaching that of uniformly sampled graphs can be recovered by appropriately reweighting the observed edges to counteract the adversary and restore the spectral gap

Load-bearing premise

The reweighting procedure can be performed with knowledge or accurate estimation of which edges were boosted by the adversary so that the effective graph regains a sufficient spectral gap (abstract does not specify how the reweighting weights are obtained).

Figures

Figures reproduced from arXiv: 2605.23854 by Anuran Makur, Dongmin Lee, Japneet Singh.

Figure 1
Figure 1. Figure 1: A visualization of how adding an edge to a graph can [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Experiment 1 (Subfigures (a)–(c)): 3-block SBM with edge sampling probabilities between each pair of blocks given by the matrix (7). Experiment 2 (Subfigures (d)–(f)): Erdős-Rényi model with edge sampling probability 𝑝 = 2 log(𝑛)/𝑛. 𝑃 =        1 1 0 1 2 log(𝑛) 𝑛 2 log(𝑛) 𝑛 0 2 log(𝑛) 𝑛 2 log(𝑛) 𝑛        , (7) where 𝑛 ∈ {30, 45, 60, 75, 90, 105, 120, 135} is the number of items (note that we c… view at source ↗
read the original abstract

Bradley-Terry-Luce (BTL) model estimation is a well-established strategy to rank a collection of items given a dataset of pairwise comparisons. Although the theoretical performance of BTL estimation methods, such as spectral and maximum likelihood estimation, is well studied in the regime of uniformly sampled graphs, generalizing such results to a wider class of random graphs has proved challenging. In this work, we investigate the entry-wise error of spectral algorithms against a semi-random adversary that can arbitrarily boost the sampling probabilities of certain edges. We find that the performance of the unweighted spectral method is heavily dependent on the spectral properties of the generated graph. Furthermore, we show that asymptotic performance approaching that of uniformly sampled graphs can be recovered by appropriately reweighting the observed edges to counteract the adversary and restore the spectral gap. Finally, we provide numerical simulations that support our theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to derive entrywise error bounds for spectral algorithms estimating rankings under the Bradley-Terry-Luce model when the comparison graph is subject to a semi-random adversary that arbitrarily boosts sampling probabilities of selected edges. It shows that the unweighted spectral method's performance depends on the spectral properties of the realized graph, but that appropriately reweighting the observed edges restores the spectral gap and recovers asymptotic performance comparable to the uniform-sampling case; the claims are supported by theoretical analysis and numerical simulations.

Significance. If the reweighting procedure can be implemented from data alone while preserving the restored gap, the result would meaningfully extend spectral ranking guarantees beyond uniform graphs to more realistic semi-random models. The provision of entrywise (rather than only norm) bounds and the simulation support are positive features.

major comments (1)
  1. The central claim that reweighting recovers uniform-graph performance (abstract and main theorem) requires an explicit, non-oracle procedure for selecting the per-edge weights. The manuscript must specify the estimator or algorithm used to obtain these weights from the observed comparisons alone and prove that any estimation error does not destroy the restored spectral gap; without this, the recovery result is conditional on unavailable information.
minor comments (1)
  1. The abstract states that simulations support the findings but provides no information on the number of trials, graph sizes, or error-bar reporting; adding these details would make the empirical evidence easier to assess.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The single major comment is addressed point-by-point below.

read point-by-point responses
  1. Referee: The central claim that reweighting recovers uniform-graph performance (abstract and main theorem) requires an explicit, non-oracle procedure for selecting the per-edge weights. The manuscript must specify the estimator or algorithm used to obtain these weights from the observed comparisons alone and prove that any estimation error does not destroy the restored spectral gap; without this, the recovery result is conditional on unavailable information.

    Authors: We agree that the current manuscript presents the reweighting construction in an oracle setting that assumes knowledge of the ideal per-edge weights restoring the spectral gap. This is a valid observation. In the revision we will add an explicit, fully data-driven procedure that estimates the required weights from the observed comparison counts and node degrees alone. We will also supply a supplementary concentration argument showing that the plug-in estimation error is of lower order than the restored spectral gap and therefore does not invalidate the entrywise error bounds with high probability. The revised main theorem will be stated for this estimated weighting. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation remains self-contained against external spectral gap assumptions

full rationale

The abstract and provided context present entrywise error bounds for spectral ranking under a semi-random adversary, with a corrective reweighting step to restore the spectral gap. No quoted equations or steps reduce a claimed prediction or bound to a fitted quantity defined from the same data by construction. The reweighting is described as an external corrective procedure rather than a self-referential fit. No self-citation chains, uniqueness theorems imported from the same authors, or ansatzes smuggled via prior work appear in the given text. The central claim relies on standard spectral properties of graphs and does not collapse to renaming or self-definition. This is the expected non-finding for a paper whose core bounds are stated in terms of independent graph-theoretic quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions of the BTL model (pairwise probabilities determined by latent scores) and spectral graph theory (existence of a usable spectral gap after reweighting). No free parameters or invented entities are mentioned in the abstract.

axioms (2)
  • domain assumption Bradley-Terry-Luce model holds for the underlying comparison probabilities
    Invoked implicitly as the generative model for the observed comparisons.
  • domain assumption The reweighted graph admits a sufficient spectral gap for the error bounds to hold
    Central to the recovery claim; location is the reweighting step described in the abstract.

pith-pipeline@v0.9.0 · 5683 in / 1247 out tokens · 30067 ms · 2026-05-25T04:42:14.470379+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

  1. [1]

    Eldridge S Adams. 2005. Bayesian analysis of linear dominance hierarchies. Animal Behaviour69, 5 (May 2005), 1191–1201

  2. [2]

    Julian Asilis, Xi Chen, Dutch Hansen, and Shang-Hua Teng. 2026. Semi-Random Graphs, Robust Asymmetry, and Reconstruction. InProceedings of the 17th Innova- tions in Theoretical Computer Science Conference (ITCS). Milan, Italy, 12:1–12:21

  3. [3]

    Linas Baltrunas, Tadas Makcinskas, and Francesco Ricci. 2010. Group recommen- dations with rank aggregation and collaborative filtering. InProceedings of the Fourth ACM Conference on Recommender Systems. Barcelona, Spain, 119––126

  4. [4]

    Daniel Barry and JA Hartigan. 1993. Choice models for predicting divisional winners in major league baseball.J. Amer. Statist. Assoc.88, 423 (1993), 766–774

  5. [5]

    Avrim Blum and Joel Spencer. 1995. Coloring random and semi-random 𝑘- colorable graphs.Journal of Algorithms19, 2 (November 1995), 204–234

  6. [6]

    Heejong Bong, Wanshan Li, Shamindra Shrotriya, and Alessandro Rinaldo. 2020. Nonparametric estimation in the dynamic Bradley-Terry model. InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS). 3317–3326

  7. [7]

    Stéphane Boucheron, Gábor Lugosi, and Olivier Bousquet. 2004. Concentration Inequalities. InAdvanced Lectures on Machine Learning, Olivier Bousquet, Ulrike von Luxburg, and Gunnar Rätsch (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 208–240

  8. [8]

    Ralph Allan Bradley and Milton E. Terry. 1952. Rank Analysis of Incomplete Block Designs. I. The Method of Paired Comparisons.Biometrika39, 3/4 (December 1952), 324–345

  9. [9]

    François Caron and Arnaud Doucet. 2012. Efficient Bayesian Inference for Generalized Bradley-Terry Models.Journal of Computational and Graphical Statistics21, 1 (March 2012), 174–196

  10. [10]

    Manuela Cattelan, Cristiano Varin, and David Firth. 2013. Dynamic Bradley- Terry modelling of sports tournaments.Journal of the Royal Statistical Society: Series C (Applied Statistics)62, 1 (January 2013), 135–150

  11. [11]

    Sourav Chatterjee. 2015. Matrix Estimation by Universal Singular Value Thresh- olding.The Annals of Statistics43, 1 (February 2015), 177–214

  12. [12]

    Pinhan Chen, Chao Gao, and Anderson Y Zhang. 2022. Partial recovery for top-𝑘 ranking: optimality of MLE and suboptimality of the spectral method.The Annals of Statistics50, 3 (June 2022), 1618–1652

  13. [13]

    Yanxi Chen. 2023. Ranking from Pairwise Comparisons in General Graphs and Graphs with Locality. (April 2023). https://arxiv.org/abs/2304.06821 arXiv:2304.06821 [stat.ML]

  14. [14]

    Yuxin Chen, Jianqing Fan, Cong Ma, and Kaizheng Wang. 2019. Spectral Method and Regularized MLE are both Optimal for Top- 𝐾 Ranking.The Annals of Statistics47, 4 (2019), 2204–2235

  15. [15]

    Yuxin Chen and Changho Suh. 2015. Spectral MLE: Top- 𝑘 rank aggregation from pairwise comparisons. InProceedings of the 32nd International Conference on Machine Learning. Lille, France, 371–380

  16. [16]

    Yu Cheng and Rong Ge. 2018. Non-convex matrix completion against a semi- random adversary. InProceedings of the 31st Conference On Learning Theory (COLT). Stockholm, Sweden, 1362–1394

  17. [17]

    Gonzalez, and Ion Stoica

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. InProceedings of the 41st International Conference on Machine Learning (ICML). Vienna, Austria, 8359–8388

  18. [18]

    Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. InPro- ceedings in the Advances of Neural Information Processing Systems 30 (NIPS). Long Beach, CA, USA, 4299–4307

  19. [19]

    Ronen Eldan, Miklós Z Rácz, and Tselil Schramm. 2017. Braess’s paradox for the spectral gap in random graphs and delocalization of eigenvectors.Random Structures & Algorithms50, 4 (July 2017), 584–611

  20. [20]

    Arpad E. Elo. 1978.The Rating of Chess Players, Past and Present. Arco, New York, NY, USA

  21. [21]

    Jianqing Fan, Zhipeng Lou, Weichen Wang, and Mengxin Yu. 2025. Ranking inferences based on the top choice of multiway comparisons.J. Amer. Statist. Assoc.120, 549 (March 2025), 237–250

  22. [22]

    Jianqing Fan, Zhipeng Lou, Weichen Wang, and Mengxin Yu. 2026. Spectral Rank- ing Inferences based on General Multiway Comparisons.Operations Research74, 1 (January–February 2026), 161–180

  23. [23]

    Uriel Feige and Joe Kilian. 2001. Heuristics for semirandom graph problems.J. Comput. System Sci.63, 4 (December 2001), 639–671

  24. [24]

    Miroslav Fiedler. 1973. Algebraic connectivity of graphs.Czechoslovak Mathe- matical Journal23, 2 (1973), 298–305

  25. [25]

    L. R. Ford. 1957. Solution of a Ranking Problem from Binary Comparisons.The American Mathematical Monthly64, 8 Part 2 (October 1957), 28–33

  26. [26]

    Chao Gao, Yandi Shen, and Anderson Y. Zhang. 2023. Uncertainty quantification in the Bradley-Terry-Luce model.Information and Inference12, 2 (June 2023), 1073–1140

  27. [27]

    Xing Gao and Yu Cheng. 2023. Robust matrix sensing in the semi-random model. InProceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS). New Orleans, LA, USA, 63385–63408

  28. [28]

    John Guiver and Edward Snelson. 2009. Bayesian inference for Plackett-Luce ranking models. InProceedings of the 26th Annual International Conference on Machine Learning (ICML). Montreal, QC, Canada, 377–384

  29. [29]

    Julien Hendrickx, Alex Olshevsky, and Venkatesh Saligrama. 2020. Minimax Rate for Learning From Pairwise Comparisons in the BTL Model. InProceedings of the 37th Annual International Conference on Machine Learning (ICML), Vol. 119. Vienna, Austria, 4193–4202

  30. [30]

    Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt

    Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. 1983. Sto- chastic blockmodels: First steps.Social Networks5, 2 (June 1983), 109–137

  31. [31]

    David R. Hunter. 2004. MM Algorithms for Generalized Bradley-Terry Models. The Annals of Statistics32, 1 (February 2004), 384–406

  32. [32]

    Ali Jadbabaie, Anuran Makur, and Devavrat Shah. 2020. Estimation of Skill Distri- bution from a Tournament. InProceedings of the Advances in Neural Information Processing Systems 33 (NeurIPS). Vancouver, BC, Canada, 8418–8429

  33. [33]

    Ali Jadbabaie, Anuran Makur, and Devavrat Shah. 2024. Estimation of skill distributions.IEEE Transactions on Information Theory70, 9 (September 2024), 6447–6480

  34. [34]

    Arun Jambulapati, Jerry Li, Christopher Musco, Aaron Sidford, and Kevin Tian

  35. [35]

    (November 2021)

    Fast and Near-Optimal Diagonal Preconditioning. (November 2021). arXiv:2008.01722v2 [math.OC]

  36. [36]

    Minje Jang, Sunghyun Kim, and Changho Suh. 2018. Top-𝐾 rank aggregation from 𝑀-wise comparisons.IEEE Journal of Selected Topics in Signal Processing 12, 5 (May 2018), 989–1004

  37. [37]

    Jonathan Kelner, Jerry Li, Allen X Liu, Aaron Sidford, and Kevin Tian. 2023. Semi- random sparse recovery in nearly-linear time. InProceedings of the Thirty Sixth Annual Conference on Learning Theory (COLT). Bangalore, India, 2352–2398

  38. [38]

    Levin, Yuval Peres, and Elizabeth L

    David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. 2009.Markov Chains and Mixing Times(first ed.). American Mathematical Society

  39. [39]

    Wanshan Li, Shamindra Shrotriya, and Alessandro Rinaldo. 2022. ℓ∞-Bounds of the MLE in the BTL Model under General Comparison Graphs. InProceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence. Eindhoven, The Netherlands, 1178–1187

  40. [40]

    Fang, and Junwei Lu

    Yue Liu, Ethan X. Fang, and Junwei Lu. 2023. Lagrangian inference for ranking problems.Operations Research71, 1 (January–February 2023), 202–223

  41. [41]

    Matthias Löwe and Sara Terveer. 2025. Spectral properties of the stochastic block model and their application to hitting times of random walks. (April 2025). https://arxiv.org/abs/2401.07896v3 arXiv:2401.07896v3 [math.PR]

  42. [42]

    1959.Individual Choice Behavior: A Theoretical Analysis

    Robert Duncan Luce. 1959.Individual Choice Behavior: A Theoretical Analysis. John Wiley & Sons Inc., New York, NY, USA

  43. [43]

    Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. 2012. Approximation algorithms for semi-random partitioning problems. InProceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing. New York, NY, USA, 367–384

  44. [44]

    Anuran Makur and Japneet Singh. 2025. Hypothesis Testing for Generalized Thurstone Models. InProceedings of the 42nd International Conference on Machine Learning (ICML). Vancouver, BC, Canada, 42730–42764

  45. [45]

    Anuran Makur and Japneet Singh. 2025. Minimax Hypothesis Testing for the Bradley–Terry–Luce Model.IEEE Transactions on Information Theory71, 12 (December 2025), 9163–9202

  46. [46]

    Kenneth Massey. 1997. Statistical models applied to the rating of sports teams. (1997). Bluefield College

  47. [47]

    J. N. S. Matthews and K. P. Morris. 1995. An Application of Bradley-Terry-Type Models to the Measurement of Pain.Journal of the Royal Statistical Society Series C (Applied Statistics)44, 2 (1995), 243–255. 9 Lee, Makur, and Singh

  48. [48]

    Sahand Negahban, Sewoong Oh, and Devavrat Shah. 2017. Rank Centrality: Rank- ing from Pairwise Comparisons.Operations Research65, 1 (January–February 2017), 266–287

  49. [49]

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schul- man, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human...

  50. [50]

    1999.The PageRank Citation Ranking: Bringing Order to the Web

    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999.The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66. Stanford InfoLab

  51. [51]

    Plackett

    Robin L. Plackett. 1975. The Analysis of Permutations.Journal of the Royal Statistical Society, Series C (Applied Statistics)24, 2 (1975), 193–202

  52. [52]

    Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kan- nan Ramchandran, and Martin J

    Nihar B. Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kan- nan Ramchandran, and Martin J. Wainwright. 2016. Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence.Journal of Machine Learning Research17, 58 (2016), 1–47

  53. [53]

    Gordon Simons and Yi-Ching Yao. 1999. Asymptotics when the number of parameters tends to infinity in the Bradley-Terry model for paired comparisons. The Annals of Statistics27, 3 (June 1999), 1041–1060

  54. [54]

    Stephen M. Stigler. 1994. Citation Patterns in the Journals of Statistics and Probability.Statist. Sci.9, 1 (February 1994), 94–108

  55. [55]

    Joel A Tropp. 2015. An introduction to matrix concentration inequalities.Foun- dations and Trends in Machine Learning8, 1-2 (May 2015), 1–230

  56. [56]

    Yuepeng Yang, Antares Chen, Lorenzo Orecchia, and Cong Ma. 2024. Top- 𝐾 ranking with a monotone adversary. InProceedings of the Thirty Seventh Conference on Learning Theory (COLT), Shipra Agrawal and Aaron Roth (Eds.). Edmonton, AB, Canada, 5123–5162

  57. [57]

    𝑎≥𝑏with probability at least1−𝑂(𝑛 −𝑡 )

    Ernst Zermelo. 1929. Die Berechnung der Turnier-Ergebnisse als ein Maxi- mumproblem der Wahrscheinlichkeitsrechnung.Mathematische Zeitschrift29, 1 (December 1929), 436–460. A Introduction to Appendices In the appendices, we prove the various theorems and propositions introduced in the body. We use the shorthand 𝑎≥ 1−𝑂(𝑛 −𝑡 ) 𝑏 to say “𝑎≥𝑏with probability ...