Efficient Portfolio Selection through Preference Aggregation with Quicksort and the Bradley--Terry Model
Pith reviewed 2026-05-22 21:05 UTC · model grok-4.3
The pith
Aggregating agent pairwise win probabilities with Bradley-Terry and Quicksort produces higher-quality project portfolios than existing aggregation methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When agents express their project evaluations as pairwise win probabilities and these probabilities are aggregated under the Bradley-Terry model, the resulting rankings, obtained efficiently with Quicksort-based procedures, yield portfolios whose total estimated benefit exceeds that of portfolios produced by the two leading existing aggregation techniques; the same framework also permits sampling that substantially lowers the number of comparisons needed.
What carries the argument
The Bradley-Terry model that converts aggregated pairwise win probabilities into a consistent project ranking, combined with Quicksort to order projects while limiting direct comparisons.
If this is right
- Innovation-project selection can achieve higher expected long-term returns with the same number of agent inputs.
- Research-funding decisions can identify proposals that collectively deliver greater benefit.
- Participatory-budgeting processes can allocate public resources to a more valuable subset of projects.
- The number of pairwise comparisons required can be reduced by combining the aggregation rules with sampling.
- The approach can be deployed in organizations that already collect pairwise preference data from stakeholders.
Where Pith is reading between the lines
- If agents' probability estimates contain correlated miscalibrations across projects, the performance advantage may shrink or reverse in field data.
- The same aggregation structure could be applied to repeated decisions where new information updates the win probabilities over time.
- Historical records of funded versus unfunded projects could serve as an external benchmark to measure whether the simulated gains appear in practice.
Load-bearing premise
Agents supply pairwise win probabilities that faithfully reflect their true long-term benefit estimates and these probabilities can be aggregated without systematic bias that would change the final portfolio ranking.
What would settle it
A controlled test on data with known true project benefits in which any of the proposed methods selects a portfolio whose realized total benefit is lower than the total benefit of the portfolio selected by either of the two strongest existing aggregation methods.
Figures
read the original abstract
How to allocate limited resources to projects that will yield the greatest long-term benefits is a problem that often arises in decision-making under uncertainty. For example, organizations may need to evaluate and select innovation projects with risky returns. Similarly, when allocating resources to research projects, funding agencies are tasked with identifying the most promising proposals based on idiosyncratic criteria. Finally, in participatory budgeting, a local community may need to select a subset of public projects to fund. Regardless of context, agents must estimate the uncertain values of a potentially large number of projects. Developing parsimonious methods to compare these projects, and aggregating agent evaluations so that the overall benefit is maximized, are critical in assembling the best project portfolio. Unlike in standard sorting algorithms, evaluating projects on the basis of uncertain long-term benefits introduces additional complexities. We propose comparison rules based on Quicksort and the Bradley--Terry model, which connects rankings to pairwise "win" probabilities. In our model, each agent determines win probabilities of a pair of projects based on his or her specific evaluation of the projects' long-term benefit. The win probabilities are then appropriately aggregated and used to rank projects. Several of the methods we propose perform better than the two most effective aggregation methods currently available. Additionally, our methods can be combined with sampling techniques to significantly reduce the number of pairwise comparisons. We also discuss how the Bradley--Terry portfolio selection approach can be implemented in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes methods for selecting portfolios of projects under uncertainty by having agents provide pairwise win probabilities based on long-term benefit estimates, then aggregating these via the Bradley-Terry model combined with Quicksort-inspired comparison rules to produce rankings. It claims that several of the proposed aggregation methods outperform the two strongest existing baselines, that the approach can be paired with sampling to reduce the number of required comparisons, and that the framework is practical for applications such as innovation project selection, research funding, and participatory budgeting.
Significance. If the performance claims hold under realistic conditions, the work would provide a parsimonious, comparison-based alternative to full utility elicitation for portfolio construction, potentially lowering cognitive load on agents while still maximizing aggregate benefit. The explicit modeling of win probabilities via Bradley-Terry and the efficiency gains from Quicksort-style partitioning are technically attractive features; however, the significance is currently limited by the absence of evidence that the reported gains survive when pairwise probabilities contain the estimation noise or bias that human agents are likely to introduce.
major comments (3)
- [Experimental evaluation (likely §4 or §5)] The central performance claim (several proposed methods outperform the two strongest baselines) is load-bearing yet unsupported by any description of the test data, the procedure used to generate or elicit the p_ij values, the identity of the two baseline aggregation methods, or the statistical tests employed. This information is required to evaluate whether the reported superiority is robust or an artifact of the experimental design.
- [Model assumptions and experimental setup] The method rests on the assumption that agent-supplied pairwise win probabilities are unbiased estimators of true long-term benefit differences. The skeptic's concern is therefore material: if the experiments generate p_ij from a known ground-truth distribution without added noise or systematic bias (recency, optimism, inconsistent scaling), the outperformance may not survive when the input probabilities reflect realistic human judgment error. A sensitivity analysis or noisy-input experiment is needed to substantiate the claim.
- [Sampling and efficiency claims] The manuscript states that the methods 'can be combined with sampling techniques to significantly reduce the number of pairwise comparisons,' but provides no quantitative results on the trade-off between comparison count, ranking accuracy, and final portfolio quality. Without such results, the efficiency claim remains unverified.
minor comments (2)
- [Method description] Notation for the aggregated win probability and the final ranking rule should be introduced with explicit equations rather than prose descriptions to improve reproducibility.
- [Abstract and §1] The abstract and introduction would benefit from a concise statement of the two baseline aggregation methods being outperformed, so readers can immediately contextualize the contribution.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important areas where the experimental description and robustness checks can be strengthened. We address each major comment below and will incorporate revisions to improve clarity and substantiation of the claims.
read point-by-point responses
-
Referee: [Experimental evaluation (likely §4 or §5)] The central performance claim (several proposed methods outperform the two strongest baselines) is load-bearing yet unsupported by any description of the test data, the procedure used to generate or elicit the p_ij values, the identity of the two baseline aggregation methods, or the statistical tests employed. This information is required to evaluate whether the reported superiority is robust or an artifact of the experimental design.
Authors: We agree that the experimental section requires more explicit documentation to allow full evaluation. The test data consists of synthetic project sets with ground-truth long-term benefits drawn from log-normal distributions; p_ij values are derived directly from these benefits via a logistic mapping. The two strongest baselines are the Borda count aggregation and the Copeland method applied to thresholded probabilities. Statistical significance is assessed via paired Wilcoxon signed-rank tests across 1000 Monte Carlo replications. We will expand §4 to include these details, pseudocode for data generation, and explicit naming of the baselines. revision: yes
-
Referee: [Model assumptions and experimental setup] The method rests on the assumption that agent-supplied pairwise win probabilities are unbiased estimators of true long-term benefit differences. The skeptic's concern is therefore material: if the experiments generate p_ij from a known ground-truth distribution without added noise or systematic bias (recency, optimism, inconsistent scaling), the outperformance may not survive when the input probabilities reflect realistic human judgment error. A sensitivity analysis or noisy-input experiment is needed to substantiate the claim.
Authors: The current experiments indeed use noise-free p_ij derived from ground truth, which matches the referee's description. To address the concern about human judgment error, we will add a new subsection in §5 that injects Gaussian noise (with varying standard deviations) and systematic bias (optimism and recency effects) into the supplied probabilities, then re-evaluates all methods. This will quantify how performance degrades and whether the proposed Quicksort-Bradley-Terry variants retain their advantage under realistic noise levels. revision: yes
-
Referee: [Sampling and efficiency claims] The manuscript states that the methods 'can be combined with sampling techniques to significantly reduce the number of pairwise comparisons,' but provides no quantitative results on the trade-off between comparison count, ranking accuracy, and final portfolio quality. Without such results, the efficiency claim remains unverified.
Authors: We acknowledge that the efficiency claim is stated qualitatively without supporting numbers. We will add an experimental subsection (or appendix) that reports results for active sampling variants (e.g., uncertainty sampling and Quicksort-style partitioning) across different budgets of pairwise comparisons. Metrics will include Kendall-tau ranking correlation, portfolio value achieved, and the number of comparisons required to reach a target accuracy threshold, thereby providing the requested quantitative trade-off analysis. revision: yes
Circularity Check
No circularity detected; performance claims rest on independent empirical comparisons
full rationale
The paper proposes algorithmic methods that combine Quicksort with the Bradley-Terry model to aggregate agent-supplied pairwise win probabilities for ranking projects by estimated long-term benefit. The headline performance claim (superiority over two existing aggregation methods) is presented as an empirical outcome of testing these methods, not as a mathematical identity or fitted parameter renamed as a prediction. No self-definitional loops, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via self-citation appear in the abstract or description; the Bradley-Terry aggregation is a standard model applied to externally supplied probabilities, and the reported outperformance is evaluated against baselines outside the method definitions themselves. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
F. A. Csaszar, J. Eggers, Organizational decision making: An information aggregation view, Manag. Sci. 59 (10) (2013) 2257–2277
work page 2013
-
[2]
L. B ¨ottcher, R. Klingebiel, Organizational selection of innovation, Organ. Sci. (2024)
work page 2024
-
[3]
Wampler, A guide to participatory budgeting, in: A
B. Wampler, A guide to participatory budgeting, in: A. Shah (Ed.), Participatory Budgeting, The World Bank, Washington, DC, 2007, pp. 21–54
work page 2007
-
[4]
H. Aziz, N. Shah, Participatory budgeting: Models and approaches, in: T. Rudas, G. P ´eli (Eds.), Pathways Between Social Science and Computational Social Science: Theories, Methods, and Interpretations, Springer International Publishing, Cham, Switzerland, 2021, pp. 215–236
work page 2021
- [5]
-
[6]
D. Helbing, S. Mahajan, R. H. Fricker, A. Musso, C. I. Hausladen, C. Carissimo, D. Carpentras, E. Stockinger, J. A. Sanchez-Vaquerizo, J. C. Yang, et al., Democracy by design: Perspectives for digitally assisted, participatory upgrades of society, J. Comp. Sci. 71 (2023) 102061
work page 2023
-
[7]
J. C. Yang, C. I. Hausladen, D. Peters, E. Pournaras, R. Hnggli Fricker, D. Helbing, Designing digital voting systems for citizens: Achieving fairness and legitimacy in participatory budgeting, Digit. Gov.: Res. Pract. 5 (3) (2024) 1–30
work page 2024
- [8]
- [9]
-
[10]
J.-C. de Borda, M ´emoire sur les ´elections au scrutin, Histoire et M´emoires de l’Acad´emie royale des sciences (1781) 657–665. 13
- [11]
-
[12]
G. A. Miller, The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychol. Rev. 63 (2) (1956) 81
work page 1956
-
[13]
W. S. Torgerson, Theory and Methods of Scaling, Wiley, New York, NY , USA, 1958
work page 1958
-
[14]
F. Wauthier, M. Jordan, N. Jojic, E fficient ranking from pairwise comparisons, in: S. Dasgupta, D. McAllester (Eds.), Proceedings of the 30th International Conference on Machine Learning, V ol. 28 of Proceedings of Machine Learning Research, PMLR, Atlanta, Georgia, USA, 2013, pp. 109–117
work page 2013
-
[15]
N. Ailon, An active learning algorithm for ranking from pairwise preferences with an almost optimal query complexity, J. Mach. Learn. Res. 13 (2012) 137–164
work page 2012
-
[16]
M. Braverman, E. Mossel, Noisy sorting without resampling, in: S. Teng (Ed.), Proceedings of the Nineteenth Annual ACM-SIAM Sym- posium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, SIAM, Philadelphia, PA, 2008, pp. 268–276
work page 2008
-
[17]
Sorting from Noisy Information
M. Braverman, E. Mossel, Sorting from noisy information, arXiv preprint arXiv:0910.1191 (2009)
work page internal anchor Pith review Pith/arXiv arXiv 2009
-
[18]
M. Braverman, J. Mao, S. M. Weinberg, Parallel algorithms for select and partition with noisy comparisons, in: Proceedings of the 48th annual ACM symposium on Theory of Computing, Association for Computing Machinery, New York, NY , USA, 2016, pp. 851–862
work page 2016
-
[19]
E. Zermelo, Die berechnung der turnier-ergebnisse als ein maximumproblem der wahrscheinlichkeitsrechnung, Math. Z. 29 (1) (1929) 436– 460
work page 1929
-
[20]
R. A. Bradley, M. E. Terry, Rank analysis of incomplete block designs: I. the method of paired comparisons, Biometrika 39 (3 /4) (1952) 324–345
work page 1952
-
[21]
L. R. Ford, Solution of a ranking problem from binary comparisons, Am. Math. Mon. 64 (1957) 28–33
work page 1957
-
[22]
D. R. Hunter, MM algorithms for generalized Bradley-Terry models, Ann. Stat. 32 (2004) 384–406
work page 2004
-
[23]
Newman, E fficient computation of rankings from pairwise comparisons, J
M. Newman, E fficient computation of rankings from pairwise comparisons, J. Mach. Learn. Res. 24 (238) (2023) 1–25
work page 2023
-
[24]
R. N. Pendergrass, R. A. Bradley, Ranking in triple comparisons, in: I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, H. B. Mann (Eds.), Contributions to probability and statistics: Essays in honor of Harold Hotelling, Stanford University Press, Palo Alto, CA, USA, 1960, pp. 331–351
work page 1960
-
[25]
P. V . Rao, L. L. Kupper, Ties in paired-comparison experiments: A generalization of the Bradley–Terry model, J. Am. Stat. Assoc. 62 (1967) 194–204
work page 1967
-
[26]
R. R. Davidson, On extending the Bradley-Terry model to accommodate ties in paired comparison experiments, J. Am. Stat. Assoc. 65 (1970) 317–328
work page 1970
-
[27]
Agresti, Categorical Data Analysis, Wiley, New York, NY , USA, 1990
A. Agresti, Categorical Data Analysis, Wiley, New York, NY , USA, 1990
work page 1990
-
[28]
J. I. Marden, Analyzing and Modeling Rank Data, Chapman and Hall, London, UK, 1995
work page 1995
-
[29]
C. Muslimani, M. E. Taylor, Leveraging sub-optimal data for human-in-the-loop reinforcement learning, in: Proceedings of the 23rd Inter- national Conference on Autonomous Agents and Multiagent Systems, AAMAS ’24, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 2024, pp. 2399–2401
work page 2024
-
[30]
Wainer, A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets, J
J. Wainer, A Bayesian Bradley-Terry model to compare multiple ML algorithms on multiple data sets, J. Mach. Learn. Res. 24 (341) (2023) 1–34
work page 2023
-
[31]
M. Peschl, A. Zgonnikov, F. A. Oliehoek, L. C. Siebert, Moral: Aligning ai with human norms through multi-objective reinforced active learning, in: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’22, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 2022, p. 1038–1046
work page 2022
- [32]
- [33]
-
[34]
R. Herbrich, T. Minka, T. Graepel, Trueskill ™: A bayesian skill rating system, in: B. Sch ¨olkopf, J. Platt, T. Ho ffman (Eds.), Advances in Neural Information Processing Systems, V ol. 19, MIT Press, 2006
work page 2006
-
[35]
T. P. Minka, R. Cleven, Y . Zaykov, TrueSkill 2: An improved Bayesian skill rating system (2018). URL https://api.semanticscholar.org/CorpusID:52906551
work page 2018
-
[36]
H. J. Einhorn, R. M. Hogarth, E. Klempner, Quality of group judgment., Psychol. Bull. 84 (1) (1977) 158
work page 1977
-
[37]
S. E. Humphrey, J. R. Hollenbeck, C. J. Meyer, D. R. Ilgen, Hierarchical team decision-making, Research in Personnel and Human Resources Management 21 (2002) 175–213
work page 2002
-
[38]
C. I. Hausladen, R. H ¨anggli Fricker, D. Helbing, R. Kunz, J. Wang, E. Pournaras, How voting rules impact legitimacy, Humanit. Soc. Sci. Commun. 11 (1) (2024) 1–10
work page 2024
-
[39]
C. Al ´os-Ferrer, J. Buckenmaier, V oting for compromises: alternative voting methods in polarized societies, University of Zurich, Department of Economics, Working Paper (394) (2021)
work page 2021
- [40]
-
[41]
C. Zopounidis, M. Doumpos, Multicriteria classification and sorting methods: A literature review, Eur. J. Oper. Res. 138 (2002) 229–246
work page 2002
-
[42]
X. Bai, C. Coester, Sorting with predictions, in: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine (Eds.), Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Curran Associates Inc., Red Hook, NY , USA, 2024, pp. 26563–26584
work page 2024
-
[43]
P. Lu, X. Ren, E. Sun, Y . Zhang, Generalized sorting with predictions, in: H. V . Le, V . King (Eds.), 4th Symposium on Simplicity in Algorithms, SOSA, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2021, pp. 111–117
work page 2021
-
[44]
T. H. Chan, E. Sun, B. Wang, Generalized sorting with predictions revisited, in: M. Li, X. Sun, X. Wu (Eds.), Frontiers of Algorithmics - 17th International Joint Conference, IJTCS-FAW 2023 Macau, China, August 14-18, 2023, Proceedings, V ol. 13933 of Lecture Notes in Computer Science, Springer International Publishing, Cham, Switzerland, 2023, pp. 29–41
work page 2023
- [45]
-
[46]
Geissmann, Longest increasing subsequence under persistent comparison errors, Theory Comput
B. Geissmann, Longest increasing subsequence under persistent comparison errors, Theory Comput. Syst. 64 (2020) 662–680
work page 2020
-
[47]
Geissmann, Longest increasing subsequence under persistent comparison errors, Theory Comput
B. Geissmann, Longest increasing subsequence under persistent comparison errors, Theory Comput. Syst. 64 (4) (2020) 662–680
work page 2020
-
[48]
Kress, Numerical Analysis, V ol
R. Kress, Numerical Analysis, V ol. 181, Springer Science & Business Media, Berlin, Germany, 2012
work page 2012
-
[49]
Hotelling, Stability in competition, Econ
H. Hotelling, Stability in competition, Econ. J. 39 (153) (1929) 41–57
work page 1929
-
[50]
Novshek, Equilibrium in simple spatial (or di fferentiated product) models, in: A
W. Novshek, Equilibrium in simple spatial (or di fferentiated product) models, in: A. Mas-Colell (Ed.), Noncooperative Approaches to the Theory of Perfect Competition, Academic Press, Cambridge, MA, USA, 1982, pp. 199–212
work page 1982
-
[51]
C. A. Hoare, Quicksort, Comput. J. 5 (1962) 10–16
work page 1962
-
[52]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2022
work page 2022
-
[53]
Y . Ge, D. Yang, A. L. Bertozzi, Iterative active learning strategies for subgraph matching, Pattern Recognit. 158 (2025) 110797
work page 2025
- [54]
- [55]
-
[56]
A. Stomakhin, M. B. Short, A. L. Bertozzi, Reconstruction of missing data in social networks based on temporal patterns of interactions, Inverse Probl. 27 (11) (2011) 115013
work page 2011
-
[57]
L. B ¨ottcher, J. Nagler, H. J. Herrmann, Critical behaviors in contagion dynamics, Phys. Rev. Lett. 118 (8) (2017) 088301
work page 2017
-
[58]
J. R. Zipkin, F. P. Schoenberg, K. Coronges, A. L. Bertozzi, Point-process models of social network interactions: Parameter estimation and missing data recovery, Eur. J. Appl. Math. 27 (3) (2016) 502–529. 15
work page 2016
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.