Ranking matters: Does the new format select the best teams for the knockout phase in the UEFA Champions League?

András Gyimesi; Dries Goossens; Frits Spieksma; Karel Devriesere; László Csató; Roel Lambers

arXiv: 2503.13569 · v2 · submitted 2025-03-17 · ⚛️ physics.soc-ph · math.OC · stat.AP


Pith reviewed 2026-05-22 23:48 UTC · model grok-4.3

classification ⚛️ physics.soc-ph · math.OC · stat.AP

keywords UEFA Champions League · league phase · ranking methods · incomplete round robin · points system · tournament ranking · sports competition


The pith

The official points-based ranking in the new UEFA Champions League league phase may not select the best teams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the new league phase format of the UEFA Champions League, where 36 teams compete in an incomplete round robin. Because teams face different sets of opponents with varying strengths, direct comparison by points is complicated. Several established ranking methods for such tournaments are applied to the 2024/25 season data. These methods produce orderings that differ from the official standings. The results indicate that the points system may not reliably identify the strongest teams for the knockout phase.

Core claim

The paper claims that applying several well-known ranking methods for incomplete round robin tournaments to the 2024/25 UEFA Champions League league phase shows inconsistencies with the official ranking, making it doubtful whether the currently used point-based system provides the best ranking of the teams.

What carries the argument

Ranking methods for incomplete round robin tournaments, used to test the robustness of the official points-based ranking.

Load-bearing premise

That the alternative ranking methods applied are more appropriate or accurate than the official points system for determining advancement.

What would settle it

A direct comparison of which teams advance under the points system versus the alternative methods, checked against their actual performance in the subsequent knockout phase.
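The check described above can be sketched as a simple hit rate: for each knockout tie, did the team placed higher by a given ranking advance? This is a minimal illustration, not the paper's procedure. The function name `prediction_accuracy`, the tuple layout, and the team labels `T1`..`T4` are hypothetical and carry no real match data.

```python
def prediction_accuracy(ranking, ties):
    """Share of knockout ties won by the team placed higher in `ranking`.

    ranking: list of team names, best first.
    ties: list of (team_a, team_b, winner) tuples (hypothetical layout).
    """
    position = {team: i for i, team in enumerate(ranking)}
    hits = 0
    for team_a, team_b, winner in ties:
        # The ranking "predicts" whichever team has the smaller position index.
        favourite = team_a if position[team_a] < position[team_b] else team_b
        hits += favourite == winner
    return hits / len(ties)


# Invented labels and outcomes, for illustration only.
official = ["T1", "T2", "T3", "T4"]
alternative = ["T1", "T3", "T2", "T4"]
ties = [("T1", "T4", "T1"), ("T2", "T3", "T3")]

print(prediction_accuracy(official, ties))
print(prediction_accuracy(alternative, ties))
```

A single season yields few knockout ties, so any such hit rate would be noisy; comparing rankings this way would likely need several seasons or a probabilistic scoring rule.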

Original abstract

Starting in the 2024/25 season, the Union of European Football Associations (UEFA) has fundamentally changed the format of its club competitions: the group stage has been replaced by a league phase played by 36 teams in an incomplete round robin format. This makes ranking the teams based on their results challenging because teams play against different sets of opponents, whose strengths vary. In this research note, we apply several well-known ranking methods for incomplete round robin tournaments to the 2024/25 UEFA Champions League league phase in order to check the robustness of the official ranking, as well as to call the attention of organizers to the non-trivial issue of ranking in these competitions. Our results show that it is doubtful whether the currently used point-based system provides the best ranking of the teams.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper runs standard ranking methods on the 2024/25 UCL league phase and finds order differences from the points system, but supplies no independent test to decide which ranking is better.


The main point is that several established ranking methods, applied to the actual 2024/25 Champions League results, produce different team orders than UEFA's official points table. The authors flag this as a reason to question the points system under the new incomplete schedule format. That is the entire contribution in a nutshell. They take published results and feed them into known methods from the ranking literature, which is a reasonable first step for a research note on a live tournament change. The work is honest about its scope and draws attention to a real operational issue for organizers.

The soft spot is exactly the one in the stress-test note: showing that other methods disagree does not by itself make the points system doubtful or suboptimal. Without a separate criterion (predictive accuracy on knockout games, recovery of team strength, or consistency with a stated fairness rule), the differences are evidence of sensitivity, not evidence of error. The abstract gives no details on data cleaning, exact implementations, or any validation step, so a reader cannot judge whether the shifts are large, stable, or driven by a few matches.

This is a short applied note for people who track football analytics or tournament rules. It does not advance new methods or settle a methodological question. A serious editor could still send it to referees to check the calculations and ask for a clearer justification of why disagreement matters here. I would bring it to a reading group if the topic were current sports ranking problems, but I would not cite it in my own work.

Referee Report

2 major / 2 minor

Summary. The manuscript applies several well-known ranking methods for incomplete round-robin tournaments (e.g., Massey, Colley, and eigenvector-based approaches) to the 2024/25 UEFA Champions League league phase. It compares the resulting team orderings against the official points-based ranking and concludes that discrepancies indicate it is doubtful whether the points system provides the best ranking for selecting teams for the knockout phase.
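To make the kind of method named in the summary concrete, here is a minimal sketch of Colley's rating on a toy incomplete round robin. It solves the standard Colley system C r = b, where C[i][i] = 2 + games played by i, C[i][j] = minus the number of games between i and j, and b[i] = 1 + (wins - losses) / 2. The four teams and scores are invented, draws are assumed to leave b unchanged, and whether the paper uses this exact formulation is not confirmed by the text here.

```python
from collections import defaultdict


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def colley_ratings(matches):
    """Colley ratings from (home, away, home_goals, away_goals) tuples."""
    teams = sorted({team for m in matches for team in m[:2]})
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for home, away, hg, ag in matches:
        i, j = idx[home], idx[away]
        C[i][i] += 1
        C[j][j] += 1
        C[i][j] -= 1
        C[j][i] -= 1
        if hg != ag:  # assumption: a draw leaves b unchanged
            winner, loser = (i, j) if hg > ag else (j, i)
            b[winner] += 0.5
            b[loser] -= 0.5
    return dict(zip(teams, solve(C, b)))


def points_table(matches):
    """Standard football points: 3 for a win, 1 for a draw, 0 for a loss."""
    pts = defaultdict(int)
    for home, away, hg, ag in matches:
        pts[home] += 3 if hg > ag else (1 if hg == ag else 0)
        pts[away] += 3 if ag > hg else (1 if hg == ag else 0)
    return dict(pts)


# Toy schedule: each team plays only two of its three possible opponents.
toy = [("A", "B", 1, 0), ("B", "C", 2, 1), ("C", "D", 1, 0), ("A", "D", 3, 0)]
print(points_table(toy))    # B and C are level on points
print(colley_ratings(toy))  # Colley rates B above C
```

In this toy schedule B and C both finish with one win and one loss, but B's loss came against the unbeaten A while C's loss came against B, so the opponent-adjusted rating separates two teams that the points table leaves level.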

Significance. The work usefully illustrates the sensitivity of rankings to method choice in unbalanced schedules with heterogeneous opponents, a timely issue given the new UCL format. The use of real 2024/25 data provides a concrete case study. If an independent validation criterion were supplied showing that alternatives outperform points on a measurable objective (e.g., predictive power), the findings could inform tournament design; absent that, the significance for questioning the official system remains limited.

major comments (2)

[Abstract] The central claim that 'it is doubtful whether the currently used point-based system provides the best ranking of the teams' is not supported by any independent criterion for 'best.' The manuscript reports rank differences across methods but supplies no validation (such as out-of-sample prediction of knockout outcomes, recovery of latent strengths, or alignment with a fairness axiom) demonstrating that the alternatives are superior to points under its own design goals.
[Abstract] The abstract states a conclusion but provides no details on data handling, exact implementations of the ranking methods, statistical tests for rank differences, or robustness checks; this prevents verification that observed discrepancies are meaningful rather than artifacts of arbitrary choices or incomplete data.
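One hedged way to quantify the rank differences raised in these comments is to count the team pairs that two rankings order differently (the Kendall tau distance) and to list the teams whose status changes at a cutoff. The sketch below uses invented labels `T1`..`T5`; the cutoff `k` is a free parameter of the illustration, not a value taken from the manuscript.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Number of team pairs that the two rankings order differently."""
    pos_a = {team: i for i, team in enumerate(rank_a)}
    pos_b = {team: i for i, team in enumerate(rank_b)}
    return sum(
        1
        for s, t in combinations(rank_a, 2)
        if (pos_a[s] - pos_a[t]) * (pos_b[s] - pos_b[t]) < 0
    )


def cutoff_changes(rank_a, rank_b, k):
    """Teams that are in the top k under exactly one of the two rankings."""
    return set(rank_a[:k]) ^ set(rank_b[:k])


# Invented rankings, best team first.
official = ["T1", "T2", "T3", "T4", "T5"]
alternative = ["T1", "T3", "T2", "T5", "T4"]

print(kendall_tau_distance(official, alternative))
print(sorted(cutoff_changes(official, alternative, k=2)))
```

A small pairwise distance can still matter if the swapped pair straddles a qualification boundary, which is why the two measures are reported separately.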

minor comments (2)

Consider adding a table or figure explicitly listing the top 8-10 teams under each method alongside their official points positions and any qualification changes.
Clarify in the methods whether the alternative rankings were computed on the full observed results or adjusted for schedule imbalance beyond the standard formulations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond point-by-point to the major comments below.


Referee: [Abstract] The central claim that 'it is doubtful whether the currently used point-based system provides the best ranking of the teams' is not supported by any independent criterion for 'best.' The manuscript reports rank differences across methods but supplies no validation (such as out-of-sample prediction of knockout outcomes, recovery of latent strengths, or alignment with a fairness axiom) demonstrating that the alternatives are superior to points under its own design goals.

Authors: The manuscript is a short research note whose goal is to apply established ranking methods for incomplete round-robin tournaments to the 2024/25 UCL league-phase data and to demonstrate that the official points ranking differs from orderings produced by Massey, Colley, and eigenvector approaches. These discrepancies indicate that the points system is not robust to reasonable methodological alternatives, thereby raising doubt about whether it yields the 'best' ranking when no consensus definition of 'best' exists for unbalanced schedules. The note does not claim or attempt to demonstrate superiority of any alternative via an external validation criterion, as that would require a different study; its purpose is to highlight the non-trivial ranking problem created by the new format. We are prepared to revise the abstract to clarify this scope and moderate the concluding language.

revision: partial

Referee: [Abstract] The abstract states a conclusion but provides no details on data handling, exact implementations of the ranking methods, statistical tests for rank differences, or robustness checks; this prevents verification that observed discrepancies are meaningful rather than artifacts of arbitrary choices or incomplete data.

Authors: The abstract is deliberately concise, as is conventional for a research note. The main text supplies the data source (official UEFA match results), the precise formulations and references for the Massey, Colley, and eigenvector methods, and the direct rank comparisons. We will revise the abstract to include a brief statement on the data and methods employed. Additional statistical tests or robustness checks can be added to the body of the paper during revision if the editor deems them necessary.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison of independent ranking methods on external data


The paper applies several established, well-known ranking methods (such as those for incomplete round-robin tournaments) directly to the observed 2024/25 UCL league phase match results and compares the resulting rankings to the official points system. There are no equations, fitted parameters, predictions derived from subsets of the data, self-citations invoked as uniqueness theorems, or ansatzes smuggled in. The central claim of doubt regarding the points system rests on observed discrepancies between independent methods and the official ranking, which is a standard empirical sensitivity check rather than any reduction by construction. The derivation chain is self-contained against external benchmarks (real match outcomes) with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5701 in / 879 out tokens · 32253 ms · 2026-05-22T23:48:47.921841+00:00 · methodology


