arxiv: 2604.09143 · v1 · submitted 2026-04-10 · 💻 cs.LG · stat.ME

Score-Driven Rating System for Sports

Vladimír Holý, Michal Černý

Pith reviewed 2026-05-10 16:49 UTC · model grok-4.3

keywords: score-driven rating · Elo generalization · log-likelihood gradient · sports performance modeling · dynamic rating system · reversion property · probabilistic outcome models

The pith

Using the gradient of the log-likelihood as the update rule produces a rating system that generalizes Elo while enforcing fairness and reversion to true skills.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces ad-hoc update rules in rating systems with the score, which is the gradient of the log-likelihood of observed game results. This single change works for win-loss, point differences, draws, or full rankings because any differentiable probabilistic outcome model supplies the needed gradient. The resulting updates are shown to have zero expected value for each player, to sum exactly to zero across all players, and to decrease as a player's rating rises. These properties together guarantee that the system stays internally consistent and that ratings drift back toward the unobserved true skills over repeated games.
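
To make the "any differentiable outcome model" point concrete, the sketch below computes the score for a full ranking of three players by central finite differences. The Plackett-Luce likelihood is an assumption chosen for illustration (the paper's own ranking model is not reproduced here), and the names `plackett_luce_loglik` and `numerical_score` are hypothetical helpers.

```python
import math

def plackett_luce_loglik(ratings, ranking):
    """Log-likelihood of a ranking under an assumed Plackett-Luce model.

    `ranking` lists player indices from first place to last place.
    """
    total = 0.0
    for k in range(len(ranking)):
        remaining = ranking[k:]  # players not yet placed at stage k
        log_norm = math.log(sum(math.exp(ratings[j]) for j in remaining))
        total += ratings[ranking[k]] - log_norm
    return total

def numerical_score(loglik, ratings, outcome, h=1e-5):
    """Central finite-difference gradient of the log-likelihood in the ratings."""
    score = []
    for i in range(len(ratings)):
        up = list(ratings)
        down = list(ratings)
        up[i] += h
        down[i] -= h
        score.append((loglik(up, outcome) - loglik(down, outcome)) / (2 * h))
    return score

ratings = [0.0, 0.0, 0.0]   # three equally rated players
ranking = [2, 0, 1]         # player 2 first, player 0 second, player 1 third
score = numerical_score(plackett_luce_loglik, ratings, ranking)
print([round(s, 4) for s in score])
print(round(sum(score), 8))
```

Swapping in a different log-likelihood (for example one for point differences) requires no change to `numerical_score`, which is the sense in which the update rule is model-agnostic. In practice an analytic or automatically differentiated gradient would likely replace the finite differences.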

Core claim

The score-driven rating system employs the gradient of the log-likelihood of game outcomes as the direct mechanism for adjusting player ratings. This choice yields four properties: the expected score is zero, scores sum to zero over all players, the score is a decreasing function of a player's current rating, and the overall system exhibits reversion toward the latent true skills.

What carries the argument

The score, defined as the gradient of the log-likelihood of the observed outcomes with respect to each player's rating parameter.
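
As a concrete instance, consider the simplest win/loss case. The sketch below assumes a logistic (Bradley-Terry type) link with ratings $r_i$, $r_j$ on the natural-log scale and an outcome $y \in \{0, 1\}$ equal to 1 when player $i$ wins; the symbols are illustrative and not taken from the paper.

```latex
\begin{align*}
p &= \Pr(i \text{ beats } j) = \frac{1}{1 + e^{-(r_i - r_j)}}, \\
\ell(r_i, r_j; y) &= y \log p + (1 - y) \log(1 - p), \\
\frac{\partial \ell}{\partial r_i} &= y - p, \qquad
\frac{\partial \ell}{\partial r_j} = -(y - p).
\end{align*}
```

An update of the form $r_i \leftarrow r_i + \alpha\,(y - p)$, with an assumed step size $\alpha > 0$, then has the familiar Elo shape of "actual result minus expected result".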

If this is right

Ratings remain internally consistent because the scores sum to zero, so every update conserves the sum of all ratings (which stays at zero if the ratings are initialized to sum to zero).
No player receives a systematic positive or negative bias because the expected score is zero.
For the same outcome against the same opposition, a higher-rated player receives a smaller positive update or a larger negative update than a lower-rated player would.
Ratings automatically revert toward the underlying true skills whenever the probabilistic model is correctly specified.
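
The first three consequences can be checked numerically in the simplest setting. The following is a minimal sketch assuming a logistic win/loss model; the functions `win_prob` and `scores` and the rating values are hypothetical and chosen only for illustration.

```python
import math

def win_prob(r_i, r_j):
    """Assumed logistic probability that player i beats player j."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j)))

def scores(r_i, r_j, y):
    """Score of each player for outcome y (1 if i wins, 0 if j wins)."""
    p = win_prob(r_i, r_j)
    return (y - p, -(y - p))

r_i, r_j = 0.8, -0.3
p = win_prob(r_i, r_j)

# Zero expected score: average player i's score over both possible outcomes.
expected_i = p * scores(r_i, r_j, 1)[0] + (1 - p) * scores(r_i, r_j, 0)[0]

# Scores sum to zero across the two players for a given outcome.
sum_win = sum(scores(r_i, r_j, 1))
sum_loss = sum(scores(r_i, r_j, 0))

# Score after a win decreases as the winner's own rating increases.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
win_scores = [scores(r, r_j, 1)[0] for r in grid]
is_decreasing = all(a > b for a, b in zip(win_scores, win_scores[1:]))

print(expected_i, sum_win, sum_loss, is_decreasing)
```

The check covers a single pair of ratings and a small grid, so it illustrates the properties rather than proving them.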

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Any existing dynamic rating model whose update can be expressed as a likelihood gradient can be re-derived inside this single framework.
The same score mechanism could be applied to non-sports ranking problems such as recommendation systems or credit scoring whenever outcomes are probabilistic.

Load-bearing premise

A probabilistic model for game outcomes exists whose log-likelihood is differentiable with respect to the rating parameters.

What would settle it

Generate repeated match outcomes from fixed, known true skills, apply the score updates, and verify whether the sum of the ratings stays constant (exactly zero if the ratings are initialized to sum to zero) and whether the ratings converge in expectation to the true skills, up to a common additive constant; systematic deviation in either property would refute the claimed guarantees.
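
A small simulation along these lines might look as follows. It is a sketch under stated assumptions, not the paper's experiment: outcomes follow a logistic win/loss model, the true skills, step size, number of games, and seed are arbitrary illustrative values, and `simulate` is a hypothetical helper.

```python
import math
import random

def win_prob(r_i, r_j):
    """Assumed logistic probability that player i beats player j."""
    return 1.0 / (1.0 + math.exp(-(r_i - r_j)))

def simulate(true_skills, n_games=20000, step=0.05, seed=0):
    """Run score-driven updates on randomly paired games with fixed true skills."""
    rng = random.Random(seed)
    n = len(true_skills)
    ratings = [0.0] * n          # initialized to sum to zero
    averages = [0.0] * n         # time average over the second half of the run
    burn_in = n_games // 2
    for t in range(n_games):
        i, j = rng.sample(range(n), 2)
        # Outcome drawn from the true skills, score computed from current ratings.
        y = 1.0 if rng.random() < win_prob(true_skills[i], true_skills[j]) else 0.0
        s = y - win_prob(ratings[i], ratings[j])
        ratings[i] += step * s
        ratings[j] -= step * s
        if t >= burn_in:
            for k in range(n):
                averages[k] += ratings[k] / (n_games - burn_in)
    return ratings, averages

true_skills = [0.5, 0.0, -0.5]   # hypothetical values, centered at zero
final_ratings, mean_ratings = simulate(true_skills)
rating_sum = sum(final_ratings)
max_error = max(abs(m - s) for m, s in zip(mean_ratings, true_skills))

print(f"sum of ratings: {rating_sum:.2e}")
print(f"time-averaged ratings: {[round(m, 3) for m in mean_ratings]}")
print(f"largest gap to true skill: {max_error:.3f}")
```

Because the data-generating model and the rating model coincide here, the run only probes the correctly specified case; a misspecified outcome model would be the more demanding test.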

Figures

Figures reproduced from arXiv: 2604.09143 by Michal Černý, Vladimír Holý.

**Figure 2.** The score function for a win/loss game outcome, modeled by (…

**Figure 3.** The score function for a margin of victory game outcome, modeled by (…

**Figure 4.** The score function for a win/draw/loss game outcome, modeled by (…

**Figure 5.** The score function for a ranking game outcome with three players, modeled by (…

**Figure 6.** Simulated paths of score-driven ratings for three players. Players are paired randomly for…

Original abstract

This paper introduces a score-driven rating system, a generalization of the classical Elo rating system that employs the score, i.e. the gradient of the log-likelihood, as the updating mechanism for player and team ratings. The proposed framework extends beyond simple win/loss game outcomes and accommodates a wide range of game results, such as point differences, win/draw/loss outcomes, or complete rankings. Theoretical properties of the score are derived, showing that it has zero expected value, sums to zero across all players, and decreases with increasing value of a player's rating, thereby ensuring internal consistency and fairness. Furthermore, the score-driven rating system exhibits a reversion property, meaning that ratings tend to follow the underlying unobserved true skills over time. The proposed framework provides a theoretical rationale for existing dynamic models of sports performance and offers a systematic approach for constructing new ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper cleanly generalizes Elo by using the score from a differentiable outcome model as the update rule and derives its zero-mean, sum-to-zero, and reversion properties from standard regularity conditions.

The main point is that this paper replaces the usual Elo-style win/loss adjustment with the gradient of the log-likelihood (the score) from a probabilistic model of the game result. That single change lets the same framework handle point differences, win-draw-loss, or full rankings, and the authors derive that the score has zero expectation, sums to zero across players, decreases in a player's rating, and produces ratings that revert toward the unobserved true skills. These properties follow directly from the usual regularity conditions on the likelihood and the fact that ratings are only identified up to an additive constant, so the derivations are straightforward and internally consistent.

Referee Report

0 major / 4 minor

Summary. The paper introduces a score-driven rating system for sports as a generalization of the classical Elo system. Ratings are updated using the score, defined as the gradient of the log-likelihood of observed game outcomes with respect to the rating parameters. The framework accommodates a range of outcome types, including point differences, win/draw/loss, and full rankings. Theoretical properties are derived for the score: zero expected value under the model, summation to zero across all players, and monotonic decrease in a given player's rating. The system is also shown to exhibit a reversion property, in which ratings track the underlying unobserved true skills over time. This provides a unified rationale for existing dynamic rating models and a systematic way to construct new ones.

Significance. If the derivations hold, the work supplies a statistically grounded unification of rating systems that explains the success of existing models (such as Elo and its variants) via standard properties of maximum-likelihood estimation and invariance. The zero-expectation, zero-sum, and monotonicity properties ensure internal consistency and fairness without ad-hoc adjustments, while the reversion property justifies the use of dynamic updates in non-stationary skill settings. This could facilitate the development of new, probabilistically justified rating systems for sports analytics and prediction tasks.

minor comments (4)

The abstract and introduction repeat the list of derived properties almost verbatim; a single consolidated statement of the main theorems would improve readability.
Notation for the score function (gradient of the log-likelihood) and the link function should be introduced with a dedicated equation early in Section 2 rather than inline in the text.
The reversion property is stated qualitatively; adding a brief remark on the expected magnitude of the update (e.g., via the Fisher information or Hessian) would strengthen the dynamic analysis without altering the central claim.
A short table or example comparing the score-driven update to the classical Elo update for a simple win/loss model would help readers see the concrete difference.
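
For orientation, the comparison requested in the last comment can be sketched for the classical Elo scale. The block below assumes the standard Elo expected-score formula with base 10 and scale 400 and a Bernoulli log-likelihood; the step size $\alpha$ is a hypothetical symbol, not the paper's notation.

```latex
\begin{align*}
E_i &= \frac{1}{1 + 10^{(R_j - R_i)/400}}, \qquad
\ell = y \log E_i + (1 - y) \log(1 - E_i), \\
\frac{\partial \ell}{\partial R_i} &= \frac{\ln 10}{400}\,(y - E_i), \\
\text{classical Elo:} \quad R_i &\leftarrow R_i + K\,(y - E_i), \\
\text{score-driven:} \quad R_i &\leftarrow R_i + \alpha\,\frac{\ln 10}{400}\,(y - E_i).
\end{align*}
```

Under these assumptions the two updates coincide when $K = \alpha \ln 10 / 400$. The variance of the score, which is the Fisher information for $R_i$ in this model, equals $(\ln 10 / 400)^2\, E_i (1 - E_i)$, so the typical update is largest for evenly matched players and shrinks as the expected result becomes one-sided.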

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript and for recommending minor revision. The report provides a clear summary of the paper's contributions without raising any specific major comments or concerns. We have prepared a revised version that incorporates minor improvements for clarity and presentation.

Circularity Check

0 steps flagged

No significant circularity; derivations follow from standard likelihood properties

The claimed properties (zero expected score, cross-player summation to zero, monotonic decrease in rating, and reversion) are derived directly from the definition of the score as the gradient of a differentiable log-likelihood under standard regularity conditions (e.g., interchange of derivative and expectation, invariance to uniform rating shifts for relative outcomes). These hold for the general class of models (point differences, win/draw/loss, rankings) without reducing to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The paper presents them as consequences of the probabilistic setup rather than tautological restatements, and the framework remains self-contained against external benchmarks like classical Elo or dynamic models. No steps meet the criteria for circularity.
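
The two regularity arguments mentioned above can be written out in a few lines. This is a generic sketch in assumed notation: $f(y \mid r)$ is the outcome density or probability mass function given the rating vector $r = (r_1, \dots, r_n)$, and $\mathbf{1}$ is the vector of ones.

```latex
\begin{align*}
\text{Zero expectation:} \quad
\mathbb{E}\!\left[\nabla_r \log f(Y \mid r)\right]
  &= \int \frac{\nabla_r f(y \mid r)}{f(y \mid r)}\, f(y \mid r)\, dy
   = \nabla_r \int f(y \mid r)\, dy
   = \nabla_r 1 = 0, \\
\text{Sum to zero:} \quad
f(y \mid r + c\,\mathbf{1}) &= f(y \mid r) \ \text{for all } c
\ \Longrightarrow\
\frac{d}{dc} \log f(y \mid r + c\,\mathbf{1}) \Big|_{c = 0}
  = \sum_{i=1}^{n} \frac{\partial \log f(y \mid r)}{\partial r_i} = 0.
\end{align*}
```

The first line relies on interchanging differentiation and integration (a sum replaces the integral for discrete outcomes); the second relies only on outcomes depending on rating differences.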

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract, the framework rests on standard statistical assumptions about outcome models rather than new postulates. No free parameters or invented entities are explicitly introduced in the summary.

axioms (1)

Domain assumption: Game outcomes admit a probabilistic model with a differentiable log-likelihood function whose gradient defines the update score.
This is required to define the score-driven mechanism and derive its listed properties.
