Rethinking player evaluation in sports: Goals above expectation and beyond

Lucas Kook; Robert Bajons

arxiv: 2509.20083 · v2 · pith:DWTOQRDQnew · submitted 2025-09-24 · 📊 stat.AP

Pith reviewed 2026-05-22 11:59 UTC · model grok-4.3

classification 📊 stat.AP

keywords player evaluation · double machine learning · goals above expectation · sports analytics · semiparametric models · frequentist inference · residualization

The pith

Residualized machine learning metrics enable valid frequentist inference for player performance in sports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that uses flexible machine learning to estimate expected outcomes for events like shots in soccer, then derives player metrics such as goals above expectation from the difference between actual and expected results. Standard versions of these metrics suffer from bias and do not support reliable statistical inference when the underlying models are complex. By adding a residualization step drawn from double machine learning, the framework corrects for player involvement and restores valid frequentist properties. This matters for sports analysts because it turns point estimates into quantities that can be tested for significance, allowing identification of players who genuinely outperform expectations. The approach is also connected to semiparametric models that estimate directional player effects.
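As a minimal sketch of the unadjusted metric described above, the following sums goal minus expected goal over each player's shots. The function name, the tuple layout, and the toy data are hypothetical illustrations, not taken from the paper; the xG values stand in for the output of whatever expected-goals model is used.

```python
from collections import defaultdict


def goals_above_expectation(shots):
    """Sum (goal - xG) over each player's shots.

    `shots` is an iterable of (player, goal, xg) tuples, where `goal` is 0 or 1
    and `xg` is the model-predicted scoring probability for that shot.
    """
    gax = defaultdict(float)
    for player, goal, xg in shots:
        gax[player] += goal - xg
    return dict(gax)


# Hypothetical toy data: (player, goal scored?, expected-goal value)
shots = [
    ("A", 1, 0.10),
    ("A", 0, 0.30),
    ("A", 1, 0.50),
    ("B", 0, 0.20),
    ("B", 0, 0.05),
]
gax = goals_above_expectation(shots)
```

Here player A scored two goals from shots worth 0.9 expected goals in total, so the sketch assigns A a GAX of about 1.1; on its own this number carries no standard error, which is the gap the residualized version is meant to close.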

Core claim

Metrics based on differences between observed and model-predicted outcomes are equivalent to Rao's score tests in parametric regressions for the expected outcome; residualized versions of these metrics, obtained by additionally regressing on player involvement, inherit the Neyman orthogonality and rate conditions of double machine learning and therefore permit valid inference even when flexible nuisance estimators are used.
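The score-test connection can be sketched for a binary outcome under an assumed logistic working model; the paper's exact parametrization may differ. Here Y_i is the shot outcome, X_i indicates whether the player of interest took shot i, Z_i are the shot features, and h(Z_i) is treated as a known offset on the logit scale, so that expit(h(Z_i)) plays the role of xG_i.

```latex
\begin{align*}
\operatorname{logit} \Pr(Y_i = 1 \mid X_i, Z_i) &= h(Z_i) + \beta X_i, \\
\ell(\beta) &= \sum_{i=1}^{n} \Big[ Y_i \{h(Z_i) + \beta X_i\}
  - \log\big(1 + e^{h(Z_i) + \beta X_i}\big) \Big], \\
\left. \frac{\partial \ell}{\partial \beta} \right|_{\beta = 0}
  &= \sum_{i=1}^{n} X_i \{ Y_i - \operatorname{expit}(h(Z_i)) \}
   = \sum_{i \,:\, X_i = 1} (Y_i - \mathrm{xG}_i).
\end{align*}
```

Under these assumptions the score for the player coefficient, evaluated at the null of no player effect, is the player's GAX.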

What carries the argument

Residualized outcome-difference metrics constructed via double machine learning, which adjust the original GAX-style quantities by an extra regression step that predicts player participation given the observed features.
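The construction can be sketched in a few lines, assuming a single discrete feature bin per shot and within-bin averages as stand-ins for the two machine learning fits. The helper names, the toy data generator, and the absence of cross-fitting are simplifications made here, not the paper's implementation; the test statistic follows the generalised covariance measure form (mean of residual products over its standard deviation, scaled by the square root of n).

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def bin_means(keys, values):
    """Estimate E[value | key] by the within-bin average (a stand-in for an ML fit)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def rgax(y, x, z):
    """Return (rGAX, GCM-type statistic, one-sided p-value) for one player.

    y: shot outcomes (0/1), x: 1 if the player took the shot, z: feature bin.
    """
    h_hat = bin_means(z, y)  # expected outcome given features (xG-like)
    f_hat = bin_means(z, x)  # probability that this player takes the shot
    prods = [(yi - h_hat[zi]) * (xi - f_hat[zi]) for yi, xi, zi in zip(y, x, z)]
    n = len(prods)
    stat = math.sqrt(n) * mean(prods) / pstdev(prods)
    p_value = 1.0 - NormalDist().cdf(stat)  # alternative: positive player effect
    return sum(prods), stat, p_value


# Hypothetical toy data with a built-in positive player effect of 0.3.
rng = random.Random(0)
n = 2000
z = [rng.randrange(3) for _ in range(n)]
x = [int(rng.random() < (0.1, 0.2, 0.3)[zi]) for zi in z]
y = [int(rng.random() < (0.05, 0.10, 0.30)[zi] + 0.3 * xi) for zi, xi in zip(z, x)]
value, stat, p_value = rgax(y, x, z)
```

With saturated bin averages the residual products sum to the same value as plain GAX computed from the same fitted expectation; the extra regression starts to matter once the outcome model is a regularized machine learning fit.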

If this is right

The residualized metrics support inference on whether individual players exert a positive directional effect on outcomes such as goals or shot success.
The same construction applies directly to goalkeeper save evaluation, basketball shooting skill, quarterback passing accuracy, and soccer player injury proneness.
Player-specific effect estimates become interpretable within semiparametric regression models that separate the contribution of each athlete from the baseline expectation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Analysts could use the framework to attach confidence intervals to player rankings, reducing the risk of overvaluing short-term luck in contract or lineup decisions.
The residualization idea may transfer to other observational domains where flexible models are used to benchmark individual performance against a population baseline.
Extensions could examine finite-sample coverage of the resulting confidence intervals under realistic sports-data sparsity.
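One way such an interval could be attached to a player's value is a normal-approximation lower bound built from the per-shot residual products. This is a sketch under that assumption; the function name and the toy products are hypothetical, and the paper's own interval construction is not reproduced here.

```python
import math
from statistics import NormalDist, pstdev


def rgax_lower_bound(products, level=0.95):
    """One-sided lower confidence bound for the sum of residual products.

    Treats the sum as approximately normal with standard error sqrt(n) * sd.
    """
    n = len(products)
    z = NormalDist().inv_cdf(level)
    return sum(products) - z * math.sqrt(n) * pstdev(products)


# Hypothetical per-shot products (Y_i - h_hat(Z_i)) * (X_i - f_hat(Z_i)).
products = [0.4, -0.1, 0.3, 0.2, -0.05, 0.25, 0.1, 0.35]
lower = rgax_lower_bound(products)
```

A lower bound above zero would flag the player as outperforming expectation at the chosen level; with only a handful of shots, as in this toy input, the normal approximation should not be trusted.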

Load-bearing premise

The chosen nuisance estimators for the expected outcome and for player involvement must be consistent at the rates required by double machine learning and must satisfy Neyman orthogonality after residualization.
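In the generalised covariance measure literature (Shah and Peters, listed in the references below), the rate requirement is typically stated for the product of the two estimation errors rather than for each one separately. A sketch, writing h(Z) = E[Y | Z] and f(Z) = E[X | Z] with hats for the fitted versions (notation assumed here):

```latex
\[
\Big( \frac{1}{n} \sum_{i=1}^{n} \{\hat h(Z_i) - h(Z_i)\}^2 \Big)^{1/2}
\Big( \frac{1}{n} \sum_{i=1}^{n} \{\hat f(Z_i) - f(Z_i)\}^2 \Big)^{1/2}
= o_P\big(n^{-1/2}\big).
\]
```

This holds, for example, when both root-mean-squared errors are o_P(n^{-1/4}), and it also allows a slower outcome model to be offset by a faster player-involvement model.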

What would settle it

In a large soccer dataset where the expected-goal model is known to be correctly specified, the residualized GAX statistic for a player with no true effect should be statistically indistinguishable from zero at conventional significance levels; systematic rejection would indicate that the regularity conditions fail in practice.
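A small simulation illustrates the kind of check described here. It is a sketch under strong simplifying assumptions: a single discrete feature bin, within-bin averages instead of machine learning fits, and hypothetical scoring and shot-taking probabilities. The outcome is generated without any player effect, so the one-sided test should reject at roughly the nominal rate.

```python
import math
import random
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def bin_means(keys, values):
    """Within-bin averages, used as simple stand-ins for the nuisance fits."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def gcm_p_value(y, x, z):
    """One-sided p-value based on the normalized sum of residual products."""
    h_hat, f_hat = bin_means(z, y), bin_means(z, x)
    prods = [(yi - h_hat[zi]) * (xi - f_hat[zi]) for yi, xi, zi in zip(y, x, z)]
    sd = pstdev(prods)
    if sd == 0.0:  # degenerate sample: no evidence either way
        return 1.0
    stat = math.sqrt(len(prods)) * mean(prods) / sd
    return 1.0 - NormalDist().cdf(stat)


def null_rejection_rate(n_sims=200, n_shots=500, alpha=0.05, seed=1):
    """Share of simulated null datasets in which the test rejects at level alpha."""
    rng = random.Random(seed)
    base = [0.05, 0.15, 0.35]  # hypothetical scoring probability per bin
    take = [0.05, 0.10, 0.20]  # hypothetical probability the player shoots
    rejections = 0
    for _ in range(n_sims):
        z = [rng.randrange(3) for _ in range(n_shots)]
        x = [int(rng.random() < take[zi]) for zi in z]
        y = [int(rng.random() < base[zi]) for zi in z]  # outcome ignores x
        if gcm_p_value(y, x, z) < alpha:
            rejections += 1
    return rejections / n_sims


rate = null_rejection_rate()
```

A rejection rate far above `alpha` in such a setup, repeated with realistic features and the actual nuisance learners, would be the warning sign the paragraph above refers to.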

Figures

Figures reproduced from arXiv: 2509.20083 by Lucas Kook, Robert Bajons.

**Figure 1.** A snapshot of the data and the most important features used to compute xG models, GAX and rGAX.

**Figure 2.** Goals and residualized goals above expectation plots for the 2015/16 season of the 5 big European soccer leagues. A: Scatterplot of empirical GAX and rGAX. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical GAX and rGAX with one-sided 95% confidence intervals for rGAX for the top 10 players with respect to empirical rGAX and 5 selected well-k…

**Figure 3.** Player-wise empirical GAX and rGAX computed from three different xG models for the top 10 players with respect to empirical rGAX and 5 selected well-known players. The 95% confidence intervals for rGAX from the model using all data are shown.

**Figure 4.** Scatterplots of empirical GAX and rGAX values computed from different models. A: Scatterplot for empirical rGAX from an xG model computed with all data available against empirical rGAX from a model using only 2015/16 data. B: Scatterplot for empirical GAX from an xG model computed with all data available against empirical GAX from a model using only 2015/16 data. C: Scatterplot for empirical rGAX from an xG…

**Figure 5.** Goals saved and residualized goals saved above expectation plots for the 2015/16 season of the 5 big European soccer leagues. A: Scatterplot of empirical GSAX and rGSAX. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical GSAX and rGSAX with one-sided 95% confidence intervals for rGAX for the top 10 players with respect to empirical rGSAX.

**Figure 6.** Goals and residualized goals above expectation plots for the 2015/16 season of the 5 big European soccer leagues using two different models for the regression of X on Z. A: Scatterplot of empirical GAX and rGAX using an untuned random forest. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical GAX and rGAX from an untuned random forest with on…

**Figure 7.** Scatterplot of empirical rGAX values as obtained from different models for the regression of X on Z. A: Scatterplot of empirical rGAX from an untuned random forest and empirical rGAX from a tuned random forest. B: Scatterplot of empirical rGAX from a tuned xgboost model and empirical rGAX from a tuned random forest.

**Figure 8.** rqSI and qSI for the 2022/23 NBA seasons using a shot indicator as outcome (0 or 1). A: Scatterplot of empirical qSI and rqSI. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical qSI and rqSI with one-sided 95% confidence interval for rqSI for the top 15 players with respect to empirical rqSI.

**Figure 9.** rqSI and qSI for the 2022/23 NBA seasons using the score value as outcome (0, 2, or 3). A: Scatterplot of empirical qSI and rqSI. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical qSI and rqSI with one-sided 95% confidence interval for rqSI for the top 15 players with respect to empirical rqSI.

**Figure 10.** Comparison of rqSI values and GCM test statistic when using different outcomes. A: Scatterplot of empirical rqSI when using score indicator outcome and score value outcome. B: Scatterplot of the empirical GCM test statistic when using score indicator outcome and score value outcome. The solid line in B indicates the identity. The correlation coefficient R is added to both plots.

**Figure 11.** rCPAE and CPAE for the 2022/23 NFL seasons. A: Scatterplot of empirical rCPAE and CPAE. The solid line indicates the identity. The correlation coefficient R is added to the plot. B: Player-wise empirical rCPAE and CPAE with one-sided 95% confidence interval for rCPAE for the top 15 players with respect to empirical rCPAE.

**Figure 12.** Events and residualized events above expectation plots for time to first injury in the Liverpool F.C. data. A: Scatterplot of the empirical IAX and rIAX. The solid line indicates the identity. B: Player-wise empirical IAX and rIAX with 95% confidence intervals for rIAX. C: Feature-specific rIAX with 95% confidence intervals. For the survival regression, a random survival forest was used. For the feature r…

Original abstract

A popular quantitative approach to evaluating player performance in sports involves comparing an observed outcome to the expected outcome ignoring player involvement, which is estimated using statistical or machine learning methods. In soccer, for instance, goals above expectation (GAX) of a player measure how often shots of this player led to a goal compared to the model-derived expected outcome of the shots. Typically, sports data analysts rely on flexible machine learning models, which are capable of handling complex nonlinear effects and feature interactions, but fail to provide valid statistical inference due to finite-sample bias and slow convergence rates. In this paper, we close this gap by presenting a framework for player evaluation with metrics derived from differences in actual and expected outcomes using flexible machine learning algorithms, which nonetheless allows for valid frequentist inference. We first show that the commonly used metrics are directly related to Rao's score test in parametric regression models for the expected outcome. Motivated by this finding and recent developments in double machine learning, we then propose the use of residualized versions of the original metrics. For GAX, the residualization step corresponds to an additional regression predicting whether a given player would take the shot under the circumstances described by the features. We further relate metrics in the proposed framework to player-specific effect estimates in interpretable semiparametric regression models, allowing us to infer directional effects, e.g., to determine players that have a positive impact on the outcome. Our primary use case is GAX in soccer. We further apply our framework to evaluate goal-stopping ability of goalkeepers, shooting skill in basketball, quarterback passing skill in American football, and injury-proneness of soccer players.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper links GAX-style metrics to Rao's score test and adds double ML residualization for inference with flexible models, but the key rate conditions need explicit checks in sports data.

The main thing to know is that this work shows how common outcome-minus-expectation metrics relate to Rao's score test and then uses double ML residualization, including an auxiliary shot-probability regression for GAX, to claim valid frequentist inference even when the expected-outcome model is a flexible ML fit. That combination is the actual new piece rather than a routine extension of prior sports analytics work. It also connects the adjusted metrics to player-specific effects in semiparametric models, which lets you talk about directional impacts without losing the ML flexibility.

The multi-sport examples (soccer GAX, goalkeepers, basketball, quarterbacks, injuries) show the framework is meant to be usable beyond one setting. That part is useful and earns credit for trying to bridge the gap between flexible modeling and inference.

The soft spots sit around the double ML regularity conditions. The stress-test note is right to flag that the shot-taking nuisance model faces player heterogeneity and selection in typical sports datasets; if those estimators do not converge faster than n^{-1/4}, the orthogonality guarantee does not hold and the reported standard errors lose their justification. The abstract sketches the motivation but does not include derivations, simulations, or rate diagnostics, so it is not yet clear whether the claims survive in the data regimes the authors actually use.

This is for sports analysts who already run ML expected-value models and want to add inference without switching to rigid parametric forms. A reader comfortable with semiparametrics and double ML will see the most value. The paper shows clear thinking on its own terms and deserves a serious referee, even if the revision list will include simulation studies and explicit nuisance-rate verification.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a framework for player evaluation in sports that compares observed outcomes to expected outcomes estimated via flexible machine learning models. It shows that standard metrics such as goals above expectation (GAX) are related to Rao's score test, then proposes residualized versions of these metrics motivated by double machine learning to restore valid frequentist inference. The residualization for GAX includes an auxiliary regression for shot-taking probability. The framework is further connected to player-specific effects in semiparametric models and is demonstrated on soccer GAX, goalkeeper performance, basketball shooting, American football passing, and soccer injury proneness.

Significance. If the inference claims hold after residualization, the work would offer a practical advance in sports analytics by permitting complex ML models for expected-outcome estimation while supplying valid standard errors and directional effect tests. The explicit links to Rao's score test and double ML provide theoretical grounding that is often missing in applied sports metrics, and the multi-sport applications illustrate generality.

major comments (2)

§3.2 (residualized GAX construction): the validity of frequentist inference after residualization rests on the double ML nuisance estimators (shot probability model and outcome model) satisfying the n^{-1/4} rate and Neyman orthogonality conditions. The manuscript invokes these conditions but provides neither simulation verification under sports-data regimes (modest per-player samples, high-dimensional covariates, player heterogeneity) nor empirical rate diagnostics; without this, the reported standard errors for residualized GAX lack guaranteed coverage.
§4.1 (semiparametric interpretation): the claim that residualized metrics correspond to player-specific coefficients in a partially linear model requires explicit derivation of the equivalence, including the precise form of the player indicator and the orthogonality condition after residualization. The current sketch leaves open whether the estimator remains consistent when the player-shot indicator is itself high-dimensional or sparse.

minor comments (2)

The abstract and introduction would benefit from a concise statement of the exact regularity conditions invoked from the double ML literature (e.g., Chernozhukov et al.).
Figure 1 (GAX comparison): axis labels and legend should explicitly distinguish raw versus residualized versions to avoid reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which have prompted us to strengthen the theoretical and empirical support in the manuscript. We address each major comment below and outline the revisions we will make.

Referee: §3.2 (residualized GAX construction): the validity of frequentist inference after residualization rests on the double ML nuisance estimators (shot probability model and outcome model) satisfying the n^{-1/4} rate and Neyman orthogonality conditions. The manuscript invokes these conditions but provides neither simulation verification under sports-data regimes (modest per-player samples, high-dimensional covariates, player heterogeneity) nor empirical rate diagnostics; without this, the reported standard errors for residualized GAX lack guaranteed coverage.

Authors: We agree that additional verification tailored to sports data regimes would enhance confidence in the finite-sample properties. While the double ML framework supplies asymptotic guarantees under the n^{-1/4} rate and Neyman orthogonality conditions, we acknowledge the referee's point that explicit checks are valuable when per-player samples are modest and covariates are high-dimensional. In the revised manuscript we will add a dedicated simulation study that generates data under realistic sports regimes (limited shots per player, high-dimensional features, and player heterogeneity) and reports empirical coverage of the residualized GAX standard errors together with diagnostics for nuisance estimator convergence rates.
Revision: yes

Referee: §4.1 (semiparametric interpretation): the claim that residualized metrics correspond to player-specific coefficients in a partially linear model requires explicit derivation of the equivalence, including the precise form of the player indicator and the orthogonality condition after residualization. The current sketch leaves open whether the estimator remains consistent when the player-shot indicator is itself high-dimensional or sparse.

Authors: We thank the referee for requesting a more explicit derivation. In the revised Section 4.1 we will supply the full equivalence proof: consider the partially linear model Y = m(Z) + θX + ε, where X is the binary player (or shot-taking) indicator, Z collects the shot features, and m is estimated by machine learning. The residualized metric is exactly the Neyman-orthogonal score for θ obtained by regressing the outcome residual on the player-indicator residual. We will show that the orthogonality condition holds after double residualization and that the resulting estimator for θ is consistent and asymptotically normal. For the high-dimensional or sparse case we will clarify that, provided the nuisance functions are estimated at the required rate (e.g., via lasso or other sparse methods) and the number of players grows appropriately with sample size, consistency is preserved; we will add a short discussion of these conditions.
Revision: yes
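
The residual-on-residual regression described in this response can be written out explicitly. The following is a sketch of the standard partialling-out estimator for a partially linear model, using the notation of the paper's rGAX formula (X the player indicator, Z the features); \hat h and \hat f denote fitted regressions of Y on Z and of X on Z, and the paper's exact model may differ.

```latex
\[
\hat\theta
= \frac{\sum_{i=1}^{n} \{Y_i - \hat h(Z_i)\}\{X_i - \hat f(Z_i)\}}
       {\sum_{i=1}^{n} \{X_i - \hat f(Z_i)\}^2}
= \frac{\mathrm{rGAX}}{\sum_{i=1}^{n} \{X_i - \hat f(Z_i)\}^2}.
\]
```

Because the denominator is positive, the sign of the estimated player effect agrees with the sign of rGAX under this sketch, which is what makes a directional reading of the metric possible.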

Circularity Check

0 steps flagged

No circularity: derivation relies on external DML and score-test identity

The paper first relates standard GAX-style metrics to Rao's score test via a direct algebraic connection in parametric models (a known statistical identity, not a self-definition). It then introduces residualized versions explicitly motivated by the external double machine learning literature (Chernozhukov et al. and follow-ups), with the auxiliary regression for shot probability presented as an application of Neyman orthogonality rather than a redefinition of the target metric. No self-citation chain, fitted parameter renamed as prediction, or ansatz smuggled via prior work by the same authors appears in the provided abstract or derivation outline. The central claim of valid frequentist inference therefore rests on independent external theory plus the paper's own residualization step, leaving the chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the regularity conditions of double machine learning and on the validity of the score-test equivalence for the expected-outcome model; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Double machine learning regularity conditions (consistent nuisance estimation at appropriate rates and Neyman orthogonality after residualization) hold for the chosen ML estimators.
Invoked when the authors propose residualized versions motivated by recent developments in double machine learning.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We propose to use empirical residualized GAX (rGAX) := \sum_{i} (Y_i - \hat{h}(Z_i)) (X_i - \hat{f}(Z_i)), a scaled version of the sample GCM

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

[1]

D. G. Altman and J. M. Bland. Statistics notes: Absence of evidence is not evidence of absence. BMJ, 311 0 (7003): 0 485--485, 1995. doi:10.1136/bmj.311.7003.485

work page doi:10.1136/bmj.311.7003.485 1995
[2]

Anzer and P

G. Anzer and P. Bauer. A Goal Scoring Probability Model for Shots Based on Synchronized Positional and Event Data in Football (Soccer) . Frontiers in Sports and Active Living, 3: 0 53, 2021. doi:10.3389/fspor.2021.624475

work page doi:10.3389/fspor.2021.624475 2021
[3]

Baron, N

E. Baron, N. Sandholtz, D. Pleuler, and T. C. Y. Chan. Miss it like Messi : Extracting value from off-target shots in soccer. Journal of Quantitative Analysis in Sports, 20 0 (1): 0 37--50, 2024. doi:10.1515/jqas-2022-0107

work page doi:10.1515/jqas-2022-0107 2024
[4]

B. S. Baumer, G. J. Matthews, and Q. Nguyen. Big ideas in sports analytics and statistical tools for their investigation. WIREs Computational Statistics, 15 0 (6): 0 e1612, 2023. doi:10.1002/wics.1612

work page doi:10.1002/wics.1612 2023
[5]

Bender and S

R. Bender and S. Lange. Adjusting for multiple testing—when and how? Journal of Clinical Epidemiology, 54 0 (4): 0 343--349, 2001. doi:10.1016/s0895-4356(00)00314-0

work page doi:10.1016/s0895-4356(00)00314-0 2001
[6]

Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing , volume =

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 57 0 (1): 0 289--300, 1995. doi:10.1111/j.2517-6161.1995.tb02031.x

work page doi:10.1111/j.2517-6161.1995.tb02031.x 1995
[7]

L. Breiman. Random forests. Machine Learning, 45 0 (1): 0 5--32, 2001. doi:10.1023/A:1010933404324

work page doi:10.1023/a:1010933404324 2001
[8]

R. S. Brill, R. Yee, S. K. Deshpande, and A. J. Wyner. Moving from machine learning to statistics: the case of expected points in american football, 2024. URL https://arxiv.org/abs/2409.04889

work page arXiv 2024
[9]

Carl and B

S. Carl and B. Baldwin. nflfastR : Functions to Efficiently Access NFL Play by Play Data , 2024. URL https://CRAN.R-project.org/package=nflfastR. R package version 5.0.0

work page 2024
[10]

Y. H. Chang, R. Maheswaran, J. Su, S. Kwok, T. Levy, A. Wexler, and K. Squire. Quantifying Shot Quality in the NBA . In Proceedings of the 2014 MIT Sloan Sports Analytics Conference, 2014

work page 2014
[11]

T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, K. Chen, R. Mitchell, I. Cano, T. Zhou, M. Li, J. Xie, M. Lin, Y. Geng, Y. Li, and J. Yuan. xgboost: Extreme Gradient Boosting , 2025. URL https://CRAN.R-project.org/package=xgboost. R package version 1.7.9.1

work page 2025
[12]

Double/debiased/neyman machine learning of treatment effects

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. Double/Debiased/Neyman Machine Learning of Treatment Effects . American Economic Review, 107 0 (5): 0 261--65, 2017. doi:10.1257/aer.p20171038

work page doi:10.1257/aer.p20171038 2017
[13]

A. M. Christgau, L. Petersen, and N. R. Hansen. Nonparametric Conditional Local Independence Testing . The Annals of Statistics, 51 0 (5): 0 2116--2144, 2023. doi:10.1214/23-AOS2323

work page doi:10.1214/23-aos2323 2023
[14]

Cinelli, A

C. Cinelli, A. Forney, and J. Pearl. A crash course in good and bad controls. Sociological Methods & Research, 53 0 (3): 0 1071--1104, 2024. doi:10.1177/00491241221099552

work page doi:10.1177/00491241221099552 2024
[15]

Corsaro, G

S. Corsaro, G. Dello Ioio, and Z. Marino. The evaluation of football players: an in-depth look at the Expected Goal metric . Annals of Operations Research, 2025. doi:10.1007/s10479-025-06606-8

work page doi:10.1007/s10479-025-06606-8 2025
[16]

Daly-Grafstein and L

D. Daly-Grafstein and L. Bornn. Rao-Blackwellizing field goal percentage . Journal of Quantitative Analysis in Sports, 15 0 (2): 0 85--95, 2019. doi:doi:10.1515/jqas-2018-0064

work page doi:10.1515/jqas-2018-0064 2019
[17]

Davis and P

J. Davis and P. Robberechts. Expected Metrics as a Measure of Skill: Reflections on Finishing in Soccer . In Workshop on Machine Learning and Data Mining for Sports Analytics at ECML/PKDD 2023, MLSA. Springer, 2023

work page 2023
[18]

Davis and P

J. Davis and P. Robberechts. Biases in Expected Goals Models Confound Finishing Ability , 2024. URL https://arxiv.org/abs/2401.09940

work page arXiv 2024
[19]

Davis, L

J. Davis, L. Bransen, L. Devos, A. Jaspers, W. Meert, P. Robberechts, J. Van Haaren, and M. Van Roy. Methodology and evaluation in sports analytics: challenges, approaches, and lessons learned. Machine Learning, 113 0 (9): 0 6977--7010, 2024. doi:10.1007/s10994-024-06585-0

work page doi:10.1007/s10994-024-06585-0 2024
[20]

A. P. Dawid. Conditional Independence in Statistical Theory . Journal of the Royal Statistical Society B, 41 0 (1): 0 1--15, 1979. doi:10.1111/j.2517-6161.1979.tb01052.x

work page doi:10.1111/j.2517-6161.1979.tb01052.x 1979
[21]

Fern \'a ndez, L

J. Fern \'a ndez, L. Bornn, and D. Cervone. A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions. Machine Learning, 110 0 (6): 0 1389--1427, 2021. doi:10.1007/s10994-021-05989-6

work page doi:10.1007/s10994-021-05989-6 2021
[22]

Fern \'a ndez-Delgado, E

M. Fern \'a ndez-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15 0 (90): 0 3133--3181, 2014. URL http://jmlr.org/papers/v15/delgado14a.html

work page 2014
[23]

S. Gilani. hoopR : Access Men's Basketball Play by Play Data , 2023. URL https://CRAN.R-project.org/package=hoopR. R package version 2.1.0

work page 2023
[24]

Groll, C

A. Groll, C. Ley, G. Schauberger, and H. V. Eetvelde. A hybrid random forest to predict soccer matches in international tournaments. Journal of Quantitative Analysis in Sports, 15 0 (4): 0 271--287, 2019. doi:doi:10.1515/jqas-2018-0060

work page doi:10.1515/jqas-2018-0060 2019
[25]

J. H. Hewitt and O. Karakuş. A machine learning approach for player and position adjusted expected goals in football (soccer). Franklin Open, 4: 0 100034, 2023. doi:10.1016/j.fraope.2023.100034

work page doi:10.1016/j.fraope.2023.100034 2023
[26]

Hothorn, K

T. Hothorn, K. Hornik, M. A. van de Wiel , and A. Zeileis. Implementing a class of permutation tests: The coin package. Journal of Statistical Software, 28 0 (8): 0 1--23, 2008. doi:10.18637/jss.v028.i08

work page doi:10.18637/jss.v028.i08 2008
[27]

Karlis and I

D. Karlis and I. Ntzoufras. Analysis of sports data by using bivariate Poisson models . Journal of the Royal Statistical Society: Series D (The Statistician), 52 0 (3): 0 381--393, 2003. doi:https://doi.org/10.1111/1467-9884.00366

work page doi:10.1111/1467-9884.00366 2003
[28]

E. H. Kennedy. Semiparametric doubly robust targeted double machine learning: a review. In Handbook of statistical methods for precision medicine, pages 207--236. Chapman and Hall/CRC, 2024. doi:10.1201/9781003216223

work page doi:10.1201/9781003216223 2024
[29]

J. P. Klein, H. C. Van Houwelingen, J. G. Ibrahim, and T. H. Scheike. Handbook of survival analysis. CRC Press Boca Raton, 2014

work page 2014
[30]

Kook and A

L. Kook and A. R. Lundborg. Algorithm-agnostic significance testing in supervised learning with multimodal data. Briefings in Bioinformatics, 25 0 (6), 2024. doi:10.1093/bib/bbae475

work page doi:10.1093/bib/bbae475 2024
[31]

L. Kook, S. Saengkyongam, A. R. Lundborg, T. Hothorn, and J. Peters. Model-based causal feature selection for general response types. Journal of the American Statistical Association, 120 0 (550): 0 1090--1101, 2025. doi:10.1080/01621459.2024.2395588

work page doi:10.1080/01621459.2024.2395588 2025
[32]

Metulini and M

R. Metulini and M. L. Carre. Measuring sport performances under pressure by classification trees with application to basketball shooting. Journal of Applied Statistics, 47 0 (12): 0 2120--2135, 2020. doi:10.1080/02664763.2019.1704702

work page doi:10.1080/02664763.2019.1704702 2020
[33]

Nunnally

J. Nunnally. The place of statistics in psychology. Educational and Psychological Measurement, 20 0 (4): 0 641--650, 1960. doi:10.1177/001316446002000401

work page doi:10.1177/001316446002000401 1960
[34]

Pollard and C

R. Pollard and C. Reep. Measuring the effectiveness of playing strategies at soccer. Journal of the Royal Statistical Society: Series D (The Statistician), 46 0 (4): 0 541--550, 1997. doi:10.1111/1467-9884.00108

work page doi:10.1111/1467-9884.00108 1997
[35]

R: A Language and Environment for Statistical Computing

R Core Team . R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2025. URL https://www.R-project.org/

work page 2025
[36]

C. R. Rao. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Mathematical Proceedings of the Cambridge Philosophical Society, 44 0 (1): 0 50–57, 1948. doi:10.1017/S0305004100023987

work page doi:10.1017/s0305004100023987 1948
[37]

Robberechts and J

P. Robberechts and J. Davis. How Data Availability Affects the Ability to Learn Good xG Models . In U. Brefeld, J. Davis, J. Van Haaren, and A. Zimmermann, editors, Machine Learning and Data Mining for Sports Analytics, pages 17--27, Cham, 2020. Springer International Publishing

work page 2020
[38]

Scarf, A

P. Scarf, A. Khare, and N. Alotaibi. On skill and chance in sport. IMA Journal of Management Mathematics, 33 0 (1): 0 53--73, 2021. doi:10.1093/imaman/dpab026

[39]

R. D. Shah and J. Peters. The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3):1514–1538, 2020. doi:10.1214/19-AOS1857

[40]

T. M. Therneau, P. M. Grambsch, and T. R. Fleming. Martingale-based residuals for survival models. Biometrika, 77(1):147–160, 1990. ISSN 1464-3510. doi:10.1093/biomet/77.1.147

[41]

S. Vansteelandt and O. Dukes. Assumption-lean Inference for Generalised Linear Model Parameters. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3):657–685, 2022. doi:10.1111/rssb.12504

[42]

M. N. Wright and A. Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1):1–17, 2017. doi:10.18637/jss.v077.i01

[43]

D. Yam. StatsBombR: Cleans and pulls StatsBomb data from the API, 2025. URL https://github.com/statsbomb/StatsBombR. R package version 0.1.0

[44]

L. Zumeta Olaskoaga. injurytools: A Toolkit for Sports Injury Data Analysis, 2023. URL https://CRAN.R-project.org/package=injurytools. R package version 1.0.3

