Are penalty shootouts better than a coin toss? Evidence from international club football in Europe
Pith reviewed 2026-05-18 06:02 UTC · model grok-4.3
The pith
Penalty shootouts in European club football show no detectable link to team strength, kicking order, venue, or momentum, and their outcomes cannot be statistically distinguished from a coin toss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Based on all shootouts between 2000 and 2025, no evidence is found for the effect of the kicking order, the field of the match, or psychological momentum. In contrast to previous results, the authors detect no relationship between shootout success and relative team strength quantified by differences in Elo ratings and the implied winning probability. Thus the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected.
What carries the argument
Statistical tests of historical shootout outcomes against Elo rating differences and binary indicators for kicking order, home/away status, and momentum.
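To make "Elo rating differences and the implied winning probability" concrete, here is a minimal sketch of the standard logistic Elo curve. It is an assumption that the paper uses this exact conversion (the page does not state the formula), and the function name and the 400-point scale are illustrative defaults, not taken from the paper. The Elo expected score also counts draws as half a win, so it is only a proxy for the chance of winning a shootout.

```python
def elo_win_probability(elo_diff: float, scale: float = 400.0) -> float:
    """Expected score of the team rated elo_diff points above its opponent.

    Standard logistic Elo curve; the 400-point scale is the conventional
    default and is an assumption here, not a value taken from the paper.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / scale))


# Illustrative rating gaps (hypothetical values, not data from the paper).
for diff in (0, 50, 100, 200):
    print(f"Elo difference {diff:>3}: implied probability {elo_win_probability(diff):.3f}")
```

Under the coin-toss hypothesis, the observed share of shootouts won by the higher-rated team should stay near 0.5 regardless of the value this function returns.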
If this is right
- Team strength measured before the shootout does not predict who advances.
- Kicking first or second confers no measurable advantage.
- Playing at home provides no detectable edge once the game reaches penalties.
- Any psychological momentum from earlier goals or saves does not affect the final result.
- The removal of the away-goals rule has increased the share of ties decided by a process that behaves like a fair coin.
Where Pith is reading between the lines
- Tournaments that rely more heavily on shootouts after 2021/22 are resolving more ties by chance.
- Coaches may gain little from special penalty-preparation tactics if outcomes remain unpredictable.
- Similar tests could be run on national-team shootouts to check whether club versus country contexts differ.
- If the pattern holds, rule changes that reduce extra-time duration would further amplify the role of randomness.
Load-bearing premise
That differences in Elo ratings give an accurate enough picture of which team should win a penalty shootout.
What would settle it
A new dataset of several hundred additional shootouts in which the team with the higher Elo rating wins significantly more often than 50 percent of the time.
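The settling test described above can be sketched as an exact one-sided binomial test of whether the higher-Elo team wins more often than 50 percent of the time. The counts below are hypothetical placeholders, not results from the paper, and the function name is an illustrative choice.

```python
from math import comb


def binomial_upper_tail(wins: int, n: int, p0: float = 0.5) -> float:
    """Exact P(X >= wins) for X ~ Binomial(n, p0).

    Suitable for samples of a few hundred shootouts; much larger n would
    need a log-space or library implementation to avoid float overflow.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(wins, n + 1))


# Hypothetical counts for illustration only (NOT data from the paper).
n_shootouts = 300
higher_elo_wins = 170

p_value = binomial_upper_tail(higher_elo_wins, n_shootouts)
print(f"One-sided p-value for {higher_elo_wins}/{n_shootouts} wins: {p_value:.4f}")
```

A small p-value in such a dataset would count against the coin-toss hypothesis; a large one would leave it unrejected, which is weaker than confirming it.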
Original abstract
Penalty shootouts play a crucial role in the knockout stage of major football tournaments. Their importance has substantially increased from the 2021/22 season, when the Union of European Football Associations (UEFA) scrapped the away goals rule. Our paper examines whether the outcome of a penalty shootout can be predicted in UEFA club competitions. Based on all shootouts between 2000 and 2025, no evidence is found for the effect of the kicking order, the field of the match, or psychological momentum. In contrast to previous results, we do not detect any relationship between shootout success and relative team strength, quantified by differences in Elo ratings and the implied winning probability. Thus, the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts an empirical analysis of all penalty shootouts in UEFA club competitions from 2000 to 2025. It tests for dependence on kicking order, match venue, psychological momentum, and relative team strength (measured by Elo rating differences and implied win probabilities), reports no statistically significant associations for any factor, and concludes that the hypothesis of shootouts being equivalent to a coin toss cannot be rejected.
Significance. If robust, the results add to the sports economics literature by updating prior findings with a full recent census of high-stakes club matches and by failing to detect a strength-outcome link. This supports the interpretation of shootouts as largely random events, with potential implications for knockout tournament design and the away-goals rule change.
major comments (2)
- The central claim that outcomes are 'close to a coin toss' rests on interpreting the failure to reject the null as evidence of no meaningful effects. However, without a reported power analysis (e.g., in the methods or results section) showing adequate power to detect modest alternatives such as a 5–10 pp advantage for the stronger team or first-kicker, the null result is difficult to interpret as affirmative support for randomness.
- The use of overall Elo rating differences as the measure of relative strength for the penalty phase (reported in the main results tables) requires explicit justification. Teams reaching shootouts are selected from close matches, so realized Elo spreads may be small; moreover, overall Elo may correlate only weakly with penalty-specific skills, which would make the test uninformative about strength independence.
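The power concern in the first major comment can be illustrated with a normal-approximation calculation for a single proportion tested against 0.5. This is a sketch under simplifying assumptions (one binary factor, independent shootouts, two-sided z-test); it is not the authors' planned calculation, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_one_sample_proportion(n: int, p_alt: float, p0: float = 0.5,
                                alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of H0: p = p0 when the true rate is p_alt."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of the observed win share under the alternative.
    sampling = NormalDist(mu=p_alt, sigma=sqrt(p_alt * (1 - p_alt) / n))
    upper = p0 + z_crit * se_null
    lower = p0 - z_crit * se_null
    return (1 - sampling.cdf(upper)) + sampling.cdf(lower)


def required_sample_size(p_alt: float, p0: float = 0.5, alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Smallest n reaching the target power (standard formula, ignoring the far tail)."""
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    numerator = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p_alt * (1 - p_alt))
    return ceil((numerator / (p_alt - p0)) ** 2)


for p_alt in (0.55, 0.60):
    print(f"true win rate {p_alt:.2f}: n needed = {required_sample_size(p_alt)}")
```

Under this approximation, detecting a 5 pp advantage at 80% power and 5% significance takes on the order of 780 shootouts, while a 10 pp advantage takes roughly 190, which is why the reported sample size matters for reading the null result.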
minor comments (2)
- Clarify the exact regression specifications, including any controls for match context and the handling of multiple testing across the four main hypotheses.
- Report exact sample sizes, number of shootouts per competition, and confidence intervals or standard errors alongside all null results to aid interpretation.
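The two minor comments can be illustrated together: a Wilson score interval for reporting uncertainty around a null result, and a Bonferroni-adjusted threshold for the four main hypotheses. Both are generic textbook choices assumed here for illustration; the page does not say which interval or correction the authors use, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical count: first kickers winning 52 of 100 shootouts (not paper data).
low, high = wilson_interval(52, 100)
print(f"95% Wilson interval for first-kicker win rate: [{low:.3f}, {high:.3f}]")

# Four main hypotheses: kicking order, venue, momentum, team strength.
print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, 4)}")
```

Reporting the interval alongside each null result shows which effect sizes the data can rule out, rather than only that zero was not rejected.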
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We respond to each major comment below and indicate the changes we will make in revision.
- Referee: The central claim that outcomes are 'close to a coin toss' rests on interpreting the failure to reject the null as evidence of no meaningful effects. However, without a reported power analysis (e.g., in the methods or results section) showing adequate power to detect modest alternatives such as a 5–10 pp advantage for the stronger team or first-kicker, the null result is difficult to interpret as affirmative support for randomness.
  Authors: We agree that a power analysis would strengthen the interpretation of the null results. In the revised manuscript we will add a dedicated subsection in the methods section that reports ex-post power calculations for our main specifications. These calculations will show the minimum detectable effect sizes (at 80% power and 5% significance) for the first-kicker advantage, venue, momentum, and Elo-difference variables given our sample of 25 years of UEFA club shootouts. We expect this to confirm that the design has reasonable power to detect effects in the 5–10 percentage-point range.
  Revision: yes
- Referee: The use of overall Elo rating differences as the measure of relative strength for the penalty phase (reported in the main results tables) requires explicit justification. Teams reaching shootouts are selected from close matches, so realized Elo spreads may be small; moreover, overall Elo may correlate only weakly with penalty-specific skills, which would make the test uninformative about strength independence.
  Authors: We will expand the data and methods section to provide the requested justification. We will report the distribution of Elo differences in the shootout sample to document that meaningful variation remains even after selection into close matches. We will note that overall Elo is the standard objective measure used in the sports-economics literature on football and has been shown to predict match outcomes well; we will also acknowledge that it is an imperfect proxy for penalty-specific ability and that any resulting attenuation would bias our estimates toward zero, which is consistent with the null findings we report. We will add a brief discussion of this limitation and its implications for interpreting the strength-independence result.
  Revision: yes
Circularity Check
No significant circularity: direct empirical test on external historical data
The paper conducts a statistical analysis of all UEFA club penalty shootouts from 2000 to 2025, testing for effects of kicking order, venue, momentum, and relative strength via Elo differences. The central result is a failure to reject the null that outcomes resemble a coin toss. No equations, derivations, or predictions are presented that reduce by construction to fitted parameters defined from the same data. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The analysis relies on external historical records and standard hypothesis testing, so its conclusions are not built into its inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Observations from distinct shootouts are independent conditional on the measured covariates.
- domain assumption: Elo rating differences capture the relevant dimension of team quality for penalty-shootout performance.