Are penalty shootouts better than a coin toss? Evidence from international club football in Europe

Dóra Gréta Petróczy; László Csató

arXiv: 2510.17641 · v6 · submitted 2025-10-20 · econ.GN · physics.soc-ph · q-fin.EC · stat.AP

Pith reviewed 2026-05-18 06:02 UTC · model grok-4.3

keywords: penalty shootouts · football · soccer · Elo ratings · UEFA · knockout stages · randomness · coin toss

The pith

Penalty shootouts in European club football show no detectable link to team strength, kicking order, venue, or momentum, so their outcomes cannot be statistically distinguished from a coin toss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether outcomes of penalty shootouts in UEFA club competitions can be predicted from kicking order, match venue, psychological momentum, or relative team strength. Using every shootout from 2000 through 2025, the authors find none of these variables correlates with who wins. This matters because UEFA removed the away-goals rule in 2021/22, making shootouts more common in deciding advancement. If the results hold, then the final step of many knockout ties is effectively random rather than a test of overall quality.

Core claim

Based on all shootouts between 2000 and 2025, no evidence is found for the effect of the kicking order, the field of the match, or psychological momentum. In contrast to previous results, the authors detect no relationship between shootout success and relative team strength quantified by differences in Elo ratings and the implied winning probability. Thus the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected.

What carries the argument

Statistical tests of historical shootout outcomes against Elo rating differences and binary indicators for kicking order, home/away status, and momentum.
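
For reference, the implied winning probability can be sketched from an Elo difference with the standard logistic Elo formula. This is a minimal illustration and not the paper's code: the function name is hypothetical, the scale of 400 is the conventional Elo choice, and this summary does not state which Elo variant or home-advantage adjustment the authors use.

```python
def elo_win_probability(rating_diff: float, scale: float = 400.0) -> float:
    """Expected score of a team under the standard Elo formula.

    rating_diff is (own rating - opponent rating). The scale of 400 is the
    conventional choice and is an assumption here, not taken from the paper.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


if __name__ == "__main__":
    # Illustrative rating gaps only; not values from the paper's data.
    for diff in (0, 50, 100, 200):
        prob = elo_win_probability(diff)
        print(f"Elo difference {diff:>3}: implied win probability {prob:.3f}")
```

Under the coin-toss hypothesis, the shootout win rate should stay near 50 percent regardless of this implied probability.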

If this is right

Team strength measured before the shootout does not predict who advances.
Kicking first or second confers no measurable advantage.
Playing at home provides no detectable edge once the game reaches penalties.
Any psychological momentum from earlier goals or saves does not affect the final result.
The removal of the away-goals rule has increased the share of ties decided by a process that behaves like a fair coin.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Tournaments that rely more heavily on shootouts after 2021/22 are resolving more ties by chance.
Coaches may gain little from special penalty-preparation tactics if outcomes remain unpredictable.
Similar tests could be run on national-team shootouts to check whether club versus country contexts differ.
If the pattern holds, rule changes that reduce extra-time duration would further amplify the role of randomness.

Load-bearing premise

That differences in Elo ratings give an accurate enough picture of which team should win a penalty shootout.

What would settle it

A new dataset of several hundred additional shootouts in which the team with the higher Elo rating wins significantly more often than 50 percent of the time.
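
A check of this kind could be sketched as a one-sided exact binomial test of whether the higher-rated team wins more than half of the shootouts. The counts below are hypothetical placeholders, not data from the paper, and the choice of a one-sided exact test is an assumption made for illustration.

```python
from math import comb


def binomial_upper_tail(wins: int, n: int, p: float = 0.5) -> float:
    """P(X >= wins) for X ~ Binomial(n, p): a one-sided exact p-value."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(wins, n + 1))


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    n_shootouts = 300
    favourite_wins = 170
    p_value = binomial_upper_tail(favourite_wins, n_shootouts)
    print(f"one-sided exact p-value: {p_value:.4f}")
```

A small p-value on genuinely new data would count against the coin-toss hypothesis; a large one would leave it standing, subject to the power caveats raised in the referee report.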

Original abstract

Penalty shootouts play a crucial role in the knockout stage of major football tournaments. Their importance has substantially increased from the 2021/22 season, when the Union of European Football Associations (UEFA) scrapped the away goals rule. Our paper examines whether the outcome of a penalty shootout can be predicted in UEFA club competitions. Based on all shootouts between 2000 and 2025, no evidence is found for the effect of the kicking order, the field of the match, or psychological momentum. In contrast to previous results, we do not detect any relationship between shootout success and relative team strength, quantified by differences in Elo ratings and the implied winning probability. Thus, the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Clean null on shootout predictors with updated data, but power and Elo validity as penalty proxy need checking before the coin-toss claim lands firmly.

The key point is that this paper runs a full census of UEFA club penalty shootouts from 2000 through 2025 and finds no detectable link to kicking order, venue, momentum, or relative team strength as measured by Elo differences. That leads to the conclusion that the coin-toss hypothesis cannot be rejected in these competitions. The work is straightforward empirical testing against historical outcomes rather than any new modeling trick.

What stands out is the expanded sample and the contrast with earlier papers that reported strength effects; the null on Elo is the clearest addition here. The authors also cover the usual suspects in one place, which makes the paper a useful reference for anyone tracking fairness questions in knockout formats. The data collection itself looks solid given the claim of a comprehensive list.

The main soft spot is exactly the one flagged in the stress-test note. Teams that reach shootouts are already closely matched, so Elo spreads are probably narrow, and Elo itself is an overall rating that may not capture penalty-specific skills well. Without reported power calculations or checks on how much strength variation actually exists in the realized sample, the failure to reject does not yet rule out modest advantages. That assumption is load-bearing for the coin-toss interpretation.

This paper is aimed at sports economists and tournament designers who care about penalty rules. A reader working on fairness in football or similar knockout systems will get direct value from the updated nulls. It is coherent on its own terms and shows honest engagement with the data, so it deserves a serious referee even if revisions are needed on the power and proxy issues. I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper conducts an empirical analysis of all penalty shootouts in UEFA club competitions from 2000 to 2025. It tests for dependence on kicking order, match venue, psychological momentum, and relative team strength (measured by Elo rating differences and implied win probabilities), reports no statistically significant associations for any factor, and concludes that the hypothesis of shootouts being equivalent to a coin toss cannot be rejected.

Significance. If robust, the results add to the sports economics literature by updating prior findings with a full recent census of high-stakes club matches and by failing to detect a strength-outcome link. This supports the interpretation of shootouts as largely random events, with potential implications for knockout tournament design and the away-goals rule change.

major comments (2)

The central claim that outcomes are 'close to a coin toss' rests on interpreting the failure to reject the null as evidence of no meaningful effects. However, without a reported power analysis (e.g., in the methods or results section) showing adequate power to detect modest alternatives such as a 5–10 pp advantage for the stronger team or first-kicker, the null result is difficult to interpret as affirmative support for randomness.
The use of overall Elo rating differences as the measure of relative strength for the penalty phase (reported in the main results tables) requires explicit justification. Teams reaching shootouts are selected from close matches, so realized Elo spreads may be small; moreover, overall Elo may correlate only weakly with penalty-specific skills, which would make the test uninformative about strength independence.
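
The power concern in the first major comment can be made concrete with a short calculation. The sketch below computes the power of a one-sided exact binomial test of a 50 percent win rate against a specified alternative. The sample sizes and the one-sided design are illustrative assumptions; this summary does not report the paper's sample size or test specification.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_power(n: int, p_alt: float, alpha: float = 0.05) -> float:
    """Power of the one-sided exact test of H0: p = 0.5 against H1: p > 0.5."""
    # Find the smallest critical count c with P(X >= c | p = 0.5) <= alpha.
    tail = 0.0
    critical = n + 1  # n + 1 means the test can never reject
    for k in range(n, -1, -1):
        step = binom_pmf(k, n, 0.5)
        if tail + step > alpha:
            break
        tail += step
        critical = k
    # Power is the probability of landing in the rejection region under p_alt.
    return sum(binom_pmf(k, n, p_alt) for k in range(critical, n + 1))


if __name__ == "__main__":
    # Hypothetical sample sizes; the true number of shootouts is not given here.
    for n in (100, 200, 400):
        for p_alt in (0.55, 0.60):
            print(f"n={n:>3}, true win rate {p_alt:.2f}: power {exact_power(n, p_alt):.2f}")
```

If power against a 5 to 10 percentage-point advantage turns out to be low at the realized sample size, a non-significant result says little about whether such an advantage exists.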

minor comments (2)

Clarify the exact regression specifications, including any controls for match context and the handling of multiple testing across the four main hypotheses.
Report exact sample sizes, number of shootouts per competition, and confidence intervals or standard errors alongside all null results to aid interpretation.
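
One way to report interval estimates alongside a null result is a Wilson score interval for the observed win rate. The sketch below is a generic illustration with hypothetical counts; it is not the authors' procedure, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(wins: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = wins / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    low, high = wilson_interval(wins=104, n=200)
    print(f"first-kicker win rate: 95% interval [{low:.3f}, {high:.3f}]")
```

An interval that contains 0.5 but also spans several percentage points on either side shows readers directly which effect sizes the data cannot exclude.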

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We respond to each major comment below and indicate the changes we will make in revision.

Referee: The central claim that outcomes are 'close to a coin toss' rests on interpreting the failure to reject the null as evidence of no meaningful effects. However, without a reported power analysis (e.g., in the methods or results section) showing adequate power to detect modest alternatives such as a 5–10 pp advantage for the stronger team or first-kicker, the null result is difficult to interpret as affirmative support for randomness.

Authors: We agree that a power analysis would strengthen the interpretation of the null results. In the revised manuscript we will add a dedicated subsection in the methods section that reports ex-post power calculations for our main specifications. These calculations will show the minimum detectable effect sizes (at 80% power and 5% significance) for the first-kicker advantage, venue, momentum, and Elo-difference variables given our sample of 25 years of UEFA club shootouts. We expect this to confirm that the design has reasonable power to detect effects in the 5–10 percentage-point range.

revision: yes

Referee: The use of overall Elo rating differences as the measure of relative strength for the penalty phase (reported in the main results tables) requires explicit justification. Teams reaching shootouts are selected from close matches, so realized Elo spreads may be small; moreover, overall Elo may correlate only weakly with penalty-specific skills, which would make the test uninformative about strength independence.

Authors: We will expand the data and methods section to provide the requested justification. We will report the distribution of Elo differences in the shootout sample to document that meaningful variation remains even after selection into close matches. We will note that overall Elo is the standard objective measure used in the sports-economics literature on football and has been shown to predict match outcomes well; we will also acknowledge that it is an imperfect proxy for penalty-specific ability and that any resulting attenuation would bias our estimates toward zero, which is consistent with the null findings we report. We will add a brief discussion of this limitation and its implications for interpreting the strength-independence result.

revision: yes
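
The minimum detectable effect promised in the first response can be approximated in closed form for a simple test of a win rate against 50 percent. The sketch below uses the normal approximation with the null standard error; the two-sided design, the function name, and the sample sizes are assumptions for illustration and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest deviation from 0.5 detectable by a two-sided test.

    Uses the normal approximation with the null standard error 0.5 / sqrt(n),
    so it is a rough guide that is adequate only for small effects.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * 0.5 / sqrt(n)


if __name__ == "__main__":
    # Hypothetical sample sizes; the paper's actual count is not given here.
    for n in (100, 200, 400, 800):
        mde = minimum_detectable_effect(n)
        print(f"n={n:>3}: minimum detectable effect of about {100 * mde:.1f} percentage points")
```

Under this approximation, detecting a 10 percentage-point advantage at 80% power needs roughly 200 shootouts and a 5 percentage-point advantage roughly 800, which gives a rough yardstick for the expectation stated in the first response.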

Circularity Check

0 steps flagged

No significant circularity: direct empirical test on external historical data

The paper conducts a statistical analysis of all UEFA club penalty shootouts from 2000 to 2025, testing for effects of kicking order, venue, momentum, and relative strength via Elo differences. The central result is a failure to reject the null that outcomes resemble a coin toss. No equations, derivations, or predictions are presented that reduce by construction to fitted parameters defined from the same data. No self-citation chains or uniqueness theorems are invoked as load-bearing premises. The analysis relies on external historical records and standard hypothesis testing, remaining self-contained against benchmarks outside the fitted values.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard statistical assumptions for hypothesis testing and the validity of Elo ratings as a strength proxy; no new free parameters, invented entities, or ad-hoc axioms are introduced beyond those implicit in regression-based tests of binary outcomes.

axioms (2)

domain assumption: Observations from distinct shootouts are independent conditional on the measured covariates.
Required for the validity of the reported significance tests.
domain assumption: Elo rating differences capture the relevant dimension of team quality for penalty-shootout performance.
Central to the claim that no relationship with relative strength exists.
