Anyone for chess? Analysing chess ratings above high thresholds
Pith reviewed 2026-05-21 14:01 UTC · model grok-4.3
The pith
Differences in variance can create large gaps among the very top chess players even when averages are similar.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The author develops models and tools for analyzing chess ratings above high thresholds using only the listed top scores, and applies them to the FIDE top-100 and above-2100 lists for active players. The central argument is that even when two or more distributions have nearly identical expected values or medians, small differences in variance may explain the gaps among the very best players.
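The mechanism can be illustrated with a minimal sketch, assuming two Gaussian rating distributions that share a mean and differ only in standard deviation. The numbers below (mean 1500, standard deviations 330 and 300) are hypothetical and are not estimates from the FIDE data; the function name `tail_ratio` is likewise an illustration, not something from the paper.

```python
from statistics import NormalDist


def tail_ratio(threshold, mean, sd_a, sd_b):
    """Return P(A > threshold) / P(B > threshold) for two Gaussians with a shared mean."""
    p_a = 1.0 - NormalDist(mean, sd_a).cdf(threshold)
    p_b = 1.0 - NormalDist(mean, sd_b).cdf(threshold)
    return p_a / p_b


# Hypothetical parameters chosen for illustration only.
MEAN, SD_A, SD_B = 1500.0, 330.0, 300.0

for t in (1500, 2100, 2500, 2700):
    # The ratio equals 1 at the shared mean and grows as the threshold rises.
    print(t, round(tail_ratio(t, MEAN, SD_A, SD_B), 2))
```

Under these assumptions the two groups are indistinguishable at the median, yet the wider distribution is increasingly over-represented as the threshold moves further into the tail.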
What carries the argument
Tail models for ratings exceeding high thresholds, fitted using only the listed top scores from FIDE lists.
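This summary does not say which tail family the author uses. As a rough sketch of what fitting a tail from listed scores alone can look like, the example below assumes an exponential model for the exceedances over a fixed threshold, a common simple choice and not necessarily the paper's model. The function names and the sample ratings are hypothetical.

```python
import math


def fit_exponential_tail(ratings, threshold):
    """Fit an exponential model to exceedances over `threshold`.

    Returns (scale, n_exceedances). The maximum-likelihood estimate of the
    exponential scale is the mean exceedance.
    """
    exceedances = [r - threshold for r in ratings if r > threshold]
    if not exceedances:
        raise ValueError("no ratings above the threshold")
    scale = sum(exceedances) / len(exceedances)
    return scale, len(exceedances)


def survival(x, threshold, scale):
    """P(rating > x | rating > threshold) under the fitted exponential tail."""
    if x <= threshold:
        return 1.0
    return math.exp(-(x - threshold) / scale)


# Made-up ratings for illustration; not FIDE data.
SAMPLE = [2110, 2150, 2200, 2300, 2050]
scale, n = fit_exponential_tail(SAMPLE, threshold=2100)
print(scale, n, survival(2400, 2100, scale))
```

Only the scores above the threshold enter the fit, which is exactly why such a model says nothing about the distribution below it.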
If this is right
- Gaps among the absolute top ranks can be attributed primarily to variance differences rather than shifts in central tendency.
- Comparisons between groups such as men and women in chess become possible through tail analysis alone.
- Similar tail models can be applied to other skill or performance measures where only elite scores are readily available.
- Predictions for the distribution of even rarer extreme ratings follow directly from the fitted tail parameters.
Where Pith is reading between the lines
- The same variance-driven mechanism could appear in other competitive domains such as scientific output or athletic records where full population data is unavailable.
- Efforts to close performance gaps at the elite level may need to target spread in addition to average ability.
- Historical FIDE rating lists could be reanalyzed with these models to test whether variance differences have changed over time.
Load-bearing premise
That tail behavior above high thresholds can be usefully modeled from the listed top scores alone without reference to the shape or parameters of the bulk distribution below the threshold.
What would settle it
The claim would be called into question if the number of players predicted to exceed a still higher rating threshold, such as 2500, from a tail model fitted to current top-100 data deviated markedly from the count actually observed in updated FIDE lists.
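A check of this kind could be sketched as follows, assuming an exponential tail above 2100 (an illustrative choice, not necessarily the paper's model) and, for simplicity, the above-2100 list rather than the top-100 list. The group size 14671 is the figure given for active men; the scale and the observed count are hypothetical placeholders, and the comparison ignores uncertainty in the fitted scale.

```python
import math


def predicted_exceedances(n_above_u, u, v, scale):
    """Expected count and binomial SD of ratings above v, given n_above_u above u.

    Assumes exceedances over u follow an exponential distribution with `scale`.
    """
    if v < u:
        raise ValueError("v must be at least u")
    p = math.exp(-(v - u) / scale)
    expected = n_above_u * p
    sd = math.sqrt(n_above_u * p * (1.0 - p))
    return expected, sd


def standardized_gap(observed, expected, sd):
    """Distance between observed and predicted counts in SD units."""
    return (observed - expected) / sd


N_ABOVE_2100 = 14671   # group size reported for active men
SCALE = 100.0          # hypothetical fitted scale, for illustration only
OBSERVED = 250         # hypothetical observed count above 2500

expected, sd = predicted_exceedances(N_ABOVE_2100, 2100, 2500, SCALE)
print(expected, sd, standardized_gap(OBSERVED, expected, sd))
```

A standardized gap far from zero would count against the fitted tail model; a small one would be consistent with it without proving it.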
Original abstract
Suppose some cleverness score parameter is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such phenomena could have Gaussian distributions, when it comes to all players in a stratum, but when interest focuses on the very tails, for the top few percent, those above certain high thresholds, different models are called for, along with the need to analyse such based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 and above 2100 points lists for regular chess ratings, for the currently active 14671 men and 753 women, as given by the FIDE, January 2026. It is argued that even when two or more distributions have close to identical expected values, or medians, even smaller differences in variance may explain gaps for the few very best ones.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops statistical models and tools for analyzing distributions above high thresholds using only the listed top scores. It applies these to FIDE chess ratings above 2100 for 14671 active men and 753 women as of January 2026, arguing that small differences in variance can explain gaps among the very top performers even when expected values or medians are nearly identical.
Significance. If the tail models are valid and the similarity of means can be substantiated, the work would provide useful methods for extreme-value analysis in rating systems without requiring the full distribution. The empirical focus on chess data offers a concrete illustration of how variance influences tail disparities, with potential relevance to performance gaps in other specialist domains.
major comments (2)
- [Abstract] The claim that 'two or more distributions have close to identical expected values, or medians' is not supported by the top-scores analysis alone. The overall mean is dominated by the bulk below the 2100 threshold, which is neither modeled nor estimated from the given data for men and women. Without this, mean similarity cannot be verified, and the attribution of top gaps to variance differences is non-unique.
- [Application to chess ratings] The tail models fitted directly to the listed scores above 2100 do not reference the shape or parameters of the distribution below the threshold. This leaves open the possibility that shifts in the unmodeled bulk below the threshold could generate equivalent top-end gaps through mean differences rather than variance, undermining the central variance-gap explanation.
minor comments (1)
- [Abstract] The data reference 'January 2026' appears inconsistent with present timelines; confirm the exact FIDE list date used.
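The non-uniqueness raised in the major comments can be made concrete with a small sketch, assuming Gaussian distributions and hypothetical parameters (mean 1500, standard deviations 330 and 300). It finds a mean shift that reproduces, at one chosen threshold, the same upper-tail mass that a variance difference produces. The function names are illustrative only.

```python
from statistics import NormalDist


def upper_tail(threshold, mean, sd):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def matching_mean_shift(threshold, mean, sd_wide, sd_narrow):
    """Mean m such that Normal(m, sd_narrow) has the same mass above
    `threshold` as Normal(mean, sd_wide): both have the same z-score there."""
    return threshold - sd_narrow * (threshold - mean) / sd_wide


# Hypothetical parameters for illustration; not estimates from FIDE data.
MEAN, SD_WIDE, SD_NARROW, T = 1500.0, 330.0, 300.0, 2500.0

shifted = matching_mean_shift(T, MEAN, SD_WIDE, SD_NARROW)
print(shifted)
print(upper_tail(T, MEAN, SD_WIDE), upper_tail(T, shifted, SD_NARROW))
# The two explanations agree at T but separate at other thresholds.
print(upper_tail(2700, MEAN, SD_WIDE), upper_tail(2700, shifted, SD_NARROW))
```

The match holds exactly at a single threshold only. Telling the two accounts apart therefore depends on how the tail behaves across several thresholds, or on information about the bulk that the listed top scores do not provide.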
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which correctly identify limitations in how the manuscript frames its claims given the tail-only data. We have revised the abstract, introduction, and application section to clarify the conditional nature of the variance explanation and to avoid implying empirical verification of overall mean or median similarity.
Point-by-point responses
- Referee: [Abstract] The claim that 'two or more distributions have close to identical expected values, or medians' is not supported by the top-scores analysis alone. The overall mean is dominated by the bulk below the 2100 threshold, which is neither modeled nor estimated from the given data for men and women. Without this, mean similarity cannot be verified, and the attribution of top gaps to variance differences is non-unique.
Authors: We agree that the analysis relies exclusively on ratings above the 2100 threshold and provides no information on the distribution below it, so overall means or medians cannot be verified or compared from the available data. The manuscript develops tail-specific models for extremes using only listed top scores and illustrates that, within such conditional tail distributions, modest variance differences can produce large gaps at the highest quantiles. We did not claim to have empirically established mean equality from the tail data alone. In revision we have updated the abstract to present the argument as conditional ('even when two or more distributions have close to identical expected values or medians, small variance differences may explain...') and added an explicit limitations paragraph in the application section noting that mean shifts arising from the unmodeled bulk remain a possible alternative explanation.
Revision: yes
- Referee: [Application to chess ratings] The tail models fitted directly to the listed scores above 2100 do not reference the shape or parameters of the distribution below the threshold. This leaves open the possibility that shifts in the unmodeled bulk below the threshold could generate equivalent top-end gaps through mean differences rather than variance, undermining the central variance-gap explanation.
Authors: This observation is accurate: the models are fitted only to the observed scores above 2100 and are therefore silent on the form or location of the distribution below the threshold. Consequently, differences in the lower bulk could alter overall means and thereby affect the upper tail without any change in conditional variance. Our contribution is the development of tail-specific tools that permit analysis of extremes from top-score lists alone; the variance parameter in these models governs spread within the conditional tail. We have added a clarifying subsection that states the variance-gap account is offered under the maintained assumption of similar central tendencies and acknowledges that unmodeled mean shifts constitute a competing explanation. The language in the application section has been revised to present the variance mechanism as one plausible account supported by the tail analysis rather than the sole or definitive cause.
Revision: yes
Circularity Check
No significant circularity detected
Full rationale
The paper develops tail models for chess ratings using only the provided top scores above the 2100 threshold for the FIDE lists of active players. The central claim that small variance differences can account for gaps among the very top players even when means or medians are nearly identical is framed as an interpretive modeling result applied to the external data. No equations, self-citations, or derivations in the abstract reduce any prediction or uniqueness result to a fitted input or prior self-referential step by construction. The analysis remains self-contained against the listed scores without tautological redefinition of inputs as outputs.