Performance Anomaly Detection in Athletics: A Benchmarking System with Visual Analytics
Pith reviewed 2026-05-09 22:55 UTC · model grok-4.3
The pith
Trajectory-based methods best identify potential doping violations in athletics while minimizing false alarms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a benchmarking system that applies statistical rules, machine learning models, and trajectory analysis to competition data from 2010 to 2025. When tested against publicly known anti-doping violations, the trajectory-based approaches that compare an athlete's performances against their predicted career progression achieve the strongest balance of high detection rates and low false positive rates. All methods encounter difficulties due to gaps in the competition records and the small number of confirmed violation cases available for training and validation. The system incorporates an interactive visual analytics interface to enable human experts to examine flagged performan
What carries the argument
Trajectory-based anomaly detection, which builds a model of typical career performance progression and identifies deviations from that expected path as potential anomalies.
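The idea can be illustrated with a minimal sketch. It is not the paper's implementation: the linear trend, the threshold `k`, the function names, and the sample season bests are all assumptions made for illustration. It fits a straight line to an athlete's earlier results and flags a new result that improves on the predicted value by more than `k` residual standard deviations.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def flag_against_trajectory(ages, marks, new_age, new_mark,
                            k=3.0, lower_is_better=True):
    """Flag new_mark if it beats the fitted career trend by more than
    k residual standard deviations (hypothetical rule, assumed here)."""
    a, b = fit_line(ages, marks)
    residuals = [y - (a + b * x) for x, y in zip(ages, marks)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = sqrt(sum(r * r for r in residuals) / (len(ages) - 2))
    expected = a + b * new_age
    # Positive gain means the athlete did better than the trend predicts.
    gain = (expected - new_mark) if lower_is_better else (new_mark - expected)
    z = gain / sigma
    return z > k, expected, z

# Hypothetical 100 m season bests (seconds) by age; not data from the paper.
ages = [20, 21, 22, 23, 24, 25]
season_bests = [10.40, 10.32, 10.27, 10.21, 10.18, 10.12]
flagged, expected, z = flag_against_trajectory(ages, season_bests, 26, 9.80)
print(flagged, round(expected, 3), round(z, 1))
```

A realistic career model would be nonlinear (improvement, peak, decline) and would need far more than six points per athlete; the straight line is used only to keep the example short.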
Load-bearing premise
Publicly confirmed anti-doping violations serve as an unbiased and representative sample of true performance-enhancing drug use for evaluating the detection methods.
What would settle it
A controlled follow-up study that applies the trajectory method to a new set of athletes and tracks whether a significantly higher proportion of those it flags later receive confirmed violations compared to athletes not flagged by it.
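A sketch of how such a follow-up comparison might be scored, assuming the outcome is summarized as a 2x2 table of flagged or unflagged athletes against later confirmed or unconfirmed violations. The counts below are invented for illustration, and the choice of a one-sided Fisher exact test and relative risk is an assumption, not something the paper specifies.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: probability of seeing at least `a`
    confirmed violations among flagged athletes if flagging were unrelated
    to violations.
    a: flagged and later confirmed     b: flagged, not confirmed
    c: unflagged and later confirmed   d: unflagged, not confirmed"""
    n_flagged = a + b
    n_confirmed = a + c
    n_total = a + b + c + d
    upper = min(n_flagged, n_confirmed)
    tail = sum(
        comb(n_confirmed, k) * comb(n_total - n_confirmed, n_flagged - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_flagged)

def relative_risk(a, b, c, d):
    """Confirmed-violation rate among flagged athletes divided by the rate
    among unflagged athletes (requires c > 0)."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 100 flagged athletes, 900 unflagged.
a, b, c, d = 6, 94, 4, 896
print(relative_risk(a, b, c, d), fisher_one_sided(a, b, c, d))
```

With so few confirmed cases, an exact test is a safer default than a normal approximation, but the result would still inherit any bias in which athletes get tested and sanctioned.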
Original abstract
Anti-doping programs rely on biological testing to detect performance-enhancing drugs, but such testing costs over $800 per sample and is limited by short detection windows for many prohibited substances. These constraints leave large portions of athletes without regular testing, motivating complementary screening approaches that analyze routine competition results to identify suspicious performance patterns. We present a system that processes 1.6 million athletics performances from over 19,000 competitions (2010-2025) using eight detection methods ranging from statistical rules to machine learning and trajectory analysis. We validate all methods against publicly confirmed anti-doping violations to measure their effectiveness in identifying sanctioned athletes. Trajectory-based methods, which compare performances to expected career progression, achieve the best balance between detecting violations and limiting false alarms, though all methods face challenges from incomplete data and rare confirmed violations. The system provides an interactive interface for expert-driven investigation, emphasizing transparency and human judgment to support, rather than replace, established anti-doping processes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a benchmarking system for performance anomaly detection in athletics. It processes 1.6 million competition results from over 19,000 competitions (2010-2025) with eight methods spanning statistical rules, machine learning, and trajectory analysis. It validates these methods against publicly confirmed anti-doping violations and reports that trajectory-based approaches, which compare performances to expected career progression, achieve the best balance of violation detection and low false alarms. The system includes an interactive visual analytics interface that supports expert review while emphasizing transparency and complementarity to existing biological testing.
Significance. If the validation results prove robust, the work could offer a practical, low-cost screening layer for anti-doping programs by leveraging routine competition data to prioritize testing. The inclusion of visual analytics and explicit support for human judgment strengthens its potential utility. However, the overall significance remains limited by the preliminary nature of the evidence, particularly the reliance on a small, potentially biased set of confirmed violations as ground truth, which weakens claims of method superiority.
major comments (2)
- [Validation procedure] (abstract and results sections): The claim that trajectory-based methods achieve the best balance of detection and false-alarm control rests on treating the small set of publicly confirmed anti-doping violations as ground truth. Because confirmed cases are rare and non-random (only sanctioned athletes receive positive labels), any method that surfaces additional plausible anomalies is penalized with false positives, while methods that miss undetected cases appear stronger. No sensitivity checks (e.g., treating unconfirmed high performers as possible positives or reporting unlabeled ranking metrics) are described, so the reported superiority may be an artifact of the label set rather than a robust algorithmic property.
- [Results and evaluation] (abstract): No quantitative performance metrics (precision, recall, F1, AUC), error bars, statistical tests, or implementation details (hyperparameters, feature definitions for trajectory models) are reported. Without these, it is impossible to judge the magnitude or statistical reliability of the claimed advantage of trajectory methods over the other seven approaches, which leaves the central benchmarking conclusion unsupported.
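The kind of reporting requested in these comments can be sketched as follows. This is an illustrative example, not the authors' evaluation code: the function names, the percentile bootstrap, and the toy labels are assumptions. It computes precision, recall, and F1 for one method's flags against confirmed-violation labels and attaches a bootstrap interval as a rough error bar.

```python
import random

def precision_recall_f1(flags, labels):
    """flags[i]: method flagged athlete i; labels[i]: confirmed violation."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y)
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def bootstrap_interval(flags, labels, stat_index=2, n_boot=2000,
                       alpha=0.05, seed=0):
    """Percentile bootstrap interval over athletes for one statistic
    (0 = precision, 1 = recall, 2 = F1)."""
    rng = random.Random(seed)
    n = len(flags)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = precision_recall_f1([flags[i] for i in idx],
                                     [labels[i] for i in idx])
        stats.append(sample[stat_index])
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy data: 10 confirmed cases among 100 athletes; 6 caught, 4 false alarms.
labels = [True] * 10 + [False] * 90
flags = [True] * 6 + [False] * 4 + [True] * 4 + [False] * 86
print(precision_recall_f1(flags, labels))
print(bootstrap_interval(flags, labels))
```

With only a handful of confirmed positives the interval will be wide, which is itself informative: it shows how much of a reported gap between methods could be sampling noise. It does not address the label bias raised in the first comment.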
minor comments (1)
- [Abstract] The abstract notes challenges from incomplete data and rare violations but provides no concrete description of how missing competition results or career gaps are handled by each method (e.g., imputation, filtering rules, or robustness checks).
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing honest responses based on the current work and indicating where revisions will be made to improve clarity and robustness.
- Referee: Validation procedure (abstract and results sections): The claim that trajectory-based methods achieve the best balance of detection and false-alarm control rests on treating the small set of publicly confirmed anti-doping violations as ground truth. Because confirmed cases are rare and non-random (only sanctioned athletes receive positive labels), any method surfacing additional plausible anomalies is penalized as false positive while methods missing undetected cases appear stronger. No sensitivity checks (e.g., treating unconfirmed high performers as possible positives or reporting unlabeled ranking metrics) are described, so the reported superiority may be an artifact of the label set rather than a robust algorithmic property.
Authors: We agree that using the small set of publicly confirmed violations as ground truth introduces inherent limitations due to its rarity and non-random nature. The manuscript already notes challenges from incomplete data and rare confirmed violations, and this labeled set is the only publicly available ground truth tied to real sanctions. We did not perform the suggested sensitivity checks because labeling unconfirmed high performers as positives would require unsubstantiated assumptions without supporting evidence, potentially introducing new biases. In revision, we will add a dedicated limitations subsection explicitly discussing this ground-truth issue and its potential impact on method comparisons. We will also explore and report any feasible unlabeled ranking metrics to provide additional context.
Revision: partial
- Referee: Results and evaluation (abstract): No quantitative performance metrics (precision, recall, F1, AUC), error bars, statistical tests, or implementation details (hyperparameters, feature definitions for trajectory models) are reported. Without these, it is impossible to judge the magnitude or statistical reliability of the claimed advantage of trajectory methods over the other seven approaches, rendering the central benchmarking conclusion unsupported.
Authors: We acknowledge that the abstract and results sections lack explicit quantitative metrics such as precision, recall, F1, and AUC, as well as error bars, statistical tests, and detailed implementation information. This makes it harder to evaluate the strength of the comparisons. In the revised manuscript, we will add a results table with these metrics for all eight methods, include implementation details (hyperparameters and feature definitions) in the methods section or supplementary material, and report statistical comparisons or error bars where the data and experimental setup allow. This will better support the benchmarking claims.
Revision: yes
Circularity Check
No circularity: benchmarking system relies on external validation against confirmed violations
The paper presents a data-processing and benchmarking system that applies eight detection methods (statistical rules, ML, trajectory analysis) to 1.6M athletics records and evaluates them against an external set of publicly confirmed anti-doping violations. No equations, parameter-fitting steps, self-citations used as load-bearing premises, or predictions that reduce to the input data by construction are described. Validation uses independent ground-truth labels rather than internal consistency checks or renamed empirical patterns, so the central claims remain independent of the paper's own outputs.
Appendix A: Hyperparameter Configuration
Table V: ML-Based Detection Methods Hyperparameters
Method            Parameter      Value  Justification
Isolation Forest  contamination  0.1    Domain prior: ~10% outliers
Isolation Forest  n_estimators   100    sklearn default
Isolation Forest  random_state   42     Reproducibility
XGBoost           n_estimators   100    Sta...