arxiv: 2604.21953 · v1 · submitted 2026-04-23 · 💻 cs.LG · cs.CY

Performance Anomaly Detection in Athletics: A Benchmarking System with Visual Analytics

Pith reviewed 2026-05-09 22:55 UTC · model grok-4.3

keywords: anomaly detection, athletics, anti-doping, performance analysis, trajectory modeling, visual analytics, machine learning

The pith

Trajectory-based methods best identify potential doping violations in athletics while minimizing false alarms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a system that analyzes 1.6 million athletics performances and tests eight methods for spotting unusual results that might indicate the use of performance-enhancing drugs. It validates these methods by checking how well they flag athletes who have already been caught and sanctioned. Trajectory methods, which model an athlete's expected improvement over time and compare actual results to that path, perform best at catching true cases without raising too many alarms on clean athletes. This approach matters because traditional drug testing is costly and cannot cover everyone, so data-driven screening could help focus resources on suspicious athletes. The system includes visual tools so experts can review the flags themselves rather than relying on automation alone.

Core claim

The authors present a benchmarking system that applies statistical rules, machine learning models, and trajectory analysis to competition data from 2010 to 2025. When tested against publicly known anti-doping violations, the trajectory-based approaches that compare an athlete's performances against their predicted career progression achieve the strongest balance of high detection rates and low false positive rates. All methods encounter difficulties due to gaps in the competition records and the small number of confirmed violation cases available for training and validation. The system incorporates an interactive visual analytics interface to enable human experts to examine flagged performances.

What carries the argument

Trajectory-based anomaly detection, which builds a model of typical career performance progression and identifies deviations from that expected path as potential anomalies.
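
A minimal sketch of this idea, not the authors' implementation: it assumes a straight-line career trend fitted by ordinary least squares to one athlete's marks, and flags any mark whose residual is more than `threshold` residual standard deviations from the fitted line. The function names, the linear form, the threshold of 2.0, and the sample times are all illustrative assumptions; the paper's actual trajectory model and features are not described in the text available here.

```python
from statistics import mean, pstdev

def fit_linear_trend(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def flag_trajectory_deviations(ages, marks, threshold=2.0, min_spread=1e-9):
    """Return indices of marks that sit far from the fitted career trend.

    Assumption: a linear trend is an adequate stand-in for "expected
    progression"; a real system would likely use a richer curve.
    """
    a, b = fit_linear_trend(ages, marks)
    residuals = [m - (a + b * age) for age, m in zip(ages, marks)]
    spread = pstdev(residuals)
    if spread < min_spread:  # essentially perfect fit: nothing to flag
        return []
    return [i for i, r in enumerate(residuals) if abs(r) / spread > threshold]

# Hypothetical 100 m season bests (seconds) for one athlete, ages 20 to 27.
ages = [20, 21, 22, 23, 24, 25, 26, 27]
marks = [10.50, 10.45, 10.40, 10.35, 10.30, 9.90, 10.20, 10.15]
print(flag_trajectory_deviations(ages, marks))  # flags index 5 (the 9.90)
```

One design caveat: a single large deviation also pulls the least-squares line and inflates the residual spread, so a robust fit (for example, median-based) would usually flag more reliably on short careers.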

Load-bearing premise

Publicly confirmed anti-doping violations serve as an unbiased and representative sample of true performance-enhancing drug use for evaluating the detection methods.

What would settle it

A controlled follow-up study that applies the trajectory method to a new set of athletes and tracks whether a significantly higher proportion of those it flags later receive confirmed violations compared to athletes not flagged by it.
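
The comparison in such a follow-up study reduces to a test on two proportions. The sketch below is one hedged way to frame it: a pooled two-proportion z-test with a one-sided p-value. The counts are invented for illustration, and the function name is hypothetical. With very rare confirmed violations the normal approximation is rough, so an exact test would be a safer choice in practice.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test.

    Returns (p_a, p_b, z, one_sided_p) where one_sided_p = P(Z >= z)
    under the null hypothesis that both groups share one rate.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    one_sided_p = 0.5 * math.erfc(z / math.sqrt(2))
    return p_a, p_b, z, one_sided_p

# Hypothetical counts: 12 of 200 flagged athletes later sanctioned,
# versus 25 of 5000 athletes who were not flagged.
p_flagged, p_unflagged, z, p_value = two_proportion_z(12, 200, 25, 5000)
print(p_flagged, p_unflagged, z, p_value)
```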

Figures

Figures reproduced from arXiv: 2604.21953 by Blessed Madukoma, Prasenjit Mitra.

Figure 1: Multi-method consensus flagging of a sanctioned 200 m ...
Figure 2: Service-oriented architecture for performance anomaly ...
Figure 3: Investigator workflow: coarse-to-fine filtering (event ...
Figure 4: An athlete’s distribution plots including box plots and ...
Figure 5: Consensus outliers interface. Athletes flagged by mul...
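
The consensus flagging named in the captions for Figures 1 and 5 can be read as a vote count across detection methods. The sketch below assumes exactly that: each method contributes a set of flagged athlete IDs, and an athlete is a consensus outlier when at least `min_votes` methods agree. The method keys, athlete IDs, and the threshold of 2 are illustrative assumptions, since the page does not state the paper's consensus rule.

```python
from collections import Counter

def consensus_flags(flags_by_method, min_votes=2):
    """Map athlete ID -> number of methods that flagged them,
    keeping only athletes flagged by at least `min_votes` methods."""
    votes = Counter()
    for flagged in flags_by_method.values():
        votes.update(set(flagged))  # one vote per method, even if repeated
    return {athlete: n for athlete, n in votes.items() if n >= min_votes}

# Hypothetical output of three detectors.
flags = {
    "iqr_rule": ["A", "B"],
    "isolation_forest": ["B", "C"],
    "trajectory": ["B", "C", "D"],
}
print(consensus_flags(flags))  # {'B': 3, 'C': 2}
```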
Abstract

Anti-doping programs rely on biological testing to detect performance-enhancing drugs, but such testing costs over $800 per sample and is limited by short detection windows for many prohibited substances. These constraints leave large portions of athletes without regular testing, motivating complementary screening approaches that analyze routine competition results to identify suspicious performance patterns. We present a system that processes 1.6 million athletics performances from over 19,000 competitions (2010-2025) using eight detection methods ranging from statistical rules to machine learning and trajectory analysis. We validate all methods against publicly confirmed anti-doping violations to measure their effectiveness in identifying sanctioned athletes. Trajectory-based methods, which compare performances to expected career progression, achieve the best balance between detecting violations and limiting false alarms, though all methods face challenges from incomplete data and rare confirmed violations. The system provides an interactive interface for expert-driven investigation, emphasizing transparency and human judgment to support, rather than replace, established anti-doping processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a benchmarking system for performance anomaly detection in athletics, processing 1.6 million competition results from over 19,000 competitions (2010-2025) with eight methods spanning statistical rules, machine learning, and trajectory analysis. It validates these methods against publicly confirmed anti-doping violations, reporting that trajectory-based approaches (comparing performances to expected career progression) achieve the best balance of violation detection and low false alarms. The system includes an interactive visual analytics interface to support expert review while emphasizing transparency and complementarity to existing biological testing.

Significance. If the validation results prove robust, the work could offer a practical, low-cost screening layer for anti-doping programs by leveraging routine competition data to prioritize testing. The inclusion of visual analytics and explicit support for human judgment strengthens its potential utility. However, the overall significance remains limited by the preliminary nature of the evidence, particularly the reliance on a small, potentially biased set of confirmed violations as ground truth, which weakens claims of method superiority.

major comments (2)
  1. [Validation procedure] (abstract and results sections): The claim that trajectory-based methods achieve the best balance of detection and false-alarm control rests on treating the small set of publicly confirmed anti-doping violations as ground truth. Because confirmed cases are rare and non-random (only sanctioned athletes receive positive labels), any method surfacing additional plausible anomalies is penalized as false positive while methods missing undetected cases appear stronger. No sensitivity checks (e.g., treating unconfirmed high performers as possible positives or reporting unlabeled ranking metrics) are described, so the reported superiority may be an artifact of the label set rather than a robust algorithmic property.
  2. [Results and evaluation] (abstract): No quantitative performance metrics (precision, recall, F1, AUC), error bars, statistical tests, or implementation details (hyperparameters, feature definitions for trajectory models) are reported. Without these, it is impossible to judge the magnitude or statistical reliability of the claimed advantage of trajectory methods over the other seven approaches, rendering the central benchmarking conclusion unsupported.
minor comments (1)
  1. [Abstract] The abstract notes challenges from incomplete data and rare violations but provides no concrete description of how missing competition results or career gaps are handled by each method (e.g., imputation, filtering rules, or robustness checks).
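
The label-set concern in the first major comment can be made concrete with a toy sensitivity check. The sketch below scores two hypothetical methods twice: once against confirmed violations only, and once after also counting a set of assumed undetected positives. All IDs, both flag sets, and the `hidden` set are invented for illustration; nothing here comes from the paper's data.

```python
def precision_recall(flagged, positives):
    """Precision and recall of a flagged set against a positive set."""
    tp = len(flagged & positives)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall

def sensitivity_to_hidden_positives(flagged, confirmed, hidden):
    """Score once with confirmed labels only, then again treating the
    hypothetical `hidden` athletes as true positives as well."""
    return (
        precision_recall(flagged, confirmed),
        precision_recall(flagged, confirmed | hidden),
    )

confirmed = {1, 2, 3}             # sanctioned athletes (the only labels)
hidden = {4, 5}                   # assumed undetected positives
method_a = {1, 2, 3, 4, 5, 6}     # flags widely
method_b = {1, 2, 3, 7}           # flags narrowly

for name, flagged in (("A", method_a), ("B", method_b)):
    print(name, sensitivity_to_hidden_positives(flagged, confirmed, hidden))
```

In this toy case method B has the higher precision under confirmed-only labels (0.75 against 0.5), while method A is ahead on both precision and recall once the assumed hidden positives count, which is the ranking reversal the referee warns about.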

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, providing honest responses based on the current work and indicating where revisions will be made to improve clarity and robustness.

  1. Referee: Validation procedure (abstract and results sections): The claim that trajectory-based methods achieve the best balance of detection and false-alarm control rests on treating the small set of publicly confirmed anti-doping violations as ground truth. Because confirmed cases are rare and non-random (only sanctioned athletes receive positive labels), any method surfacing additional plausible anomalies is penalized as false positive while methods missing undetected cases appear stronger. No sensitivity checks (e.g., treating unconfirmed high performers as possible positives or reporting unlabeled ranking metrics) are described, so the reported superiority may be an artifact of the label set rather than a robust algorithmic property.

    Authors: We agree that using the small set of publicly confirmed violations as ground truth introduces inherent limitations due to its rarity and non-random nature. The manuscript already notes challenges from incomplete data and rare confirmed violations, and this labeled set is the only publicly available ground truth tied to real sanctions. We did not perform the suggested sensitivity checks because labeling unconfirmed high performers as positives would require unsubstantiated assumptions without supporting evidence, potentially introducing new biases. In revision, we will add a dedicated limitations subsection explicitly discussing this ground-truth issue and its potential impact on method comparisons. We will also explore and report any feasible unlabeled ranking metrics to provide additional context.

    Revision: partial

  2. Referee: Results and evaluation (abstract): No quantitative performance metrics (precision, recall, F1, AUC), error bars, statistical tests, or implementation details (hyperparameters, feature definitions for trajectory models) are reported. Without these, it is impossible to judge the magnitude or statistical reliability of the claimed advantage of trajectory methods over the other seven approaches, rendering the central benchmarking conclusion unsupported.

    Authors: We acknowledge that the abstract and results sections lack explicit quantitative metrics such as precision, recall, F1, and AUC, as well as error bars, statistical tests, and detailed implementation information. This makes it harder to evaluate the strength of the comparisons. In the revised manuscript, we will add a results table with these metrics for all eight methods, include implementation details (hyperparameters and feature definitions) in the methods section or supplementary material, and report statistical comparisons or error bars where the data and experimental setup allow. This will better support the benchmarking claims.

    Revision: yes
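
A hedged sketch of how the metrics and error bars discussed in this exchange might be produced: precision, recall, and F1 from binary labels, plus a percentile bootstrap interval for F1. The labels, the resample count, and the function names are illustrative assumptions; the paper's evaluation protocol is not given here.

```python
import random

def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = violation/flag)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def bootstrap_f1_interval(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1, resampling athletes with
    replacement. A fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(prf1([y_true[i] for i in idx],
                           [y_pred[i] for i in idx])[2])
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical labels: 5 confirmed violations among 50 athletes.
y_true = [1] * 5 + [0] * 45
y_pred = [1, 1, 1, 0, 0] + [1, 1] + [0] * 43
print(prf1(y_true, y_pred))
print(bootstrap_f1_interval(y_true, y_pred))
```

With only a handful of confirmed positives the interval is typically very wide, which is itself informative when comparing eight methods on the same small label set.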

Circularity Check

0 steps flagged

No circularity: benchmarking system relies on external validation against confirmed violations

The paper presents a data-processing and benchmarking system that applies eight detection methods (statistical rules, ML, trajectory analysis) to 1.6M athletics records and evaluates them against an external set of publicly confirmed anti-doping violations. No equations, parameter-fitting steps, self-citations used as load-bearing premises, or predictions that reduce to the input data by construction are described. Validation uses independent ground-truth labels rather than internal consistency checks or renamed empirical patterns, so the central claims remain independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no free parameters, axioms, or invented entities are explicitly introduced or detailed in the provided text.

pith-pipeline@v0.9.0 · 5462 in / 1229 out tokens · 21157 ms · 2026-05-09T22:55:46.143166+00:00 · methodology

Appendix A: Hyperparameter configuration

Table V: ML-based detection methods hyperparameters

Method            Parameter      Value   Justification
Isolation Forest  contamination  0.1     Domain prior: ~10% outliers
                  n_estimators   100     sklearn default
                  random_state   42      Reproducibility
XGBoost           n_estimators   100     Sta...