An unsupervised decision-support framework for multivariate biomarker analysis in athlete monitoring
Pith reviewed 2026-05-10 12:18 UTC · model grok-4.3
The pith
An unsupervised framework distinguishes mechanical damage from metabolic stress in athletes using multivariate biomarker profiles without injury labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using Ward hierarchical clustering for monitoring and Gaussian Mixture Models for stability analysis in the joint biomarker space, the framework identifies coherent profiles that distinguish mechanical damage from metabolic stress while preserving homeostatic states. The profiles are learned from real data on amateur soccer players, and robustness is evaluated with synthetic augmentation.
What carries the argument
A modular computational framework that operates in the joint biomarker space and chains preprocessing, safety screening, unsupervised clustering, and centroid-based interpretation.
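To make the clustering and centroid-interpretation steps concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the marker names, the already-standardized toy values, the choice of three clusters, and the `ELEVATED_Z` cutoff are all assumptions made for illustration, and the GMM stage is not shown. The merge rule is the standard Ward criterion (merge the pair of clusters whose union increases within-cluster variance the least).

```python
from statistics import mean

MARKERS = ["marker_a", "marker_b"]  # placeholder names, not taken from the paper
ELEVATED_Z = 1.0                    # hypothetical cutoff for calling a marker elevated

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(points, k):
    """Naive agglomerative Ward clustering; returns k lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[p] for p in clusters[i]],
                                 [points[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up, already standardized values: six athletes (rows), two markers (columns).
z = [
    [-0.7, -0.5], [-0.7, -0.6], [-0.8, -0.4],  # both markers near or below the mean
    [1.4, -0.3], [1.5, -0.5],                  # first marker high
    [-0.7, 2.3],                               # second marker high
]
groups = ward_clusters(z, k=3)
centroids = [centroid([z[i] for i in g]) for g in groups]
# Centroid-based reading: which markers sit clearly above the cohort mean.
elevated = [[m for m, v in zip(MARKERS, c) if v > ELEVATED_Z] for c in centroids]
```

A cluster whose centroid shows no elevated marker would be read as homeostatic under this scheme, while the other two would each be named after their dominant marker. That naming step is exactly the interpretive overlay the load-bearing premise depends on.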
Load-bearing premise
The unsupervised clusters correspond to clinically meaningful physiological states that can be interpreted as mechanical damage versus metabolic stress without any ground truth labels or independent clinical validation.
What would settle it
Independent clinical validation data showing injury types for the same athletes would falsify the claim if athletes clustered as mechanical damage do not exhibit higher mechanical injury rates than those in other clusters.
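One way such a check could be run, if injury records existed, is a one-sided Fisher exact test on a 2x2 table of cluster membership against mechanical injury. The sketch below is an assumption about how the test might be set up, not something the paper reports, and the counts are invented purely to show the arithmetic.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """P(at least `a` injured athletes in the flagged cluster) under no association.

    2x2 table layout:
                        injured   not injured
      flagged cluster      a           b
      other clusters       c           d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1)) / total

# Invented counts: 4 of 5 flagged athletes injured versus 2 of 15 in other clusters.
p_value = fisher_one_sided(4, 1, 2, 13)
```

A large p-value on real records would count against the mechanical-damage label. A small one would support it only if the injury coding was done independently of the biomarker data.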
Original abstract
Purpose. Athlete monitoring is constrained by small cohorts, heterogeneous biomarker scales, limited feasibility of repeated sampling, and the lack of reliable injury ground truth. These limitations reduce the interpretability and utility of traditional univariate and binary risk models. This study addresses these challenges by proposing an unsupervised multivariate framework to identify latent physiological states in athletes using real data.
Methods. We propose a modular computational framework that operates in the joint biomarker space, integrating preprocessing, clinical safety screening, unsupervised clustering, and centroid-based physiological interpretation. Profiles are learned exclusively from amateur soccer players during a competitive microcycle. Synthetic data augmentation evaluates robustness and scalability. Ward hierarchical clustering supports monitoring and etiological differentiation, while Gaussian Mixture Models (GMM) enable structural stability analysis in high-dimensional settings.
Results. The framework identifies coherent profiles that distinguish mechanical damage from metabolic stress while preserving homeostatic states. Synthetic data augmentation demonstrates feasibility and detection of latent silent risk phenotypes typically missed by univariate monitoring. Structural analyses indicate robustness under augmentation and higher-dimensional settings.
Conclusion. The framework enables interpretable identification of latent physiological states from multivariate biomarker data without injury labels. By distinguishing mechanisms and revealing silent risk patterns not captured by conventional monitoring, it provides actionable insights for individualized athlete monitoring and decision making.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a modular unsupervised framework for multivariate biomarker analysis in athlete monitoring. It combines preprocessing, clinical safety screening, Ward hierarchical clustering, and Gaussian Mixture Models (GMM) applied to real data from amateur soccer players over a competitive microcycle, supplemented by synthetic data augmentation. The central claim is that the resulting clusters identify coherent latent physiological profiles that distinguish mechanical damage from metabolic stress while preserving homeostatic states, enabling detection of silent risk phenotypes without requiring injury ground-truth labels.
Significance. If the unsupervised clusters can be shown to map reliably to distinguishable physiological regimes, the framework would address key limitations in athlete monitoring (small cohorts, heterogeneous scales, no labels) by providing an interpretable, multivariate alternative to univariate risk models. The modular design, use of both real microcycle data and synthetic augmentation for robustness testing, and structural stability analysis in higher dimensions are constructive elements. However, the current lack of quantitative validation metrics and independent confirmation of the post-hoc physiological interpretations substantially limits the immediate impact and generalizability.
major comments (3)
- [Abstract, Results] The reported outcomes are described only qualitatively (e.g., 'coherent profiles that distinguish mechanical damage from metabolic stress') with no quantitative metrics such as silhouette scores, Davies-Bouldin indices, cluster stability measures, or statistical comparisons of centroid differences. This absence makes it impossible to evaluate the strength or reproducibility of the claimed distinctions.
- [Methods, Results] The mapping of Ward/GMM clusters to specific physiological states (mechanical damage vs. metabolic stress vs. homeostatic) is performed post-hoc via inspection of biomarker centroid elevations. No cross-validation against external criteria (expert annotation, longitudinal injury records, or controlled experiments) is described, so the etiological differentiation rests on domain assumptions rather than independent evidence.
- [Abstract] Synthetic data augmentation is invoked to demonstrate robustness and detection of 'latent silent risk phenotypes,' yet no details are supplied on augmentation parameters, generation process, or how the augmented data were used to validate (or potentially bias) the observed profiles. This leaves open the possibility of circularity in the robustness claims.
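As an illustration of the kind of metric the first major comment asks for, the following sketch computes the mean silhouette coefficient with Euclidean distance in plain Python. The toy points and labels are invented. The convention of scoring a singleton cluster as 0 is a common one but is stated here as an assumption, and the paper's own choice of metric is unknown.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: score taken as 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the point's own cluster; b: to the nearest other cluster.
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(mean(dist(points[i], points[j]) for j in members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)

# Invented 2-D points: two tight, well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across it
```

Values near 1 indicate compact, well-separated clusters and values below 0 indicate points assigned to the wrong group. Reporting such a number per clustering would let readers judge how sharp the claimed profiles are.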
minor comments (2)
- [Methods] The manuscript would benefit from explicit statements of the number of clusters/components selected, the criteria used for selection, and any sensitivity analyses around this choice.
- [Methods] Notation for the biomarker variables and the exact preprocessing steps (normalization, safety screening thresholds) could be presented more formally, perhaps in a dedicated table or pseudocode block, to improve reproducibility.
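The pseudocode block suggested in the second minor comment might look roughly like the sketch below. Everything specific in it is hypothetical: the marker names, the upper limits in `SAFETY_LIMITS`, and the reading of "safety screening" as setting aside samples above a clinical limit before z-scoring are assumptions, since the reviewed text gives none of these details.

```python
from statistics import mean, pstdev

# Hypothetical upper limits; the reviewed text does not state real thresholds.
SAFETY_LIMITS = {"marker_a": 1000.0, "marker_b": 5.0}
MARKERS = ["marker_a", "marker_b"]

def safety_screen(samples, limits):
    """Split samples into (flagged, retained) by per-marker upper limits."""
    flagged, retained = [], []
    for sample in samples:
        exceeded = [m for m, limit in limits.items() if sample[m] > limit]
        (flagged if exceeded else retained).append(sample)
    return flagged, retained

def standardize(samples, markers):
    """Z-score each marker across the given samples (population SD)."""
    columns = {m: [s[m] for s in samples] for m in markers}
    # Fall back to SD = 1 for a constant column to avoid division by zero.
    stats = {m: (mean(v), pstdev(v) or 1.0) for m, v in columns.items()}
    return [[(s[m] - stats[m][0]) / stats[m][1] for m in markers]
            for s in samples]

# Made-up records for illustration only.
samples = [
    {"id": "p1", "marker_a": 120.0, "marker_b": 1.1},
    {"id": "p2", "marker_a": 900.0, "marker_b": 1.3},
    {"id": "p3", "marker_a": 140.0, "marker_b": 4.8},
    {"id": "p4", "marker_a": 2500.0, "marker_b": 1.0},  # exceeds the marker_a limit
]
flagged, retained = safety_screen(samples, SAFETY_LIMITS)
z = standardize(retained, MARKERS)
```

Stating the limits and the normalization in this explicit form would also make clear whether screened-out samples influence the means and standard deviations used for clustering.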
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight opportunities to strengthen the quantitative aspects and transparency of our work. We address each major comment below and indicate planned revisions.
Point-by-point responses
- Referee: [Abstract, Results] The reported outcomes are described only qualitatively (e.g., 'coherent profiles that distinguish mechanical damage from metabolic stress') with no quantitative metrics such as silhouette scores, Davies-Bouldin indices, cluster stability measures, or statistical comparisons of centroid differences. This absence makes it impossible to evaluate the strength or reproducibility of the claimed distinctions.
  Authors: We agree that quantitative metrics are needed for rigorous evaluation. In the revised manuscript, we will add silhouette scores, Davies-Bouldin indices, cluster stability measures via bootstrap resampling, and statistical comparisons (e.g., Kruskal-Wallis tests with post-hoc analyses) of biomarker centroid differences to objectively support the reported distinctions.
  Revision: yes
- Referee: [Methods, Results] The mapping of Ward/GMM clusters to specific physiological states (mechanical damage vs. metabolic stress vs. homeostatic) is performed post-hoc via inspection of biomarker centroid elevations. No cross-validation against external criteria (expert annotation, longitudinal injury records, or controlled experiments) is described, so the etiological differentiation rests on domain assumptions rather than independent evidence.
  Authors: The post-hoc mapping follows standard practice for unsupervised methods in the absence of ground truth, which is a core motivation of the study. Interpretations draw on established biomarker-physiology links from the literature. We will expand the Methods and Discussion with additional references and sensitivity analyses for robustness. However, cross-validation against injury records or expert annotations is not feasible, as these data were not collected, reflecting the real-world constraints the framework targets.
  Revision: partial
- Referee: [Abstract] Synthetic data augmentation is invoked to demonstrate robustness and detection of 'latent silent risk phenotypes,' yet no details are supplied on augmentation parameters, generation process, or how the augmented data were used to validate (or potentially bias) the observed profiles. This leaves open the possibility of circularity in the robustness claims.
  Authors: The Methods section already details the augmentation parameters and process. To enhance clarity, we will briefly summarize these in the Abstract and add a Results subsection explicitly describing the generation method, parameters, application to robustness testing, and validation metrics to demonstrate independence from the original profiles.
  Revision: yes
- Independent cross-validation of physiological interpretations against external criteria such as injury records or expert annotations, which are unavailable in the study dataset and inherent to the unsupervised, label-free setting.
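The Kruskal-Wallis comparison promised in the authors' first response can be sketched as follows. This is a plain-Python illustration of the standard tie-corrected H statistic, not the authors' analysis. The three groups of values are invented, and the closed-form p-value applies only to the three-group case (chi-square with 2 degrees of freedom).

```python
from math import exp

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of 1-based ranks i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    rank_part = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_part - 3 * (n + 1)
    # Undefined (division by zero) if every pooled value is identical.
    return h / (1 - tie_term / (n ** 3 - n))

def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom."""
    return exp(-h / 2)

# Invented values of one marker in three clusters.
clusters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(clusters)
p = p_value_three_groups(h)
```

With cohorts as small as those described, the chi-square approximation is rough, so an exact or permutation-based p-value would be the safer choice. In addition, testing markers that were themselves used to form the clusters will tend to overstate the separation.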
Circularity Check
No circularity detected in derivation chain
Full rationale
The paper's framework applies standard unsupervised clustering (Ward, GMM) to preprocessed biomarker data from real athlete microcycles, followed by post-hoc centroid inspection to assign physiological labels. This interpretation step relies on external domain knowledge of biomarker meanings rather than deriving the mechanical-vs-metabolic distinction mathematically from the clustering equations themselves. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported via self-citation, and synthetic augmentation is used only for robustness checks, not to generate the primary profiles. The central claim therefore remains an empirical partitioning plus interpretive overlay, not a self-referential loop.
Axiom & Free-Parameter Ledger
free parameters (1)
- Number of clusters or GMM components
axioms (1)
- Domain assumption: Unsupervised clusters represent distinct, interpretable physiological states