Stellar flare detection in XMM-Newton with gradient boosted trees

Andrea Belfiore; Andrea De Luca; Andrea Tiengo; Gaia Carenini; Mario Pasquato; Martino Marelli; Paolo Esposito; Ruben Salvaterra

arxiv: 2509.24954 · v1 · submitted 2025-09-29 · 🌌 astro-ph.HE · astro-ph.IM · astro-ph.SR

Pith reviewed 2026-05-18 12:38 UTC · model grok-4.3

keywords: stellar flares · XMM-Newton · gradient boosting · machine learning · X-ray light curves · flare detection · catalog

The pith

A gradient-boosted classifier identifies stellar flares in XMM-Newton data with 97.1 percent test accuracy, and the authors use it to release what they describe as the largest catalog of X-ray stellar flares to date.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that gradient boosted trees can classify stellar flares in X-ray light curves using a rich set of 108 features. Trained on 80 percent of 13,851 visually labeled sources, the model achieves 97.1 percent accuracy, 82.4 percent precision and 73.3 percent recall on the test set and beats both a simple flare template fit and a classifier limited to model-independent features. By running the model on the remaining unlabeled sources the authors produce and release what they describe as the largest catalog of X-ray stellar flares. Sympathetic readers would value this because it turns a large but unlabeled archive into a usable resource for studying stellar magnetic activity and because explainable AI tools clarify which light-curve properties matter most.

Core claim

We trained a gradient boosting classifier on 108 features from XMM-Newton light curves of variable sources. Using 80 percent of the 13,851 manually labeled examples we obtained 97.1 percent accuracy, 82.4 percent precision and 73.3 percent recall on the held-out 20 percent. The model outperforms a flare-template criterion and a version using only model-independent features. We then applied the classifier to the unlabeled sources and release the resulting catalog as the largest collection of X-ray stellar flares to date.
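The train-and-evaluate workflow in the core claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (generated with `make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for whichever boosting implementation the paper uses, and the class balance and random seeds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labeled table: NOT the EXTraS data.
# 108 columns mirror the feature count; the ~7% positive rate is an assumption.
X, y = make_classification(
    n_samples=1500,
    n_features=108,
    n_informative=10,
    weights=[0.93, 0.07],
    random_state=0,
)

# Hold out 20% of the labeled examples as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in gradient boosting model with default hyperparameters.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# The three metrics quoted in the summary, computed on the held-out 20%.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
}
print(metrics)
```

With a rare positive class, accuracy alone is dominated by the majority class, which is why precision and recall are reported alongside it.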

What carries the argument

A gradient-boosted tree ensemble trained on 108 light-curve features and interpreted with SHAP values and permutation importance scores.
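As a rough illustration of the permutation-importance half of that machinery, the sketch below shuffles one feature at a time on held-out data and records how much the score drops. The data are synthetic, the model is a scikit-learn stand-in, and the feature layout (three informative columns followed by nine noise columns) is an assumption made so the result is easy to read. SHAP values are not shown here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data, NOT EXTraS features. With shuffle=False the three
# informative columns come first; the remaining nine are pure noise.
X, y = make_classification(
    n_samples=1000,
    n_features=12,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Permute each column of the test set several times and measure the
# mean drop in accuracy relative to the unpermuted score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Feature indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking[:5]:
    print(idx, round(result.importances_mean[idx], 4))
```

Computing the importances on held-out data, rather than on the training set, keeps the scores tied to generalization instead of memorization.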

If this is right

Stellar flare detection can be scaled to the full EXTraS database and future X-ray surveys without exhaustive visual inspection.
Feature importance analysis reveals which light-curve properties best indicate flares, guiding future observational strategies.
The catalog enables population studies of flare rates and energies across different stellar types.
False-positive analysis suggests the method captures flares from sources lacking obvious optical counterparts, potentially revealing new flare populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar supervised learning pipelines could be applied to light curves from other high-energy missions to build cross-calibrated flare catalogs.
Combining the X-ray classifier with simultaneous multi-wavelength data could improve precision by confirming stellar origins of flares.
Retraining the model on newly labeled data from citizen-science or follow-up observations would further reduce false negatives for complex flare shapes.

Load-bearing premise

The manual visual inspection labels used as ground truth accurately distinguish stellar flares from other variability.

What would settle it

If cross-matching the released catalog against independent optical flare detections from ground-based telescopes produced a large fraction of mismatches, that would show the classifier is unreliable.
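A positional cross-match of this kind could be sketched as below. Everything here is hypothetical: the coordinates are made up, the 5 arcsecond matching radius is an arbitrary assumption, and a real test would also require the flares to coincide in time, which this sketch ignores.

```python
import numpy as np

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))

def match_fraction(xray_ra, xray_dec, opt_ra, opt_dec, radius_arcsec=5.0):
    """Fraction of X-ray sources with an optical source within the radius."""
    # Broadcast to an (n_xray, n_optical) grid of pairwise separations.
    xray_ra = np.asarray(xray_ra, dtype=float)[:, None]
    xray_dec = np.asarray(xray_dec, dtype=float)[:, None]
    opt_ra = np.asarray(opt_ra, dtype=float)[None, :]
    opt_dec = np.asarray(opt_dec, dtype=float)[None, :]
    sep_arcsec = angular_separation_deg(xray_ra, xray_dec, opt_ra, opt_dec) * 3600.0
    return float((sep_arcsec <= radius_arcsec).any(axis=1).mean())

# Hypothetical positions in degrees: only the first X-ray source has an
# optical neighbor closer than 5 arcseconds.
frac = match_fraction(
    xray_ra=[10.0, 150.0, 200.0],
    xray_dec=[-5.0, 30.0, 60.0],
    opt_ra=[10.0005, 150.0],
    opt_dec=[-5.0, 30.01],
)
print(frac)
```

The brute-force pairwise grid is fine for small samples; a catalog-scale comparison would need a spatial index instead.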

Figures

Figures reproduced from arXiv: 2509.24954 by Andrea Belfiore, Andrea De Luca, Andrea Tiengo, Gaia Carenini, Mario Pasquato, Martino Marelli, Paolo Esposito, Ruben Salvaterra.

**Figure 1.** Histograms of the number of bins (upper panel) and mean counts per bin (lower panel) of our sample of 13,851 light curves. The areas of the histograms are normalized to 1. In blue, we show the light curves labeled as "not flaring" and in red the "flaring" ones. … positive class: $f^{(0)} = \log\frac{p}{1-p}$, with $p = \#\text{positives}/\#\text{samples}$ (Eq. 1). At this stage no tree is grown; the model outputs the same probability for all ob…

**Figure 2.** Precision-recall curves for setting a cutoff on …

**Figure 3.** UMAP embedding calculated on the important features …

**Figure 5.** ICE plot for F_NSIGMA_FLCON. Curves for 100 randomly chosen sources are shown. Non-flare sources are shown in gray, flares in cerulean blue. At the bottom a rug plot shows the actual values of the feature taken on by flares (cerulean blue) and non-flares (gray). … already suspect flares. This can be understood as an example of interaction between features.

**Figure 6.** ICE plot for MEDMAXOFF. Curves for 100 randomly chosen sources are shown. Non-flare sources are shown in gray, flares in cerulean blue. At the bottom a rug plot shows the actual values of the feature taken on by flares (cerulean blue) and non-flares (gray). 4.3. Understanding misclassified instances: Our classifier misclassifies 81 LC out of 2771. It is natural to wonder why such misclassification occurs. I…

**Figure 8.** Shapley values for a paradigmatic false positive source, …

**Figure 10.** Shapley values for false negative source …

**Figure 9.** Light curves for false negative sources 0728560301/4, 0841320101/2, 0822200101/6. The variability of the first LC has been ascribed to three random flares; the second LC shows a feature around t∼22 ks; the third LC shows a probable short (∼1 ks) flare at t∼95 ks. T0 is the time of the first photon of the observation.

**Figure 11.** Histogram of the modulus of Galactic latitude (in de…
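The text excerpt attached to Figure 1 quotes the booster's initial, tree-free prediction. Assuming this is the usual log-odds initialization for log-loss boosting, $f^{(0)} = \log\frac{p}{1-p}$ with $p$ the fraction of positives, the short sketch below shows why every source then receives the same probability $p$ before any tree is grown. The counts are hypothetical, and the excerpt is truncated, so the exact form should be checked against the paper.

```python
import math

def initial_log_odds(n_positives, n_samples):
    """Log-odds of the positive class: the constant score before any tree."""
    p = n_positives / n_samples
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Map a raw boosting score back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical counts: 70 flares among 1000 labeled light curves.
f0 = initial_log_odds(70, 1000)
p0 = sigmoid(f0)

# The sigmoid inverts the log-odds, so the starting probability is the
# base rate of the positive class for every source.
print(f0, p0)
```

Each subsequent tree then adds a correction to this constant score, so sources only separate from the base rate once trees are grown.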

Original abstract

The EXTraS project, based on data collected with the XMM-Newton observatory, provided us with a vast amount of light curves for X-ray sources. For each light curve, EXTraS also provided us with a set of features (https://extras.inaf.it). We extract from the EXTraS database a tabular dataset of 31,832 variable sources by 108 features. Of these, 13,851 sources were manually labeled as stellar flares or non-flares based on direct visual inspection. We employ a supervised learning approach to produce a catalog of stellar flares based on our dataset, releasing it to the community. We leverage explainable AI tools and interpretable features to better understand our classifier. We train a gradient boosting classifier on 80% of the data for which labels are available. We compute permutation feature importance scores, visualize feature space using UMAP, and analyze some false positive and false negative data points with the help of Shapley additive explanations, an AI explainability technique used to measure the importance of each feature in determining the classifier's prediction for each instance. On the test set made up of the remainder 20% of our labeled data, we obtain an accuracy of 97.1%, with a precision of 82.4% and a recall of 73.3%. Our classifier outperforms a simple criterion based on fitting the light curve with a flare template and significantly surpasses a gradient-boosted classifier trained only on model-independent features. False positives appear related to flaring light curves that are not associated with a stellar counterpart, while false negatives often correspond to multiple flares or otherwise peculiar or noisy curves. We apply our trained classifier to currently unlabeled sources, releasing the largest catalog of X-ray stellar flares to date. [abridged]

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper applies gradient boosting to XMM-Newton light curves with visual labels, beats a template baseline, and releases the largest X-ray stellar flare catalog so far.

The main thing to know is that the authors trained a gradient-boosted classifier on 108 EXTraS features from 13,851 visually labeled sources and applied it to produce a new catalog of X-ray flares. On the 20% held-out test set they report 97.1% accuracy, 82.4% precision, and 73.3% recall, which beats both a simple flare-template fit and a gradient booster using only model-independent features. They add permutation importance, UMAP embeddings, and SHAP explanations to show which features drive decisions and to inspect false positives and negatives. Releasing the catalog is the concrete output that others can actually use for statistical studies of stellar activity.

Referee Report

2 major / 2 minor

Summary. The manuscript applies gradient boosted trees to detect stellar flares in X-ray light curves from the EXTraS project on XMM-Newton. From 31,832 variable sources with 108 features, 13,851 are manually labeled via visual inspection as flares or non-flares. A classifier is trained on 80% of the labeled data and evaluated on the held-out 20%, yielding 97.1% accuracy, 82.4% precision and 73.3% recall. It outperforms a flare-template fit and a gradient-boosted model using only model-independent features. SHAP, UMAP and permutation importance are used for interpretability, and the model is applied to unlabeled sources to release the largest X-ray stellar-flare catalog to date.

Significance. If the visual labels are reliable and the performance generalizes, the work supplies a scalable, interpretable tool for mining large X-ray surveys for stellar flares and directly delivers a community catalog. The explicit comparison to a template baseline and the use of XAI methods to link predictions to physical features are strengths that increase scientific utility beyond black-box classification.

major comments (2)

The 13,851 visual labels constitute the sole ground truth. The manuscript notes false negatives on multiple flares or noisy/peculiar curves but provides no inter-rater agreement, blinding protocol, or independent labeling comparison. Because every reported metric (82.4% precision, 73.3% recall) and the downstream catalog rest on these labels, the absence of label-quality validation is load-bearing for the central performance claim.
Model-training section: no description is given of class-imbalance handling, hyper-parameter search procedure, or whether the 80/20 split was stratified. These omissions prevent assessment of whether the reported outperformance over the template-fit and model-independent-feature baselines is robust or merely an artifact of the particular training configuration.
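For readers unfamiliar with the two training details the second comment asks about, the sketch below shows what a stratified 80/20 split and explicit class weighting look like in scikit-learn. The labels are randomly generated with an assumed 7% positive rate, and nothing here reflects what the authors actually did.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Hypothetical imbalanced labels and placeholder features.
rng = np.random.default_rng(0)
n_samples = 5000
y = (rng.random(n_samples) < 0.07).astype(int)
X = rng.normal(size=(n_samples, 5))

# stratify=y keeps the class proportions (nearly) identical in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
train_frac = y_train.mean()
test_frac = y_test.mean()
print(train_frac, test_frac)

# "balanced" weights are inversely proportional to class frequency, so
# both classes contribute the same total weight to the loss.
weights = compute_sample_weight(class_weight="balanced", y=y_train)
print(weights[y_train == 1].sum(), weights[y_train == 0].sum())
```

Such weights can be passed to a classifier through the `sample_weight` argument of `fit`. Whether reweighting helps depends on the metric being optimized.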

minor comments (2)

Abstract: the phrase 'abridged' appears at the end; confirm whether the provided text is the complete abstract or whether additional sentences were omitted.
Figures: the UMAP embedding and SHAP summary plots would benefit from explicit legends indicating class colors and feature names to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough and constructive review of our manuscript. We address each of the major comments in detail below and have made revisions to the manuscript to improve clarity and address the concerns where possible.

Referee: The 13,851 visual labels constitute the sole ground truth. The manuscript notes false negatives on multiple flares or noisy/peculiar curves but provides no inter-rater agreement, blinding protocol, or independent labeling comparison. Because every reported metric (82.4% precision, 73.3% recall) and the downstream catalog rest on these labels, the absence of label-quality validation is load-bearing for the central performance claim.

Authors: We agree that the quality and reliability of the visual labels are fundamental to our performance metrics and the released catalog. The labels were assigned by a single experienced researcher through systematic visual inspection of the light curves, focusing on the characteristic rapid rise and decay profiles typical of stellar flares in X-ray data. We did not implement a multi-rater agreement study or blinding protocol, primarily due to the substantial time required for such validation on a dataset of this size. To address this, we have added a dedicated paragraph in the Data Labeling subsection detailing the labeling criteria, providing representative examples of both flare and non-flare light curves, and explicitly discussing potential biases and uncertainties in the labels. We have also added a limitations section noting that future work could benefit from independent verification of a subset of labels. While this does not fully resolve the issue, we believe these additions provide greater transparency.

Revision: yes

Referee: Model-training section: no description is given of class-imbalance handling, hyper-parameter search procedure, or whether the 80/20 split was stratified. These omissions prevent assessment of whether the reported outperformance over the template-fit and model-independent-feature baselines is robust or merely an artifact of the particular training configuration.

Authors: We appreciate this observation and acknowledge that the original manuscript lacked sufficient detail on the training procedure. In practice, we used the XGBoost implementation with its default hyperparameters, as preliminary tests indicated robust performance without the need for extensive optimization. Regarding class imbalance, the labeled dataset contains approximately 25% flares and 75% non-flares; we did not apply explicit balancing techniques such as SMOTE or class weighting, relying instead on the algorithm's built-in handling. The 80/20 train-test split was performed using a random seed but was not explicitly stratified; however, post-hoc checks confirm that the class proportions are preserved within 1% in both sets. We have revised the manuscript to include a new subsection on 'Model Training and Validation' that specifies the exact hyperparameters, class distribution, and split method, and includes results from a 5-fold stratified cross-validation to demonstrate the stability of the performance metrics. These changes should enable a better assessment of the robustness of our comparisons to the baseline methods.

Revision: yes
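A 5-fold stratified cross-validation of the kind mentioned in the response could look like the sketch below. It runs on synthetic data with a scikit-learn stand-in model, so the scores it prints say nothing about the paper's classifier; the point is only the mechanics of getting a spread of accuracy, precision and recall across folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data, NOT the labeled EXTraS sample.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Five folds, each preserving the overall class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=cv,
    scoring=("accuracy", "precision", "recall"),
)

# Mean and standard deviation across folds indicate metric stability.
for name in ("accuracy", "precision", "recall"):
    values = scores[f"test_{name}"]
    print(name, round(values.mean(), 3), round(values.std(), 3))
```

A large fold-to-fold spread in precision or recall would suggest that a single 80/20 split gives an unstable estimate for the minority class.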

Circularity Check

0 steps flagged

No circularity: performance metrics derive from held-out test split on externally labeled data

The paper extracts 31,832 sources with 108 features from the EXTraS database, manually labels 13,851 via visual inspection as ground truth, trains a gradient-boosted classifier on an 80% split, and reports accuracy/precision/recall on the independent 20% test set. These metrics are standard supervised-learning evaluations against fixed external labels and do not reduce to any model parameter or fitted quantity by construction. Baseline comparisons (flare-template fitting and model-independent features) are likewise external. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps for the core claims. The derivation chain remains self-contained against external benchmarks.
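The held-out metrics can also be checked against one another. The Figure 6 excerpt quotes 81 misclassified light curves out of 2771, and together with the reported precision and recall this is enough to recover an approximate confusion matrix. The sketch below does that arithmetic; the counts are approximate because the published percentages are rounded, and the helper function is illustrative rather than anything from the paper.

```python
def confusion_from_metrics(n_test, n_errors, precision, recall):
    """Recover approximate confusion-matrix counts from summary metrics.

    Uses FP = TP * (1/precision - 1) and FN = TP * (1/recall - 1),
    together with FP + FN = n_errors.
    """
    tp_real = n_errors / (1.0 / precision + 1.0 / recall - 2.0)
    tp = round(tp_real)
    fp = round(tp_real * (1.0 / precision - 1.0))
    fn = n_errors - fp
    tn = n_test - tp - fp - fn
    return tp, fp, fn, tn

# Values quoted in the text: 81 errors out of 2771, 82.4% precision, 73.3% recall.
n_test = 2771
tp, fp, fn, tn = confusion_from_metrics(n_test, 81, 0.824, 0.733)

accuracy = (tp + tn) / n_test
precision = tp / (tp + fp)
recall = tp / (tp + fn)
flare_fraction = (tp + fn) / n_test

print(tp, fp, fn, tn)
print(round(accuracy, 3), round(precision, 3), round(recall, 3))
print(round(flare_fraction, 3))
```

Under these assumptions the three reported percentages are mutually consistent, and the implied share of flares in the test set is roughly 7%. That is noticeably lower than the approximately 25% flare fraction stated in the simulated rebuttal, which the rounded metrics would not reproduce.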

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality of manual visual labels as ground truth and on the assumption that the 108 features capture sufficient information to distinguish flares without additional physical modeling.

free parameters (1)

Gradient boosting hyperparameters
Hyperparameters of the gradient boosted trees model are chosen or tuned on the training portion of the labeled data.
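If hyperparameters are tuned, doing so only on the training portion keeps the test set untouched until the final evaluation. The sketch below illustrates that pattern with a small grid search on synthetic data; the grid values, the scoring choice and the scikit-learn model are all assumptions, and the simulated rebuttal elsewhere on this page states that defaults were used instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data, NOT the labeled EXTraS sample.
X, y = make_classification(
    n_samples=600,
    n_features=20,
    n_informative=5,
    weights=[0.85, 0.15],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: two learning rates and two tree depths.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="average_precision",
    cv=3,
)

# Cross-validated search sees the training portion only.
search.fit(X_train, y_train)
best_params = search.best_params_

# The test set is scored once, after the configuration is fixed.
test_score = search.score(X_test, y_test)
print(best_params, round(test_score, 3))
```

Average precision is used for model selection here because it summarizes the precision-recall trade-off without fixing a probability cutoff.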

axioms (1)

Domain assumption: manual visual labels accurately identify stellar flares without significant subjectivity or error.
Supervised learning performance metrics depend directly on these labels serving as correct ground truth.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We train a gradient boosting classifier on 80% of the data... On the test set... accuracy of 97.1%, precision of 82.4% and recall of 73.3%.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

