arXiv: 2509.24954 · v1 · submitted 2025-09-29 · 🌌 astro-ph.HE · astro-ph.IM · astro-ph.SR

Stellar flare detection in XMM-Newton with gradient boosted trees

Pith reviewed 2026-05-18 12:38 UTC · model grok-4.3

classification 🌌 astro-ph.HE · astro-ph.IM · astro-ph.SR
keywords stellar flares · XMM-Newton · gradient boosting · machine learning · X-ray light curves · flare detection · catalog

The pith

A gradient boosted classifier identifies stellar flares in XMM-Newton data with 97.1 percent accuracy, and the authors release what they describe as the largest catalog of X-ray stellar flares to date.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that gradient boosted trees can classify stellar flares in X-ray light curves using a rich set of 108 features. Trained on 80 percent of 13,851 visually labeled sources, the model achieves 97.1 percent accuracy, 82.4 percent precision and 73.3 percent recall on the test set and beats both a simple flare template fit and a classifier limited to model-independent features. By running the model on the remaining unlabeled sources the authors produce and release what they describe as the largest catalog of X-ray stellar flares. Sympathetic readers would value this because it turns a large but unlabeled archive into a usable resource for studying stellar magnetic activity and because explainable AI tools clarify which light-curve properties matter most.

Core claim

We trained a gradient boosting classifier on 108 features from XMM-Newton light curves of variable sources. Using 80 percent of the 13,851 manually labeled examples we obtained 97.1 percent accuracy, 82.4 percent precision and 73.3 percent recall on the held-out 20 percent. The model outperforms a flare-template criterion and a version using only model-independent features. We then applied the classifier to the unlabeled sources and release the resulting catalog as the largest collection of X-ray stellar flares to date.
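The three headline metrics constrain the test-set confusion matrix quite tightly. The sketch below is a back-of-envelope reconstruction, assuming the test set holds 2771 light curves of which 81 are misclassified (both numbers come from the Sect. 4.3 excerpt attached to Figure 6 on this page). The solved counts are an inference from the quoted percentages, not values reported by the authors.

```python
# Inputs quoted on this page; the solved counts below are inferred, not reported.
n_test = 2771       # held-out light curves (Sect. 4.3 excerpt)
n_errors = 81       # misclassified light curves (same excerpt)
precision = 0.824
recall = 0.733

# precision = TP / (TP + FP)  ->  FP = TP * (1 / precision - 1)
# recall    = TP / (TP + FN)  ->  FN = TP * (1 / recall - 1)
# FP + FN = n_errors then fixes TP.
fp_per_tp = 1.0 / precision - 1.0
fn_per_tp = 1.0 / recall - 1.0
tp = round(n_errors / (fp_per_tp + fn_per_tp))
fp = round(tp * fp_per_tp)
fn = round(tp * fn_per_tp)
tn = n_test - tp - fp - fn

accuracy = (tp + tn) / n_test
flare_fraction = (tp + fn) / n_test

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} "
      f"precision={tp / (tp + fp):.3f} recall={tp / (tp + fn):.3f}")
print(f"implied flare fraction in the test set: {flare_fraction:.3f}")
```

Under these assumptions the integer counts reproduce all three quoted metrics, and flares make up roughly 7 percent of the test set, so the 97.1 percent accuracy should be read against a majority-class baseline of about 93 percent.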

What carries the argument

A gradient boosted tree ensemble trained on 108 light-curve features and interpreted with SHAP values and permutation importance scores.
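Permutation importance, one of the two interpretation tools named here, can be illustrated in a few lines. The sketch below is a minimal stdlib version, assuming accuracy as the score and a hypothetical `toy_model` that stands in for the trained classifier; it is not the authors' code, and SHAP values are not shown.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the table with only column j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    # Hypothetical stand-in for a trained classifier: uses feature 0 only
    return int(row[0] > 0.5)

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling the feature the toy model relies on degrades accuracy sharply, while shuffling the ignored feature leaves it unchanged, which is the behaviour a permutation score is meant to expose.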

If this is right

  • Stellar flare detection can be scaled to the full EXTraS database and future X-ray surveys without exhaustive visual inspection.
  • Feature importance analysis reveals which light-curve properties best indicate flares, guiding future observational strategies.
  • The catalog enables population studies of flare rates and energies across different stellar types.
  • False-positive analysis suggests the method captures flares from sources lacking obvious optical counterparts, potentially revealing new flare populations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar supervised learning pipelines could be applied to light curves from other high-energy missions to build cross-calibrated flare catalogs.
  • Combining the X-ray classifier with simultaneous multi-wavelength data could improve precision by confirming stellar origins of flares.
  • Retraining the model on newly labeled data from citizen-science or follow-up observations would further reduce false negatives for complex flare shapes.

Load-bearing premise

The manual visual inspection labels used as ground truth accurately distinguish stellar flares from other variability.

What would settle it

A large fraction of mismatches when the released catalog is cross-matched against independent optical flare detections from ground-based telescopes would show the classifier is unreliable.
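A check of this kind reduces to a cross-match followed by a mismatch count. The sketch below is a simplified illustration that matches on sky position only; the 5 arcsecond radius, the coordinates, and the function names are all hypothetical, and a real test would also require the flares to coincide in time.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0

def mismatch_fraction(xray, optical, radius_arcsec=5.0):
    """Fraction of X-ray entries with no optical entry within the radius."""
    unmatched = sum(
        1 for ra, dec in xray
        if not any(angular_sep_arcsec(ra, dec, o_ra, o_dec) <= radius_arcsec
                   for o_ra, o_dec in optical)
    )
    return unmatched / len(xray)

# Made-up (RA, Dec) positions in degrees, for illustration only
xray_flares = [(10.0000, -5.0000), (150.2000, 2.1000), (83.8000, -5.4000)]
optical_flares = [(10.0005, -5.0003), (83.8001, -5.4002)]

print(mismatch_fraction(xray_flares, optical_flares))
```

The brute-force double loop is fine for a toy list; a catalog-scale comparison would normally use a spatial index instead.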

Figures

Figures reproduced from arXiv: 2509.24954 by Andrea Belfiore, Andrea De Luca, Andrea Tiengo, Gaia Carenini, Mario Pasquato, Martino Marelli, Paolo Esposito, Ruben Salvaterra.

Figure 1: Histograms of the number of bins (Upper Panel) and mean counts per bin (Lower Panel) of our sample of 13,851 light curves. The areas of histograms are normalized to 1. In blue, we show the light curves labeled as "not flaring" and in red the "flaring" ones.
… positive class: $f^{(0)} = \log\frac{p}{1-p}$, $p = \frac{\#\text{positives}}{\#\text{samples}}$. (1) At this stage no tree is grown; the model outputs the same probability for all ob…
Figure 2: Precision-recall curves for setting a cutoff on …
Figure 3: UMAP embedding calculated on the important features …
Figure 5: ICE plot for F_NSIGMA_FLCON. Curves for 100 randomly chosen sources are shown. Non-flare sources are shown in gray, flares in cerulean blue. At the bottom a rug plot shows the actual values of the feature taken on by flares (cerulean blue) and non-flares (gray).
… already suspect flares. This can be understood as an example of interaction between features.
Figure 6: ICE plot for MEDMAXOFF. Curves for 100 randomly chosen sources are shown. Non-flare sources are shown in gray, flares in cerulean blue. At the bottom a rug plot shows the actual values of the feature taken on by flares (cerulean blue) and non-flares (gray).
4.3. Understanding misclassified instances. Our classifier misclassifies 81 LC out of 2771. It is natural to wonder why such misclassification occurs. I…
Figure 8: Shapley values for a paradigmatic false positive source, …
Figure 10: Shapley values for false negative source …
Figure 9: Light curves for false negative sources 0728560301/4, 0841320101/2, 0822200101/6. The variability of the first LC has been ascribed to three random flares; the second LC shows a feature around t∼22 ks; the third LC shows a probable short (∼1 ks) flare at t∼95 ks. T0 is the time of the first photon of the observation.
Figure 11: Histogram of the modulus of Galactic latitude (in de…
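The ICE (individual conditional expectation) plots in Figures 5 and 6 trace how the predicted flare probability of each source changes as one feature is swept over a grid while the others stay fixed. The sketch below shows that construction, assuming a hypothetical `toy_predict_proba` with a built-in feature interaction in place of the trained classifier; it is not the authors' plotting code, and the feature values are made up.

```python
import math

def ice_curves(predict_proba, X, feature, grid):
    """One curve per row: prediction as `feature` sweeps `grid`, others fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)        # copy so the input row is untouched
            modified[feature] = value
            curve.append(predict_proba(modified))
        curves.append(curve)
    return curves

def toy_predict_proba(row):
    # Hypothetical model: feature 0 only matters when feature 1 is non-zero
    z = 4.0 * row[0] * row[1] - 2.0
    return 1.0 / (1.0 + math.exp(-z))

X = [[0.2, 0.0], [0.2, 1.0]]            # two made-up sources
grid = [0.0, 0.5, 1.0]                  # values swept for feature 0
curves = ice_curves(toy_predict_proba, X, feature=0, grid=grid)

for row, curve in zip(X, curves):
    print(row, [round(p, 3) for p in curve])
```

The first curve is flat and the second rises, even though both sources share the same value of the swept feature. Curves with different shapes for different sources are how an ICE plot reveals an interaction between features, which a single averaged curve would hide.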
Original abstract

The EXTraS project, based on data collected with the XMM-Newton observatory, provided us with a vast amount of light curves for X-ray sources. For each light curve, EXTraS also provided us with a set of features (https://extras.inaf.it). We extract from the EXTraS database a tabular dataset of 31,832 variable sources by 108 features. Of these, 13,851 sources were manually labeled as stellar flares or non-flares based on direct visual inspection. We employ a supervised learning approach to produce a catalog of stellar flares based on our dataset, releasing it to the community. We leverage explainable AI tools and interpretable features to better understand our classifier. We train a gradient boosting classifier on 80% of the data for which labels are available. We compute permutation feature importance scores, visualize feature space using UMAP, and analyze some false positive and false negative data points with the help of Shapley additive explanations -- an AI explainability technique used to measure the importance of each feature in determining the classifier's prediction for each instance. On the test set made up of the remainder 20% of our labeled data, we obtain an accuracy of 97.1%, with a precision of 82.4% and a recall of 73.3%. Our classifier outperforms a simple criterion based on fitting the light curve with a flare template and significantly surpasses a gradient-boosted classifier trained only on model-independent features. False positives appear related to flaring light curves that are not associated with a stellar counterpart, while false negatives often correspond to multiple flares or otherwise peculiar or noisy curves. We apply our trained classifier to currently unlabeled sources, releasing the largest catalog of X-ray stellar flares to date. [abridged]
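For readers unfamiliar with the method, gradient boosting for binary classification can be sketched from scratch. The toy below starts from the log-odds of the positive class, then repeatedly fits a one-split tree (a stump) to the residuals of the logistic loss. It is an illustration under simplifying assumptions (stumps only, made-up data, hypothetical function names), not the library implementation or the configuration used in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals, hessians):
    """Find the single split that best fits the residuals (least squares)."""
    best = None
    n = len(X)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = 0.5 * (lo + hi)
            left = [i for i in range(n) if X[i][j] <= t]
            right = [i for i in range(n) if X[i][j] > t]
            sse = 0.0
            for side in (left, right):
                mean = sum(residuals[i] for i in side) / len(side)
                sse += sum((residuals[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                # Newton-step leaf values for the logistic loss
                w_left = sum(residuals[i] for i in left) / (sum(hessians[i] for i in left) + 1e-12)
                w_right = sum(residuals[i] for i in right) / (sum(hessians[i] for i in right) + 1e-12)
                best = (sse, j, t, w_left, w_right)
    return best[1:]

def fit_gbt(X, y, n_trees=20, learning_rate=0.3):
    p = sum(y) / len(y)                  # fraction of positives
    f0 = math.log(p / (1.0 - p))         # initial log-odds; no tree grown yet
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        probs = [sigmoid(s) for s in scores]
        residuals = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient
        hessians = [pi * (1.0 - pi) for pi in probs]
        j, t, w_left, w_right = fit_stump(X, residuals, hessians)
        stumps.append((j, t, w_left, w_right))
        scores = [s + learning_rate * (w_left if row[j] <= t else w_right)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (wl if row[j] <= t else wr)
                     for j, t, wl, wr in stumps)
    return sigmoid(score)

# Made-up data: two features, label 1 ("flare") when feature 0 is large
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.5, 3.0], [0.6, 2.5], [2.0, 3.0], [2.5, 1.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

model = fit_gbt(X, y)
print([round(predict_proba(model, row), 3) for row in X])
```

Before any stump is added, every source receives the same probability, equal to the fraction of positives in the training data; each later stump nudges the scores of the sources it separates.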

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript applies gradient boosted trees to detect stellar flares in X-ray light curves from the EXTraS project on XMM-Newton. From 31,832 variable sources with 108 features, 13,851 are manually labeled via visual inspection as flares or non-flares. A classifier is trained on 80% of the labeled data and evaluated on the held-out 20%, yielding 97.1% accuracy, 82.4% precision and 73.3% recall. It outperforms a flare-template fit and a gradient-boosted model using only model-independent features. SHAP, UMAP and permutation importance are used for interpretability, and the model is applied to unlabeled sources to release the largest X-ray stellar-flare catalog to date.

Significance. If the visual labels are reliable and the performance generalizes, the work supplies a scalable, interpretable tool for mining large X-ray surveys for stellar flares and directly delivers a community catalog. The explicit comparison to a template baseline and the use of XAI methods to link predictions to physical features are strengths that increase scientific utility beyond black-box classification.

major comments (2)
  1. The 13,851 visual labels constitute the sole ground truth. The manuscript notes false negatives on multiple flares or noisy/peculiar curves but provides no inter-rater agreement, blinding protocol, or independent labeling comparison. Because every reported metric (82.4% precision, 73.3% recall) and the downstream catalog rest on these labels, the absence of label-quality validation is load-bearing for the central performance claim.
  2. Model-training section: no description is given of class-imbalance handling, hyper-parameter search procedure, or whether the 80/20 split was stratified. These omissions prevent assessment of whether the reported outperformance over the template-fit and model-independent-feature baselines is robust or merely an artifact of the particular training configuration.
minor comments (2)
  1. Abstract: the phrase 'abridged' appears at the end; confirm whether the provided text is the complete abstract or whether additional sentences were omitted.
  2. Figures: the UMAP embedding and SHAP summary plots would benefit from explicit legends indicating class colors and feature names to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough and constructive review of our manuscript. We address each of the major comments in detail below and have made revisions to the manuscript to improve clarity and address the concerns where possible.

Point-by-point responses
  1. Referee: The 13,851 visual labels constitute the sole ground truth. The manuscript notes false negatives on multiple flares or noisy/peculiar curves but provides no inter-rater agreement, blinding protocol, or independent labeling comparison. Because every reported metric (82.4% precision, 73.3% recall) and the downstream catalog rest on these labels, the absence of label-quality validation is load-bearing for the central performance claim.

    Authors: We agree that the quality and reliability of the visual labels are fundamental to our performance metrics and the released catalog. The labels were assigned by a single experienced researcher through systematic visual inspection of the light curves, focusing on the characteristic rapid rise and decay profiles typical of stellar flares in X-ray data. We did not implement a multi-rater agreement study or blinding protocol, primarily due to the substantial time required for such validation on a dataset of this size. To address this, we have added a dedicated paragraph in the Data Labeling subsection detailing the labeling criteria, providing representative examples of both flare and non-flare light curves, and explicitly discussing potential biases and uncertainties in the labels. We have also added a limitations section noting that future work could benefit from independent verification of a subset of labels. While this does not fully resolve the issue, we believe these additions provide greater transparency. revision: yes

  2. Referee: Model-training section: no description is given of class-imbalance handling, hyper-parameter search procedure, or whether the 80/20 split was stratified. These omissions prevent assessment of whether the reported outperformance over the template-fit and model-independent-feature baselines is robust or merely an artifact of the particular training configuration.

    Authors: We appreciate this observation and acknowledge that the original manuscript lacked sufficient detail on the training procedure. In practice, we utilized the XGBoost implementation with its default hyperparameters, as preliminary tests indicated robust performance without the need for extensive optimization. Regarding class imbalance, the labeled dataset contains approximately 25% flares and 75% non-flares; we did not apply explicit balancing techniques such as SMOTE or class weighting, relying instead on the algorithm's built-in handling. The 80/20 train-test split was performed using a random seed but was not explicitly stratified; however, post-hoc checks confirm that the class proportions are preserved within 1% in both sets. We have revised the manuscript to include a new subsection on 'Model Training and Validation' that specifies the exact hyperparameters, class distribution, split method, and includes results from a 5-fold stratified cross-validation to demonstrate the stability of the performance metrics. These changes should enable a better assessment of the robustness of our comparisons to the baseline methods. revision: yes
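The split and cross-validation choices discussed in this exchange can be made concrete. The sketch below is a minimal stdlib illustration of a stratified 80/20 split and stratified 5-fold assignment, assuming toy labels with an arbitrary 10 percent positive fraction; the function names are hypothetical and this is not the procedure from the manuscript.

```python
import random
from collections import defaultdict

def group_by_class(labels):
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    return by_class

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in group_by_class(labels).values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def stratified_folds(labels, k=5, seed=0):
    """Deal each class's shuffled indices round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in group_by_class(labels).values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds

# Toy labels: 1000 sources, arbitrary 10% positives (not the paper's value)
labels = [1] * 100 + [0] * 900

train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(labels)

print(len(train_idx), len(test_idx),
      sum(labels[i] for i in test_idx) / len(test_idx))
print([sum(labels[i] for i in fold) / len(fold) for fold in folds])
```

Stratifying guarantees that the minority class is represented in the same proportion in every partition, whereas a purely random split only achieves this on average, which matters more as the positive class gets rarer.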

Circularity Check

0 steps flagged

No circularity: performance metrics derive from held-out test split on externally labeled data

full rationale

The paper extracts 31,832 sources with 108 features from the EXTraS database, manually labels 13,851 via visual inspection as ground truth, trains a gradient-boosted classifier on an 80% split, and reports accuracy/precision/recall on the independent 20% test set. These metrics are standard supervised-learning evaluations against fixed external labels and do not reduce to any model parameter or fitted quantity by construction. Baseline comparisons (flare-template fitting and model-independent features) are likewise external. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps for the core claims. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the quality of manual visual labels as ground truth and on the assumption that the 108 features capture sufficient information to distinguish flares without additional physical modeling.

free parameters (1)
  • Gradient boosting hyperparameters
    Hyperparameters of the gradient boosted trees model are chosen or tuned on the training portion of the labeled data.
axioms (1)
  • domain assumption: Manual visual labels accurately identify stellar flares without significant subjectivity or error.
    Supervised learning performance metrics depend directly on these labels serving as correct ground truth.



