Variability classification of TESS targets in LOPS2, the first long-term pointing field of PLATO. Version 1 of the public variability catalogue
Pith reviewed 2026-05-10 14:19 UTC · model grok-4.3
The pith
Machine learning on 38 million TESS light curves identifies 3.6 million candidate variable stars in PLATO's LOPS2 field.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We classified 38 million calibrated aperture light curves from the TESS-Gaia Light Curve pipeline for 6 million unique sources in LOPS2 with two machine learning frameworks -- a deep neural network and a feature-based gradient-boosted decision-tree ensemble. We combined their predictions to create this first version of the LOPS2 variability catalogue, performed manual vetting of a sub-sample of classified light curves, and a statistical analysis of the results to validate our methodology and to assess the variability properties and parameters of the stars in the catalogue. Our classification resulted in the identification of approximately 72% of the light curves having dominant instrument- or pipeline-induced signal, with the remaining 28% representing 3.6 million individual candidate variable stars, including pulsating, rotating, and eclipsing stars.
What carries the argument
Combined predictions from a deep neural network and a feature-based gradient-boosted decision-tree ensemble, followed by manual vetting of a subsample.
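The combination rule itself is not specified here (the ledger further down lists it as a free parameter), so the following is only a minimal sketch of one plausible choice: a weighted average of per-class probabilities from the two models. The class names, the weight `w_dnn`, and the function `combine_predictions` are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical class labels; the catalogue's actual label set is not given here.
CLASSES = ["instrumental", "pulsating", "rotating", "eclipsing"]

def combine_predictions(p_dnn, p_gbt, w_dnn=0.5):
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    p_dnn = np.asarray(p_dnn, dtype=float)
    p_gbt = np.asarray(p_gbt, dtype=float)
    if p_dnn.shape != p_gbt.shape:
        raise ValueError("probability arrays must have the same shape")
    combined = w_dnn * p_dnn + (1.0 - w_dnn) * p_gbt
    # Renormalise each row to guard against rounding drift.
    combined /= combined.sum(axis=1, keepdims=True)
    labels = [CLASSES[i] for i in combined.argmax(axis=1)]
    return combined, labels

# Toy probabilities for two light curves (made-up numbers).
p_dnn = [[0.7, 0.2, 0.05, 0.05], [0.1, 0.6, 0.2, 0.1]]
p_gbt = [[0.5, 0.3, 0.1, 0.1], [0.3, 0.2, 0.4, 0.1]]
combined, labels = combine_predictions(p_dnn, p_gbt)
print(labels)
```

In the second toy row the two models prefer different classes, and the averaged probabilities decide the final label. Any real rule (voting, stacking, calibrated averaging) would need to be read from the paper itself.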
If this is right
- Filtering candidates on colour, luminosity, dominant frequency, amplitude, and proximity of neighbours increases sample purity.
- Candidate pulsators display a wide range of frequencies, amplitudes, rotation rates, and stellar parameters.
- The released catalogue supplies one of the largest automated variability lists for immediate use by the community.
- The same two-framework approach can be applied to future TESS sectors that overlap PLATO fields.
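The purity filtering mentioned in the first bullet can be sketched as a set of boolean cuts. This is an illustration only: the argument names, units, and every threshold below are placeholders chosen for the example, not values recommended by the paper.

```python
import numpy as np

def purity_mask(colour, abs_mag, freq, amp, sep_arcsec,
                colour_range=(0.0, 0.6), abs_mag_range=(0.0, 4.0),
                freq_range=(5.0, 25.0), min_amp=0.5, min_sep_arcsec=21.0):
    """Boolean mask keeping rows that pass every cut.

    All thresholds are placeholders. Assumed units: colour and absolute
    magnitude in mag, frequency in cycles per day, amplitude in an
    arbitrary flux unit, nearest-neighbour separation in arcseconds.
    """
    colour, abs_mag, freq, amp, sep_arcsec = (
        np.asarray(x, dtype=float)
        for x in (colour, abs_mag, freq, amp, sep_arcsec)
    )
    return (
        (colour >= colour_range[0]) & (colour <= colour_range[1])
        & (abs_mag >= abs_mag_range[0]) & (abs_mag <= abs_mag_range[1])
        & (freq >= freq_range[0]) & (freq <= freq_range[1])
        & (amp >= min_amp)
        & (sep_arcsec >= min_sep_arcsec)
    )

# Four made-up candidates: only the first passes all cuts.
mask = purity_mask(
    colour=[0.3, 0.3, 1.2, 0.3],
    abs_mag=[2.0, 2.0, 2.0, 2.0],
    freq=[10.0, 10.0, 10.0, 10.0],
    amp=[1.0, 0.1, 1.0, 1.0],
    sep_arcsec=[30.0, 30.0, 30.0, 5.0],
)
print(mask.tolist())
```

The second candidate fails the amplitude cut, the third the colour cut, and the fourth the neighbour-separation cut. Tightening any cut trades completeness for purity.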
Where Pith is reading between the lines
- The catalogue could serve as a target list for PLATO Guest Observer proposals focused on variable-star science.
- Similar classification pipelines might be tested on upcoming wide-field surveys to handle even larger data volumes.
- Discrepancies between the neural network and tree ensemble outputs could highlight specific artifact types worth separate study.
Load-bearing premise
The combined machine-learning predictions after manual vetting of a subsample reliably separate genuine stellar variability from TESS pipeline artifacts across the entire set of 38 million light curves.
What would settle it
Independent variability measurements from a different instrument or survey on a statistically significant random sample of the 3.6 million candidates would show whether the reported 28 percent fraction matches the true rate of detectable stellar variability.
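One way to turn such a follow-up sample into a number is a binomial confidence interval on the confirmed fraction. The sketch below uses the Wilson score interval; the counts (850 confirmed out of 1000 checked) are invented purely for illustration.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 is ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical follow-up: 1000 random candidates, 850 confirmed as variable.
low, high = wilson_interval(850, 1000)
print(f"purity between {low:.3f} and {high:.3f}")
```

The interval narrows roughly as one over the square root of the sample size, which gives a sense of how large a "statistically significant random sample" would have to be.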
Original abstract
The PLAnetary Transits and Oscillations of stars (PLATO) mission is expected to launch in January 2027. A total of 8% of its data rate will be dedicated to complementary science targets selected from approved Guest Observer proposals. We seek to provide an open-source catalogue of variable stars in PLATO's first long-term observing field, LOPS2. We want to use existing observations from the Transiting Exoplanet Survey Satellite (TESS), which has observed many stars in LOPS2. We classified 38 million calibrated aperture light curves from the TESS-Gaia Light Curve pipeline (TGLC, $G\lesssim17$) for 6 million unique sources in LOPS2 with two machine learning frameworks -- a deep neural network and a feature-based gradient-boosted decision-tree ensemble. We combined their predictions to create this first version of the LOPS2 variability catalogue, performed manual vetting of a sub-sample of classified light curves, and a statistical analysis of the results to validate our methodology and to assess the variability properties and parameters of the stars in the catalogue. Our classification resulted in the identification of approximately 72% of the light curves having dominant instrument- or pipeline-induced signal, with the remaining 28% representing 3.6 million individual candidate variable stars, including pulsating, rotating, and eclipsing stars. Candidate pulsators exhibit varied behaviour in terms of their frequencies, amplitudes, rotation, and fundamental parameters. To ensure purity of the samples, filtering on colour, luminosity, the dominant frequency and its amplitude, and presence of close neighbours is helpful. We provide the first version of our PLATO LOPS2 variability catalogue to the community for further study and scrutiny. It is to date one of the largest catalogues of variable stars from an automated classification pipeline.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the creation of the first version of a public variability catalogue for PLATO's LOPS2 field. Using 38 million TESS-Gaia Light Curve (TGLC) aperture light curves for 6 million sources, the authors apply two machine-learning frameworks—a deep neural network and a feature-based gradient-boosted decision-tree ensemble—combine their predictions, perform manual vetting on a sub-sample, and conduct statistical analysis. This yields a classification in which ~72% of light curves are dominated by instrument- or pipeline-induced signals and the remaining 28% (~3.6 million candidates) are flagged as pulsating, rotating, or eclipsing variables. The catalogue is released publicly with suggestions for purity filters based on colour, luminosity, frequency, amplitude, and neighbours.
Significance. If the reported classification fractions prove robust, the work would deliver one of the largest public variability catalogues derived from an automated pipeline, directly supporting complementary-science target selection for PLATO's first long-term field. The combination of two independent ML frameworks, manual vetting, and statistical checks is a constructive approach, and the public release plus explicit purity-filter recommendations add practical value. The significance is currently limited by the absence of quantitative performance metrics that would allow readers to gauge uncertainty in the headline 72%/28% split.
major comments (2)
- [Section 3 (Machine Learning Classification and Validation)] The headline result that 28% of the 38 million light curves are genuine variable-star candidates (and thus 3.6 million objects) rests on the assumption that the combined DNN + GBT predictions generalize reliably to the full TGLC set. The manuscript describes model training, output combination, sub-sample manual vetting, and statistical checks, but supplies no precision, recall, confusion matrix, or agreement statistics on a large, representative held-out test set that spans the observed range of TESS systematics (scattered light, momentum dumps, etc.) and the full diversity of variable classes. Without these numbers, even modest per-class error rates under the reported class imbalance can shift the reported fractions by hundreds of thousands of objects.
- [Section 4 (Results)] The generalization step from the manually vetted sub-sample to the entire 38 million light curves is not accompanied by any quantitative uncertainty estimate. The central claim of the catalogue therefore lacks the error bars or sensitivity analysis that would be required to assess how robust the 72%/28% division is to plausible variations in model performance.
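The scale of the concern in the first major comment can be illustrated with simple arithmetic. The sketch below collapses the problem to two classes (variable versus instrumental) and applies the standard prevalence correction, true fraction = (observed + specificity - 1) / (sensitivity + specificity - 1). The sensitivity and specificity values are hypothetical; the paper reports none.

```python
def corrected_fraction(observed, sensitivity, specificity):
    """Estimate the true positive fraction from an observed one (binary case)."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("classifier must be better than chance")
    estimate = (observed + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))  # clip to a valid fraction

N_LIGHT_CURVES = 38_000_000   # from the abstract
OBSERVED = 0.28               # reported variable fraction

# Hypothetical (sensitivity, specificity) pairs, for illustration only.
for se, sp in [(0.99, 0.99), (0.95, 0.95), (0.90, 0.95)]:
    true_frac = corrected_fraction(OBSERVED, se, sp)
    shift = (OBSERVED - true_frac) * N_LIGHT_CURVES
    print(f"Se={se:.2f} Sp={sp:.2f}: true fraction {true_frac:.3f}, "
          f"shift {shift:,.0f} light curves")
```

With an assumed 95% sensitivity and specificity, the corrected fraction drops to about 25.6%, a shift of about 0.93 million light curves. This is a two-class simplification of a multi-class problem and only indicates the order of magnitude.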
minor comments (3)
- [Abstract] The abstract quotes approximate percentages and a rounded count (3.6 million); providing the exact counts or ranges with any available uncertainty would improve precision.
- [Figure captions] Several example light-curve figures would benefit from explicit labels indicating the final assigned variability class and the dominant frequency/amplitude values used in the statistical analysis.
- [Section 3] A short table summarizing the exact training/validation split sizes, hyper-parameter choices, and any agreement metric between the DNN and GBT outputs would aid reproducibility.
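As an example of the agreement metric requested in the last minor comment, Cohen's kappa measures how often two classifiers agree beyond what chance would produce. The sketch below is a plain-Python implementation run on made-up labels; the label strings are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of class labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Fraction of items on which the two classifiers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always emit the same single label
    return (observed - expected) / (1.0 - expected)

# Made-up predictions for six light curves.
dnn = ["inst", "inst", "puls", "rot", "inst", "ecl"]
gbt = ["inst", "puls", "puls", "rot", "inst", "inst"]
kappa = cohen_kappa(dnn, gbt)
print(kappa)
```

Here the models agree on four of six light curves, while chance agreement is one third, giving a kappa of 0.5. Under strong class imbalance, raw agreement can look high while kappa stays modest, which is why it is a useful complement.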
Simulated Author's Rebuttal
We appreciate the referee's comments highlighting the need for more rigorous quantitative validation of our machine learning classifications. We have revised the manuscript to include additional performance metrics and uncertainty analyses as detailed in the point-by-point responses below.
Point-by-point responses
- Referee: [Section 3 (Machine Learning Classification and Validation)] The headline result that 28% of the 38 million light curves are genuine variable-star candidates (and thus 3.6 million objects) rests on the assumption that the combined DNN + GBT predictions generalize reliably to the full TGLC set. The manuscript describes model training, output combination, sub-sample manual vetting, and statistical checks, but supplies no precision, recall, confusion matrix, or agreement statistics on a large, representative held-out test set that spans the observed range of TESS systematics (scattered light, momentum dumps, etc.) and the full diversity of variable classes. Without these numbers, even modest per-class error rates under the reported class imbalance can shift the reported fractions by hundreds of thousands of objects.
  Authors: We acknowledge this limitation in the current version of the manuscript. While we performed manual vetting on a sub-sample and conducted statistical checks, we did not include a comprehensive held-out test set evaluation spanning all systematics. In the revised manuscript, we will add precision, recall, and a confusion matrix derived from the cross-validation during model training, as well as the agreement statistics between the DNN and GBT on the full dataset. We will also discuss the challenges in creating a fully representative test set for TESS data. These additions will help quantify the potential impact of misclassifications on the reported fractions.
  Revision: yes
- Referee: [Section 4 (Results)] The generalization step from the manually vetted sub-sample to the entire 38 million light curves is not accompanied by any quantitative uncertainty estimate. The central claim of the catalogue therefore lacks the error bars or sensitivity analysis that would be required to assess how robust the 72%/28% division is to plausible variations in model performance.
  Authors: We agree that the manuscript would be strengthened by quantitative uncertainty estimates. In the revised version, we will include a sensitivity analysis varying the model combination parameters and report the resulting variation in the 28% fraction. We will also provide uncertainty estimates based on the vetted sub-sample proportions and discuss potential biases from the class imbalance. This will allow readers to better gauge the robustness of the headline results.
  Revision: yes
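A sensitivity analysis of the kind promised in this response could look like the following sketch, which sweeps a combination weight and records how the flagged variable fraction moves. Everything here is an assumption: the scores are synthetic random numbers, and the weighted-average rule, the 0.5 threshold, and the function name are placeholders rather than the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic stand-ins for each model's P(variable); not real catalogue scores.
p_dnn = rng.beta(2, 5, size=n)
p_gbt = np.clip(p_dnn + rng.normal(0.0, 0.1, size=n), 0.0, 1.0)

def variable_fraction(p_a, p_b, weight, threshold=0.5):
    """Fraction flagged as variable under a weighted-average combination."""
    combined = weight * p_a + (1.0 - weight) * p_b
    return float(np.mean(combined >= threshold))

# Sweep the weight from "GBT only" (0) to "DNN only" (1).
weights = np.linspace(0.0, 1.0, 5)
fractions = [variable_fraction(p_dnn, p_gbt, w) for w in weights]
spread = max(fractions) - min(fractions)

for w, f in zip(weights, fractions):
    print(f"weight={w:.2f}  variable fraction={f:.3f}")
print(f"spread across weights: {spread:.3f}")
```

The spread across weights gives a rough error bar on the headline fraction attributable to the combination rule alone. It does not capture errors the two models share.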
Circularity Check
No significant circularity; classification applies standard ML models to external TESS data.
Full rationale
The paper trains a DNN and GBT ensemble on variability patterns from TESS light curves, combines outputs, performs manual vetting on a sub-sample, and applies the result to the full 38M set to report the 72%/28% split. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the output fractions are direct consequences of the trained classifiers on independent observations rather than a redefinition or tautological renaming of inputs. The pipeline remains self-contained against external benchmarks with no load-bearing self-referential definitions or ansatz smuggling.
Axiom & Free-Parameter Ledger
free parameters (1)
- ML ensemble combination rules
axioms (2)
- domain assumption: TGLC provides calibrated aperture light curves that faithfully capture stellar signals after removal of instrumental effects
- domain assumption: Models trained on known variable stars generalize to classify variability in new TESS observations of the LOPS2 field
Forward citations
Cited by 1 Pith paper
- Plato's view on supermassive black hole binaries: Exploring the faint limit of ESA's Plato space mission
  Simulations show Plato can recover relativistic photometric signatures of supermassive black hole binaries in bright quasars (G≤18) via Bayesian inference on mock light curves.