Recognition: unknown
Variability classification of TESS targets in LOPS2, the first long-term pointing field of PLATO. Version 1 of the public variability catalogue
Pith reviewed 2026-05-10 14:19 UTC · model grok-4.3
The pith
Machine learning on 38 million TESS light curves identifies 3.6 million candidate variable stars in PLATO's LOPS2 field.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We classified 38 million calibrated aperture light curves from the TESS-Gaia Light Curve pipeline for 6 million unique sources in LOPS2 with two machine learning frameworks: a deep neural network and a feature-based gradient-boosted decision-tree ensemble. We combined their predictions to create this first version of the LOPS2 variability catalogue, performed manual vetting of a sub-sample of classified light curves, and carried out a statistical analysis of the results to validate our methodology and to assess the variability properties and parameters of the stars in the catalogue. Our classification identified approximately 72% of the light curves as dominated by instrument- or pipeline-induced signal, with the remaining 28% corresponding to 3.6 million individual candidate variable stars, including pulsating, rotating, and eclipsing stars.
What carries the argument
Combined predictions from a deep neural network and a feature-based gradient-boosted decision-tree ensemble, followed by manual vetting of a subsample.
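The review does not spell out how the two models' outputs are merged (the ledger lists the combination rules as a free parameter). As a minimal sketch only, assuming each model returns per-class probabilities over the same class list, one common choice is a weighted average followed by picking the most probable class. The class names, the weight `w_dnn`, and `combine_predictions` are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np

# Hypothetical class list; the real catalogue may use different labels.
CLASSES = ["instrumental", "pulsating", "rotating", "eclipsing"]


def combine_predictions(p_dnn, p_gbt, w_dnn=0.5):
    """Weighted average of two (n_samples, n_classes) probability arrays.

    Returns the combined probabilities and the index of the winning class per row.
    """
    p_dnn = np.asarray(p_dnn, dtype=float)
    p_gbt = np.asarray(p_gbt, dtype=float)
    if p_dnn.shape != p_gbt.shape:
        raise ValueError("both models must score the same samples and classes")
    if not 0.0 <= w_dnn <= 1.0:
        raise ValueError("w_dnn must lie in [0, 1]")
    p_comb = w_dnn * p_dnn + (1.0 - w_dnn) * p_gbt
    return p_comb, p_comb.argmax(axis=1)


# Two light curves scored by both models (made-up numbers).
p_dnn = [[0.70, 0.10, 0.10, 0.10],
         [0.20, 0.60, 0.10, 0.10]]
p_gbt = [[0.40, 0.40, 0.10, 0.10],
         [0.10, 0.20, 0.60, 0.10]]
probs, labels = combine_predictions(p_dnn, p_gbt)
print([CLASSES[i] for i in labels])  # ['instrumental', 'pulsating']
```

With `w_dnn=0.5` both models count equally; moving the weight changes which model wins borderline cases, which is why the combination rule can shift the final class fractions.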
If this is right
- Filtering candidates on colour, luminosity, dominant frequency, amplitude, and proximity of neighbours increases sample purity.
- Candidate pulsators display a wide range of frequencies, amplitudes, rotation rates, and stellar parameters.
- The released catalogue supplies one of the largest automated variability lists for immediate use by the community.
- The same two-framework approach can be applied to future TESS sectors that overlap PLATO fields.
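The purity cuts named in the first bullet amount to a row filter on the candidate table. The sketch below assumes a per-candidate record holding a Gaia colour, an absolute-magnitude luminosity proxy, the dominant frequency and its amplitude, and a count of contaminating neighbours; the field names, the definition of "neighbour", and every threshold are placeholders chosen for illustration, not values from the catalogue.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Illustrative fields only, not the catalogue's actual columns.
    source_id: int
    bp_rp: float        # Gaia BP-RP colour (mag)
    abs_g: float        # absolute G magnitude, used as a luminosity proxy (mag)
    freq: float         # dominant frequency (1/d)
    amp: float          # amplitude of the dominant frequency (ppt)
    n_neighbours: int   # nearby sources bright enough to contaminate the aperture


def passes_purity_cuts(c, bp_rp_range=(-0.5, 1.0), abs_g_max=4.0,
                       freq_range=(0.1, 24.0), amp_min=0.5):
    """Return True if a candidate survives all (hypothetical) cuts."""
    return (bp_rp_range[0] <= c.bp_rp <= bp_rp_range[1]
            and c.abs_g <= abs_g_max
            and freq_range[0] <= c.freq <= freq_range[1]
            and c.amp >= amp_min
            and c.n_neighbours == 0)


cands = [
    Candidate(1, 0.2, 2.0, 5.3, 3.1, 0),
    Candidate(2, 0.2, 2.0, 5.3, 3.1, 2),  # rejected: contaminating neighbours
    Candidate(3, 1.8, 6.5, 0.4, 1.2, 0),  # rejected: too red and faint for these cuts
]
kept = [c.source_id for c in cands if passes_purity_cuts(c)]
print(kept)  # [1]
```

Each cut trades completeness for purity, so in practice the thresholds would be tuned per variability class rather than fixed globally as here.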
Where Pith is reading between the lines
- The catalogue could serve as a target list for PLATO Guest Observer proposals focused on variable-star science.
- Similar classification pipelines might be tested on upcoming wide-field surveys to handle even larger data volumes.
- Discrepancies between the neural network and tree ensemble outputs could highlight specific artifact types worth separate study.
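One way to follow up the last point is to list the light curves on which the two classifiers disagree and tally the label pairs involved; a pair that dominates the tally is a natural place to look for a recurring artifact type. A minimal sketch, assuming each model has already produced one hard label per light curve (the label strings and `disagreements` are hypothetical):

```python
from collections import Counter


def disagreements(labels_dnn, labels_gbt):
    """Indices where two classifiers disagree, plus a tally of (dnn, gbt) label pairs."""
    if len(labels_dnn) != len(labels_gbt):
        raise ValueError("label lists must have equal length")
    idx = [i for i, (a, b) in enumerate(zip(labels_dnn, labels_gbt)) if a != b]
    pairs = Counter((labels_dnn[i], labels_gbt[i]) for i in idx)
    return idx, pairs


dnn = ["instrumental", "pulsating", "eclipsing", "rotating", "pulsating"]
gbt = ["instrumental", "instrumental", "eclipsing", "instrumental", "pulsating"]
idx, pairs = disagreements(dnn, gbt)
print(idx)  # [1, 3]
print(pairs.most_common())
```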
Load-bearing premise
The combined machine-learning predictions after manual vetting of a subsample reliably separate genuine stellar variability from TESS pipeline artifacts across the entire set of 38 million light curves.
What would settle it
Independent variability measurements from a different instrument or survey on a statistically significant random sample of the 3.6 million candidates would show whether the reported 28 percent fraction matches the true rate of detectable stellar variability.
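How large a "statistically significant" sample needs to be can be estimated with the standard normal-approximation formula for a proportion, n0 = z² p(1 − p) / E², optionally reduced by the finite-population correction n = n0 / (1 + (n0 − 1)/N). The sketch assumes simple random sampling from the 3.6 million candidates and the worst case p = 0.5; the function name and the margin are illustrative.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5, population=None):
    """Simple-random-sample size to estimate a proportion to within +/- margin.

    Uses n0 = z^2 p (1 - p) / margin^2, where p = 0.5 is the worst case.
    If a finite population size is given, applies the finite-population correction.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n0 = z * z * p * (1.0 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)


print(required_sample_size(0.02))                        # 2401
print(required_sample_size(0.02, population=3_600_000))  # 2400
```

For a margin of ±2 percentage points at 95% confidence this gives about 2,400 light curves, and the correction barely matters because the population is so large. A sample stratified by variability class would need a separate calculation per stratum.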
Original abstract
The PLAnetary Transits and Oscillations of stars (PLATO) mission is expected to launch in January 2027. A total of 8% of its data rate will be dedicated to complementary science targets selected from approved Guest Observer proposals. We seek to provide an open-source catalogue of variable stars in PLATO's first long-term observing field, LOPS2, using existing observations from the Transiting Exoplanet Survey Satellite (TESS), which has observed many stars in LOPS2. We classified 38 million calibrated aperture light curves from the TESS-Gaia Light Curve pipeline (TGLC, G ≲ 17) for 6 million unique sources in LOPS2 with two machine learning frameworks: a deep neural network and a feature-based gradient-boosted decision-tree ensemble. We combined their predictions to create this first version of the LOPS2 variability catalogue, performed manual vetting of a sub-sample of classified light curves, and carried out a statistical analysis of the results to validate our methodology and to assess the variability properties and parameters of the stars in the catalogue. Our classification identified approximately 72% of the light curves as dominated by instrument- or pipeline-induced signal, with the remaining 28% corresponding to 3.6 million individual candidate variable stars, including pulsating, rotating, and eclipsing stars. Candidate pulsators exhibit varied behaviour in terms of their frequencies, amplitudes, rotation, and fundamental parameters. To improve the purity of the samples, filtering on colour, luminosity, the dominant frequency and its amplitude, and the presence of close neighbours is helpful. We provide the first version of our PLATO LOPS2 variability catalogue to the community for further study and scrutiny. It is, to date, one of the largest catalogues of variable stars from an automated classification pipeline.
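The abstract uses the dominant frequency and its amplitude as filtering quantities without saying how they are measured. As a hedged sketch, one simple estimate is the highest peak of a discrete Fourier amplitude spectrum evaluated on an oversampled frequency grid. The authors may well use a different periodogram (Lomb-Scargle is a common choice for unevenly sampled data), and `dominant_frequency` and its parameters are hypothetical.

```python
import numpy as np


def dominant_frequency(t, flux, f_max, oversample=10, chunk=512):
    """Frequency and semi-amplitude of the highest peak in a DFT amplitude spectrum.

    t is in days, so frequencies are in 1/d. The grid runs from df to f_max with
    step df = 1 / (oversample * time span). The amplitude (2 / N) * |sum y exp(-2 pi i f t)|
    recovers A for a sinusoid A sin(2 pi f t) sampled over many cycles.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(flux, dtype=float)
    y = y - y.mean()  # remove the mean flux level
    df = 1.0 / (oversample * (t.max() - t.min()))
    freqs = np.arange(df, f_max, df)
    amps = np.empty_like(freqs)
    for s in range(0, freqs.size, chunk):  # process the grid in chunks to bound memory
        f = freqs[s:s + chunk]
        kernel = np.exp(-2j * np.pi * np.outer(f, t))
        amps[s:s + chunk] = 2.0 / t.size * np.abs(kernel @ y)
    k = int(np.argmax(amps))
    return freqs[k], amps[k]


# Synthetic 27-day light curve at 30-min cadence with a 5.2 c/d, 3 ppt sinusoid plus noise.
rng = np.random.default_rng(1)
t = np.arange(0.0, 27.0, 1.0 / 48.0)
flux = 1.0 + 0.003 * np.sin(2 * np.pi * 5.2 * t) + rng.normal(0.0, 0.001, t.size)
f_dom, a_dom = dominant_frequency(t, flux, f_max=10.0)
print(f"{f_dom:.3f} c/d, {1e3 * a_dom:.2f} ppt")
```

The amplitude comes out in the flux units of the input; for normalised flux, multiplying by 1000 expresses it in ppt, as in the print statement.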
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the creation of the first version of a public variability catalogue for PLATO's LOPS2 field. Using 38 million TESS-Gaia Light Curve (TGLC) aperture light curves for 6 million sources, the authors apply two machine-learning frameworks—a deep neural network and a feature-based gradient-boosted decision-tree ensemble—combine their predictions, perform manual vetting on a sub-sample, and conduct statistical analysis. This yields a classification in which ~72% of light curves are dominated by instrument- or pipeline-induced signals and the remaining 28% (~3.6 million candidates) are flagged as pulsating, rotating, or eclipsing variables. The catalogue is released publicly with suggestions for purity filters based on colour, luminosity, frequency, amplitude, and neighbours.
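The 28% and the 3.6 million count are quoted on different bases. Using only the rounded figures from the abstract:

```latex
\[
0.28 \times 38\times10^{6} \approx 1.06\times10^{7}\ \text{light curves},
\qquad
\frac{3.6\times10^{6}}{6\times10^{6}} = 0.60\ \text{of the unique sources},
\qquad
\frac{38\times10^{6}}{6\times10^{6}} \approx 6.3\ \text{light curves per source}.
\]
```

So roughly 10.6 million light curves receive a non-instrumental class, and they map onto about 60% of the unique sources. The review does not state how a source is promoted to a candidate when its light curves from different sectors disagree, so the step from a light-curve fraction to a star count carries that caveat.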
Significance. If the reported classification fractions prove robust, the work would deliver one of the largest public variability catalogues derived from an automated pipeline, directly supporting complementary-science target selection for PLATO's first long-term field. The combination of two independent ML frameworks, manual vetting, and statistical checks is a constructive approach, and the public release plus explicit purity-filter recommendations add practical value. The significance is currently limited by the absence of quantitative performance metrics that would allow readers to gauge uncertainty in the headline 72%/28% split.
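Per-class precision and recall follow directly from a confusion matrix on a labelled test set. A minimal sketch with a made-up two-class matrix (instrumental versus variable); the counts are invented for illustration and are not results from the paper. The last lines show the scale at stake: with roughly 27.4 million instrumental light curves (72% of 38 million), a 1% leak into the variable class alone would add about 274,000 light curves.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision and recall from a square confusion matrix.

    cm[i, j] counts items whose true class is i and whose predicted class is j.
    A class with no predictions (or no true members) yields NaN.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / cm.sum(axis=0)  # column sum: all items predicted as class j
        recall = tp / cm.sum(axis=1)     # row sum: all items truly in class i
    return precision, recall


# Hypothetical held-out counts; rows = truth, columns = prediction.
# Class order: [instrumental, variable].
cm = [[960, 40],
      [30, 370]]
precision, recall = per_class_metrics(cm)
print("precision:", precision.round(3))
print("recall:   ", recall.round(3))

# Scale of a small error rate on the full set (rounded figures from the abstract).
n_instrumental = 0.72 * 38e6
print(round(0.01 * n_instrumental))  # 273600
```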
major comments (2)
- [Section 3 (Machine Learning Classification and Validation)] The headline result that 28% of the 38 million light curves are genuine variable-star candidates (and thus 3.6 million objects) rests on the assumption that the combined DNN + GBT predictions generalize reliably to the full TGLC set. The manuscript describes model training, output combination, sub-sample manual vetting, and statistical checks, but supplies no precision, recall, confusion matrix, or agreement statistics on a large, representative held-out test set that spans the observed range of TESS systematics (scattered light, momentum dumps, etc.) and the full diversity of variable classes. Without these numbers, even modest per-class error rates under the reported class imbalance can shift the reported fractions by hundreds of thousands of objects.
- [Section 4 (Results)] The generalization step from the manually vetted sub-sample to the entire 38 million light curves is not accompanied by any quantitative uncertainty estimate. The central claim of the catalogue therefore lacks the error bars or sensitivity analysis that would be required to assess how robust the 72%/28% division is to plausible variations in model performance.
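A first error bar of the kind requested in the second comment can come from the manual vetting itself: if k of n vetted candidates are confirmed, a Wilson score interval on k/n can be scaled to the 3.6 million candidates. This is only valid if the vetted light curves are a simple random draw from the candidate set (a class-stratified vetting sample would need per-stratum weights), and the counts below are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(k, n, confidence=0.95):
    """Wilson score interval for a binomial proportion k/n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = k / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half


# Hypothetical vetting outcome: 180 of 200 randomly drawn candidates confirmed.
lo, hi = wilson_interval(180, 200)
n_candidates = 3.6e6
print(f"confirmed fraction: {lo:.3f} to {hi:.3f}")
print(f"implied genuine variables: {lo * n_candidates:,.0f} to {hi * n_candidates:,.0f}")
```

The Wilson form is used here rather than the plain normal interval because it stays inside [0, 1] and behaves sensibly when the confirmed fraction is close to 0 or 1.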
minor comments (3)
- [Abstract] The abstract quotes approximate percentages and a rounded count (3.6 million); providing the exact counts, or ranges with any available uncertainty, would improve precision.
- [Figure captions] Several example light-curve figures would benefit from explicit labels indicating the final assigned variability class and the dominant frequency/amplitude values used in the statistical analysis.
- [Section 3] A short table summarizing the exact training/validation split sizes, hyper-parameter choices, and any agreement metric between the DNN and GBT outputs would aid reproducibility.
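For the agreement metric suggested in the last minor comment, Cohen's kappa is one standard choice: it compares the observed agreement between the DNN and GBT labels with the agreement expected by chance from each model's label frequencies. A minimal sketch (the labels are hypothetical):

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal frequencies, summed over labels.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # both models used one identical label: kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


dnn = ["instrumental", "instrumental", "variable", "variable", "instrumental", "variable"]
gbt = ["instrumental", "variable", "variable", "variable", "instrumental", "instrumental"]
print(round(cohens_kappa(dnn, gbt), 3))  # 0.333
```

Kappa near 1 means the models agree far beyond chance. Under strong class imbalance, raw percent agreement can look high even when kappa is modest, which is why the chance correction matters here.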
Simulated Author's Rebuttal
We appreciate the referee's comments highlighting the need for more rigorous quantitative validation of our machine learning classifications. We have revised the manuscript to include additional performance metrics and uncertainty analyses as detailed in the point-by-point responses below.
Point-by-point responses
-
Referee: [Section 3 (Machine Learning Classification and Validation)] The headline result that 28% of the 38 million light curves are genuine variable-star candidates (and thus 3.6 million objects) rests on the assumption that the combined DNN + GBT predictions generalize reliably to the full TGLC set. The manuscript describes model training, output combination, sub-sample manual vetting, and statistical checks, but supplies no precision, recall, confusion matrix, or agreement statistics on a large, representative held-out test set that spans the observed range of TESS systematics (scattered light, momentum dumps, etc.) and the full diversity of variable classes. Without these numbers, even modest per-class error rates under the reported class imbalance can shift the reported fractions by hundreds of thousands of objects.
Authors: We acknowledge this limitation in the current version of the manuscript. While we performed manual vetting on a sub-sample and conducted statistical checks, we did not include a comprehensive held-out test set evaluation spanning all systematics. In the revised manuscript, we will add precision, recall, and a confusion matrix derived from the cross-validation during model training, as well as the agreement statistics between the DNN and GBT on the full dataset. We will also discuss the challenges in creating a fully representative test set for TESS data. These additions will help quantify the potential impact of misclassifications on the reported fractions. revision: yes
-
Referee: [Section 4 (Results)] The generalization step from the manually vetted sub-sample to the entire 38 million light curves is not accompanied by any quantitative uncertainty estimate. The central claim of the catalogue therefore lacks the error bars or sensitivity analysis that would be required to assess how robust the 72%/28% division is to plausible variations in model performance.
Authors: We agree that the manuscript would be strengthened by quantitative uncertainty estimates. In the revised version, we will include a sensitivity analysis varying the model combination parameters and report the resulting variation in the 28% fraction. We will also provide uncertainty estimates based on the vetted sub-sample proportions and discuss potential biases from the class imbalance. This will allow readers to better gauge the robustness of the headline results. revision: yes
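The sensitivity analysis promised in the second response could take the form of a grid sweep: recompute the variable fraction for several combination weights and decision thresholds, then report the spread. The sketch assumes each model outputs a single probability that a light curve is variable; the weights, thresholds, and scores are synthetic stand-ins, not the paper's data.

```python
import numpy as np


def variable_fraction(p_var_dnn, p_var_gbt, w_dnn, threshold):
    """Fraction of light curves whose combined variability probability exceeds threshold."""
    p = w_dnn * np.asarray(p_var_dnn) + (1.0 - w_dnn) * np.asarray(p_var_gbt)
    return float(np.mean(p > threshold))


def sweep(p_var_dnn, p_var_gbt, weights, thresholds):
    """Return {(w, threshold): fraction} over a grid of combination settings."""
    return {(w, thr): variable_fraction(p_var_dnn, p_var_gbt, w, thr)
            for w in weights for thr in thresholds}


# Synthetic scores standing in for the two models' outputs (not real data).
rng = np.random.default_rng(0)
truth = rng.random(10_000) < 0.3
p_dnn = np.clip(truth * 0.6 + rng.normal(0.2, 0.15, truth.size), 0, 1)
p_gbt = np.clip(truth * 0.5 + rng.normal(0.25, 0.15, truth.size), 0, 1)
grid = sweep(p_dnn, p_gbt, weights=[0.25, 0.5, 0.75], thresholds=[0.4, 0.5, 0.6])
fractions = list(grid.values())
print(f"variable fraction ranges from {min(fractions):.3f} to {max(fractions):.3f}")
```

Reporting the minimum and maximum over a plausible grid gives readers a direct sense of how much of the headline 28% depends on the combination settings.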
Circularity Check
No significant circularity; classification applies standard ML models to external TESS data.
Full rationale
The paper trains a DNN and GBT ensemble on variability patterns from TESS light curves, combines outputs, performs manual vetting on a sub-sample, and applies the result to the full 38M set to report the 72%/28% split. No step reduces a claimed prediction or uniqueness result to a fitted parameter or self-citation by construction; the output fractions are direct consequences of the trained classifiers on independent observations rather than a redefinition or tautological renaming of inputs. The pipeline remains self-contained against external benchmarks with no load-bearing self-referential definitions or ansatz smuggling.
Axiom & Free-Parameter Ledger
free parameters (1)
- ML ensemble combination rules
axioms (2)
- Domain assumption: TGLC provides calibrated aperture light curves that faithfully capture stellar signals after removal of instrumental effects.
- Domain assumption: Models trained on known variable stars generalize to classify variability in new TESS observations of the LOPS2 field.
Forward citations
Cited by 1 Pith paper
-
Plato's view on supermassive black hole binaries: Exploring the faint limit of ESA's Plato space mission
Simulations show Plato can recover relativistic photometric signatures of supermassive black hole binaries in bright quasars (G≤18) via Bayesian inference on mock light curves.
Reference graph
Works this paper leans on
[1] Aerts, C. 2021, Rev. Mod. Phys., 93, 015001
[2] Aerts, C. 2025, A&A, 704, A332
[3] Aerts, C., Christensen-Dalsgaard, J., & Kurtz, D. W. 2010, Asteroseismology (Springer Science & Business Media)
[4] Aerts, C., Molenberghs, G., & De Ridder, J. 2023, A&A, 672, A183
[5] Aerts, C. & Tkachenko, A. 2024, A&A, 692, R1
[6] Aerts, C. & Tkachenko, A. 2024, in 8th TESS/15th Kepler Asteroseismic Science Consortium Workshop, 22
[7] Aerts, C., Van Reeth, T., Mombarg, J. S., & Hey, D. 2025, A&A, 695, A214
[8] Antoci, V., Cantiello, M., Khalack, V., et al. 2025, A&A, 696, A111
[9] Astropy Collaboration, Price-Whelan, A. M., Lim, P. L., et al. 2022, ApJ, 935
[10] Audenaert, J. 2025, Ap&SS, 370, 72
Article number, page 12 of 18 M. Kliapets et al.: PLATO LOPS2 Variability Catalogue, Version 1
[11] Audenaert, J., Kuszlewicz, J. S., Handberg, R., et al. 2021, AJ, 162, 209
[12] Audenaert, J., Muthukrishna, D., Gregory, P. F. X., Hogg, D. W., & Villar, V. A. 2025, in 1st ICML Workshop on Foundation Models for Structured Data
[13] Audenaert, J. & Tkachenko, A. 2022, A&A, 666, A76
[14] Audenaert, J., Tkachenko, A., Skarka, M., Eschen, Y. N. E., & Muthukrishna, D. 2024, in 8th TESS/15th Kepler Asteroseismic Science Consortium Workshop, 47
[15] Barac, N., Bedding, T. R., Murphy, S. J., & Hey, D. R. 2022, MNRAS, 516, 2080
[16] Baran, A. S., Sahoo, S. K., Sanjayan, S., & Ostrowski, J. 2021, MNRAS, 503, 3828
[17] Barbara, N. H., Bedding, T. R., Fulcher, B. D., Murphy, S. J., & Van Reeth, T. 2022, MNRAS, 514, 2793
[18] Bedding, T. R., Murphy, S. J., Crawford, C., et al. 2023, ApJ, 946, L10
[19] Borucki, W. J., Koch, D., Basri, G., et al. 2010, Science, 327, 977
[20] Bowman, D. M. & Bugnet, L. 2026, in Encyclopedia of Astrophysics (First Edition), ed. I. Mandel (Oxford: Elsevier), 133–153
[21] Bowman, D. M., Kurtz, D. W., Breger, M., Murphy, S. J., & Holdsworth, D. L. 2016, MNRAS, 460, 1970
[22] Breiman, L. 2001, Mach. Learn., 45, 5
[23] Burssens, S., Bowman, D. M., Michielsen, M., et al. 2023, Nat. Astron., 7, 913
[24] Buysschaert, B., Aerts, C., Bowman, D. M., et al. 2018, A&A, 616, A148
[25] Cabral, J., Sánchez, B., Ramos, F., et al. 2018, Astron. Comput.
[26] Cabrera, J., Rauer, H., Samadi, R., et al. 2026, "Assessment of PLATO Science Performance", Exp. Astron., submitted, arXiv:2604.04818
[27] Caldwell, D. A., Tenenbaum, P., Twicken, J. D., et al. 2020, RNAAS, 4, 201
[28] Chen, T., He, T., Benesty, M., & Khotilovich, V. 2019, R version, 90, 40
[29] Chen, T., He, T., Benesty, M., et al. 2015, R package version 0.4-2, 1, 1
[30] Colman, I. L., Angus, R., David, T., et al. 2024, AJ, 167, 189
[31] Debosscher, J., Sarro, L., Aerts, C., et al. 2007, A&A, 475, 1159
[32] Deeg, H. & Alonso, R. 2024, Contrib. Astron. Obs. Skalnaté Pleso, 54, 142
[33] Dong, X., Yu, Z., Cao, W., Shi, Y., & Ma, Q. 2020, Front. Comput. Sci., 14, 241
[34] Dupret, M.-A., Grigahcène, A., Garrido, R., Gabriel, M., & Scuflaire, R. 2004, A&A, 414, L17
[35] Eschen, Y. N. E., Bayliss, D., Wilson, T. G., et al. 2024, MNRAS, 535, 1778
[36] Eyer, L. & Mowlavi, N. 2008, in Journal of Physics Conference Series, Vol. 118 (IOP), 012010
[37] Friedman, J. H. 2001, Ann. Stat., 1189
[38] Fritzewski, D., Van Reeth, T., Aerts, C., et al. 2024, A&A, 681, A13
[39] Fritzewski, D., Vanrespaille, M., Aerts, C., et al. 2025, A&A, 698, A253
[40] Fritzewski, D. J., Kemp, A., Li, G., & Aerts, C. 2026, A&A, 706, A131
Gaia Collaboration; De Ridder, J., Ripepi, V., Aerts, C., et al. 2023, A&A, 674, A36
Gaia Collaboration; Prusti, T., De Bruijne, J., Brown, A. G., et al. 2016, A&A, 595, A1
[41] Geirhos, R., Jacobsen, J.-H., Michaelis, C., et al. 2020, Nat. Mach. Intell., 2, 665
[42] Gregory, P. F. X., Audenaert, J., Kliapets, M., et al. 2026, ApJ, submitted, arXiv:2604.07437
Grigahcène, A., Antoci, V., Balona, L., et al. 2010, ApJ, 713, L192
[43] Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. 2017, in International conference on machine learning, PMLR, 1321–1330
[44] Hambleton, K., Degroote, P., Conroy, K., et al. 2013, EAS Publ. Ser., 64, 285
[45] Han, T. & Brandt, T. D. 2023, AJ, 165, 71
[46] Handberg, R., Lund, M. N., White, T. R., et al. 2021, AJ, 162, 170
[47] Hattori, S., Angus, R., Foreman-Mackey, D., Lu, Y., & Colman, I. 2025, AJ, 170, 15
[48] Heber, U. 2009, ARA&A, 47, 211
[49] Heller, R., Jiang, C., Bluhm, P., et al. 2026, "PLATO input catalogs for technical calibration and fine guidance", Exp. Astron., submitted, arXiv:2604.02437
[50] Hey, D. & Aerts, C. 2024, A&A, 688, A93
[51] Hey, D. R., Montet, B. T., Pope, B. J., Murphy, S. J., & Bedding, T. R. 2021, AJ, 162, 204
[52] Higgins, M. E. & Bell, K. J. 2023, AJ, 165, 141
[53] Huijse, P., De Ridder, J., Eyer, L., et al. 2025, A&A, 701, A150
[54] Jannsen, N., De Ridder, J., Seynaeve, D., et al. 2024, A&A, 681, A18
[55] Jannsen, N., Tkachenko, A., Royer, P., et al. 2025, A&A, 694, A185
[56] Kemp, A., Vrancken, J., Mombarg, J. S., et al. 2025, A&A, 704, A280
[57] Kliapets, M., Huijse, P., Tkachenko, A., et al. 2025, A&A, 703, A240
[58] Kollmeier, J. A., Rix, H.-W., Aerts, C., et al. 2026, AJ, 171, 52
[59] Kunimoto, M., Huang, C., Tey, E., et al. 2021, RNAAS, 5, 234
[60] Lakshminarayanan, B., Pritzel, A., & Blundell, C. 2017, NeurIPS, 30
[61] Lecoanet, D., Bowman, D. M., & Van Reeth, T. 2022, MNRASL, 512, L16
[62] Li, G., Van Reeth, T., Bedding, T. R., et al. 2020, MNRAS, 491, 3586
[63] Lund, M. N., Handberg, R., Buzasi, D. L., et al. 2021, ApJS, 257, 53
[64] Mercader-Perez, P., Cuesta-Lazaro, C., Muthukrishna, D., et al. 2026, in ICLR 2026 Workshop on Foundation Models for Science: Real-World Impact and Science-First Design
[65] Mitchell, T. M. 1980, Technical Report No. CBM-TR-117
[66] Mombarg, J. S. G., Aerts, C., Van Reeth, T., & Hey, D. 2024, A&A, 691, A131
[67] Montalto, M., Piotto, G., Marrese, P. M., et al. 2021, A&A, 653, A98
[68] Montalto, M., Piotto, G., Marrese, P. M., et al. 2026, "The PLATO Input Catalogue of targets (tPIC) for the first Long Pointing Field", A&A, submitted, arXiv:2604.03369
[69] Murphy, S. J., Hey, D., Van Reeth, T., & Bedding, T. R. 2019, MNRAS, 485, 2380
[70] Nascimbeni, V., Piotto, G., Börner, A., et al. 2022, A&A, 658, A31
[71] Nascimbeni, V., Piotto, G., Cabrera, J., et al. 2025, A&A, 694, A313
[72] Nascimbeni, V., Piotto, G., Granata, V., et al. 2026, "The PLATO field selection process III. Selection of the Prime Sample for the LOPS2 field", A&A, submitted, arXiv:2604.03365
Pápics, P. I., Briquet, M., Baglin, A., et al. 2012, A&A, 542, A55
[73] Parmar, A., Katariya, R., & Patel, V. 2018, in International conference on intelligent data communication technologies and internet of things, Springer, 758–763
[74] Pedersen, M. G., Aerts, C., Pápics, P. I., et al. 2021, Nat. Astron., 5, 715
[75] Pedersen, M. G. & Bell, K. J. 2023, AJ, 165, 239
[76] Pertenais, M., Cabrera, J., Paproth, C., et al. 2021, in International Conference on Space Optics—ICSO 2020, Vol. 11852, SPIE, 2043–2054
[77] Petitpas, G., Haviland, J., Han, T., et al. 2026, RNAAS, 10, 69
[78] Plachy, E. 2019, in Stars and their Variability Observed from Space, 465–470
[79] Plachy, E., Pál, A., Bódi, A., et al. 2021, ApJS, 253, 11
[80] Plachy, E. & Szabó, R. 2021, Front. Astron. Space Sci., 7, 577695