GTLS: A GPU-accelerated method for periodic transit detection

Jian Ge; Kevin Willis; Luoxi Jin; QuanQuan Hu

arxiv: 2607.00348 · v1 · pith:Q6EPXY45new · submitted 2026-07-01 · 🌌 astro-ph.IM · astro-ph.EP

Pith reviewed 2026-07-02 00:33 UTC · model grok-4.3

keywords: transit detection · GPU acceleration · exoplanet search · photometric surveys · Transit Least Squares · light curve analysis · computational efficiency

The pith

A GPU implementation of Transit Least Squares reduces search time for periodic transits by more than twenty times while keeping detection rates statistically identical to the CPU version.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops GTLS to make periodic transit searches feasible on the large light-curve collections produced by modern surveys. It moves the core steps of the TLS algorithm—phase folding, duration checks, depth estimation, and chi-squared scoring—onto GPU kernels that run in parallel. Tests on synthetic 3000-day curves show runtime falling from 3289 seconds on CPU to 138 seconds on one RTX 4090 GPU, with two GPUs bringing it to 79 seconds. Precision and recall stay within a few tenths of a percent of the reference TLS numbers, confirming that the accelerated version recovers the same signals.

Core claim

GTLS parallelizes the dominant TLS operations on GPUs and delivers a 3000-day light-curve search in 138 seconds on a single RTX 4090, with precision 9.3 percent and recall 79.4 percent that match the CPU TLS values of 9.4 percent and 81.1 percent to within statistical noise.
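
The speedup implied by these runtimes is easy to check by hand. The short sketch below uses only the three wall-clock times quoted in the abstract; the variable and function names are illustrative, and the "scaling efficiency" measure (gain from the second GPU divided by the ideal factor of 2) is a convention chosen here, not one the paper reports.

```python
# Runtimes quoted in the abstract for one 3000-day synthetic light curve.
TLS_CPU_S = 3289.0     # reference TLS on the CPU
GTLS_1GPU_S = 138.0    # GTLS on one RTX 4090
GTLS_2GPU_S = 79.0     # GTLS on two RTX 4090s


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Ratio of baseline runtime to accelerated runtime."""
    return baseline_s / accelerated_s


speedup_1gpu = speedup(TLS_CPU_S, GTLS_1GPU_S)
speedup_2gpu = speedup(TLS_CPU_S, GTLS_2GPU_S)
# Gain from adding the second GPU, relative to an ideal doubling.
scaling_efficiency = speedup(GTLS_1GPU_S, GTLS_2GPU_S) / 2.0

print(f"1 GPU vs CPU:  {speedup_1gpu:.1f}x")              # 23.8x
print(f"2 GPUs vs CPU: {speedup_2gpu:.1f}x")              # 41.6x
print(f"2-GPU scaling efficiency: {scaling_efficiency:.0%}")  # 87%
```

So the single-GPU figure is closer to 24x than 20x, and the second GPU delivers somewhat less than a full doubling.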

What carries the argument

Parallel GPU kernels that perform phase folding, moving-window depth estimation, and chi-squared evaluation to replicate the TLS search exactly.
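
To make the three operations concrete, here is a minimal serial sketch of what one trial (period, duration) pair involves: fold, slide a window over phase, and score each window with chi-squared. It is a simplification under stated assumptions, not the GTLS kernel: it fits a flat-bottomed box rather than the limb-darkened TLS template, assumes unit flux errors and a baseline normalized to 1, and every name in it (`fold_and_scan`, `n_in`, the synthetic light curve) is hypothetical.

```python
import random


def fold_and_scan(times, fluxes, period, n_in):
    """Fold at one trial period and slide an n_in-point window over phase.

    Returns (chi2, phase_of_window_start, depth) for the best window,
    assuming unit flux errors and a flat-bottomed (box) transit.
    """
    # Phase folding: map each timestamp to a phase in [0, 1), then sort by phase.
    folded = sorted(((t / period) % 1.0, f) for t, f in zip(times, fluxes))
    phase = [p for p, _ in folded]
    flux = [f for _, f in folded]
    n = len(flux)

    # Padding: repeat the first n_in points so a window can wrap past phase 1.
    padded = flux + flux[:n_in]

    total = sum(flux)
    total_sq = sum(f * f for f in flux)
    win = sum(padded[:n_in])                    # running sum of in-window flux
    win_sq = sum(f * f for f in padded[:n_in])  # running sum of squares

    best = (float("inf"), 0.0, 0.0)
    for start in range(n):
        mean_in = win / n_in
        # In-window residuals about the window mean (the fitted box floor).
        chi2_in = win_sq - n_in * mean_in * mean_in
        # Out-of-window residuals about the normalized baseline of 1.
        chi2_out = (total_sq - win_sq) - 2.0 * (total - win) + (n - n_in)
        chi2 = chi2_in + chi2_out
        if chi2 < best[0]:
            best = (chi2, phase[start], 1.0 - mean_in)
        # Slide the window by one point using the running sums.
        win += padded[start + n_in] - padded[start]
        win_sq += padded[start + n_in] ** 2 - padded[start] ** 2
    return best


# Synthetic 60-day light curve with a 1% deep box transit (illustrative values).
rng = random.Random(0)
period_true, t0, duration, depth_true = 2.7, 0.51, 0.12, 0.01
times = [0.02 * i for i in range(3000)]
fluxes = [
    1.0 + rng.gauss(0.0, 0.001)
    - (depth_true if (t - t0) % period_true < duration else 0.0)
    for t in times
]
n_in = round(len(times) * duration / period_true)
chi2, phase0, depth = fold_and_scan(times, fluxes, period_true, n_in)
print(f"best window starts at phase {phase0:.3f} with depth {depth:.4f}")
```

A full search repeats this scan over every trial period and duration. Those trials are independent of each other, which is what makes the problem a natural fit for one-thread-per-trial GPU parallelism.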

If this is right

TLS-style searches become practical for the full data volumes expected from TESS, PLATO, and future missions.
Two-GPU runs cut the same 3000-day search to roughly 79 seconds.
Detection statistics remain consistent enough that existing TLS recovery pipelines can adopt GTLS without re-tuning thresholds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same kernel structure could be adapted to other GPU vendors or to multi-node clusters for even larger surveys.
Analogous parallelization might accelerate other periodic-signal searches that rely on phase folding and least-squares scoring.
Processing times short enough for near-real-time analysis could open new observing strategies for upcoming wide-field telescopes.

Load-bearing premise

The GPU kernels produce floating-point results that are numerically identical to the CPU reference for every light curve examined.

What would settle it

Running both GTLS and TLS on the same collection of real Kepler light curves and finding a statistically significant difference in the number or properties of recovered planets.
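
Because both codes would be run on the same light curves, the outcomes are paired, and one reasonable way to test for a significant difference is McNemar's test on the discordant cases. The sketch below is an illustration of that choice, not a procedure taken from the paper; the function name and the counts are hypothetical.

```python
import math


def mcnemar(tls_only: int, gtls_only: int):
    """McNemar chi-squared statistic and p-value (1 dof, no continuity correction).

    tls_only:  planets recovered by TLS but missed by GTLS
    gtls_only: planets recovered by GTLS but missed by TLS
    """
    discordant = tls_only + gtls_only
    if discordant == 0:
        return 0.0, 1.0
    stat = (tls_only - gtls_only) ** 2 / discordant
    # Survival function of a chi-squared variable with 1 dof: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, chosen only to exercise the function.
stat, p_value = mcnemar(tls_only=20, gtls_only=3)
print(f"statistic = {stat:.2f}, p = {p_value:.2g}")
```

Only the light curves on which the two codes disagree enter the statistic, so the test stays informative even when the aggregate recall figures look nearly identical.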

Figures

Figures reproduced from arXiv: 2607.00348 by Jian Ge, Kevin Willis, Luoxi Jin, QuanQuan Hu.

**Figure 1.** Overview of the processing pipeline from the raw light curve to the GTLS output. Part of the preprocessing is performed outside the GTLS framework. Keplerian dynamics give the relationship between orbital period $P$ and orbital semi-major axis $a$: $P^2 = \frac{4\pi^2}{GM} a^3$ (2). Considering the minimum orbital semi-major axis $a_{\min}$ to be the Roche limit, assumed to be $3R$, where $R$ is the radius of the star (Guillochon…

**Figure 2.** Figure 2 also shows that, at short orbital periods, few known exoplanets occupy the region with very small $T_{14}/P$. This may be related to the finite time resolution and cadence limitations of current transit surveys. As a reference, the blue dashed line in…

**Figure 3.** Padding of the folded light curve. Here, $p_i$ denotes the normalized phase of the $i$-th data point, $f_i$ is the corresponding flux value, $D_m$ is the maximum transit duration considered in the search, and $N$ is the total number of data points in the folded light curve. … $\chi^2$ value. For each trial combination of period, duration, and $T_0$, a corresponding $\chi^2$ is computed and recorded. After detrending the $\chi^2$ series, …

**Figure 4.** The composition of the $\chi^2$ value outside the transit signal. $\mathrm{out}_a$ denotes the part before the transit signal, and $\mathrm{out}_b$ is the part after the transit signal. As $T_0$ changes, the part before the transit signal ($\mathrm{out}_a$) is gradually increased, and the part after the transit signal ($\mathrm{out}_b$) is gradually reduced. $N_p$ is the number of points in the patched light curve, and $d$ is the number of points in the transit signa…

**Figure 5.** The receiver operating characteristic (ROC) curves for GTLS, TLS, and BLS. The closer the ROC curve is to the upper-left corner, the better the diagnostic ability of the test. See the text for more details. This behavior reflects the role of GTLS, TLS, and BLS as first-stage screening tools in transit detection pipelines. At this stage, the primary goal is not to minimize the false positive rate at the exp…

**Figure 6.** Figure 6 shows the AREs of the recovered transit parameters for GTLS, TLS, and BLS. Both GTLS and TLS outperform BLS across all parameters; although GTLS shows slightly lower accuracy than TLS in recovering orbital periods for KOI light curves, it achieves better recovery of the transit duration, $T_0$, and transit depth.

**Figure 7.** Runtime comparison among GTLS, TLS, and the GPU-based BLS implementation in cuvarbase. GTLS is significantly faster than both comparison methods over the tested light-curve baselines.
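
The text captured alongside Figure 1 sets the inner edge of the period grid from Kepler's third law, with the minimum semi-major axis placed at a Roche limit of three stellar radii. As a worked example under that assumption, the sketch below evaluates $P_{\min} = 2\pi \sqrt{a_{\min}^3 / (GM)}$ for a Sun-like star. The solar constants are standard rounded values supplied here, and the function name is hypothetical.

```python
import math

G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30         # solar mass, kg
R_SUN = 6.957e8          # solar radius, m
SECONDS_PER_DAY = 86400.0


def min_period_days(stellar_mass_kg, stellar_radius_m, roche_factor=3.0):
    """Shortest trial period from Kepler's third law with a_min = roche_factor * R."""
    a_min = roche_factor * stellar_radius_m
    period_s = 2.0 * math.pi * math.sqrt(a_min ** 3 / (G * stellar_mass_kg))
    return period_s / SECONDS_PER_DAY


p_min_sun = min_period_days(M_SUN, R_SUN)
print(f"P_min for a Sun-like star: {p_min_sun:.2f} d")  # about 0.60 d
```

For a Sun-like host the grid therefore starts near 0.6 days; denser stars push the limit to shorter periods because the minimum period scales as $\sqrt{R^3/M}$.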

Original abstract

Computational efficiency is a critical requirement for transit searches in modern large-scale photometric surveys. We present Graphics Processing Units Transit Least Squares (GTLS), a GPU-accelerated implementation of the Transit Least Squares algorithm designed to reduce the computational cost of periodic transit detection while preserving TLS-like sensitivity to transit-shaped signals. GTLS parallelizes the dominant steps of the TLS search, including phase folding, transit-duration evaluation, moving-window depth estimation, and chi-squared calculation. Using Kepler-like long-cadence light curves and synthetic Kepler-like time series, we benchmark GTLS against the reference CPU implementation of TLS and the GPU-based BLS implementation in cuvarbase. On an AMD Ryzen 9 7950X CPU and an NVIDIA RTX 4090 GPU, GTLS processes a 3000-day synthetic light curve in approximately 138 seconds, compared with 3289 seconds for TLS. With two RTX 4090 GPUs, the runtime is reduced to approximately 79 seconds. In recovery tests, GTLS achieves detection performance statistically consistent with TLS, with a precision of 9.3 percent and recall of 79.4 percent, compared with 9.4 percent and 81.1 percent for TLS. These results demonstrate that GTLS enables efficient TLS-style searches for large photometric data sets from Kepler, TESS, PLATO, ET, and future missions. The source code is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GTLS is a straightforward GPU port of TLS that cuts runtime by ~24x on an RTX 4090 (3289 s to 138 s) while keeping synthetic recovery stats close to the CPU reference.

GTLS is a GPU port of the Transit Least Squares algorithm that runs roughly 24x faster on an RTX 4090 than the CPU version, with detection precision and recall on synthetic Kepler-like data that stay statistically close to the CPU reference.

The paper moves the main TLS steps—phase folding, moving-window depth estimation, and chi-squared evaluation—into GPU kernels and benchmarks the result against the original TLS and cuvarbase BLS. They report clear wall-clock times (138 s vs 3289 s for a 3000-day curve, 79 s with two GPUs) and recovery numbers (9.3 % precision and 79.4 % recall versus 9.4 % and 81.1 % for TLS). The code is released, which helps.

This is useful engineering for anyone who needs to run exhaustive TLS searches on large photometric sets from TESS, Kepler, or upcoming missions. The speedup is measured directly and addresses a real bottleneck.

The soft spot is the limited check on numerical equivalence. Recall is 1.7 percentage points lower (79.4% versus 81.1%), yet the paper shows only aggregate statistics and no side-by-side comparison of periods or depths per light curve. Floating-point differences in the parallel kernels could matter on real data with gaps or non-Gaussian noise, and the abstract gives no extra detail on precision handling or edge cases.

The work is aimed at pipeline developers and survey teams who care about compute cost. A reader who needs faster TLS without changing the method will find the benchmarks and code release helpful. It is solid enough to deserve a serious referee.

Referee Report

2 major / 2 minor

Summary. The paper presents GTLS, a GPU-accelerated implementation of the Transit Least Squares (TLS) algorithm for periodic transit detection. It parallelizes phase folding, transit-duration evaluation, moving-window depth estimation, and chi-squared calculation, claiming a runtime reduction from 3289 s (TLS on CPU) to 138 s (GTLS on single RTX 4090) or 79 s (two GPUs) for a 3000-day synthetic Kepler-like light curve, while achieving statistically consistent detection performance (precision 9.3% and recall 79.4% for GTLS vs. 9.4% and 81.1% for TLS) on synthetic data. The source code is stated to be publicly available.

Significance. If numerical equivalence to TLS holds, GTLS would enable practical application of the sensitive TLS method to the large data volumes expected from PLATO, ET, and similar future surveys, where CPU-based TLS is currently prohibitive. The public availability of the source code is a strength that supports reproducibility of the reported wall-clock times and recovery statistics.

major comments (2)

[Recovery tests] Recovery tests (as summarized in the abstract): aggregate precision and recall are reported as close but not identical (9.3%/79.4% vs 9.4%/81.1%), yet no per-light-curve comparison of best-fit periods, depths, or chi-squared values between GTLS and TLS is described. This leaves open the possibility that small differences arise from unquantified floating-point accumulation in the parallelized phase-folding or moving-window kernels rather than statistical variation alone.
[Methods / Implementation] The manuscript provides no details on numerical precision handling, memory layout, or treatment of edge cases in the period grid within the GPU kernels. These omissions are load-bearing for the central claim of functional equivalence to TLS, as real data with gaps or non-Gaussian noise could amplify any discrepancies beyond what synthetic-injection statistics reveal.
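
The per-light-curve comparison requested in the first major comment could be as simple as the following sketch. It assumes each code's best-fit results are available as a dictionary keyed by light-curve ID. The field names (`period`, `depth`, `chi2_min`), the tolerance, and the toy values are all hypothetical and are not taken from the GTLS or TLS output formats.

```python
def compare_results(tls, gtls, rel_tol=1e-4):
    """Flag light curves whose GTLS and TLS best-fit values disagree.

    Returns {curve_id: {parameter: relative_difference}} containing only
    the parameters whose relative difference exceeds rel_tol.
    """
    flagged = {}
    for curve_id in sorted(tls.keys() & gtls.keys()):
        diffs = {}
        for name in ("period", "depth", "chi2_min"):
            ref, new = tls[curve_id][name], gtls[curve_id][name]
            rel = abs(new - ref) / max(abs(ref), 1e-300)
            if rel > rel_tol:
                diffs[name] = rel
        if diffs:
            flagged[curve_id] = diffs
    return flagged


# Toy inputs: curve A agrees to rounding level, curve B lands on a period alias.
tls_out = {
    "A": {"period": 3.0, "depth": 0.010, "chi2_min": 1000.0},
    "B": {"period": 3.0, "depth": 0.020, "chi2_min": 900.0},
}
gtls_out = {
    "A": {"period": 3.0000001, "depth": 0.010, "chi2_min": 1000.01},
    "B": {"period": 6.0, "depth": 0.020, "chi2_min": 900.0},
}
flagged = compare_results(tls_out, gtls_out)
print(flagged)  # {'B': {'period': 1.0}}
```

A report of this kind separates harmless rounding-level scatter from qualitative disagreements such as period aliases, which aggregate precision and recall cannot distinguish.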

minor comments (2)

The abstract and text should explicitly state the number of synthetic light curves used in the recovery tests and the exact criteria for a 'detection' to allow direct auditing of the precision/recall numbers.
Clarify whether the reported runtimes include data transfer overhead between host and device or are kernel-only timings.
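
To show why the detection criterion matters for auditing, here is a minimal sketch of how precision and recall follow from per-light-curve outcomes. The matching rule (recovered period within 1% of the injected one) and the bookkeeping convention (a wrong-period detection on an injected curve counts as both a false positive and a false negative) are assumptions made for illustration; the paper's actual criteria are exactly what the comment above asks the authors to state.

```python
def precision_recall(cases, rel_tol=0.01):
    """Compute (precision, recall) from (injected_period, recovered_period) pairs.

    Use None for "no planet injected" or "no signal reported".
    """
    tp = fp = fn = 0
    for injected, recovered in cases:
        matched = (
            injected is not None
            and recovered is not None
            and abs(recovered - injected) <= rel_tol * injected
        )
        if matched:
            tp += 1
            continue
        if recovered is not None:
            fp += 1  # reported a signal that is not the injected planet
        if injected is not None:
            fn += 1  # injected planet was not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy outcomes, not data from the paper.
cases = [
    (3.0, 3.01),   # recovered within tolerance
    (5.0, 10.0),   # period alias: counted as a miss and a false alarm
    (7.0, None),   # missed entirely
    (None, 2.2),   # false alarm on a planet-free curve
    (None, None),  # correctly quiet
    (4.0, 4.0),    # recovered exactly
]
precision, recall = precision_recall(cases)
print(precision, recall)  # 0.5 0.5
```

Loosening `rel_tol` or accepting period aliases as matches would raise both numbers, which is why the quoted 9.3% and 79.4% cannot be checked without the exact rule.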

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for minor revision. We address each major comment below, proposing targeted revisions to strengthen the manuscript's claims of numerical equivalence while preserving the focus on performance.

Referee: [Recovery tests] Recovery tests (as summarized in the abstract): aggregate precision and recall are reported as close but not identical (9.3%/79.4% vs 9.4%/81.1%), yet no per-light-curve comparison of best-fit periods, depths, or chi-squared values between GTLS and TLS is described. This leaves open the possibility that small differences arise from unquantified floating-point accumulation in the parallelized phase-folding or moving-window kernels rather than statistical variation alone.

Authors: We agree that aggregate statistics alone leave room for ambiguity regarding the source of the small observed differences. In the revised manuscript we will add a new paragraph and supplementary figure that reports the per-light-curve distributions of absolute differences in recovered period, transit depth, and minimum chi-squared between GTLS and TLS across the full set of synthetic injections. These distributions will be used to demonstrate that the discrepancies fall within the range expected from Monte-Carlo noise realizations rather than systematic kernel-level divergence. We retain the original aggregate precision/recall numbers but qualify them with this per-curve evidence. revision: yes
Referee: [Methods / Implementation] The manuscript provides no details on numerical precision handling, memory layout, or treatment of edge cases in the period grid within the GPU kernels. These omissions are load-bearing for the central claim of functional equivalence to TLS, as real data with gaps or non-Gaussian noise could amplify any discrepancies beyond what synthetic-injection statistics reveal.

Authors: We acknowledge that the current Methods section is insufficiently explicit on these implementation choices. The revised manuscript will contain a new subsection (2.3) that specifies: (i) use of IEEE-754 single-precision (float32) arithmetic throughout the GPU kernels with explicit accumulation of the chi-squared statistic in double precision to limit rounding error; (ii) memory layout employing coalesced global-memory accesses and shared-memory tiling for the phase-folding and moving-window steps; and (iii) identical period-grid construction and gap-handling logic to the reference TLS implementation, with explicit padding of light-curve edges to prevent boundary artifacts. These additions will be accompanied by a short verification that the GPU kernels reproduce the CPU reference results to within machine epsilon on a set of edge-case light curves containing large gaps. revision: yes
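
The motivation for a double-precision accumulator in point (i) can be illustrated without a GPU. The sketch below emulates a float32 accumulator in pure Python by rounding through `struct` after every addition and compares it with ordinary float64 accumulation of the same float32 terms. It is an illustration of the rounding behavior only, not GTLS code, and the helper names are hypothetical.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE-754 float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in emulated float32: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def sum_f64(values):
    """Accumulate the same terms in float64."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


# 100,000 identical float32 terms, standing in for per-point chi-squared contributions.
terms = [to_f32(0.1)] * 100_000
exact = math.fsum(terms)  # correctly rounded reference sum

err_f32 = abs(sum_f32(terms) - exact)
err_f64 = abs(sum_f64(terms) - exact)
print(f"float32 accumulator error: {err_f32:.3g}")
print(f"float64 accumulator error: {err_f64:.3g}")
```

The float32 accumulator drifts by an amount that is large compared with typical chi-squared differences between neighboring trial periods, while the float64 accumulator stays many orders of magnitude closer to the reference. Parallel reductions on a GPU sum in a different order than this serial loop, so the exact error would differ, but the case for a wider accumulator is the same.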

Circularity Check

0 steps flagged

No circularity; purely empirical implementation and benchmarks

The paper presents a GPU port of the existing TLS algorithm together with runtime measurements and recovery statistics on synthetic light curves. No mathematical derivation, fitted model, or prediction step is claimed; all reported numbers (138 s vs 3289 s, precision 9.3 % vs 9.4 %, recall 79.4 % vs 81.1 %) are direct empirical outputs. No self-citations, ansatzes, or uniqueness theorems are invoked to support any derivation. The work is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a computational implementation paper; the central claim rests on the assumption that GPU parallelization preserves TLS numerical behavior and that the chosen synthetic test set is representative. No free parameters, new physical entities, or ad-hoc axioms beyond standard floating-point arithmetic and transit model assumptions are introduced.

axioms (1)

domain assumption: GPU kernels for phase folding, transit-duration evaluation, moving-window depth estimation, and chi-squared calculation produce results numerically equivalent to the reference CPU TLS code.
This assumption is required for the claim that detection performance remains statistically consistent.

Reference graph

Works this paper leans on

21 extracted references · 1 canonical work page

[1] Akeson, R. L., Chen, X., Ciardi, D., et al. 2013, Publications of the Astronomical Society of the Pacific, 125, 989
[2] Ansdell, M., Ioannou, Y., Osborn, H. P., et al. 2018, ApJ, 869, L7
[3] Borucki, W. J., Koch, D., Basri, G., et al. 2010, Science, 327, 977
[4] Canocchi, G., Malavolta, L., Pagano, I., et al. 2023, A&A, 672, A144
[5] Ge, J., Zhang, H., Zang, W., et al. 2022b, arXiv:2206.06693
[6] Ge, J., Zhang, H., Zhang, Y., et al. 2024b, Space Telescopes and Instrumentation 2024: Optical, Infrared, and Millimeter Wave, 45
[7] Gilliland, R. L., Chaplin, W. J., Jenkins, J. M., Ramsey, L. W., & Smith, J. C. 2015, The Astronomical Journal, 150, 133
[8] Guillochon, J., Ramirez-Ruiz, E., & Lin, D. 2011, The Astrophysical Journal, 732, 74
[9] Hippke, M., David, T. J., Mulders, G. D., & Heller, R. 2019, AJ, 158, 143
[10] Hippke, M. & Heller, R. 2019, A&A, 623, A39; Hoffman, J. 2017, cuvarbase
[11] Howell, S. B., Sobeck, C., Haas, M., et al. 2014, PASP, 126, 398; Kovács, G., Zucker, S., & Mazeh, T. 2002, A&A, 391, 369
[12] Kreidberg, L. 2015, Publications of the Astronomical Society of the Pacific, 127, 1161
[13] Kunimoto, M., Tey, E., Fong, W., et al. 2023, Research Notes of the AAS, 7, 28; Ofir, A. 2014, A&A, 561, A138
[14] Okuta, R., Unno, Y., Nishino, D., Hido, S., & Loomis, C. 2017, in NIPS
[15] Rao, S., Mahabal, A., Rao, N., & Raghavendra, C. 2021, MNRAS, 502, 2845
[16] Rauer, H., Aerts, C., Cabrera, J., et al. 2025, Experimental Astronomy, 59
[17] Rauer, H., Catala, C., Aerts, C., et al. 2014, Exp. Astron., 38, 249
[18] Ricker, G. R., Winn, J. N., Vanderspek, R., et al. 2014, J. Astron. Telescopes Instrum. Syst., 1, 014003
[19] Shallue, C. J. & Vanderburg, A. 2018, AJ, 155, 94
[20] Smith, D. . 1966, Applied Regression Analysis (John Wiley & Sons, Inc.)
[21] Wang, K., Ge, J., Willis, K., Wang, K., & Zhao, Y. 2024, Monthly Notices of the Royal Astronomical Society, 528, 4053
