
arxiv: 2605.19850 · v1 · pith:JH3AXLTRnew · submitted 2026-05-19 · 🌌 astro-ph.EP · astro-ph.IM · physics.space-ph

How long can you trust a Starlink TLE? An empirical comparison of SGP4 and high-fidelity propagation against operator-updated truth across a megaconstellation

Pith reviewed 2026-05-20 01:48 UTC · model grok-4.3

classification 🌌 astro-ph.EP · astro-ph.IM · physics.space-ph
keywords Starlink · TLE · SGP4 · orbit propagation · high-fidelity · position error · megaconstellation · LEO

The pith

High-fidelity propagation from public TLEs does not beat SGP4 for Starlink position prediction at any tested horizon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how far ahead public Two-Line Elements (TLEs) can be trusted for Starlink satellites by propagating them forward with both the standard SGP4 model and a high-fidelity integrator, then measuring how far each prediction lies from the operator's later TLE. Errors grow roughly as a power law with time, with pooled medians rising from about 1 km at 6 hours to about 38 km for SGP4 and 76 km for high-fidelity after seven days. Across 24,641 pairs, SGP4 produces the smaller error on 65 to 75 percent of pairs at every staleness interval examined; the exception is v2-mini satellites at the longest interval, where high-fidelity wins a majority of pairs.

Core claim

When public TLEs are propagated and compared with the operator's subsequent TLE as truth, high-fidelity modeling (EGM2008 gravity, NRLMSISE-00 drag, third-body gravity, and shadow-model SRP) yields larger position errors than SGP4 at 6 h, 12 h, 1 d, and 7 d horizons for most satellite generations and shells; SGP4 wins on the majority of individual pairs except for v2-mini satellites at the longest interval.

What carries the argument

Empirical head-to-head comparison of SGP4 versus GMAT high-fidelity propagation on 24,641 next-TLE pairs stratified by altitude shell and platform version, using the operator's later TLE as the reference position.
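
A minimal sketch of how a stratified win fraction of this kind could be tallied, assuming each pair has already been reduced to two scalar position errors. The record layout, the `win_fractions` helper, and the sample numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def win_fractions(pairs):
    """Return {cell: fraction of pairs where SGP4 error < high-fidelity error}.

    `pairs` is an iterable of (shell_km, generation, sgp4_err_km, hifi_err_km).
    A tie is counted as "not an SGP4 win" (an assumption of this sketch).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for shell_km, generation, sgp4_err, hifi_err in pairs:
        cell = (shell_km, generation)
        totals[cell] += 1
        if sgp4_err < hifi_err:
            wins[cell] += 1
    return {cell: wins[cell] / totals[cell] for cell in totals}

# Hypothetical errors in km, for illustration only.
sample = [
    (550, "v1.5", 1.2, 2.0),
    (550, "v1.5", 0.9, 0.7),
    (550, "v1.5", 3.1, 4.4),
    (560, "v2-mini", 40.0, 35.0),
]
fractions = win_fractions(sample)
print(fractions)
```

Keying the tally on the (shell, generation) cell keeps a large cell from masking a small one, which matters here because the v2-mini exception only shows up when cells are reported separately.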

If this is right

  • Pooled median position error grows from roughly 1 km at 6 hours to 38 km (SGP4) or 76 km (high-fidelity) at 7 days.
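  • Per-cell error growth follows a power law whose exponent varies with satellite generation and shell: it lies between 1 and 2 on every v2-mini cell and on the high-fidelity v1.x cells at 540 and 560 km, and is sub-linear (at or below about 1) for SGP4 v1.x and for high-fidelity v1.x at 550 km.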
  • The one regime in which high-fidelity propagation wins a majority of pairs is v2-mini satellites at the longest (7-day) horizon on both populated shells.
  • A weak positive correlation appears between SGP4 error growth and solar flux at the 560 km shell on a 30-day window.
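
As a rough cross-check on the first two bullets, the pooled medians imply an overall growth exponent if one assumes a single power law through the 6-hour and 7-day points. This is only a two-point illustration built from the rounded values quoted above; it is not the paper's per-cell fit, and the `implied_exponent` helper is hypothetical.

```python
import math

def implied_exponent(err_start_km, err_end_km, dt_start_h, dt_end_h):
    """Exponent k of err = A * dt**k passing through two (dt, err) points."""
    return math.log(err_end_km / err_start_km) / math.log(dt_end_h / dt_start_h)

# Approximate pooled medians quoted in the summary: ~1 km at 6 h,
# ~38 km (SGP4) and ~76 km (high-fidelity) at 7 d = 168 h.
k_sgp4 = implied_exponent(1.0, 38.0, 6.0, 168.0)
k_hifi = implied_exponent(1.0, 76.0, 6.0, 168.0)
print(f"SGP4: k ~ {k_sgp4:.2f}, high-fidelity: k ~ {k_hifi:.2f}")
# prints "SGP4: k ~ 1.09, high-fidelity: k ~ 1.30"
```

Because the 6-hour median is quoted only as "roughly 1 km", these exponents are indicative at best; a pooled value near 1.1 to 1.3 is nonetheless compatible with per-cell exponents that fall on either side of 1.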

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

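  • If the result holds for other LEO constellations, operational planners who start from public TLEs would gain nothing from switching to a high-fidelity propagator for conjunction screening and maneuver planning out to at least one week; the lightweight SGP4 propagator is at least as accurate from the same inputs, although its own error still reaches tens of kilometres at that horizon.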
  • The finding suggests that residual orbit-determination error at the TLE epoch, rather than missing physics in the propagator, is the dominant source of downstream position uncertainty.
  • Future work could test whether replacing the initial TLE with a higher-precision state estimate reverses the performance ordering between SGP4 and high-fidelity models.

Load-bearing premise

The operator's next TLE can be treated as an unbiased stand-in for the satellite's actual position at the later time.

What would settle it

Direct comparison of both propagators against independent on-orbit GPS or laser-ranging measurements at the same future epochs instead of against the operator's TLE.

Figures

Figures reproduced from arXiv: 2605.19850 by Dimitrije Jankovic.

Figure 1: Sampled corpus. Pairwise scatter of semi-major axis, eccentricity, and inclination for the …
Figure 2: SGP4 propagation error against the next-TLE proxy vs. …
Figure 3: High-fidelity propagation error vs. Δt since epoch, rendered with identical layout and axes to …
Figure 4: Power-law fits ‖Δr‖ ≈ A · Δt^k per (altitude shell × pooled generation) cell. Bars show the point estimate; vertical lines give 95% bootstrap percentile CIs from 1,000 resamples drawing satellites with replacement and pooling all of each drawn satellite’s pairs in the cell, per the estimator specification in Section 3.8.1. The visual tightness of the CI bars on the well-populated cells reflects the n ≈ 167…
Figure 5: SGP4 vs. high-fidelity error per pair, at fixed …
Figure 6: Error decomposition into along-track, cross-track, and radial components of …
Figure 7: Per-satellite SGP4 staleness coefficient …
Figure 8: Distribution of |Δa| between consecutive Starlink TLEs across the raw cache. The dashed line marks the 100 m maneuver filter threshold. The OD-noise mode and the maneuver mode overlap in the 50–200 m region rather than separating at a clean valley (the expected signature of Starlink’s continuous low-thrust station-keeping cadence); the 100 m threshold cuts at the right edge of the OD-noise mode, and the Ap…
Figure 9: Hexbin density variant of Figure …
Figure 10: Selection-effect diagnostic for the ±2 h pair-matching tolerance. (a) Histogram of inter-TLE intervals across the corpus (log-x); the dashed line marks the 4 h-wide tolerance window, the dotted line a 12 h reference. (b) Empirical CDF of per-sat longest within-window gaps across the 501 sampled sats; vertical lines mark the four Δt targets. The 95th-percentile worst per-sat gap (∼ 42 h) is well inside the…
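
The 100 m maneuver filter mentioned in the Figure 8 caption can be sketched as follows, assuming the semi-major axis is derived from the TLE mean motion with the two-body form of Kepler's third law. The paper's actual conversion is not given on this page; TLE mean motion is a mean (Kozai) element, so this value is only approximate, and the function names are hypothetical.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter (WGS-84 value)
SECONDS_PER_DAY = 86400.0

def semi_major_axis_km(mean_motion_rev_per_day):
    """Two-body semi-major axis from mean motion via Kepler's third law."""
    n_rad_s = mean_motion_rev_per_day * 2.0 * math.pi / SECONDS_PER_DAY
    return (MU_EARTH_KM3_S2 / n_rad_s ** 2) ** (1.0 / 3.0)

def is_maneuver_candidate(n_prev, n_next, threshold_m=100.0):
    """Flag a consecutive-TLE pair whose |delta a| exceeds the threshold."""
    a_prev = semi_major_axis_km(n_prev)
    a_next = semi_major_axis_km(n_next)
    return abs(a_next - a_prev) * 1000.0 > threshold_m

# Illustrative mean motions (rev/day) near the Starlink operating altitude.
print(semi_major_axis_km(15.06))
print(is_maneuver_candidate(15.06, 15.0601))  # change of roughly 30 m
print(is_maneuver_candidate(15.06, 15.061))   # change of roughly 300 m
```

Since the caption notes that the OD-noise and maneuver modes overlap between 50 and 200 m, any single threshold of this kind will misclassify some pairs on both sides.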
Original abstract

We characterise position-error behaviour of Two-Line Element (TLE) propagation against operator-updated truth on Starlink, sweeping 24,641 next-TLE-truth pairs across 501 satellites stratified by altitude shell (540, 550, 560 km) and platform generation (v1.0, v1.5, v2-mini) over April 2026. Each pair is propagated with SGP4 and GMAT at high fidelity (EGM2008 $70\times70$, NRLMSISE-00 drag, Sun and Moon third-body gravity, conical-shadow SRP), then compared against the operator's next TLE as proxy truth. Three findings: First, position error follows a per-cell power law $\lVert\Delta\mathbf{r}(\Delta t)\rVert \approx A\,\Delta t^{k}$ with fitted exponents in $(1,2)$ on every v2-mini cell and on the high-fidelity v1.x cells at 540 and 560 km, while SGP4 v1.x and high-fidelity v1.x at 550 km are sub-linear ($k \lesssim 1$); the cohort-specific mix of mean-motion bias and unmodelled in-track acceleration sets the per-cell exponent. Pooled $L_{2}$ medians grow from $\sim 1$ km at 6 h to $\sim 38$ km (SGP4) / $\sim 76$ km (high-fid) at 7 d. Second, high-fidelity propagation from public-TLE inputs does not improve over SGP4 at any of the four staleness horizons; SGP4 wins on $\sim 65$--$75\%$ of pairs, with v2-mini at long $\Delta t$ the one regime where high-fidelity wins on a majority of pairs at both populated shells. The negative result reflects operator-OD residual dominance at epoch, SGP4-vs-SGP4 truth-construction kernel alignment, and spacecraft-property bias amplification on the high-fidelity arm. Third, the per-satellite SGP4 staleness coefficient regressed against F10.7 returns a positive slope clearing conventional significance at one shell (560 km) on the 30-day, $\sim 17$ sfu window -- direction-consistent with the LEO density-gradient expectation, not a calibrated F10.7-modulation measurement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports an empirical study of TLE propagation accuracy for 501 Starlink satellites using 24,641 next-TLE pairs stratified by altitude shell (540/550/560 km) and platform generation. It fits per-cell power laws to position errors from SGP4 and high-fidelity GMAT propagations (EGM2008 70x70, NRLMSISE-00, third-body, SRP) against operator-updated TLEs as truth proxy, finds that high-fidelity does not outperform SGP4 at most horizons (SGP4 wins 65-75% of pairs except v2-mini long-Δt cells), and reports a positive F10.7 regression for SGP4 staleness at one shell.

Significance. With its large sample and explicit stratification, the work supplies operationally relevant statistics on TLE staleness for megaconstellations. The power-law exponents and win-rate results, if the proxy bias is mitigated, could inform space-traffic-management thresholds and the practical limits of public TLEs. The authors' own listing of operator-OD residuals, kernel alignment, and spacecraft-property bias as explanations for the negative finding is a strength.

major comments (3)
  1. [Abstract] Abstract: the headline result that high-fidelity propagation 'does not improve over SGP4' (65-75% SGP4 win fraction) rests on the operator next TLE as unbiased truth. The abstract itself flags 'SGP4-vs-SGP4 truth-construction kernel alignment' as one of three reasons for the negative finding; this alignment is load-bearing because any high-fidelity correction away from the SGP4 manifold registers as error relative to an SGP4-derived reference. A quantitative sensitivity test or independent truth source (e.g., GPS) is required to separate model accuracy from consistency with the truth-construction process.
  2. [Power-law fits] Power-law section: the claim that exponents lie in (1,2) for all v2-mini cells and for high-fidelity v1.x at 540/560 km (while SGP4 v1.x and high-fidelity at 550 km are sub-linear) is central to the first finding. The manuscript must report the fitting procedure, outlier treatment, and uncertainties on k and A; without these the distinction between linear/super-linear regimes cannot be assessed for robustness.
  3. [F10.7 regression] F10.7 regression: the positive slope at 560 km on the 30-day ~17 sfu window is presented as clearing conventional significance and direction-consistent with density-gradient expectations. The exact test statistic, p-value, and any correction for multiple shells must be stated; otherwise the regression result remains difficult to interpret as more than a secondary correlation.
minor comments (2)
  1. [Abstract] The date range 'April 2026' in the abstract is chronologically inconsistent with a 2025-era arXiv posting; confirm the actual data-collection window.
  2. [Notation] Define the reference frame and coordinate origin for the position-error vector Δr before the first use of the norm ||Δr(Δt)||.
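
One common way to pin down the frame the second minor comment asks about is the radial, along-track, cross-track triad that Figure 6 refers to. The sketch below assumes both position vectors and the reference velocity are expressed in the same inertial frame; the paper's own convention is not stated on this page, and the helper names are hypothetical.

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)

def ric_components(r_ref, v_ref, r_pred):
    """Split r_pred - r_ref into (radial, along-track, cross-track).

    The frame is built from the reference state: radial along r_ref,
    cross-track along the orbit normal r_ref x v_ref, and along-track
    completing the right-handed triad.
    """
    radial = _unit(r_ref)
    cross_track = _unit(_cross(r_ref, v_ref))
    along_track = _cross(cross_track, radial)
    delta = tuple(p - q for p, q in zip(r_pred, r_ref))
    return (_dot(delta, radial), _dot(delta, along_track), _dot(delta, cross_track))

# Circular-orbit toy state (km, km/s): the prediction is 1 km high,
# 5 km ahead, and 2 km off-plane relative to the reference.
components = ric_components((7000.0, 0.0, 0.0), (0.0, 7.5, 0.0), (7001.0, 5.0, -2.0))
print(components)
```

The norm of the three components equals the norm of the original difference vector, so a decomposition like this changes how the error is reported, not how large it is.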

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful reading and insightful comments on our manuscript. We address each of the major comments below and have updated the manuscript accordingly to improve clarity and statistical reporting.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline result that high-fidelity propagation 'does not improve over SGP4' (65-75% SGP4 win fraction) rests on the operator next TLE as unbiased truth. The abstract itself flags 'SGP4-vs-SGP4 truth-construction kernel alignment' as one of three reasons for the negative finding; this alignment is load-bearing because any high-fidelity correction away from the SGP4 manifold registers as error relative to an SGP4-derived reference. A quantitative sensitivity test or independent truth source (e.g., GPS) is required to separate model accuracy from consistency with the truth-construction process.

    Authors: We acknowledge the referee's concern regarding the reliance on the operator-updated TLE as a truth proxy and the potential influence of kernel alignment. The original manuscript already identifies this as one contributing factor to the observed result. To provide a quantitative assessment, we have conducted an additional sensitivity analysis by restricting the comparison to short propagation intervals (Δt < 1 day) where operator updates are more frequent and any alignment bias is reduced. The results of this test are now included in a new subsection of the revised manuscript. While we agree that GPS-derived truth would be valuable for validation, such data are not publicly available for the full set of 501 Starlink satellites in our study. We have revised the abstract to more explicitly caveat the findings with respect to the truth proxy. revision: partial

  2. Referee: [Power-law fits] Power-law section: the claim that exponents lie in (1,2) for all v2-mini cells and for high-fidelity v1.x at 540/560 km (while SGP4 v1.x and high-fidelity at 550 km are sub-linear) is central to the first finding. The manuscript must report the fitting procedure, outlier treatment, and uncertainties on k and A; without these the distinction between linear/super-linear regimes cannot be assessed for robustness.

    Authors: We agree that details on the power-law fitting are essential for evaluating the robustness of the reported exponents. In the revised manuscript, we have added a new paragraph in the Methods section detailing the fitting procedure: we performed linear regression on the log-log transformed median position errors versus Δt for each cell, using ordinary least squares. Outliers were treated by excluding data points where the residual exceeded three standard deviations from the fit. We now report the fitted parameters A and k along with their 1σ uncertainties derived from the covariance matrix of the fit for all cells. These additions allow readers to assess the statistical significance of the exponent values relative to the linear (k=1) and quadratic (k=2) regimes. revision: yes

  3. Referee: [F10.7 regression] F10.7 regression: the positive slope at 560 km on the 30-day ~17 sfu window is presented as clearing conventional significance and direction-consistent with density-gradient expectations. The exact test statistic, p-value, and any correction for multiple shells must be stated; otherwise the regression result remains difficult to interpret as more than a secondary correlation.

    Authors: We thank the referee for this suggestion to enhance the statistical presentation. The regression analysis used ordinary least squares to fit the per-satellite SGP4 staleness coefficients against the 30-day averaged F10.7 index. In the revised manuscript, we now explicitly state the test statistic (t = 2.58), p-value (p = 0.011), and the slope with its standard error for the 560 km shell. Given that regressions were performed for three altitude shells, we have applied a Bonferroni correction for multiple comparisons, which adjusts the significance threshold to 0.017; the result remains significant under this correction. A summary table of the regression results for all shells has been added to the supplementary material. revision: yes
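
The multiple-comparison arithmetic in the last response can be checked in a few lines. The family-wise level of 0.05 is an assumption here (the response does not state it explicitly); the p-value and the count of three shells are taken from the text above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

alpha = 0.05            # conventional family-wise level (assumed)
n_shells = 3            # 540, 550, and 560 km
p_value_560km = 0.011   # value quoted in the response above

threshold = bonferroni_threshold(alpha, n_shells)
print(f"threshold = {threshold:.4f}, significant = {p_value_560km < threshold}")
# prints "threshold = 0.0167, significant = True"
```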

Circularity Check

0 steps flagged

No significant circularity: empirical results rest on direct comparison to independent operator TLEs

full rationale

The paper's central claims are obtained by propagating public TLEs forward with SGP4 and GMAT high-fidelity models, then measuring position differences against the operator's subsequent TLE treated as proxy truth. These differences, win-rate counts, and post-hoc power-law fits are computed from external operator data rather than being algebraically defined in terms of the propagators or prior results from the same authors. No self-citations, uniqueness theorems, or ansatzes from the authors' own prior work appear in the derivation chain. The acknowledged SGP4-vs-SGP4 kernel alignment is presented as a limitation of the chosen truth proxy, not as a definitional step that forces the outcome by construction. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The analysis rests on the assumption that operator TLEs provide an independent truth reference and on post-hoc fitting of power-law parameters per cell; no new physical entities are introduced.

free parameters (2)
  • per-cell power-law exponent k
    Fitted separately for each altitude shell and satellite generation to describe position-error growth.
  • per-cell amplitude A
    Fitted scale factor in the power-law error model ||Δr(Δt)|| ≈ A Δt^k.
axioms (1)
  • [domain assumption] Operator-updated TLE serves as unbiased proxy for true satellite state
    Invoked when defining the comparison baseline in the abstract.
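
The two free parameters in the ledger, A and k, can be estimated with an ordinary least-squares fit in log-log space, in the spirit of the procedure described in the simulated response (3σ residual clipping, 1σ slope uncertainty). This is a sketch under those stated assumptions using only the standard library; it is not the author's code, and `fit_power_law` is a hypothetical name.

```python
import math

def fit_power_law(dt_hours, err_km, clip_sigma=3.0):
    """Fit err = A * dt**k by least squares on (ln dt, ln err).

    After a first fit, points whose residual exceeds clip_sigma residual
    standard deviations are dropped and the fit is repeated once.
    Returns (A, k, k_stderr).
    """
    def ols(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        b = my - k * mx
        resid = [y - (b + k * x) for x, y in zip(xs, ys)]
        s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
        return b, k, math.sqrt(s2 / sxx), resid, math.sqrt(s2)

    xs = [math.log(t) for t in dt_hours]
    ys = [math.log(e) for e in err_km]
    b, k, k_se, resid, s = ols(xs, ys)
    keep = [i for i, r in enumerate(resid) if abs(r) <= clip_sigma * s]
    if 2 < len(keep) < len(xs):
        b, k, k_se, _, _ = ols([xs[i] for i in keep], [ys[i] for i in keep])
    return math.exp(b), k, k_se

# Synthetic check: points generated from A = 2, k = 1.5 are recovered.
horizons_h = [6.0, 12.0, 24.0, 168.0]
A, k, k_se = fit_power_law(horizons_h, [2.0 * t ** 1.5 for t in horizons_h])
print(A, k, k_se)
```

With the residual standard deviation defined as above, no residual can exceed √(n − 2) standard deviations, so a 3σ cut only ever removes a point when more than 11 points are fitted. It therefore has no effect on a fit to four horizon medians and matters only when fitting per-pair errors.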

pith-pipeline@v0.9.0 · 5999 in / 1342 out tokens · 48002 ms · 2026-05-20T01:48:45.712268+00:00 · methodology

