Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models

Daniele Sicoli; Piyush Mehta; Sriram Narayanan

arxiv: 2604.24646 · v2 · submitted 2026-04-27 · 📡 eess.SY · cs.SY


Sriram Narayanan, Daniele Sicoli, Piyush Mehta

Pith reviewed 2026-05-08 01:39 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords thermospheric density, data assimilation, SINDy, reduced-order modeling, Kalman filter, geomagnetic storms, orbit prediction, space situational awareness


The pith

Coupling a physics-informed reduced-order model with a Kalman filter reduces thermospheric density estimation errors compared to open-loop predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to estimate thermospheric mass density more accurately by combining a data-driven reduced-order model with data assimilation. The reduced-order model is identified with SINDy_c-AR from TIE-GCM simulation output, and a Kalman filter assimilates real satellite density observations into it. Assimilation lowers errors relative to running the model without observations (open loop), especially during geomagnetic storms and when data come from only one satellite. The approach is a computationally efficient alternative to full physics models while providing the time-evolving atmospheric state that empirical models lack. Performance is compared with a linear DMDc reduced-order model and with empirical references (NRLMSIS 2.1 and HASDM).
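To make the assimilation step concrete, here is a minimal sketch of a linear Kalman filter running on a reduced state z. It assumes (these are assumptions, not details taken from the paper) that the reduced model is linear over one step, z_{k+1} = A z_k + B u_k, and that log10 density is reconstructed as a mean field plus spatial modes Phi, so an in situ observation is linear in z. The helper names, the nearest-grid-point observation operator, and all matrix sizes are hypothetical; if the identified SINDy_c-AR model contains nonlinear terms, an extended or ensemble variant of the filter would be needed.

```python
import numpy as np

def kf_forecast(z, P, A, B, u, Q):
    """Forecast: z_{k+1} = A z_k + B u_k and P_{k+1} = A P A^T + Q."""
    return A @ z + B @ u, A @ P @ A.T + Q

def kf_update(z, P, y, H, R):
    """Linear Kalman update for y = H z + v, v ~ N(0, R), Joseph-form covariance."""
    S = H @ P @ H.T + R                        # innovation covariance
    K = np.linalg.solve(S.T, (P @ H.T).T).T    # gain K = P H^T S^{-1}
    z_new = z + K @ (y - H @ z)
    I_KH = np.eye(len(z)) - K @ H
    return z_new, I_KH @ P @ I_KH.T + K @ R @ K.T

def obs_operator(Phi, mean_log_rho, grid_idx):
    """Hypothetical operator: log10 density at the grid point nearest the
    satellite is mean_log_rho[idx] + Phi[idx] @ z, which is linear in z."""
    return Phi[[grid_idx], :], mean_log_rho[grid_idx]

# Toy twin experiment; all sizes and matrices are made up for illustration.
rng = np.random.default_rng(0)
n_grid, r, m = 500, 5, 2
Phi = np.linalg.qr(rng.standard_normal((n_grid, r)))[0]   # orthonormal "modes"
mean_log_rho = -11.0 + 0.1 * rng.standard_normal(n_grid)
A, B = 0.95 * np.eye(r), 0.1 * rng.standard_normal((r, m))
Q, R = 1e-3 * np.eye(r), np.array([[1e-4]])

z_true = rng.standard_normal(r)
z_da, P = np.zeros(r), np.eye(r)   # assimilated estimate
z_ol = np.zeros(r)                 # open-loop estimate, never updated
for k in range(50):
    u = rng.standard_normal(m)     # stand-in for solar/geomagnetic drivers
    z_true = A @ z_true + B @ u + rng.multivariate_normal(np.zeros(r), Q)
    z_ol = A @ z_ol + B @ u
    z_da, P = kf_forecast(z_da, P, A, B, u, Q)
    H, offset = obs_operator(Phi, mean_log_rho, grid_idx=(7 * k) % n_grid)
    y = offset + H @ z_true + rng.normal(0.0, 1e-2, size=1)  # log10 density obs
    z_da, P = kf_update(z_da, P, y - offset, H, R)
```

The open-loop state `z_ol` receives the same drivers but no observations, which mirrors the open-loop baseline the paper compares against.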

Core claim

The central result is that assimilating in situ density observations from satellites such as CHAMP, GRACE, and Swarm into a SINDy_c-AR reduced-order model derived from TIE-GCM yields lower density estimation errors than the open-loop model predictions. The improvement is most visible during geomagnetic storms and under single-satellite coverage. On assimilated orbits the method performs comparably to a linear DMDc reference; on withheld orbits SINDy_c-AR is more accurate in the in-training scenarios, while DMDc is better in the out-of-training 2024 Swarm-C case. Empirical models can be more accurate far from the observation track.

What carries the argument

The SINDy_c-AR model (autoregressive Sparse Identification of Nonlinear Dynamics with control), derived from the parent TIE-GCM simulation, which captures the dominant modes of thermospheric variability and their dependence on solar and geomagnetic drivers.
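The core regression behind SINDy-type models is sequentially thresholded least squares (STLSQ). The sketch below fits a one-step map z_{k+1} = Θ(z_k, z_{k-1}, u_k) Ξ on synthetic reduced states. The candidate library (constant, current state, one lagged state as the autoregressive term, drivers, and state-driver products) is an illustrative assumption, not the library used in the paper; `threshold` plays the role of the sparsity threshold that the ledger lists as a free parameter.

```python
import numpy as np

def build_library(z_k, z_km1, u_k):
    """Hypothetical SINDy_c-AR library: constant, current state z_k, lagged
    state z_{k-1} (autoregressive term), drivers u_k, and z_k * u_k products."""
    n = z_k.shape[0]
    cross = (z_k[:, :, None] * u_k[:, None, :]).reshape(n, -1)  # z_i * u_j
    return np.hstack([np.ones((n, 1)), z_k, z_km1, u_k, cross])

def stlsq(Theta, Y, threshold, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression used in SINDy."""
    Xi = np.linalg.lstsq(Theta, Y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(Y.shape[1]):              # refit each output on active terms
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], Y[:, j], rcond=None)[0]
    return Xi

# Synthetic reduced states from a known sparse model (values are made up).
rng = np.random.default_rng(1)
N, r, m = 400, 3, 3
B_true = np.diag([0.5, 0.3, 0.2])
U = rng.standard_normal((N, m))
Z = np.zeros((N, r))
for k in range(1, N - 1):
    Z[k + 1] = 0.9 * Z[k] - 0.2 * Z[k - 1] + B_true @ U[k]
    Z[k + 1, 1] += 0.1 * Z[k, 0] * U[k, 0]     # one nonlinear state-driver term

Theta = build_library(Z[1:-1], Z[:-2], U[1:-1])  # rows k = 1 .. N-2
Xi = stlsq(Theta, Z[2:], threshold=0.05)         # targets z_{k+1}
```

In this noise-free toy case the threshold of 0.05 sits below the smallest true coefficient (0.1), so the fit should recover the ten active terms; with real simulation data the threshold trades fidelity against model size.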

If this is right

Density estimates improve over open-loop forecasts particularly in storm conditions.
The method works with sparse observations from limited satellite coverage.
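On assimilated orbits SINDy_c-AR performs comparably to DMDc; on withheld orbits it is more accurate in the in-training scenarios but not in the out-of-training 2024 Swarm-C case, where DMDc is better.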
Assimilated results are positioned as enhancements to the forecast rather than replacements for empirical models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This could enable better real-time orbit predictions for satellites during space weather events.
Extending the assimilation to multiple parameters beyond density might improve overall atmospheric modeling.
The approach suggests that reduced-order models can bridge the gap between expensive physics simulations and fast empirical ones when augmented with data.
Testing on future satellite missions with different orbital configurations would further validate the method.

Load-bearing premise

The SINDy_c-AR model trained on TIE-GCM continues to capture the main dynamics accurately when forced with actual solar and geomagnetic data and updated with real but sparse satellite observations.

What would settle it

A direct comparison of density estimation errors between the assimilated SINDy_c-AR model and the open-loop version during a major geomagnetic storm not included in the training data, using independent validation measurements.

Figures

Figures reproduced from arXiv: 2604.24646 by Daniele Sicoli, Piyush Mehta, Sriram Narayanan.

**Figure 1.** Integrated framework for thermospheric density estimation, partitioned into offline training and real-time assimilation phases.

**Figure 2.** Reconstructed thermospheric density during the Halloween storm period com…

**Figure 3.** External drivers and assimilated reduced-order state during the Halloween…

**Figure 4.** Global density diagnostics at 400 km (latitude vs. local solar time) for the…

**Figure 5.** Reconstructed thermospheric density during November 2009 compared against…

**Figure 6.** External drivers and assimilated reduced-order state during November 2009.

**Figure 7.** Global density diagnostics at 400 km for the dual-satellite November 2009 sce…

**Figure 8.** Timeline of satellite missions providing density measurements from August…

Original abstract

Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forcing across several orders of magnitude. Physics-based general circulation models resolve this response but are computationally expensive, while empirical models run cheaply but lack a time-evolving atmospheric state. This work couples a data-driven reduced-order thermospheric model with a Kalman filter that assimilates in situ density observations. An autoregressive Sparse Identification of Nonlinear Dynamics with control (SINDy_c-AR) reduced-order model derived from the Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM) captures the dominant modes of variability and their dependence on solar and geomagnetic drivers at a fraction of the parent model's cost. Density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm are assimilated across a range of orbital configurations and geomagnetic conditions, with a linear DMDc model evaluated as a reference. Assimilation reduces density estimation error relative to open-loop predictions, most visibly during geomagnetic storms and under single-satellite coverage. SINDy_c-AR and DMDc perform comparably on assimilated orbits; on withheld orbits, SINDy_c-AR is more accurate in the in-training scenarios while DMDc is better in the out-of-training 2024 Swarm-C case. Benchmarks against NRLMSIS 2.1 and HASDM (2000–2019, where available) show that empirical references can outperform the assimilated model far from the assimilated track, so results are framed as improvements over the open-loop forecast.
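For comparison, the DMDc reference reduces to a linear least-squares fit of z_{k+1} = A z_k + B u_k. The minimal sketch below fits A and B directly in reduced coordinates with `numpy.linalg.lstsq` and then rolls the model forward open loop from the drivers alone. Working in reduced coordinates (rather than the truncated-SVD formulation of DMDc on full snapshots), the function names, and the toy matrices are simplifying assumptions.

```python
import numpy as np

def fit_dmdc(Z, U):
    """Least-squares DMDc fit of z_{k+1} = A z_k + B u_k.
    Z: (N, r) reduced states, U: (N, m) drivers; returns A (r, r), B (r, m)."""
    Omega = np.hstack([Z[:-1], U[:-1]])               # (N-1, r+m) regressors
    G = np.linalg.lstsq(Omega, Z[1:], rcond=None)[0]  # (r+m, r)
    r = Z.shape[1]
    return G[:r].T, G[r:].T

def rollout(A, B, z0, U):
    """Open-loop prediction driven only by the drivers U."""
    Z = [z0]
    for u in U[:-1]:
        Z.append(A @ Z[-1] + B @ u)
    return np.array(Z)

# Noise-free toy system (made-up matrices) to check that the fit recovers it.
rng = np.random.default_rng(2)
A_true = np.array([[0.9, 0.1], [-0.1, 0.8]])
B_true = np.array([[0.5], [0.2]])
U = rng.standard_normal((100, 1))
Z = np.zeros((100, 2))
Z[0] = [1.0, -1.0]
for k in range(99):
    Z[k + 1] = A_true @ Z[k] + B_true @ U[k]
A_est, B_est = fit_dmdc(Z, U)
```

Because the map is linear in both state and drivers, DMDc cannot represent state-driver products of the kind a SINDy library can include; that is one plausible reason the two models behave differently across scenarios.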

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper develops a reduced-order data assimilation system for thermospheric mass density by extracting a physics-informed SINDy_c-AR model from TIE-GCM simulations and coupling it to a Kalman filter that ingests in-situ density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm. The central claim is that the assimilated SINDy_c-AR forecasts reduce estimation error relative to open-loop runs, with largest gains during geomagnetic storms and under single-satellite coverage; performance is benchmarked against a DMDc reference, NRLMSIS 2.1, and HASDM, with results framed as improvements over the open-loop forecast rather than absolute superiority.

Significance. If the reported error reductions hold under broader validation, the approach offers a computationally lightweight bridge between expensive physics-based GCMs and purely empirical models, enabling improved orbit prediction and space situational awareness. The use of real multi-satellite observations across varied geomagnetic conditions and the explicit comparison to both data-driven (DMDc) and empirical baselines are strengths; the work also demonstrates how SINDy-derived reduced-order models can be made control-aware for assimilation.

major comments (3)

[§4.3, §5.2] Out-of-training 2024 Swarm-C results: The manuscript reports that SINDy_c-AR outperforms DMDc on in-training withheld orbits but DMDc is superior on the 2024 out-of-training Swarm-C case. This reversal directly challenges the load-bearing assumption that the TIE-GCM-derived SINDy_c-AR structure generalizes to real solar/geomagnetic inputs and sparse observations outside the training distribution; the assimilation gains claimed in the abstract therefore require additional quantitative cross-validation metrics (e.g., RMSE tables stratified by storm/non-storm and in/out-of-training) to be considered robust.
[§5.3] Benchmarking against NRLMSIS 2.1 and HASDM: The text notes that empirical models can outperform the assimilated result far from the assimilated track, yet no quantitative error maps or distance-to-track dependence are provided. Because the central claim is framed as improvement over open-loop rather than over state-of-the-art empirical products, this omission leaves open whether the reported gains are practically meaningful for operational use where tracks are sparse.
[§3.2] SINDy_c-AR derivation and sparsity threshold: The reduced-order dimension and SINDy sparsity threshold are listed as free parameters; the manuscript does not report a systematic sensitivity study or cross-validation procedure for these choices when the model is driven by observed (rather than TIE-GCM) solar/geomagnetic indices. This choice is load-bearing for the claimed generalization.
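The distance-to-track analysis requested in the second major comment could be prototyped as below. This is a hypothetical sketch: it bins absolute errors on withheld-orbit samples by great-circle distance (haversine on a spherical Earth of radius 6371 km) to the nearest assimilated-track sample within a time window, and it ignores altitude differences. None of these choices come from the manuscript.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km (degrees in, broadcastable)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat, dlon = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def error_vs_distance(eval_lat, eval_lon, eval_t, track_lat, track_lon, track_t,
                      abs_err, bin_edges_km, window_s):
    """Mean absolute error on withheld-orbit samples, binned by distance to the
    nearest assimilated-track sample within +/- window_s seconds."""
    d = great_circle_km(eval_lat[:, None], eval_lon[:, None],
                        track_lat[None, :], track_lon[None, :])
    too_far_in_time = np.abs(eval_t[:, None] - track_t[None, :]) > window_s
    d_min = np.where(too_far_in_time, np.inf, d).min(axis=1)
    idx = np.digitize(d_min, bin_edges_km) - 1      # inf lands past the last bin
    means = np.full(len(bin_edges_km) - 1, np.nan)
    for b in range(len(means)):
        mask = idx == b
        if mask.any():
            means[b] = abs_err[mask].mean()
    return means
```

Samples with no track point inside the time window get an infinite distance and fall outside every bin, so they do not bias the near-track averages.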

minor comments (3)

[Figures 4,5] The captions of Figures 4 and 5 should explicitly state the time periods, geomagnetic indices, and satellite configurations used for each panel so that readers can map results to the in-training vs. out-of-training distinction.
[Abstract, §5] The abstract states that assimilation lowers error versus open-loop runs but provides no numerical values; the results section should include a concise table of percentage error reductions (or RMSE ratios) for the key regimes (storm vs. quiet, single- vs. multi-satellite).
[§3.1] Notation for the autoregressive term in SINDy_c-AR should be clarified in §3.1 to distinguish it from standard SINDy_c; a short equation block would help.
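The regime-stratified table requested in the minor comments (and in the first major comment) could be computed along these lines. The sketch assumes errors are already expressed per sample (for example as log10 model-to-observation ratios) and uses Kp ≥ 5 as a hypothetical storm criterion; the function names and threshold are illustrative, not the paper's.

```python
import numpy as np

def rmse(x):
    """Root-mean-square of a 1-D error array."""
    return float(np.sqrt(np.mean(np.square(x))))

def stratified_reduction(err_open, err_da, kp, storm_kp=5.0):
    """RMSE of open-loop vs. assimilated errors split into storm (Kp >= storm_kp)
    and quiet samples. Returns {regime: (rmse_open, rmse_da, percent_reduction)}."""
    table = {}
    for name, mask in (("storm", kp >= storm_kp), ("quiet", kp < storm_kp)):
        if mask.any():
            r_ol, r_da = rmse(err_open[mask]), rmse(err_da[mask])
            table[name] = (r_ol, r_da, 100.0 * (1.0 - r_da / r_ol))
    return table
```

A single- versus multi-satellite split can be added the same way with another boolean mask over the samples.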

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, with clear indications of planned revisions to improve the robustness and clarity of the work.


Referee: [§4.3, §5.2] The manuscript reports that SINDy_c-AR outperforms DMDc on in-training withheld orbits but DMDc is superior on the 2024 out-of-training Swarm-C case. This reversal directly challenges the load-bearing assumption that the TIE-GCM-derived SINDy_c-AR structure generalizes to real solar/geomagnetic inputs and sparse observations outside the training distribution; the assimilation gains claimed in the abstract therefore require additional quantitative cross-validation metrics (e.g., RMSE tables stratified by storm/non-storm and in/out-of-training) to be considered robust.

Authors: We appreciate the referee highlighting this performance reversal, which we already note in the manuscript for the 2024 out-of-training Swarm-C case. This observation underscores the challenges of generalization beyond the TIE-GCM training distribution. To strengthen the evidence for the claimed assimilation gains, we will add stratified RMSE tables in the revised manuscript, broken down by storm versus non-storm periods and in-training versus out-of-training scenarios. These metrics will provide quantitative cross-validation to better contextualize where the SINDy_c-AR approach offers advantages. revision: yes
Referee: [§5.3] The text notes that empirical models can outperform the assimilated result far from the assimilated track, yet no quantitative error maps or distance-to-track dependence are provided. Because the central claim is framed as improvement over open-loop rather than over state-of-the-art empirical products, this omission leaves open whether the reported gains are practically meaningful for operational use where tracks are sparse.

Authors: The referee correctly identifies that we acknowledge empirical models (NRLMSIS 2.1 and HASDM) can outperform the assimilated results far from the track. Our claims are deliberately framed as improvements over the open-loop forecast, not as superiority to empirical baselines. We will revise §5.3 to include a more quantitative discussion of distance-to-track dependence, adding analysis or supplementary figures showing error trends as a function of distance from assimilated orbits where the multi-satellite data permit. This will better address operational relevance under sparse coverage. revision: partial
Referee: [§3.2] The reduced-order dimension and SINDy sparsity threshold are listed as free parameters; the manuscript does not report a systematic sensitivity study or cross-validation procedure for these choices when the model is driven by observed (rather than TIE-GCM) solar/geomagnetic indices. This choice is load-bearing for the claimed generalization.

Authors: We agree that the selection of reduced-order dimension and sparsity threshold is critical for generalization claims. These hyperparameters were chosen via cross-validation on the TIE-GCM training simulations to balance fidelity and sparsity. However, we did not conduct an explicit sensitivity analysis when the model is forced by observed solar/geomagnetic indices. In the revision, we will expand §3.2 with details of the original selection procedure and add a sensitivity study (or appendix) evaluating performance variations under observed drivers to support the generalization to real data assimilation scenarios. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation uses independent TIE-GCM source and external benchmarks

Rationale

The reduced-order SINDy_c-AR model is extracted from the external TIE-GCM physics simulation, then driven by real solar/geomagnetic indices while assimilating independent in situ observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm. Performance is assessed on withheld orbits, including the out-of-training 2024 Swarm-C case, and benchmarked against NRLMSIS 2.1 and, for 2000–2019 where available, HASDM. No equation, prediction, or uniqueness claim reduces by construction to a quantity defined only by the paper's own fitted parameters or self-citations. The central assimilation gains are therefore tested against external references rather than being tautological.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the assumption that a low-dimensional SINDy_c-AR surrogate extracted from TIE-GCM retains predictive skill under real solar/geomagnetic forcing and sparse observations. Standard Kalman-filter linearity and Gaussian-noise assumptions are also invoked. No new physical entities are postulated.

free parameters (1)

reduced-order dimension and SINDy sparsity threshold
The number of retained modes and the sparsity parameter that selects active terms in the SINDy library are chosen to balance fidelity and cost; these choices affect the surrogate's accuracy.
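One common way to choose the number of retained modes is a cumulative-energy criterion on the singular values of mean-subtracted snapshots (proper orthogonal decomposition). The sketch below assumes log10-density snapshots and a 99 % energy target; whether the paper uses this criterion, log density, or a different reduction is not stated in this summary, and the function name is hypothetical.

```python
import numpy as np

def pod_basis(snapshots, energy=0.99):
    """POD of density snapshots (n_grid, n_time) in log10 space: remove the
    temporal mean, take a thin SVD, and keep the smallest rank r whose
    cumulative squared singular values reach the requested energy fraction."""
    log_rho = np.log10(snapshots)
    mean = log_rho.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(log_rho - mean, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    r = min(int(np.searchsorted(frac, energy)) + 1, len(s))
    Phi = U[:, :r]                          # retained spatial modes
    Z = Phi.T @ (log_rho - mean)            # reduced coordinates (r, n_time)
    return Phi, mean.ravel(), Z, r

# Synthetic rank-2 field on a 1-D "grid" (purely illustrative).
x = np.linspace(0.0, 2 * np.pi, 50, endpoint=False)
t = np.linspace(0.0, 2 * np.pi, 40, endpoint=False)
log_field = -11.0 + 0.5 * np.outer(np.sin(x), np.sin(t)) + 0.15 * np.outer(np.cos(x), np.cos(t))
Phi, mean, Z, r = pod_basis(10.0 ** log_field)
```

Here the second mode carries roughly 8 % of the variance, so a 99 % target keeps two modes while a 90 % target keeps one; the energy target is exactly the kind of free parameter the ledger flags.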

axioms (2)

domain assumption: Thermospheric density variability can be adequately captured by a finite set of dominant modes extracted from TIE-GCM output.
Invoked when the reduced-order model is constructed from the parent simulation.
domain assumption: Kalman filter update equations remain valid for the nonlinear, externally forced thermospheric system.
Standard assumption of the assimilation step.
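Whether the Kalman-filter assumptions hold for the real system can be probed from the filter's own output with a normalized innovation squared (NIS) check. This is a generic consistency test, not one reported in the summary above; the function names are hypothetical, and the sketch uses a normal approximation to the chi-square bounds to avoid a SciPy dependency.

```python
import numpy as np

def nis(innovations, S_list):
    """Normalized innovation squared, eps_k = nu_k^T S_k^{-1} nu_k, per update."""
    return np.array([float(nu @ np.linalg.solve(S, nu))
                     for nu, S in zip(innovations, S_list)])

def nis_consistent(eps, obs_dim, z=1.96):
    """Approximate two-sided check of the time-averaged NIS. For a consistent
    filter, N * mean(eps) ~ chi^2(N * obs_dim), so mean(eps) has mean obs_dim
    and variance 2 * obs_dim / N; a normal approximation is used here."""
    half_width = z * np.sqrt(2.0 * obs_dim / len(eps))
    return bool(abs(eps.mean() - obs_dim) <= half_width), float(eps.mean())
```

A time-averaged NIS well above `obs_dim` suggests the forecast covariance (for example the process noise Q) is too small, a typical symptom when a reduced-order model is driven outside its training regime.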

pith-pipeline@v0.9.0 · 5600 in / 1606 out tokens · 58213 ms · 2026-05-08T01:39:58.586847+00:00 · methodology

