Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models
Pith reviewed 2026-05-08 01:39 UTC · model grok-4.3
The pith
Coupling a physics-informed reduced-order model with a Kalman filter reduces thermospheric density estimation errors compared to open-loop predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that assimilating in-situ density observations from satellites such as CHAMP, GRACE, and Swarm into a SINDy_c-AR reduced-order model derived from TIE-GCM produces lower density estimation errors than the open-loop model predictions. This improvement is most visible during geomagnetic storms and under single-satellite coverage. On assimilated orbits the method performs similarly to a linear DMDc reference, while on withheld orbits results vary by scenario, and empirical models can sometimes be more accurate far from the observation track.
What carries the argument
The argument rests on SINDy_c-AR, an autoregressive sparse identification of nonlinear dynamics with control model. It extracts the dominant modes of thermospheric variability and their response to solar and geomagnetic inputs from the parent TIE-GCM simulation.
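A minimal sketch of how such a model can be fitted, assuming that "autoregressive" means a discrete-time map from the current reduced state and inputs to the next state, and assuming sequentially thresholded least squares as the sparse regression. The function names, the small polynomial library, the threshold, and the synthetic data are all illustrative assumptions, not the paper's configuration; in the paper the states would come from TIE-GCM output and the inputs would be solar and geomagnetic drivers.

```python
import numpy as np

def build_library(z, u):
    """Candidate terms: constant, states, inputs, and state*input products.

    z: (T, r) reduced states; u: (T, m) control inputs (hypothetical layout).
    """
    T = z.shape[0]
    cross = np.einsum("ti,tj->tij", z, u).reshape(T, -1)  # z_i * u_j terms
    return np.hstack([np.ones((T, 1)), z, u, cross])

def stlsq(theta, target, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (assumed sparse solver)."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(target.shape[1]):  # refit each output on its active terms
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], target[:, k], rcond=None)[0]
    return xi

# Synthetic demo: a 2-state system with one bilinear state-input term.
rng = np.random.default_rng(0)
T = 400
u = rng.normal(size=(T, 1))
z = np.zeros((T, 2))
z[0] = [1.0, -0.5]
for k in range(T - 1):
    z[k + 1, 0] = 0.9 * z[k, 0] + 0.2 * u[k, 0]
    z[k + 1, 1] = 0.8 * z[k, 1] + 0.3 * z[k, 0] * u[k, 0]

# Discrete-time form: regress z[k+1] on library(z[k], u[k]).
# Library columns are [1, z0, z1, u0, z0*u0, z1*u0].
xi = stlsq(build_library(z[:-1], u[:-1]), z[1:])
print(np.round(xi, 3))
```

On this noise-free toy system the regression keeps only the four terms used to generate the data. Real reduced-order coefficients are noisy, so the threshold becomes a genuine tuning choice.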
If this is right
- Density estimates improve over open-loop forecasts particularly in storm conditions.
- The method works with sparse observations from limited satellite coverage.
- SINDy_c-AR matches DMDc on assimilated orbits and is more accurate on withheld orbits in the in-training scenarios, but DMDc is better in the out-of-training 2024 Swarm-C case.
- Assimilated results are positioned as enhancements to the forecast rather than replacements for empirical models.
Where Pith is reading between the lines
- This could enable better real-time orbit predictions for satellites during space weather events.
- Extending the assimilation to multiple parameters beyond density might improve overall atmospheric modeling.
- The approach suggests that reduced-order models can bridge the gap between expensive physics simulations and fast empirical ones when augmented with data.
- Testing on future satellite missions with different orbital configurations would further validate the method.
Load-bearing premise
The SINDy_c-AR model trained on TIE-GCM continues to capture the main dynamics accurately when forced with actual solar and geomagnetic data and updated with real but sparse satellite observations.
What would settle it
A direct comparison of density estimation errors between the assimilated SINDy_c-AR model and the open-loop version during a major geomagnetic storm not included in the training data, using independent validation measurements.
Abstract
Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forcing across several orders of magnitude. Physics-based general circulation models resolve this response but are computationally expensive, while empirical models run cheaply but lack a time-evolving atmospheric state. This work couples a data-driven reduced-order thermospheric model with a Kalman filter that assimilates in situ density observations. An autoregressive Sparse Identification of Nonlinear Dynamics with control (SINDy_c-AR) reduced-order model derived from the Thermosphere-Ionosphere-Electrodynamics General Circulation Model (TIE-GCM) captures the dominant modes of variability and their dependence on solar and geomagnetic drivers at a fraction of the parent model's cost. Density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm are assimilated across a range of orbital configurations and geomagnetic conditions, with a linear DMDc model evaluated as a reference. Assimilation reduces density estimation error relative to open-loop predictions, most visibly during geomagnetic storms and under single-satellite coverage. SINDy_c-AR and DMDc perform comparably on assimilated orbits; on withheld orbits, SINDy_c-AR is more accurate in the in-training scenarios while DMDc is better in the out-of-training 2024 Swarm-C case. Benchmarks against NRLMSIS 2.1 and HASDM (2000–2019, where available) show that empirical references can outperform the assimilated model far from the assimilated track, so results are framed as improvements over the open-loop forecast.
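The coupling of a reduced-order model with a Kalman filter can be sketched as a predict/update loop on the reduced state. This is a minimal sketch under stated assumptions: the dynamics are linear (as in a DMDc-style model), the observation operator is a single row that stands in for the spatial modes sampled at the satellite position, and all matrices and noise levels are made up. The source does not say how the nonlinear SINDy_c-AR model is propagated inside the filter, so that part is not represented here.

```python
import numpy as np

def kf_predict(z, P, A, B, u, Q):
    """Propagate the reduced state and its covariance one step (linear model assumed)."""
    z_pred = A @ z + B @ u
    P_pred = A @ P @ A.T + Q
    return z_pred, P_pred

def kf_update(z_pred, P_pred, y, H, R):
    """Correct the forecast with observations y = H z + noise."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T      # Kalman gain, K = P H^T S^-1
    z_new = z_pred + K @ (y - H @ z_pred)
    P_new = (np.eye(len(z_pred)) - K @ H) @ P_pred
    return z_new, P_new

# Hypothetical 3-mode system observed by one "satellite" per step.
rng = np.random.default_rng(1)
r, steps = 3, 200
A = 0.95 * np.eye(r)
B = 0.1 * np.ones((r, 1))
Q = 0.01 * np.eye(r)
R = np.array([[0.05 ** 2]])

z_true = np.array([1.0, -1.0, 0.5])
z_open = np.zeros(r)                          # open-loop forecast, never corrected
z_est, P = np.zeros(r), np.eye(r)             # assimilated estimate
err_open, err_assim = [], []
for k in range(steps):
    u = np.array([np.sin(0.1 * k)])
    z_true = A @ z_true + B @ u + rng.normal(scale=0.1, size=r)
    z_open = A @ z_open + B @ u
    z_est, P = kf_predict(z_est, P, A, B, u, Q)
    H = rng.normal(size=(1, r))               # stand-in for modes at the satellite position
    y = H @ z_true + rng.normal(scale=0.05, size=1)
    z_est, P = kf_update(z_est, P, y, H, R)
    err_open.append(z_true - z_open)
    err_assim.append(z_true - z_est)

rmse_open = np.sqrt(np.mean(np.square(err_open)))
rmse_assim = np.sqrt(np.mean(np.square(err_assim)))
print(f"open-loop RMSE: {rmse_open:.3f}, assimilated RMSE: {rmse_assim:.3f}")
```

The toy loop mirrors the comparison the abstract describes: the same forecast model is run with and without the observation update, and errors are computed against the same truth.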
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a reduced-order data assimilation system for thermospheric mass density by extracting a physics-informed SINDy_c-AR model from TIE-GCM simulations and coupling it to a Kalman filter that ingests in-situ density observations from CHAMP, GRACE, GRACE-FO, GOCE, and Swarm. The central claim is that the assimilated SINDy_c-AR forecasts reduce estimation error relative to open-loop runs, with largest gains during geomagnetic storms and under single-satellite coverage; performance is benchmarked against a DMDc reference, NRLMSIS 2.1, and HASDM, with results framed as improvements over the open-loop forecast rather than absolute superiority.
Significance. If the reported error reductions hold under broader validation, the approach offers a computationally lightweight bridge between expensive physics-based GCMs and purely empirical models, enabling improved orbit prediction and space situational awareness. The use of real multi-satellite observations across varied geomagnetic conditions and the explicit comparison to both data-driven (DMDc) and empirical baselines are strengths; the work also demonstrates how SINDy-derived reduced-order models can be made control-aware for assimilation.
major comments (3)
- [§4.3, §5.2] §4.3 and §5.2 (out-of-training 2024 Swarm-C results): The manuscript reports that SINDy_c-AR outperforms DMDc on in-training withheld orbits but DMDc is superior on the 2024 out-of-training Swarm-C case. This reversal directly challenges the load-bearing assumption that the TIE-GCM-derived SINDy_c-AR structure generalizes to real solar/geomagnetic inputs and sparse observations outside the training distribution; the assimilation gains claimed in the abstract therefore require additional quantitative cross-validation metrics (e.g., RMSE tables stratified by storm/non-storm and in/out-of-training) to be considered robust.
- [§5.3] §5.3 (benchmarking against NRLMSIS 2.1 and HASDM): The text notes that empirical models can outperform the assimilated result far from the assimilated track, yet no quantitative error maps or distance-to-track dependence are provided. Because the central claim is framed as improvement over open-loop rather than over state-of-the-art empirical products, this omission leaves open whether the reported gains are practically meaningful for operational use where tracks are sparse.
- [§3.2] §3.2 (SINDy_c-AR derivation and sparsity threshold): The reduced-order dimension and SINDy sparsity threshold are listed as free parameters; the manuscript does not report a systematic sensitivity study or cross-validation procedure for these choices when the model is driven by observed (rather than TIE-GCM) solar/geomagnetic indices. This choice is load-bearing for the claimed generalization.
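The stratified RMSE tables requested in the first major comment could be assembled along the following lines. This is a sketch using only the standard library; the record fields (`condition`, `split`, `predicted`, `observed`) are hypothetical names, and the numbers are made up purely to exercise the function.

```python
import math
from collections import defaultdict

def stratified_rmse(records):
    """Group squared errors by (condition, split) and return the RMSE per group."""
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        key = (rec["condition"], rec["split"])
        sums[key][0] += (rec["predicted"] - rec["observed"]) ** 2
        sums[key][1] += 1
    return {key: math.sqrt(total / n) for key, (total, n) in sums.items()}

# Made-up values, not results from the paper.
records = [
    {"condition": "storm", "split": "in-training", "predicted": 3.0, "observed": 1.0},
    {"condition": "storm", "split": "in-training", "predicted": 1.0, "observed": 3.0},
    {"condition": "quiet", "split": "out-of-training", "predicted": 2.0, "observed": 2.5},
]
table = stratified_rmse(records)
for (condition, split), rmse in sorted(table.items()):
    print(f"{condition:6s} {split:16s} RMSE = {rmse:.3f}")
```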
minor comments (3)
- [Figures 4,5] The captions for Figures 4 and 5 should explicitly state the time periods, geomagnetic indices, and satellite configurations used for each panel to allow readers to map results to the in-training vs. out-of-training distinction.
- [Abstract, §5] The abstract states that assimilation lowers error versus open-loop runs but provides no numerical values; the results section should include a concise table of percentage error reductions (or RMSE ratios) for the key regimes (storm vs. quiet, single- vs. multi-satellite).
- [§3.1] Notation for the autoregressive term in SINDy_c-AR should be clarified in §3.1 to distinguish it from standard SINDy_c; a short equation block would help.
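As an illustration of the kind of equation block the last minor comment asks for, the standard continuous-time SINDy-with-control form can be set against one possible discrete-time autoregressive variant. The symbols (reduced state z, inputs u, candidate library Θ, sparse coefficient matrix Ξ, lag depth p) and the lag structure are assumptions made here for illustration, not the manuscript's notation.

```latex
% Continuous-time SINDy with control (standard form):
\dot{\mathbf{z}}(t) = \boldsymbol{\Xi}^{\top}\,\boldsymbol{\Theta}\big(\mathbf{z}(t), \mathbf{u}(t)\big)

% A discrete-time autoregressive variant (assumed form, with p lagged states):
\mathbf{z}_{k+1} = \boldsymbol{\Xi}^{\top}\,\boldsymbol{\Theta}\big(\mathbf{z}_{k}, \mathbf{z}_{k-1}, \dots, \mathbf{z}_{k-p+1}, \mathbf{u}_{k}\big)
```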
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, with clear indications of planned revisions to improve the robustness and clarity of the work.
-
Referee: [§4.3, §5.2] The manuscript reports that SINDy_c-AR outperforms DMDc on in-training withheld orbits but DMDc is superior on the 2024 out-of-training Swarm-C case. This reversal directly challenges the load-bearing assumption that the TIE-GCM-derived SINDy_c-AR structure generalizes to real solar/geomagnetic inputs and sparse observations outside the training distribution; the assimilation gains claimed in the abstract therefore require additional quantitative cross-validation metrics (e.g., RMSE tables stratified by storm/non-storm and in/out-of-training) to be considered robust.
Authors: We appreciate the referee highlighting this performance reversal, which we already note in the manuscript for the 2024 out-of-training Swarm-C case. This observation underscores the challenges of generalization beyond the TIE-GCM training distribution. To strengthen the evidence for the claimed assimilation gains, we will add stratified RMSE tables in the revised manuscript, broken down by storm versus non-storm periods and in-training versus out-of-training scenarios. These metrics will provide quantitative cross-validation to better contextualize where the SINDy_c-AR approach offers advantages.
Revision: yes
-
Referee: [§5.3] The text notes that empirical models can outperform the assimilated result far from the assimilated track, yet no quantitative error maps or distance-to-track dependence are provided. Because the central claim is framed as improvement over open-loop rather than over state-of-the-art empirical products, this omission leaves open whether the reported gains are practically meaningful for operational use where tracks are sparse.
Authors: The referee correctly identifies that we acknowledge empirical models (NRLMSIS 2.1 and HASDM) can outperform the assimilated results far from the track. Our claims are deliberately framed as improvements over the open-loop forecast, not as superiority to empirical baselines. We will revise §5.3 to include a more quantitative discussion of distance-to-track dependence, adding analysis or supplementary figures showing error trends as a function of distance from assimilated orbits where the multi-satellite data permit. This will better address operational relevance under sparse coverage.
Revision: partial
-
Referee: [§3.2] The reduced-order dimension and SINDy sparsity threshold are listed as free parameters; the manuscript does not report a systematic sensitivity study or cross-validation procedure for these choices when the model is driven by observed (rather than TIE-GCM) solar/geomagnetic indices. This choice is load-bearing for the claimed generalization.
Authors: We agree that the selection of reduced-order dimension and sparsity threshold is critical for generalization claims. These hyperparameters were chosen via cross-validation on the TIE-GCM training simulations to balance fidelity and sparsity. However, we did not conduct an explicit sensitivity analysis when the model is forced by observed solar/geomagnetic indices. In the revision, we will expand §3.2 with details of the original selection procedure and add a sensitivity study (or appendix) evaluating performance variations under observed drivers to support the generalization to real data assimilation scenarios.
Revision: yes
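A sensitivity study of the sparsity threshold, as discussed in the third response, could be organized as a sweep scored on held-out data. The sketch below assumes sequentially thresholded least squares and a synthetic regression problem; the library matrix, coefficients, noise level, and threshold grid are hypothetical and stand in for the manuscript's actual model and drivers.

```python
import numpy as np

def stlsq(theta, y, threshold, n_iter=10):
    """Sequentially thresholded least squares for a single output (assumed solver)."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(theta[:, ~small], y, rcond=None)[0]
    return xi

rng = np.random.default_rng(3)
theta = rng.normal(size=(300, 8))            # hypothetical candidate-term library
true_xi = np.array([0.9, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0])
y = theta @ true_xi + 0.05 * rng.normal(size=300)
train, test = slice(0, 200), slice(200, 300)

results = []                                  # (threshold, active terms, held-out RMSE)
for threshold in [0.0, 0.01, 0.1, 0.5, 1.0]:
    xi = stlsq(theta[train], y[train], threshold)
    rmse = np.sqrt(np.mean((theta[test] @ xi - y[test]) ** 2))
    results.append((threshold, int(np.count_nonzero(xi)), rmse))
    print(f"threshold={threshold:<5} active terms={results[-1][1]} held-out RMSE={rmse:.4f}")
```

Too small a threshold keeps spurious terms, while too large a threshold removes real ones and the held-out error rises. The same sweep could be repeated over the reduced-order dimension and with observed rather than simulated drivers.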
Circularity Check
No significant circularity; derivation uses independent TIE-GCM source and external benchmarks
The reduced-order SINDy_c-AR model is extracted from the external TIE-GCM physics simulation and then driven by real solar/geomagnetic indices while assimilating independent in-situ observations from CHAMP/GRACE/Swarm. Performance is benchmarked against NRLMSIS 2.1 and, where available (2000–2019), HASDM, on held-out orbits and out-of-training 2024 data. No equation, prediction, or uniqueness claim reduces by construction to a quantity defined only by the paper's own fitted parameters or self-citations. The central assimilation gains are therefore tested against external references rather than being tautological.
Axiom & Free-Parameter Ledger
free parameters (1)
- reduced-order dimension and SINDy sparsity threshold
axioms (2)
- domain assumption: Thermospheric density variability can be adequately captured by a finite set of dominant modes extracted from TIE-GCM output
- domain assumption: Kalman filter update equations remain valid for the nonlinear, externally forced thermospheric system
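The first assumption can be checked numerically by measuring how much variance a truncated set of modes captures. The sketch below uses proper orthogonal decomposition via the SVD, which is an assumption about how the modes are extracted, and a synthetic field in place of TIE-GCM output; the function name and array layout are hypothetical.

```python
import numpy as np

def pod_modes(snapshots, r):
    """Return the temporal mean, the first r POD modes, and the captured energy fraction.

    snapshots: (n_grid, n_time) array of fields, e.g. density (assumed layout).
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    energy = np.sum(s[:r] ** 2) / np.sum(s ** 2)
    return mean, U[:, :r], energy

# Synthetic field built from two spatial patterns plus small noise.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
t = np.linspace(0.0, 10.0, 120)[None, :]
field = np.sin(x) * np.cos(t) + 0.5 * np.cos(2 * x) * np.sin(3 * t)
field += 0.01 * rng.normal(size=field.shape)

mean, modes, energy = pod_modes(field, r=2)
coeffs = modes.T @ (field - mean)             # reduced state over time, shape (2, n_time)
recon = mean + modes @ coeffs
rel_err = np.linalg.norm(field - recon) / np.linalg.norm(field)
print(f"energy captured by 2 modes: {energy:.4f}, relative reconstruction error: {rel_err:.4f}")
```

If the captured energy fraction stays low as the number of modes grows, the finite-mode assumption is doubtful for that dataset.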