Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models
Pith reviewed 2026-05-17 20:13 UTC · model grok-4.3
The pith
A curated open dataset integrates solar, geomagnetic and ionospheric measurements into a single machine-learning-ready structure for forecasting total electron content.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters, geomagnetic activity indices (Kp, AE, SYM-H), NASA JPL Global Ionospheric Maps, World-Wide GNSS Receiver Network TEC, and crowdsourced Android smartphone TEC. This heterogeneous dataset is temporally and spatially aligned into a single modular data structure that supports both physical and data-driven modeling.
What carries the argument
The temporally and spatially aligned modular data structure that unifies satellite, ground-network, and smartphone-derived total electron content observations with solar and geomagnetic drivers.
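To make "temporally and spatially aligned" concrete, here is a minimal sketch of one way sparse TEC observations could be binned onto a shared time cadence and latitude/longitude grid. It is not the authors' pipeline. The function names, the averaging rule, and the sample values are all hypothetical. The 15-minute cadence and 1° cell size echo the figures quoted in the simulated rebuttal further down this page and are treated here as adjustable assumptions.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

CADENCE_MIN = 15   # assumed cadence in minutes; must divide 60 for this simple floor
CELL_DEG = 1.0     # assumed grid cell size in degrees


def time_bin(t, cadence_min=CADENCE_MIN):
    """Floor a timestamp to the start of its cadence window."""
    minute = (t.minute // cadence_min) * cadence_min
    return t.replace(minute=minute, second=0, microsecond=0)


def grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Return (row, col) of the cell containing a point, counted from (-90, -180)."""
    lon = (lon + 180.0) % 360.0 - 180.0          # wrap longitude into [-180, 180)
    n_rows = int(round(180.0 / cell_deg))
    row = min(int(math.floor((lat + 90.0) / cell_deg)), n_rows - 1)  # lat = +90 goes in last row
    col = int(math.floor((lon + 180.0) / cell_deg))
    return row, col


def align(observations):
    """Average (time, lat, lon, vtec) samples that share a time bin and grid cell."""
    buckets = defaultdict(list)
    for t, lat, lon, vtec in observations:
        buckets[(time_bin(t), *grid_cell(lat, lon))].append(vtec)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample values, for illustration only.
utc = timezone.utc
obs = [
    (datetime(2024, 5, 10, 12, 3, tzinfo=utc), 40.2, -105.7, 30.0),
    (datetime(2024, 5, 10, 12, 11, tzinfo=utc), 40.9, -105.1, 34.0),
    (datetime(2024, 5, 10, 12, 16, tzinfo=utc), 40.5, -105.5, 50.0),
]
aligned = align(obs)
for key, value in sorted(aligned.items()):
    print(key, value)
```

Averaging within a cell is only one possible choice. A median, or a weighting by receiver quality, would change how strongly outlying smartphone samples influence each cell, which is the kind of alignment decision the referee report asks to see validated.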
If this is right
- The dataset directly enables training and benchmarking of spatiotemporal architectures for vertical TEC forecasting under quiet and active geomagnetic conditions.
- It supplies a consistent foundation for exploring coupled Sun-Earth interactions through both physics-based and data-driven approaches.
- Operational space-weather services can use the aligned structure to fill gaps caused by sparse observations in current forecasting pipelines.
- The modular format supports extension to additional data layers while preserving machine-learning readiness.
Where Pith is reading between the lines
- Real-time ingestion of smartphone TEC could raise spatial resolution of forecasts in regions with dense mobile coverage.
- The same alignment workflow could be reused for other coupled geophysical systems such as magnetosphere-ionosphere or troposphere-stratosphere forecasting.
- Public release of the dataset may accelerate community benchmarks that compare physical models against machine learning approaches on identical input streams.
Load-bearing premise
The selected data sources can be aligned in time and space without introducing systematic biases that would harm downstream machine learning model accuracy, particularly when including crowdsourced smartphone TEC values.
What would settle it
Training models on the aligned dataset and then observing substantially higher forecast errors on a test set drawn from independently aligned or higher-volume crowdsourced measurements would indicate that alignment biases degrade performance.
Original abstract
Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, accurate predictions that support Global Navigation Satellite System (GNSS), communications, aviation safety, as well as satellite operations. As part of the 2025 NASA Heliolab, we present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC). We also implement geospatially sparse data such as the TEC derived from the World-Wide GNSS Receiver Network and crowdsourced Android smartphone measurements. This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling. Leveraging this dataset, we train and benchmark several spatiotemporal machine learning architectures for forecasting vertical TEC under both quiet and geomagnetically active conditions. This work presents an extensive dataset and modeling pipeline that enables exploration of not only ionospheric dynamics but also broader Sun-Earth interactions, supporting both scientific inquiry and operational forecasting efforts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a curated, open-access dataset that integrates heterogeneous ionospheric and heliospheric measurements (SDO data, F10.7 indices, solar wind parameters, the geomagnetic indices Kp, AE and SYM-H, JPL GIM-TEC, World-Wide GNSS Receiver Network (WWGNSS) TEC, and crowdsourced Android smartphone TEC) into a temporally and spatially aligned, modular structure for machine learning. The authors benchmark several spatiotemporal ML architectures for vertical TEC forecasting under quiet and geomagnetically active conditions.
Significance. If the alignment quality is rigorously validated, the dataset could serve as a valuable open resource for advancing ML-based ionospheric forecasting and Sun-Earth interaction studies, particularly by incorporating sparse crowdsourced observations to address data gaps in operational space weather applications. The integration of diverse sources and the benchmarking exercise are positive contributions to reproducibility in the field.
major comments (2)
- [Data integration and alignment section] The manuscript states that sources are 'temporally and spatially aligned into a single, modular data structure' but provides no quantitative validation of alignment quality (e.g., RMSE, bias, or correlation statistics between smartphone TEC and GIM-TEC references after interpolation or gridding). Given the known receiver-specific offsets, multipath, and pierce-point uncertainties in crowdsourced TEC, this omission leaves unsubstantiated the central claim that the dataset is free of systematic biases that could degrade downstream ML performance.
- [Benchmarking section] The description of training and benchmarking spatiotemporal ML models for TEC forecasting mentions evaluation under quiet and active conditions but reports no specific quantitative metrics, error analysis, or ablation studies on how alignment choices affect forecast skill. This makes it difficult to evaluate whether the dataset supports reliable forecasting as claimed.
minor comments (2)
- [Abstract] The claim of a 'coherent, machine learning-ready structure' would be stronger with a brief mention of the quality-control steps applied or an example of the alignment accuracy achieved.
- [Figures and tables] Captions should explicitly list the data sources and time periods shown, so that users reproducing the dataset can match each figure to its inputs.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.
- Referee: [Data integration and alignment section] The manuscript states that sources are 'temporally and spatially aligned into a single, modular data structure' but provides no quantitative validation of alignment quality (e.g., RMSE, bias, or correlation statistics between smartphone TEC and GIM-TEC references after interpolation or gridding). Given the known receiver-specific offsets, multipath, and pierce-point uncertainties in crowdsourced TEC, this omission leaves unsubstantiated the central claim that the dataset is free of systematic biases that could degrade downstream ML performance.
  Authors: We agree that the manuscript would benefit from explicit quantitative validation of the alignment procedure, particularly for the crowdsourced smartphone TEC data relative to the JPL GIM-TEC reference. Although our alignment workflow applies standard temporal interpolation to common 15-minute timestamps and spatial projection onto a uniform 1° × 1° grid, we did not report summary statistics such as RMSE, mean bias, or Pearson correlation in the submitted version. In the revised manuscript we will add a dedicated validation subsection (within the Data Integration and Alignment section) that presents these metrics computed over a multi-month hold-out period, stratified by quiet and disturbed geomagnetic conditions. This addition will directly address concerns about potential systematic offsets and strengthen the claim that the integrated dataset is suitable for downstream ML applications.
  Revision: yes
- Referee: [Benchmarking section] The description of training and benchmarking spatiotemporal ML models for TEC forecasting mentions evaluation under quiet and active conditions but reports no specific quantitative metrics, error analysis, or ablation studies on how alignment choices affect forecast skill. This makes it difficult to evaluate whether the dataset supports reliable forecasting as claimed.
  Authors: We acknowledge that the benchmarking results in the current manuscript are presented at a high level and lack the detailed quantitative metrics, error distributions, and ablation experiments requested. While the paper does describe the model architectures and the separation into quiet versus geomagnetically active test periods, it does not report specific skill scores (RMSE, MAE, and correlation) or an analysis of how different alignment choices (e.g., inclusion or exclusion of smartphone data) influence forecast performance. In the revision we will expand the Benchmarking section to include these metrics for each model, together with an ablation study that quantifies the contribution of the crowdsourced observations and the impact of alternative gridding/interpolation choices on forecast skill. These additions will allow readers to assess the dataset’s utility for reliable TEC forecasting more rigorously.
  Revision: yes
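The validation statistics named in the exchange above (RMSE, mean bias, and Pearson correlation between smartphone TEC and a GIM-TEC reference, stratified by quiet and disturbed conditions) can be sketched as follows. This is an illustrative sketch, not code from the paper. The function names are hypothetical, the sample numbers are made up, and the Kp threshold used to split quiet from disturbed samples is an assumption, since the page does not state how the authors define the two regimes.

```python
import math


def alignment_metrics(test, reference):
    """Return RMSE, mean bias (test minus reference) and Pearson r for paired samples."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(test)
    diffs = [t - r for t, r in zip(test, reference)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_t, mean_r = sum(test) / n, sum(reference) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(test, reference))
    var_t = sum((t - mean_t) ** 2 for t in test)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    # Pearson r is undefined when either sample is constant.
    pearson = cov / math.sqrt(var_t * var_r) if var_t > 0 and var_r > 0 else float("nan")
    return {"rmse": rmse, "bias": bias, "pearson": pearson}


def stratified_metrics(pairs, kp_threshold=4.0):
    """Split (kp, test_tec, reference_tec) samples at an assumed Kp threshold."""
    groups = {"quiet": ([], []), "disturbed": ([], [])}
    for kp, test_tec, ref_tec in pairs:
        key = "disturbed" if kp >= kp_threshold else "quiet"
        groups[key][0].append(test_tec)
        groups[key][1].append(ref_tec)
    return {k: alignment_metrics(t, r) for k, (t, r) in groups.items() if len(t) >= 2}


# Made-up collocated samples: (Kp, smartphone TEC, reference TEC), in TEC units.
pairs = [
    (1.0, 12.0, 10.0), (2.0, 22.0, 20.0), (1.3, 32.0, 30.0),
    (5.0, 41.0, 40.0), (6.0, 49.0, 50.0), (7.0, 61.0, 60.0),
]
result = stratified_metrics(pairs)
for regime, metrics in result.items():
    print(regime, metrics)
```

Reporting bias separately from RMSE matters here: a constant receiver offset shows up as a nonzero bias with a correlation near 1, whereas multipath or pierce-point noise inflates RMSE and lowers the correlation without necessarily shifting the mean.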
Circularity Check
No significant circularity in dataset curation and model benchmarking
The paper describes integration of existing external data sources (SDO, F10.7, solar wind, Kp/AE/SYM-H, GIM-TEC, WWGNSS, smartphone TEC) into a temporally/spatially aligned structure, followed by training and benchmarking of ML models on the resulting dataset. No equations, first-principles derivations, or predictions are claimed that reduce by construction to fitted parameters or self-referential inputs. Alignment and quality filtering are explicit preprocessing choices, not outputs presented as independent results. The central contribution is empirical data preparation and evaluation rather than a closed derivation loop.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: data from disparate instruments can be aligned in time and space without introducing biases larger than the natural variability of the ionosphere.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling."
- IndisputableMonolith/Foundation/Atomicity.lean, theorem atomic_tick
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Paper passage: "We align our dataset to the start and end dates of the SDO Foundation Model... forward-filling approach to fill in short gaps"
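The second quoted passage mentions a forward-filling approach for short gaps. As a hedged illustration of what such a step can look like, the sketch below carries the last valid value forward only across gaps no longer than a chosen length. The function name and the `max_gap` parameter are hypothetical, and the excerpt does not say what gap length the authors treat as "short".

```python
def forward_fill(values, max_gap=2):
    """Fill runs of None with the last valid value, but only runs of length <= max_gap."""
    filled = list(values)
    n = len(filled)
    last = None   # most recent valid value seen so far
    i = 0
    while i < n:
        if filled[i] is not None:
            last = filled[i]
            i += 1
            continue
        # Find the end of the current run of missing values.
        j = i
        while j < n and filled[j] is None:
            j += 1
        # Fill only short gaps that have a valid value before them.
        if last is not None and (j - i) <= max_gap:
            for k in range(i, j):
                filled[k] = last
        i = j
    return filled


# Made-up series: one short gap, one long gap, one trailing gap.
series = [5.0, None, None, 6.0, None, None, None, 7.0, None]
print(forward_fill(series, max_gap=2))
```

In this variant a gap longer than `max_gap` is left entirely unfilled rather than partially filled, so a downstream model never sees a stale value presented as a fresh measurement in the middle of a long outage. Other conventions, such as filling the first few steps of every gap, are equally plausible readings of the passage.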
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.