
arxiv: 2511.15743 · v2 · pith:OJUEAO3Ynew · submitted 2025-11-18 · 💻 cs.LG · astro-ph.EP · astro-ph.IM

Connecting the Dots: A Machine Learning Ready Dataset for Ionospheric Forecasting Models

Pith reviewed 2026-05-17 20:13 UTC · model grok-4.3

classification 💻 cs.LG · astro-ph.EP · astro-ph.IM
keywords ionospheric forecasting · machine learning dataset · total electron content · space weather · GNSS · geomagnetic indices · solar wind · spatiotemporal models

The pith

A curated open dataset integrates solar, geomagnetic and ionospheric measurements into a single machine-learning-ready structure for forecasting total electron content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a new open-access dataset that combines solar observatory images, solar wind and irradiance indices, geomagnetic activity measures, global ionospheric maps, ground GNSS receiver data, and crowdsourced smartphone total electron content readings. These sources are aligned in time and space into one modular format explicitly built for training spatiotemporal machine learning models. The resulting resource is intended to improve forecasts of vertical total electron content during both quiet and disturbed conditions, which in turn supports more reliable GNSS positioning, aviation routing, and satellite operations.

Core claim

We present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters, geomagnetic activity indices (Kp, AE, SYM-H), NASA JPL Global Ionospheric Maps, World-Wide GNSS Receiver Network TEC, and crowdsourced Android smartphone TEC. This heterogeneous dataset is temporally and spatially aligned into a single modular data structure that supports both physical and data-driven modeling.

What carries the argument

The temporally and spatially aligned modular data structure that unifies satellite, ground-network, and smartphone-derived total electron content observations with solar and geomagnetic drivers.
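
To make the temporal half of that alignment concrete, here is a minimal sketch of resampling an irregularly sampled driver series onto a shared cadence by linear interpolation. The function name `interp_to_cadence`, the minute-based timestamps, and the 15-minute target grid are illustrative assumptions, not the authors' implementation.

```python
from bisect import bisect_left


def interp_to_cadence(times, values, grid):
    """Linearly interpolate an irregular series onto target timestamps.

    `times` must be sorted ascending. Grid points outside the observed
    range yield None instead of being extrapolated.
    (Hypothetical helper, for illustration only.)
    """
    out = []
    for t in grid:
        if t < times[0] or t > times[-1]:
            out.append(None)  # no extrapolation beyond the data
            continue
        i = bisect_left(times, t)  # first index with times[i] >= t
        if times[i] == t:
            out.append(values[i])  # exact hit, no interpolation needed
            continue
        t0, t1 = times[i - 1], times[i]
        w = (t - t0) / (t1 - t0)  # fractional position between neighbours
        out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


# Example: samples at minutes 0, 20 and 50 mapped to a 15-minute grid.
aligned = interp_to_cadence([0, 20, 50], [10.0, 14.0, 8.0], [0, 15, 30, 45, 60])
```

Returning None outside the observed range keeps data gaps explicit, so a downstream model can mask them instead of training on extrapolated values.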

If this is right

  • The dataset directly enables training and benchmarking of spatiotemporal architectures for vertical TEC forecasting under quiet and active geomagnetic conditions.
  • It supplies a consistent foundation for exploring coupled Sun-Earth interactions through both physics-based and data-driven approaches.
  • Operational space-weather services can use the aligned structure to fill gaps caused by sparse observations in current forecasting pipelines.
  • The modular format supports extension to additional data layers while preserving machine-learning readiness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-time ingestion of smartphone TEC could raise spatial resolution of forecasts in regions with dense mobile coverage.
  • The same alignment workflow could be reused for other coupled geophysical systems such as magnetosphere-ionosphere or troposphere-stratosphere forecasting.
  • Public release of the dataset may accelerate community benchmarks that compare physical models against machine learning approaches on identical input streams.

Load-bearing premise

The selected data sources can be aligned in time and space without introducing systematic biases that would harm downstream machine learning model accuracy, particularly when including crowdsourced smartphone TEC values.
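
One place such bias can enter is the gridding of sparse point measurements. The sketch below averages point TEC samples into latitude/longitude cells and keeps the per-cell sample count, so thinly covered cells stay visible. The function name `bin_to_grid`, the tuple layout, and the 1° default cell size are assumptions made for illustration, not the paper's procedure.

```python
import math
from collections import defaultdict


def bin_to_grid(points, cell_deg=1.0):
    """Average (lat, lon, vtec) samples into cell_deg x cell_deg cells.

    Returns {(row, col): (mean_vtec, n_samples)}. Rows count up from 90S,
    columns count east from 180W. Cells without samples are absent.
    (Hypothetical helper, for illustration only.)
    """
    n_rows = int(round(180.0 / cell_deg))
    n_cols = int(round(360.0 / cell_deg))
    acc = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for lat, lon, vtec in points:
        # Clamp so that lat == 90 falls in the top row instead of overflowing.
        row = min(int(math.floor((lat + 90.0) / cell_deg)), n_rows - 1)
        # Wrap longitude into [0, 360) before indexing.
        col = min(int(math.floor(((lon + 180.0) % 360.0) / cell_deg)), n_cols - 1)
        cell = acc[(row, col)]
        cell[0] += vtec
        cell[1] += 1
    return {key: (total / n, n) for key, (total, n) in acc.items()}


# Example: two samples share a cell, a third sits near the date line.
grid = bin_to_grid([(10.2, 20.7, 12.0), (10.9, 20.1, 16.0), (-45.5, 179.9, 5.0)])
```

A plain mean is the simplest choice. A median or a count-weighted scheme may be more robust to receiver-specific offsets, but that is a design decision this sketch does not settle.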

What would settle it

Training models on the aligned dataset and then observing substantially higher forecast errors on a test set drawn from independently aligned or higher-volume crowdsourced measurements would indicate that alignment biases degrade performance.

Figures

Figures reproduced from arXiv: 2511.15743 by Atılım Güneş Baydin, Bala Poduval, Frank Soboczenski, Giacomo Acciarini, Halil S. Kelebek, Linnea M. Wolniewicz, Madhulika Guhathakurta, Michael D. Vergalla, Olga Verkhoglyadova, Simone Mestici, Thomas E. Berger.

Figure 1: Visualization of dataset inputs and alignment in time and dimension. Output dataset ...
Figure 2: Visualization of the 'Monitoring Event Space-weather TEC Ionospheric Catalog Index' (the ...

Original abstract

Operational forecasting of the ionosphere remains a critical space weather challenge due to sparse observations, complex coupling across geospatial layers, and a growing need for timely, accurate predictions that support Global Navigation Satellite System (GNSS), communications, aviation safety, as well as satellite operations. As part of the 2025 NASA Heliolab, we present a curated, open-access dataset that integrates diverse ionospheric and heliospheric measurements into a coherent, machine learning-ready structure, designed specifically to support next-generation forecasting models and address gaps in current operational frameworks. Our workflow integrates a large selection of data sources comprising Solar Dynamics Observatory data, solar irradiance indices (F10.7), solar wind parameters (velocity and interplanetary magnetic field), geomagnetic activity indices (Kp, AE, SYM-H), and NASA JPL's Global Ionospheric Maps of Total Electron Content (GIM-TEC). We also implement geospatially sparse data such as the TEC derived from the World-Wide GNSS Receiver Network and crowdsourced Android smartphone measurements. This novel heterogeneous dataset is temporally and spatially aligned into a single, modular data structure that supports both physical and data-driven modeling. Leveraging this dataset, we train and benchmark several spatiotemporal machine learning architectures for forecasting vertical TEC under both quiet and geomagnetically active conditions. This work presents an extensive dataset and modeling pipeline that enables exploration of not only ionospheric dynamics but also broader Sun-Earth interactions, supporting both scientific inquiry and operational forecasting efforts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a curated, open-access dataset integrating heterogeneous ionospheric and heliospheric measurements—including SDO data, F10.7 indices, solar wind parameters, geomagnetic indices (Kp, AE, SYM-H), JPL GIM-TEC, WWGNSS, and crowdsourced Android smartphone TEC—into a temporally and spatially aligned, modular structure for machine learning. The authors benchmark several spatiotemporal ML architectures for vertical TEC forecasting under quiet and geomagnetically active conditions.

Significance. If the alignment quality is rigorously validated, the dataset could serve as a valuable open resource for advancing ML-based ionospheric forecasting and Sun-Earth interaction studies, particularly by incorporating sparse crowdsourced observations to address data gaps in operational space weather applications. The integration of diverse sources and the benchmarking exercise are positive contributions to reproducibility in the field.

major comments (2)
  1. [Data integration and alignment section] The manuscript states that sources are 'temporally and spatially aligned into a single, modular data structure' but provides no quantitative validation of alignment quality (e.g., RMSE, bias, or correlation statistics between smartphone TEC and GIM-TEC references after interpolation or gridding). Given the known receiver-specific offsets, multipath, and pierce-point uncertainties in crowdsourced TEC, this omission leaves the central claim that the dataset is free of systematic biases that could degrade downstream ML performance unsubstantiated.
  2. [Benchmarking section] The description of training and benchmarking spatiotemporal ML models for TEC forecasting mentions evaluation under quiet and active conditions but reports no specific quantitative metrics, error analysis, or ablation studies on how alignment choices affect forecast skill. This makes it difficult to evaluate whether the dataset supports reliable forecasting as claimed.
minor comments (2)
  1. [Abstract] The claim of a 'coherent, machine learning-ready structure' would benefit from a brief mention of any quality-control steps or example alignment accuracy to strengthen the summary for readers.
  2. [Figures and tables] Ensure figure and table captions explicitly list the data sources and time periods shown to improve clarity for users reproducing the dataset.
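
The alignment statistics requested in major comment 1 are simple to compute once smartphone TEC and a gridded reference have been paired. The following is a minimal sketch, assuming the two series are already co-located sample by sample. The function name `paired_stats` and the input layout are hypothetical and do not come from the manuscript.

```python
import math


def paired_stats(obs, ref):
    """Return (mean_bias, rmse, pearson_r) for paired samples obs vs ref.

    Bias is defined as mean(obs - ref). Pearson r is NaN when either
    series is constant. (Hypothetical helper, for illustration only.)
    """
    if len(obs) != len(ref) or len(obs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    n = len(obs)
    diffs = [o - r for o, r in zip(obs, ref)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_o, mean_r = sum(obs) / n, sum(ref) / n
    cov = sum((o - mean_o) * (r - mean_r) for o, r in zip(obs, ref))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    denom = math.sqrt(var_o * var_r)
    pearson = cov / denom if denom > 0 else float("nan")
    return bias, rmse, pearson


# Example: an observation series offset from the reference by a constant.
bias, rmse, r = paired_stats([12.0, 15.0, 20.0, 9.0], [10.0, 13.0, 18.0, 7.0])
```

A constant offset shows up as a non-zero bias with a correlation near 1, the signature a receiver-specific offset would leave. That is why the referee asks for bias and correlation alongside RMSE.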

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: [Data integration and alignment section] The manuscript states that sources are 'temporally and spatially aligned into a single, modular data structure' but provides no quantitative validation of alignment quality (e.g., RMSE, bias, or correlation statistics between smartphone TEC and GIM-TEC references after interpolation or gridding). Given the known receiver-specific offsets, multipath, and pierce-point uncertainties in crowdsourced TEC, this omission leaves the central claim that the dataset is free of systematic biases that could degrade downstream ML performance unsubstantiated.

    Authors: We agree that the manuscript would benefit from explicit quantitative validation of the alignment procedure, particularly for the crowdsourced smartphone TEC data relative to the JPL GIM-TEC reference. Although our alignment workflow applies standard temporal interpolation to common 15-minute timestamps and spatial projection onto a uniform 1° × 1° grid, we did not report summary statistics such as RMSE, mean bias, or Pearson correlation in the submitted version. In the revised manuscript we will add a dedicated validation subsection (within the Data Integration and Alignment section) that presents these metrics computed over a multi-month hold-out period, stratified by quiet and disturbed geomagnetic conditions. This addition will directly address concerns regarding potential systematic offsets and strengthen the claim that the integrated dataset is suitable for downstream ML applications.

    revision: yes

  2. Referee: [Benchmarking section] The description of training and benchmarking spatiotemporal ML models for TEC forecasting mentions evaluation under quiet and active conditions but reports no specific quantitative metrics, error analysis, or ablation studies on how alignment choices affect forecast skill. This makes it difficult to evaluate whether the dataset supports reliable forecasting as claimed.

    Authors: We acknowledge that the benchmarking results in the current manuscript are presented at a high level and lack the detailed quantitative metrics, error distributions, and ablation experiments requested. While the paper does describe the model architectures and the separation into quiet versus geomagnetically active test periods, specific skill scores (RMSE, MAE, and correlation) and an analysis of how different alignment choices (e.g., inclusion or exclusion of smartphone data) influence forecast performance are not reported. In the revision we will expand the Benchmarking section to include these metrics for each model, together with an ablation study that quantifies the contribution of the crowdsourced observations and the impact of alternative gridding/interpolation choices on forecast skill. These additions will allow readers to assess the dataset’s utility for reliable TEC forecasting more rigorously.

    revision: yes
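
The stratified skill scores promised in these responses could be computed along the following lines. This is a sketch under stated assumptions: the function name `stratified_errors` is hypothetical, and the Kp split point of 5.0 is an illustrative default, not the definition of quiet versus active conditions used in the paper.

```python
import math


def stratified_errors(forecast, truth, kp, kp_threshold=5.0):
    """Compute RMSE and MAE separately for quiet and active samples.

    A sample counts as 'active' when its Kp value is >= kp_threshold
    (an assumed split point). Returns {label: {"n", "rmse", "mae"}} for
    every non-empty group. (Hypothetical helper, for illustration only.)
    """
    groups = {"quiet": [], "active": []}
    for f, t, k in zip(forecast, truth, kp):
        label = "active" if k >= kp_threshold else "quiet"
        groups[label].append(f - t)
    out = {}
    for label, errs in groups.items():
        if not errs:
            continue  # skip empty strata instead of dividing by zero
        n = len(errs)
        out[label] = {
            "n": n,
            "rmse": math.sqrt(sum(e * e for e in errs) / n),
            "mae": sum(abs(e) for e in errs) / n,
        }
    return out


# Example: four forecast/truth pairs, two quiet and two active.
scores = stratified_errors(
    forecast=[10.0, 12.0, 30.0, 25.0],
    truth=[11.0, 12.0, 24.0, 33.0],
    kp=[1.0, 2.3, 5.0, 6.7],
)
```

Running the same function on models trained with and without the smartphone layer would give the per-condition comparison that an ablation study needs.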

Circularity Check

0 steps flagged

No significant circularity in dataset curation and model benchmarking

full rationale

The paper describes integration of existing external data sources (SDO, F10.7, solar wind, Kp/AE/SYM-H, GIM-TEC, WWGNSS, smartphone TEC) into a temporally/spatially aligned structure, followed by training and benchmarking of ML models on the resulting dataset. No equations, first-principles derivations, or predictions are claimed that reduce by construction to fitted parameters or self-referential inputs. Alignment and quality filtering are explicit preprocessing choices, not outputs presented as independent results. The central contribution is empirical data preparation and evaluation rather than a closed derivation loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the standard assumption that the input data products from NASA and other agencies are sufficiently accurate and that the chosen alignment procedure preserves physical meaning; no new physical entities or free parameters are introduced in the abstract.

axioms (1)
  • domain assumption: Data from disparate instruments can be aligned in time and space without introducing biases larger than the natural variability of the ionosphere.
    Invoked when the workflow integrates SDO, solar wind, geomagnetic indices, GIM-TEC, and smartphone measurements into a single structure.



