Solar Energetic Particle Forecasting with Multi-Task Deep Learning: SEPNET

Kathryn Whitman; Lulu Zhao; Tamas Gombosi; Ward Manchester; Yang Chen; Yian Yu

arxiv: 2512.12786 · v3 · submitted 2025-12-14 · ⚛️ physics.space-ph

Yian Yu, Yang Chen, Lulu Zhao, Kathryn Whitman, Ward Manchester, Tamas Gombosi

Pith reviewed 2026-05-16 22:21 UTC · model grok-4.3

classification ⚛️ physics.space-ph

keywords solar energetic particles · space weather forecasting · multi-task learning · deep neural networks · solar flares · coronal mass ejections · SHARP parameters · SEP prediction

The pith

SEPNET, a multi-task neural network, jointly forecasts solar flares, CMEs, and energetic particle events using magnetic field data to raise detection rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SEPNET, a deep learning system that predicts solar flares, coronal mass ejections, and solar energetic particles in one model by combining long short-term memory and transformer layers. It feeds the network an array of inputs that includes flare and CME records plus magnetic parameters measured in active regions. When run on a dedicated validation set of SEP events, the model records higher detection rates and skill scores than earlier machine learning baselines and current operational approaches. These improvements matter because SEP events threaten spacecraft, astronauts, and high-altitude flights, and earlier reliable warnings could trigger protective measures in time. The authors note that the system still produces relatively high false alarms due to imbalanced data but remains fast enough for real-time use.

Core claim

SEPNET is a multi-task neural network that jointly predicts future solar eruptive events including flares and CMEs along with SEPs by incorporating LSTM and transformer architectures that capture contextual dependencies across an extensive set of predictors such as solar flares, CMEs, and SHARP magnetic field parameters. Evaluated on the SEPVAL dataset, SEPNET with SHARP parameters achieves higher detection rates and skill scores than classical machine learning methods and current state-of-the-art pre-eruptive SEP models while remaining suitable for real-time space weather alert operations, even though class imbalance produces relatively high false alarm rates.

What carries the argument

The multi-task neural network SEPNET that jointly trains on flare, CME, and SEP targets using LSTM and transformer layers fed with SHARP active-region magnetic parameters.
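
The shared-trunk idea can be illustrated with a minimal sketch. It follows only the feed-forward variant described in the Figure 2 caption (shared layers with layer normalization and ReLU, then a regression head for flare/CME counts and a classification head for SEP probability). The class name `MultiTaskSketch`, the layer sizes, the random weights, and the ReLU on the count outputs are all assumptions made for illustration; dropout and the LSTM and transformer layers are omitted, and none of this is taken from the authors' released code.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class MultiTaskSketch:
    """Hypothetical shared trunk with two task heads."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.shared = init(n_hidden, n_features, rng)
        self.count_head = init(2, n_hidden, rng)  # flare count, CME count
        self.sep_head = init(1, n_hidden, rng)    # SEP event logit

    def forward(self, x):
        # Both heads read the same hidden representation.
        h = relu(layer_norm(linear(x, *self.shared)))
        counts = relu(linear(h, *self.count_head))  # assumption: clamp counts at zero
        p_sep = sigmoid(linear(h, *self.sep_head)[0])
        return counts, p_sep

model = MultiTaskSketch(n_features=4)
counts, p_sep = model.forward([0.2, 1.5, -0.3, 0.8])
print("predicted counts:", counts)
print("SEP probability:", p_sep)
```

Because the two heads share one trunk, gradients from the count targets and from the SEP label would both update the shared weights during training, which is the mechanism the pith credits for the joint model's gains.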

If this is right

SEPNET delivers timely SEP forecasts that outperform reference methods on the validation set.
Adding SHARP magnetic parameters measurably improves detection rates over models that omit them.
The framework runs fast enough to support real-time space weather alert operations.
Multi-task deep learning can handle the interdependent nature of solar eruptive events in a single model.
Public release of data and code allows direct replication and further testing by other groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the model maintains its skill on new events, space weather centers could integrate it to shorten warning times for astronaut EVA decisions.
Extending the input set with real-time coronal imagery might reduce false alarms by supplying additional context the current predictors lack.
The same joint-prediction structure could be adapted to forecast other coupled space-weather hazards such as geomagnetic storms.
Techniques for handling class imbalance, such as cost-sensitive training, could be tested directly on the released code to lower false positives without sacrificing detection rates.
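
As a rough illustration of the cost-sensitive training mentioned in the last point, the sketch below computes a class-weighted binary cross-entropy. The function names, the `pos_weight` parameter, and the toy labels and probabilities are hypothetical and are not drawn from the paper or its repository.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def balanced_pos_weight(y_true):
    """Ratio of negatives to positives, a common starting point for pos_weight."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return n_neg / n_pos

# Toy, made-up data: 2 events among 10 samples.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.6, 0.3]

w = balanced_pos_weight(labels)
print("pos_weight:", w)
print("unweighted loss:", weighted_bce(labels, probs, pos_weight=1.0))
print("weighted loss:", weighted_bce(labels, probs, pos_weight=w))
```

A `pos_weight` above 1 penalizes missed events more heavily, which tends to raise detection rates along with false alarms; lowering it shifts the penalty toward false alarms. Whether any setting reduces false positives without costing detections is an empirical question that would need to be checked on the released data.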

Load-bearing premise

That performance gains measured on the existing SEPVAL dataset will continue for future unseen solar events and that the resulting false-alarm rates will stay acceptable for operational alerts.

What would settle it

A side-by-side test of SEPNET against existing operational models on all SEP events recorded during the next solar maximum, checking whether detection rates remain higher and false alarms stay within operational tolerance on events never seen during training.

Figures

Figures reproduced from arXiv: 2512.12786 by Kathryn Whitman, Lulu Zhao, Tamas Gombosi, Ward Manchester, Yang Chen, Yian Yu.

**Figure 1.** Visualization of the timeline for operational SEP (> 10 MeV 10 pfu), flare, CME, and SHARP records used in this study. For each data source, only records occurring between 24 hours before the first SEP event search time and the minimum of the latest recorded times across all sources are included. Each colored band marks the temporal occurrence of a record by type: operational SEP (red), flare (orange), CME…

**Figure 2.** Diagram illustrating the architectures of the proposed multi-task learning models. Left: SEPNET, composed of shared feed-forward layers with layer normalization, ReLU activations, and dropout, followed by regression and classification heads for predicting flare/CME counts and SEP event probability. Middle: SEPNET-TS, an updated version introducing sequential processing via a unidirectional LSTM and trans…

**Figure 3.** Performance metrics for SEPVAL prediction models, showing the median and target quantile values across different feature sets and model architectures. The shaded light blue region represents the median and target quantile achieved by state-of-the-art pre-eruption models. Feature set abbreviations: F = flare-related features; S = SHARP parameters; C = CME-related features. Performance metric abbreviations: …

**Figure 4.** Performance metrics on the 20% testing set for different feature sets and models, targeting classification of general SEP events. Results for each criterion are the median values across five independent random stratified data splits. Feature set abbreviations: F = flare-related features; S = SHARP parameters; C = CME-related features. Performance metric abbreviations: ACC = accuracy; AUC = area under the c…

**Figure 5.** Performance of re-validated models (optimize the decision threshold for operational SEP event prediction) compared to original models, targeting classification of operational SEP events. Metrics are derived on the 20% testing set using SHARP parameters with flare features, with results for each criterion being the median values across five independent random stratified data splitting. Performance metric a…

**Figure 6.** SEPNET-O’s forecasting performance for flare counts and SEP event probabilities over a recent 23-day period in November 2025. Left panel: The black curve indicates observed flare counts, while the red curve shows the median forecast with shaded regions representing the interquartile range (25th to 75th percentiles). Right panel: The blue curve corresponds to the forecasted median SEP event probability, and…

Original abstract

Solar energetic particle (SEP) events pose severe threats to spacecraft, astronaut safety, and aviation operations. Accurate SEP forecasting remains a critical challenge in space weather research due to their complex origins and highly variable propagation. In this work, we built SEPNET, an innovative multi-task neural network that jointly predicts future solar eruptive events, including solar flares and coronal mass ejections (CMEs) and SEPs, incorporating long short-term memory and transformer architectures that capture contextual dependencies. SEPNet is a machine learning framework for SEP prediction that utilizes an extensive set of predictors, including solar flares, CMEs, and space-weather HMI active region patches (SHARP) magnetic field parameters. SEPNET is rigorously evaluated on the SEPVAL SEP dataset (Whitman, 2025b), which is used to evaluate the performance of the current SEP prediction models. The performance of SEPNet is compared with classical machine learning methods and current state-of-the-art pre-eruptive SEP prediction models. The results show that SEPNET, particularly with SHARP parameters, achieves higher detection rates and skill scores while maintaining suitable for real-time space weather alert operations. Although class imbalance in the data leads to relatively high false alarm rates, SEPNET consistently outperforms reference methods and provides timely SEP forecasts, highlighting the capability of deep multi-task learning for next-generation space weather prediction. All data and code are available on GitHub at https://github.com/yuyian/SEP-Prediction.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SEPNET applies multi-task LSTM-transformer learning to joint flare-CME-SEP prediction from SHARP data and beats some baselines on SEPVAL, but the evaluation lacks temporal split details needed for real forecasting claims.

SEPNET is a multi-task neural net using LSTM and transformers to jointly forecast solar flares, CMEs, and SEPs from SHARP parameters, and it claims better performance than baselines on the SEPVAL set while being suitable for real-time use.

The joint training is the new part here. By predicting multiple related events together, the model can leverage shared patterns in the solar magnetic data, which separate models might miss. Releasing the code on GitHub is a plus for reproducibility, and the comparisons to other methods show consistent outperformance, even if class imbalance causes high false alarms.

The soft spots are in the evaluation. The abstract doesn't detail the train-test split, so it's unclear if it's a proper temporal hold-out for forecasting; that's critical to avoid leakage from future events. No error bars, p-values, or ablation results are mentioned, which makes it hard to gauge if the gains are robust. The high false alarm rate is noted but not shown to be manageable for operations.

This work is for space weather forecasters and ML practitioners in solar physics. It has practical intent and public resources, so a reader interested in applying deep learning to SEP prediction would find it useful.

I'd recommend sending it for peer review. The core idea holds potential, but the methods need checking to confirm the claims.

Referee Report

3 major / 2 minor

Summary. The paper introduces SEPNET, a multi-task neural network combining LSTM and transformer layers to jointly forecast solar flares, CMEs, and solar energetic particle (SEP) events. It incorporates solar flare, CME, and SHARP magnetic field parameters as inputs and evaluates performance on the SEPVAL dataset against classical ML baselines and existing state-of-the-art SEP predictors, claiming higher detection rates and skill scores suitable for real-time operations despite elevated false-alarm rates from class imbalance. All code and data are released on GitHub.

Significance. If the reported gains survive rigorous temporal validation, SEPNET would represent a meaningful advance in operational space-weather forecasting by demonstrating that multi-task deep learning with active-region magnetic parameters can improve SEP detection over single-task or classical approaches. The public release of code and data strengthens reproducibility and enables direct community follow-up.

major comments (3)

[Evaluation] Evaluation section: the manuscript provides no description of the train/test partitioning strategy on SEPVAL (random vs. chronological split, embargo period, or forward-chaining). For any forecasting claim, this detail is load-bearing; without explicit confirmation that test events post-date all training data, the reported skill-score improvements cannot be distinguished from leakage artifacts.
[Results] Results section: no error bars, bootstrap confidence intervals, or statistical significance tests are reported for the detection rates and skill scores. Given the small number of SEP events and class imbalance, it is impossible to assess whether the claimed outperformance over baselines is robust.
[Methods and Results] Methods and Results: no ablation experiments isolate the contribution of the multi-task architecture versus the addition of SHARP parameters, nor do they test performance under strict temporal hold-out. These omissions leave the central operational-suitability claim unsupported.
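
The chronological split with an embargo period raised in the first major comment can be sketched as follows. This is an illustration of the general technique only; the record layout, the `chronological_split` helper, the cutoff, and the embargo length are hypothetical, and the timestamps are synthetic rather than real SEPVAL events.

```python
from datetime import datetime, timedelta

def chronological_split(records, cutoff, embargo=timedelta(0)):
    """Train on records before (cutoff - embargo); test on records at or after cutoff.

    Records that fall inside the embargo window are dropped so that input
    windows leading up to early test events cannot overlap the training data.
    """
    train = [r for r in records if r["time"] < cutoff - embargo]
    test = [r for r in records if r["time"] >= cutoff]
    return train, test

# Synthetic records spaced 10 days apart (not real observations).
start = datetime(2020, 1, 1)
records = [{"time": start + timedelta(days=10 * i), "sep": int(i % 4 == 0)}
           for i in range(10)]

train, test = chronological_split(
    records, cutoff=datetime(2020, 3, 1), embargo=timedelta(days=15)
)
print(len(train), "training records,", len(test), "test records")
print("latest training time:", max(r["time"] for r in train))
print("earliest test time:", min(r["time"] for r in test))
```

Unlike a random stratified split, this arrangement guarantees that every test record post-dates every training record, which is the property the referee asks the manuscript to confirm.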

minor comments (2)

[Abstract] Abstract, results sentence (beginning 'The results show that SEPNET'): the phrase 'maintaining suitable for real-time' is grammatically incomplete and should be rephrased for clarity, for example as 'remaining suitable for real-time'.
[Figures and Tables] Figure captions and tables: axis labels and metric definitions (e.g., exact formulas for the skill scores) should be stated explicitly rather than assumed from prior literature.
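
For reference, the sketch below writes out the standard contingency-table definitions of several scores commonly used in flare and SEP forecasting. The paper's own metric list is truncated in the figure captions above, so the choice of POD, FAR, TSS, and HSS here is an assumption, and the helper names and toy predictions are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def skill_scores(tp, fp, fn, tn):
    """Standard definitions; assumes every denominator is non-zero."""
    pod = tp / (tp + fn)    # probability of detection (hit rate)
    pofd = fp / (fp + tn)   # probability of false detection
    far = fp / (tp + fp)    # false alarm ratio
    tss = pod - pofd        # true skill statistic
    hss = 2 * (tp * tn - fp * fn) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))
    return {"POD": pod, "FAR": far, "TSS": tss, "HSS": hss}

# Toy, made-up labels and predictions: 3 events among 10 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

scores = skill_scores(*confusion_counts(y_true, y_pred))
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the counts are 2 hits, 2 false alarms, 1 miss, and 5 correct rejections, giving TSS = 2/3 - 2/7 = 8/21 and FAR = 0.5. Stating the formulas this explicitly in the captions would remove the ambiguity between false alarm ratio and false alarm rate.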

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We have addressed each of the major comments point by point below. Where revisions are needed, we will update the manuscript accordingly to improve the description of our methodology and strengthen the statistical analysis.

Referee: [Evaluation] Evaluation section: the manuscript provides no description of the train/test partitioning strategy on SEPVAL (random vs. chronological split, embargo period, or forward-chaining). For any forecasting claim, this detail is load-bearing; without explicit confirmation that test events post-date all training data, the reported skill-score improvements cannot be distinguished from leakage artifacts.

Authors: We fully agree that the train/test partitioning strategy must be clearly described to support any forecasting claims. In our work, we employed a chronological split on the SEPVAL dataset to ensure that all test events occur after the training period, preventing data leakage. We will revise the Evaluation section to explicitly detail this partitioning strategy, including the specific time periods used for training and testing, and confirm the forward-chaining approach. The released code on GitHub implements this split. revision: yes

Referee: [Results] Results section: no error bars, bootstrap confidence intervals, or statistical significance tests are reported for the detection rates and skill scores. Given the small number of SEP events and class imbalance, it is impossible to assess whether the claimed outperformance over baselines is robust.

Authors: We recognize the importance of providing uncertainty estimates and statistical tests given the limited number of SEP events and the class imbalance. In the revised manuscript, we will add bootstrap confidence intervals for the detection rates and skill scores. We will also include statistical significance tests to compare SEPNET's performance against the baselines. These additions will be incorporated into the Results section. revision: yes

Referee: [Methods and Results] Methods and Results: no ablation experiments isolate the contribution of the multi-task architecture versus the addition of SHARP parameters, nor do they test performance under strict temporal hold-out. These omissions leave the central operational-suitability claim unsupported.

Authors: We agree that ablation studies are necessary to isolate the effects of the multi-task learning and the inclusion of SHARP parameters. We will perform additional ablation experiments in the revision: comparing the full multi-task model against single-task variants and models without SHARP inputs. We will also evaluate and report results under strict temporal hold-out conditions. These experiments and their results will be added to the Methods and Results sections to better support our claims. revision: yes
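
The bootstrap confidence intervals promised in the second response could look roughly like the percentile bootstrap below. It is a sketch under stated assumptions: the function names, the number of resamples, and the toy labels are hypothetical, and the true skill statistic is used only as an example metric.

```python
import random

def tss(y_true, y_pred):
    """True skill statistic; returns None if a resample lacks one of the classes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        return None
    return tp / (tp + fn) - fp / (fp + tn)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval obtained by resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value is not None:  # skip degenerate resamples
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper

# Toy, made-up data: 8 events among 40 samples.
y_true = [1] * 8 + [0] * 32
y_pred = [1] * 6 + [0] * 2 + [1] * 4 + [0] * 28

point = tss(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, tss)
print(f"TSS = {point:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

With only a handful of positive events, intervals of this kind tend to be wide, which is why the referee treats their absence as a major rather than a minor issue.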

Circularity Check

0 steps flagged

No significant circularity; empirical ML evaluation on external held-out data

The paper describes training a multi-task neural network (LSTM + transformer) on solar flare, CME, and SHARP parameter inputs to predict SEP events, then reports detection rates and skill scores on the SEPVAL dataset. No equations, ansatzes, or self-citations reduce the reported metrics to quantities defined inside the model or by the authors' prior work. Performance is measured against an external benchmark dataset using standard classification metrics; the evaluation chain does not collapse to the training inputs by construction. Minor author overlap on the cited dataset does not create load-bearing circularity because the data itself is independent observational input.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard neural-network training assumptions and domain knowledge that SHARP magnetic parameters and flare/CME observations are predictive of SEPs; no new physical entities are introduced.

free parameters (1)

neural network weights and hyperparameters
All model parameters are fitted to the SEPVAL training data during optimization.

axioms (1)

domain assumption: SHARP magnetic field parameters and flare/CME observations contain information relevant to SEP occurrence
Invoked when selecting predictors; standard in space-physics literature but not re-derived here.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "SEPNET, an innovative multi-task neural network that jointly predicts future solar eruptive events, including solar flares and coronal mass ejections (CMEs) and SEPs, incorporating long short-term memory and transformer architectures"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
